Forum > General

Issue with TStringList implementation when Sorted and dupIgnore/dupAccept

(1/1)

jmm72:
Greetings. First post here.

I'm polishing my Object Pascal skills after almost nine years of leaving Delphi. Lazarus/FPC is such a great combination for someone who likes Object Pascal like me. I'm doing a personal pet project, nothing serious, but I'm hitting something that Google doesn't answer (or that I can't find the right keyword combination). I believe that either I'm looking at this so wrongly, or I've found a problem in the documentation or the specific FPC implementation. If it's the former case, please ignore this and excuse me. In the latter case, maybe this can be useful.

I'm using a TStringList, it's sorted and with dupIgnore, and I add several words to it. What I want to do is to know at adding time if the word I'm adding was inserted, and where, or if the word was already in the list and it was ignored.

According to the documentation at http://lazarus-ccr.sourceforge.net/docs/rtl/classes/tstringlist.add.html:

--- Quote ---If the list is sorted and the string S is already present in the list and TStringList.Duplicates is dupError then an EStringListError exception is raised. If Duplicates is set to dupIgnore then the return value is underfined.
--- End quote ---

The "undefined" part is what bugs me. TStringList.Add returns an integer. Let's look at the code of this method, in rtl\objpas\classes\stringl.inc:

--- Code: ---begin
  If Not Sorted then
    Result:=FCount
  else
    If Find (S,Result) then
      Case DUplicates of
        DupIgnore : Exit;
        DupError : Error(SDuplicateString,0)
      end;
   InsertItem (Result,S);
end;

--- End code ---

So with a sorted TStringList, and in both cases of dupIgnore and dupAccept, TStringList.Add always returns an integer in the range of 0 to Count - 1. So, as the documentation says, with dupIgnore, there's no string added as there's an early Exit; but contrary to what the documentation says, the return of the method *is* defined, and moreover, it's a valid value since it points to the place where the string was found or where the string was inserted. So just with the result of the method we can't know if the string was inserted, or if since it was a duplicate, it was ignored.

Consider this code:

--- Code: ---procedure Test;
var
  sl: TStringList;
  r: integer;

begin
  sl := TStringList.Create;
  sl.Sorted := true;
  sl.Duplicates := dupIgnore;
  sl.Add('a');
  sl.Add('b');
  sl.Add('c');
  r := sl.Add('a');
  if (r >= 0) or (r < sl.Count) then
    writeln('valid result')
  else
    writeln('invalid result');
end;

--- End code ---

The 'else' part would never be executed. There are workarounds. For example, dupError and using exceptions, or try to find the string beforehand. After all, it should be quick since the TStringList is sorted and a search should be quick, but why using two searches when the documentation says that with one (the one implemented within the .Add method) should suffice? Another option is comparing if the string at the result is the same as the string we tried to add. But the documentation says "undefined", so we shouldn't rely on the result of the method!

I don't know if it's an implementation error, or if the documentation should be corrected. If the implementation is what it should be (is its to mimic the Delphi implementation?), then the documentation should be corrected, to reflect the real behaviour. Either someone should do it, or someone should point me to the right direction to change the documentation myself, for the benefit of everyone using FPC. I say this because "the return value is undefined" is vague, ambiguous. I'm not nitpicking, this is documentation for programming, and should be specific. A sentence like "the return value shouldn't be used to check if the string was ignored or not" would describe the current behaviour much better.

Anyway I would rewrite the entire Description block like this:

--- Quote ---Add will add S to the list. If TStringList.Sorted is false, the string will be added to the end of the list, and the return value is the index of the last element of the list, where the string has been placed.

If TStringList.Sorted is true, and the string is not present in the list, then it will be inserted at its alphabetical position, and the return value will be the index of such position in the list. If the string is present in the list, the exact behaviour depends on the value of TStringList.Duplicates:
* If Duplicates is set to dupAccept,  then the string is still added, and the return value is where the string was inserted.
* If Duplicates is set to dupIgnore then the return value is where the string was found.
* If Duplicates is dupError then an EStringListError exception is raised.
--- End quote ---



If the implementation is wrong or incomplete, then the Exit; statement should be changed to something else. Maybe Exit(-1); and state so in the documentation (instead of the vague "undefined"). This way the .Add method is much more useful in the case of dupIgnore. Probably anyone in the FPC dev team will have a good suggestion as to what to change and how, if needed.

Maybe I'm missing some important piece and misunderstanding the whole situation. If so, please tell me so, because I want to learn and stop looking like an idiot  %)

Comments?

Regards, JMM.

typo:
TStringList is Delphi compatible, if any point of Lazarus TStringList differs from the Delphi one, please correct it.

Anyway, there is TDictionaryStringList in Lazarus, which can be used to deduplicate string lists without changing the sorting state. Delphi compatible TStringList only can ignore duplications for sorted string lists. TDictionaryStringList is also optimized for lookup.

http://wiki.lazarus.freepascal.org/LazUtils

jmm72:
I already fixed my code to deal with this limitation, wether accidental or intentional. Just adding a previous search before adding the string solves the issue, and I placed it in an inline function. But you can't use the result of the first search (even if it's blazingly fast) for the addition of the string, so you have to do a .

I just pointed out, in the most specific manner, a problem I see with either the documentation or the implementation of the real behaviour. I'd like the behaviour to be changed, but that would break Delphi compatibilty, which I guess would be an undesired effect.

What I see now (and didn't find earlier because I was stubborn about getting the real code and not the documentation) is how the Delphi TStringList implementation works. According to http://docwiki.embarcadero.com/Libraries/XE7/en/System.Classes.TStringList.Add, in case of a sorted list and dupIgnore, then the return value is the index of the existing entry.

So according to that, and not having looked at the exact code (I don't have any recent Delphi with the source), what is wrong is the documentation, which I insist is vague and ambiguous, and in light of what the Delphi documentation says, it is also wrong as the return value *is* defined.

I would change the documentation myself but I don't see how I could do it. If anyone with the powers reads this, please fix it.

Edit: Ah well, it was as easy as to fill in a report for the bugtracker with category "documentation" ;D Done.

Regards, JMM.

Navigation

[0] Message Index

Go to full version