Recent

Author Topic: Natural string comparison?  (Read 31700 times)

Blaazen

  • Hero Member
  • *****
  • Posts: 3241
  • POKE 54296,15
    • Eye-Candy Controls
Re: Natural string comparison?
« Reply #15 on: May 03, 2014, 11:20:39 am »
@ typo

But this one is WIndows-only. I meant something cross-platform what does really sort itself (like engkin's solution).
Lazarus 2.3.0 (rev main-2_3-2863...) FPC 3.3.1 x86_64-linux-qt Chakra, Qt 4.8.7/5.13.2, Plasma 5.17.3
Lazarus 1.8.2 r57369 FPC 3.0.4 i386-win32-win32/win64 Wine 3.21

Try Eye-Candy Controls: https://sourceforge.net/projects/eccontrols/files/

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #16 on: May 03, 2014, 03:51:16 pm »
Here it is.

Languages which have diacritics like the mine need to sort the words as if they don't exist. Here a cross-platform almost perfect sorting.

And why it is not perfect? Because according to this sorting a = á, so the sorting is made randomly. A perfect sorting, AFAICS, would put the word 'e' before the word 'é' all the times as it is made on language dictionaries.
« Last Edit: May 03, 2014, 05:06:52 pm by typo »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #17 on: May 03, 2014, 05:26:53 pm »
AnsiNaturalCompare does not handle empty strings. To make it suitable as a general procedure that needs to be changed. Maybe:
Code: [Select]
function AnsiNaturalCompare(str1, str2: string; vCaseSensitive: boolean = False): integer;
var
  l1, l2: integer; //Str length
  n1, n2: integer; //numrical part
  i1, i2: integer; //index in Str
  d: integer;
begin
  l1 := Length(str1);
  l2 := Length(str2);

//Handle empty strings
  if (l1 = 0) and (l2 = 0) then exit(0);
  if (l1 = 0) then exit(-1);
  if (l2 = 0) then exit(1);

  if not vCaseSensitive then
  begin
    str1 := UpperCase(str1);
    str2 := UpperCase(str2);
  end;

  i1 := 1;
  i2 := 1;
  while i1 <= l1 do
  begin
    //Compare non-numbers
    d := Ord(str1[i1]) - Ord(str2[i2]);
    if not (str1[i1] in ['0'..'9']) then
    begin
      if (d <> 0) then
      begin
        Result := d;
        exit;
      end;
    end
    else
    begin
      //Convert a section of str1 to a number
      n1 := 0;
      repeat
        n1 := 10 * n1 + Ord(str1[i1]) - Ord('0');
        Inc(i1);
      until (i1 > l1) or not (str1[i1] in ['0'..'9']);

      //Convert a section of str2 to a number
      n2 := 0;
      repeat
        n2 := 10 * n2 + Ord(str2[i2]) - Ord('0');
        Inc(i2);
      until (i2 > l2) or not (str2[i2] in ['0'..'9']);

      //Compare numbers naturally
      d := n1 - n2;
      if d <> 0 then
      begin
        Result := d;
        exit;
      end
      else
        Continue;
    end;
    Inc(i1);
    Inc(i2);
  end;
  Result := (i1 - l1) - (i2 - l2);
end;

StrCmpLogicalW is slower when used with ANSI strings and I could not find an ANSI version for it.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #18 on: May 03, 2014, 08:45:46 pm »
Code: [Select]
function AlmostPerfectSortCompare(List: TStringList;  Index1, Index2: Integer): Integer;
var
  Str1, Str2 :string;
begin
  // almost perfect sort compare
  Str1 := RemoveDiacritics(List[Index1]);
  Str2 := RemoveDiacritics(List[Index2]);
  {$IFDEF WINDOWS}
  Result := StrCmpLogical(Str1, Str2);
  {$ELSE}
  Result := AnsiNaturalCompare(Str1, Str2);
  {$ENDIF}

  // virtually perfect sorting
  // places unsigned words (without diacritics) before the signed ones (with diacritics) when equal
  if Str1 = Str2 then
    if List[Index1] <> Str1 then
      Result := 1
    else if List[Index2] <> Str2 then
      Result := -1;
end;           

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #19 on: May 04, 2014, 08:19:54 pm »
Here is an implementation with out conversion to integer. Tried to make it without string to number conversion and with the minimum number of iterations.

Code: [Select]
function EvsCompareNatural_A(const S1, S2 :AnsiString; aCaseSensitive : Boolean = True):integer;
var
  vChr1, vChr2 : PChar;
  vCmp1, vCmp2 : string;

  function Sign(aNo:integer):integer; inline;
  begin
    Result := 0;
    if aNo > 0 then Result := 1
    else if aNo < 0 then Result := -1;
  end;

  function NumCharLen(const aStart:PAnsiChar):Integer;
  var
    vEnd  : PAnsiChar;
    vNext : PAnsiChar;
  begin
    vEnd  := aStart;
    vNext := aStart;
    repeat
      inc(vNext);
    until (vNext^ = #0) or (not (vNext^ in ['0'..'9']));
    Result := vNext-aStart;
  end;

  function CompToChar(var aStr:PAnsiChar; const aChar:Char; aCount:Integer):Integer;inline;
  begin            // compares the next aCount characters of aStr with aChar and returns 0 if they are all the same
    Result := 0;   // or <>0 if they differ. It is used to avoid padding a string with zeros.
    repeat
      Result := sign(Ord(aStr^) - ord(aChar));
      Dec(aCount); Inc(aStr);
    until (Result <> 0) or (aStr^=#0) or (aCount = 0);
  end; //when checking numeric characters[0..9] against zero it always returns a positive number.

  function NumComp:Integer;
  var
    vNl : Integer;
  begin
    Result := -2;
    vNl := NumCharLen(vChr1) - NumCharLen(vChr2);
    if vNl < 0 then Result := CompToChar(vChr2, '0', abs(vNl))
    else if vNl > 0 then Result := CompToChar(vChr1, '0', vNl);
    if (Result > 0) then begin
      Result := Sign(vNl);
      Exit;
    end;
    repeat
      Result := sign(ord(vChr1^) - ord(vChr2^));
      inc(vChr1); inc(vChr2);
    until ((vChr1^=#0) or (vChr2^=#0))   //end of string has been reached
       or (Result <> 0)                  //conclusion has been reached
       or (not (vChr1^ in ['0'..'9']))   //numeric characters end here
       or (not (vChr2^ in ['0'..'9']));  //numeric characters end here

    if Result = 0 then begin
      if vChr1^ in ['0'..'9'] then Result := 1
      else if vChr2^ in ['0'..'9'] then Result := -1;
    end;
  end;

begin
  //s1<s2 = -1, S1=S2 =0, S1>S2 = 1;
  if aCaseSensitive then begin
    vChr1 := @S1[1]; vChr2 := @S2[1]
  end else begin
    vCmp1 := LowerCase(S1); vCmp2 := LowerCase(S2);
    vChr1 := @vCmp1[1];     vChr2 := @vCmp2[1];
  end;

  repeat
    if (vChr1^ in ['0'..'9']) and (vChr2^ in ['0'..'9']) then
      Result := NumComp // it exits ready in the next position
    else begin
      Result := Sign(ord(vChr1^)- ord(vChr2^));
      inc(vChr1); inc(vChr2);
    end;
  until (vChr1^=#0) or (vChr2^=#0) or (Result <> 0);
  if (Result = 0) then Result := Sign(ord(vChr1^) - Ord(vChr2^));
end;

Tested it using typo's application on both the test cases on the site he linked http://www.davekoelle.com/alphanum.html. Haven't speed test all 3 of the solution posted here together yet, to see if all this effort made a difference, any way it is out of my system now. Keep in mind that it works on single byte strings. I might convert it to widestrings later but for now I'm done.
« Last Edit: May 04, 2014, 08:28:41 pm by taazz »
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1263
Re: Natural string comparison?
« Reply #20 on: May 04, 2014, 08:59:13 pm »
Well impressed with the effort of everyone here.  Many thanks all.
Lazarus Trunk/FPC Trunk on Windows [7, 10]
  Have you tried searching this forum or the wiki?:   http://wiki.lazarus.freepascal.org/Alternative_Main_Page
  BOOKS! (Free and otherwise): http://wiki.lazarus.freepascal.org/Pascal_and_Lazarus_Books_and_Magazines

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #21 on: May 04, 2014, 11:38:44 pm »
well I'm back. You missed me? pfffffft. I had a nasty bug in the function and engkin's comment about his routine not been able to handle empty strings slapped me in to reality, yes mine did too. Here it is corrected
Code: [Select]
function EvsCompareNatural_A(const S1, S2 :AnsiString; aCaseSensitive : Boolean = True):integer;
var
  vChr1, vChr2 : PChar;
  vCmp1, vCmp2 : string;

  function Sign(aNo:integer):integer; inline;
  begin
    Result := 0;
    if aNo > 0 then Result := 1
    else if aNo < 0 then Result := -1;
  end;

  function NumCharLen(const aStart:PAnsiChar):Integer;
  var
    vNext : PAnsiChar;
  begin
    vNext := aStart;
    repeat
      inc(vNext);
    until (vNext^ = #0) or (not (vNext^ in ['0'..'9']));
    Result := vNext-aStart;
  end;

  function CompToChar(var aStr:PAnsiChar; const aChar:Char; aCount:Integer):Integer;inline;
  begin            // compares the next aCount characters of aStr with aChar and returns 0 if they are all the same
    Result := 0;   // or <>0 if they differ. It is used to avoid padding a string with zeros.
    repeat
      Result := sign(Ord(aStr^) - ord(aChar));
      Dec(aCount); Inc(aStr);
    until (Result <> 0) or (aStr^=#0) or (aCount = 0);
  end; //when checking numeric characters[0..9] against zero it always returns a positive number.

  function NumComp:Integer;
  var
    vNl : Integer;
  begin
    Result := -2;
    vNl := NumCharLen(vChr1) - NumCharLen(vChr2);
    if vNl < 0 then Result := CompToChar(vChr2, '0', abs(vNl))
    else if vNl > 0 then Result := CompToChar(vChr1, '0', vNl);
    if (Result > 0) then begin
      Result := Sign(vNl);
      Exit;
    end;
    repeat
      Result := sign(ord(vChr1^) - ord(vChr2^));
      inc(vChr1); inc(vChr2);
    until ((vChr1^=#0) or (vChr2^=#0))   //end of string has been reached
       or (Result <> 0)                  //conclusion has been reached
       or (not (vChr1^ in ['0'..'9']))   //numeric characters end here
       or (not (vChr2^ in ['0'..'9']));  //numeric characters end here

    if Result = 0 then begin
      if vChr1^ in ['0'..'9'] then Result := 1
      else if vChr2^ in ['0'..'9'] then Result := -1;
    end;
  end;

begin
  //s1<s2 = -1, S1=S2 =0, S1>S2 = 1;
  if aCaseSensitive then begin
    vChr1 := @S1[1]; vChr2 := @S2[1]
  end else begin
    vCmp1 := LowerCase(S1); vCmp2 := LowerCase(S2);
    vChr1 := @vCmp1[1];     vChr2 := @vCmp2[1];
  end;

  repeat
    if (vChr1^ in ['0'..'9']) and (vChr2^ in ['0'..'9']) then
      Result := NumComp // it exits ready in the next position
    else begin
      Result := Sign(ord(vChr1^)- ord(vChr2^));
      if vChr1^ <> #0 then inc(vChr1);
      if vChr2^ <> #0 then inc(vChr2);
    end;
  until (vChr1^=#0) or (vChr2^=#0) or (Result <> 0);
  if (Result = 0) then Result := Sign(ord(vChr1^) - Ord(vChr2^));
end;

And on top of that I added a small speed test here is the routine I used
Code: [Select]
procedure TForm1.Button4Click(Sender :TObject);
var
  vEvs  :QWord;
  vTypo :QWord;
  vEng  :QWord;
  vRes  :Integer;
  vCntr :Integer;
  vStr2, vStr1 : string; //no access delays.
begin
  vStr2 := 'Callisto Morphamax 6000 SE'; vStr1 := 'Callisto Morphamax 6000 SE2';

  Screen.Cursor := crHourGlass;
  try
    vEvs := GetTickCount64;
    for vCntr := 0 to SpinEdit1.Value do
      vRes := EvsCompareNatural_A(vStr2, vStr1);
    vEvs := GetTickCount64 - vEvs;

    vTypo := GetTickCount64;
    for vCntr := 0 to SpinEdit1.Value do
      vRes := StrCmpLogical(vStr2, vStr1);
    vTypo := GetTickCount64 - vTypo;

    vEng := GetTickCount64;
    for vCntr := 0 to SpinEdit1.Value do
      vRes := AnsiNaturalCompare(vStr2, vStr1);
    vEng := GetTickCount64 - vEng;

    StatusBar1.Panels[0].Text := 'Evs : ' + IntToStr(vEvs);
    StatusBar1.Panels[1].Text := 'Typo : ' + IntToStr(vTypo);
    StatusBar1.Panels[2].Text := 'Engkin : ' + IntToStr(vEng);
  finally
    Screen.Cursor := crDefault;
  end;
end;         
and attached you will find the results for a 5,000,000 iteration loop, at the statusbar.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1263
Re: Natural string comparison?
« Reply #22 on: May 04, 2014, 11:41:19 pm »
And the winner is.....   both @Taazz and everyone who wants to use that code :-)  Nice work.

Quote
but for now I'm done.
I didn't believe you then anyway :-)
Lazarus Trunk/FPC Trunk on Windows [7, 10]
  Have you tried searching this forum or the wiki?:   http://wiki.lazarus.freepascal.org/Alternative_Main_Page
  BOOKS! (Free and otherwise): http://wiki.lazarus.freepascal.org/Pascal_and_Lazarus_Books_and_Magazines

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #23 on: May 04, 2014, 11:55:21 pm »
you know trigger happy minds and last minute problems, its like trying to get a bone from a dog never gonna happen unless the dog eats the bone or the minds solve the problem. :D . But now ...... naa not twice in the same day so see you all later :P
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #24 on: May 05, 2014, 12:38:23 am »
@Taazz, your code is impressive as usual. I vote for your code :) for more than one reason.  Just a small note. Can you repeat the speed test with similar default values for case sensitivety:
Quote
function EvsCompareNatural_A(...aCaseSensitive : Boolean = True):integer;

Quote
function AnsiNaturalCompare(...vCaseSensitive: boolean = False): integer;

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #25 on: May 05, 2014, 12:43:35 am »
Well, StrCmpLogical is not mine, it is from Windows.

But even this is fast enough to make a 100,000 lines memo sorting completely feasible.
« Last Edit: May 05, 2014, 12:52:18 am by typo »

BeniBela

  • Hero Member
  • *****
  • Posts: 928
    • homepage
Re: Natural string comparison?
« Reply #26 on: May 05, 2014, 01:03:45 am »
Haven't speed test all 3 of the solution posted here together yet, to see if all this effort made a difference,

More than 3 solutions if you count functions on linked pages

I run your benchmark with my strCompareClever, and it was just as fast...


typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #27 on: May 05, 2014, 01:11:52 am »
So no reason to continue to use standard QuickSort (ASCII compare).
« Last Edit: May 05, 2014, 01:23:09 am by typo »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #28 on: May 05, 2014, 02:11:21 am »
@Taazz, your code is impressive as usual. I vote for your code :) for more than one reason.  Just a small note. Can you repeat the speed test with similar default values for case sensitivety:
Quote
function EvsCompareNatural_A(...aCaseSensitive : Boolean = True):integer;

Quote
function AnsiNaturalCompare(...vCaseSensitive: boolean = False): integer;

Well you are right I missed that sorry. Here are two more screen shots case sensitive and insensitive. I'm guessing that the windows is always one of those, making sure that everybody gets a fair assessment.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #29 on: May 05, 2014, 02:14:05 am »
I run your benchmark with my strCompareClever, and it was just as fast...
Am I right that strCompareClever judges numbers based on their length not value. Which results in 0001 bigger than 20 for instance:
Code: [Select]
function strCompareClever(const s1, s2: string): integer;
...
      while (i<=length(t1)) and (t1[i] in ['0'..'9']) do inc(i);
      while (j<=length(t2)) and (t2[j] in ['0'..'9']) do inc(j);
      if i-ib<j-jb then begin
        result:=-1; //find longer number
        exit;
      end;
      if i-ib>j-jb then begin
        result:=1;
        exit;
      end;

 

TinyPortal © 2005-2018