Recent

Author Topic: Natural string comparison?  (Read 31701 times)

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #30 on: May 05, 2014, 02:23:10 am »
Well you are right I missed that sorry. Here are two more screen shots case sensitive and insensitive. I'm guessing that the windows is always one of those, making sure that everybody gets a fair assessment.
Without going into details your code is better regardless of the speed. But if you really want to see speed difference, just adding const in front of the parameters doubled the speed (that's for case sensitive) because it eliminates string reference related code added by the compiler.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #31 on: May 05, 2014, 03:05:39 am »
you mean something like this
Code: [Select]
function AnsiNaturalCompare(const str1, str2: string; const vCaseSensitive: boolean = False): integer;
// by engkin
var
  l1, l2: integer; //Str length
  n1, n2: integer; //numrical part
  i1, i2: integer; //index in Str
  d: integer;
  vStr1, vStr2 : string;
begin
  if not vCaseSensitive then
  begin
    vstr1 := UpperCase(str1);
    vstr2 := UpperCase(str2);
  end else begin
    vstr1 := str1;
    vstr2 := str2;
  end;
? If not do post what changes you want and I'll test it again.

Without going into details your code is better regardless of the speed.
Well I do try to understand first and for most, everything else follows. So I'm going to do detail profiling on each piece of code to find out what is going on behind the scene and see if I can gain a bit more speed although I assume that most of the speed is lost on ifs and it will be very difficult to make it faster.

 In any case I wrote it because I couldn't take it out of my mind not to create a candidate for FPC which I don't mind either way. If it is included grate if it is not grate also. I'm in search of understanding not glory ;)
« Last Edit: May 05, 2014, 03:14:18 am by taazz »
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #32 on: May 05, 2014, 03:36:27 am »
Well fpc2.6.2 32bits seems not so impressed by the change. :D Which means that I either did not understand you or you use a newer version with better optimization on const params.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #33 on: May 05, 2014, 03:55:46 am »
Well fpc2.6.2 32bits seems not so impressed by the change. :D Which means that I either did not understand you or you use a newer version with better optimization on const params.
Here you go. Notice that 3rd parameter is useless here:
Code: [Select]
function AnsiNaturalCompare_CaseSensitive(const str1, str2: string; vCaseSensitive: boolean = False): integer;
var
  l1, l2: integer; //Str length
  n1, n2: integer; //numrical part
  i1, i2: integer; //index in Str
  d: integer;
begin
  l1 := Length(str1);
  l2 := Length(str2);

  if (l1 = 0) and (l2 = 0) then exit(0);
  if (l1 = 0) then exit(-1);
  if (l2 = 0) then exit(1);

  if not vCaseSensitive then
  begin
{    str1 := UpperCase(str1);
    str2 := UpperCase(str2);//}
  end;

  i1 := 1;
  i2 := 1;
  while i1 <= l1 do
  begin
    //Compare non-numbers
    d := Ord(str1[i1]) - Ord(str2[i2]);
    if not (str1[i1] in ['0'..'9']) then
    begin
      if (d <> 0) then
      begin
        Result := d;
        exit;
      end;
    end
    else
    begin
      //Convert a section of str1 to a number
      n1 := 0;
      repeat
        n1 := 10 * n1 + Ord(str1[i1]) - Ord('0');
        Inc(i1);
      until (i1 > l1) or not (str1[i1] in ['0'..'9']);

      //Convert a section of str2 to a number
      n2 := 0;
      repeat
        n2 := 10 * n2 + Ord(str2[i2]) - Ord('0');
        Inc(i2);
      until (i2 > l2) or not (str2[i2] in ['0'..'9']);

      //Compare numbers naturally
      d := n1 - n2;
      if d <> 0 then
      begin
        Result := d;
        exit;
      end
      else
        Continue;
    end;
    Inc(i1);
    Inc(i2);
  end;
  Result := (i1 - l1) - (i2 - l2);
end;

What really happens is visible in the assembly file. Add -al in your Other Options to keep the generated assembly files which has .s extension.
« Last Edit: May 05, 2014, 03:59:51 am by engkin »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #34 on: May 05, 2014, 06:07:22 am »
Congratulations for the speed.

Juha Manninen should take a look at this and include it on LazUtils.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #35 on: May 05, 2014, 06:59:35 am »
If speed matters then I would suggest replacing UpperCase/LowerCase with some equivalent code directly inside the procedure:
Code: [Select]
//deleted

Edit:
Wrong code!
« Last Edit: May 05, 2014, 08:10:53 am by engkin »

BeniBela

  • Hero Member
  • *****
  • Posts: 928
    • homepage
Re: Natural string comparison?
« Reply #36 on: May 05, 2014, 10:18:07 am »

Am I right that strCompareClever judges numbers based on their length not value. Which results in 0001 bigger than 20 for instance:


No, because I have changed that yesterday...

Besides, I do not expect leading zeros in natural text

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #37 on: May 05, 2014, 10:37:06 am »
No, because I have changed that yesterday...

Besides, I do not expect leading zeros in natural text
I had seen scanners that add leading zeros, two IIRC.
Quote
Filename-001.jpg
Filename-002.jpg
...
Filename-027.jpg

And after bulk modification we get another set of files in the same folder:
Quote
Filename-1.jpg
Filename-2.jpg
...
Filename-27.jpg

Naturally Windows would sort them like:
Quote
Filename-001.jpg
Filename-1.jpg
Filename-002.jpg
Filename-2.jpg
...
Filename-027.jpg
Filename-27.jpg

If you are using Windows you can try it yourself.

Edit:
By the way, I'm impressed by your InternetTools. I learned about it from this link
« Last Edit: May 05, 2014, 10:53:14 am by engkin »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #38 on: May 05, 2014, 11:04:41 am »
Well fpc2.6.2 32bits seems not so impressed by the change. :D Which means that I either did not understand you or you use a newer version with better optimization on const params.
Here you go. Notice that 3rd parameter is useless here:
<snip...>
Attached a 90M iteration run for const vs byval. Impressive to say at least.
What really happens is visible in the assembly file. Add -al in your Other Options to keep the generated assembly files which has .s extension.

Thanks, appreciate the info I'll take a look on it but my assembly knowledge is zero. The only thing I can see is a bunch of letters and numbers in weird shapes.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #39 on: May 05, 2014, 05:15:25 pm »
According to my tests, AnsiNaturalCompare and EvsCompareNatural_A give different results to the following list of strings:

pic1
pic1a2
pic10
pic2
pic20
pic1a3
pic3

EvsCompareNatural_A result:
pic1
pic1a2
pic1a3
pic2
pic3
pic10
pic20

AnsiNaturalCompare result:
pic1a2
pic1a3
pic1
pic2
pic3
pic10
pic20

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #40 on: May 06, 2014, 12:58:39 am »
My procedure works as designed.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Natural string comparison?
« Reply #41 on: May 06, 2014, 01:08:22 am »
Yes.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #42 on: May 06, 2014, 05:13:45 am »
You might want to consider this one.

Code: [Select]
function AnsiNaturalCompare_v3_3(const str1, str2: string; vCaseSensitive: boolean = False): integer;
//v3.3
var
  l1, l2, l: integer;    //Str length
  n1, n2: int64;         //numrical part
  nl: integer;
  nl1, nl2: integer;     //length of numrical part
  i: integer;
  d: int64;
  pc,pt : PChar;         //to change case of const params
  pc1, pc2: PChar;
  pb1: PByte absolute pc1;
  pb2: PByte absolute pc2;
  pe1, pe2: PChar;       //pointer to end of str
  sign: integer;

  function lowcase(const c : char) : byte;
  begin
    if (c in ['A'..'Z']) then
      lowcase := byte(c)+32
    else
      lowcase := byte(c);
  end;

  function CorrectedResult(vRes: integer): integer;
  begin
    //to correct the result when we switch vars due for
    //pc1, pe1 need to point at shorter string, always
    Result := sign * vRes;
  end;

begin
  l1 := Length(str1);
  l2 := Length(str2);

  //Any empty str?
  if (l1 = 0) and (l2 = 0) then exit(0);
  if (l1 = 0) then exit(-1);
  if (l2 = 0) then exit(1);

  //pc1, pe1 point at the shorter string, always
  if l1<=l2 then
  begin
    pc1 := @str1[1];
    pc2 := @str2[1];

    sign := 1;
  end
  else
  begin
    pc1 := @str2[1];
    pc2 := @str1[1];

    l := l1;
    l1 := l2;
    l2 := l;

    sign := -1;
  end;

  //end of strs
  pe1 := pc1 + l1;
  pe2 := pc2 + l2;

  nl1 := 0;
  nl2 := 0;

  while (pc1 < pe1) do
  begin
    if not (pc1^ in ['0'..'9']) or not (pc2^ in ['0'..'9']) then
    begin
      //Compare non-numbers
      if vCaseSensitive then
        d := pb1^ - pb2^
      else
        d := lowcase(pc1^) - lowcase(pc2^);//}

      if (d <> 0) then exit(CorrectedResult(d));
    end
    else
    begin
      //Convert a section of str1 to a number (correct for 16 digits)
      n1 := 0; nl1 := 0;
      repeat
        n1 := (n1 shl 4) or (pb1^ - Ord('0'));
        Inc(pb1); inc(nl1);
      until (pc1 >= pe1) or not (pc1^ in ['0'..'9']);

      //Convert a section of str2 to a number (correct for 16 digits)
      n2 := 0; nl2 := 0;
      repeat
        n2 := (n2 shl 4) or (pb2^ - Ord('0'));
        Inc(pb2); inc(nl2);
      until (pc2 >= pe2) or not (pc2^ in ['0'..'9']);

      //Compare numbers naturally
      d := n1 - n2;
      if d <> 0 then
         exit(CorrectedResult(d))
      else
      begin
        //Switch to shortest string based on remaining characters
        if (pe1 - pc1) > (pe2 - pc2) then
        begin
          pc := pc1;
          pc1 := pc2;
          pc2 := pc;

          pc := pe1;
          pe1 := pe2;
          pe2 := pc;

          nl := nl1;
          nl1 := nl2;
          nl2 := nl;

          sign := -sign;
        end;
        Continue;
      end;
    end;
    Inc(pc1);
    Inc(pc2);
  end;
  //str with longer remaining part is bigger (abc1z>abc1)
  //Result := CorrectedResult((pe1 - pc1) - (pe2 - pc2));
  Result := (pe1 - pc1) - (pe2 - pc2);
  if Result=0 then
  //strings are naturllay identical then:
  //consider str with longer last numerical section to be bigger (a01bc0001>a001bc1)
    Result := CorrectedResult(nl1-nl2)
  else
    Result := CorrectedResult(Result);
end;

I tried to make it consistent when it compares. So the following items:
Quote
z0000.doc
z00.doc
z0000.doC
z000.doc

Would be sorted  more or less like:
Quote
z00.doc
z000.doc
z0000.doc
z0000.doC
Last two items might not be in that order though.

Your tests and criticism are appreciated  :P
« Last Edit: May 06, 2014, 05:17:08 am by engkin »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Natural string comparison?
« Reply #43 on: May 06, 2014, 07:33:33 am »
on case sensitive compare they should always be on that order, on case insensitive its a coin flip based on the sort function used and if it is stable or not. This might be of help just add the new cases Typo mentioned.

Code: [Select]
unit TestNaturalCompare;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, fpcunit, testutils, testregistry;

type

  TNaturalCompareTest = class(TTestCase)
  published
    procedure FirstTest;
    procedure SecondTest;
    procedure ThirdTest;
    procedure FourthTest;
    procedure FifthTest;
    procedure SixthTest;
    procedure SeventhTest;
    procedure EighthTest;
    procedure NinthTest;
  end;

implementation
uses math;
const
  cItems : array[1..23] of string = ({01}'Callisto Morphamax',
                                     {02}'Callisto Morphamax 500',
                                     {03}'Callisto Morphamax 5000',
                                     {04}'Callisto Morphamax 600',
                                     {05}'1000X Radonius Maximus',
                                     {06}'10X Radonius',
                                     {07}'200X Radonius',
                                     {08}'20X Radonius',
                                     {09}'Allegia 50 Clasteron',
                                     {10}'Allegia 500 Clasteron',
                                     {11}'Allegia 50B Clasteron',
                                     {12}'Allegia 51 Clasteron',
                                     {13}'Allegia 6R Clasteron',
                                     {14}'z20.doc',
                                     {15}'z100.doc',
                                     {16}'z10.doc',
                                     {17}'z50.doc',
                                     {18}'1000X Radonius Maximus',
                                     {19}'10X Radonius',
                                     {20}'200X Radonius',
                                     {21}'0005',
                                     {22}'05',
                                     {23}'005 ');

function EvsCompareNatural_A(const S1, S2 :AnsiString; aCaseSensitive : Boolean = True):integer;
var
  vChr1, vChr2 : PChar;
  vCmp1, vCmp2 : string;

  function Sign(aNo:integer):integer; inline;
  begin
    Result := 0;
    if aNo > 0 then Result := 1
    else if aNo < 0 then Result := -1;
  end;

  function NumCharLen(const aStart:PAnsiChar):Integer;
  var
    vEnd  : PAnsiChar;
    vNext : PAnsiChar;
  begin
    vEnd  := aStart;
    vNext := aStart;
    repeat
      inc(vNext);
    until (vNext^ = #0) or (not (vNext^ in ['0'..'9']));
    Result := vNext-aStart;
  end;

  function CompToChar(var aStr:PAnsiChar; const aChar:Char; aCount:Integer):Integer;inline;
  begin            // compares the next aCount characters of aStr with aChar and returns 0 if they are all the same
    Result := 0;   // or <>0 if they differ. It is used to avoid padding a string with zeros.
    repeat
      Result := sign(Ord(aStr^) - ord(aChar));
      Dec(aCount); Inc(aStr);
    until (Result <> 0) or (aStr^=#0) or (aCount = 0);
  end; //when checking numeric characters[0..9] against zero it always returns a positive number.

  function NumComp:Integer;
  var
    vNl : Integer;
  begin
    Result := -2;
    vNl := NumCharLen(vChr1) - NumCharLen(vChr2);
    if vNl < 0 then Result := CompToChar(vChr2, '0', abs(vNl))
    else if vNl > 0 then Result := CompToChar(vChr1, '0', vNl);
    if (Result > 0) then begin
      Result := Sign(vNl);
      Exit;
    end;
    repeat
      Result := sign(ord(vChr1^) - ord(vChr2^));
      inc(vChr1); inc(vChr2);
    until ((vChr1^=#0) or (vChr2^=#0))   //end of string has been reached
       or (Result <> 0)                  //conclusion has been reached
       or (not (vChr1^ in ['0'..'9']))   //numeric characters end here
       or (not (vChr2^ in ['0'..'9']));  //numeric characters end here

    if Result = 0 then begin
      if vChr1^ in ['0'..'9'] then Result := 1
      else if vChr2^ in ['0'..'9'] then Result := -1;
    end;
  end;

begin
  //s1<s2 = -1, S1=S2 =0, S1>S2 = 1;
  if aCaseSensitive then begin
    vChr1 := @S1[1]; vChr2 := @S2[1]
  end else begin
    vCmp1 := LowerCase(S1); vCmp2 := LowerCase(S2);
    vChr1 := @vCmp1[1];     vChr2 := @vCmp2[1];
  end;

  repeat
    if (vChr1^ in ['0'..'9']) and (vChr2^ in ['0'..'9']) then
      Result := NumComp // it exits ready in the next position
    else begin
      Result := Sign(ord(vChr1^)- ord(vChr2^));
      inc(vChr1); inc(vChr2);
    end;
  until (vChr1^=#0) or (vChr2^=#0) or (Result <> 0);
  if (Result = 0) then Result := Sign(ord(vChr1^) - Ord(vChr2^));
end;

procedure TNaturalCompareTest.FirstTest;
var
  vTmp1,vTmp2 : Integer;
begin
  vTmp1 := EvsCompareNatural_A(cItems[1],cItems[2]);
  vTmp2 := Sign(CompareStr(cItems[1], cItems[2]));
  Check( vTmp1 = vTmp2,format('%s = %s; Expected %D returned %D',[cItems[1], cItems[2], vTmp2,vTmp1]));
end;

procedure TNaturalCompareTest.SecondTest;
var
  vRes  : Integer;
  vRes1 : Integer;
begin
  vRes  := EvsCompareNatural_A(cItems[2],cItems[3]);
  vRes1 := CompareStr(cItems[2],cItems[3]);
  Check(vRes = vRes1, format('%s = %s; Expected %D returned %D',[cItems[2], cItems[3], -1, vRes]));
end;

procedure TNaturalCompareTest.ThirdTest;
var
  vRes : Integer;
  vRes1: Integer;
begin
  vRes := EvsCompareNatural_A(cItems[3],cItems[4]);
  vRes1 := CompareStr(cItems[3],cItems[4]);
  Check(vRes = 1, format('%s = %s; Expected %D returned %D',[cItems[3], cItems[4], 1, vRes]));
end;

procedure TNaturalCompareTest.FourthTest;
var
  vTmp1,vTmp2 : Integer;
begin
  vTmp1 := EvsCompareNatural_A(cItems[1],cItems[9]);
  vTmp2 := Sign(CompareStr(cItems[1], cItems[9]));
  Check( vTmp1 = vTmp2,format('%s = %s; Expected %D returned %D',[cItems[1], cItems[9], vTmp2,vTmp1]));
end;

procedure TNaturalCompareTest.FifthTest;
var
  vRes : Integer;
begin
  vRes := EvsCompareNatural_A(cItems[10],cItems[11]);
  Check(vRes = 1, format('%s = %s; Expected %D returned %D',[cItems[10], cItems[11], 1, vRes]));
end;

procedure TNaturalCompareTest.SixthTest;
var
  vRes : Integer;
begin
  vRes := EvsCompareNatural_A(cItems[19],cItems[20]);
  Check(vRes = -1, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
end;

procedure TNaturalCompareTest.SeventhTest;
var
  vRes : Integer;
begin
  vRes := EvsCompareNatural_A(cItems[09], cItems[13]);
  Check(vRes = 1, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
  vRes := EvsCompareNatural_A(cItems[10], cItems[13]);
  Check(vRes = 1, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
  vRes := EvsCompareNatural_A(cItems[11], cItems[13]);
  Check(vRes = 1, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
  vRes := EvsCompareNatural_A(cItems[12], cItems[13]);
  Check(vRes = 1, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
end;

procedure TNaturalCompareTest.EighthTest;
var
  vRes : Integer;
begin
  vRes := EvsCompareNatural_A(cItems[21], cItems[22]);
  Check(vRes = 0, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
end;

procedure TNaturalCompareTest.NinthTest;
var
  vRes : Integer;
begin
  vRes := EvsCompareNatural_A(cItems[21], cItems[22]);
  Check(vRes = 0, format('%s = %s; Expected %D returned %D',[cItems[19], cItems[20], -1, vRes]));
  vRes := EvsCompareNatural_A(cItems[21], cItems[23]);
  Check(vRes = -1, format('%s = %s; Expected %D returned %D',[cItems[21], cItems[23], -1, vRes]));
end;


initialization
  RegisterTest(TNaturalCompareTest);
end.

« Last Edit: May 06, 2014, 07:38:07 am by taazz »
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Natural string comparison?
« Reply #44 on: May 06, 2014, 08:29:44 am »
@Taazz, You are right, I better double check and make sure it works that way on case insensitive only. Thanks for the test unit.

 

TinyPortal © 2005-2018