Recent

Author Topic: CompareText improvement  (Read 2417 times)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 8892
  • Debugger - SynEdit - and more
    • wiki
Re: CompareText improvement
« Reply #30 on: March 31, 2023, 06:05:19 pm »
I had a case where like half of  the time was spent in case-insensitive text comparison. But it was doing a case-insensitive Pos. For just 10 kilobytes of text, it would call the text comparison 10000 times. It became much faster when I changed it to compare the first character before calling the full comparison function

Could you not have lowercased the entire string once, perform the search/pos and take the results on the mixed case string?

Of course that does not work for Unicode where a lowercase char may have a different byte-len than its upper char.  But since the "dec(c1, 32);" only works for English anyway...

Maybe even more clever (again leaving out stuff like normalization), get the "lookup char" (for wich you want the pos in the long string) in both: upper and lower (that can still be done via true upper/lower handling all languages). Then compare each pos in the string against the 2 lookup chars. That should save almost all upper/lower conversions.


BeniBela

  • Hero Member
  • *****
  • Posts: 870
    • homepage
Re: CompareText improvement
« Reply #31 on: April 01, 2023, 01:25:35 am »
Could you not have lowercased the entire string once, perform the search/pos and take the results on the mixed case string?

But string allocations are also slow

And it would need to lower case both strings.




Maybe even more clever (again leaving out stuff like normalization), get the "lookup char" (for wich you want the pos in the long string) in both: upper and lower (that can still be done via true upper/lower handling all languages). Then compare each pos in the string against the 2 lookup chars. That should save almost all upper/lower conversions.

That is what I did. Which is why checking the first char of the searched string worked so well.


BeniBela

  • Hero Member
  • *****
  • Posts: 870
    • homepage
Re: CompareText improvement
« Reply #32 on: April 04, 2023, 05:54:04 pm »
This is generally fast:

Code: Pascal  [Select][+][-]
  1. function CompareTextFast(const S1, S2: string): Integer; overload;
  2.  
  3. var
  4.   i, count, count1, count2, AlignedCount, CharByCharCount: sizeint;
  5.   Chr1, Chr2: byte;
  6.   P1, P2: PChar;
  7.   Block1, Block2: qword;
  8. begin
  9.   Count1 := Length(S1);
  10.   Count2 := Length(S2);
  11.   if (Count1>Count2) then
  12.     Count := Count2
  13.   else
  14.     Count := Count1;
  15.   i := 0;
  16.   if count>0 then
  17.     begin
  18.       AlignedCount := count - count and 7;
  19.       P1 := @S1[1];
  20.       P2 := @S2[1];
  21.       while i < Count do
  22.         begin
  23.           while i < AlignedCount do begin
  24.             Block1 := unaligned(PQWord(P1)^);
  25.             Block2 := unaligned(PQWord(P2)^);
  26.             if Block1 = Block2 then begin
  27.               inc(p1, 8);
  28.               inc(p2, 8);
  29.               inc(i, 8);
  30.             end else break;
  31.           end;
  32.           CharByCharCount := i + 512;
  33.           if CharByCharCount > count then
  34.             CharByCharCount := count;
  35.  
  36.           while i < CharByCharCount do begin
  37.             Chr1 := byte(p1^);
  38.             Chr2 := byte(p2^);
  39.             if Chr1 <> Chr2 then
  40.               begin
  41.                 if Chr1 in [97..122] then
  42.                   dec(Chr1,32);
  43.                 if Chr2 in [97..122] then
  44.                   dec(Chr2,32);
  45.                 if Chr1 <> Chr2 then
  46.                   Break;
  47.               end;
  48.             Inc(P1); Inc(P2); Inc(I);
  49.           end;
  50.         end;
  51.     end;
  52.   if i < Count then
  53.     result := Chr1-Chr2
  54.   else
  55.     // CAPSIZEINT is no-op if Sizeof(Sizeint)<=SizeOF(Integer)
  56.     result:=(Count1-Count2);
  57. end;
  58.  

Even faster than CompareByte.

 

TinyPortal © 2005-2018