I had a case where about half of the time was spent in case-insensitive text comparison: it was doing a case-insensitive Pos. For just 10 kilobytes of text, it would call the text comparison 10000 times. It became much faster when I changed it to compare the first character before calling the full comparison function.
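Roughly what that fast path looks like (a minimal sketch, not the original code; the names PosTextFast, MatchesAt and ToLowerAscii are made up here, and it assumes single-byte strings with ASCII-only folding, in line with the "dec(c1, 32)" mentioned below):

function ToLowerAscii(c: AnsiChar): AnsiChar;
begin
  { ASCII-only fold, same idea as dec(c, 32) for 'A'..'Z' }
  if c in ['A'..'Z'] then
    Result := AnsiChar(Ord(c) + 32)
  else
    Result := c;
end;

function MatchesAt(const Hay, Needle: AnsiString; Start: Integer): Boolean;
var
  i: Integer;
begin
  { full case-insensitive comparison at one candidate position }
  Result := False;
  for i := 1 to Length(Needle) do
    if ToLowerAscii(Hay[Start + i - 1]) <> ToLowerAscii(Needle[i]) then
      Exit;
  Result := True;
end;

function PosTextFast(const Needle, Hay: AnsiString): Integer;
var
  i: Integer;
  First: AnsiChar;
begin
  Result := 0;
  if (Needle = '') or (Length(Hay) < Length(Needle)) then
    Exit;
  First := ToLowerAscii(Needle[1]);
  for i := 1 to Length(Hay) - Length(Needle) + 1 do
    { cheap first-character check; only matching positions pay for
      the full comparison }
    if (ToLowerAscii(Hay[i]) = First) and MatchesAt(Hay, Needle, i) then
    begin
      Result := i;
      Exit;
    end;
end;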
Could you not have lowercased the entire string once, performed the search/Pos, and taken the result indices on the mixed-case string?
Of course that does not work for Unicode, where a lowercase char may have a different byte length than its uppercase counterpart. But since the "dec(c1, 32);" only works for English anyway...
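Something like this (a sketch of the idea, not the code under discussion; it assumes a single-byte encoding so indices in the lowered copy line up with the original):

program LowerOnceDemo;

uses
  SysUtils;

function PosTextViaLowerCase(const Needle, Hay: string): Integer;
begin
  { fold both strings once; the index found in the lowered copy
    is also valid for the original mixed-case string }
  Result := Pos(LowerCase(Needle), LowerCase(Hay));
end;

begin
  WriteLn(PosTextViaLowerCase('needle', 'A Needle in a haystack'));  { prints 3 }
end.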
Maybe even more clever (again leaving out stuff like normalization): get the "lookup char" (the one whose pos in the long string you want) in both upper and lower case. That part can still be done via true upper/lower routines that handle all languages, since it only happens once. Then compare each position in the string against the 2 lookup chars. That should save almost all upper/lower conversions. A rough sketch follows below.
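Sketch of that idea, assuming single-byte strings and SysUtils' UpperCase/LowerCase (a locale- or Unicode-aware fold could be substituted for the needle, since it is computed only once); the function name is hypothetical:

{ requires SysUtils for LowerCase/UpperCase }
function PosTextTwoChars(const Needle, Hay: string): Integer;
var
  i: Integer;
  Lo, Up: Char;
  LoweredNeedle, UpperNeedle: string;
begin
  Result := 0;
  if (Needle = '') or (Length(Hay) < Length(Needle)) then
    Exit;
  LoweredNeedle := LowerCase(Needle);
  UpperNeedle := UpperCase(Needle);
  Lo := LoweredNeedle[1];  { both case variants of the lookup char, }
  Up := UpperNeedle[1];    { computed once up front }
  for i := 1 to Length(Hay) - Length(Needle) + 1 do
    { the long string is never case-converted during the scan;
      only candidate positions pay for the full folded comparison }
    if ((Hay[i] = Lo) or (Hay[i] = Up)) and
       (LowerCase(Copy(Hay, i, Length(Needle))) = LoweredNeedle) then
    begin
      Result := i;
      Exit;
    end;
end;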