Recent

Author Topic: CompareText improvement  (Read 4412 times)

edgarrod71

  • Jr. Member
  • **
  • Posts: 68
Re: CompareText improvement
« Reply #15 on: March 25, 2023, 01:59:40 am »
You're right, I had in mind that.  Nevertheless the best way to test is restarting the machine and changing the order of functions before testing.  Nevertheless it gives me the same results.

domasz

  • Sr. Member
  • ****
  • Posts: 413
Re: CompareText improvement
« Reply #16 on: March 25, 2023, 10:50:08 am »
@domasz, looking above the lake, water seems to be delicious, nevertheless, going deep, I found that this calls all of these:

I kinda think we should have multiple different CompareText for different uses. If I want to compare strings eg. containing SHA-1 hashes then I don't need all these fancy code.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9754
  • Debugger - SynEdit - and more
    • wiki
Re: CompareText improvement
« Reply #17 on: March 25, 2023, 11:45:22 am »
I kinda think we should have multiple different CompareText for different uses. If I want to compare strings eg. containing SHA-1 hashes then I don't need all these fancy code.

Then you use CompareByte. It already exists.

And if you end up having so many calls to CompareWhateverText that speed get a noticeable issue, then maybe you should use an entirely different solution? Trees, Hashes, ...

domasz

  • Sr. Member
  • ****
  • Posts: 413
Re: CompareText improvement
« Reply #18 on: March 25, 2023, 12:39:36 pm »

Then you use CompareByte. It already exists.
Interesting, thanks! I don't know anything like this from my Delphi times.

Thaddy

  • Hero Member
  • *****
  • Posts: 14157
  • Probably until I exterminate Putin.
Re: CompareText improvement
« Reply #19 on: March 25, 2023, 01:14:59 pm »
Stupid questions get stupid answers.
It is highly platform dependent how to handle this.
Specialize a type, not a var.

BrunoK

  • Sr. Member
  • ****
  • Posts: 452
  • Retired programmer
Re: CompareText improvement
« Reply #20 on: March 26, 2023, 02:50:58 pm »
Hi, I'm always trying to improve the language I love, so I want to share a faster function for comparing strings, would you add it to sysstrh.inc? 
1 - I could not compile your code.
2 - After making a few changes to compile it the results are very mixed, with no evident gain of one of the functions.
3 - Your TextComp returns TRUE for equal or FALSE for different. It should return <0, 0 or >0 depending on the comparison. Conclusion, your function does not do what is expected.

Stefan Glienke

  • New Member
  • *
  • Posts: 23
Re: CompareText improvement
« Reply #21 on: March 27, 2023, 12:29:42 pm »
By the way, where is your attachment in toll lazarus version?
Sorry, I cannot parse that.

WRT CompareText if you want to knock out some speed there then compare 4 (32bit) or 8 (under 64bit) bytes at once.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: CompareText improvement
« Reply #22 on: March 27, 2023, 01:57:28 pm »

Then you use CompareByte. It already exists.
Interesting, thanks! I don't know anything like this from my Delphi times.

Some functions to do certain intrinsic operations were identified as nice to haves when FPC went multi architecture in the 2003-2005 timeframe. Using these in lowlevel and string routines cut back the need for assembler significantly without compromising performance significantly.

Things like compare a block of memory (compare* with *= byte/word/dword/qword,), scan for a byte/word/dword(index*) in a block of memory, fill a block of memory (fill*  ) etc.

edgarrod71

  • Jr. Member
  • **
  • Posts: 68
Re: CompareText improvement
« Reply #23 on: March 30, 2023, 06:04:03 pm »
Quote
This has nothing to do with ego, but with maintainability which is the most important factor in a project like this, even more important than performance! There will only be one CompareText function and that must allow for sort comparison or it's absolutely useless as a replacement. Any performance improvement must be secondary to that requirement.

This kind of thinking makes C++ programmers to stay there and not taking pascal as an option.  I love pascal, but I keep in mind that we must find better coding to make it faster.  Industry is seeking for performance, that's how everybody succeeds.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5444
  • Compiler Developer
Re: CompareText improvement
« Reply #24 on: March 30, 2023, 09:16:55 pm »
This kind of thinking makes C++ programmers to stay there and not taking pascal as an option.  I love pascal, but I keep in mind that we must find better coding to make it faster.  Industry is seeking for performance, that's how everybody succeeds.

For most applications performance isn't that important. The main point of Pascal is ease of use and that you can easily create cross platform applications.

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: CompareText improvement
« Reply #25 on: March 31, 2023, 12:31:51 am »
An actually fast version would need to use SSE/AVX and be optimized for specific processor versions


Thaddy

  • Hero Member
  • *****
  • Posts: 14157
  • Probably until I exterminate Putin.
Re: CompareText improvement
« Reply #26 on: March 31, 2023, 10:24:24 am »
And a sane answer would leave out any cpu reference,
Specialize a type, not a var.

jcmontherock

  • Full Member
  • ***
  • Posts: 234
Re: CompareText improvement
« Reply #27 on: March 31, 2023, 10:56:05 am »
Did you compare your version with all we already have ?
Code: Pascal  [Select][+][-]
  1. WideCompareStr(WideString(s1), WideString(s2));
  2. WideCompareText(WideString(s1), WideString(s2));
  3. AnsiStrLIComp(PChar(s1), PChar(s2), Length(s1));
  4. UTF8CompareText(s1, s2);
  5. UTF8CompareLatinTextFast(s1, s2);
  6. UTF8CompareStrCollated(s1, s2);
  7. ...
  8. and the windows version with 'shlwapi.dll'
« Last Edit: March 31, 2023, 10:57:41 am by jcmontherock »
Windows 11 UTF8-64 - Lazarus 3.2-64 - FPC 3.2.2

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9754
  • Debugger - SynEdit - and more
    • wiki
Re: CompareText improvement
« Reply #28 on: March 31, 2023, 10:59:41 am »
The real question is, how much time does an application actually spent in text comparison? (And that is case insensitive in this case)
And how much of this time should actually be spent in Unicode normalized text comparison?

Take the IDE. By all probability the place that would be most affected is the text search in the editor. But that is more than fast enough. And you have to take into account, that even that is not spending all of its time is TextCompare. It has to look up each line (the text is not a continuos blob), store results, ...
If you do search on disk, it comes down to disk speed rather than search speed.
And after all, searching for text like that (if done case insensitive) should do Unicode normalization (which is missing). So it would have to use an even more complex processing. And hence not even gaining from the proposal.

There are other bits of code that currently call TextCompare. Not all of them should do that, or at least they should only do a small percentage of the calls that they do. But no one has bothered to do the real optimization on them.
Here the question is, if we decided they are to slow, should we give them a 1% or 2% uplift (which is what would very optimistically remain of the 10%, if we consider that CompareText only is a fraction of what they do), or should we change the logic the use, and gain an actual 2 digit percentage?

And any if we really were to provide some code that can compare "only English alphabet" case insensitive, why do we decide to only optimize for such a small part of the world? Then we should also have "only Chinese" and "only Arabic" and "only ...." versions (which all would be faster for the respective languages).

One example was comparing UUID. Well, I don't see why that needs a "English only" compare text. If I have plenty of UUID to compare, I make sure I store them all uppercase (I.e. convert them before storing), and then compare them binary. Which is even faster.
In fact, I would not even store them as text, I may consider them a base-36 number, convert them to a series of QWord, and have even less data to compare.

The same if I have to compare (maybe sort) large amount of text. Uppercase it once, then sort it. This will work for all languages, and speed up the work more than using an "English only" version. (assuming a large enough amount of text, but if the text is small then there is no need for speed up)

All that said, yes there can be very special use cases that would benefit. But they are not common cases. They don't require the RTL to provide them with specialized pre-written code. They can easily have there own function doing such a very specialized comparison. And if they do have their own code, they may be able to tweak it even more than any rtl provided code could be.
« Last Edit: March 31, 2023, 11:03:38 am by Martin_fr »

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: CompareText improvement
« Reply #29 on: March 31, 2023, 05:13:58 pm »
I had a case where like half of  the time was spent in case-insensitive text comparison. But it was doing a case-insensitive Pos. For just 10 kilobytes of text, it would call the text comparison 10000 times. It became much faster when I changed it to compare the first character before calling the full comparison function

This one is faster if the strings are completely equal:

Code: Pascal  [Select][+][-]
  1. function CompareTextIsEqual(p1, p2: pansichar; const l: SizeInt): boolean;
  2. var i: SizeInt = 0;
  3.     alignedlen: sizeint;
  4.     c1, c2:integer;
  5.     block1, block2: qword;
  6. begin
  7.   alignedlen := l - l and 7;
  8.   result := true;
  9.   while i < l do begin
  10.     while i < alignedlen do begin
  11.       block1 := unaligned(PQWord(p1)^);
  12.       block2 := unaligned(PQWord(p2)^);
  13.       if block1 = block2 then begin
  14.         inc(p1, 8);
  15.         inc(p2, 8);
  16.         inc(i, 8);
  17.       end else break;
  18.     end;
  19.     c1 := ord(p1^);
  20.     c2 := ord(p2^);
  21.     if c1 <> c2 then begin
  22.       if c1 in [97..122] then dec(c1, 32);
  23.       if c2 in [97..122] then dec(c2, 32);
  24.       if c1 <> c2 then begin result := false; exit; end;
  25.     end;
  26.     inc(p1); inc(p2); inc(i);
  27.   end;
  28. end;  

unfortunately. it is slower when the strings are case-sensitively unequal and case-insensitvely equal

 

TinyPortal © 2005-2018