Author Topic: Strange result of text sorting in TFPSList.Sort  (Read 1164 times)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11792
  • Debugger - SynEdit - and more
    • wiki
Re: Strange result of text sorting in TFPSList.Sort
« Reply #15 on: November 04, 2025, 02:17:13 pm »
Quote from AlexTP:
    >>> ord('a')
    97
    >>> ord('R')
    82

    so 'a' must be after 'R'.

Sorting does not have to happen on ordinal values (it can, but that is only one of many ways). There are regionally different sort orders (collations).

Germany, for example, has several: the "ä" can be sorted after "z", or it can be treated as equal to "ae" (IIRC there is a third option).
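A minimal sketch of that difference, assuming only the SysUtils routines `CompareStr` (binary, case-sensitive) and `CompareText` (ASCII case-insensitive). It is not the collation the RTL uses for `UnicodeString`; it only illustrates that the same two strings can legitimately order either way depending on the rule applied. Both lines should print TRUE.

```pascal
program CollationSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

begin
  // Binary comparison: 'a' (97) is greater than 'R' (82),
  // so 'ea' sorts after 'eR' and the result is positive.
  WriteLn('CompareStr(ea, eR) > 0:  ', CompareStr('ea', 'eR') > 0);

  // ASCII case-insensitive comparison: 'EA' sorts before 'ER',
  // so the result is negative.
  WriteLn('CompareText(ea, eR) < 0: ', CompareText('ea', 'eR') < 0);
end.
```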


Thaddy

  • Hero Member
  • *****
  • Posts: 18305
  • Here stood a man who saw the Elbe and jumped it.
Re: Strange result of text sorting in TFPSList.Sort
« Reply #16 on: November 04, 2025, 02:41:22 pm »
That is not relevant, Martin. Alex is asking for consistency across platforms for an RTL-provided function.
That is a perfectly legitimate request. It would break his cross-platform code otherwise.
I totally see his point, and it should be consistent out of the box.
We are not talking about collations at the platform level, but collations at the RTL level.
At the RTL level the results should always be the same (or pluggable, which it is not; well, a bit).

(Furthermore I sense, but have not debugged yet, that the BASM code is the cause.)

In fact, it would be ludicrous to interpret it otherwise. (In general a >:D >:( to the RTL, can't help myself... O:-) )

Remember the Free Pascal adage.
« Last Edit: November 04, 2025, 02:54:22 pm by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

Martin_fr

Re: Strange result of text sorting in TFPSList.Sort
« Reply #17 on: November 04, 2025, 03:16:15 pm »
Quote from Thaddy:
    That is not relevant, Martin. Alex is asking for consistency across platforms for an RTL-provided function.

Maybe he added that, but it's not his original question:

Quote from AlexTP:
    I see 'ea' before 'eR', which is not per Unicode sort order: 'a' must be after 'R'.

Also, for "consistent cross-platform" behavior the ordinal values are of no interest, nor would it matter whether lowercase comes before uppercase. Yet he wrote:

Quote from AlexTP:
    >>> ord('a')
    97
    >>> ord('R')
    82

    so 'a' must be after 'R'.

It would be equally consistent if all platforms sorted in a non-ordinal manner. And all platforms would sort "R" before "a".

Thaddy

Re: Strange result of text sorting in TFPSList.Sort
« Reply #18 on: November 04, 2025, 03:18:39 pm »
Quote from Martin_fr:
    And all platforms would sort "R" before "a".

Then why the question? It is obvious from the original question that it does not!
Windows doesn't. You did not test, did you? Alex and I did.
I respect you, but that is a big thumbs down.

See my screenshot earlier.

BTW: this is not a minor issue, nor a platform issue; it is an RTL issue.
« Last Edit: November 04, 2025, 03:21:55 pm by Thaddy »

Martin_fr

Re: Strange result of text sorting in TFPSList.Sort
« Reply #19 on: November 04, 2025, 03:37:40 pm »
Quote from Thaddy:
    Windows doesn't. You did not test, did you? Alex and I did.

Well, how would I test whether it complies with the claim "which is not per Unicode sort order", if I can't find the documentation he refers to?

I did not disagree with, or otherwise comment on, any of the cross-platform differences.

I responded to the quoted claim. And to test (verify) it, I need to know where it comes from.

If he does not care about that (but only about cross-platform consistency), then he can say so. So far he has said that he does care about the above too.

Thaddy

Re: Strange result of text sorting in TFPSList.Sort
« Reply #20 on: November 04, 2025, 04:02:25 pm »
Simple: I made AlexTP's question a compilable example. (FGL sort, using rtl Unicode sort)
Compared different platforms.
Concluded AlexTP was right.
And concluded the collation order is different on Windows, but only on Windows, and unnatural: not what you, Alex, and I expected.
Provided a solution.

Otherwise I have no opinion except for what I saw.
The RTL should be consistent.
« Last Edit: November 04, 2025, 04:08:53 pm by Thaddy »

AlexTP

  • Hero Member
  • *****
  • Posts: 2615
    • UVviewsoft
Re: Strange result of text sorting in TFPSList.Sort
« Reply #21 on: November 04, 2025, 08:21:03 pm »
Yes, the RTL function should return the same comparison value for all my test lines on all OSes, please. My English is weak, so I don't want to battle with Martin.

Martin_fr

Re: Strange result of text sorting in TFPSList.Sort
« Reply #22 on: November 04, 2025, 09:33:03 pm »
Quote from AlexTP:
    Yes, the RTL function should return the same comparison value for all my test lines on all OSes, please. My English is weak, so I don't want to battle with Martin.

Sorry, I do not mean to "battle".
I did not disagree with "should be the same on all targets" (well, unless documented as intentionally different).

It simply looked like (maybe not intended, but anyway) you also wanted a specific order because of some documentation on the Unicode standard that I don't know. That said, there is a lot I don't know about Unicode, but this would have interested me.

ASerge

  • Hero Member
  • *****
  • Posts: 2464
Re: Strange result of text sorting in TFPSList.Sort
« Reply #23 on: November 05, 2025, 03:39:05 am »
I think the comparison depends on the locale, and not only for Unicode.
On my Windows x64:
Code: Pascal
  {$APPTYPE CONSOLE}

  uses Windows;

  const
    LowerStr1 = 'a';
    UpperStr2 = 'R';
    CSTR_LESS_THAN = 1;

  begin
    Writeln(LowerStr1, ' < ', UpperStr2, ' direct: ', LowerStr1 < UpperStr2);
    Writeln(LowerStr1, ' < ', UpperStr2, ' lstrcmpA: ', lstrcmpA(LowerStr1, UpperStr2) < 0);
    Writeln(LowerStr1, ' < ', UpperStr2, ' CompareStringA with LOCALE_INVARIANT: ',
      CompareStringA(LOCALE_INVARIANT, 0, LowerStr1, -1, UpperStr2, -1) = CSTR_LESS_THAN);
    Readln;
  end.
Output is
Code: Text
  a < R direct: FALSE
  a < R lstrcmpA: TRUE
  a < R CompareStringA with LOCALE_INVARIANT: TRUE

Thaddy

Re: Strange result of text sorting in TFPSList.Sort
« Reply #24 on: November 05, 2025, 05:29:27 am »
@ASerge
A locale for Unicode? There is no means of setting a locale for Unicode.
Also, Alex and I are in different locales, and the same issue shows up in both.
« Last Edit: November 05, 2025, 05:35:50 am by Thaddy »

paweld

  • Hero Member
  • *****
  • Posts: 1484
Re: Strange result of text sorting in TFPSList.Sort
« Reply #25 on: November 05, 2025, 07:43:51 am »
On Windows, the UnicodeCompareStr function calls the CompareStringW function (fpc sources\rtl\win\sysutils.pp), which requires a locale to be specified (the user locale is the default). And on Windows, in most locales, 'a' is smaller than 'R'.
The only locales for which 'a' is greater than 'R' are: Norwegian (Bokmal), Norwegian (Nynorsk), Greenlandic, and Danish.
Best regards / Pozdrawiam
paweld

AlexTP

Re: Strange result of text sorting in TFPSList.Sort
« Reply #26 on: November 05, 2025, 07:48:17 am »
I ended up with this function, adapted from @Thaddy's function.
Now my CudaText sorts consistently across all OSes.

Code: Pascal
  // Ordinal (code-unit) comparison; independent of OS locale.
  // Min comes from the Math unit.
  function _MyUnicodeCompareStr(const S1, S2: UnicodeString): Integer;
  var
    C1, C2: UnicodeChar;
    I, Len: SizeInt;
  begin
    Len := Min(Length(S1), Length(S2));
    for I := 1 to Len do
    begin
      C1 := S1[I];
      C2 := S2[I];
      if C1 <> C2 then
      begin
        // First differing UTF-16 code unit decides the order
        Result := Ord(C1) - Ord(C2);
        Exit;
      end;
    end;
    // Common prefix is equal: the shorter string sorts first
    Result := Length(S1) - Length(S2);
  end;
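As a usage sketch, the comparator above can be passed to the `Sort` method of a generic FGL list. The program name, the `TUStrList` type, and the `OrdinalCompare` function name are hypothetical names for this illustration; the comparison logic mirrors the function above. With an ordinal rule, 'eR' should be listed before 'ea', because 'R' (82) is below 'a' (97).

```pascal
program OrdinalSortSketch;
{$mode objfpc}{$H+}

uses
  Math, fgl;

type
  // Hypothetical list type, used only for this sketch
  TUStrList = specialize TFPGList<UnicodeString>;

// Compares by UTF-16 code unit value, then by length
function OrdinalCompare(const S1, S2: UnicodeString): Integer;
var
  I, Len: SizeInt;
begin
  Len := Min(Length(S1), Length(S2));
  for I := 1 to Len do
    if S1[I] <> S2[I] then
      Exit(Ord(S1[I]) - Ord(S2[I]));
  Result := Length(S1) - Length(S2);
end;

var
  L: TUStrList;
  I: Integer;
begin
  L := TUStrList.Create;
  try
    L.Add('ea');
    L.Add('eR');
    L.Add('eb');
    // Sort with the locale-independent comparator
    L.Sort(@OrdinalCompare);
    for I := 0 to L.Count - 1 do
      WriteLn(L[I]);
  finally
    L.Free;
  end;
end.
```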

avk

  • Hero Member
  • *****
  • Posts: 814
Re: Strange result of text sorting in TFPSList.Sort
« Reply #27 on: November 05, 2025, 08:04:50 am »
It seems that exactly the same sorting order would be provided by this comparator:
Code: Pascal
  function _MyUnicodeCompareStr(const S1, S2: UnicodeString): Integer;
  begin
    Result := StrComp(PWideChar(S1), PWideChar(S2))
  end;
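A quick way to check that claim is to compare the sign of both comparators on a few sample pairs. This is only a sketch: `LoopCompare`, `StrCompCompare`, `SignOf`, and `Check` are hypothetical names, and the sample strings are arbitrary. One caveat worth keeping in mind is that `StrComp` works on null-terminated strings, so the two versions can differ for strings that contain an embedded #0.

```pascal
program ComparatorAgreementSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, Math;

// Loop-based ordinal comparison (same logic as the earlier function)
function LoopCompare(const S1, S2: UnicodeString): Integer;
var
  I, Len: SizeInt;
begin
  Len := Min(Length(S1), Length(S2));
  for I := 1 to Len do
    if S1[I] <> S2[I] then
      Exit(Ord(S1[I]) - Ord(S2[I]));
  Result := Length(S1) - Length(S2);
end;

// StrComp-based comparison; stops at the first #0 terminator
function StrCompCompare(const S1, S2: UnicodeString): Integer;
begin
  Result := StrComp(PWideChar(S1), PWideChar(S2));
end;

// Reduces a comparison result to -1, 0 or 1
function SignOf(V: Integer): Integer;
begin
  if V < 0 then
    Result := -1
  else if V > 0 then
    Result := 1
  else
    Result := 0;
end;

// Prints whether both comparators order the pair the same way
procedure Check(const A, B: UnicodeString);
begin
  WriteLn('"', A, '" vs "', B, '": same sign = ',
    SignOf(LoopCompare(A, B)) = SignOf(StrCompCompare(A, B)));
end;

begin
  Check('ea', 'eR');
  Check('eR', 'ea');
  Check('ab', 'abc');
  Check('abc', 'abc');
end.
```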

Thaddy

Re: Strange result of text sorting in TFPSList.Sort
« Reply #28 on: November 05, 2025, 08:13:33 am »
@avk
(My code IS a comparator, it is just that the default comparator for unicode is different from all others.)
Yes. Why do you always come up with the easy way out?  :D :)

@AlexTP
avk's solution is only necessary for Windows and is probably faster too.
(Ran the same tests as yesterday)
« Last Edit: November 05, 2025, 08:23:04 am by Thaddy »

Thaddy

Re: Strange result of text sorting in TFPSList.Sort
« Reply #29 on: November 05, 2025, 08:29:40 am »
Quote from paweld:
    The only locales for which 'a' is greater than 'R' are: Norwegian (Bokmal), Norwegian (Nynorsk), Greenlandic, and Danish.

Then it is wrong by default. Dutch, Lithuanian, German, and many more Western languages, for example, expect uppercase to sort before lowercase.
And locales only matter for the upper half of ANSI.
Collations are a different matter.
« Last Edit: November 05, 2025, 08:37:28 am by Thaddy »

 
