
Topic: Strange result of text sorting in TFPSList.Sort

AlexTP

  • Hero Member
  • Posts: 2673
    • UVviewsoft
Strange result of text sorting in TFPSList.Sort
« on: November 04, 2025, 08:08:28 am »
Code: Pascal
type
  TATStringItemList = class(TFPSList)
  public
    constructor Create;
    function GetItem(AIndex: SizeInt): PATStringItem;
    procedure Deref(Item: Pointer); override; overload;
    procedure SortRange(L, R: SizeInt; Compare: TFPSListCompareFunc);
  end;

var
  FList: TATStringItemList;

function TATStrings.Compare_Asc(Key1, Key2: Pointer): Integer;
var
  P1, P2: PATStringItem;
begin
  P1:= PATStringItem(Key1);
  P2:= PATStringItem(Key2);
  Result:= UnicodeCompareStr(P1^.Line, P2^.Line);
end;

// Sorting (Func is a TFPSListCompareFunc)
Func:= @Compare_Asc;
FList.Sort(Func);

For these lines, the case-sensitive sort gives the following order:

e-
e g
e0
ea
eR

I see 'ea' before 'eR', which does not follow Unicode sort order: 'a' (U+0061) must come after 'R' (U+0052).
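The order expected here is plain code-point (ordinal) order, in which 'R' (U+0052) comes before 'a' (U+0061). As a minimal sketch, assuming FPC with the fgl unit, such a comparison can be written as an ordinary function and passed to TFPGList<UnicodeString>.Sort. OrdinalCompareU is a hypothetical helper, not an RTL function; it compares UTF-16 code units, which equals code-point order for BMP characters like the ones in this list.

```pascal
program OrdinalSortDemo;
{$mode objfpc}{$H+}

uses
  fgl;

type
  TUStrList = specialize TFPGList<UnicodeString>;

{ Hypothetical helper: compares UTF-16 code units one by one (ordinal order).
  No locale, OS collation or widestring manager is involved. }
function OrdinalCompareU(const S1, S2: UnicodeString): Integer;
var
  I, Len: Integer;
begin
  Len := Length(S1);
  if Length(S2) < Len then
    Len := Length(S2);
  for I := 1 to Len do
    if S1[I] <> S2[I] then
      Exit(Ord(S1[I]) - Ord(S2[I]));
  { Equal prefix: the shorter string sorts first }
  Result := Length(S1) - Length(S2);
end;

var
  L: TUStrList;
  I: Integer;
begin
  L := TUStrList.Create;
  try
    L.Add('e-');
    L.Add('e g');
    L.Add('e0');
    L.Add('ea');
    L.Add('eR');
    L.Sort(@OrdinalCompareU);
    for I := 0 to L.Count - 1 do
      WriteLn(L[I]);
  finally
    L.Free;
  end;
end.
```

With this comparer the list comes out as e g, e-, e0, eR, ea on every platform, since only the numeric values of the code units are compared.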


Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #1 on: November 04, 2025, 08:14:59 am »
Are you trying to use a UTF-16 sort on UTF-8 strings? Because that is what it looks like.
You should use a UTF-8 sort.
If Europe sells their USA bonds, the USD will collapse. Europe can afford that given average state debts. The USA can't afford that. Just some advice...

AlexTP

  • Hero Member
  • Posts: 2673
    • UVviewsoft
Re: Strange result of text sorting in TFPSList.Sort
« Reply #2 on: November 04, 2025, 08:20:26 am »
No, I am trying to use UnicodeCompareStr, which takes UnicodeString parameters, on the UnicodeString values P1^.Line and P2^.Line (Line is a UnicodeString property).

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #3 on: November 04, 2025, 08:49:48 am »
Which platform? The widestring manager differs between Windows and Unix.
Assuming 64-bit Intel/AMD, though, both use the x86_64.inc code here,
which does a simple word-by-word numerical comparison using CompareWord.
I did not check other CPUs or 32-bit Intel, but I will test AArch64/Linux64 later.
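A word-by-word numerical comparison of the kind described above can be illustrated with System.CompareWord applied to the UTF-16 buffers of two UnicodeStrings. This is only a sketch: CompareUnitsU is a hypothetical name, and the Windows and Linux results later in this thread differ, which suggests UnicodeCompareStr does not reduce to this comparison on every platform.

```pascal
program CompareWordDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: compares the UTF-16 code units of two strings as raw
  16-bit numbers with System.CompareWord; ties are broken by length. }
function CompareUnitsU(const S1, S2: UnicodeString): Integer;
var
  Len, R: SizeInt;
begin
  Len := Length(S1);
  if Length(S2) < Len then
    Len := Length(S2);
  R := 0;
  if Len > 0 then  // avoid dereferencing the nil pointer of an empty string
    R := CompareWord(PUnicodeChar(S1)^, PUnicodeChar(S2)^, Len);
  if R = 0 then
    R := Length(S1) - Length(S2);
  // Normalize to -1, 0 or 1
  if R < 0 then
    Result := -1
  else if R > 0 then
    Result := 1
  else
    Result := 0;
end;

begin
  WriteLn(CompareUnitsU('eR', 'ea'));  // -1: 'R' ($0052) < 'a' ($0061)
  WriteLn(CompareUnitsU('e-', 'e0'));  // -1: '-' ($002D) < '0' ($0030)
  WriteLn(CompareUnitsU('ea', 'ea'));  // 0
end.
```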
« Last Edit: November 04, 2025, 09:10:03 am by Thaddy »

avk

  • Hero Member
  • Posts: 825
Re: Strange result of text sorting in TFPSList.Sort
« Reply #4 on: November 04, 2025, 09:45:47 am »
Quote from: AlexTP
... i see 'ea' before 'eR' which is not per Unicode sort order: 'a' must be after 'R'.

IIRC, UnicodeCompareStr() gives the alphabetical order of the strings.

Awkward

  • Full Member
  • Posts: 154
Re: Strange result of text sorting in TFPSList.Sort
« Reply #5 on: November 04, 2025, 10:05:16 am »
I had a similar problem when sorting localized strings with TStringList, with both ANSI and UTF-8 string encodings:
  sl.Add('- Все -');             // "- All -"
  sl.Add('-- А Все ли? --');     // "-- But all? --"
  sl.Add('-- Блин, не Все --');  // "-- Darn, not all --"

Depending on the source and string encoding, the result could differ; sometimes it was even right, but the order was often wrong. And if I replaced '-' with '$', for example, the sort result changed. I still don't understand why the problem exists when the first characters are pure ANSI.
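One way to make the order independent of the locale is to compare bytes: for UTF-8 data, byte order equals code-point order. Below is a sketch using TStringList.CustomSort with SysUtils.CompareStr, which compares AnsiStrings ordinally (byte by byte) rather than through the locale. ByteWiseCompare is a hypothetical name, and the Cyrillic literals assume the source file is saved as UTF-8 and compiled without a codepage directive, so the bytes are kept as-is.

```pascal
program ByteWiseSortDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical comparer: SysUtils.CompareStr compares AnsiStrings byte by byte,
  without locale rules. For UTF-8 data this yields code-point order. }
function ByteWiseCompare(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareStr(List[Index1], List[Index2]);
end;

var
  SL: TStringList;
  I: Integer;
begin
  SL := TStringList.Create;
  try
    SL.Add('-- Блин, не Все --');  // "-- Darn, not all --"
    SL.Add('- Все -');             // "- All -"
    SL.Add('-- А Все ли? --');     // "-- But all? --"
    SL.CustomSort(@ByteWiseCompare);
    for I := 0 to SL.Count - 1 do
      WriteLn(SL[I]);
  finally
    SL.Free;
  end;
end.
```

Because '$' (0x24) sorts before '-' (0x2D) byte-wise, replacing one with the other still changes the order, but now predictably. The trade-off is that this is code-point order, not a dictionary (linguistic) order.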

AlexTP

  • Hero Member
  • Posts: 2673
    • UVviewsoft
Re: Strange result of text sorting in TFPSList.Sort
« Reply #6 on: November 04, 2025, 10:12:58 am »
@Thaddy, the platform is Win32 (my tests), and the user has Linux x64. Why is 'ea' before 'eR' if the Unicode code point for 'a' is after the one for 'R'? This is a case-sensitive sort.

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 12142
  • Debugger - SynEdit - and more
    • wiki
Re: Strange result of text sorting in TFPSList.Sort
« Reply #7 on: November 04, 2025, 10:14:59 am »
Quote from: AlexTP
i see 'ea' before 'eR' which is not per Unicode sort order: 'a' must be after 'R'.

I couldn't find that documented. Where does Unicode require uppercase to be sorted before lowercase?

https://www.unicode.org/reports/tr10/#Case_Comparisons says that both orders are supported.
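The linked section of UTS #10 treats case as a tertiary difference, so a collation may put either uppercase or lowercase first. A much-simplified sketch of that layering, assuming ASCII-only input: compare case-insensitively first, then by length, and only for strings that differ in case alone apply the chosen case preference. CompareCaseFirst and FoldAscii are hypothetical names; this is not a UCA implementation (real collation uses multi-level weights from the DUCET).

```pascal
program CaseFirstDemo;
{$mode objfpc}{$H+}

{ ASCII-only case folding (assumption: inputs are plain ASCII). }
function FoldAscii(C: UnicodeChar): UnicodeChar;
begin
  if (C >= 'a') and (C <= 'z') then
    Result := UnicodeChar(Ord(C) - 32)
  else
    Result := C;
end;

{ Hypothetical comparer: primary level ignores case; the case preference
  (UpperFirst) is only used when the strings differ in case alone. }
function CompareCaseFirst(const S1, S2: UnicodeString; UpperFirst: Boolean): Integer;
var
  I, Len: Integer;
begin
  Len := Length(S1);
  if Length(S2) < Len then
    Len := Length(S2);
  // Primary level: compare case-folded code units
  for I := 1 to Len do
    if FoldAscii(S1[I]) <> FoldAscii(S2[I]) then
      Exit(Ord(FoldAscii(S1[I])) - Ord(FoldAscii(S2[I])));
  if Length(S1) <> Length(S2) then
    Exit(Length(S1) - Length(S2));
  // Tertiary level: same letters, so any remaining difference is case only
  for I := 1 to Len do
    if S1[I] <> S2[I] then
    begin
      // In ASCII the uppercase letter has the lower code point
      if UpperFirst then
        Exit(Ord(S1[I]) - Ord(S2[I]))
      else
        Exit(Ord(S2[I]) - Ord(S1[I]));
    end;
  Result := 0;
end;

begin
  WriteLn(CompareCaseFirst('ea', 'eR', True) < 0);   // True: 'A' < 'R' at the primary level
  WriteLn(CompareCaseFirst('Ea', 'ea', True) < 0);   // True: upper-first
  WriteLn(CompareCaseFirst('Ea', 'ea', False) < 0);  // False: lower-first
end.
```

Under this model 'ea' sorts before 'eR' whichever case preference is chosen, because 'a' and 'R' already differ at the primary level; the preference only decides between strings such as 'Ea' and 'ea'.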

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #8 on: November 04, 2025, 10:27:58 am »
@AlexTP

Tested with:
Code: Pascal
{$mode delphiunicode}
uses fgl,sysutils,classes{$ifdef unix},cwstring{$endif};

type
  TUnicodeStrList = TFPGlist<unicodestring>;
var
  S:TUnicodeStrList;
  i:integer;
begin
  S:= TUnicodeStrList.Create;
  S.Add('eR');
  S.Add('e-');
  S.Add('e a');
  S.Add('ea');
  S.Add('e0');
  s.sort(@UnicodeCompareStr);
  for i := 0 to S.Count-1 do writeln(S[i]);
  s.free;
end.

Linux64+cwstring is correct:
Code: Text
e a
e-
e0
eR
ea

Windows11-64:
Code: Text
e-
e a
e0
ea
eR

So the culprit is Windows itself, not the bitness: you used 32-bit, I used 64-bit. Is Windows "wrong"?
Well, no; see Martin's remark. But you can change the collation order.
« Last Edit: November 04, 2025, 01:19:01 pm by Thaddy »

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #9 on: November 04, 2025, 10:44:44 am »
A compatible version that changes the collation order:
Code: Pascal
{$mode delphiunicode}
uses fgl{$ifdef unix},cwstring{$endif};

type
  TUnicodeStrList = TFPGlist<unicodestring>;
{$if not defined(min)} // avoid a dependency on the Math unit
function min(const a,b:cardinal):cardinal;
begin
  result := a;
  if a > b then result := b;
end;
{$endif}
function CompareUpperBeforeLower(const S1, S2: UnicodeString): Integer;
var
  C1, C2: UnicodeChar; // or widechar
  I, Len: Integer;
begin
  Len := Min(Length(S1), Length(S2));
  for I := 1 to Len do
  begin
    C1 := S1[I];
    C2 := S2[I];
    if C1 <> C2 then
    begin
      // Same letter, different case: reversed code-point order
      // (this puts the lowercase letter first, e.g. 'a' before 'A')
      if (UpCase(C1) = UpCase(C2)) then
        Result := Ord(C2) - Ord(C1)
      else
        Result := Ord(C1) - Ord(C2);
      Exit;
    end;
  end;
  Result := Length(S1) - Length(S2);
end;

var
  S:TUnicodeStrList;
  i:integer;
begin
  S:= TUnicodeStrList.Create;
  S.Add('eR');
  S.Add('e-');
  S.Add('e a');
  S.Add('ea');
  S.Add('e0');
  s.sort(@CompareUpperBeforeLower);
  for i := 0 to S.Count-1 do writeln(S[i]);
  s.free;
end.
This should be plug and play for you.
It renders the same result on both platforms I tested.
Optimizing it is up to you  :)
« Last Edit: November 04, 2025, 11:14:46 am by Thaddy »

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #10 on: November 04, 2025, 12:35:10 pm »
Reported as #41483

AlexTP

  • Hero Member
  • Posts: 2673
    • UVviewsoft
Re: Strange result of text sorting in TFPSList.Sort
« Reply #11 on: November 04, 2025, 12:47:23 pm »
Quote from: AlexTP
i see 'ea' before 'eR' which is not per Unicode sort order: 'a' must be after 'R'.

Quote from: Martin_fr
I couldn't find that documented? (that Unicode requires upper to be sorted before lower?)

In the Python console, enter this:

>>> ord('a')
97
>>> ord('R')
82

So 'a' must come after 'R'.

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #12 on: November 04, 2025, 12:49:22 pm »
No, that is based on your expectation; the Windows collation is just as valid.
Anyway, try my function above, which is consistent between platforms: Unix collation order, 'A' before 'a'.
You are right about the Python order; it uses the default Unix collation.
(But that is not the Unicode standard collation.)

BTW, if the Windows version is proven wrong, the error is in the inline assembler code.
« Last Edit: November 04, 2025, 12:57:37 pm by Thaddy »

AlexTP

  • Hero Member
  • Posts: 2673
    • UVviewsoft
Re: Strange result of text sorting in TFPSList.Sort
« Reply #13 on: November 04, 2025, 12:52:27 pm »
@Thaddy
>Compatible, changes the collation order.:

That's bad for me: it's new code that I must support. I want to sort with UnicodeCompareStr() and get Unicode order.

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Re: Strange result of text sorting in TFPSList.Sort
« Reply #14 on: November 04, 2025, 01:00:31 pm »
@AlexTP
It only changes the collation order on Windows; on Linux (and in Python) the result stays the same.
You simply have to make a choice, but let's see the response to my bug report.
It should be consistent between platforms, certainly for you. :D

This might be more palatable to your taste:
Code: Pascal
{$ifdef mswindows}
{$if not defined(min)} // avoid a dependency on the Math unit
function min(const a,b:cardinal):cardinal;inline;
begin
  result := a;
  if a > b then result := b;
end;
{$endif}

function UnicodeCompareStr(const S1, S2: UnicodeString): Integer;inline;
var
  C1, C2: UnicodeChar;
  I, Len: Integer;
begin
  Len := Min(Length(S1), Length(S2));
  for I := 1 to Len do
  begin
    C1 := S1[I];
    C2 := S2[I];
    if C1 <> C2 then
    begin
      // Same letter, different case: reversed code-point order
      // (this puts the lowercase letter first, e.g. 'a' before 'A')
      if (UpCase(C1) = UpCase(C2)) then
        Result := Ord(C2) - Ord(C1)
      else
        Result := Ord(C1) - Ord(C2);
      Exit;
    end;
  end;
  Result := Length(S1) - Length(S2);
end;
{$endif}
This changes the collation order only on Windows. On Linux it keeps the original order, like Python.
As a result, it renders the same on all platforms.
The magic is the line "Result := Ord(C2) - Ord(C1)".
« Last Edit: November 04, 2025, 01:21:29 pm by Thaddy »

 
