Topic: Sorting special characters

IM314

  • New member
  • *
  • Posts: 5
Sorting special characters
« on: October 25, 2021, 09:34:46 am »
Very occasional, complete amateur hobbyist here, so please be patient with my ignorance. I've been struggling to get names with special characters to sort 'right'. The following is a *very* rough and simplified example of what I am dealing with.

someObject is a class with property name : String.

Code: Pascal
constructor someObject.create(s : string);
begin
  name := s;
end;

function compare1(s1,s2 : pointer) : integer;
begin
  result := comparetext(someObject(s1).name,someObject(s2).name);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  i : integer;
  list : tfpList;
begin
  list := tfpList.Create;
  list.add(someObject.create('Rêd'));
  list.add(someObject.create('Rad'));
  list.add(someObject.create('Rod'));
  list.add(someObject.create('Rêzd'));
  list.add(someObject.create('Rêad'));
  list.Sort(@compare1);
  memo1.clear;
  for i := 0 to list.count-1 do
    memo1.append(someObject(list[i]).name);
end;

In this example (and in my real-life project, where I read names with special characters from a file), the code above always sorts the special characters as if they come after z, so the result is always
Rad
Rod
Rêad
Rêd
Rêzd

How do I cast and/or compare these strings to get a more 'natural' sort order, like the one every other piece of software I've tried (Excel, for example) seems to produce from the same list? E.g.:
Rad
Rêad
Rêd
Rêzd
Rod

Any advice much appreciated.

dseligo

  • Sr. Member
  • ****
  • Posts: 372
Re: Sorting special characters
« Reply #1 on: October 25, 2021, 10:10:26 am »
Maybe you can use this function to convert accented letters to non-accented ones and then sort: https://forum.lazarus.freepascal.org/index.php/topic,46804.msg334219.html#msg334219

IM314

  • New member
  • *
  • Posts: 5
Re: Sorting special characters
« Reply #2 on: October 25, 2021, 10:19:46 am »
Thanks! This certainly works, but it feels like an incredibly roundabout (if clever) way of doing this. From my admittedly amateur reading of the documentation, something like ansiCompareText(string1, string2) should do the same, but it does not appear to have any effect.

Zvoni

  • Hero Member
  • *****
  • Posts: 803
Re: Sorting special characters
« Reply #3 on: October 25, 2021, 10:38:07 am »
One System to rule them all, One IDE to find them,
One Code to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
People call me crazy, because i'm jumping out of perfectly fine aircraft

IM314

  • New member
  • *
  • Posts: 5
Re: Sorting special characters
« Reply #4 on: October 25, 2021, 10:55:05 am »
Ah, should have known there would be a word for it. Collation. Interesting, and it does look like it addresses the issue. I knew from the start different languages would sort special characters differently based on pronunciation, so this makes sense that you can define different sort schemes, as it were. I was just being lazy and hoping for a magical shortcut that takes the nearest* equivalent from the Latin alphabet and sorts ê like e and à like A, for instance. The previous commenter's suggested function does exactly that, so at least my project now works.

*Poorly-defined, I know. Hence collation being a thing.

Zvoni

  • Hero Member
  • *****
  • Posts: 803
Re: Sorting special characters
« Reply #5 on: October 25, 2021, 11:05:26 am »
Well, another workaround might be to read all your names into an in-memory SQLite database, load the collation you need, and run a SELECT query with an ORDER BY that uses the chosen collation. Then move the returned "list" of (hopefully) correctly sorted strings into your list.

wp

  • Hero Member
  • *****
  • Posts: 9008
Re: Sorting special characters
« Reply #6 on: October 25, 2021, 11:15:13 am »
There's NaturalCompareText() in unit StrUtils. When I call this in your Compare1() function, the output is as expected:

Rad
Rêad
Rêd
Rêzd
Rod

Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...

winni

  • Hero Member
  • *****
  • Posts: 2792
Re: Sorting special characters
« Reply #7 on: October 25, 2021, 11:16:05 am »
Hi!

The Unit LazUTF8 contains this function:

Code: Pascal
function UTF8CompareStr(const S1, S2: string): PtrInt;

Winni

IM314

  • New member
  • *
  • Posts: 5
Re: Sorting special characters
« Reply #8 on: October 25, 2021, 02:44:46 pm »
Quote from: wp on October 25, 2021, 11:15:13 am
There's NaturalCompareText() in unit StrUtils. When I call this in your Compare1() function, the output is as expected:

Rad
Rêad
Rêd
Rêzd
Rod

I was very excited to see this, but it does not have the same result for me... I still get the strange sort result from my first post.  I'm on Mac, btw. A bit of superficial reading shows all kinds of issues with character sets (UTF-8, etc) that could impact on this.

IM314

  • New member
  • *
  • Posts: 5
Re: Sorting special characters
« Reply #9 on: October 25, 2021, 02:46:13 pm »
Quote from: winni on October 25, 2021, 11:16:05 am
The Unit LazUTF8 contains this function:

Code: Pascal
function UTF8CompareStr(const S1, S2: string): PtrInt;

Thanks. However, as with the previous poster's NaturalCompareText, this does not change how my list is sorted. I'm not sure whether being on Mac is the issue.

Alextp

  • Hero Member
  • *****
  • Posts: 1469
    • UVviewsoft
Re: Sorting special characters
« Reply #10 on: October 25, 2021, 02:58:47 pm »
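Chars A-Z have low Unicode code points (below $80), while chars with accents have higher ones (ê is U+00EA, stored in UTF-8 as the bytes $C3 $AA). A plain comparison of byte values therefore puts them after z, so the sorting must be as you see.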
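
A minimal sketch of why this happens, assuming the strings hold UTF-8 bytes (the Lazarus default). The program name, the constant ECirc and the variable names are made up for illustration. The accented letter is written as explicit bytes so the demo does not depend on how the source file is saved.

```pascal
program ByteOrderDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // 'ê' (U+00EA) as its two UTF-8 bytes; hypothetical constant name.
  ECirc = #$C3#$AA;

var
  Accented, Plain: string;
begin
  Accented := 'R' + ECirc + 'd';  // "Rêd"
  Plain := 'Rod';

  // The second byte decides the order of the two strings.
  WriteLn('second byte of accented string: ', Ord(Accented[2]));  // 195 ($C3)
  WriteLn('second byte of plain string:    ', Ord(Plain[2]));     // 111 ('o')
  WriteLn('byte value of z:                ', Ord('z'));          // 122

  // CompareText folds ASCII case only and then compares byte values,
  // so 195 > 111 places "Rêd" after "Rod" (and after anything with z).
  WriteLn('CompareText(Accented, Plain) > 0: ',
          CompareText(Accented, Plain) > 0);
end.
```

Any comparison that works on raw byte or code point values will show the same ordering; only a collation-aware or accent-folding comparison changes it.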

Alextp

  • Hero Member
  • *****
  • Posts: 1469
    • UVviewsoft
Re: Sorting special characters
« Reply #11 on: October 25, 2021, 03:05:45 pm »
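From the link in reply #1, take the function that compares strings while ignoring the accents. Its name is SimpleCompare. Then use it as the main comparison and fall back to comparetext only to break ties: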

Code: Pascal
function compare1(s1,s2 : pointer) : integer;
begin
  result := SimpleCompare(someObject(s1).name,someObject(s2).name);
  if result = 0 then
    result := comparetext(someObject(s1).name,someObject(s2).name);
end;
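
For readers without the linked function at hand, here is a rough, self-contained sketch of the same two-step idea. FoldAccents, CompareFolded and ListCompare are hypothetical names, not the SimpleCompare from the linked post, and the folding table covers only four letters, given as raw UTF-8 bytes. It uses TStringList.CustomSort instead of TFPList only to keep the demo short.

```pascal
program FoldAccentsDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Hypothetical helper: replaces a few accented letters (raw UTF-8 bytes)
// with their base letter. A real table would need many more entries.
function FoldAccents(const S: string): string;
begin
  Result := StringReplace(S,      #$C3#$AA, 'e', [rfReplaceAll]); // ê
  Result := StringReplace(Result, #$C3#$A9, 'e', [rfReplaceAll]); // é
  Result := StringReplace(Result, #$C3#$A8, 'e', [rfReplaceAll]); // è
  Result := StringReplace(Result, #$C3#$A0, 'a', [rfReplaceAll]); // à
end;

// Primary key: folded text, compared case-insensitively.
// Tie-break: the raw bytes, so "Red" and "Rêd" keep a fixed order.
function CompareFolded(const A, B: string): Integer;
begin
  Result := CompareText(FoldAccents(A), FoldAccents(B));
  if Result = 0 then
    Result := CompareStr(A, B);
end;

// Adapter with the signature that TStringList.CustomSort expects.
function ListCompare(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareFolded(List[Index1], List[Index2]);
end;

var
  Names: TStringList;
  i: Integer;
begin
  Names := TStringList.Create;
  try
    Names.Add('R'#$C3#$AA'd');   // Rêd
    Names.Add('Rad');
    Names.Add('Rod');
    Names.Add('R'#$C3#$AA'zd');  // Rêzd
    Names.Add('R'#$C3#$AA'ad');  // Rêad
    Names.CustomSort(@ListCompare);
    for i := 0 to Names.Count - 1 do
      WriteLn(Names[i]);
  finally
    Names.Free;
  end;
end.
```

With this table the five names should come out in the order Rad, Rêad, Rêd, Rêzd, Rod. How the accented letters are displayed depends on the terminal encoding. This is plain accent folding, not language-aware collation, so it will not match locale-specific rules.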

wp

  • Hero Member
  • *****
  • Posts: 9008
Re: Sorting special characters
« Reply #12 on: October 25, 2021, 03:38:41 pm »
Quote from: IM314 on October 25, 2021, 02:44:46 pm
I was very excited to see this [NaturalCompareText() in unit StrUtils], but it does not have the same result for me... I still get the strange sort result from my first post. I'm on Mac, btw.

I tested on Windows, Linux/Ubuntu and macOS Mojave. Yes, it fails on Mac, but it works on Windows and Linux, so I think the idea is basically correct but it is not correctly implemented on Cocoa. You should write an FPC bug report for NaturalCompareText on Cocoa. Add your sources as a compilable project.