Author Topic: Sorting Middle Eastern and other Language Strings  (Read 13704 times)

Avishai

  • Hero Member
  • *****
  • Posts: 1021
Re: Sorting Middle Eastern and other Language Strings
« Reply #15 on: October 02, 2011, 05:13:04 pm »
Thanks typo,

It's looking like TStringList and others almost need to have 'property Language' added in the published section just to do a simple (not so simple) sort.

I was actually joking when I said this.  But maybe it's not a joke.
Lazarus Trunk / fpc 2.6.2 / Win32

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Sorting Middle Eastern and other Language Strings
« Reply #16 on: October 02, 2011, 07:06:05 pm »
CustomSort seems to be enough. I use this:

Code:
function UTF8Compare(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := WideCompareText(UTF8Decode(List[Index1]),UTF8Decode(List[Index2]));
end;

procedure TForm1.Button7Click(Sender: TObject);
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sl.Text := Memo1.Text;
    sl.CustomSort(@UTF8Compare);
    Memo2.Text := sl.Text;
  finally
    sl.Free;
  end;
end;

Avishai

  • Hero Member
  • *****
  • Posts: 1021
Re: Sorting Middle Eastern and other Language Strings
« Reply #17 on: October 02, 2011, 07:35:32 pm »
Thanks typo,  this is also a nice routine.  It almost works in Hebrew, but it still has a problem with Hebrew "special case final letters".  The routine that volvo887 shared gets it perfect for English and Hebrew, and apparently for Russian and Arabic.
Lazarus Trunk / fpc 2.6.2 / Win32

Ocye

  • Hero Member
  • *****
  • Posts: 518
    • Scrabble3D
Re: Sorting Middle Eastern and other Language Strings
« Reply #18 on: October 04, 2011, 06:26:28 pm »
If you search with your preferred search engine for the keywords "sort digraph weights", you will find some solutions. I'm sure there is one that fits your issue, but none that fits all. The question to ask is whether correct sorting is necessary at all. If you only want fast access within a sorted list, it doesn't matter which sort mode you applied beforehand, as long as lookups use the same comparison.
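
A minimal sketch of the digraph-weight idea, not taken from any particular search result. It assumes plain ASCII input and uses the traditional Spanish treatment of "ch" as a single letter between "c" and "d" purely as an illustration. The names NextWeight, CompareByWeights and WeightCompare are hypothetical, and a real table for any language would need far more entries.

```pascal
program DigraphWeights;
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

// Return the weight of the collation element starting at S[I] and
// advance I past it. Letters A..Z get even weights (A=2, B=4, C=6, ...),
// which leaves odd slots free for digraphs: CH gets 7, between C and D.
// Assumption: ASCII only; anything else gets weight 0.
function NextWeight(const S: string; var I: Integer): Integer;
var
  C: Char;
begin
  C := UpCase(S[I]);
  if (C = 'C') and (I < Length(S)) and (UpCase(S[I + 1]) = 'H') then
  begin
    Result := 7;          // digraph CH counts as one element
    Inc(I, 2);
  end
  else
  begin
    if C in ['A'..'Z'] then
      Result := 2 * (Ord(C) - Ord('A') + 1)
    else
      Result := 0;
    Inc(I);
  end;
end;

// Compare two strings element by element using the weights above.
function CompareByWeights(const A, B: string): Integer;
var
  IA, IB, WA, WB: Integer;
begin
  IA := 1;
  IB := 1;
  while (IA <= Length(A)) and (IB <= Length(B)) do
  begin
    WA := NextWeight(A, IA);
    WB := NextWeight(B, IB);
    if WA <> WB then
      Exit(WA - WB);
  end;
  // All shared elements are equal: the string with nothing left sorts first.
  Result := (Length(A) - IA) - (Length(B) - IB);
end;

// Adapter with the signature that TStringList.CustomSort expects.
function WeightCompare(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareByWeights(List[Index1], List[Index2]);
end;

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add('dama');
    SL.Add('chico');
    SL.Add('cosa');
    SL.Add('casa');
    SL.CustomSort(@WeightCompare);
    WriteLn(SL.CommaText);
  finally
    SL.Free;
  end;
end.
```

With these weights "chico" sorts after "cosa" and before "dama", whereas a plain byte-wise compare puts "chico" before "cosa". The same pattern could be extended with one table per language, which is also why no single table fits all.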
Lazarus 1.7 (SVN) FPC 3.0.0

Avishai

  • Hero Member
  • *****
  • Posts: 1021
Re: Sorting Middle Eastern and other Language Strings
« Reply #19 on: October 05, 2011, 08:00:37 pm »
Typo, which languages did you sort with "function UTF8Compare"?  I'm trying to use a case statement for CustomSort.  I need a more meaningful name than "AnotherSort" :)  Added: OK, I changed it to LatinSort.

Code:
const
  StandardSort = 0;  //Arabic, English, Hebrew, Russian...
  LatinSort = 1;   //Portuguese...

function SortStandard(List: TStringList; Index1, Index2: Integer): Integer;
{Arabic, English, Hebrew, Russian and others that are not known yet}
begin
   Result := StriComp(PChar(List[Index1]),PChar(List[Index2]));
end;

function SortLatin(List: TStringList; Index1, Index2: Integer): Integer;
{UTF8Compare - Portuguese...}
begin
  Result := WideCompareText(UTF8Decode(List[Index1]),UTF8Decode(List[Index2]));
end;

procedure SortStringList(AStringList: TStringList; SortStyle: Integer);
{Declared as a procedure: nothing meaningful was ever returned}
begin
  case SortStyle of
    StandardSort: AStringList.CustomSort(@SortStandard);
    LatinSort: AStringList.CustomSort(@SortLatin);
    {More to come}
  end;
end;

If anyone else has other Language specific Sorting information, it would be greatly appreciated. 
Mostly I'm looking for Spanish, French, German and Ukrainian for now.
« Last Edit: October 05, 2011, 08:51:06 pm by Avishai »
Lazarus Trunk / fpc 2.6.2 / Win32

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Sorting Middle Eastern and other Language Strings
« Reply #20 on: October 05, 2011, 08:03:19 pm »
Portuguese, but I think it works for other Latin languages too, and maybe for accented languages in general.
« Last Edit: October 05, 2011, 08:06:16 pm by typo »

Avishai

  • Hero Member
  • *****
  • Posts: 1021
Re: Sorting Middle Eastern and other Language Strings
« Reply #21 on: October 05, 2011, 08:46:09 pm »
Bandbaz, if you have a Sort routine that works for Farsi or any other language, I will add it to my code as well.  My hope is that I can get enough GOOD code that I can post it in the Wiki where people can add to it or correct it.  Even if they don't need all of the code, they will be able to copy the pieces that they need and have some hope that it will work correctly.  I had no idea it was this difficult to sort for different languages.  But the strangest part is that the different controls in Lazarus give different sort results for the same language.
« Last Edit: October 05, 2011, 08:53:07 pm by Avishai »
Lazarus Trunk / fpc 2.6.2 / Win32

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Sorting Middle Eastern and other Language Strings
« Reply #22 on: October 05, 2011, 11:32:45 pm »
AFAIK, TListBox passes the task to the OS, which gives a better localized result.
« Last Edit: October 05, 2011, 11:38:34 pm by typo »

Avishai

  • Hero Member
  • *****
  • Posts: 1021
Re: Sorting Middle Eastern and other Language Strings
« Reply #23 on: October 05, 2011, 11:59:45 pm »
Typo, thanks for your input.  Maybe I'm doing something wrong, but TListBox gives a less properly sorted result than TComboBox for Hebrew.  Actually TComboBox gives the closest result for Hebrew, and it is still not correct.  I also checked TTreeView and got a different result from the other controls I checked.  Every control I have checked gives a different result for sorting Hebrew.  %)  Even if they are wrong, I would think they should all give the same result.
Lazarus Trunk / fpc 2.6.2 / Win32

 
