Recent

Author Topic: My NaturalSort unit  (Read 13552 times)

typo

  • Hero Member
  • *****
  • Posts: 3051
My NaturalSort unit
« on: April 24, 2015, 11:08:46 pm »
Attached. Please add your own desired behavior (for your specific charset) and post it here.

NaturalSort procedure sorts numbers in their proper natural position, even if they are inside strings.
« Last Edit: April 25, 2015, 12:27:23 am by typo »

Gizmo

  • Hero Member
  • *****
  • Posts: 831
Re: My NaturalSort unit
« Reply #1 on: April 25, 2015, 12:44:23 am »
I only accessing via my phone currently but I wonder if this will help me with a stringgrid sort problem I have. The row count value is put in one of my columns, and all columns can be sorted using the grid colsort property, that is set to true, so when user clicks header the values are sorted.

But if they choose the row count they are sorted like this:

1
10
11
12
...
19
2
20
21
22
...
3
31
32
...
4
41
Etc

Will your unit allow me to correct this, typo?

Thanks for sharing regardless.



typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #2 on: April 25, 2015, 12:59:28 am »
Yes, I think so.

I have used your numbers, see the result in the attached JPG image.

The numbers are not a problem for the unit, indeed, but verify carefully the alphabetical order for your codepage.
« Last Edit: April 25, 2015, 01:13:40 am by typo »

Septe

  • Jr. Member
  • **
  • Posts: 68
Re: My NaturalSort unit
« Reply #3 on: April 25, 2015, 05:48:32 am »
Actually, what he's describing are not numbers but strings.  If you do have that accounted for, great.  I'd be interested in how you managed that.  The only way I can think of doing it is processing each line, check the front of string, read off the number portion leaving the original number in the string, convert it, then sort on the number.  Is there a better way?

I can't tell from reading the source code.  I'm no expert.  Btw, are you able to account for negative numbers sorting properly?  I wouldn't think it's a common thing to see negative number in a string but theoretically it just takes one person who would.

-2 What is this?
-1 Who is this?
0 Where is this?
1 When is this?
2 How is this?

I shudder when I see a list like this but it's possible.

Thanks.
« Last Edit: April 25, 2015, 05:55:00 am by Septe »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #4 on: April 25, 2015, 10:02:57 am »
Of course there is no list with negative numbers.

The minus signal before the numbers are interpreted as not being part of the numbers.

-1 Who is this?
-2 What is this?
0 Where is this?
1 When is this?
2 How is this?

The numeric parts of the strings are compared as numbers and not as strings.

1.1.6 When is this?
1.2.1 Who is this?
2.1.4 How is this?
2.1.20 Where is this?
2.2.1 What is this?
« Last Edit: April 25, 2015, 10:27:16 am by typo »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #5 on: April 25, 2015, 02:40:22 pm »
In latin languages, diacritical marks don't alter the alphabetical order, the same occurs in German language. But it is not the case, for instance, in Swedish language, on which the diacritically marked letters go to the end of it.

So your main problem with this unit is to determine whether the alphabetical order is right or not and modify it accordingly.

The initialization part of the unit verifies the system codepage and it can behaves according to this value.
« Last Edit: April 25, 2015, 05:13:00 pm by typo »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #6 on: April 25, 2015, 05:13:11 pm »
Before this I tried StrCmpLogicalW for Windows, but curiously this solutin is faster than that one.

Code: [Select]
function StrCmpLogicalW(P1, P2: PWideChar): Integer;  stdcall; external 'Shlwapi.dll';

It is not declared on Lazarus.

This Windows API function has a bug, it sorts zeros like this:

Code: [Select]
000
000_A
000_B
00
0

which was corrected in EvsCompareNatural_A.

« Last Edit: April 26, 2015, 01:08:36 am by typo »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #7 on: April 27, 2015, 02:57:15 am »
Code: [Select]
function NaturalCompareStringUTF8(Str1, Str2: string): integer;
var
  Num1, Num2 :Double;
  pStr1, pStr2 :PChar;
  Len1, Len2 :integer;
  UTF8Char1, UTF8Char2 :string;
  lang :string = '';
  function RemoveDiacritics(const S: string): string;
  // by SilvioProg
  var
    F: Boolean;
    I: SizeInt;
    PS, PD: PChar;
  begin
    SetLength(Result, Length(S));
    PS := PChar(S);
    PD := PChar(Result);
    I := 0;
    while PS^ <> #0 do
    begin
      F := PS^ = #195;
      if F then
        case PS[1] of
          #128..#132: PD^ := #65;
          #135: PD^ := #67;        // letter Ç
          #136..#139: PD^ := #69;
          #140..#143: PD^ := #73;
          #145: PD^ := #78;        // letter Ñ
          #146..#150: PD^ := #79;
          #153..#156: PD^ := #85;
          #157: PD^ := #89;
          #160..#164: PD^ := #97;
          #167: PD^ := #99;        // letter ç
          #168..#171: PD^ := #101;
          #172..#175: PD^ := #105;
          #177: PD^ := #110;
          #178..#182: PD^ := #111;  // letter o
          #185..#188: PD^ := #117;
          #189..#191: PD^ := #121;
        else
          F := False;
        end;
      if F then
        Inc(PS)
      else
        PD^ := PS^;
      Inc(I);
      Inc(PD);
      Inc(PS);
    end;
    SetLength(Result, I);
  end;
  function IsNumber(ch :Char): Boolean;
  begin
     Result := ch in ['0'..'9'];
  end;
  function GetNumber(var pch: PChar;var Len: Integer): Double;
    var FoundPeriod:Boolean;
        Count:Integer;
  begin
     FoundPeriod := False;
     Result := 0;
     While (pch^ <> #0) and
        (IsNumber(pch^) or ((not FoundPeriod) and (pch^ = '.'))) do
     begin
        if pch^ = '.' then
        begin
          FoundPeriod := True;
          Count := 0;
        end
        else
        begin
           if FoundPeriod then
           begin
             Inc(Count);
             Result := Result + (ord(pch^) -ord('0'))*Power(10, -Count);
           end
           else Result := Result * 10 + ord(pch^) -ord('0');
        end;
        inc(Len);
        Inc(pch);
     end;
  end;
begin
  if (Str1<>'') and (Str2<>'') then
  begin
    pStr1 := @Str1[1];
    pStr2 := @Str2[1];
    Result := 0;
    while not ((pStr1^ = #0) or (pStr2^ = #0)) do
    begin
       Len1:=0; Len2:=0;
       while (pStr1^=' ') do
       begin
          Inc(pStr1);
          Inc(Len1)
       end;
       while (pStr2^=' ') do
       begin
          Inc(pStr2);
          Inc(Len2)
       end;
       if IsNumber(pStr1^) and IsNumber(pStr2^) then
       begin
          Num1 := GetNumber(pStr1, Len1);
          Num2 := GetNumber(pStr2, Len2);
          if Num1 < Num2 then Result := -1
          else if Num1 > Num2 then Result := 1
          else begin
             if Len1 < Len2 then Result := -1
             else if Len1 > Len2 then Result := 1;
          end;
          Dec(pStr1);
          Dec(pStr2);
       end
       else
       begin
         // UTF8 chars
         {$IFDEF WINDOWS}
         if CodePage = 1252 then
         {$ELSE}
         {$IFDEF LINUX}
         if Length(CodePageString) > 1 then
           lang := CodePageString[1] + CodePageString[2];
         if (lang = 'pt') or (lang = 'es') or (lang = 'fr') or (lang = 'it')
           or (lang = 'gl' {galician}) or (lang = 'gn' {guarani})
           or (lang = 'ay' {aymará}) then
         {$ENDIF}
         {$ENDIF}
         begin
           UTF8Char1 := RemoveDiacritics(UTF8Copy(pStr1, 1, 1));
           UTF8Char2 := RemoveDiacritics(UTF8Copy(pStr2, 1, 1));
         end
         else
         begin
           UTF8Char1 := UTF8Copy(pStr1, 1, 1);
           UTF8Char2 := UTF8Copy(pStr2, 1, 1);
         end;
         if UTF8Char1 <> UTF8Char2 then
           if UTF8Char1 < UTF8Char2 then
             Result := -1
           else if UTF8Char1 > UTF8Char2 then
             Result := 1;
       end;
       if Result <> 0 then Break;
       Inc(pStr1);
       Inc(pStr2);
    end;
  end;
  Num1 := Length(Str1);
  Num2 := Length(Str2);
  if (Result = 0) and (Num1 <> Num2) then
  begin
       if Num1 < Num2 then Result := -1
       else Result := 1;
  end;
end; 
« Last Edit: April 27, 2015, 03:38:25 pm by typo »

typo

  • Hero Member
  • *****
  • Posts: 3051

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #9 on: May 04, 2015, 09:36:35 pm »
AFAIK, StrCmpLogicalW does the job on Windows and naturalstrcmp does it on Linux.

bylaardt

  • Sr. Member
  • ****
  • Posts: 309
Re: My NaturalSort unit
« Reply #10 on: May 10, 2015, 02:08:48 am »
Hi typo,

several years ago i made a similar function to convert a string to another, removint al numerics and dates and reinserting in the same position with a constant size (unsigned) to prevent when the numeric order are insigned on a string, like so:

batch 2 code 3      -> batch0000000000000002code0000000000000003
Batch 10 Code 15 -> batch0000000000000010code0000000000000015
Batch 14 code 2    -> batch0000000000000014code0000000000000002

it's a litle slower,  no prevent negative numbers, and dates are considered only in dd/mm/[yy]yy format.

if you needed, i can search this functions on my old burn cd box.
or you can just implement this resources in your function ( not so hard to do).

I made this function because of the batch numbers of pharma products have batch names with sequencial numbers or dates, without standard.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #11 on: May 10, 2015, 02:13:44 am »
Thanks, but I have discarded the native routines and used only the system/language-aware functions for that, StrCmpLogicalW in Windows and naturalstrcmp in Linux.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #12 on: May 25, 2015, 02:27:08 am »
Current state of this unit:

1) Topic enumerators sort till 99 subtopics:
     1
     1.1.1
     1.1.2
     1.2.1
     1.99.99
     2
2) Integers sort:
    0
    00
    000
    1
    2
    10
3) Floating point numbers sort:
    0,99
    1
    1,01
    1,99
    2
Collated alpha sort:
    Alpha sorting task made by OS (Windows and Linux).
Thousand separated numbers sort:
    1198
    1.199
    1199,50
    1200
    1.201

Thousand and decimal separators are the system default ones.

It is fast. See the time against other functions:
      StrCmpLogicalW:   1
    WideCompareText:   1,238
              NaturalSort:    0,746

http://sourceforge.net/projects/lazarusfiles/files/naturalsort.zip/download
« Last Edit: May 25, 2015, 03:39:23 am by typo »

JD

  • Hero Member
  • *****
  • Posts: 1848
Re: My NaturalSort unit
« Reply #13 on: May 25, 2015, 10:53:32 am »
Thanks a lot for your good work & for sharing it Typo. Can you please add sorting of IP addresses to it? I looked at the source & I can't seem to find it. I had posted such a function here in the forum http://forum.lazarus.freepascal.org/index.php/topic,23656.msg141508.html#msg141508

The function as I use it today in my programs is

Code: [Select]
    function CompareTextAsIPAddress(const s1, s2: string): integer;
        function CompareIPs(AString1, AString2: string): integer;
        // Original source: http:www.delphipages.com/forum/showthread.php?t=104191
        // Also on http://forum.lazarus.freepascal.org/index.php/topic,23656.msg141508.html#msg141508
        var
          a, b: TInAddr;
        begin
          //
          a := StrToHostAddr(PChar(AString1));
          b := StrToHostAddr(PChar(AString2));
          //
          if ntohl(a.S_addr) > ntohl(b.S_addr) then
            Result := -1
          else if ntohl(a.S_addr) < ntohl(b.S_addr) then
            Result := 1
          else
            Result := 0;
        end;

    begin
      Result := CompareIPs(s1, s2);
    end;

JD
« Last Edit: May 25, 2015, 10:56:56 am by JD »
Windows - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe),
Linux Mint - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe)

mORMot; Zeos 8; SQLite, PostgreSQL & MariaDB; VirtualTreeView

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: My NaturalSort unit
« Reply #14 on: May 25, 2015, 11:05:49 am »
Thanks a lot for your good work & for sharing it Typo. Can you please add sorting of IP addresses to it?
IP addresses are nothing more than numbers they should be sorted correctly.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

 

TinyPortal © 2005-2018