Author Topic: naturalstrcmp function  (Read 28917 times)

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #15 on: May 12, 2015, 10:43:22 am »
Have you tried loading the libMut.so dynamically?

This works for me.
Code: [Select]
// don't load it via external
// function naturalstrcmp(const s: PChar; const t: PChar): Integer; external 'Mut.so' name 'naturalstrcmp';

procedure UseDLL;
type
  TMyFunc=function (const s: PChar; const t: PChar): Integer; cdecl;
var
  MyLibC: TLibHandle = dynlibs.NilHandle;
  MyFunc: TMyFunc;
  FuncResult: integer;
begin
  MyLibC := LoadLibrary('libMut.' + SharedSuffix);
  if MyLibC = dynlibs.NilHandle then
  begin
    Showmessage('.so not loaded');
    Exit;  //DLL was not loaded successfully
  end;
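  MyFunc:= TMyFunc(GetProcedureAddress(MyLibC, 'naturalstrcmp'));
  if not Assigned(MyFunc) then
  begin
    Showmessage('naturalstrcmp not found in the library');
    FreeLibrary(MyLibC);
    Exit;  //Symbol was not found, so calling MyFunc would crash
  end;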

  FuncResult:= MyFunc(PChar('a2'), PChar('a03'));  //Executes the function
  Showmessage(IntToStr(FuncResult));

  FuncResult:= MyFunc(PChar('a03'), PChar('a2'));  //Executes the function
  Showmessage(IntToStr(FuncResult));

  if MyLibC <>  DynLibs.NilHandle then
    if FreeLibrary(MyLibC) then
      MyLibC:= DynLibs.NilHandle;  //Unload the lib, if already loaded
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  UseDLL;
end;

I've been reading this:
http://wiki.lazarus.freepascal.org/Lazarus/FPC_Libraries#Initialization

Maybe there is some problem loading this library on startup of a Lazarus program. Loading it dynamically solves that.

(learned a lot about Linux today :))

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #16 on: May 12, 2015, 11:29:12 am »
This worked for me, thanks.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #17 on: May 12, 2015, 03:16:11 pm »
I am disappointed with this function: it neither sorts naturally nor sorts properly in local alphabetical order.

Am I doing something wrong?

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #18 on: May 12, 2015, 03:45:20 pm »
According to this:
http://www.unix.com/man-page/debian/3/NATURALSTRCMP/
Quote
naturalstrcmp is an alphanumerical comparison function that ensures x12 > x2 for example. First, the alphabetical part of the string is compared, using strcmp(3), then, if it has trailing numbers, they are compared using a numerical function.

It should give the same result for x3 > x2 as for x12 > x2.
But with this the result is different:
Code: [Select]
// Result x12 > x2 = -1 (false)
FuncResult:= MyFunc(PChar('x12'), PChar('x2'));  //Executes the function
Showmessage(IntToStr(FuncResult));

// Result x3 > x2 = 1 (true)
FuncResult:= MyFunc(PChar('x3'), PChar('x2'));  //Executes the function
Showmessage(IntToStr(FuncResult));         

Strange indeed.
With the normal strcmp, x12 > x2 should give -1, but naturalstrcmp gives the same result.

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #19 on: May 12, 2015, 04:23:56 pm »
@typo... What are you planning on using this function for??

Does the strverscmp-function in libc.so not work for you?
(It's the same function used internally for versionsort, which in turn is used to sort directory entries)
http://man7.org/linux/man-pages/man3/strverscmp.3.html

Code: [Select]
function strverscmp(__s1:Pchar; __s2:Pchar):longint;cdecl;external 'libc.so' name 'strverscmp';

procedure TForm1.Button1Click(Sender: TObject);
var
  Rs: Integer;
begin

  // Result = 1
  Rs := strverscmp(pChar('x12'), pChar('x2'));
  Showmessage(IntToStr(Rs));

  // Result = 1
  Rs := strverscmp(pChar('x3'), pChar('x2'));
  Showmessage(IntToStr(Rs));

  // Result = -1
  Rs := strverscmp(pChar('a12'), pChar('b2'));
  Showmessage(IntToStr(Rs));

  // Result = 1
  Rs := strverscmp(pChar('c3'), pChar('b2'));
  Showmessage(IntToStr(Rs));

end;

For me, this gives the correct results.
(But maybe you have some special strings which strverscmp can't handle ??)

At least with strverscmp you won't need the alliance library :)
« Last Edit: May 12, 2015, 04:26:47 pm by rvk »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #20 on: May 12, 2015, 04:49:14 pm »
In my language, Portuguese, it sorts in a way that sends all accented chars to the end of the list, which is alphabetically incorrect.
« Last Edit: May 12, 2015, 04:55:04 pm by typo »
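
Accented characters landing at the end of the list is what a byte-wise comparison would produce on UTF-8 text, and the strverscmp man page describes its non-digit handling as strcmp-like. A minimal sketch of that effect, assuming UTF-8 strings and using SysUtils.CompareStr as a stand-in for the byte-wise comparison (the program name and sample words are only illustrative):

```pascal
program ByteOrderDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // 'é' written out as its two UTF-8 bytes, so the source file encoding does not matter.
  EAcute = #$C3#$A9;

begin
  // $C3 is greater than the byte of any ASCII letter ('z' is $7A),
  // so a byte-wise comparison places "égua" after "zebra".
  WriteLn(CompareStr(EAcute + 'gua', 'zebra') > 0);  // prints TRUE
end.
```

A locale-aware collation function is needed to bring such words back next to their unaccented neighbours.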

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #21 on: May 12, 2015, 05:01:51 pm »
You could try to translate the function to plain Pascal. In that case you can adjust it to your liking.

Or was there a special reason you wanted it from a (standard) library?

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #22 on: May 12, 2015, 05:13:51 pm »
I am trying to improve my unit NaturalSort.

I will try to translate this function, which is a problem by itself, indeed.

I would like to find a system/language-aware natural compare function in Linux, with the same result (or better, since it has at least one bug) as StrCmpLogicalW in Windows.
« Last Edit: May 12, 2015, 05:18:25 pm by typo »

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #23 on: May 12, 2015, 05:28:10 pm »
Well, there are already lots of ready-made solutions for Pascal. Even in the topic from last year (in which I thought you already had a Pascal solution).
(Edit: Ah, you already mentioned that)

Another one I found (for plain ascii) which you could change:
Code: [Select]
uses Math;

function CompareStr(Str1, Str2: string): integer;
var
  Num1, Num2: double;
  pStr1, pStr2: PChar;
  Len1, Len2: integer;

  function IsNumber(ch: char): boolean;
  begin
    Result := ch in ['0'..'9'];
  end;

  function GetNumber(var pch: PChar; var Len: integer): double;
  var
    FoundPeriod: boolean;
    Count: integer;
  begin
    FoundPeriod := False;
    Result := 0;
    while (pch^ <> #0) and (IsNumber(pch^) or
        ((not FoundPeriod) and (pch^ = '.'))) do
    begin
      if pch^ = '.' then
      begin
        FoundPeriod := True;
        Count := 0;
      end
      else
      begin
        if FoundPeriod then
        begin
          Inc(Count);
          Result := Result + (Ord(pch^) - Ord('0')) * Power(10, -Count);
        end
        else
          Result := Result * 10 + Ord(pch^) - Ord('0');
      end;
      Inc(Len);
      Inc(pch);
    end;
  end;

begin
  Result := 0;  // set before the test so Result is also defined for empty strings
  if (Str1 <> '') and (Str2 <> '') then
  begin
    pStr1 := @Str1[1];
    pStr2 := @Str2[1];
    while not ((pStr1^ = #0) or (pStr2^ = #0)) do
    begin
      Len1 := 0;
      Len2 := 0;
      while (pStr1^ = ' ') do
      begin
        Inc(pStr1);
        Inc(Len1);
      end;
      while (pStr2^ = ' ') do
      begin
        Inc(pStr2);
        Inc(Len2);
      end;
      if IsNumber(pStr1^) and IsNumber(pStr2^) then
      begin
        Num1 := GetNumber(pStr1, Len1);
        Num2 := GetNumber(pStr2, Len2);
        if Num1 < Num2 then
          Result := -1
        else if Num1 > Num2 then
          Result := 1
        else
        begin
          if Len1 < Len2 then
            Result := -1
          else if Len1 > Len2 then
            Result := 1;
        end;
        Dec(pStr1);
        Dec(pStr2);
      end
      else if pStr1^ <> pStr2^ then
      begin
        if pStr1^ < pStr2^ then
          Result := -1
        else
          Result := 1;
      end;
      if Result <> 0 then
        Break;
      Inc(pStr1);
      Inc(pStr2);
    end;
  end;
  Num1 := length(Str1);
  Num2 := length(Str2);
  if (Result = 0) and (Num1 <> Num2) then
  begin
    if Num1 < Num2 then
      Result := -1
    else
      Result := 1;
  end;
end;
(from here)

Quote
I would like to find a system/language-aware natural compare function in Linux, with the same result (or better, since it has at least one bug) as StrCmpLogicalW in Windows.

In that case you could also make a hybrid function: one that splits the string into parts (alpha and numeric) and uses the system functions (StrCmpLogicalW on Windows and strcoll on Linux).
« Last Edit: May 12, 2015, 05:32:31 pm by rvk »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #24 on: May 12, 2015, 05:59:33 pm »
Yes, a hybrid function, this is what I want to write, with numerical parts being handled by my function and alphabetical parts being handled by a language-aware function.
« Last Edit: May 12, 2015, 06:05:29 pm by typo »

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #25 on: May 12, 2015, 07:14:32 pm »
Was the bug in StrCmpLogicalW only for the combination alpha/numeric or is the bug also present with just sorting letters?

And I'm not sure if CompareString-api (Windows) and strcoll (Linux) handle the ordering the same. (we should test that before using that api) Otherwise the function should just sort in pascal after querying the Locale (which is probably not easy to do).

What do you consider the right order (1 or 2)?
1) e1 é1 e2 é2
2) e1 e2 é1 é2

(I think CompareString sees 1 as correct, which would be according to linguistic rules, so é and e are equal)

Also... I saw in lazutf8.pas a function UTF8CompareStrCollated. It calls AnsiCompareStr for Windows (which should/could be locale-aware) but calls WideCompareStr for Linux (which I don't think is locale-aware). So in Linux it does not do as advertised.

Edit: Oo, wait
" result:=wcscoll(pwchar_t(hs1),pwchar_t(hs2));   "
It is locale aware in Linux (see the wcscoll, unicode aware locale-sort)

So you can just use UTF8CompareStrCollated for the text-part and you're good to go in Linux and Windows :) :)
(You just need to write the splitter and comparer)
« Last Edit: May 12, 2015, 07:25:44 pm by rvk »
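
The "we should test that" step for Linux could be sketched as a small probe that prints what strcoll returns for each pair from the e1/é1/e2/é2 question. This is only a sketch under assumptions: it links against the C library as 'c' (the post above used 'libc.so'), it takes LC_ALL = 6 from glibc's locale.h, and it assumes the environment locale is a UTF-8 one. No expected output is given because the result depends on that locale.

```pascal
program CollateProbe;
{$mode objfpc}{$H+}

const
  // Assumption: value of LC_ALL in glibc's <locale.h>; other C libraries may differ.
  LC_ALL = 6;
  // 'é' written out as its two UTF-8 bytes, so the source file encoding does not matter.
  EAcute = #$C3#$A9;

function setlocale(category: longint; locale: PChar): PChar; cdecl; external 'c' name 'setlocale';
function strcoll(s1, s2: PChar): longint; cdecl; external 'c' name 'strcoll';

var
  W: array[0..3] of string;
  I, J: Integer;
begin
  // An empty locale name asks the C library to use the environment (LANG, LC_*).
  if setlocale(LC_ALL, '') = nil then
    WriteLn('could not apply the environment locale');

  W[0] := 'e1';
  W[1] := EAcute + '1';
  W[2] := 'e2';
  W[3] := EAcute + '2';

  // Print the sign strcoll reports for every pair; <0 means the left string sorts first.
  for I := 0 to 3 do
    for J := I + 1 to 3 do
      WriteLn(W[I], ' vs ', W[J], ': ', strcoll(PChar(W[I]), PChar(W[J])));
end.
```

Running the same pairs through CompareString on Windows would show whether the two platforms agree on order 1 or order 2.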

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #26 on: May 12, 2015, 07:26:56 pm »
I consider the first sequence (1) the correct one.

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #27 on: May 12, 2015, 07:28:57 pm »
Quote
I consider the first sequence (1) the correct one.

In that case you need to check whether UTF8CompareStrCollated in lazutf8.pas works correctly on both platforms. (It does on Windows, just tested it there.) You can use that function in your overall function. Then there is no need to do API calls yourself, because UTF8CompareStrCollated does it for you. You just need to split the alpha/numeric parts and start comparing from the start.
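
The "split the alpha/numeric parts and compare" idea can be sketched as follows. This is not the NaturalSort unit or any library's implementation: HybridCompare, NextChunk, CompareDigits, TTextCompare and ByteCompare are hypothetical names invented for the illustration. The text comparer is passed in as a callback, and the demo uses a byte-wise stand-in so that it compiles without the LCL. In a Lazarus program a small wrapper around UTF8CompareStrCollated could be passed instead.

```pascal
program HybridNaturalCompare;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical callback type: any text comparer returning <0, 0 or >0.
  TTextCompare = function(const A, B: string): Integer;

// Returns the run starting at Idx that is all digits (WantDigits = True)
// or all non-digits, and advances Idx past it.
function NextChunk(const S: string; var Idx: Integer; WantDigits: Boolean): string;
var
  Start: Integer;
begin
  Start := Idx;
  while (Idx <= Length(S)) and ((S[Idx] in ['0'..'9']) = WantDigits) do
    Inc(Idx);
  Result := Copy(S, Start, Idx - Start);
end;

// Compares two digit runs by value without converting them to a number,
// so very long digit runs cannot overflow. Leading zeros are ignored.
function CompareDigits(const A, B: string): Integer;
var
  IA, IB: Integer;
  NA, NB: string;
begin
  IA := 1;
  while (IA < Length(A)) and (A[IA] = '0') do Inc(IA);
  IB := 1;
  while (IB < Length(B)) and (B[IB] = '0') do Inc(IB);
  NA := Copy(A, IA, MaxInt);
  NB := Copy(B, IB, MaxInt);
  if Length(NA) <> Length(NB) then
    Result := Length(NA) - Length(NB)   // more digits means a larger number
  else
    Result := CompareStr(NA, NB);       // same length: digit order is byte order
end;

function HybridCompare(const S1, S2: string; TextCompare: TTextCompare): Integer;
var
  P1, P2: Integer;
  C1, C2: string;
  Digits1, Digits2: Boolean;
begin
  P1 := 1;
  P2 := 1;
  Result := 0;
  while (Result = 0) and (P1 <= Length(S1)) and (P2 <= Length(S2)) do
  begin
    Digits1 := S1[P1] in ['0'..'9'];
    Digits2 := S2[P2] in ['0'..'9'];
    C1 := NextChunk(S1, P1, Digits1);
    C2 := NextChunk(S2, P2, Digits2);
    if Digits1 and Digits2 then
      Result := CompareDigits(C1, C2)
    else
      Result := TextCompare(C1, C2);
  end;
  // All shared chunks compared equal: the string with text left over sorts last.
  if Result = 0 then
    Result := (Length(S1) - P1) - (Length(S2) - P2);
end;

// Stand-in comparer for the demo: plain byte-wise comparison from SysUtils.
function ByteCompare(const A, B: string): Integer;
begin
  Result := CompareStr(A, B);
end;

begin
  WriteLn(HybridCompare('x12', 'x2', @ByteCompare) > 0);  // prints TRUE
  WriteLn(HybridCompare('a2', 'a03', @ByteCompare) < 0);  // prints TRUE
  WriteLn(HybridCompare('a12', 'b2', @ByteCompare) < 0);  // prints TRUE
end.
```

One design caveat: because each text chunk is decided before the following number is looked at, a comparer that ranks é strictly after e yields order 2 (e1 e2 é1 é2). Getting order 1 would require treating accent differences as a tie-breaker that is applied only after all chunks have compared equal.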

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: naturalstrcmp function
« Reply #28 on: May 12, 2015, 08:40:45 pm »
Unfortunately, UTF8CompareStrCollated uses

Code: [Select]
AnsiCompareStr(UTF8ToSys(S1), UTF8ToSys(S2));

in order to make the collation and it results in a random order for accented chars in my language.
« Last Edit: May 12, 2015, 08:42:18 pm by typo »

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: naturalstrcmp function
« Reply #29 on: May 12, 2015, 09:02:26 pm »
Could you give an example list of some strings (with accented characters) and how you want to order them (for testing)?