Recent

Author Topic: Contest: fastest IsAnagram function  (Read 17966 times)

delphius

  • Jr. Member
  • **
  • Posts: 83
Re: Contest: fastest IsAnagram function
« Reply #105 on: October 25, 2024, 08:26:09 pm »
@delphius: Sure, but test which one is faster. Because I did and oddly it wasnt.
In my case it's faster about 100ms  ::)

Code: Pascal  [Select][+][-]
  1. function IsAnagram_delphius(const S1, S2: string; IgnoreSpaces: Boolean = True; ExceptionOnError: Boolean = False): Boolean;
  2. const
  3.   zeroFreq: array[32..128] of Byte = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  4.                                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  5.                                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  6.                                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  7.                                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  8.                                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
  9. var
  10.   s1c, s1e, s2c, s2e: PChar;
  11.   freq: array[32..128] of Byte;
  12.   ch: Byte;
  13. begin
  14.   if not IgnoreSpaces and (Length(S1) <> Length(S2)) then
  15.     Exit(False);
  16.  
  17.   s1c := PChar(S1);
  18.   s1e := s1c + Length(S1);
  19.   s2c := PChar(S2);
  20.   s2e := s2c + Length(S2);
  21.  
  22.   FillQWord(freq, 12, 0);
  23.  
  24.   while s1c < s1e do
  25.   begin
  26.     ch := Ord(s1c^);
  27.     case ch of
  28.       32: if not IgnoreSpaces then Inc(freq[ch]);
  29.       65..90: Inc(freq[ch or $20]);
  30.       33..64, 91..127: Inc(freq[ch]);
  31.       else
  32.         if ExceptionOnError then
  33.           raise ERangeError.CreateFmt('Illegal character in S1 at position %d', [s1c - PChar(S1) + 1])
  34.             else Exit(False);
  35.     end;
  36.     Inc(s1c);
  37.   end;
  38.  
  39.   while s2c < s2e do
  40.   begin
  41.     ch := Ord(s2c^);
  42.     case ch of
  43.       32: if not IgnoreSpaces then Dec(freq[ch]);
  44.       65..90: Dec(freq[ch or $20]);
  45.       33..64, 91..127: Dec(freq[ch]);
  46.       else
  47.         if ExceptionOnError then
  48.           raise ERangeError.CreateFmt('Illegal character in S2 at position %d', [s2c - PChar(S2) + 1])
  49.             else Exit(False);
  50.     end;
  51.     Inc(s2c);
  52.   end;
  53.  
  54.   Result := CompareDWord(freq[32], zeroFreq[32], 24) = 0;
  55. end;
  56.  

Code: Pascal  [Select][+][-]
  1. *** ROUND 1 ***
  2. s1 = s tate
  3. s2 = t a  s t  e
  4.  
  5.             delphius IgnoreSpaces ExceptionOnError |  1531 ms | result 25000000
  6.                              delphius IgnoreSpaces |  1578 ms | result 25000000
  7.                                           delphius |   125 ms | result 0
  8.           Fibonacci2 IgnoreSpaces ExceptionOnError |  2235 ms | result 25000000
  9.                            Fibonacci2 IgnoreSpaces |  2250 ms | result 25000000
  10.                                         Fibonacci2 |   125 ms | result 0
  11.  
« Last Edit: October 25, 2024, 08:48:24 pm by delphius »
fpmtls - ssl/tls 1.3 implementation in pure pascal
fpmailsend - sending a simple email message
pascal-webui - use web browser as gui and fpc as backend

paweld

  • Hero Member
  • *****
  • Posts: 1560
Re: Contest: fastest IsAnagram function
« Reply #106 on: October 25, 2024, 09:13:10 pm »
Sorry: fails validitytest.
Code: [Select]
TestValidity for Paweld
FAIL to detect invalid character #0
Validitycheck FAIL for Paweld
Fixed:
Code: Pascal  [Select][+][-]
  1.     function IsAnagram_paweld(const S1, S2: String; IgnoreSpaces: Boolean = True; ExceptionOnError: Boolean = False): Boolean;
  2.     var
  3.       i: Integer;
  4.       carr: array [32..127] of Byte;
  5.       b: Byte;
  6.     begin
  7.       Result := False;
  8.       FillChar(carr, SizeOf(carr), 0);
  9.      
  10.       for i := 1 to Length(S1) do
  11.       begin
  12.         b := ord(S1[i]);
  13.         case b of
  14.           32..64, 91..127: Inc(carr[b]);
  15.           65..90: Inc(carr[b + 32]);
  16.           else
  17.             if ExceptionOnError then
  18.               raise Exception.CreateFmt('IsAnagram: illegal character in S1 at position %d',[i])  
  19.  
  20.             else
  21.               exit;
  22.         end;
  23.       end;
  24.      
  25.       for i := 1 to Length(S2) do
  26.       begin
  27.         b := ord(S2[i]);
  28.         case b of
  29.           32..64, 91..127: Dec(carr[b]);
  30.           65..90: Dec(carr[b + 32]);
  31.           else
  32.             if ExceptionOnError then
  33.               raise Exception.CreateFmt('IsAnagram: illegal character in S1 at position %d',[i])  
  34.  
  35.            else
  36.               exit;
  37.         end;
  38.       end;
  39.      
  40.       if IgnoreSpaces then
  41.         carr[32] := 0;
  42.      
  43.       for i := Low(carr) to High(carr) do
  44.       begin
  45.         if carr[i] <> 0 then
  46.           exit;
  47.       end;
  48.      
  49.       Result := True;
  50.     end;  
  51.  
Best regards / Pozdrawiam
paweld

Warfley

  • Hero Member
  • *****
  • Posts: 2037
Re: Contest: fastest IsAnagram function
« Reply #107 on: October 25, 2024, 09:33:53 pm »
Slightly adjusted version:
Code: Pascal  [Select][+][-]
  1. function IsAnagram_warf(const S1, S2: String; IgnoreSpaces: Boolean = True; ExceptionOnError: Boolean = False): Boolean;
  2. var
  3.   s1c, s1e, s2c, s2e: PChar;
  4.   ctr : array[31..127] of Word;
  5.   pcmp:PSizeInt;
  6. begin
  7.   s1c:=pchar(s1);
  8.   s1e:=s1c+length(s1);
  9.   s2c:=pchar(s2);
  10.   s2e:=s2c+length(s2);
  11.   pcmp:=@ctr;
  12.   while pcmp<@ctr+sizeof(ctr) do
  13.   begin
  14.     pcmp^:=0;
  15.     inc(pcmp);
  16.   end;
  17.   repeat
  18.     if ignorespaces then
  19.     begin
  20.       while (s1c<s1e) and (s1c^=#32) do
  21.         inc(s1c);
  22.       while (s2c<s2e) and (s2c^=#32) do
  23.         inc(s2c);
  24.     end;
  25.     if (s1c=s1e) or (s2c=s2e) then
  26.       break;
  27.     if not (s1c^ in [#32..#127]) then
  28.       if ExceptionOnError then
  29.         raise ERangeError.CreateFmt('IsAnagram: illegal character in S1 %c', [s1c^])
  30.       else Exit(False);
  31.     if not (s2c^ in [#32..#127]) then
  32.       if ExceptionOnError then
  33.         raise ERangeError.CreateFmt('IsAnagram: illegal character in S2 %c', [s2c^])
  34.       else Exit(False);
  35.     inc(ctr[Ord(s1c^) or $20*ord(s1c^ in [#65..#90])]);
  36.     dec(ctr[Ord(s2c^) or $20*ord(s2c^ in [#65..#90])]);
  37.     inc(s1c);
  38.     inc(s2c);
  39.   until false;
  40.   pcmp:=@ctr;
  41.   result:=(s1c=s1e) and (s2c=s2e);
  42.   while result and (pcmp<@ctr+sizeof(ctr)) do
  43.   begin
  44.     result:=pcmp^=0;
  45.     inc(pcmp);
  46.   end;
  47. end;
« Last Edit: October 25, 2024, 10:10:21 pm by Warfley »

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #108 on: October 25, 2024, 10:05:03 pm »
Slightly adjusted version:

Still fails:
Code: [Select]
TestValidity for Warfly
FAIL to detect invalid character #0
Validitycheck FAIL for Warfly

Bart

Warfley

  • Hero Member
  • *****
  • Posts: 2037
Re: Contest: fastest IsAnagram function
« Reply #109 on: October 25, 2024, 10:10:34 pm »
Fixed it, but have you posted the tests somehwere? Because this back and forth is annoying. I'd rather just plug the tests myself.

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #110 on: October 25, 2024, 10:11:17 pm »
Sorry: fails validitytest.
Code: [Select]
TestValidity for Paweld
FAIL to detect invalid character #0
Validitycheck FAIL for Paweld
Fixed:

Indeed.

Current standings:
ASerge, Paweld and BrunoK about the same speed (faster than my original one by appr a factor 1.5.
Note that rankings betweem then vary upon run.
So, basically too close to call.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #111 on: October 25, 2024, 10:15:22 pm »
Fixed it, but have you posted the tests somehwere? Because this back and forth is annoying. I'd rather just plug the tests myself.

Post 97 (https://forum.lazarus.freepascal.org/index.php/topic,69025.msg535203.html#msg535203)

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #112 on: October 25, 2024, 10:23:21 pm »
- There ought to be a compiler directive for alignment of procedures => set it to at least 32.
Happen to know which directive?

- Also, warm up each test once. So previous cache loads are less significant.

As in
-  run each one, then run speedtest on each one
or
- run one, then speedtest one, repat for all tests?

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #113 on: October 25, 2024, 10:27:10 pm »
Is it possible not to fill the second array with zeros, but just use const?
Code: Pascal  [Select][+][-]
  1. const
  2.   zeroFreq: array[0..255] of Byte = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  3.   <snip>
  4.  

Yes, but it makes no speed difference.
I have tested this.
Using the Default intrinsic is just as fast (and faster than FillChar in this particular case).
(I reported this before)

Bart

BrunoK

  • Hero Member
  • *****
  • Posts: 762
  • Retired programmer
Re: Contest: fastest IsAnagram function
« Reply #114 on: October 25, 2024, 10:29:23 pm »
My benchmark with routines that do pass without execution errors.

Some have been slightly changed to evaluate what factors may influence speed on my system. Win 10 usin QueryPerformance. Using GetTickCount makes thing a bit too dependent on eventual CPU context switches (ticks every 15.6 ms on my system).

Very good mention to entrant Delphius on my system. Code is simple understandable and fast.

Hope Delphius wont be upset with some trials I made with his code.

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #115 on: October 25, 2024, 10:30:27 pm »
Something like this...
Code: Pascal  [Select][+][-]
  1. function IsAnagram_delphius(const S1, S2: string; IgnoreSpaces: Boolean = True; ExceptionOnError: Boolean = False): Boolean;
  2.  

Runs OK.
It's in the same realm as ASerge, BrunoK and Paweld.

Bart

Warfley

  • Hero Member
  • *****
  • Posts: 2037
Re: Contest: fastest IsAnagram function
« Reply #116 on: October 25, 2024, 10:44:57 pm »
Post 97 (https://forum.lazarus.freepascal.org/index.php/topic,69025.msg535203.html#msg535203)

Bart
Ok, my code now passes (edited above), last question, do you test on small or big anagrams? If it's less than 255 chars I can do an obvious optimization lol

Fibonacci

  • Hero Member
  • *****
  • Posts: 788
  • Internal Error Hunter
Re: Contest: fastest IsAnagram function
« Reply #117 on: October 26, 2024, 12:22:49 am »
Code: Text  [Select][+][-]
  1. IsAnagram_Fibonacci4 valid?
  2. TestValidity for Fibonacci4
  3. TRUE, TFuncRec.IsValid = TRUE

Code: Pascal  [Select][+][-]
  1. function IsAnagram_Fibonacci4(const s1, s2: string; IgnoreSpaces: boolean=true; ExceptionOnError: boolean=false): boolean;
  2. var
  3.   c, x1, x2: byte;
  4.   p1, p1e, p2, p2e: pchar;
  5. begin
  6.   if not IgnoreSpaces and (length(s1) <> length(s2)) then exit(false);
  7.  
  8.   p1 := @s1[1];
  9.   p1e := p1+length(s1);
  10.   x1 := 0;
  11.   while p1 < p1e do begin
  12.     c := pbyte(p1)^;
  13.     case c of
  14.       32: if not IgnoreSpaces then x1 := x1 xor c;
  15.       65..90: x1 := x1 xor (c or $20);
  16.       33..64, 91..127: x1 := x1 xor c;
  17.       else
  18.         if ExceptionOnError then
  19.         raise ERangeError.Create('Illegal character in S1')
  20.         else exit(false);
  21.     end;
  22.     inc(p1);
  23.   end;
  24.  
  25.   p2 := @s2[1];
  26.   p2e := p2+length(s2);
  27.   x2 := 0;
  28.   while p2 < p2e do begin
  29.     c := pbyte(p2)^;
  30.     case c of
  31.       32: if not IgnoreSpaces then x2 := x2 xor c;
  32.       65..90: x1 := x1 xor (c or $20);
  33.       33..64, 91..127: x2 := x2 xor c;
  34.       else
  35.         if ExceptionOnError then
  36.         raise ERangeError.Create('Illegal character in S2')
  37.         else exit(false);
  38.     end;  
  39.     inc(p2);
  40.   end;
  41.  
  42.   result := x1 = x2;
  43. end;

Code: Text  [Select][+][-]
  1. *** ROUND 1 ***
  2. s1 = s tate
  3. s2 = t a  s t  e
  4.  
  5.             delphius IgnoreSpaces ExceptionOnError |  1047 ms | result 25000000
  6.                              delphius IgnoreSpaces |  1047 ms | result 25000000
  7.                                           delphius |    79 ms | result 0
  8.            
  9.          Fibonacci-4 IgnoreSpaces ExceptionOnError |   766 ms | result 25000000
  10.                           Fibonacci-4 IgnoreSpaces |   921 ms | result 25000000
  11.                                        Fibonacci-4 |    78 ms | result 0

Code: Pascal  [Select][+][-]
  1.   writeln('valid             = ', IsAnagram_Fibonacci4('nightio', 'thingio'));
  2.   writeln('valid (state)     = ', IsAnagram_Fibonacci4('St a    te   ', 'tas t e'));
  3.   writeln('valid w/ spaces   = ', IsAnagram_Fibonacci4('n i g h t i o', 'thing io'));
  4.   writeln('valid w/ var case = ', IsAnagram_Fibonacci4('nigHTIO', 'ThingiO'));
  5.   writeln('invalid           = ', IsAnagram_Fibonacci4('nightio', 'thingyo'));
  6.   writeln('invalid           = ', IsAnagram_Fibonacci4(#1'nightio', #1'thingio'));

Code: Text  [Select][+][-]
  1. valid             = TRUE
  2. valid (state)     = TRUE
  3. valid w/ spaces   = TRUE
  4. valid w/ var case = TRUE
  5. invalid           = FALSE
  6. invalid           = FALSE

Yes, collision possible :D Find one.
« Last Edit: October 26, 2024, 12:27:59 am by Fibonacci »

bytebites

  • Hero Member
  • *****
  • Posts: 776
Re: Contest: fastest IsAnagram function
« Reply #118 on: October 26, 2024, 12:30:50 am »
From Wikipedia
Quote
Eleven plus two — Twelve plus one
Clint Eastwood — Old West action

Warfley

  • Hero Member
  • *****
  • Posts: 2037
Re: Contest: fastest IsAnagram function
« Reply #119 on: October 26, 2024, 12:31:25 am »
Yes, collision possible :D Find one.
Easy:
Code: Pascal  [Select][+][-]
  1. var
  2.   s1, s2: String;
  3. begin
  4.   s1 := #42#64;
  5.   s2 := #106; // 42 Or 64
  6.   WriteLn(IsAnagram_Fibonacci4(s1, s2)); // TRUE
  7. end.

You could instead of byte use a bigger type like sizeint and use a (non cryptographic) hash to reduce the collisions. Then you basically implemented a bloom filter
« Last Edit: October 26, 2024, 12:33:17 am by Warfley »

 

TinyPortal © 2005-2018