Recent

Author Topic: Contest: fastest IsAnagram function  (Read 17965 times)

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #135 on: October 26, 2024, 10:16:25 pm »
I am replacing the previous submissions with the new one.

The one in the zip looks like Martin's implementation?

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #136 on: October 26, 2024, 10:20:16 pm »
Here is another version:

Code: Pascal  [Select][+][-]
  1. function IsAnagram_benibela(const S1, S2: String; IgnoreSpaces: Boolean = True; ExceptionOnError: Boolean = False): Boolean;
  2.  

Fastest one so far in my benchmark (dir not yet test avk's new one).

Bart
« Last Edit: October 26, 2024, 10:22:01 pm by Bart »

Josh

  • Hero Member
  • *****
  • Posts: 1445
Re: Contest: fastest IsAnagram function
« Reply #137 on: October 26, 2024, 11:40:15 pm »
saw interesting topic, so heres my code.

I could not find test data, so added some in Anagrams Array.

I have not looked at whole thread...

No idea how fast my routine is on others

Not sure whats doing with Character case, so added option to IgnoreCase.

Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   {$IFDEF UNIX}
  7.   cthreads,
  8.   {$ENDIF}
  9.   SysUtils
  10.   { you can add units after this };
  11.  
  12. Type
  13.   TByArr=Array[0..255] of Byte;
  14. Const
  15.   Iterations=500000;
  16.  
  17. Var
  18.   defShortString:ShortString;
  19.   AnagramsA:Array[0..9] of ShortString=('big bend national park',
  20.                                         'carry on laughing',
  21.                                         'george clooney',
  22.                                         'george and mildred',
  23.                                         'the addams family',
  24.                                         'miss neve campbell',
  25.                                         'the comic strip presents',
  26.                                         'kansas city',
  27.                                         'david coleman',
  28.                                         'peter osullevan');
  29.   AnagramsB:Array[0..9] of ShortString=('abandoning bleak trip',
  30.                                          'Ouch An Angry Girl',
  31.                                          'cool energy ego',
  32.                                          'old deranged grime',
  33.                                          'my dismal fathead',
  34.                                          'even climbs maples',
  35.                                          'Cheers Impotent Scripts',
  36.                                          'satanic sky',
  37.                                          'acid lad venom',
  38.                                          'pale volunters');  // this is false spelling to generate a false
  39.   st:qword;
  40.   IterationCounter,AnagramsLoop:longint;
  41.  
  42. function IsAnAnagramJosh(const s1,s2:shortstring;IgnoreSpaces: Boolean = True; IgnoreCase: Boolean = True;ExceptionOnError: Boolean = False):boolean;
  43. var
  44.   S1b:TByArr absolute s1;
  45.   s2b:TByArr absolute s2;
  46.   r1,r2:shortstring;
  47.   r1b:TByArr absolute r1;
  48.   r2b:TByArr absolute r2;
  49.   k:integer;
  50.   errstr:shortstring;
  51.  
  52. procedure InsertIntoArray(var arr:TByArr;va:byte);Inline;
  53. var
  54.   Loop:integer;
  55.   insloc:integer=1;
  56. begin
  57.   if ((IgnoreSpaces) and (Va=$20)) then exit;
  58.   if (((va<$20) or (va>$7f)) and (ExceptionOnError)) then Raise Exception.Create('Invalid character in : '+errstr+' @ loc :'+inttostr(k));
  59.   if IgnoreCase then va:=va or $20; // upcase byte;
  60.   if arr[0]=0 then
  61.   begin
  62.     arr[0]:=1;
  63.     arr[1]:=va;
  64.   end
  65.   else
  66.   begin
  67.     arr[0]:=arr[0]+1;
  68.     insloc:=arr[0];
  69.     for Loop:=1 to arr[0]-1 do
  70.       if va<=arr[Loop] then
  71.       begin
  72.         insloc:=Loop;
  73.         break;
  74.       end;
  75.     Move(arr[insloc], arr[insloc+1], SizeOf(TByArr)-insloc-1);
  76.     arr[insloc]:=va;
  77.   end;
  78. end;
  79.  
  80. begin
  81.  r1:=defShortString;
  82.  r2:=defShortString;
  83.  r1:='';r2:='';
  84.  errstr:=s1;
  85.  for k:=1 to s1b[0] do InsertIntoArray(r1b,s1b[k]);
  86.  errstr:=s2;
  87.  for k:=1 to s2b[0] do InsertIntoArray(r2b,s2b[k]);
  88.  result:=r1=r2;
  89. end;
  90.  
  91. begin
  92.   fillchar(defShortString,255,#255);
  93.   writeln('RETURN to START');
  94.   ReadLn;
  95.   st:=gettickcount64;
  96.   for AnagramsLoop:= Low(AnagramsA) to High(AnagramsA) do
  97.     for IterationCounter:=0 to Iterations do IsAnAnagramJosh(AnagramsA[AnagramsLoop],AnagramsB[AnagramsLoop]);
  98.   writeln('Finished '+inttostr(Iterations*(high(AnagramsA)+1))+' Done in - '+inttostr(gettickcount64-st));
  99.   ReadLn;
  100. end.
  101.  
« Last Edit: October 27, 2024, 12:03:59 am by Josh »
The best way to get accurate information on the forum is to post something wrong and wait for corrections.

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #138 on: October 27, 2024, 12:22:16 am »
saw interesting topic, so heres my code.

You cheated by using shortstrings, I corrected that.
Also the IsAnagram function is supposed to be always case insensitive, corrected that also.

Alas, you code fails the validity test:
Code: [Select]
TestValidity for Josh
FAIL: valid anagram rejected (with IgnoreSpaces=TRUE):
S1: "1234567890"
S2: "0 1 2 3 4 5 6 7 8 9 "
Validitycheck FAIL for Josh

Bart

Josh

  • Hero Member
  • *****
  • Posts: 1445
Re: Contest: fastest IsAnagram function
« Reply #139 on: October 27, 2024, 12:45:33 am »
I think you must have messed something up,
it passes test

never seen an anagram greater than 240 chars...

if u remove the if IgnoreCase it defaults to ignoring case

Code: Pascal  [Select][+][-]
  1. procedure InsertIntoArray(var arr:TByArr;va:byte);Inline;
  2. var
  3.   Loop:integer;
  4.   insloc:integer=1;
  5. begin
  6.   if ((IgnoreSpaces) and (Va=$20)) then exit;
  7.   if (((va<$20) or (va>$7f)) and (ExceptionOnError)) then Raise Exception.Create('Invalid character in : '+errstr+' @ loc :'+inttostr(k));
  8.   // if IgnoreCase then
  9.   va:=va or $20; // upcase byte;
« Last Edit: October 27, 2024, 12:58:50 am by Josh »
The best way to get accurate information on the forum is to post something wrong and wait for corrections.

TRon

  • Hero Member
  • *****
  • Posts: 4377
Today is tomorrow's yesterday.

avk

  • Hero Member
  • *****
  • Posts: 825
Re: Contest: fastest IsAnagram function
« Reply #141 on: October 27, 2024, 07:07:52 am »
I am replacing the previous submissions with the new one.

The one in the zip looks like Martin's implementation?

Bart

It's in the main program text.
But if partially adopted BeniBela's idea, so there is a new version:
Code: Pascal  [Select][+][-]
  1. function IsAnagram_Avk(const s1, s2: string; aIgnoreSpaces: Boolean = True; aExceptionOnError: Boolean = False): Boolean;
  2. type
  3. {$IFDEF CPU64}
  4.   TChunk = QWord;
  5. {$ELSE}
  6.   TChunk = DWord;
  7. {$ENDIF}
  8.   PChunk = ^TChunk;
  9. const
  10. {$IFDEF FPC_REQUIRES_PROPER_ALIGNMENT}
  11.   {$IFDEF CPU64}
  12.   MASK = 7;
  13.   {$ELSE}
  14.   MASK = 3;
  15.   {$ENDIF}
  16. {$ENDIF}
  17. {$IFDEF CPU64}
  18.   BITS5 = QWord($2020202020202020);
  19.   BITS6 = QWord($4040404040404040);
  20.   BITS7 = QWord($8080808080808080);
  21. {$ELSE}
  22.   BITS5 = DWord($20202020);
  23.   BITS6 = DWord($40404040);
  24.   BITS7 = DWord($80808080);
  25. {$ENDIF}
  26.   ERR_FMT = 'Illegal character in %s, position %d(#%d)';
  27.  
  28.   function Invalid(aOfs: Integer; aFirstArg: Boolean): Boolean;
  29.   begin
  30.     Invalid := False;
  31.     if aExceptionOnError then
  32.       if aFirstArg then
  33.         raise ERangeError.CreateFmt(ERR_FMT, ['s1', aOfs, Ord(s1[aOfs])])
  34.       else
  35.         raise ERangeError.CreateFmt(ERR_FMT, ['s2', aOfs, Ord(s2[aOfs])])
  36.   end;
  37.  
  38.   function InvalidUp(aFlags: TChunk; aOfs: Integer; aFirstArg: Boolean): Boolean;
  39.   begin
  40.     {$IFDEF ENDIAN_BIG}aFlags := SwapEndian(aFlags);{$ENDIF}
  41.     Inc(aOfs,{$IFDEF CPU64}BsfQWord{$ELSE}BsfDWord{$ENDIF}(aFlags) div 8);
  42.     Result := Invalid(aOfs, aFirstArg);
  43.   end;
  44.  
  45.   function InvalidLo(aFlags: TChunk; aOfs: Integer; aFirstArg: Boolean): Boolean;
  46.   begin
  47.     {$IFDEF ENDIAN_BIG}aFlags := SwapEndian(aFlags);{$ENDIF}
  48.     Inc(aOfs,{$IFDEF CPU64}BsfQWord{$ELSE}BsfDWord{$ENDIF}(aFlags xor BITS5) div 8);
  49.     Result := Invalid(aOfs, aFirstArg);
  50.   end;
  51.  
  52. type
  53. {$IFDEF CPU64}
  54.   TShortBuffer = array[0..8] of QWord;
  55. {$ELSE}
  56.   TShortBuffer = array[0..17] of DWord;
  57. {$ENDIF}
  58. const
  59. {$PUSH}{$J-}
  60. {$IFDEF CPU64}
  61.   ZERO_BUF: TShortBuffer = (0,0,0,0,0,0,0,0,0);
  62. {$ELSE}
  63.   ZERO_BUF: TShortBuffer = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
  64. {$ENDIF}
  65.   CI_MAP: array[32..127] of Byte = (
  66.     32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
  67.     52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
  68.     72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
  69.     92, 93, 94, 95, 96, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
  70.     80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 97, 98, 99, 100, 101
  71.   );
  72. {$POP}
  73.   function TestShort(const s1, s2: string): Boolean;
  74.   var
  75.     Buf: TShortBuffer;
  76.     Counter: array[32..101] of ShortInt absolute Buf;
  77.     p, pEnd: PByte;
  78.   begin
  79.     Buf := ZERO_BUF;
  80.  
  81.     p := PByte(s1);
  82.     PEnd := p + Length(s1);
  83.   {$IFDEF FPC_REQUIRES_PROPER_ALIGNMENT}
  84.     while (p < pEnd) and (SizeUInt(p) and MASK <> 0) do begin
  85.       if p^ in [32..127] then Inc(Counter[CI_MAP[p^]])
  86.       else exit(Invalid(Succ(p - PByte(s1)), True));
  87.       Inc(p);
  88.     end;
  89.   {$ENDIF}
  90.     while p < pEnd - SizeOf(TChunk) do begin
  91.       if PChunk(p)^ and BITS7 <> 0 then
  92.         exit(InvalidUp(PChunk(p)^ and BITS7, p-PByte(s1)+1, True));
  93.       if (PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1) < BITS5 then
  94.         exit(InvalidLo((PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1), p-PByte(s1)+1, True));
  95.       Inc(Counter[CI_MAP[p[0]]]); Inc(Counter[CI_MAP[p[1]]]);
  96.       Inc(Counter[CI_MAP[p[2]]]); Inc(Counter[CI_MAP[p[3]]]);
  97.     {$IFDEF CPU64}
  98.       Inc(Counter[CI_MAP[p[4]]]); Inc(Counter[CI_MAP[p[5]]]);
  99.       Inc(Counter[CI_MAP[p[6]]]); Inc(Counter[CI_MAP[p[7]]]);
  100.     {$ENDIF}
  101.       Inc(p, SizeOf(TChunk));
  102.     end;
  103.     while p < pEnd do begin
  104.       if p^ in [32..127] then Inc(Counter[CI_MAP[p^]])
  105.       else exit(Invalid(Succ(p - PByte(s1)), True));
  106.       Inc(p);
  107.     end;
  108.  
  109.     p := PByte(s2);
  110.     PEnd := p + Length(s2);
  111.   {$IFDEF FPC_REQUIRES_PROPER_ALIGNMENT}
  112.     while (p < pEnd) and (SizeUInt(p) and MASK <> 0) do begin
  113.       if p^ in [32..127] then Dec(Counter[CI_MAP[p^]])
  114.       else exit(Invalid(Succ(p - PByte(s2)), False));
  115.       Inc(p);
  116.     end;
  117.   {$ENDIF}
  118.     while p < pEnd - SizeOf(TChunk) do begin
  119.       if PChunk(p)^ and BITS7 <> 0 then
  120.         exit(InvalidUp(PChunk(p)^ and BITS7, p-PByte(s2)+1, False));
  121.       if (PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1) < BITS5 then
  122.         exit(InvalidLo((PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1), p-PByte(s2)+1, False));
  123.       Dec(Counter[CI_MAP[p[0]]]); Dec(Counter[CI_MAP[p[1]]]);
  124.       Dec(Counter[CI_MAP[p[2]]]); Dec(Counter[CI_MAP[p[3]]]);
  125.     {$IFDEF CPU64}
  126.       Dec(Counter[CI_MAP[p[4]]]); Dec(Counter[CI_MAP[p[5]]]);
  127.       Dec(Counter[CI_MAP[p[6]]]); Dec(Counter[CI_MAP[p[7]]]);
  128.     {$ENDIF}
  129.       Inc(p, SizeOf(TChunk));
  130.     end;
  131.     while p < pEnd do begin
  132.       if p^ in [32..127] then Dec(Counter[CI_MAP[p^]])
  133.       else exit(Invalid(Succ(p - PByte(s2)), False));
  134.       Inc(p);
  135.     end;
  136.  
  137.     if aIgnoreSpaces then Counter[32] := 0;
  138.   {$IFDEF CPU64}
  139.     if Buf[0] or Buf[1] or Buf[2] or Buf[3] or Buf[4] or
  140.        Buf[5] or Buf[6] or Buf[7] or Buf[8] <> 0 then exit(False);
  141.   {$ELSE}
  142.     if Buf[0] or Buf[1] or Buf[2] or Buf[3] or Buf[4] or Buf[5] or Buf[6] or Buf[7] or
  143.        Buf[8] or Buf[9] or Buf[10] or Buf[11] or Buf[12] or Buf[13] or Buf[14] or
  144.        Buf[15] or Buf[16] or Buf[17] <> 0 then exit(False);
  145.   {$ENDIF}
  146.     Result := True;
  147.   end;
  148.  
  149. const
  150.   SHORT  = 126;
  151. var
  152.   Counter: array[32..127] of Integer;
  153.   p, pEnd: PByte;
  154.   I: Integer;
  155. begin
  156.   if (s1 = '') or (s2 = '') then
  157.     exit(False);
  158.   if not aIgnoreSpaces and not aExceptionOnError and (Length(s1) <> Length(s2)) then
  159.     exit(False);
  160.   if Math.Max(Length(s1), Length(s2)) <= SHORT then
  161.     exit(TestShort(s1, s2));
  162.  
  163.   FillChar(Counter, SizeOf(Counter), 0);
  164.  
  165.   p := PByte(s1);
  166.   PEnd := p + Length(s1);
  167. {$IFDEF FPC_REQUIRES_PROPER_ALIGNMENT}
  168.   while (p < pEnd) and (SizeUInt(p) and MASK <> 0) do begin
  169.     if p^ in [32..127] then Inc(Counter[p^])
  170.     else exit(Invalid(Succ(p - PByte(s1)), True));
  171.     Inc(p);
  172.   end;
  173. {$ENDIF}
  174.   while p < pEnd - SizeOf(TChunk) do begin
  175.     if PChunk(p)^ and BITS7 <> 0 then
  176.       exit(InvalidUp(PChunk(p)^ and BITS7, p-PByte(s1)+1, True));
  177.     if (PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1) < BITS5 then
  178.       exit(InvalidLo((PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1), p-PByte(s1)+1, True));
  179.     Inc(Counter[p[0]]); Inc(Counter[p[1]]);
  180.     Inc(Counter[p[2]]); Inc(Counter[p[3]]);
  181.   {$IFDEF CPU64}
  182.     Inc(Counter[p[4]]); Inc(Counter[p[5]]);
  183.     Inc(Counter[p[6]]); Inc(Counter[p[7]]);
  184.   {$ENDIF}
  185.     Inc(p, SizeOf(TChunk));
  186.   end;
  187.   while p < pEnd do begin
  188.     if p^ in [32..127] then Inc(Counter[p^])
  189.     else exit(Invalid(Succ(p - PByte(s1)), True));
  190.     Inc(p);
  191.   end;
  192.   for I := 97 to 122 do
  193.     if Counter[I] <> 0 then begin
  194.       Inc(Counter[I-32], Counter[I]);
  195.       Counter[I] := 0;
  196.     end;
  197.  
  198.   p := PByte(s2);
  199.   PEnd := p + Length(s2);
  200. {$IFDEF FPC_REQUIRES_PROPER_ALIGNMENT}
  201.   while (p < pEnd) and (SizeUInt(p) and MASK <> 0) do begin
  202.     if p^ in [32..127] then Dec(Counter[p^])
  203.     else exit(Invalid(Succ(p - PByte(s2)), False));
  204.     Inc(p);
  205.   end;
  206. {$ENDIF}
  207.   while p < pEnd - SizeOf(TChunk) do begin
  208.     if PChunk(p)^ and BITS7 <> 0 then
  209.       exit(InvalidUp(PChunk(p)^ and BITS7, p-PByte(s2)+1, False));
  210.     if (PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1) < BITS5 then
  211.       exit(InvalidLo((PChunk(p)^ and BITS5) or ((PChunk(p)^ and BITS6) shr 1), p-PByte(s2)+1, False));
  212.     Dec(Counter[p[0]]); Dec(Counter[p[1]]);
  213.     Dec(Counter[p[2]]); Dec(Counter[p[3]]);
  214.   {$IFDEF CPU64}
  215.     Dec(Counter[p[4]]); Dec(Counter[p[5]]);
  216.     Dec(Counter[p[6]]); Dec(Counter[p[7]]);
  217.   {$ENDIF}
  218.     Inc(p, SizeOf(TChunk));
  219.   end;
  220.   while p < pEnd do begin
  221.     if p^ in [32..127] then Dec(Counter[p^])
  222.     else exit(Invalid(Succ(p - PByte(s2)), False));
  223.     Inc(p);
  224.   end;
  225.   for I := 97 to 122 do
  226.     if Counter[I] <> 0 then
  227.       Inc(Counter[I-32], Counter[I]);
  228.  
  229.   if aIgnoreSpaces then Counter[32] := 0;
  230.   I := 32;
  231.   while I < 95 do begin
  232.     if Counter[I] or Counter[I+1] or Counter[I+2] or Counter[I+3] <> 0 then
  233.       exit(False);
  234.     Inc(I, 4);
  235.   end;
  236.   if Counter[96] or Counter[123] or Counter[124] or Counter[125] or Counter[126] or Counter[127] <> 0 then
  237.     exit(False);
  238.   Result := True;
  239. end;
  240.  
« Last Edit: October 27, 2024, 09:17:39 am by avk »

Josh

  • Hero Member
  • *****
  • Posts: 1445
Re: Contest: fastest IsAnagram function
« Reply #142 on: October 27, 2024, 04:04:08 pm »
Hi Bart

Does the following code pass your test data?

Redone the logic and algo to use ansistring;

Ammended
Modified so that it can handle only alphanumeric or alphanumeric+all the other chars {+-<> etc
just comment the second line {$define OnlyAlphaNumericChars}

Code: Pascal  [Select][+][-]
  1. program project1;
  2. {$define OnlyAlphaNumericChars}// a-z,A-Z,' ',0--9
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   {$IFDEF UNIX}
  7.   cthreads,
  8.   {$ENDIF}
  9.   SysUtils;
  10.  
  11. const
  12.   Iterations = 5000000;
  13.   arlow=32;
  14.   arhigh=128;
  15.   mess_s1='Invalid character in input string at position %d in s1';
  16.   mess_s2='Invalid character in input string at position %d in s2';
  17.  
  18. var
  19.   Map: array[arlow..arhigh] of Byte;
  20.   AnagramsA: array[0..10] of AnsiString = ( 'BIG BEND NATIONAL PARK>,',
  21.                                             'carry on laughing',
  22.                                             'george clooney',
  23.                                             'george and mildred',
  24.                                             'the addams family',
  25.                                             'miss neve campbell',
  26.                                             'the comic strip presents',
  27.                                             'kansas city',
  28.                                             'david coleman',
  29.                                             '0123456789',
  30.                                             'peter osullevan');
  31.  
  32.   AnagramsB: array[0..10] of AnsiString = (  'abandoning bleak trip',
  33.                                              'Ouch An Angry Girl',
  34.                                              'cool energy ego',
  35.                                              'old deranged grime',
  36.                                              'my dismal fathead',
  37.                                              'even climbs maples',
  38.                                              'Cheers Impotent Scripts',
  39.                                              'satanic sky',
  40.                                              'acid lad venom',
  41.                                              '  1  4  7  8  5  2  3  6  9  0  ',
  42.                                              'pale volunters');
  43.  
  44. var
  45.   st: qword;
  46.   IterationLoop, AnagramsLoop: LongInt;
  47.   IsAnagramCounter: Int64 = 0;
  48.   Valid: Boolean;
  49.   sg: string;
  50.   DefFreqArray: array[arlow..arHigh] of LongInt;
  51.   IgnoredCounter:LongInt;
  52.  
  53. function IsAnAnagramJosh(const s1, s2: AnsiString; IgnoreSpaces: Boolean; ExceptionOnError: Boolean = False): boolean;  inline;
  54. var
  55.   k, l1, l2: LongInt;
  56.   ch: Byte;
  57.   FreqArray: array[arlow..arHigh] of LongInt;
  58. begin
  59.   Inc(IsAnagramCounter);
  60.   Move(DefFreqArray, FreqArray, SizeOf(DefFreqArray));
  61.   l1 := Length(s1);
  62.   l2 := Length(s2);
  63.   if IgnoreSpaces then
  64.   begin
  65.     for k := 1 to l1 do
  66.     begin
  67.       ch := Byte(s1[k]);
  68.       if ch = $20 then Continue;
  69.       if ExceptionOnError and ((ch < 32) or (ch > 127)) then raise Exception.CreateFmt(mess_s1, [k]);
  70.       FreqArray[Map[ch]]:=FreqArray[Map[ch]]+1;
  71.     end;
  72.     for k := 1 to l2 do
  73.     begin
  74.       ch := Byte(s2[k]);
  75.       if ch = $20 then Continue;
  76.       if ExceptionOnError and ((ch < 32) or (ch > 127)) then raise Exception.CreateFmt(mess_s2, [k]);
  77.       FreqArray[Map[ch]]:=FreqArray[Map[ch]]-1;
  78.       if FreqArray[Map[ch]]<0 then exit(false);
  79.     end;
  80.   end
  81.   else
  82.   begin
  83.     for k := 1 to l1 do
  84.     begin
  85.       ch := Byte(s1[k]);
  86.       if ExceptionOnError and ((ch < 32) or (ch > 127)) then raise Exception.CreateFmt(mess_s1, [k]);
  87.       FreqArray[Map[ch]]:=FreqArray[Map[ch]]+1;
  88.     end;
  89.     for k := 1 to l2 do
  90.     begin
  91.       ch := Byte(s2[k]);
  92.       if ExceptionOnError and ((ch < 32) or (ch > 127)) then raise Exception.CreateFmt(mess_s2, [k]);
  93.       FreqArray[Map[ch]]:=FreqArray[Map[ch]]-1;
  94.     end;
  95.   end;
  96.   for k := arlow to arhigh-1 do if FreqArray[k] <> 0 then Exit(False);
  97.   IgnoredCounter:=FreqArray[arhigh];
  98.   Result := True;
  99. end;
  100.  
  101. begin
  102.   for IterationLoop := arlow to arhigh do Map[IterationLoop] := arhigh;
  103.   map[$20]:=$20;
  104.   {$ifdef OnlyAlphaNumericChars}
  105.   for IterationLoop := $30 to $39 do Map[IterationLoop] := IterationLoop;
  106.   for IterationLoop := $41 to $5A do Map[IterationLoop] := IterationLoop;
  107.   for IterationLoop := $61 to $7A do Map[IterationLoop] := IterationLoop - $20;
  108.   {$else}
  109.   for IterationLoop := arlow+1 to arhigh-1 do Map[IterationLoop] := IterationLoop;
  110.   for IterationLoop := $61 to $7A do Map[IterationLoop] := IterationLoop - $20;
  111.   {$endif}
  112.   repeat
  113.     IsAnagramCounter:=0;
  114.     FillChar(DefFreqArray, SizeOf(DefFreqArray), 0);
  115.     writeln('RETURN to START');
  116.     ReadLn;
  117.     st := GetTickCount64;
  118.     for AnagramsLoop := Low(AnagramsA) to High(AnagramsA) do
  119.     begin
  120.       for IterationLoop := 0 to Iterations do
  121.       begin
  122.         Valid := IsAnAnagramJosh(AnagramsA[AnagramsLoop], AnagramsB[AnagramsLoop], True);
  123.         if IterationLoop = 0 then
  124.         begin
  125.           sg := ' = ';
  126.           if not Valid then
  127.           begin
  128.             sg := ' <> ';
  129.             write('Not ');
  130.           end;
  131.           WriteLn('Valid ', AnagramsA[AnagramsLoop], sg, AnagramsB[AnagramsLoop]{$ifdef OnlyAlphaNumericChars}+'     Non AlphaNumeric Chars Ignored :'+inttostr(IgnoredCounter){$endif});
  132.         end;
  133.       end;
  134.     end;
  135.     writeln('Finished ', '   Called :', IsAnagramCounter div 1000000, ' Million Times     Done in - ', GetTickCount64 - st, ' ms');
  136.     WriteLn('PRESS RETURN');
  137.     ReadLn;
  138.   until False;
  139. end.
  140.  
« Last Edit: October 27, 2024, 08:14:55 pm by Josh »
The best way to get accurate information on the forum is to post something wrong and wait for corrections.

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #143 on: October 27, 2024, 10:35:27 pm »
Does the following code pass your test data?

Sorry:
Code: [Select]
TestValidity for Josh
FAIL: erroneously returned as anagram, but they are not (with IgnoreSpaces=True):
S1: "1234567890"
S2: "0 1 2 3 4 5 6 7 8 9!"
Validitycheck FAIL for Josh

Can you please just post the IsAnagram function (and necessary var's, types etc), not an entire program?
Make copy/past/compile a lot easier for me.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #144 on: October 27, 2024, 10:38:11 pm »
It's in the main program text.
But if partially adopted BeniBela's idea, so there is a new version:
Code: Pascal  [Select][+][-]
  1. function IsAnagram_Avk(const s1, s2: string; aIgnoreSpaces: Boolean = True; aExceptionOnError: Boolean = False): Boolean;
  2. <snip>
  3.  

Passes validity test and fastes so far!

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #145 on: October 27, 2024, 10:47:40 pm »
So you can easily test you code against my set of (in)valid anagrams (I posted the TestValidity code earlier):

Valid anagrams my code uses for validity testing (IgnoreSpaces=True)
Code: [Select]
Valid Anagrams for testing
Nr: 0
S1=1234567890
S2=0 1 2 3 4 5 6 7 8 9

Nr: 1
S1= !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⌂
S2=⌂~}|{zyx wvutsrqpon mlkjihgfed cba`_^]\[Z YXWVUTSRQP ONMLKJIHGF EDCBA@?>=< ;:98765432 10/.-,+*)( '&%$#"!

Nr: 2
S1=abcd efgh ijkl mnop qrst uvwx yz
S2=QWERTYUIOP ASDFGHJKL ZXCVBNM

For IgnoreSpaces=False is tests the same set, bet removes all spaces from the strings (S1,S2) before testing them.

Invalid Anagrams for testing my code uses for validity testing (IgnoreSpaces=True)
Code: [Select]
Invalid Anagrams for testing
Nr: 0
S1=1234567890
S2=0 1 2 3 4 5 6 7 8 9!

Nr: 1
S1= !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⌂
S2=⌂~}|{zyx wvutsrqpon mlkjihgfed cba`_^]\[Z YXWVUTSRQP ONMLKJIHGF EDCBA@?>=< ;:98765432 10/.-,+*)( '&%$#"!!

Nr: 2
S1=abcd efgh ijkl mnop qrst uvwx yz
S2=QWERTYUIOP ASDFGHJKL ZXCVBNN

Nr: 3  //happens if you do the or-ing with $20 instead of LowerCase-ing
S1=@[\]^_
S2=`{|}~⌂

It then procedes to test any illegal character (#0..#31, #128..#255) to see if the function returns False (ExceptionOnError=False).

Bart
« Last Edit: October 27, 2024, 10:50:45 pm by Bart »

Josh

  • Hero Member
  • *****
  • Posts: 1445
Re: Contest: fastest IsAnagram function
« Reply #146 on: October 28, 2024, 12:23:56 am »
Added Project Zip.

Note i have split the IsAnAnagram into a self contained Unit

So if I have done it correct you should just include the unit named josh_unit,
and call
IsAnAnagramJosh(S1,S2,True/False,True/False);

nb. In the unit there is a define that is commented out, if left in then it ignores any non alphaNumeric Characters, as that is what I think Anagrams are.
« Last Edit: October 28, 2024, 12:30:29 am by Josh »
The best way to get accurate information on the forum is to post something wrong and wait for corrections.

Josh

  • Hero Member
  • *****
  • Posts: 1445
Re: Contest: fastest IsAnagram function
« Reply #147 on: October 28, 2024, 11:14:48 am »
Optimized my unit code

Code: Pascal  [Select][+][-]
  1. unit josh_unit;
  2. //{$define OnlyAlphaNumericChars}// a-z,A-Z,' ',0--9
  3. {$mode ObjFPC}{$H+}
  4.  
  5. interface
  6.  
  7. uses
  8.  {$IFDEF UNIX}
  9.   cthreads,
  10.   {$ENDIF}
  11.   SysUtils;
  12.  
  13. const
  14.   josh_squote=chr(39);
  15.   josh_arlow=Byte(32);
  16.   josh_arhigh=Byte(128); // 128 is dumping ground for invalid chars
  17.   josh_mess_s1='Invalid character in input string at position %d in s1';
  18.   josh_mess_s2='Invalid character in input string at position %d in s2';
  19.  
  20. Var
  21.   josh_Map: array[0..255] of Byte;
  22.   josh_IgnoredCounter:Int32;
  23.   josh_loop:Int32;
  24.  
  25.  
  26. function IsAnAnagramJosh(const s1, s2: AnsiString; IgnoreSpaces: Boolean; ExceptionOnError: Boolean = False): boolean;  inline;
  27.  
  28. implementation
  29.  
  30. function IsAnAnagramJosh(const s1, s2: AnsiString; IgnoreSpaces: Boolean; ExceptionOnError: Boolean = False): boolean;  inline;
  31. var
  32.   k, l1, l2: Int32;
  33.   offset:byte=0;
  34.   josh_FreqArray: array[josh_arlow..josh_arHigh] of Int32;
  35. begin
  36.   if IgnoreSpaces then offset:=1;
  37.   fillchar(josh_FreqArray,sizeof(josh_FreqArray),0);
  38.   l1 := Length(s1);
  39.   l2 := Length(s2);
  40.   If Not ExceptionOnError then
  41.   begin
  42.     for k := 1 to l1 do josh_FreqArray[josh_Map[Byte(s1[k])]]:=josh_FreqArray[josh_Map[Byte(s1[k])]]+1;
  43.     for k := 1 to l2 do josh_FreqArray[josh_Map[Byte(s2[k])]]:=josh_FreqArray[josh_Map[Byte(s2[k])]]-1;
  44.   end
  45.   else
  46.   begin
  47.     for k := 1 to l1 do
  48.     begin
  49.       if ((Byte(s1[k]) < 32) or (Byte(s1[k]) > 127)) then raise Exception.CreateFmt(josh_mess_s1, [k]);
  50.       josh_FreqArray[josh_Map[Byte(s1[k])]]:=josh_FreqArray[josh_Map[Byte(s1[k])]]+1;
  51.     end;
  52.     for k := 1 to l2 do
  53.     begin
  54.       if ((Byte(s2[k]) < 32) or (Byte(s2[k]) > 127)) then raise Exception.CreateFmt(josh_mess_s2, [k]);
  55.       josh_FreqArray[josh_Map[Byte(s2[k])]]:=josh_FreqArray[josh_Map[Byte(s2[k])]]-1;
  56.     end;
  57.   end;
  58.   for k := josh_arlow+offset to josh_arhigh-1 do if josh_FreqArray[k] <> 0 then Exit(False);
  59.   josh_IgnoredCounter:=josh_FreqArray[josh_arhigh];
  60.   Result := True;
  61. end;
  62.  
  63. Initialization
  64.   for josh_loop := 0 to 255 do josh_Map[josh_loop] := josh_arhigh;
  65.   // map all to arhigh so collects all chars
  66.   // now map wanted chars
  67.   josh_map[$20]:=$20;
  68.   {$ifdef OnlyAlphaNumericChars}
  69.   for josh_loop := $30 to $39 do josh_Map[josh_loop] := josh_loop;
  70.   for josh_loop := $41 to $5A do josh_Map[josh_loop] := josh_loop;
  71.   for josh_loop := $61 to $7A do josh_Map[josh_loop] := IterationLoop - $20;
  72.   {$else}
  73.   for josh_loop := josh_arlow+1 to josh_arhigh-1 do josh_Map[josh_loop] := josh_loop;
  74.   for josh_loop := $61 to $7A do josh_Map[josh_loop] := josh_loop - $20;
  75.   {$endif}
  76. end.
  77.  
« Last Edit: October 28, 2024, 11:58:39 am by Josh »
The best way to get accurate information on the forum is to post something wrong and wait for corrections.

BrunoK

  • Hero Member
  • *****
  • Posts: 762
  • Retired programmer
Re: Contest: fastest IsAnagram function
« Reply #148 on: October 28, 2024, 04:49:14 pm »
Code: Pascal  [Select][+][-]
  1. function IsAnagram_benibela(const S1, S2: String; IgnoreSpaces: Boolean = True; ExceptionOnError: Boolean = False): Boolean;
...

Fastest one so far in my benchmark (dir not yet test avk's new one).

Bart
But doesn't, as far as I see, indicate the position of the first illegal character is detected in case of raise exception which is the case for avk2.

Bart

  • Hero Member
  • *****
  • Posts: 5663
    • Bart en Mariska's Webstek
Re: Contest: fastest IsAnagram function
« Reply #149 on: October 28, 2024, 06:04:42 pm »
But doesn't, as far as I see, indicate the position of the first illegal character is detected in case of raise exception which is the case for avk2.

Whic wasn't a requirement (but a nice thing to have).

Bart

 

TinyPortal © 2005-2018