Forum > Other

I'm seeking (beta-)testers for my fast regular expression engine FLRE

(1/10) > >>

BeRo:
Hello,

I'm searching (beta-)testers for FLRE, my new ObjectPascal-native regular expression engine. It's a rewrite of my older (buggy) regular expression engine BRRE.

The GitHub project site is: https://github.com/BeRo1985/flre

And here first benchmark results (with Delphi 7 compiled, because with FPC compiled it is ~2x slower as 32-bit x86 build than with Delphi 7, and as 64-bit x86 build it is more or less equal fast like Delphi 7): http://vserver.rosseaux.net/stuff/flre.html (based on the PCRE-JIT benchmark  http://sljit.sourceforge.net/regex_perf.html )

If you find any bugs or you have any suggestions, report/post them on the GitHub issue tracker, or if you've no GitHub account, then into this thread.

And as a side note, the experimental support for lookahead assertions and back references may be removed later again, if it is recognized later that these capabilities are not effective without backtracking, because FLRE is primarly a backtracking-free regular expression engine, which is modelled after Google's RE2 (see https://swtch.com/~rsc/regexp/regexp1.html and https://swtch.com/~rsc/regexp/regexp2.html and https://swtch.com/~rsc/regexp/regexp3.html ). Therefore I seek indeed also voluntary beta testers which test my regular expression engine through its paces.




 
 

Leledumbo:
Interesting. I'll at least try reproducing the benchmark.

marcov:
Maybe add FPC's existing regengines to it also?

Roland57:
Hello!

Very interesting. I tested the benchmark program successfully with FPC 2.6.4,


--- Code: ---                                                        Time     | Match count
==============================================================================
FLRE:
                                        /Twain/ :      104.91 ms |        2388
                                    /(?i)Twain/ :      111.73 ms |        2657
                                   /[a-z]shing/ :       95.15 ms |        1877
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :      146.08 ms |         396
                                    /\b\w+nn\b/ :      177.33 ms |         359
                             /[a-q][^u-z]{13}x/ :     1395.17 ms |        4929
                  /Tom|Sawyer|Huckleberry|Finn/ :      230.92 ms |        3015
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      295.75 ms |        4820
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      277.67 ms |        3015
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      233.92 ms |        2220
            /Tom.{10,25}river|river.{10,25}Tom/ :      225.81 ms |           2
                                 /[a-zA-Z]+ing/ :      360.64 ms |       95863
                        /\s[a-zA-Z]{0,12}ing\s/ :      304.98 ms |       67810
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :      144.93 ms |         313
                    /["'][^"']{0,30}[?!\.]["']/ :      310.85 ms |        9857
--- End code ---

Delphi 7


--- Code: ---                                                        Time     | Match count
==============================================================================
FLRE:
                                        /Twain/ :      104.44 ms |        2388
                                    /(?i)Twain/ :       69.28 ms |        2657
                                   /[a-z]shing/ :       65.35 ms |        1877
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :       86.42 ms |         396
                                    /\b\w+nn\b/ :      167.65 ms |         359
                             /[a-q][^u-z]{13}x/ :      844.82 ms |        4929
                  /Tom|Sawyer|Huckleberry|Finn/ :      135.85 ms |        3015
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      191.10 ms |        4820
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      231.50 ms |        3015
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      250.89 ms |        2220
            /Tom.{10,25}river|river.{10,25}Tom/ :      230.15 ms |           2
                                 /[a-zA-Z]+ing/ :      351.19 ms |       95863
                        /\s[a-zA-Z]{0,12}ing\s/ :      299.93 ms |       67810
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :       89.53 ms |         313
                    /["'][^"']{0,30}[?!\.]["']/ :      269.56 ms |        9857
--- End code ---

and Delphi XE2.


--- Code: ---                                                        Time     | Match count
==============================================================================
FLRE:
                                        /Twain/ :       52.28 ms |        2388
                                    /(?i)Twain/ :       52.14 ms |        2657
                                   /[a-z]shing/ :       54.59 ms |        1877
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :       70.03 ms |         396
                                    /\b\w+nn\b/ :      154.46 ms |         359
                             /[a-q][^u-z]{13}x/ :      707.79 ms |        4929
                  /Tom|Sawyer|Huckleberry|Finn/ :      111.93 ms |        3015
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      154.01 ms |        4820
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      201.44 ms |        3015
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      204.27 ms |        2220
            /Tom.{10,25}river|river.{10,25}Tom/ :      204.24 ms |           2
                                 /[a-zA-Z]+ing/ :      275.79 ms |       95863
                        /\s[a-zA-Z]{0,12}ing\s/ :      247.27 ms |       67810
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :       75.81 ms |         313
                    /["'][^"']{0,30}[?!\.]["']/ :      233.70 ms |        9857
--- End code ---

Congratulations!  :)

When I begin to use a new library, I like to make short and simple examples for myself. Here is one:


--- Code: ---program multicaptures1;
{$ifdef fpc}
 {$mode delphi}
 {$ifdef cpui386}
  {$define cpu386}
 {$endif}
 {$ifdef cpu386}
  {$asmmode intel}
 {$endif}
 {$ifdef cpuamd64}
  {$asmmode intel}
 {$endif}
 {$ifdef FPC_LITTLE_ENDIAN}
  {$define LITTLE_ENDIAN}
 {$else}
  {$ifdef FPC_BIG_ENDIAN}
   {$define BIG_ENDIAN}
  {$endif}
 {$endif}
 {-$pic off}
 {$define caninline}
 {$ifdef FPC_HAS_TYPE_EXTENDED}
  {$define HAS_TYPE_EXTENDED}
 {$else}
  {$undef HAS_TYPE_EXTENDED}
 {$endif}
 {$ifdef FPC_HAS_TYPE_DOUBLE}
  {$define HAS_TYPE_DOUBLE}
 {$else}
  {$undef HAS_TYPE_DOUBLE}
 {$endif}
 {$ifdef FPC_HAS_TYPE_SINGLE}
  {$define HAS_TYPE_SINGLE}
 {$else}
  {$undef HAS_TYPE_SINGLE}
 {$endif}
{$else}
 {$realcompatibility off}
 {$localsymbols on}
 {$define LITTLE_ENDIAN}
 {$ifndef cpu64}
  {$define cpu32}
 {$endif}
 {$define HAS_TYPE_EXTENDED}
 {$define HAS_TYPE_DOUBLE}
 {$define HAS_TYPE_SINGLE}
{$endif}
{$ifdef win32}
 {$define windows}
{$endif}
{$ifdef win64}
 {$define windows}
{$endif}
{$ifdef wince}
 {$define windows}
{$endif}
{$rangechecks off}
{$extendedsyntax on}
{$writeableconst on}
{$hints off}
{$booleval off}
{$typedaddress off}
{$stackframes off}
{$varstringchecks on}
{$typeinfo on}
{$overflowchecks off}
{$longstrings on}
{$openstrings on}
{$apptype console}

uses
  SysUtils,
  Classes,
  FLRE in '..\..\src\FLRE.pas',
  FLREUnicode in '..\..\src\FLREUnicode.pas';

var
  re: TFLRE;
  pattern, subject: TFLRERawByteString;
  captures: TFLREMultiCaptures;
  i, j: integer;

begin
  subject := '04/09/2015 05/09/2015';
  pattern := '(\d+)/(\d+)/(\d+)';
  re := TFLRE.Create(pattern, []);
  re.MaximalDFAStates := 65536;
  re.MatchAll(subject, captures);

  for i := Low(captures) to High(captures) do
    for j := Low(captures[i]) to High(captures[i]) do
      WriteLn(Format(
        '%d %d %s', [i, j, Copy(subject, captures[i, j].Start, captures[i, j].Length)]
      ));

  SetLength(captures,0);
  re.Free;
  ReadLn;
end.
--- End code ---

The result:


--- Code: ---0 0 04/09/2015
0 1 04
0 2 09
0 3 2015
1 0 05/09/2015
1 1 05
1 2 09
1 3 2015
--- End code ---

BeRo:

--- Quote from: marcov on September 04, 2015, 10:04:33 am ---Maybe add FPC's existing regengines to it also?

--- End quote ---

See https://gist.github.com/BeRo1985/9288f21c9216b98b562e  RegExpr.pas and regex.pp are much slower and have bugs (missing otherwise valid matches). Source of the older (windows-only) benchmark: http://vserver.rosseaux.net/stuff/oldflrebenchmark.dpr

@Roland Chastain, on which CPU you've runned the benchmark? I ask so that I have a comparison factor.

Navigation

[0] Message Index

[#] Next page

Go to full version