Recent

Author Topic: I'm seeking (beta-)testers for my fast regular expression engine FLRE  (Read 28289 times)

BeRo

  • New Member
  • *
  • Posts: 45
    • My site
Hello,

I'm searching (beta-)testers for FLRE, my new ObjectPascal-native regular expression engine. It's a rewrite of my older (buggy) regular expression engine BRRE.

The GitHub project site is: https://github.com/BeRo1985/flre

And here first benchmark results (with Delphi 7 compiled, because with FPC compiled it is ~2x slower as 32-bit x86 build than with Delphi 7, and as 64-bit x86 build it is more or less equal fast like Delphi 7): http://vserver.rosseaux.net/stuff/flre.html (based on the PCRE-JIT benchmark  http://sljit.sourceforge.net/regex_perf.html )

If you find any bugs or you have any suggestions, report/post them on the GitHub issue tracker, or if you've no GitHub account, then into this thread.

And as a side note, the experimental support for lookahead assertions and back references may be removed later again, if it is recognized later that these capabilities are not effective without backtracking, because FLRE is primarly a backtracking-free regular expression engine, which is modelled after Google's RE2 (see https://swtch.com/~rsc/regexp/regexp1.html and https://swtch.com/~rsc/regexp/regexp2.html and https://swtch.com/~rsc/regexp/regexp3.html ). Therefore I seek indeed also voluntary beta testers which test my regular expression engine through its paces.




 
 

Leledumbo

  • Hero Member
  • *****
  • Posts: 8747
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #1 on: September 04, 2015, 09:13:17 am »
Interesting. I'll at least try reproducing the benchmark.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #2 on: September 04, 2015, 10:04:33 am »
Maybe add FPC's existing regengines to it also?

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #3 on: September 04, 2015, 10:20:59 am »
Hello!

Very interesting. I tested the benchmark program successfully with FPC 2.6.4,

Code: [Select]
                                                        Time     | Match count
==============================================================================
FLRE:
                                        /Twain/ :      104.91 ms |        2388
                                    /(?i)Twain/ :      111.73 ms |        2657
                                   /[a-z]shing/ :       95.15 ms |        1877
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :      146.08 ms |         396
                                    /\b\w+nn\b/ :      177.33 ms |         359
                             /[a-q][^u-z]{13}x/ :     1395.17 ms |        4929
                  /Tom|Sawyer|Huckleberry|Finn/ :      230.92 ms |        3015
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      295.75 ms |        4820
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      277.67 ms |        3015
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      233.92 ms |        2220
            /Tom.{10,25}river|river.{10,25}Tom/ :      225.81 ms |           2
                                 /[a-zA-Z]+ing/ :      360.64 ms |       95863
                        /\s[a-zA-Z]{0,12}ing\s/ :      304.98 ms |       67810
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :      144.93 ms |         313
                    /["'][^"']{0,30}[?!\.]["']/ :      310.85 ms |        9857

Delphi 7

Code: [Select]
                                                        Time     | Match count
==============================================================================
FLRE:
                                        /Twain/ :      104.44 ms |        2388
                                    /(?i)Twain/ :       69.28 ms |        2657
                                   /[a-z]shing/ :       65.35 ms |        1877
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :       86.42 ms |         396
                                    /\b\w+nn\b/ :      167.65 ms |         359
                             /[a-q][^u-z]{13}x/ :      844.82 ms |        4929
                  /Tom|Sawyer|Huckleberry|Finn/ :      135.85 ms |        3015
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      191.10 ms |        4820
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      231.50 ms |        3015
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      250.89 ms |        2220
            /Tom.{10,25}river|river.{10,25}Tom/ :      230.15 ms |           2
                                 /[a-zA-Z]+ing/ :      351.19 ms |       95863
                        /\s[a-zA-Z]{0,12}ing\s/ :      299.93 ms |       67810
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :       89.53 ms |         313
                    /["'][^"']{0,30}[?!\.]["']/ :      269.56 ms |        9857

and Delphi XE2.

Code: [Select]
                                                        Time     | Match count
==============================================================================
FLRE:
                                        /Twain/ :       52.28 ms |        2388
                                    /(?i)Twain/ :       52.14 ms |        2657
                                   /[a-z]shing/ :       54.59 ms |        1877
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :       70.03 ms |         396
                                    /\b\w+nn\b/ :      154.46 ms |         359
                             /[a-q][^u-z]{13}x/ :      707.79 ms |        4929
                  /Tom|Sawyer|Huckleberry|Finn/ :      111.93 ms |        3015
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      154.01 ms |        4820
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      201.44 ms |        3015
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      204.27 ms |        2220
            /Tom.{10,25}river|river.{10,25}Tom/ :      204.24 ms |           2
                                 /[a-zA-Z]+ing/ :      275.79 ms |       95863
                        /\s[a-zA-Z]{0,12}ing\s/ :      247.27 ms |       67810
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :       75.81 ms |         313
                    /["'][^"']{0,30}[?!\.]["']/ :      233.70 ms |        9857

Congratulations!  :)

When I begin to use a new library, I like to make short and simple examples for myself. Here is one:

Code: [Select]
program multicaptures1;
{$ifdef fpc}
 {$mode delphi}
 {$ifdef cpui386}
  {$define cpu386}
 {$endif}
 {$ifdef cpu386}
  {$asmmode intel}
 {$endif}
 {$ifdef cpuamd64}
  {$asmmode intel}
 {$endif}
 {$ifdef FPC_LITTLE_ENDIAN}
  {$define LITTLE_ENDIAN}
 {$else}
  {$ifdef FPC_BIG_ENDIAN}
   {$define BIG_ENDIAN}
  {$endif}
 {$endif}
 {-$pic off}
 {$define caninline}
 {$ifdef FPC_HAS_TYPE_EXTENDED}
  {$define HAS_TYPE_EXTENDED}
 {$else}
  {$undef HAS_TYPE_EXTENDED}
 {$endif}
 {$ifdef FPC_HAS_TYPE_DOUBLE}
  {$define HAS_TYPE_DOUBLE}
 {$else}
  {$undef HAS_TYPE_DOUBLE}
 {$endif}
 {$ifdef FPC_HAS_TYPE_SINGLE}
  {$define HAS_TYPE_SINGLE}
 {$else}
  {$undef HAS_TYPE_SINGLE}
 {$endif}
{$else}
 {$realcompatibility off}
 {$localsymbols on}
 {$define LITTLE_ENDIAN}
 {$ifndef cpu64}
  {$define cpu32}
 {$endif}
 {$define HAS_TYPE_EXTENDED}
 {$define HAS_TYPE_DOUBLE}
 {$define HAS_TYPE_SINGLE}
{$endif}
{$ifdef win32}
 {$define windows}
{$endif}
{$ifdef win64}
 {$define windows}
{$endif}
{$ifdef wince}
 {$define windows}
{$endif}
{$rangechecks off}
{$extendedsyntax on}
{$writeableconst on}
{$hints off}
{$booleval off}
{$typedaddress off}
{$stackframes off}
{$varstringchecks on}
{$typeinfo on}
{$overflowchecks off}
{$longstrings on}
{$openstrings on}
{$apptype console}

uses
  SysUtils,
  Classes,
  FLRE in '..\..\src\FLRE.pas',
  FLREUnicode in '..\..\src\FLREUnicode.pas';

var
  re: TFLRE;
  pattern, subject: TFLRERawByteString;
  captures: TFLREMultiCaptures;
  i, j: integer;

begin
  subject := '04/09/2015 05/09/2015';
  pattern := '(\d+)/(\d+)/(\d+)';
  re := TFLRE.Create(pattern, []);
  re.MaximalDFAStates := 65536;
  re.MatchAll(subject, captures);

  for i := Low(captures) to High(captures) do
    for j := Low(captures[i]) to High(captures[i]) do
      WriteLn(Format(
        '%d %d %s', [i, j, Copy(subject, captures[i, j].Start, captures[i, j].Length)]
      ));

  SetLength(captures,0);
  re.Free;
  ReadLn;
end.

The result:

Code: [Select]
0 0 04/09/2015
0 1 04
0 2 09
0 3 2015
1 0 05/09/2015
1 1 05
1 2 09
1 3 2015

« Last Edit: September 04, 2015, 10:26:06 am by Roland Chastain »
My projects are on Gitlab and on Codeberg.

BeRo

  • New Member
  • *
  • Posts: 45
    • My site
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #4 on: September 04, 2015, 12:03:38 pm »
Maybe add FPC's existing regengines to it also?

See https://gist.github.com/BeRo1985/9288f21c9216b98b562e  RegExpr.pas and regex.pp are much slower and have bugs (missing otherwise valid matches). Source of the older (windows-only) benchmark: http://vserver.rosseaux.net/stuff/oldflrebenchmark.dpr

@Roland Chastain, on which CPU you've runned the benchmark? I ask so that I have a comparison factor.

« Last Edit: September 04, 2015, 12:09:17 pm by BeRo »

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #5 on: September 04, 2015, 06:09:09 pm »
@Roland Chastain, on which CPU you've runned the benchmark? I ask so that I have a comparison factor.

I hope this anwers the question:

Quote
Type du système:                            x64-based PC
Processeur(s):                              1 processeur(s) installé(s).
                                            [01] : AMD64 Family 22 Model 0 Stepping 1 AuthenticAMD ~1000 MHz

But I compiled for Win32 (with the three compilers).

I had a look to FLREgrep: it is very interesting.

What I miss in your package (if I may say it) is some basic examples to see how to make simple things (does a string match a pattern, the matched text...).

Is there a replacement function?
« Last Edit: September 05, 2015, 11:06:35 am by Roland Chastain »
My projects are on Gitlab and on Codeberg.

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #6 on: September 04, 2015, 06:23:14 pm »
I never noticed that RegExpr was so slow (maybe because I never used it on large data). But if I believe what I read in the FLRE benchmark, your library looks terrible.  :)
My projects are on Gitlab and on Codeberg.

BeRo

  • New Member
  • *
  • Posts: 45
    • My site
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #7 on: September 04, 2015, 07:45:13 pm »
What I miss in your package (if I may say it) is some basic examples to see how to make simple things (does a string match a pattern, the matched text...).

The project is still young... :-)

Is there a replacement function ?

Yes, TFLRE.PtrReplaceAll  TFLRE.ReplaceAll and  TFLRE.UTF8ReplaceAll


BeRo

  • New Member
  • *
  • Posts: 45
    • My site
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #8 on: September 04, 2015, 07:47:26 pm »
I never noticed that RegExpr was so slow (maybe because I never used it on large data). But if I believe what I read in the FLRE benchmark, your library looks terrible.  :)

Terrible? In which sense?  ;D

garlar27

  • Hero Member
  • *****
  • Posts: 652
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #9 on: September 04, 2015, 08:20:57 pm »
I never noticed that RegExpr was so slow (maybe because I never used it on large data). But if I believe what I read in the FLRE benchmark, your library looks terrible.  :)

Terrible? In which sense?  ;D
"Terrific" and "Terrible" in spoken english is always used as a VERY POSITIVE opinion.
 ;) ;) ;)

And yes, it's very challenging for us, the non native english speakers.
« Last Edit: September 04, 2015, 08:22:43 pm by garlar27 »

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #10 on: September 04, 2015, 08:57:28 pm »
Yes, I wanted to say that the library is incredibly fast.  :)
My projects are on Gitlab and on Codeberg.

BeRo

  • New Member
  • *
  • Posts: 45
    • My site
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #11 on: September 04, 2015, 10:54:17 pm »
Here a another benchmark against a another Object-Pascal-native regular expression engine with the name skRegExp at https://github.com/shukomiya/skregexp :
 
Benchmark: https://gist.github.com/BeRo1985/38b6d43e08922eb2c66f

I hope the so far previous (maybe too) good benchmark results aren't false conclusions, for example due to yet undetected bugs inside FLRE. On the other hand, in FLRE is a lot of micro-optimization-work inside.

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #12 on: September 05, 2015, 08:33:12 am »
Yes, TFLRE.PtrReplaceAll  TFLRE.ReplaceAll and  TFLRE.UTF8ReplaceAll

OK, the syntax is easy.

Code: [Select]
program replace1;
{$I DIRECTIVES}

uses
  SysUtils,
  Classes,
  FLRE in '..\..\src\FLRE.pas',
  FLREUnicode in '..\..\src\FLREUnicode.pas';

var
  re: TFLRE;
  pattern, subject: TFLRERawByteString;

begin
  subject := '05/09/2015';
  pattern := '\d+';
  re := TFLRE.Create(pattern, []);
  re.MaximalDFAStates := 65536;

  WriteLn(re.ReplaceAll(subject, 'x'));

  re.Free;
  ReadLn;
end.

The result:

Code: [Select]
x/x/x
« Last Edit: September 05, 2015, 08:38:57 am by Roland Chastain »
My projects are on Gitlab and on Codeberg.

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #13 on: September 05, 2015, 08:41:56 am »
Another one:

Code: [Select]
program replace2;
{$I DIRECTIVES}

uses
  SysUtils,
  Classes,
  FLRE in '..\..\src\FLRE.pas',
  FLREUnicode in '..\..\src\FLREUnicode.pas';

var
  re: TFLRE;
  pattern, subject: TFLRERawByteString;

begin
  subject := '05/09/2015';
  pattern := '(\d+)/(\d+)/(\d+)';
  re := TFLRE.Create(pattern, []);
  re.MaximalDFAStates := 65536;

  WriteLn(re.ReplaceAll(subject, '\3/\2/\1'));

  re.Free;
  ReadLn;
end.

Result:

Code: [Select]
2015/09/05
Is it possible to use a function as second argument ?
My projects are on Gitlab and on Codeberg.

Roland57

  • Sr. Member
  • ****
  • Posts: 421
    • msegui.net
Re: I'm seeking (beta-)testers for my fast regular expression engine FLRE
« Reply #14 on: September 05, 2015, 11:03:41 am »
Is it possible to use a function as second argument?

Here is an example of what I mean, using RegExpr.

It's a function which converts

Code: [Select]
rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R
to

Code: [Select]
rnbqkbnr
pp-ppppp
--------
--p-----
----P---
-----N--
PPPP-PPP
RNBQKB-R
My projects are on Gitlab and on Codeberg.

 

TinyPortal © 2005-2018