Recent

Author Topic: RegExpr freeze  (Read 829 times)

Fahmy Rofiq

  • Jr. Member
  • **
  • Posts: 80
RegExpr freeze
« on: November 25, 2021, 06:09:35 am »
Please help,
This code freeze my app.

Code: Pascal  [Select][+][-]
  1.   R := TRegExpr.Create;
  2.   try
  3.     R.Expression:= '.+ (.+)\.(.+) GAGAL\.(.+)\. Sa.+ (.+) @';
  4.     R.InputString:= 'Rp.20.020 @21/11 18:35:04 =Sukses'#$0A'I25.085785721252 Rp.24.950 @21/11 18:56:58 =Sukses'#$0A'SP100.082164017112 Rp.96.675 @21/11 19:12:15 =Sukses'#$0A'CEKPLNP.32104710689 Rp.0 @21/11 19:16:45 =Sukses'#$0A'PLNS20.32104710689 Rp.20.020 @21/11 19:16:58 =Sukses'#$0A'HOTSP2.087788450220 Rp.11.150 @21/11 19:45:42 =Sukses'#$0A'I50.085783334145 Rp.49.300 @21/11 19:55:46 =Gagal'#$0A'IP50.085783334145 Rp.48.825 @21/11 19:57:05 =Sukses'#$0A'X10.087889018070 Rp.10.750 @21/11 19:58:32 =Sukses'#$0A'VIFI3.085620102211 Rp.24.300 @21/11 20:10:36 =Sukses'#$0A'CEKPLNP.511820312863 Rp.0 @21/11 20:33:24 =Sukses'#$0A'PLNS20.511820312863 Rp.20.020 @21/11 20:33:31 =Sukses'#$0A'SP10.082231983818 Rp.10.310 @21/11 20:37:08 =Sukses'#$0A'VIFI10.085620375811 Rp.47.850 @21/11 20:38:04 =Sukses'#$0A'IDFI10.085785674040 Rp.47.250 @21/11 20:39:31 =Sukses'#$0A'AXDP1.083176152220 Rp.12.800 @21/11 20:40:16 =Sukses'#$0A'VAM1.083821032311 Rp.7.700 @21/11 21:03:27 =Sukses'#$0A'VSFN6.088821060811 Rp.28.200 @21/11 21:06:13 =Sukses'#$0A'CEKPLNP.14289371883 Rp.0 @21/11 21:10:47 =Sukses'#$0A'PLNS20.14289371883 Rp.20.020 @21/11 21:10:55 =Sukses'#$0A'I20.085736474245 Rp.19.920 @21/11 21:12:09 =Sukses'#$0A'I5.085707312468 Rp.5.850 @21/11 21:14:06 =Sukses'#$0A'SP25.081330256164 Rp.24.865 @21/11 21:15:22 =Sukses'#$0A'SP10.081330256164 Rp.10.310 @21/11 21:15:57 =Sukses'#$0A'CEKPLNP.511820710359 Rp.0 @21/11 21:27:45 =Sukses'#$0A'PLNS50.511820710359 Rp.50.020 @21/11 21:27:56 =Sukses'#$0A'X5.083834182370 Rp.5.810 @21/11 21:29:25 =Sukses'#$0A'CEKPLNP.56210024081 Rp.0 @21/11 21:40:53 =Sukses'#$0A'PLNS20.56210024081 Rp.20.020 @21/11 21:41:11 =Sukses'#$0A'MAP4.0895366492293 Rp.3.500 @21/11 21:54:13 =Sukses'#$0A'SP10.082169403831 Rp.10.310 @21/11 21:56:04 =Sukses'#$0A'AIGO8.083821570311 Rp.56.850 @21/11 21:57:09 =Sukses'#$0A'X10.087751660354 Rp.10.750 @21/11 22:03:24 =Sukses'#$0A'SP25.085334597049 Rp.24.865 @21/11 22:08:43 =Sukses'#$0A'SP15.08133808550 Rp.14.880 @21/11 22:10:45 =Gagal'#$0A'SP15.081338808550 Rp.14.880 @21/11 22:11:16 =Sukses'#$0A'SM10.088996450900 Rp.9.950 @21/11 22:49:53 =Sukses'#$0A'SP10.081286394373 Rp.10.310 @21/11 22:52:38 =Gagal'#$0A'S10.081286349373 Rp.10.500 @21/11 22:53:17 =Sukses'#$0A'T5.0895335171927 Rp.5.285 @21/11 23:22:12 =Sukses'#$0A'CEKPLNP.32021831857 Rp.0 @21/11 23:41:41 =Sukses'#$0A'PLNS20.32021831857 Rp.20.020 @21/11 23:41:46 =Sukses'#$0A'CEKPLNP.32021831857 Rp.0 @22/11 02:36:50 =Sukses'#$0A'SP10.081330959088 Rp.10.310 @22/11 07:01:11 =Sukses'#$0A'I25.085748608750 Rp.24.950 @22/11 08:14:56 =Sukses'#$0A'SP25.081331109400 Rp.24.865 @22/11 08:17:42 =Sukses'#$0A'TRDA22.089676907229 Rp.59.250 @22/11 08:39:09 =Sukses'#$0A'SP20.081233829042 Rp.19.975 @22/11 10:22:35 =Sukses'#$0A'I10.085762218826 Rp.10.775 @22/11 10:38:33 =Sukses'#$0A'VXL8.087810385711 Rp.38.100 @22/11 10:39:12 =Sukses'#$0A'TDF1.082133868564 Rp.14.000 @22/11 11:06:02 =Sukses'#$0A'CEKPLNP.14414637059 Rp.0 @22/11 10:58:27 =Sukses'#$0A'PLNS20.14414637059 Rp.20.020 @22/11 10:58:43 =Sukses'#$0A'VIY7.085611331911 Rp.10.200 @22/11 11:33:24 =Sukses'#$0A'VIFI2.085611332411 Rp.15.000 @22/11 11:33:39 =Sukses'#$0A'SP10.081352326723 Rp.10.310 @22/11 11:46:50 =Sukses'#$0A'IP15.085745822016 Rp.15.000 @22/11 12:12:13 =Sukses'#$0A'HOTSP3.087751632740 Rp.8.575 @22/11 12:14:27 =Sukses'#$0A'I10.085608369109 Rp.10.775 @22/11 12:27:17 =Sukses'#$0A'I10.085604334516 Rp.10.775 @22/11 12:26:56 =Sukses'#$0A'VIFI3.085612242411 Rp.24.300 @22/11 12:24:29 =Sukses'#$0A'I10.085730194383 Rp.10.775 @22/11 12:51:55 =Sukses'#$0A'SM50.08883685535 Rp.49.000 @22/11 12:55:56 =Sukses'#$0A'XA15.081997976808 Rp.14.850 @22/11 13:39:52 =Sukses'#$0A'I5.085730384822 Rp.5.850 @22/11 13:53:08 =Sukses'#$0A'VMJT8.085213532511 Rp.56.000 @22/11 13:53:49 =Sukses'#$0A'CEKPLNP.14005491643 Rp.0 @22/11 13:56:00 =Sukses'#$0A'PLNS100.14005491643 Rp.100.020 @22/11 13:56:16 =Sukses'#$0A'CEKPLNP.56210026474 Rp.0 @22/11 16:40:58 =Sukses'#$0A'PLNS20.56210026474 Rp.20.020 @22/11 16:41:10 =Sukses'#$0A'PLNS20.32021831857 Rp.20.020 @22/11 16:59:22 =Sukses'#$0A'CEKPLNP.32169475319 Rp.0 @22/11 17:15:34 =Sukses'#$0A'PLNS20.32169475319 Rp.20.020 @22/11 17:15:50 =Sukses'#$0A'CEKPLNP.32169475285 Rp.0 @22/11 17:16:02 =Sukses'#$0A'PLNS20.32169475285 Rp.20.020 @22/11 17:16:10 =Sukses'#$0A'X10.081997976808 Rp.10.750 @22/11 17:24:11 =Sukses'#$0A'AIGO1.083817332611 R';
  5.     R.Exec;
  6.   finally
  7.     R.Free;
  8.   end;
Lazarus Trunk + FPC Fixes 32bit
Windows 10 x64

Roland57

  • Full Member
  • ***
  • Posts: 199
Re: RegExpr freeze
« Reply #1 on: November 25, 2021, 08:04:27 am »
Hello! What would be surprising is if the function managed to find a result. Your expression is too vague. Concretely, there are too many dots. You should not use the dot, or very little. You should use more determined classes, like '\d' or '\w'. And also more determined operators for the number of characters, like '{n,n}'. Otherwise you give to the computer an impossible task. Do you see what I mean?

If you tell us which substring you try to capture, we could help you to find a more efficient expression.

Fahmy Rofiq

  • Jr. Member
  • **
  • Posts: 80
Re: RegExpr freeze
« Reply #2 on: November 25, 2021, 08:22:17 am »
I dont expect a result from that input string.

Expression '.+ (.+)\.(.+) GAGAL\.(.+)\. Sa.+ (.+) @'
is for input like '@REFID: #TRXID:58935149 TDF1.082133868564 GAGAL. . Saldo 817.794 @12:08 * KOMPLAIN MAX H+7'
i can get: 'TDF1','082133868564',' ','817.794'

But sometimes i get unexpected input from server like example above.
From that example i expect it will result FALSE for R.Exec but its not.
The function never return.

Sorry for my English, not my native language.
Lazarus Trunk + FPC Fixes 32bit
Windows 10 x64

MarkMLl

  • Hero Member
  • *****
  • Posts: 3456
Re: RegExpr freeze
« Reply #3 on: November 25, 2021, 09:09:04 am »
I'm troubled by those embedded newlines. I suspect you also need to check whether your match is case-insensitive.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Roland57

  • Full Member
  • ***
  • Posts: 199
Re: RegExpr freeze
« Reply #4 on: November 25, 2021, 09:35:55 am »
Expression '.+ (.+)\.(.+) GAGAL\.(\s+)\. Sa.+ (.+) @'
is for input like '@REFID: #TRXID:58935149 TDF1.082133868564 GAGAL. . Saldo 817.794 @12:08 * KOMPLAIN MAX H+7'
i can get: 'TDF1','082133868564',' ','817.794'

So (if I correctly understood your problem) your expression should be something like this:

Code: Pascal  [Select][+][-]
  1. begin
  2.   R := TRegExpr.Create;
  3.   try
  4.     R.Expression := '(\w\w\w\d)\.(\d{12}) GAGAL\.(\s+)\. Sa\w+ (\d{3}\.\d{3}) @';
  5.     R.InputString := '@REFID: #TRXID:58935149 TDF1.082133868564 GAGAL. . Saldo 817.794 @12:08 * KOMPLAIN MAX H+7';
  6.     if R.Exec then
  7.     begin
  8.       WriteLn('[', R.Match[1], ']');
  9.       WriteLn('[', R.Match[2], ']');
  10.       WriteLn('[', R.Match[3], ']');
  11.       WriteLn('[', R.Match[4], ']');
  12.     end;
  13.   finally
  14.     R.Free;
  15.   end;
  16. end.

Do you see the idea?
« Last Edit: November 25, 2021, 01:20:34 pm by Roland57 »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 7507
  • Debugger - SynEdit - and more
    • wiki
Re: RegExpr freeze
« Reply #5 on: November 25, 2021, 11:07:24 am »
I haven't tested if the regex ever finishes, but...

Given your regexp
Quote
Code: Pascal  [Select][+][-]
  1. R.Expression:= '.+ (.+)\.(.+) GAGAL\.(.+)\. Sa.+ (.+) @';

And a string to match with over 4000 chars... => This regex will take very long (afaik).

Depending on how exactly the regex is implemented, this may go as follows (I have not verified this, just an assumption / a guess).
=> That is, if the regex engine does not do any optimizations to the pattern.

.+   Will find every space (starting with the last) => This are over 300 spaces
=> The entire rest of the regex, must run over 300 times (unless it matches a result)

(.+)\. Will find dots. => there are 216 dots.
=> Note that any dot that was found after the last space, will be found again after any other space.
     So in the first of the above 300 loops, none may be found.
     But going back for each of that 300 loops increasingly more dots are to be found, until all of them are found.
=> Assuming an equal distribution (not checked), that makes an average of 100 dots found, in each of the above 300 loops.
     The rest of the regex therefore runs 30000 times.

(.+) GAGAL\. <space>GAGAL does not exist. But that needs to be checked 30000 times on substring of your text.
  Each substrings having a length between 1 and 4000, avg 2000.
  => so 30000 * 2000 comparisons are required to check this.
  => That are 60000000 = 60 million checks.



About optimizations: Perl's regex (which is highly optimized) scans the above in no time at all. However, only if case-sensitive.
Specifying the "i" modifier, and it takes forever too.
« Last Edit: November 25, 2021, 11:12:47 am by Martin_fr »

Warfley

  • Hero Member
  • *****
  • Posts: 596
Re: RegExpr freeze
« Reply #6 on: November 25, 2021, 11:24:07 am »
A regex is basically just a description of a DFA. A DFA matches in linear time O(N).
If you do a search, i.e. the match can start at any point, you need to implement backtracking, therefore you have O(N²) (even though it could be implemented as .*REGEX which would again be in O(N) but it would explode the state space of the DFA, trading runtime for memory)

So it is still pretty efficient. If it freezes your computer for a long time, thats probably an issue with the implementation
About optimizations: Perl's regex (which is highly optimized) scans the above in no time at all. However, only if case-sensitive.
Specifying the "i" modifier, and it takes forever too.

You shouldn't compare to perl regex, perl regex is an absolute beast, it is more powerful than the normal regex (it can't be implemented using only a DFA) and can be used to build expressions that don't terminate on certain strings

But you are right, there are some DFA optimizations that can help massively
« Last Edit: November 25, 2021, 11:31:57 am by Warfley »

Jurassic Pork

  • Hero Member
  • *****
  • Posts: 1063
Re: RegExpr freeze
« Reply #7 on: November 25, 2021, 01:43:32 pm »
hello,
to avoid a freezing regex you can use an if with and on  multiple conditions and with the more complex condition at the end.
the if failed on the first false condition.
Example :
Code: Pascal  [Select][+][-]
  1. if (pos('GAGAL',R.InputString) > 0 ) and  R.Exec then    

Friendly, J.P
Jurassic computer : Sinclair ZX81 - Zilog Z80A à 3,25 MHz - RAM 1 Ko - ROM 8 Ko

Alextp

  • Hero Member
  • *****
  • Posts: 1469
    • UVviewsoft
Re: RegExpr freeze
« Reply #8 on: November 25, 2021, 02:30:49 pm »
Instead of .+ use \S+ (\S means not space), because dot also finds the space-chars.

Alextp

  • Hero Member
  • *****
  • Posts: 1469
    • UVviewsoft
Re: RegExpr freeze
« Reply #9 on: November 25, 2021, 02:34:29 pm »

 

TinyPortal © 2005-2018