
Author Topic: match all occurrences of 'cat' in a string  (Read 4135 times)

erosolmiz

  • Newbie
  • Posts: 2
match all occurrences of 'cat' in a string
« on: April 30, 2019, 06:01:31 pm »
Hi,
I only get one 'cat' match, although the sequence occurs 7 times in the string.
I need a listing of every 'cat' together with its position.
Here is the test:
Code: Pascal
program project1;

uses
  regexpr;

var
  regex:TRegExpr;
  i:Integer;
begin
regex:=TRegExpr.Create;
Regex.ModifierG := true;
try
  regex.Expression:='(cat)';
  regex.Exec('thecatibthezcatzbwcatwcatacatycatoidxcatzokay');
  writeln(regex.SubExprMatchCount);
  for i:=1 to regex.SubExprMatchCount do
    writeln(regex.Match[i]);
    writeln(regex.MatchPos[i]);
finally

  regex.Free;
end;
end.
thanks

bytebites

  • Hero Member
  • *****
  • Posts: 640
Re: match all occurrences of 'cat' in a string
« Reply #1 on: April 30, 2019, 06:54:00 pm »
Code: Pascal
program project1;

{$mode objfpc}{$H+}  // classes and try..finally need objfpc (or delphi) mode

uses
  regexpr;

var
  regex:TRegExpr;
begin
  regex:=TRegExpr.Create;
  regex.ModifierG := true;
  try
    regex.Expression:='(cat)';
    // Exec finds the first match; each ExecNext continues after the previous match
    if regex.Exec('thecatibthezcatzbwcatwcatacatycatoidxcatzokay') then repeat
      writeln(regex.MatchPos[0]);  // index 0 is the whole match, not a subexpression
    until not regex.ExecNext;
  finally
    regex.Free;
  end;
end.

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: match all occurrences of 'cat' in a string
« Reply #2 on: April 30, 2019, 07:03:26 pm »
Or without regex:
Code: Pascal
program Project1;

{$mode objfpc}{$H+}

procedure ShowPositionsOfSubstring(const aSub: String; aString: String);
var
  p, len, last, count: SizeInt;
  original: String;
begin
  len := Length(aSub);
  last := 0;   // number of characters already cut from the front of aString
  count := 0;
  p := Pos(aSub, aString);
  case p of
    0: WriteLn('Zero occurrences of ',aSub,' in ',aString);
    else begin
           original := aString;
           repeat
             WriteLn(aSub,' occurs at position ',p + last);
             Inc(count);
             // Move p to the last character of this match. Using Inc(p, len)
             // here would drop one extra character and miss adjacent matches
             // such as the second 'cat' in 'catcat'.
             Inc(p, len - 1);
             aString := Copy(aString, Succ(p), MaxInt);
             Inc(last, p);
             p := Pos(aSub, aString);
           until (p = 0) or (Length(aString) < len);
           WriteLn(aSub,' occurs ',count,' times in ',original);
         end;
  end;
end;

begin
  ShowPositionsOfSubstring('cat', 'thecatibthezcatzbwcatwcatacatycatoidxcatzokay');
  ReadLn;
end.

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: match all occurrences of 'cat' in a string
« Reply #3 on: April 30, 2019, 07:16:54 pm »
Or use PosEx (instead of repeated copying of substrings).

Bart
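
A minimal sketch of the PosEx suggestion, assuming the StrUtils unit that ships with FPC. The program name and the constants Sub and Haystack are made up for illustration. Each search starts right after the previous match, so the string is never copied:

```pascal
program PosExDemo;

{$mode objfpc}{$H+}

uses
  StrUtils;

const
  Sub = 'cat';
  Haystack = 'thecatibthezcatzbwcatwcatacatycatoidxcatzokay';

var
  p: SizeInt;
  count: Integer = 0;
begin
  // PosEx returns 0 when there is no further match at or after the offset
  p := PosEx(Sub, Haystack, 1);
  while p > 0 do
  begin
    WriteLn(Sub, ' occurs at position ', p);
    Inc(count);
    // Continue after the current match (non-overlapping search)
    p := PosEx(Sub, Haystack, p + Length(Sub));
  end;
  WriteLn(Sub, ' occurs ', count, ' times');
end.
```

To count overlapping matches as well, continue at p + 1 instead of p + Length(Sub).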

Thaddy

  • Hero Member
  • *****
  • Posts: 14371
  • Sensorship about opinions does not belong here.
Re: match all occurrences of 'cat' in a string
« Reply #4 on: April 30, 2019, 07:19:20 pm »
Or Bart wants a RegExp? :o
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

dsiders

  • Hero Member
  • *****
  • Posts: 1080
Re: match all occurrences of 'cat' in a string
« Reply #5 on: April 30, 2019, 07:31:48 pm »
Quote from: Thaddy
Or Bart wants a RegExp? :o

I had a problem to solve, so I decided to use regular expressions. Now I have two problems. :)
Preview Lazarus 3.99 documentation at: https://dsiders.gitlab.io/lazdocsnext

BobDog

  • Sr. Member
  • ****
  • Posts: 394
Re: match all occurrences of 'cat' in a string
« Reply #6 on: April 30, 2019, 10:19:50 pm »
I did something similar a while back.
It keeps the positions in an array.
Code: Pascal
program tallydemo;

{$mode objfpc}{$H+}

type
  intArray = array of integer;

{ ========= number of partstring in somestring ============= }
{ arr[0] receives the tally, arr[1..tally] the 1-based match positions }
function tally(const somestring, partstring: ansistring; var arr: intArray): integer;
var
  i, j, ln, lnp, count: integer;
  filler, found: boolean;
  somestringp, partstringp: pchar;
begin
  ln := length(somestring);
  lnp := length(partstring);
  if (lnp = 0) or (Pos(partstring, somestring) = 0) then
  begin
    setlength(arr, 1);
    arr[0] := 0; // no matches
    exit(0);
  end;
  somestringp := pchar(somestring); // speed for big strings
  partstringp := pchar(partstring);
  count := 0;
  // first pass (filler = false) counts, second pass (filler = true) fills arr
  for filler := false to true do
  begin
    count := 0;
    i := 0;
    while i <= ln - lnp do
    begin
      found := true;
      for j := 0 to lnp - 1 do
        if somestringp[i + j] <> partstringp[j] then
        begin
          found := false;
          break;
        end;
      if found then
      begin
        inc(count);
        if filler then arr[count] := i + 1;
        inc(i, lnp); // skip the whole match
      end
      else
        inc(i);
    end;
    if not filler then
    begin
      // size is now known; one extra slot because arr[0] holds the tally
      setlength(arr, count + 1);
      arr[0] := count;
    end;
  end;
  result := count;
end; {tally}

procedure show(const arr: array of integer);
var
  i: integer;
begin
  if arr[0] = 0 then exit;
  writeln('Positions:');
  for i := 1 to arr[0] do
  begin
    write(arr[i]);
    if i < arr[0] then write(',');
  end;
  writeln;
end; {show}

var
  arr: intArray;
  s, d: ansistring;

begin
  s := 'thecatibthezcatzbwcatwcatacatycatoidxcatzokay';
  d := 'cat';
  writeln(s);
  writeln('Tally of ', d, ' ', tally(s, d, arr));
  show(arr);
  writeln('Press enter to end');
  readln;
end.

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: match all occurrences of 'cat' in a string
« Reply #7 on: April 30, 2019, 11:41:45 pm »
Universal solution for any Pascal dialect, without use of any unit other than system:

Code: Pascal
function Count(What, InWhat: String): Integer;
var
  i, k: Integer;
  NoCigar: Boolean;
begin
  Result := 0;
  if Length(What) > Length(InWhat) then Exit;
  if Length(What) = 0 then Exit;
  for i := 1 to Length(InWhat) - Length(What) + 1 do
  begin
    NoCigar := False;
    for k := 1 to Length(What) do
    begin
      if What[k] <> InWhat[k+i-1] then
      begin
        NoCigar := True;
        Break;
      end;
    end;
    if not NoCigar then Inc(Result);
  end;
end;

Bart

ASerge

  • Hero Member
  • *****
  • Posts: 2241
Re: match all occurrences of 'cat' in a string
« Reply #8 on: May 01, 2019, 01:45:34 am »
Quote from: Bart
function Count(What, InWhat: String): Integer; [...]
1. if Length(What) > Length(InWhat) then Exit; is unnecessary.
2. Passing the parameters as const is a little more efficient, and so is testing (What = '') instead of (Length(What) = 0).
3. On an x64 platform, Integer may be too small for indexing.
4. The result is incorrect if part of the string overlaps an already found fragment (see the example below). In that case you need to skip the entire "What".
Code: Pascal
{$APPTYPE CONSOLE}
{$MODE OBJFPC}
{$LONGSTRINGS ON}

function Count(const What, Where: string): Integer;

  function IsEqualFromPos(Pos: SizeInt): Boolean;
  var
    i: SizeInt;
  begin
    // Comparing from the end usually detects a difference faster
    for i := Length(What) downto 1 do
      if What[i] <> Where[i + Pos - 1] then
        Exit(False);
    Result := True;
  end;

var
  CurPos, LastPos: SizeInt;
begin
  Result := 0;
  if What = '' then
    Exit;
  CurPos := 1;
  LastPos := Length(Where) - Length(What) + 1;
  while CurPos <= LastPos do
    if IsEqualFromPos(CurPos) then
    begin
      Inc(Result);
      Inc(CurPos, Length(What));  // skip the whole match: no overlapping hits
    end
    else
      Inc(CurPos);
end;

begin
  Writeln(Count('cat', 'the cat sits at home because it''s raining cats and dogs')); // 2
  Writeln(Count('zz', 'zzzzz')); // 2
  Writeln(Count('1', '121')); // 2
  Writeln(Count('', 'zzzz')); // 0
  Writeln(Count('1', '')); // 0
  Writeln(Count('1', 'zz')); // 0
  Readln;
end.

erosolmiz

  • Newbie
  • Posts: 2
Re: match all occurrences of 'cat' in a string
« Reply #9 on: May 01, 2019, 10:25:09 am »
Thanks to all, very nice examples and variations.
This regex finds 'cat' followed by z or i (to test it, use bytebites' code listed above):
regex.Expression:='cat[z|i]';
But if I use the classic positive lookahead, regex.Expression:='cat(?=z|i)';
I get errors.
And what if we want cat NOT followed by z or i?
The usual regex for that is the negative lookahead cat(?!z|i), but Lazarus gives me errors. I have tried different ways, such as
regex.Expression:='cat(?!z|i)';
I get the same errors with the negative lookbehind   (?<!z|i)cat
All these regexes were tested successfully with online regex testers.
Thanks
« Last Edit: May 01, 2019, 10:27:50 am by erosolmiz »

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: match all occurrences of 'cat' in a string
« Reply #10 on: May 01, 2019, 12:21:20 pm »
Quote from: ASerge
1. if Length(What) > Length(InWhat) then Exit; unnecessarily.
Fast way out.

Quote from: ASerge
4. Incorrect result if the part of the string matches the already found fragment (see example below). In this case, you need to skip the entire "What".

That's a matter of definition.
If InWhat = 'aaaa'
and What = 'aa'
I would say that 'aa' occurs 3 times in 'aaaa': at positions 1, 2 and 3.

If you don't agree with this, use a while loop instead of the for loop and increment i by Length(What) whenever NoCigar = False.
This could be controlled by a parameter, of course.
Case sensitivity could also be implemented and controlled by a parameter.

I leave that as an exercise for the topic starter.

As a final remark: none of the code posted above (including my own) is enterprisey enough.
This needs a dedicated factory with abstract classes, configurable via XML (make that JSON, it's more of a hype now than XML), and it needs to support all string types and character encodings (including Morse, EBCDIC etc.) anyone can imagine.

I challenge Thaddy to implement it that way  O:-)

Bart
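
One way the two parameters could look, as a rough sketch rather than a finished routine. The name CountOccurrences and both flags are hypothetical, and the case-insensitive branch relies on SysUtils.LowerCase, which only folds ASCII letters:

```pascal
program CountDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical variant of Count with the two switches discussed above
function CountOccurrences(const What, InWhat: string;
  AllowOverlap: Boolean = True; CaseSensitive: Boolean = True): SizeInt;
var
  Needle, Hay: string;
  i, k: SizeInt;
  Same: Boolean;
begin
  Result := 0;
  if What = '' then Exit;
  Needle := What;
  Hay := InWhat;
  if not CaseSensitive then
  begin
    // ASCII-only case folding
    Needle := LowerCase(Needle);
    Hay := LowerCase(Hay);
  end;
  i := 1;
  while i <= Length(Hay) - Length(Needle) + 1 do
  begin
    Same := True;
    for k := 1 to Length(Needle) do
      if Needle[k] <> Hay[i + k - 1] then
      begin
        Same := False;
        Break;
      end;
    if Same then
    begin
      Inc(Result);
      // Step one character to allow overlaps, or skip the whole match
      if AllowOverlap then Inc(i) else Inc(i, Length(Needle));
    end
    else
      Inc(i);
  end;
end;

begin
  WriteLn(CountOccurrences('aa', 'aaaa'));                    // 3 (overlapping)
  WriteLn(CountOccurrences('aa', 'aaaa', False));             // 2 (non-overlapping)
  WriteLn(CountOccurrences('CAT', 'thecatCat', True, False)); // 2 (case-insensitive)
end.
```

Defaulting AllowOverlap to True keeps the behaviour of the for-loop version; passing False gives the skip-the-match behaviour.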

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: match all occurrences of 'cat' in a string
« Reply #11 on: May 03, 2019, 07:13:30 am »
Quote from: erosolmiz
but if we use the classic Positive lookahead: regex.Expression:='cat(?=z|i)';
i get errors
...
it is negative lookahead i get errors by Lazarus, i have tried different ways such as
regex.Expression:='cat(?!z|i)';
the same errors with Negative lookbehind   (?<!z|i)cat
all these regexes tested successfully with the online regex testers
thanks
None of the lookahead features is implemented by TRegExpr (the implemented ones are documented here).
Quote from: erosolmiz
also if we want cat NOT followed by z or i ?
the usual regex is like this: cat(?!z|i)
I don't usually use that; cat[^zi] is my pattern of choice instead. No need to use a fancy feature if a simple one works ;)
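
A small sketch for trying the negated-class idea with the Exec/ExecNext loop from earlier in the thread. Two points are general regex behaviour rather than anything specific to this thread: [^zi] consumes one character, so a 'cat' at the very end of the text is only found if an end-of-text alternative is added (done here with |$, an addition of mine), and inside a character class | is a literal, so [zi] is the usual spelling rather than [z|i]. The program name and the TestText constant are made up:

```pascal
program NotFollowedDemo;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  TestText = 'thecatibthezcatzbwcatwcatacatycatoidxcatzokay';

var
  re: TRegExpr;
begin
  re := TRegExpr.Create;
  try
    // 'cat' followed by anything except z or i, or 'cat' at the end of the text
    re.Expression := 'cat([^zi]|$)';
    if re.Exec(TestText) then
      repeat
        // MatchPos[0] is the start of the whole match, i.e. of 'cat'
        WriteLn('cat at position ', re.MatchPos[0]);
      until not re.ExecNext;
  finally
    re.Free;
  end;
end.
```

Because the character after 'cat' is part of the match, two directly adjacent occurrences such as 'catcat' would report only the first one. A real lookahead would not have that limitation.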

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: match all occurrences of 'cat' in a string - ( Problem )
« Reply #12 on: June 12, 2019, 11:07:10 pm »
I passed in the following string;

[KPHX][123548]Niles][Nil][United States][62.458][-123.458]

and it returned 2.   I was expecting 1.



Quote from: Bart
Universal solution for any Pascal dialect, without use of any unit other than system:
function Count(What, InWhat: String): Integer; [...]
FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: match all occurrences of 'cat' in a string - ( Problem )
« Reply #13 on: June 12, 2019, 11:49:55 pm »
Quote from: JLWest
I passed in the following string;

[KPHX][123548]Niles][Nil][United States][62.458][-123.458]

and it returned 2.   I was expecting 1.

I see two:
[KPHX][123548]Niles][Nil][United States][62.458][-123.458]

and here:
[KPHX][123548]Niles][Nil][United States][62.458][-123.458]

and here as well:
[KPHX][123548]Niles][Nil][United States][62.458][-123.458]

or maybe this:
[KPHX][123548]Niles][Nil][United States][62.458][-123.458]

Which one?

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: match all occurrences of 'cat' in a string
« Reply #14 on: June 13, 2019, 02:02:29 am »
@engkin

The call was made:
Var
  NilCount : Integer;

Begin
  NilCount := Count('Nil', '[KPHX][123548]Niles][Nil][United States][62.458][-123.458]');

So I think you would expect a return value of 1.
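
A plain substring count cannot tell 'Nil' from the start of 'Niles', which is why 2 comes back. If the intent is to count only the bracketed field, one option is to include the delimiters in the search string. This is a sketch under that assumption; CountSub is a hypothetical helper built on PosEx from StrUtils, not code from the thread:

```pascal
program NilCountDemo;

{$mode objfpc}{$H+}

uses
  StrUtils;

const
  Line = '[KPHX][123548]Niles][Nil][United States][62.458][-123.458]';

// Hypothetical helper: counts non-overlapping occurrences of What in InWhat
function CountSub(const What, InWhat: string): SizeInt;
var
  p: SizeInt;
begin
  Result := 0;
  if What = '' then Exit;
  p := PosEx(What, InWhat, 1);
  while p > 0 do
  begin
    Inc(Result);
    p := PosEx(What, InWhat, p + Length(What));
  end;
end;

begin
  WriteLn(CountSub('Nil', Line));    // 2: once inside 'Niles', once in '[Nil]'
  WriteLn(CountSub('[Nil]', Line));  // 1: only the bracketed field
end.
```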


 
