Lazarus

Programming => General => Topic started by: maurobio on December 07, 2019, 09:11:03 pm

Title: [SOLVED] Extracting substrings from strings
Post by: maurobio on December 07, 2019, 09:11:03 pm
Dear ALL,

I have long strings of digits and symbols like this:

Code: Pascal
S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';

Each single digit or symbol counts as a single position. The digits enclosed between '(' and ')' should also count as a single position, therefore the above string should measure 85 positions (instead of 94, which is the length of the whole string).
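
The counting rule above can be sketched as a small function. This is only a minimal sketch that assumes well-formed input (every '(' has a matching ')'); CountPositions is a hypothetical helper name, not an RTL routine.

```pascal
program PositionCount;
{$mode objfpc}{$H+}

// Hypothetical helper: counts logical positions. Each character outside
// parentheses is one position; each '(...)' group is one position in total.
function CountPositions(const S: string): Integer;
var
  I: Integer;
  InGroup: Boolean;
begin
  Result := 0;
  InGroup := False;
  for I := 1 to Length(S) do
    case S[I] of
      '(': InGroup := True;
      ')': begin
             InGroup := False;
             Inc(Result);        // the whole group counts once
           end;
    else
      if not InGroup then
        Inc(Result);             // ordinary single-character position
    end;
end;

const
  S = '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
begin
  WriteLn(Length(S));            // 94: raw string length
  WriteLn(CountPositions(S));    // 85: three 4-character groups count once each
end.
```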

I tried to use the ExtractDelimited function from the StrUtils unit, but strangely the statement below:

Code: Pascal
C := ExtractDelimited(8, S, ['(', ')']);

returns nothing where I would expect it to return '(12)'.

Could someone give me a hand?

Thanks in advance!

Title: Re: Extracting substrings from strings
Post by: jamie on December 07, 2019, 09:19:23 pm
The help file states that the first parameter is a WORD index, not a character index,
so try this:

ExtractDelimited(1, s, ['(',')']);

Title: Re: Extracting substrings from strings
Post by: maurobio on December 07, 2019, 09:31:59 pm
@jamie,

This does not work. It returns '1110-00' (the beginning of the string, immediately before the first target substring).

Similarly, using

Code: Pascal
ExtractDelimited(9, S, ['(', ')'])

(where 9 is the index of the first '(') returns the whole string after the last target string.

BTW, what the definition of ExtractDelimited (https://www.freepascal.org/docs-html/rtl/strutils/extractdelimited.html) says is:

Quote
Extract the N-th delimited part from a string.

Anyway, I am not even sure if this approach is the most adequate for this problem.

Cheers,
Title: Re: Extracting substrings from strings
Post by: winni on December 07, 2019, 10:00:06 pm
Hi!

Before reading manuals for hours, I would do it the simple way:
Code: Pascal
var
  p, q: integer;

p := Pos('(', s);
q := Pos(')', s);
Wanted := Copy(s, p, q - p + 1);  // first group, including both brackets

Winni
Title: Re: Extracting substrings from strings
Post by: jamie on December 07, 2019, 10:09:08 pm
Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
const
  S = '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
begin
  // Text between the first '(' and the first ')', brackets excluded
  Caption := Copy(S, Pos('(', S) + 1, Pos(')', S) - Pos('(', S) - 1);
end;
Title: Re: Extracting substrings from strings
Post by: winni on December 07, 2019, 10:13:03 pm
I did not know that this was a competition over who writes the best C-style code in Pascal.

Winni
Title: Re: Extracting substrings from strings
Post by: jamie on December 07, 2019, 10:17:35 pm
Me neither, just timing I guess.

But for what he is doing, he would be better off using Split with only '(' as the delimiter.
Then, on each resulting piece, one can use Split again on ')'; after that, there should be a list of all the numbers between the parentheses.  :)
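
The two-step Split idea could look roughly like the sketch below. It assumes FPC 3.x, where SysUtils provides the Split string helper and the TStringArray type; the variable names are illustrative only.

```pascal
program SplitTwice;
{$mode objfpc}{$H+}
uses
  SysUtils;

var
  Src: string;
  Pieces, Inner: TStringArray;
  I: Integer;
begin
  Src := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';

  // Step 1: split on '(' only. Pieces[0] is the text before the first '(',
  // so every later piece starts with the contents of a group.
  Pieces := Src.Split(['(']);

  for I := 1 to High(Pieces) do
    if Pos(')', Pieces[I]) > 0 then
    begin
      // Step 2: split that piece on ')'; the first part is the group contents.
      Inner := Pieces[I].Split([')']);
      if Length(Inner) > 0 then
        WriteLn(Inner[0]);
    end;
end.
```

For the sample string this should print 12, 01 and 12, one per line. A piece without a ')' (an unclosed group) is silently skipped here; whether that is the right reaction depends on the application.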
Title: Re: Extracting substrings from strings
Post by: maurobio on December 07, 2019, 10:23:19 pm
@jamie,

OK, but this only gives me the first delimited substring. How do I get the others, till the end of the original string?

Cheers,
Title: Re: Extracting substrings from strings
Post by: jamie on December 07, 2019, 10:30:08 pm
That is what you asked for... the first one.

In any case, you can use Split on the string:
Code: Pascal
var A :Array of string;
Begin
  A := S.Split(['(',')']);
  If Length(A) > 3 Then   // make sure index 3 exists before reading it
    Caption := A[3];
End;


Every odd index gives you the contents of a (...) group.

Every even index gives you the content that comes before it.

So
A[0] is the content before the first (12);
A[1] is the 12 from inside (12);
A[2] is the content before the next (..);
A[3] etc.
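
To collect every group rather than just A[3], one could walk the odd indexes. This is a sketch under the assumption that the input is balanced and that Split keeps empty entries between adjacent delimiters (which I believe is its default, but it is worth checking on your FPC version); otherwise the odd/even rule no longer holds.

```pascal
program SplitOddItems;
{$mode objfpc}{$H+}
uses
  SysUtils;

var
  Src: string;
  A: TStringArray;
  I: Integer;
begin
  Src := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  A := Src.Split(['(', ')']);

  // Odd indexes hold the text that was inside parentheses.
  I := 1;
  while I < Length(A) do
  begin
    WriteLn('A[', I, '] = ', A[I]);
    Inc(I, 2);
  end;
end.
```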

Title: Re: Extracting substrings from strings
Post by: winni on December 07, 2019, 10:38:11 pm
Hi!

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  s: string;
  st: TStringList;
  p, q: integer;
begin
  s := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  st := TStringList.Create;
  repeat
    p := Pos('(', s);
    q := Pos(')', s);
    if (q > 0) and (p > 0) then
    begin
      st.Add(Copy(s, p, q - p + 1)); { with brackets }
      Delete(s, 1, q);
    end;
  until (p = 0) or (q = 0);
  ShowMessage(st.Text);
  st.Free;
end;

Winni
Title: Re: Extracting substrings from strings
Post by: jamie on December 07, 2019, 10:48:51 pm
Yum, Code!
Title: Re: Extracting substrings from strings
Post by: dbannon on December 08, 2019, 12:43:09 am
or

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
    S, Buff : string;
    PC : pchar;
    I : integer = 0;              // PChar offsets are zero-based, so start at 0
    Bracket : boolean = false;
begin
  s:= '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  Buff := '';
  PC := pchar(S);
  while I < length(s) do begin
    if (PC+i)^ = ')' then begin
        if Buff <> '' then writeln('found one ' + Buff);
        Bracket := False;
    end;
    if (PC+i)^ = '(' then begin
        Buff := '';
        Bracket := True;
    end else
        if Bracket then Buff := Buff + (PC+i)^;
    inc(i);
  end;
end;


Bet winni will find that a bit too C like.  :P

Should, IMHO be a bit faster and could be tweaked a lot. Only important if there is a lot of data to process.

Davo
Title: Re: Extracting substrings from strings
Post by: winni on December 08, 2019, 01:12:48 am
@dbannon

No, I don't fight about some nanoseconds.

When it was necessary because of slow machines (long time ago...) you should have looked into my code. move was my best friend. But today with 4 cores and 8 threads ....

And they are all lazy. Even if I hear radio all day ....

But where are your results????

Winni
Title: Re: Extracting substrings from strings
Post by: 440bx on December 08, 2019, 01:41:58 am
... and how does all that code handle a string where the open and close parentheses are not balanced?
Title: Re: Extracting substrings from strings
Post by: Zvoni on December 08, 2019, 01:53:08 am
I would have expected someone to mention the Copy2Symb function...
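
For reference, a Copy2Symb-based loop might look like the sketch below. It relies only on the documented behaviour of Copy2Symb from StrUtils (return everything before the first occurrence of the symbol, leaving the source string untouched); the loop around it is just one possible way to use it.

```pascal
program Copy2SymbDemo;
{$mode objfpc}{$H+}
uses
  StrUtils;

var
  Rest: string;
  P: Integer;
begin
  Rest := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';

  P := Pos('(', Rest);
  while P > 0 do
  begin
    Delete(Rest, 1, P);               // drop everything up to and including '('
    WriteLn(Copy2Symb(Rest, ')'));    // text before the next ')'
    P := Pos('(', Rest);
  end;
end.
```

For the sample string this prints 12, 01 and 12. If a group is never closed, Copy2Symb returns the whole remainder, so malformed input still needs a separate check.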
Title: Re: Extracting substrings from strings
Post by: winni on December 08, 2019, 01:57:22 am
Quote
... and how does all that code handle a string where the open and close parentheses are not balanced?

Garbage in, garbage out!

Title: Re: Extracting substrings from strings
Post by: dbannon on December 08, 2019, 03:13:11 am
Quote
No, I don't fight about some nanoseconds.

Sorry, old habits. I come from an HPC background. If CPU usage dropped below about 90%, that job could be killed. But I was the sysadmin, not the programmer.
 
Quote
But where are your results????

Results ? You mean -

found one 12
found one 01
found one 12

Not bothering with timing, we both have fast machines but which is faster ?

@440bx - come on, you can see that. In my example it's
Code: Pascal
    ......
    if (PC+i)^ = ')' then begin
        if not Bracket then break;     // new line to handle bad data
    ......
    if (PC+i)^ = '(' then begin
        if Bracket then break;     // new line to handle bad data
    .....

And a similar line at the end to make sure we don't have an unclosed bracket. I had them in initially but removed them trying to make mine smaller than Winni's.  Still did not make it so fell back to saying it would be faster  :P
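
Put together, those checks (a ')' without an open bracket, a second '(' before the first is closed, and an unclosed bracket at the end) could be sketched as a stand-alone validator. IsWellFormed is a hypothetical name, and treating nested parentheses as an error is an assumption based on the sample data.

```pascal
program CheckBrackets;
{$mode objfpc}{$H+}

// Hypothetical helper: True if every '(' is closed by a ')' before the
// next '(' appears and no ')' appears without an open '('.
function IsWellFormed(const S: string): Boolean;
var
  I: Integer;
  InGroup: Boolean;
begin
  InGroup := False;
  for I := 1 to Length(S) do
    case S[I] of
      '(': begin
             if InGroup then Exit(False);      // '(' inside an open group
             InGroup := True;
           end;
      ')': begin
             if not InGroup then Exit(False);  // ')' with nothing open
             InGroup := False;
           end;
    end;
  Result := not InGroup;                       // False if a group is left open
end;

begin
  WriteLn(IsWellFormed('10(12)0'));   // TRUE
  WriteLn(IsWellFormed('10(12'));     // FALSE: never closed
  WriteLn(IsWellFormed('10)12('));    // FALSE: ')' comes first
end.
```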

What you actually do on an error is dependent on the application.  Personally, I'd be more worried about what happens if there is any UTF8 char in there ....

Davo



Title: Re: Extracting substrings from strings
Post by: 440bx on December 08, 2019, 03:38:06 am

Quote
@440bx - come on, you can see that. In my example it's

Code: Pascal
    ......
    if (PC+i)^ = ')' then begin
        if not Bracket then break;     // new line to handle bad data
    ......
    if (PC+i)^ = '(' then begin
        if Bracket then break;     // new line to handle bad data
    .....

I had them in initially but removed them trying to make mine smaller than Winni's.
<snip>
Personally, I'd be more worried about what happens if there is any UTF8 char in there ....

Davo

No wonder I didn't see them, you removed them, therefore they weren't present in your post.  Also, it's not just your code that made me wonder about error handling in case of bad input.

I agree that error handling is application dependent and the "user" didn't state what should be done in case of a malformed string.  I agree with your observation about UTF8.  If correct input consists of something other than only digits and parentheses then it could definitely be a concern.
Title: Re: Extracting substrings from strings
Post by: PaulRowntree on December 08, 2019, 07:27:13 am
Man, where were you guys when I was trying to do _my_ programming assignments?
Title: Re: Extracting substrings from strings
Post by: bytebites on December 08, 2019, 09:15:22 am
Code: Pascal
writeln(ExtractDelimited(2, S, ['(',')']));

gives 12
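
This fits the "N-th delimited part" wording of the documentation: the first argument counts parts between delimiters, not characters. A quick way to see which part each N selects is to print several of them; the loop bound of 8 below is arbitrary.

```pascal
program DelimitedParts;
{$mode objfpc}{$H+}
uses
  SysUtils, StrUtils;

const
  S = '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
var
  N: Integer;
begin
  // Both '(' and ')' act as delimiters, so the parts alternate between
  // text outside the parentheses and text inside them.
  for N := 1 to 8 do
    WriteLn(N, ': "', ExtractDelimited(N, S, ['(', ')']), '"');
end.
```

As reported in this thread, N = 1 gives 1110-00, N = 2 gives 12, and N = 8 gives an empty string.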
Title: Re: Extracting substrings from strings
Post by: Roland57 on December 08, 2019, 09:36:47 am
Hello!

We could use regular expressions.

Code: Pascal
uses
  SysUtils, RegExpr;

function Extract(const AStr: string; const APos: integer): string;
var
  E: TRegExpr;
  I: integer;
begin
  E := TRegExpr.Create('\(\d+\)|.');
  I := 0;
  result := '';
  try
    if E.Exec(AStr) then
    repeat
      Inc(I);
      if I = APos then
      begin
        result := E.Match[0];
        Break;
      end;
    until not E.ExecNext;
  finally
    E.Free;
  end;
end;

var
  S: string;

begin
  S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  WriteLn(Extract(S, 1));
  WriteLn(Extract(S, 5));
  WriteLn(Extract(S, 8));
end.

Code:
1
-
(12)

 8-)
Title: Re: Extracting substrings from strings
Post by: maurobio on December 08, 2019, 11:17:38 am
Dear ALL,

Thank you very much for the many suggestions! I am quite amazed by the response to my humble question.  ;)

Unfortunately, the suggestions by @bytebites and @Roland57, clear as they are, lack generality because they assume the substrings have fixed positions along the string, which they do not. For example, I can have strings such as:

Code: Pascal
S1 := '13(23)(01)-002000201000000(01)020(01)0020000020010100202-00-00(01)00022(234)1??01?????????1?3?2?????????';

or

Code: Pascal
S2 := '13(0123)(02)000210(01)00110000000201013000002(01)200120004010-10000020(12)1120221121123010302101113002';

(Notice that the delimited substrings can be more than two characters in length).

The solution by @winni seems to deal with all these situations, but I have not yet tested it.

Now, let me take this opportunity to explain in a little bit more detail what I want to do:

I need to run along these strings, getting each character and storing it into a stringlist. However, the substrings delimited by '(' and ')' should be treated as a single 'character' and stored together.

For example, if the original long string is:

Code: Pascal
S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';

I should have the characters stored into a stringlist as:

Code: Pascal
Slist.Add('1');
Slist.Add('1');
Slist.Add('1');
Slist.Add('0');
Slist.Add('-');
Slist.Add('0');
Slist.Add('0');
Slist.Add('12');

... and so on.
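
Without regular expressions, that requirement could be met with a plain character scan, along the lines of the sketch below. SplitPositions is a hypothetical helper name; the sketch assumes balanced parentheses and, as a fallback, treats an unclosed group as running to the end of the string.

```pascal
program Tokenize;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical helper: adds one list entry per position. Characters outside
// parentheses are added one by one; a '(...)' group is added as one entry
// without its brackets.
procedure SplitPositions(const S: string; AList: TStrings);
var
  I, Q: Integer;
begin
  I := 1;
  while I <= Length(S) do
  begin
    if S[I] = '(' then
    begin
      Q := I + 1;
      while (Q <= Length(S)) and (S[Q] <> ')') do
        Inc(Q);                                // find the closing bracket
      AList.Add(Copy(S, I + 1, Q - I - 1));    // group contents only
      I := Q + 1;                              // continue after ')'
    end
    else
    begin
      AList.Add(S[I]);                         // ordinary single character
      Inc(I);
    end;
  end;
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    SplitPositions('1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000', L);
    WriteLn(L.Count);   // 85 entries for the sample string
    WriteLn(L[7]);      // eighth entry: 12
  finally
    L.Free;
  end;
end.
```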

It looks like I can achieve that with the solution proposed by @winni, but I have not tried it as yet.

Again, thank you very much for the many suggestions!

Cheers,


Title: Re: Extracting substrings from strings
Post by: Roland57 on December 08, 2019, 11:55:38 am
Another attempt.  :)

Code: Pascal
uses
  SysUtils, Classes, RegExpr;

procedure Extract(const AStr: string; const AList: TStringList);
var
  E: TRegExpr;
begin
  E := TRegExpr.Create('\(\d+\)|.');
  try
    if E.Exec(AStr) then
    repeat
      if E.Match[0][1] = '(' then
        AList.Append(Copy(E.Match[0], 2, Length(E.Match[0]) - 2))
      else
        AList.Append(E.Match[0]);
    until not E.ExecNext;
  finally
    E.Free;
  end;
end;

var
  S: string;
  L: TStringList;
  I: integer;

begin
  S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  L := TStringList.Create;
  Extract(S, L);
  for I := 0 to L.Count - 1 do WriteLn(L[I]);
  L.Free;
end.
Title: Re: Extracting substrings from strings
Post by: maurobio on December 08, 2019, 12:13:52 pm
@Roland57,

Voilà!

It works perfectly! Long live regexes!

Thanks!

Cheers,