
Topic: [SOLVED] Extracting substrings from strings (Read 6228 times)

winni

Re: Extracting substrings from strings
« Reply #15 on: December 08, 2019, 01:57:22 am »
Quote
... and how does all that code handle a string where the open and close parentheses are not balanced?

Garbage in, garbage out!


dbannon

Re: Extracting substrings from strings
« Reply #16 on: December 08, 2019, 03:13:11 am »
Quote
No, I don't fight about some nanoseconds.

Sorry, old habits. I come from an HPC background. If CPU usage dropped below about 90%, that job could be killed. But I was the sysadmin, not the programmer.

Quote
But where are your results????

Results ? You mean -

found one 12
found one 01
found one 12

Not bothering with timing; we both have fast machines, but which is faster?

@440bx - come on, you can see that. In my example it's

Code: Pascal
    ......
    if (PC+i)^ = ')' then begin
        if not Bracket then break;     // new line to handle bad data
    ......
    if (PC+i)^ = '(' then begin
        if Bracket then break;     // new line to handle bad data
    .....

And a similar line at the end to make sure we don't have an unclosed bracket. I had them in initially but removed them while trying to make mine smaller than Winni's. It still wasn't smaller, so I fell back to saying it would be faster  :P

What you actually do on an error depends on the application. Personally, I'd be more worried about what happens if there is any UTF-8 char in there ....

Davo



Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

440bx

Re: Extracting substrings from strings
« Reply #17 on: December 08, 2019, 03:38:06 am »

Quote
@440bx - come on, you can see that. In my example it's

Code: Pascal
    ......
    if (PC+i)^ = ')' then begin
        if not Bracket then break;     // new line to handle bad data
    ......
    if (PC+i)^ = '(' then begin
        if Bracket then break;     // new line to handle bad data
    .....

I had them in initially but removed them trying to make mine smaller than Winni's.
<snip>
Personally, I'd be more worried about what happens if there is any UTF-8 char in there ....

Davo

No wonder I didn't see them, you removed them, therefore they weren't present in your post.  Also, it's not just your code that made me wonder about error handling in case of bad input.

I agree that error handling is application dependent and the "user" didn't state what should be done in case of a malformed string.  I agree with your observation about UTF8.  If correct input consists of something other than only digits and parentheses then it could definitely be a concern.
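
As a rough illustration of the kind of check discussed here, the following is a minimal sketch and not code from any poster in this thread. The name ParensAreBalanced is hypothetical, and the sketch assumes that groups never nest, which the sample strings suggest but nobody has confirmed. Because '(' and ')' are single-byte ASCII characters, a byte-wise scan like this one cannot mistake part of a multi-byte UTF-8 sequence for a parenthesis.

```pascal
program CheckParens;

{$mode objfpc}{$H+}

// Hypothetical helper (not from the thread): returns True when every '('
// is closed by a ')' before the next '(' and no ')' appears on its own.
// Assumption: parenthesised groups are never nested.
function ParensAreBalanced(const AStr: string): Boolean;
var
  I: Integer;
  InGroup: Boolean;
begin
  InGroup := False;
  for I := 1 to Length(AStr) do
    case AStr[I] of
      '(':
        begin
          if InGroup then Exit(False);      // second '(' before a ')'
          InGroup := True;
        end;
      ')':
        begin
          if not InGroup then Exit(False);  // ')' without an opening '('
          InGroup := False;
        end;
    end;
  Result := not InGroup;                    // False if a group is left open
end;

begin
  WriteLn(ParensAreBalanced('1110-00(12)10'));  // well formed
  WriteLn(ParensAreBalanced('1110-00(12'));     // unclosed group
  WriteLn(ParensAreBalanced('1110-00)12('));    // wrong order
end.
```

Whether a malformed string should be rejected, skipped or repaired remains an application decision; the sketch only detects the condition.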
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

PaulRowntree

Re: Extracting substrings from strings
« Reply #18 on: December 08, 2019, 07:27:13 am »
Man, where were you guys when I was trying to do _my_ programming assignments?
Paul Rowntree
- coding for instrument control, data acquisition & analysis, CNC systems

bytebites

Re: Extracting substrings from strings
« Reply #19 on: December 08, 2019, 09:15:22 am »
Code: Pascal
writeln(ExtractDelimited(2, S, ['(',')']));  // ExtractDelimited is declared in the StrUtils unit

gives 12

Roland57

Re: Extracting substrings from strings
« Reply #20 on: December 08, 2019, 09:36:47 am »
Hello!

We could use regular expressions.

Code: Pascal
{$mode objfpc}{$H+}  // classes and try..finally need ObjFPC (or Delphi) mode

uses
  SysUtils, RegExpr;

// Returns the APos-th token of AStr, where a token is either a
// parenthesised group of digits or any single character.
function Extract(const AStr: string; const APos: integer): string;
var
  E: TRegExpr;
  I: integer;
begin
  E := TRegExpr.Create('\(\d+\)|.');
  I := 0;
  result := '';
  try
    if E.Exec(AStr) then
    repeat
      Inc(I);
      if I = APos then
      begin
        result := E.Match[0];
        Break;
      end;
    until not E.ExecNext;
  finally
    E.Free;
  end;
end;

var
  S: string;

begin
  S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  WriteLn(Extract(S, 1));
  WriteLn(Extract(S, 5));
  WriteLn(Extract(S, 8));
end.

Code:
1
-
(12)

 8-)
My projects are on Gitlab and on Codeberg.

maurobio

Re: Extracting substrings from strings
« Reply #21 on: December 08, 2019, 11:17:38 am »
Dear ALL,

Thank you very much for the many suggestions! I am quite amazed by the response to my humble question.  ;)

Unfortunately, the suggestions by @bytebites and @Roland57, clear as they are, lack generality because they assume the substrings have fixed positions along the string, which they do not. For example, I can have strings such as:

Code: Pascal
S1 := '13(23)(01)-002000201000000(01)020(01)0020000020010100202-00-00(01)00022(234)1??01?????????1?3?2?????????';

or

Code: Pascal
S2 := '13(0123)(02)000210(01)00110000000201013000002(01)200120004010-10000020(12)1120221121123010302101113002';

(Notice that the delimited substrings can be more than two characters in length.)

The solution by @winni seems to deal with all these situations, but I have not yet tested it.

Now, let me take this opportunity to explain in a little bit more detail what I want to do:

I need to run along these strings, getting each character and storing it into a stringlist. However, the substrings delimited by '(' and ')' should be treated as a single 'character' and stored together.

For example, if the original long string is:

Code: Pascal
S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';

I should have the characters stored in a stringlist as:

Code: Pascal
Slist.Add('1');
Slist.Add('1');
Slist.Add('1');
Slist.Add('0');
Slist.Add('-');
Slist.Add('0');
Slist.Add('0');
Slist.Add('12');

... and so on.
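
For readers who want to see the requirement above as plain code, here is a minimal sketch of a character-by-character scan. It is not @winni's solution or any other code posted in this thread; the procedure name SplitStates and the program name are hypothetical. It assumes that groups are not nested and, as an arbitrary choice, treats an unclosed '(' as running to the end of the string.

```pascal
program TokenizeSketch;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper (not from the thread): appends every character of
// AStr to AList, storing each '(...)' group as one entry without the
// parentheses. Assumption: an unclosed '(' extends to the end of AStr.
procedure SplitStates(const AStr: string; AList: TStrings);
var
  I, Start: Integer;
begin
  I := 1;
  while I <= Length(AStr) do
  begin
    if AStr[I] = '(' then
    begin
      Start := I + 1;
      // advance to the closing parenthesis, or past the end of the string
      while (I <= Length(AStr)) and (AStr[I] <> ')') do
        Inc(I);
      AList.Add(Copy(AStr, Start, I - Start));
    end
    else
      AList.Add(AStr[I]);
    Inc(I);  // step past the character or the closing ')'
  end;
end;

var
  L: TStringList;
  I: Integer;

begin
  L := TStringList.Create;
  try
    SplitStates('13(0123)(02)0-?', L);
    // stores seven entries: 1, 3, 0123, 02, 0, -, ?
    for I := 0 to L.Count - 1 do
      WriteLn(L[I]);
  finally
    L.Free;
  end;
end.
```

Because the scan only compares against '(' and ')', the group contents can be of any length, which covers entries such as (0123) and (234) in the strings shown earlier.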

It looks like I can achieve that with the solution proposed by @winni, but I have not tried it as yet.

Again, thank you very much for the many suggestions!

Cheers,


UCSD Pascal / Burroughs 6700 / Master Control Program
Delphi 7.0 Personal Edition
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19.1, Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

Roland57

Re: Extracting substrings from strings
« Reply #22 on: December 08, 2019, 11:55:38 am »
Another attempt.  :)

Code: Pascal
{$mode objfpc}{$H+}  // classes and try..finally need ObjFPC (or Delphi) mode

uses
  SysUtils, Classes, RegExpr;

// Appends each token of AStr to AList. A parenthesised group of digits
// is stored as one entry, without the surrounding parentheses.
procedure Extract(const AStr: string; const AList: TStringList);
var
  E: TRegExpr;
begin
  E := TRegExpr.Create('\(\d+\)|.');
  try
    if E.Exec(AStr) then
    repeat
      if E.Match[0][1] = '(' then
        AList.Append(Copy(E.Match[0], 2, Length(E.Match[0]) - 2))
      else
        AList.Append(E.Match[0]);
    until not E.ExecNext;
  finally
    E.Free;
  end;
end;

var
  S: string;
  L: TStringList;
  I: integer;

begin
  S := '1110-00(12)1000011000000020(01)02(12)0000020300100204020-0000002021122222010023013322101002000';
  L := TStringList.Create;
  try
    Extract(S, L);
    for I := 0 to L.Count - 1 do WriteLn(L[I]);
  finally
    L.Free;  // release the list even if Extract raises
  end;
end.

maurobio

Re: Extracting substrings from strings
« Reply #23 on: December 08, 2019, 12:13:52 pm »
@Roland57,

Voilà!

It works perfectly! Long live regexes!

Thanks!

Cheers,

 
