
Topic: [SOLVED] Stripping inner tags from a string

maurobio

  • Full Member
  • ***
  • Posts: 175
[SOLVED] Stripping inner tags from a string
« on: February 25, 2021, 01:45:23 pm »
Dear ALL,

I have some strange strings with tags nested inside tags, like these:

<<5%>> or <rarely <5%>>

and I want to strip these inner tags, so that in the first case I would have an empty tag ('<>'), and in the second case I would have a string like '<rarely>'.

I came up with a simplistic function as follows:

Code: Pascal
function StripInner(const S: string): string;
begin
  Result := Copy(S, Pos('<<', S) + 1, Length(S) - 2);
end;

but in the first case it just strips the outer brackets, keeping the inner tag ('<5%>'), and in the second case it returns a completely wrong string ('<rarely <5%').

Could anyone give me a hint on how to do this?

Thanks in advance!

With best wishes,
« Last Edit: February 25, 2021, 07:02:26 pm by maurobio »
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 1.9.3/2.0.8 - FPC 3.0.4 on GNU/Linux Mint 19 ("Tessa"), Windows XP SP3, Windows 7 Professional, Windows 10 Home

lucamar

  • Hero Member
  • *****
  • Posts: 3777
Re: Stripping inner tags from a string
« Reply #1 on: February 25, 2021, 02:00:27 pm »
Parse the string char by char (or using PosEx): when you encounter a "<", hand the parsing over to a function that does the same but deletes everything from where it finds a "<" to where it finds a ">", and then returns. Keep going until there's no more text.

Or use the same function recursively, keeping a counter (or flag) to know whether it's at the top level and, if not, doing the deletion.
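A minimal sketch of the recursive variant, assuming the brackets are balanced (an unclosed "<" at the very end would get a ">" appended). The names StripFrom and StripInnerTags are hypothetical, not from any library: the depth parameter plays the role of the counter, and anything at depth 2 or more is dropped.

```pascal
program StripNestedTags;

{$mode objfpc}{$H+}

// Hypothetical helper: scans AStr from index I at nesting depth Depth.
// Depth 0 = plain text, 1 = inside a top-level tag, 2+ = inside an inner tag.
// Returns the text to keep; I ends just past the '>' that closes this level.
function StripFrom(const AStr: String; var I: SizeInt; Depth: Integer): String;
var
  c: Char;
  Inner: String;
begin
  Result := '';
  while I <= Length(AStr) do
  begin
    c := AStr[I];
    Inc(I);
    if c = '<' then
    begin
      // Recurse one level deeper for the tag that starts here
      Inner := StripFrom(AStr, I, Depth + 1);
      // Keep top-level tags with their filtered content; drop inner tags entirely
      if Depth = 0 then
        Result := Result + '<' + Inner + '>';
    end
    else if (c = '>') and (Depth > 0) then
      Exit  // this level is closed; the caller decides what to keep
    else if Depth <= 1 then
      Result := Result + c;  // plain text or top-level tag text
  end;
end;

// Hypothetical wrapper: strips every tag nested inside another tag
function StripInnerTags(const S: String): String;
var
  I: SizeInt;
begin
  I := 1;
  Result := StripFrom(S, I, 0);
end;

begin
  WriteLn(StripInnerTags('<<5%>>'));                        // <>
  WriteLn(StripInnerTags('<rarely <5%>>'));                 // <rarely >
  WriteLn(StripInnerTags('s4 <tag> other text <tag> s4'));  // unchanged
end.
```

Note that the blank before the inner tag is kept, just like with the simple regex below.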
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

engkin

  • Hero Member
  • *****
  • Posts: 2662
Re: Stripping inner tags from a string
« Reply #2 on: February 25, 2021, 02:32:57 pm »
Use Regular Expressions:
Code: Pascal
{$mode objfpc}{$H+}

uses
  RegExpr;

var
  s: String;
begin
  s := '<rarely <5%>>';
  s := ReplaceRegExpr('<\d+%>', s, '');   // s is now '<rarely >'
  WriteLn(s);
end.

The first parameter, '<\d+%>', matches a '<' followed by one or more digits, followed by '%>'.

The third parameter is the replacement string; an empty string deletes the matched parts.
« Last Edit: February 25, 2021, 02:40:13 pm by engkin »

engkin

  • Hero Member
  • *****
  • Posts: 2662
Re: Stripping inner tags from a string
« Reply #3 on: February 25, 2021, 02:57:02 pm »
I missed the part where you said "tags inside tags". This probably is what you meant:
Code: Pascal
  s := ReplaceRegExpr('<([^>]*)(<\d+%>)>', s, '<$1>', True);
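For completeness, a small self-contained program that runs the pattern above on both sample strings (the program name and the Pattern constant are only illustrative). Passing True as the last argument enables substitution, so '$1' in the replacement refers to the first group; a blank still survives before the closing '>' in the second case.

```pascal
program NestedTagRegex;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // An outer tag whose content ends in an inner "<N%>" tag;
  // group 1 captures the outer tag's text before the inner tag.
  Pattern = '<([^>]*)(<\d+%>)>';
begin
  WriteLn(ReplaceRegExpr(Pattern, '<<5%>>', '<$1>', True));         // <>
  WriteLn(ReplaceRegExpr(Pattern, '<rarely <5%>>', '<$1>', True));  // <rarely >
end.
```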

howardpc

  • Hero Member
  • *****
  • Posts: 3680
Re: Stripping inner tags from a string
« Reply #4 on: February 25, 2021, 02:58:53 pm »
Or use brute force:
Code: Pascal
program project1;

{$mode objfpc}{$H+}

uses
  SysUtils;

function CleanedInnerTagsOK(aText: String; out FreeOfInnerTags: String): Boolean;
var
  p1, p2: SizeInt;
begin
  FreeOfInnerTags := aText;
  p1 := Pos('<<', aText);
  p2 := Pos('>>', aText);
  Result := (p1 > 0) or (p2 > 0);
  if Result then
    if (p1 > 0) and (p2 > 0) then
      FreeOfInnerTags := Copy(aText, 1, p1) + Copy(aText, Succ(p2), MaxInt)
    else if (p2 > 0) then
      begin
        p1 := p2;
        while (p1 > 1) and (aText[p1] <> '<') do
          Dec(p1);
        FreeOfInnerTags := Trim(Copy(aText, 1, Pred(p1))) + Copy(aText, Succ(p2), MaxInt);
      end
    else
      begin
        p2 := p1;
        while (p2 < Length(aText)) and (aText[p2] <> '>') do
          Inc(p2);
        FreeOfInnerTags := Copy(aText, 1, p1) + Trim(Copy(aText, Succ(p2), MaxInt));
      end;
end;

var
  s1: String = 's1 <<5%>> s1';            // '<>'
  s2: String = 's2  <rarely <5%>>  s2';   // '<rarely>'
  s3: String = 's3  <<6%>  seldom>  s3';
  s4: String = 's4 <tag>   <tag>  other text <tag> s4';
  s: String;

begin
  WriteLn('CleanedInnerTagsOK(s1, s) is ', CleanedInnerTagsOK(s1, s):5, ', s = "', s, '"');
  WriteLn('CleanedInnerTagsOK(s2, s) is ', CleanedInnerTagsOK(s2, s):5, ', s = "', s, '"');
  WriteLn('CleanedInnerTagsOK(s3, s) is ', CleanedInnerTagsOK(s3, s):5, ', s = "', s, '"');
  WriteLn('CleanedInnerTagsOK(s4, s) is ', CleanedInnerTagsOK(s4, s):5, ', s = "', s, '"');
  ReadLn;
end.


maurobio

  • Full Member
  • ***
  • Posts: 175
Re: Stripping inner tags from a string
« Reply #5 on: February 25, 2021, 03:10:56 pm »
Dear ALL,

Thanks a lot for your helpful suggestions!

Ah, Regular Expressions! I love them all (although I can't understand them)!  ::)

Both regexes suggested by @engkin work well for me and in fact return the same result, i.e., for the string '<rarely <5%>>' they return '<rarely >', which is what I want (although there remains an annoying blank before the closing '>' which I would also be pleased to get rid of).

With best wishes,

maurobio

  • Full Member
  • ***
  • Posts: 175
Re: Stripping inner tags from a string
« Reply #6 on: February 25, 2021, 03:18:48 pm »
By the way, the 'brute force' solution by @howardpc is quite powerful indeed!  ;)
« Last Edit: February 25, 2021, 03:38:18 pm by maurobio »

engkin

  • Hero Member
  • *****
  • Posts: 2662
Re: Stripping inner tags from a string
« Reply #7 on: February 25, 2021, 05:20:54 pm »
Keeping track of the nesting level based on < and > makes it easier to decide what to copy:
Code: Pascal
function DeleteTags(const AStr: String): String;
var
  Lvl: Integer = 0;  // current nesting depth
  p: Integer = 0;    // number of characters written to Result
  c: Char;
begin
  SetLength(Result, Length(AStr));
  for c in AStr do
  begin
    if c = '<' then Inc(Lvl);
    if Lvl <= 1 then  // copy only plain text and top-level tags
    begin
      Inc(p);
      Result[p] := c;
    end;
    if c = '>' then Dec(Lvl);
  end;
  SetLength(Result, p);
end;
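Since the blank before the closing '>' was the remaining complaint, here is a sketch of a hypothetical variant (DeleteTagsTrim is not from the thread) that uses the same level counting but, whenever a top-level tag closes, backs up over any blanks already copied in front of its '>'. Blanks outside tags are left alone.

```pascal
program DeleteTagsTrimDemo;

{$mode objfpc}{$H+}

// Hypothetical variant of DeleteTags: same level counting, but when a
// top-level tag closes it also drops the blanks left in front of its '>'.
function DeleteTagsTrim(const AStr: String): String;
var
  Lvl, p: Integer;
  c: Char;
begin
  Lvl := 0;
  p := 0;
  SetLength(Result, Length(AStr));
  for c in AStr do
  begin
    if c = '<' then Inc(Lvl);
    if Lvl <= 1 then
    begin
      // Closing a top-level tag: back up over trailing blanks already copied
      if (c = '>') and (Lvl = 1) then
        while (p > 0) and (Result[p] = ' ') do
          Dec(p);
      Inc(p);
      Result[p] := c;
    end;
    if c = '>' then Dec(Lvl);
  end;
  SetLength(Result, p);
end;

begin
  WriteLn(DeleteTagsTrim('<<5%>>'));                 // <>
  WriteLn(DeleteTagsTrim('s2  <rarely <5%>>  s2'));  // s2  <rarely>  s2
end.
```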

Roland57

  • Full Member
  • ***
  • Posts: 148
Re: Stripping inner tags from a string
« Reply #8 on: February 25, 2021, 05:45:33 pm »
Hello!

Quote from: maurobio on February 25, 2021, 03:10:56 pm
...although there remains an annoying blank at the end of the string which I would also be pleased to get rid of.

You could do like this:

Code: Pascal
  // V1
  s := ReplaceRegExpr(' *<\d+%>', s, '', FALSE);
  // V2
  s := ReplaceRegExpr('<([^>]*?)( *<\d+%>)>', s, '<$1>', TRUE);
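To see both versions side by side, a small test program (the program name and the Sample constant are only illustrative). V1 simply deletes the inner tag together with any blanks in front of it; in V2 the lazy '*?' lets the blanks fall into group 2, so they are not copied back by '$1'.

```pascal
program TrimmedTagRegex;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  Sample = 's2  <rarely <5%>>  s2';
begin
  // V1: delete an inner "<N%>" tag plus the blanks before it
  WriteLn(ReplaceRegExpr(' *<\d+%>', Sample, '', False));                 // s2  <rarely>  s2
  // V2: rebuild the outer tag from group 1 only
  WriteLn(ReplaceRegExpr('<([^>]*?)( *<\d+%>)>', Sample, '<$1>', True));  // s2  <rarely>  s2
end.
```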

Regards.

Roland

maurobio

  • Full Member
  • ***
  • Posts: 175
Re: Stripping inner tags from a string
« Reply #9 on: February 25, 2021, 07:02:08 pm »
Dear ALL,

All solutions worked well! Thanks a lot!  :D

With best wishes,

Peter H

  • Full Member
  • ***
  • Posts: 197
Re: [SOLVED] Stripping inner tags from a string
« Reply #10 on: February 25, 2021, 07:24:38 pm »
My try:
Code: Pascal
program project1;

{$mode objfpc}{$H+}

//uses
//  SysUtils;

function skipinner(const s: String; var level: integer): String;
// Skips inner tags, i.e. characters whose nesting depth is larger than level.
// On return, level holds the final nesting depth (0 if the brackets are balanced).
// Edit: also removes spaces inside tags (depth >= 1).
// A small, local loop like this should stay in the 1st-level cache,
// which is much faster than the 2nd-level cache or main memory.
var
  l: integer;
  c: char;
begin
  Result := '';
  l := 0;
  for c in s do begin
    if c = '<' then inc(l);
    if l <= level then
      {Edit:}
      if (c <> ' ') or (l < 1) then
        Result := Result + c;
    if c = '>' then dec(l);
  end;
  level := l;
end;

var
  s1: String = 's1 <<5%>> s1';            // '<>'
  s2: String = 's2  <rarely <5%>>  s2';   // '<rarely>'
  s3: String = 's3  <<6%>  seldom>  s3';
  s4: String = 's4 <tag>   <tag>  other text <tag> s4';
  level: integer;
begin
  level := 1;
  WriteLn(skipinner(s1, level), #9'    Level:', level);
  level := 1;
  WriteLn(skipinner(s2, level), #9'    Level:', level);
  level := 1;
  WriteLn(skipinner(s3, level), #9'    Level:', level);
  level := 1;
  WriteLn(skipinner(s4, level), #9'    Level:', level);

  ReadLn;
end.
« Last Edit: February 25, 2021, 10:14:43 pm by Peter H »

maurobio

  • Full Member
  • ***
  • Posts: 175
Re: [SOLVED] Stripping inner tags from a string
« Reply #11 on: February 25, 2021, 10:25:49 pm »
@Peter H,

Nice solution with a level variable. Thanks a lot!

With best wishes,

 
