
Author Topic: How to split delimited strings?  (Read 1147 times)

rgh

  • New Member
  • *
  • Posts: 45
How to split delimited strings?
« on: October 13, 2019, 01:50:09 pm »
I want to take a space-delimited string and split it into its component parts.

Here is a line from an apache log:

Code: [Select]
185.86.164.111 [13/Oct/2019:11:10:29 +0000] "GET /wp-login.php HTTP/1.1" 404 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
The 2nd field, wrapped in [] brackets, is the time. It contains a space, but it should be treated as a single field.
The 3rd field, wrapped in double quotes and containing spaces, is the first line of the request.
The 5th field, again wrapped in double quotes and containing spaces, is the user agent.

I don't think I can use
Code: Pascal
  ExtractDelimited(fieldNumber, logEntry, [' ']);
as it simply splits on the spaces.

What's the best way to do this?

I could probably adjust the time format of the log to something that doesn't use [] brackets, or replace the [] brackets with double quotes, before doing the split.



Thaddy

  • Hero Member
  • *****
  • Posts: 9190
Re: How to split delimited strings?
« Reply #1 on: October 13, 2019, 03:01:09 pm »
Something like this?:
Code: Pascal
{$mode objfpc}{$H+}
uses
  sysutils;
var
  a: AnsiString = '185.86.164.111 [13/Oct/2019:11:10:29 +0000] "GET /wp-login.php HTTP/1.1" 404 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"';
  aa: TStringArray;  // the type returned by Split
  s: AnsiString;     // separate loop variable, so a is not overwritten
begin
  // Cut at every quote or bracket; the delimiters themselves are dropped.
  aa := a.split(['"','[',']']);
  for s in aa do writeln(s);
end.

Outputs:
Code: Bash  [Select]
  1. 185.86.164.111
  2. 13/Oct/2019:11:10:29 +0000
  3.  
  4. GET /wp-login.php HTTP/1.1
  5.  404
  6. Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
  7.  
also related to equus asinus.

rgh

  • New Member
  • *
  • Posts: 45
Re: How to split delimited strings?
« Reply #2 on: October 13, 2019, 04:33:06 pm »
Thanks for that technique Thaddy.

Looking at https://wiki.freepascal.org/Character_and_string_types#AnsiString I thought AnsiString is a pointer to the string in memory. Yet you are calling a .split() method on it as though it were an object instance. How come that is legal? Could you direct me to the documentation of what methods are available?

howardpc

  • Hero Member
  • *****
  • Posts: 3182
Re: How to split delimited strings?
« Reply #3 on: October 13, 2019, 05:20:35 pm »
Quote from: rgh
Looking at https://wiki.freepascal.org/Character_and_string_types#AnsiString I thought AnsiString is a pointer to the string in memory. Yet you are calling a .split() method on it as though it were an object instance. How come that is legal? Could you direct me to the documentation of what methods are available?
FPC (and Delphi) introduced type helpers a while ago.
The RTL now contains a great many methods, implemented as type helper routines that work with a wide variety of basic types such as String and Integer.
They are accessed using the "dot" notation.

One way to find out what is available (in the Lazarus IDE) if you have "aString.Split(..)" is to place the cursor somewhere in "Split" and press Alt-UpArrow which takes you to the source declaration(s).
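
To make the "dot" notation concrete, here is a minimal sketch that uses only helper methods I am confident exist in SysUtils (ToUpper and Split from TStringHelper, ToString from TIntegerHelper). The variable names and sample values are made up for illustration.

```pascal
program HelperDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;  // declares TStringHelper, TIntegerHelper and the other RTL helpers

var
  s: String = 'alpha,beta,gamma';  // sample data, purely illustrative
  n: Integer = 42;
  parts: TStringArray;
  part: String;
begin
  // Methods supplied by TStringHelper, called with dot notation on a plain String.
  WriteLn(s.ToUpper);
  parts := s.Split([',']);
  for part in parts do
    WriteLn(part);

  // TIntegerHelper does the same job for Integer.
  WriteLn(n.ToString);
end.
```

The string is still an ordinary AnsiString; the compiler resolves s.Split(...) to a routine of the helper and passes s to it, so no object instance is involved.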
« Last Edit: October 13, 2019, 05:23:07 pm by howardpc »

Thaddy

  • Hero Member
  • *****
  • Posts: 9190
Re: How to split delimited strings?
« Reply #4 on: October 13, 2019, 06:20:22 pm »
Indeed.
I personally always try to use the newer features for my examples on the forum because it shows Object Pascal is a living language with many more features than most people coming from other languages expect. The example was intentional in that sense.

Those features are fully documented (in the case of split(), a bit sparsely) if they are part of a release version, e.g.:
https://www.freepascal.org/docs-html/rtl/sysutils/tstringhelper.html or the corresponding PDF.
« Last Edit: October 13, 2019, 06:29:04 pm by Thaddy »

howardpc

  • Hero Member
  • *****
  • Posts: 3182
Re: How to split delimited strings?
« Reply #5 on: October 13, 2019, 07:04:38 pm »
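Note that .Split is not perfect in this case.
Judging by the output above, it leaves unwanted space characters (one element is a lone space, and 404 keeps a leading space) and an unwanted empty element at the end.
To get a "perfect" routine you have to write a custom one such as the following.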
Code: Pascal
program project1;

{$Mode objfpc}{$H+}

uses sysutils;

var
  txt: String = '185.86.164.111 [13/Oct/2019:11:10:29 +0000] "GET /wp-login.php HTTP/1.1" 404 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"';
  arr: TStringArray;
  s: String;

type
  TEnclosers = record
    left, right: Char;
  end;
  TEnclosersArray = array of TEnclosers;

  // enclosers is a flat list of left/right character pairs, e.g. ['"','"','[',']'].
  // Returns False if the parameters are invalid or the enclosures do not match up.
  function SplitToArrayOK(const aTxt: String; enclosers: array of const; out StringArray: TStringArray; separator: Char = ' '): Boolean;
  var
    lefts: TSysCharSet = [];
    rights: TSysCharSet = [];
    match: Char = #0;
    p, j, encsHigh: Integer;
    encsArray: TEnclosersArray;
    inEnclosure: Boolean = False;

    function GetRightForLeft(aLeft: Char): Char;
    var
      i: Integer;
    begin
      for i := 0 to High(encsArray) do
        if encsArray[i].left = aLeft then
          Exit(encsArray[i].right);
      Result := #0;
    end;

  begin
    SetLength(StringArray{%H-}, 0);
    SetLength(encsArray{%H-}, 0);
    Result := False;
    encsHigh := High(enclosers);
    // An empty list or an odd number of characters cannot form pairs.
    if (encsHigh = -1) or not Odd(encsHigh) then
      Exit;
    SetLength(encsArray, Length(enclosers) div 2);
    j := 0;
    // Even positions are left enclosers, odd positions the matching right ones.
    for p := 0 to encsHigh do
      begin
        if enclosers[p].VType <> vtChar then
          Exit;
        case Odd(p) of
          False:
            begin
              Include(lefts, enclosers[p].VChar);
              encsArray[j].left := enclosers[p].VChar;
            end;
          True:
            begin
              Include(rights, enclosers[p].VChar);
              encsArray[j].right := enclosers[p].VChar;
              Inc(j);
            end;
        end;
      end;

    // Walk the text once; match holds the closing character we are waiting for.
    for p := 1 to Length(aTxt) do
      case (aTxt[p] in lefts) of
        True:
          begin
            case (match = #0) of
              True:
                begin
                  match := GetRightForLeft(aTxt[p]);
                  inEnclosure := True;
                end;
              False:
                case (aTxt[p] in rights) of
                  True:
                    begin
                      match := #0;
                      inEnclosure := False;
                    end;
                  False: Exit;
                end;
            end;
          end;
        False:
          begin
            case (aTxt[p] in rights) of
              True:
                begin
                  case (aTxt[p] = match) of
                    True:
                      begin
                        match := #0;
                        inEnclosure := False;
                      end;
                    False: Exit;
                  end;
                end;
              False:
                case inEnclosure of
                  True:
                    begin
                      // Inside an enclosure every character, separators included, is kept.
                      if Length(StringArray) = 0 then
                        SetLength(StringArray, 1);
                      StringArray[High(StringArray)] += aTxt[p];
                    end;
                  False:
                    case (aTxt[p] = separator) of
                      // A separator outside an enclosure starts a new element.
                      True: SetLength(StringArray, Length(StringArray)+1);
                      False:
                        begin
                          if Length(StringArray) = 0 then
                            SetLength(StringArray, 1);
                          StringArray[High(StringArray)] += aTxt[p];
                        end;
                      end;
                end;
            end;
          end;
      end;
    Result := True;
  end;

begin
  if SplitToArrayOK(txt,  ['"','"','[',']'], arr) then
    for s in arr do
      WriteLn(s)
  else Writeln('invalid data or parameters');
end.

Thaddy

  • Hero Member
  • *****
  • Posts: 9190
Re: How to split delimited strings?
« Reply #6 on: October 13, 2019, 07:16:51 pm »
Another option is a multi pass with split, using a stringlist and the text property. That may be simpler.
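
One possible reading of the multi-pass idea, sketched with two Split passes instead of a stringlist (so it is a variation, not necessarily what was meant above). It assumes the brackets and quotes are balanced, never nested, and never appear inside each other. Under that assumption the first pass yields segments that alternate between "outside" (even index) and "inside an enclosure" (odd index), and only the outside segments need a second pass on spaces.

```pascal
program MultiPassSplit;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  line: String = '185.86.164.111 [13/Oct/2019:11:10:29 +0000] "GET /wp-login.php HTTP/1.1" 404 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"';
  segments, words: TStringArray;
  i: Integer;
  w: String;
begin
  // Pass 1: cut at every bracket or quote.
  segments := line.Split(['"', '[', ']']);

  for i := 0 to High(segments) do
    if Odd(i) then
      // Odd index: text that was inside [] or "", keep it whole.
      WriteLn(segments[i])
    else
    begin
      // Pass 2: text outside enclosures is split on spaces,
      // skipping the empty pieces left by leading/trailing spaces.
      words := segments[i].Split([' ']);
      for w in words do
        if w <> '' then
          WriteLn(w);
    end;
end.
```

The same loop could add the pieces to a TStringList instead of writing them out, if a list is more convenient for later processing.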

krexon

  • Jr. Member
  • **
  • Posts: 68
Re: How to split delimited strings?
« Reply #7 on: October 13, 2019, 07:46:48 pm »
You can also find positions of '[' and ']' chars, using pos command. Then use copy command to extract text between these chars.

simone

  • Sr. Member
  • ****
  • Posts: 255
Re: How to split delimited strings?
« Reply #8 on: October 13, 2019, 08:10:40 pm »
Quote from: krexon
You can also find positions of '[' and ']' chars, using pos command. Then use copy command to extract text between these chars.

Sometimes I have successfully followed this way to make simple log file analyzers. Using this approach the 'posex' companion function can also be useful, as it allows you to specify the offset from which to start, facilitating an incremental search within each line of log.
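
A rough sketch of that incremental search, assuming PosEx from the StrUtils unit. NextBetween is a hypothetical helper written for this illustration, not an RTL routine: it returns the text between the next opener/closer pair at or after Offset and moves Offset past the closer, so repeated calls walk along the line.

```pascal
program IncrementalScan;

{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;  // PosEx is declared in StrUtils

// Hypothetical helper (not part of the RTL).
// Finds the next Opener at or after Offset, then the next Closer after it.
// On success stores the text in between in Field and advances Offset.
function NextBetween(const Value, Opener, Closer: String;
  var Offset: Integer; out Field: String): Boolean;
var
  startPos, endPos: Integer;
begin
  Field := '';
  Result := False;
  startPos := PosEx(Opener, Value, Offset);
  if startPos = 0 then
    Exit;
  startPos := startPos + Length(Opener);
  endPos := PosEx(Closer, Value, startPos);
  if endPos = 0 then
    Exit;
  Field := Copy(Value, startPos, endPos - startPos);
  Offset := endPos + Length(Closer);
  Result := True;
end;

var
  line: String = '185.86.164.111 [13/Oct/2019:11:10:29 +0000] "GET /wp-login.php HTTP/1.1" 404 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"';
  scanFrom: Integer = 1;
  quoted: String;
begin
  // Each call resumes where the previous one stopped, so the quoted
  // fields come out in order: the request line, then the user agent.
  while NextBetween(line, '"', '"', scanFrom, quoted) do
    WriteLn(quoted);
end.
```

Calling it once with '[' and ']' would pick out the time field in the same way.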

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7508
Re: How to split delimited strings?
« Reply #9 on: October 13, 2019, 08:37:56 pm »
Code: Pascal
// PosEx is declared in the StrUtils unit; Pos, Copy and Trim come from System/SysUtils.
function ExtractBetween(const Value, A, B: string): string;
var
  aPos, bPos: Integer;
begin
  result := '';
  aPos := Pos(A, Value);
  if aPos > 0 then begin
    // Skip past the opening marker, then look for the closing one after it.
    aPos := aPos + Length(A);
    bPos := PosEx(B, Value, aPos);
    if bPos > 0 then begin
      result := trim(Copy(Value, aPos, bPos - aPos));
    end;
  end;
end;

mirce.vladimirov

  • Full Member
  • ***
  • Posts: 220
Re: How to split delimited strings?
« Reply #10 on: October 13, 2019, 10:33:29 pm »
This should work:

Code: Pascal
// somestring is assumed to hold the text to split;
// set Delimiter to whatever character separates the fields.
var
  mystringlist: TStringList;
  i: Integer;
begin
  mystringlist := TStringList.Create;
  try
    mystringlist.Delimiter := ',';
    mystringlist.DelimitedText := somestring;

    for i := 0 to mystringlist.Count - 1 do
      ShowMessage('see this:' + mystringlist.Strings[i]);
  finally
    mystringlist.Free;  // released even if an exception occurs
  end;
end;
« Last Edit: October 13, 2019, 10:36:20 pm by mirce.vladimirov »

jamie

  • Hero Member
  • *****
  • Posts: 2088
Re: How to split delimited strings?
« Reply #11 on: October 14, 2019, 12:52:10 am »
using your sample string..
Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
Type TFlags =
 Record
   Case Longint of
    0:(LB,QT:Word);              // LB: open-bracket count, QT: inside-quotes flag
    1:(HasSomething:QWordBool);  // overlays both, so True while inside [] or ""
   End;
 Var
   Flags:TFlags;
   C:String;
   Acc:string;
 begin
  Flags.HasSomething := false;
  With Memo1 do
   begin
     Clear;
     Acc := '';
     For C in TestString do
      Case C of
       '[' :Inc(Flags.LB);
       ']' :if WordBool(Flags.LB) Then Dec(Flags.LB);
       '"' :WordBool(Flags.QT) := Not WordBool(Flags.Qt);
       // A space ends the field only outside [] and "";
       // inside an enclosure it is skipped and not added to Acc.
       ' ' :if (Acc <> '')and (Not flags.HasSomething) then
            Begin
             Lines.Add(Acc);
             Acc := '';
             end;
       Else Acc := Acc+C;
      end;
     if Acc <> '' Then Lines.Add(Acc);
   end;
end;
And I get this
Quote
185.86.164.111
13/Oct/2019:11:10:29+0000
GET/wp-login.phpHTTP/1.1
404
Mozilla/5.0(WindowsNT10.0;WOW64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/51.0.2704.103Safari/537.36

Does that look correct to you?
Number 1 at blue screen app creations!

Kays

  • Full Member
  • ***
  • Posts: 182
  • Whasup!?
    • KaiBurghardt.de
Re: How to split delimited strings?
« Reply #12 on: October 14, 2019, 01:30:11 pm »
Quote from: rgh
Here is a line from an apache log: […]

What's the best way to do this? […]
The best way is of course to omit the intermediate step and use structured logging. Unfortunately the journald_module is still in trunk and states that, for performance reasons, it isn’t suitable for an access_log. But then you could use sd-journal(3) (provided the headers are/become available in Pascal, too).

PS: systemd sucks.
Yours Sincerely
Kai Burghardt

rgh

  • New Member
  • *
  • Posts: 45
Re: How to split delimited strings?
« Reply #13 on: October 14, 2019, 02:55:29 pm »
Thanks to all for their help on this topic. Most instructive code examples. I had no idea there was such a thing as type helpers.

I now realize that it's naive to suppose that log files with the format I gave can be split up simply using split.

Fortunately, I'm in a position to define my own LogFormat for the Apache server I'm wanting to monitor. Seems I can define a tab delimited format with no spaces, quotes or other complications between variables. All I then need do is split on the tabs.

LogFormat "%h\t%r\t%t" my_log_format

and so on.

My end goal was to log to a PostgreSQL table. I'm piping the Apache log to the program a2log2pg, written in Free Pascal, which uses ReadLn on standard input, splits each line up and writes it to the database.

CustomLog "|/usr/bin/a2log2pg" my_log_format

Seems to be working nicely so far!
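
For anyone wanting to try the same approach, a minimal sketch of the reading-and-splitting part might look like this. It is not the actual a2log2pg source; the field order is taken from the LogFormat line above, and the database write is left out entirely.

```pascal
program TabSplit;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  line: String;
  fields: TStringArray;
  i: Integer;
begin
  // Apache writes one tab-separated entry per line to our standard input.
  while not EOF do
  begin
    ReadLn(line);
    // #9 is the tab character; no quotes or brackets need special handling.
    fields := line.Split([#9]);
    // With LogFormat "%h\t%r\t%t" the order is: host, request line, time.
    for i := 0 to High(fields) do
      WriteLn(i, ': ', fields[i]);
    // Writing the fields to the database would go here (omitted in this sketch).
  end;
end.
```

It can be tried without Apache by piping a tab-separated test line into the program from a shell.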



howardpc

  • Hero Member
  • *****
  • Posts: 3182
Re: How to split delimited strings?
« Reply #14 on: October 14, 2019, 03:54:32 pm »
Jamie's clever use of a variant record yields an elegant and compact solution.
Note it can be simplified to avoid typecasts, as follows
Code: Pascal
program project1;

{$Mode objfpc}{$H+}
{$IfDef Windows}{$AppType console}{$EndIf}

uses Classes;

type
  // The two flags overlay HasSomething, so it is True while either one is set.
  TFlags = record
    case Boolean of
      False: (diffBound, sameBound: ByteBool);
      True:  (HasSomething: WordBool);
  end;

procedure Parse(const aStr: String; var lines: TStringList);
var
  flags: TFlags;
  c: Char;
  tmp: String = '';
begin
  flags.HasSomething := False;
  lines.Clear;
  for c in aStr do
    case c of
      '[': Inc(flags.diffBound);
      ']': if flags.diffBound then
             Dec(flags.diffBound);
      '"': flags.sameBound := not flags.sameBound;
      ' ': if flags.HasSomething then
             tmp := tmp + c        // keep spaces that fall inside [] or ""
           else if tmp <> '' then
             begin
               lines.Add(tmp);     // a space outside an enclosure ends the field
               tmp := '';
             end;
      else
        tmp := tmp + c;
    end;
  if tmp <> '' then
    lines.Add(tmp);
end;

var
  txt: String = '185.86.164.111 [13/Oct/2019:11:10:29 +0000] "GET /wp-login.php HTTP/1.1" 404 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"';
  sl: TStringList;
  s: String;

begin
  sl := TStringList.Create;
  try
    Parse(txt, sl);
    for s in sl do
      WriteLn(s);
  finally
    sl.Free;
  end;
  WriteLn('press [Enter]');
  ReadLn;
end.