
Topic: Parsing Long string to specific sizes

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Parsing Long string to specific sizes
« on: August 13, 2022, 11:34:57 pm »
This code actually works, but there must be a better way. It took me about three days, and the next parser I need is even worse.

 {Rules:
 1. String can be any length.
 2. DelSpace1 will be performed on string
 3. The indent will vary.
 4. The header will vary in length
}

Code: Pascal
const
  TwoSpaces = '  ';

var
  Form1: TForm1;
  Data: array of Integer; // positions of the spaces in S2Parse, terminated by 0

  bit1: String = 'Header: xx bbb ccccc dddd eeee fffff gggg kk i jjjjj kkkk lll mmmm nnn ooo ppppp qqqqq rrrr ssssss tttttt uu vvv www xxx yy zzz.';
  bit5: String = 'Header: Minvalue returns the smallest value in the data array';

  S2Parse: String;
  TSList: TStringList;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
begin
  // S2Parse := bit5;
  S2Parse := CleanString(bit1);
  S2Parse := DelSpace1(S2Parse);
end;

// 1st cut = 'Header: xx bbb ccccc dddd eeee fffff gggg kk i jjjjj kkkk '
// 2nd cut = 'lll mmmm nnn ooo ppppp qqqqq rrrr ssssss tttttt uu vvv www '
// 3rd cut = 'xxx yy zzz.'

procedure TForm1.ParseLine;
var
  i, Cut, Lgth: Integer;
  Sub: String;
  Base: Integer = 60;
  Done: Boolean = False;
begin
  Memo1.Clear;
  TSList.Clear;
  repeat
    SetSpaceArray;          // record where the spaces are
    Cut := FetchCut(Base);  // last space at or before Base
    Lgth := Length(S2Parse);
    if Lgth < Base then
    begin
      // The remainder fits on one line
      TSList.Add(TwoSpaces + S2Parse);
      Memo1.Lines.AddStrings(TSList, True);
      Exit;
    end;
    Sub := Copy(S2Parse, 1, Cut);
    TSList.Add(TwoSpaces + Sub);
    Memo1.Lines.AddStrings(TSList, True);
    Application.ProcessMessages;
    Lgth := Length(S2Parse);
    if Lgth = 0 then
      Exit;
    // Blank out the part already emitted, then trim it away
    for i := 1 to Cut do
      S2Parse[i] := ' ';
    S2Parse := Trim(S2Parse);
  until Done;
end;

procedure TForm1.SetSpaceArray;
var
  aPos: Integer;
  Idx: Integer = 1;
  Idx2: Integer = 0;
  Ctr: Integer = 0;
  Done: Boolean = False;
begin
  SetLength(Data, 0);
  SetLength(Data, 500); // zero-filled; assumes fewer than 500 spaces
  repeat
    Inc(Ctr);
    aPos := NPos(' ', S2Parse, Idx); // position of the Idx-th space, 0 if none
    if aPos = 0 then
      Done := True;
    Data[Idx2] := aPos;
    Inc(Idx);
    Inc(Idx2);
  until Done;
  Label2.Caption := IntToStr(Ctr);
  Application.ProcessMessages;
end;

function TForm1.FetchCut(ABase: Integer): Integer;
var
  i: Integer;
  Last: Integer;
begin
  Result := 0; // defined result even when no space lies at or before ABase
  for i := Low(Data) to High(Data) do
  begin
    Last := Data[i];
    if Last = 0 then
      Break;
    if Last <= ABase then
      Result := Last
    else
      Break;
  end;
end;
FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

alpine

  • Hero Member
  • *****
  • Posts: 1032
Re: Parsing Long string to specific sizes
« Reply #1 on: August 14, 2022, 03:06:35 am »
If I understand the rules correctly, I think the procedure can be reduced to the following:
Code: Pascal
procedure TForm1.ParseLine(S: String);
const
  Delims: Set of Char = [' ', #3..#10, #13, #34, #123, #125];
  Base: Integer = 60;
var
  I, J, LineLen, L: SizeInt;
  Line: String;

  // Append S to the current line; flush the line to the memo first if S would not fit
  procedure AddToLine(S: String);
  var
    L: SizeInt;
  begin
    L := Length(S);
    if LineLen + L > Base then
    begin
      Memo1.Lines.Add(TwoSpaces + Line);
      Line := '';
      LineLen := 0;
    end;
    Line := Line + S;
    Inc(LineLen, L);
  end;

begin
  Memo1.Clear;
  L := Length(S);
  Line := '';
  LineLen := 0;
  I := 1;
  repeat
    J := I;
    // Skip delimiters
    while (I <= L) and (S[I] in Delims) do
      Inc(I);
    if I > J then
      AddToLine(' '); // Add a single space for all delimiters
    J := I;
    // Find the next word
    while (I <= L) and not (S[I] in Delims) do
      Inc(I);
    if I > J then
      AddToLine(Copy(S, J, I - J)); // Add the word
  until I > L;
  if LineLen > 0 then
    Memo1.Lines.Add(TwoSpaces + Line); // Add the last line
end;

and then called with:
Code: Pascal
ParseLine(bit1);

The rules don't specify what should happen if an individual word is longer than Base.
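
One possible answer to the overlong-word case, as a sketch only: hard-split such a word into pieces of at most Base characters before it goes into the line builder. The helper name AddWordChunks and its parameters are made up for illustration and are not part of the procedure above.

```pascal
program SplitLongWord;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical helper: append AWord to AChunks in pieces of at most AMax characters.
procedure AddWordChunks(const AWord: String; AMax: Integer; AChunks: TStrings);
var
  P: Integer;
begin
  if AMax < 1 then
    Exit; // nothing sensible to do with a non-positive width
  P := 1;
  while P <= Length(AWord) do
  begin
    // Copy returns fewer characters when fewer than AMax remain
    AChunks.Add(Copy(AWord, P, AMax));
    Inc(P, AMax);
  end;
end;

var
  Chunks: TStringList;
  I: Integer;
begin
  Chunks := TStringList.Create;
  try
    AddWordChunks('abcdefghijklmnopqrstuvwxyz', 10, Chunks);
    for I := 0 to Chunks.Count - 1 do
      WriteLn(Chunks[I]);
  finally
    Chunks.Free;
  end;
end.
```

Each chunk could then be passed to AddToLine as if it were a word of its own. Whether a hard split is acceptable, or the long word should simply overflow the line, depends on the requirements.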
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

jamie

  • Hero Member
  • *****
  • Posts: 6077
Re: Parsing Long string to specific sizes
« Reply #2 on: August 14, 2022, 03:36:51 am »
Why can't you use "Split"?

It can be done in a couple of steps, I guess.

 ArrayOfSomeString := TheStringToSplit.Split(' '); // Space being the separator for now.

The only true wisdom is knowing you know nothing

JLWest

Re: Parsing Long string to specific sizes
« Reply #3 on: August 14, 2022, 06:39:12 am »
@y.ivanov Thanks, I will definitely try out the code and let you know.

@jamie. Hi Jamie, I looked at Split and couldn't figure out how to use it and still honour Base: Integer = 60. But I don't know Split.

   ArrayOfSomeString := TheStringToSplit.Split(' ');
 As I understand Split, this would split off each word. I need lines of a specified length.
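
For what it's worth, Split only gives you the words; the line length has to come from a second step that glues the words back together until the next one would not fit. Here is a rough sketch of that idea, assuming the SysUtils string helper is available (FPC 3.x). WrapWords and its parameter names are invented for the example.

```pascal
program SplitAndFill;
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

// Hypothetical helper: split AText on spaces, then refill lines of at most ABase characters.
procedure WrapWords(const AText: String; ABase: Integer; ALines: TStrings);
var
  Words: TStringArray;
  Line: String;
  I: Integer;
begin
  Words := AText.Split([' ']);
  Line := '';
  for I := 0 to High(Words) do
  begin
    if Words[I] = '' then
      Continue; // consecutive spaces yield empty items, skip them
    if Line = '' then
      Line := Words[I]
    else if Length(Line) + 1 + Length(Words[I]) <= ABase then
      Line := Line + ' ' + Words[I] // the word still fits, including its space
    else
    begin
      ALines.Add(Line);             // line is full, start a new one
      Line := Words[I];
    end;
  end;
  if Line <> '' then
    ALines.Add(Line);
end;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    WrapWords('Header: Minvalue returns the smallest value in the data array', 30, Lines);
    for I := 0 to Lines.Count - 1 do
      WriteLn(Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

A word longer than ABase still ends up on a line of its own and overflows; the sketch does not try to break it.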

 I really need to change the Declaration of ParseLine to
ParseLine(AIndent,AOffset: Integer; AS2Parse: String);

JLWest

Re: Parsing Long string to specific sizes
« Reply #4 on: August 14, 2022, 07:13:52 am »
@y.ivanov Plug and play. Works, WOW!

I worked three days on my solution and was just happy it worked. You took about an hour.

The parser I really need is a lot more complicated than this one. There is the idea of an indent and an offset, which takes it above my pay grade.

Each new line I parse can have a different indent. This parser can handle that, since every output line gets the same indent.

The offset changes that in this way: the Header: line would be indented 2 spaces, but the rest of the lines need to be indented by a further x := Length('Header: ');

  Header: xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx xx x xxxx xxxxxxxx xxx xxxxx xxx
          xxx xxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxx xxx xxx xx
          xxxxxxxxxx xxxxxx xxx.

I'm not even sure this is possible. Well, not for me: I have done four or five rewrites and can't get it to work.

dje

  • Full Member
  • ***
  • Posts: 134
Re: Parsing Long string to specific sizes
« Reply #5 on: August 14, 2022, 09:32:32 am »
Just my 2 cents. While this task appears to be simple, it obviously isn't. The trick is to break up the process into 3 sections.

1) Split a string into a token structure.
2) Join the structure into a new structure (formatted to your requirements).
3) Combine steps 1 & 2 into a utility function.

For something this simple, a TStringList is perfect for the "token structure". An "array of string" is also useful, but lacks many of the TStringList features.

Your step one can be as simple as this: (Note: there are problems with using DelimitedText; whether they matter depends on your requirements)
Code: Pascal
procedure MySplit(ADest: TStrings; const ASource: string);
begin
  ADest.DelimitedText := ASource; // You can modify this builtin parsing feature later to suit your requirements.
end;
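
To make the DelimitedText caveat a bit more concrete, here is a small experiment you could run. As far as I know, the defaults are Delimiter = ',' and StrictDelimiter = False, in which case both commas and whitespace separate tokens and the quote character gets special treatment. CountTokens is a made-up helper for the demonstration only.

```pascal
program DelimitedTextDemo;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical helper: count the tokens DelimitedText produces for AText.
function CountTokens(const AText: String; ASpaceOnly: Boolean): Integer;
var
  Tokens: TStringList;
begin
  Tokens := TStringList.Create;
  try
    if ASpaceOnly then
    begin
      // Split on the space character only
      Tokens.Delimiter := ' ';
      Tokens.StrictDelimiter := True;
    end;
    Tokens.DelimitedText := AText;
    Result := Tokens.Count;
  finally
    Tokens.Free;
  end;
end;

begin
  WriteLn('default:    ', CountTokens('Header: xx,bbb ccc', False));
  WriteLn('space only: ', CountTokens('Header: xx,bbb ccc', True));
end.
```

With the defaults the comma should act as a separator too, so "xx,bbb" comes out as two tokens; in space-only mode it should stay together. Note that in strict mode a run of several spaces may produce empty tokens, so collapsing spaces first (as DelSpace1 does) still matters.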

As far as I can see, your step two is something like:
Code: Pascal
procedure MyJoin(ADest, ASource: TStrings; AIndent, AColWidth: integer);
var
  LIndex: integer;
  LBuffer: string;
begin
  if ASource.Count > 0 then begin
    // First line: indent, then the header token
    LBuffer := StringOfChar(' ', AIndent) + ASource[0];
    for LIndex := 1 to ASource.Count - 1 do begin
      // + 1 accounts for the space that is inserted before the word
      if Length(LBuffer) + 1 + Length(ASource[LIndex]) > AColWidth then begin
        ADest.Add(LBuffer);
        // Continuation lines: pad to the width of the header token so the text lines up
        LBuffer := StringOfChar(' ', Length(ASource[0]) + AIndent);
      end;
      LBuffer := LBuffer + ' ' + ASource[LIndex];
    end;
    ADest.Add(LBuffer);
  end;
end;

Therefore your step 3 is:
Code: Pascal
procedure MyParse(ADest: TStrings; const ASource: string; AIndent, AColWidth: integer);
var
  LTemp: TStringList;
begin
  LTemp := TStringList.Create;
  try
    MySplit(LTemp, ASource);                  // step 1: tokenize
    MyJoin(ADest, LTemp, AIndent, AColWidth); // step 2: format
  finally
    LTemp.Free;
  end;
end;

Parsing only gets harder as the syntax becomes more complex. By breaking up the task this way, you prevent your final functionality from becoming impossible to modify and unit test.

Note: MyParse() can now be called to parse a string and append the formatted results to a TMemo. You have already expressed how long your solution took, and how many rewrites it required. Always think about future requirements. Maybe the input string comes from a TIniFile, XML, TDbf, Http encoded, or a TStringGrid user interface. It could be anything. You want to aim for a standard intermediate data structure, so the formatting (encoding) code is separate from your parsing (decoding) code.


« Last Edit: August 14, 2022, 10:03:32 am by derek.john.evans »

alpine

Re: Parsing Long string to specific sizes
« Reply #6 on: August 14, 2022, 11:29:59 am »
Quote from: JLWest
*snip*
Each New line I parse can have a different indent. This parser can handles that. Each has the same indent.

The Offset changes that in this way: The Header: would be indented 2 spaces but the rest of the lines need to be indented  x:=Length('Header: ');

  Header: xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx xx x xxxx xxxxxxxx xxx xxxxx xxx
          xxx xxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxx xxx xxx xx
          xxxxxxxxxx xxxxxx xxx.

I'm not even sure this is possible. Well not for me I have 4 or five rewrites and can't get it to work.

Revised procedure:
Code: Pascal
procedure TForm1.ParseLine(S: String; Lines: TStrings);
const
  Delims: Set of Char = [' ', #3..#10, #13, #34, #123, #125];
  Base: Integer = 60;
var
  I, J, LineLen, L: SizeInt;
  Line: String;

  // Add a finished line: the first one gets the plain indent, later ones the hanging offset
  procedure AddWithOffset(S: String);
  var
    Offset: Integer;
  begin
    if Lines.Count = 0 then
      Offset := 2 else
      Offset := 2{Length(TwoSpaces)} + 8{Length('Header: ')};
    Lines.Add(StringOfChar(' ', Offset) + S);
  end;

  // Append S to the current line; flush the line first if S would not fit
  procedure AddToLine(S: String);
  var
    L: SizeInt;
  begin
    L := Length(S);
    if LineLen + L > Base then
    begin
      AddWithOffset(Line);
      Line := '';
      LineLen := 0;
    end;
    Line := Line + S;
    Inc(LineLen, L);
  end;

begin
  Lines.Clear;
  L := Length(S);
  Line := '';
  LineLen := 0;
  I := 1;
  repeat
    J := I;
    // Skip delimiters
    while (I <= L) and (S[I] in Delims) do
      Inc(I);
    if I > J then
      AddToLine(' '); // Add a single space for all delimiters
    J := I;
    // Find the next word
    while (I <= L) and not (S[I] in Delims) do
      Inc(I);
    if I > J then
      AddToLine(Copy(S, J, I - J)); // Add the word
  until I > L;
  if LineLen > 0 then
    AddWithOffset(Line); // Add the last line
end;
Call it with:
Code: Pascal
ParseLine(bit1, Memo1.Lines);

By specifying the second parameter you can put the result in any TStrings you want.
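
If the indent, the offset and the width need to vary per call, along the lines of the ParseLine(AIndent, AOffset: Integer; AS2Parse: String) declaration mentioned earlier in the thread, the hard-coded numbers can become parameters. The following is only a sketch under that assumption: WrapHanging and the ABase/ALines parameters are illustrative names, it splits on spaces only rather than on the Delims set, and ABase counts the text without its indent.

```pascal
program HangingIndent;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical variant: first line indented by AIndent, later lines by AIndent + AOffset.
procedure WrapHanging(const AText: String; AIndent, AOffset, ABase: Integer;
  ALines: TStrings);
var
  I, J, L: SizeInt;
  Line, W: String;
  First: Boolean;

  // Emit the current line with the indent that applies to it
  procedure Flush;
  begin
    if Line = '' then
      Exit;
    if First then
      ALines.Add(StringOfChar(' ', AIndent) + Line)
    else
      ALines.Add(StringOfChar(' ', AIndent + AOffset) + Line);
    First := False;
    Line := '';
  end;

begin
  First := True;
  Line := '';
  L := Length(AText);
  I := 1;
  while I <= L do
  begin
    // Skip spaces between words
    while (I <= L) and (AText[I] = ' ') do
      Inc(I);
    J := I;
    // Scan to the end of the word
    while (I <= L) and (AText[I] <> ' ') do
      Inc(I);
    if I > J then
    begin
      W := Copy(AText, J, I - J);
      // Start a new line when the word plus its space would exceed ABase
      if (Line <> '') and (Length(Line) + 1 + Length(W) > ABase) then
        Flush;
      if Line = '' then
        Line := W
      else
        Line := Line + ' ' + W;
    end;
  end;
  Flush;
end;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    WrapHanging('Header: Minvalue returns the smallest value in the data array',
      2, Length('Header: '), 30, Lines);
    for I := 0 to Lines.Count - 1 do
      WriteLn(Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

Passing Length('Header: ') as the offset keeps the continuation lines aligned under the text that follows the header, whatever the header's length is.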

jamie

Re: Parsing Long string to specific sizes
« Reply #7 on: August 14, 2022, 03:46:19 pm »
If you already know the index offset, then why not use Copy together with Split?

Code: Pascal
// Copy takes a start index and a character count (End is a reserved word)
MyFinalStringArray := Copy(TheString, Start, Count).Split(' ');

I think I got that right. I did it from memory, but it should be close enough for you to get the idea.

 Just thinking out loud here!

JLWest

Re: Parsing Long string to specific sizes
« Reply #8 on: August 14, 2022, 04:32:57 pm »
Thank you

It will take me a couple of days to test this out. I'll run the solutions and see how they work.

JLWest

Re: Parsing Long string to specific sizes
« Reply #9 on: August 15, 2022, 07:50:34 am »
I got both @y.ivanov's and @derek.john.evans' code to work in demos.

However, when I pasted both examples into the program where I need them, I get an error on both that I can't solve. The error is on a parameter. I think I know why, but I don't know how to fix it.

So I will have to make a demo out of the program and post it for help.

I don't mind posting the project, but it's rather large, so I probably won't.

Thanks

JLWest

Re: Parsing Long string to specific sizes
« Reply #10 on: August 15, 2022, 07:27:39 pm »
@y.ivanov I get an error on line 13 when I try to compile the procedure ParseLine.

Code: Pascal
  1.  procedure TForm1.ParseLine(S: String; Lines: TStrings);
  2.  const
  3.    Delims: Set of Char = [' ', #3..#10, #13, #34, #123, #125];
  4.    Base: Integer = 60;
  5.  var
  6.    I, J, LineLen, L: SizeInt;
  7.    Line: String;
  8.  
  9.    procedure AddWithOffset(S: String);
  10.    var
  11.      Offset: Integer;
  12.    begin
  13.      if Lines.Count = 0 then
  14.        Offset := 2 else
  15.        Offset := 2{Length(TwoSpaces)} + 8{Length('Header: ')};
  16.      Lines.Add(StringOfChar(' ', Offset) + S);
  17.    end;                                    

JLWest

Re: Parsing Long string to specific sizes
« Reply #11 on: August 15, 2022, 07:33:41 pm »
@derek.john.evans
I get an error: unit1.pas(884,9) Error: identifier idents no member "DelimitedText"
I don't understand it.

Code: Pascal
procedure TForm1.MySplit(ADest: TStrings; const ASource: string);
begin
  ADest.DelimitedText := ASource; // You can modify this builtin parsing feature later to suit your requirements.
end;

alpine

Re: Parsing Long string to specific sizes
« Reply #12 on: August 15, 2022, 08:14:07 pm »
Quote from: JLWest
@y.ivanov I get a error on line 13 when I try to compile the procedure parse line.
*snip*

It seems that your definition of TStrings differs from the one derek.john.evans and I are using. Have you redefined it accidentally?

JLWest

Re: Parsing Long string to specific sizes
« Reply #13 on: August 15, 2022, 09:36:39 pm »
Well, I guess I did. I have the following in my type section. I'll change that and see what happens.

Thanks.

type

  { TForm1 }

  TStrings = (Ts2Parse, TSeeAlso, TsErrors, TsDeclaration,
              TsDescription, TsFunctionresult,TsArguments,
              TsNil);         

JLWest

Re: Parsing Long string to specific sizes
« Reply #14 on: August 15, 2022, 09:45:48 pm »
Yes, that was it. And I wasn't even using the TStrings I defined. Now both of the parsing units compile in my program.
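
For anyone who hits the same message: a type declared in your own code hides the one with the same name from the Classes unit, so "Lines: TStrings" suddenly means the enumeration and members such as Count or DelimitedText are not found. Renaming the enumeration is the clean fix. If the name had to stay, the unit-qualified form should still reach the class, as in this small sketch (the enumeration values and the Fill procedure are invented for the example).

```pascal
program ShadowDemo;
{$mode objfpc}{$H+}
uses
  Classes;

type
  // Hypothetical local enum that reuses the name TStrings and hides Classes.TStrings
  TStrings = (TsOne, TsTwo);

// The unit-qualified name still refers to the RTL class
procedure Fill(Lines: Classes.TStrings);
begin
  Lines.Add('ok');
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    Fill(L);
    WriteLn(L.Count);
  finally
    L.Free;
  end;
end.
```

Declaring the parameter as plain TStrings in this program would pick up the enumeration instead, which is where the "no member" style of error comes from.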

Plan on using both with minor mods if I can manage it.


Thanks.

 
