Recent

Author Topic: Convert string with Key-Value pairs to TStrings with Key-Value pairs.  (Read 3364 times)

Bart

  • Hero Member
  • *****
  • Posts: 5667
    • Bart en Mariska's Webstek
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #30 on: December 06, 2025, 08:07:00 pm »
FWIW: I added handling of "non quoted" values (with limitations: value can not contains a space in that case).
It's in my FsiStrUtils unit, called TryParseKeyValuePairs.

Feel free to completely ignore or test and make it fail  :)

Bart

Warfley

  • Hero Member
  • *****
  • Posts: 2037
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #31 on: December 07, 2025, 05:29:37 pm »
Honest question, why so complicated? You are basically building a regex by hand and by doing this you built yourself a power automaton which requires you to handle all states within one big loop, which means that the loop grows exponentially per state. But because you are in the case that your result is a well defined sequence of tokens, so instead you can write smaller automatons and have them match individually instead. This only grows linearly, so I'm pretty sure you could easily half the code size for that function (see my code example from before)

Bart

  • Hero Member
  • *****
  • Posts: 5667
    • Bart en Mariska's Webstek
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #32 on: December 07, 2025, 11:14:10 pm »
Well, I couldn't get your example code working at all.
And if it would, would it handle the case of unquoted values as well?
Not sure why you would think the loop grows exponential.
Each new state adds at most 3 new comparisons (worst case scenario) so +3n, whre n is the number of states.
Nevertheless, I'm trying to decrease the number of states, as ome of them may prove to be redundant.

As said before, it's mostly fun for me to code it that way.
(You may have notice there is more probably not very usefull (in the real world) stuff inside my FsiStrUtils unit.
The original problem (parsing a string of Key/Value pairs) I already solved in a different (less efficient) way, before starting this thread.

It's funny you say that it's basically a regex machine.
I don't really know how regex's work.
I tend to avoid them like the plague.
(At work (which is not programming or ICT related at all) I was kind of forced to use regex's in JavaScript. Still have headaches from that.
And these were simple...)

All that being said:
I would like to see your idea in a working example.
Even if it would only work for the case where Values are always quoted and spaces around the equal sign are disallowed.

Off-topic: what is the coreect plural of "regex"?

Bart

cdbc

  • Hero Member
  • *****
  • Posts: 2575
    • http://www.cdbc.dk
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #33 on: December 08, 2025, 01:21:47 am »
Hi
Quote
Off-topic: what is the coreect plural of "regex"?
Off the bat, I'd say "Regex's" or "Regex'es" ~ 'Regular Expression(s)'  %)
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE6/QT6 -> FPC Release -> Lazarus Release &  FPC Main -> Lazarus Main

dseligo

  • Hero Member
  • *****
  • Posts: 1651
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #34 on: December 08, 2025, 02:38:28 am »
Hi
Quote
Off-topic: what is the coreect plural of "regex"?
Off the bat, I'd say "Regex's" or "Regex'es" ~ 'Regular Expression(s)'  %)
Regards Benny

regexes, without '

Warfley

  • Hero Member
  • *****
  • Posts: 2037
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #35 on: December 08, 2025, 10:41:49 am »
Well, I couldn't get your example code working at all.
And if it would, would it handle the case of unquoted values as well?
Not sure why you would think the loop grows exponential.
Each new state adds at most 3 new comparisons (worst case scenario) so +3n, whre n is the number of states.
Nevertheless, I'm trying to decrease the number of states, as ome of them may prove to be redundant.
I fixed the errors and made it so it can use non quoted values:
Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. {$mode objfpc}{$H+}
  4. {$ModeSwitch arrayoperators}
  5.  
  6. type
  7.   TKeyValuePair = record
  8.     Key: String;
  9.     Value: String;
  10.   end;
  11.  
  12.   TKeyValuePairArray = array of TKeyValuePair;
  13.  
  14. function SkipNoise(const s: String; var Start: SizeInt): Boolean; inline;
  15. begin
  16.   while (Start<=Length(s)) and (s[Start]<=#32) do
  17.     Inc(Start);
  18.   Result:=Start<=Length(s);
  19. end;
  20.  
  21. function MatchIdentifier(const s: String; var Start: SizeInt): SizeInt;
  22. const
  23.   IdentFirstChars = ['A'..'Z', 'a'..'z', '_'];
  24.   IdentChars = IdentFirstChars + ['0'..'9'];
  25. begin
  26.   Result:=0;
  27.   if not SkipNoise(s, Start) then
  28.     Exit;
  29.   while (Start+Result<=Length(s)) and
  30.         ((Result>0) or (s[Start+Result] in IdentFirstChars)) and
  31.         ((Result=0) or (s[Start+Result] in IdentChars)) do
  32.     Inc(Result);
  33. end;
  34.  
  35. function MatchLiteral(const s,l: String; var Start: SizeInt): Boolean;
  36. var
  37.   Len: SizeInt;
  38. begin
  39.   Result:=False;
  40.   if not SkipNoise(s, Start) then
  41.     Exit;
  42.   Len:=1;
  43.   while (Start+len<=Length(s)) and (len<Length(l)) and (s[Start+len] = l[Len+1]) do
  44.     Inc(len);
  45.   Result:=Len=Length(l);
  46. end;
  47.  
  48. function MatchString(const s: String; var Start: SizeInt): SizeInt;
  49. begin
  50.   Result:=0;
  51.   if not SkipNoise(s, Start) or (s[Start+Result] <> '"') then
  52.     Exit;
  53.   Inc(Result);
  54.   while (Start+Result<=Length(s)) and (s[Start+Result] <> '"') do
  55.     Inc(Result);
  56.   if Start+Result>Length(s) then
  57.     Result:=0
  58.   else
  59.     Inc(Result);
  60. end;
  61.  
  62. function MatchKVPair(const s: String; var p: SizeInt; out KVPair: TKeyValuePair): Boolean;
  63. var
  64.   tokLen: SizeInt;
  65. begin
  66.   Result:=False;
  67.   tokLen:=MatchIdentifier(s, p);
  68.   if tokLen=0 then
  69.     Exit;
  70.   KVPair.Key:=Copy(s,p,tokLen);
  71.   Inc(p,tokLen);
  72.  
  73.   if not MatchLiteral(s, '=', p) then
  74.     Exit;
  75.   Inc(p,Length('='));
  76.  
  77.   tokLen:=MatchString(s, p);
  78.   if tokLen>0 then
  79.   begin
  80.     KVPair.Value:=Copy(s,p+1,tokLen-2);
  81.     Inc(p,tokLen);
  82.   end
  83.   else
  84.   begin
  85.     tokLen:=MatchIdentifier(s,p);
  86.  
  87.     if tokLen=0 then
  88.       Exit;
  89.  
  90.     KVPair.Value:=Copy(s,p,tokLen);
  91.     Inc(p,tokLen);
  92.   end;
  93.  
  94.   Result:=True;
  95. end;
  96.  
  97. function KVPairs(const s: String): TKeyValuePairArray;
  98. var
  99.   kvPair: TKeyValuePair;
  100.   p: SizeInt = 1;
  101. begin
  102.   Result:=[];
  103.   while MatchKVPair(s,p,kvPair) do
  104.     Result += [kvPair];
  105.   if p<=Length(s) then
  106.     WriteLn('Syntax Error in string at ', p);
  107. end;
  108.  
  109. const KVString = 'Test1 = "Hello World" test2=Foo test3 ="Bar"Test4="FooBar"';
  110. var
  111.   pairs: TKeyValuePairArray;
  112.   p: TKeyValuePair;
  113. begin
  114.   pairs := KVPairs(KVString);
  115.   for p in pairs do
  116.     WriteLn('Pair: ', p.key, ' = "', p.Value, '"');
  117. end.
  118.  

It's funny you say that it's basically a regex machine.
I don't really know how regex's work.
I tend to avoid them like the plague.
(At work (which is not programming or ICT related at all) I was kind of forced to use regex's in JavaScript. Still have headaches from that.
And these were simple...)
Basically exactly what you wrote. A Regex is just a string representation of a state automaton. Basically it's a machine that has a current state and checks for every char in a string how it is processed according to the current state. Consider an identifier, meaning something that starts with a normal character and continues alphanumeric, a regex automaton for it would look like this:
Code: Pascal  [Select][+][-]
  1. function MatchIdentifier(const s: String): Boolean;
  2. type
  3.   TAutomatonState = (asFirstChar, asFollowChar, asError);
  4. var
  5.   currentState: TAutomatonState = asFirstChar;
  6.   c: Char;
  7. begin
  8.   for c in s do
  9.     case currentState of
  10.     asFirstChar: if c in ['A'..'Z', 'a'..'z', '_'] then currentState:=asFollowChar else currentState:=asError;
  11.     asFollowChar: if c in ['A'..'Z', 'a'..'z', '_', '0'..'9'] then currentState:=asFollowChar else currentState:=asError;
  12.     asError: break;
  13.   end;
  14.   Result:=currentState = asFollowChar;    
  15. end;
The automaton has 3 states, firstChar, followChar and Error. Once it's in the Error state, it can never return (also called "Sink" in the literature). The FollowChar state is what is considered a "Final State", because you need to reach that state to confirm the match.

The thing above is a full matcher, meaning it just checks if the full string matches the regex. If you want partial matching, you could instead re-write it into a step function:
Code: Pascal  [Select][+][-]
  1. type
  2.   TAutomatonState = (asFirstChar, asFollowChar, asError);
  3. const
  4.   InitialState = asFirstChar;
  5.   FinalStates = [asFollowChar];
  6.  
  7. function MatchIdentifierStep(c: Char; currentState: TAutomatonState): TAutomatonState;
  8. begin
  9.   case currentState of
  10.     asFirstChar: if c in ['A'..'Z', 'a'..'z', '_'] then Result:=asFollowChar else Result:=asError;
  11.     asFollowChar: if c in ['A'..'Z', 'a'..'z', '_', '0'..'9'] then Result:=asFollowChar else Result:=asError;
  12.     otherwise Result:=asError;
  13. end;
  14.  
  15. function MatchIdentifier(const s: String): Boolean;
  16. var
  17.   c: Char;
  18.   st: TAutomatonState;
  19. begin
  20.   st := InitialState;
  21.   for c in s do
  22.   begin
  23.     st:=MatchIdentifierStep(c);
  24.     if st=asError then Break;
  25.   end;
  26.   Result := st in FinalStates;
  27. end;

Now if you want to add another automaton, e.g. for matching numbers (starting with +/- or a digit other than 0 followed by any digit), you can just do the following:
Code: Pascal  [Select][+][-]
  1. type
  2.   TAutomatonState = (asFirstChar, asAfterSign, asFollowDigits, asError);
  3. const
  4.   InitialState = asFirstChar;
  5.   FinalStates = [asFollowDigits];
  6.  
  7. function MatchNumberStep(c: Char; currentState: TAutomatonState): TAutomatonState;
  8. begin
  9.   case currentState of
  10.     asFirstChar:
  11.       case c of
  12.       '+', '-': Result := asAfterSign;
  13.       '1'..'9': Result := asFollowDigits;
  14.       otherwise Result:=asError;
  15.       end;
  16.     asAfterSign: if c in ['0'..'9'] then Result:=asFollowDigits else Result:=asError;
  17.     asFollowDigits: if c in ['0'..'9'] then Result:=asFollowDigits else Result:=asError;
  18.     otherwise Result:=asError;
  19. end;
  20.  
  21. function MatchNumber(const s: String): Boolean;
  22. var
  23.   c: Char;
  24.   st: TAutomatonState;
  25. begin
  26.   st := InitialState;
  27.   for c in s do
  28.   begin
  29.     st:=MatchNumberStep(c);
  30.     if st=asError then Break;
  31.   end;
  32.   Result := st in FinalStates;
  33. end;

Now there are different ways to combine different regexes:
1. Concatination (e.g. matching first a number than an identifier)
2. Union (e.g. matching either a number or an identifier)
3. Repetition or Kleene operation (e.g. 0 or more numbers)

The concatenation is easy, just append the states:
Code: Pascal  [Select][+][-]
  1. type
  2.   TAutomatonState = (asFirstNumChar, asAfterSign, asFollowDigits { same as FirstChar for Identifiers}, asFollowChar, asError);
  3. const
  4.   InitialState = asFirstNumChar;
  5.   FinalStates = [asFollowChars];
  6.  
  7. function MatchNumberStep(c: Char; currentState: TAutomatonState): TAutomatonState;
  8. begin
  9.   case currentState of
  10.     asFirstNumChar:
  11.       case c of
  12.       '+', '-': Result := asAfterSign;
  13.       '1'..'9': Result := asFollowDigits;
  14.       otherwise Result:=asError;
  15.       end;
  16.     asAfterSign: if c in ['0'..'9'] then Result:=asFollowDigits else Result:=asError;
  17.     asFollowDigits:
  18.       case c of
  19.       '0'..'9': Result := asFollowDigits;
  20.       'A'..'Z', 'a'..'z', '_': Result := asFollowChar;
  21.       otherwise Result:=asError;
  22.       end;
  23.     asFollowChar: if c in ['A'..'Z', 'a'..'z', '_', '0'..'9'] then Result:=asFollowChar else Result:=asError;
  24.     otherwise Result:=asError;
  25. end;
  26.  
  27. function MatchNumberAndIdentifier(const s: String): Boolean;
  28. var
  29.   c: Char;
  30.   st: TAutomatonState;
  31. begin
  32.   st := InitialState;
  33.   for c in s do
  34.   begin
  35.     st:=MatchNumberStep(c);
  36.     if st=asError then Break;
  37.   end;
  38.   Result := st in FinalStates;
  39. end;

The Union is technically more difficult to compute the state set (it's called the powerset algorithm), but we can simply make use of the fact that our pascal program is stateful and use the stack for that:
Code: Pascal  [Select][+][-]
  1. type
  2.   TIdentAutomatonState = (asFirstIdentChar, asFollowChar, asError);
  3. const
  4.   IdentInitialState = asFirstIdentChar;
  5.   IdentFinalStates = [asFollowChar];
  6.  
  7. function MatchIdentifierStep(c: Char; currentState: TIdentAutomatonState): TIdentAutomatonState;
  8. begin
  9.   case currentState of
  10.     asFirstIdentChar: if c in ['A'..'Z', 'a'..'z', '_'] then Result:=asFollowChar else Result:=asError;
  11.     asFollowChar: if c in ['A'..'Z', 'a'..'z', '_', '0'..'9'] then Result:=asFollowChar else Result:=asError;
  12.     otherwise Result:=asError;
  13. end;
  14.  
  15. type
  16.   TNumberAutomatonState = (asFirstNumChar, asAfterSign, asFollowDigits, asError);
  17. const
  18.   NumberInitialState = asFirstNumChar;
  19.   NumberFinalStates = [asFollowDigits];
  20.  
  21. function MatchNumberStep(c: Char; currentState: TNumberAutomatonState): TNumberAutomatonState;
  22. begin
  23.   case currentState of
  24.     asFirstNumChar:
  25.       case c of
  26.       '+', '-': Result := asAfterSign;
  27.       '1'..'9': Result := asFollowDigits;
  28.       otherwise Result:=asError;
  29.       end;
  30.     asAfterSign: if c in ['0'..'9'] then Result:=asFollowDigits else Result:=asError;
  31.     asFollowDigits: if c in ['0'..'9'] then Result:=asFollowDigits else Result:=asError;
  32.     otherwise Result:=asError;
  33. end;
  34.  
  35. function MatchNumberOrIdent(const s: String): Boolean;
  36. var
  37.   c: Char;
  38.   numST: TNumberAutomatonState;
  39.   identST: TIdentifierAutomatonState;
  40. begin
  41.   numST := NumberInitialState;
  42.   identST := IdentInitialState;
  43.   for c in s do
  44.   begin
  45.     numST:=matchNumberStep(c);
  46.     identST:=MatchIdentifierStep(c);
  47.     if (numST=asError) and (identST=asError) then Break;
  48.   end;
  49.   Result := (numST in NumberFinalStates) or (identST in IdentFinalStates);
  50. end;

Because this is growing very long, the kleene operation you can probably figure out by yourself.
In the end, a (true) Regex engine just takes a regex string, and "compiles" it into such an automaton. This state based design guarantees that every character only needs to be visited once (given to each step function).

And while you of course did not do all the formalism I did above, the code you wrote is basically exactly that, a state automaton to match the key value pairs. So you basically built yourself your own regex :)

Bart

  • Hero Member
  • *****
  • Posts: 5667
    • Bart en Mariska's Webstek
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #36 on: December 08, 2025, 06:37:32 pm »
Thanks for the extensive explanation.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5667
    • Bart en Mariska's Webstek
Re: Convert string with Key-Value pairs to TStrings with Key-Value pairs.
« Reply #37 on: December 12, 2025, 07:28:04 pm »
@warfley Here's my new try at it (based upon your suggestions, not your code, so it's a bit different):

Code: Pascal  [Select][+][-]
  1. type
  2.   TKeyValueOption = (
  3.                       kvoDequoteValue,          //remove surrouding quote characters from Value
  4.                       kvoAllowSpaceAroundSep,   //allow for spaces before and/or after the KeyValueSeparator like: Key ="Value", Key= "Value" or Key = "Value"
  5.                       kvoAllowUltraCompact      //allow for no space after Value like: Ke1="Value1"Key2="Value2"
  6.                     );
  7.   TKeyValueOptions = set of TKeyValueOption;
  8. const
  9.   DefaultKeyValueOptions = [kvoDequoteValue, kvoAllowSpaceAroundSep];
  10.   StrictKeyValueOptions = [kvoDequoteValue];
  11.   RelaxedKeyValueOptions = [kvoDequoteValue, kvoAllowSpaceAroundSep, kvoAllowUltraCompact];
  12.  
  13. function TryParseKeyValuePairs_Alt(Line: String; List: TStrings; out ErrorPos: Integer; QuoteStart, QuoteEnd, KeyValueSeparator: Char; Options: TKeyValueOptions = DefaultKeyValueOptions): Boolean;
  14. var
  15.   Index, Len, ValueStart: Integer;
  16.   NormalChars: TSysCharSet;
  17.   Key, Value: String;
  18.   QuotedValues: Boolean;
  19. const
  20.   WhiteSpace = [#32,#9];
  21.   LineBreaks = [#10,#13];
  22.  
  23.   procedure AddKeyValuePair;
  24.   begin
  25.     if QuotedValues then
  26.     begin
  27.       if (kvoDequoteValue in Options) then
  28.         Value := Copy(Line, ValueStart, Index-ValueStart-1)
  29.       else
  30.         Value := Copy(Line, ValueStart-1, Index-ValueStart+1);
  31.     end
  32.     else
  33.       Value := Copy(Line, ValueStart, Index-ValueStart);
  34.     //writeln('Value=[',Value,']');
  35.     List.AddPair(Key, Value);
  36.     Key := '';
  37.     Value := '';
  38.   end;
  39.  
  40.   //all these "automatons" will set ErrorPos upon failure
  41.   //upon success Index will be set to 1 after the last character that was parsed
  42.   function SkipWhiteSpace(Mandatory: Boolean=False): Boolean;
  43.   begin
  44.     Result := False;
  45.     if Mandatory and not (Line[Index] in WhiteSpace) then
  46.     begin
  47.       ErrorPos := Index;
  48.       Exit;
  49.     end;
  50.     while (Index <= Len) and (Line[Index] in WhiteSpace) do Inc(Index);
  51.     Result := True;
  52.   end;
  53.  
  54.   function ParseKey: Boolean;
  55.   var
  56.     Start: Integer;
  57.   begin
  58.     Start := Index;
  59.     while (Index <= Len) and (Line[Index] in NormalChars) do Inc(Index);
  60.     if (Index > Len) then
  61.     begin
  62.       ErrorPos := -1;
  63.       Exit(False);
  64.     end;
  65.     Key := Copy(Line, Start, Index-Start);
  66.     Result := True;
  67.   end;
  68.  
  69.   function ParseSeparator: Boolean;
  70.   begin
  71.     if (Index > Len) then
  72.       ErrorPos := -1
  73.     else if (Line[Index] <> KeyValueSeparator) then
  74.       ErrorPos := Index;
  75.     if (ErrorPos = 0) then
  76.       Inc(Index);
  77.     Result := (ErrorPos = 0);
  78.     //if ErrorPos<>0 then writeln('ParseSeparator: ErrorPos=',ErrorPos);
  79.   end;
  80.  
  81.   function ParseQuoteChar(QC: Char): Boolean;
  82.   begin
  83.     if not QuotedValues then
  84.       Exit(True);
  85.     if (Index > Len) then
  86.       ErrorPos := -1
  87.     else if (Line[Index] <> QC) then
  88.       ErrorPos := Index;
  89.     if (ErrorPos = 0) then
  90.       Inc(Index);
  91.     Result := (ErrorPos = 0);
  92.     //if ErrorPos<>0 then writeln('ParseQuoteChar: ErrorPos=',ErrorPos);
  93.   end;
  94.  
  95.   function ParseValue: Boolean;
  96.   begin
  97.     if not ParseQuoteChar(QuoteStart) then
  98.       Exit(False);
  99.     if (Index > Len) then
  100.     begin
  101.       ErrorPos := -1;
  102.       Exit(False);
  103.     end;
  104.     ValueStart := Index;
  105.     while (Index <= Len) and
  106.           ((Line[Index] in NormalChars) or (QuotedValues and (Line[Index] in (WhiteSpace + [KeyValueSeparator])))) do Inc(Index);
  107.     if QuotedValues and (Index > Len) then
  108.       ErrorPos := -1;
  109.     Result := (ErrorPos = 0) and ParseQuoteChar(QuoteEnd);
  110.     //if ErrorPos<>0 then writeln('ParseValue: ErrorPos=',ErrorPos);
  111.   end;
  112.  
  113. begin
  114.   Result := False;
  115.   List.Clear;
  116.   ErrorPos := 0;
  117.   Index := 1;
  118.   Len := Length(Line);
  119.   if (Len = 0) then
  120.     Exit;
  121.   NormalChars := [#1..#255] - WhiteSpace - LineBreaks - [QuoteStart, QuoteEnd, KeyValueSeparator];
  122.   QuotedValues := (QuoteStart <> #0);
  123.   SkipWhiteSpace;
  124.   //writeln('First non-whitespace char @Index: ',Index,'=',Line[Index]);
  125.   while (Index <= Len) and (ErrorPos = 0) do
  126.   begin
  127.     if  ParseKey then
  128.     begin
  129.       if (kvoAllowSpaceAroundSep in Options) then
  130.         SkipWhiteSpace;
  131.       if ParseSeparator then
  132.       begin
  133.         if (kvoAllowSpaceAroundSep in Options) then
  134.           SkipWhiteSpace;
  135.         if ParseValue then
  136.         begin
  137.           AddKeyValuePair;
  138.           if (Index < Len) then
  139.             SkipWhiteSpace(not (kvoAllowUltraCompact in Options));  // no need to break on False, since we jump to begin of while loop and check ErrorPos
  140.         end; //ParseValue
  141.       end; //ParseSeparator
  142.     end; //ParseKey
  143.   end; //while
  144.   Result := (ErrorPos = 0);
  145. end;

Simplified the rules about having space around the KeyValueSeparator.
It now also supports having different start and endquotes, so you can have:
Code: Pascal  [Select][+][-]
  1. 'Key1=<Value1> Key2=<Value2>'
That last feature proved to be trivial to implement in this way.
Using the original StateMachine it took a bit more effort (and 14 lines of code extra...), got there in the end though.

Probably all of this is over-engineered, since my actual real life use case is in the form of:
Code: [Select]
Key1="Value1" Key2="Value2"But it's always nice to generalize the hell out of it.
The more options, the merrier  O:-)

Bart

 

TinyPortal © 2005-2018