Author Topic: How could I parse this text file?  (Read 10005 times)

knuckles

  • Full Member
  • ***
  • Posts: 122
How could I parse this text file?
« on: September 09, 2018, 06:58:37 pm »
Given the following sample file as an example:

Quote
NPC_1_Dialog
    {
        Data_1
        {
            option1
            {
            "Text" "Test 1"
            }
            option2
            {
            "Text" "Test 2"
            }
        }
        Data_2
        {
            option1
            {
            "Text" "Another Test"
            }
        }
        Data_3
        {
            option1
            {
            "Text" "Hello World"
            }
            option2
            {
            "Text" "Good morning"
            }
        }
    }

NPC_2_Dialog
    {
        Data_1
        {
            option1
            {
            "Text" "Some Message"
            }
            option2
            {
            "Text" "Hi!"
            }
        }
        Data_2
        {
            option1
            {
            "Text" "Goodbye"
            }
        }
    }

What would be a good approach to parsing this type of data? Note that this is not JSON, although it looks similar.

I need to load a file such as the above into a string list, I guess, and then parse the data into 2 separate object lists. I have no control or influence over how this file is saved; I must simply load it and parse it.

List 1 is to contain the names (NPC_1_Dialog, NPC_2_Dialog etc.)
List 2 is to contain the sub list (Data_1, Data_2 etc.)
The values such as "Text" "Hi!" will be stored in a standard class belonging to the sub list entries (Data_1, Data_2 etc.)

I have no problem creating the structure for the object lists etc., but I am unsure how to parse the text file and retrieve the data.

I would be grateful for some pointers and tips on how I should approach this task, and I can provide more information if required.

Thank you.

jamie

  • Hero Member
  • *****
  • Posts: 6129
Re: How could I parse this text file?
« Reply #1 on: September 09, 2018, 07:27:59 pm »
I would first keep a counter for the "{" and, as you read each line, test for that character...

If, for example, the first string you read has no "{" at the start, you test the value of the "{" counter. If it is zero, you can assume you are at a main heading and can save this string somewhere or act on it. Then you inc the "{" counter, move on to the next line and retrieve the next string.

If the next string you retrieve does not have a "{" at the start either, you can assume it is a sub heading and do with it as you wish.

Whenever you encounter a "}" you should dec the "{" counter so that you can unwind the nesting.

So it looks here like you have a main heading for the dialog name, a sub heading for what is to come, and the options for each.

What I would do is make a function that simply retrieves the strings line by line, handling the "{" itself and returning the retrieved string through something like xxx(Var ?): StatusCode. The status code can indicate where the code should go next, for example via a case statement that calls other functions.

So basically you start in a main loop where the first case statement handles the main headings. The parser function returns the case number to use. The parser then moves into the next depth of nesting, and the function you call from that first case list has another case list of its own to go deeper.

I think you get the idea.

P.S.
Put a bail-out limit on the number of times the "{" counter gets incremented. If it goes too far you have a problem, because each "}" should have decremented that counter again.
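
The counter idea above can be sketched roughly as follows. This is only a minimal illustration, not jamie's actual code: it counts the "{" lines themselves rather than incrementing after each heading, and the names ScanLines and MaxDepth are made up for this example. It assumes every brace sits on its own line, as in the sample file.

```pascal
program BraceCounterDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  MaxDepth = 16; // hypothetical bail-out limit, pick what suits your files

// Walks the lines with a "{" counter and returns one "depth: text"
// entry for every line that is neither blank nor a lone brace.
function ScanLines(Lines: TStrings): TStringList;
var
  I, Depth: Integer;
  S: string;
begin
  Result := TStringList.Create;
  try
    Depth := 0;
    for I := 0 to Lines.Count - 1 do
    begin
      S := Trim(Lines[I]);
      if S = '{' then
      begin
        Inc(Depth);
        if Depth > MaxDepth then
          raise Exception.CreateFmt('Line %d: nesting too deep', [I + 1]);
      end
      else if S = '}' then
      begin
        Dec(Depth);
        if Depth < 0 then
          raise Exception.CreateFmt('Line %d: unmatched "}"', [I + 1]);
      end
      else if S <> '' then
        // Depth 0 = dialog name, 1 = Data_n, 2 = optionN, 3 = "Text" "..."
        Result.Add(Format('%d: %s', [Depth, S]));
    end;
    if Depth <> 0 then
      raise Exception.Create('Unbalanced braces at end of input');
  except
    FreeAndNil(Result); // do not leak the list when the input is malformed
    raise;
  end;
end;

var
  Src, Found: TStringList;
  I: Integer;
begin
  Src := TStringList.Create;
  try
    Src.Add('NPC_1_Dialog');
    Src.Add('    {');
    Src.Add('        Data_1');
    Src.Add('        {');
    Src.Add('            option1');
    Src.Add('            {');
    Src.Add('            "Text" "Test 1"');
    Src.Add('            }');
    Src.Add('        }');
    Src.Add('    }');
    Found := ScanLines(Src);
    try
      for I := 0 to Found.Count - 1 do
        WriteLn(Found[I]);
    finally
      Found.Free;
    end;
  finally
    Src.Free;
  end;
end.
```

Run on the snippet above, this prints the four lines "0: NPC_1_Dialog", "1: Data_1", "2: option1" and "3: "Text" "Test 1"". The depth value is what a case statement could switch on to decide which list a line belongs to.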

The only true wisdom is knowing you know nothing

knuckles

  • Full Member
  • ***
  • Posts: 122
Re: How could I parse this text file?
« Reply #2 on: September 09, 2018, 07:40:35 pm »
Thanks for the detailed reply, I'll take a look a bit later see what I can come up with :)

soerensen3

  • Full Member
  • ***
  • Posts: 213
Re: How could I parse this text file?
« Reply #3 on: September 09, 2018, 08:59:17 pm »
If you use recursion you might not have to count the brackets. (It is said that recursion slows down execution, but I don't know to what degree.) I think it helps a lot in a parser.

I would store the whole file in a string or string stream. Then you have a cursor, which is an index indicating where you are in the string.

Here is some pseudo code to give you an idea of how I would parse the file. (There might be better ways, but I've written some parsers like this and they were always good enough for me.)

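Code: Pascal
var
  InFile: String;      // Whole file contents
  Cursor: Integer = 1; // Index of the current character in InFile

function Ch: Char; // Return InFile[ Cursor ], or #0 past the end
begin
  if Cursor <= Length( InFile ) then
    Result := InFile[ Cursor ]
  else
    Result := #0;
end;

function ReadIdentifier: String;
begin
  Result := '';
  while UpCase( Ch ) in [ 'A'..'Z', '0'..'9', '_' ] do begin
    Result := Result + Ch;
    Inc( Cursor );
  end;
end;

function ReadString: String;
begin
  Result := '';
  Inc( Cursor ); // Skip over first quote
  while not ( Ch in [ '"', #0 ]) do begin // Stop at end quote
    Result := Result + Ch;
    Inc( Cursor );
  end;
  Inc( Cursor ); // Skip over end quote
end;

procedure ReadBracket;
begin
  Inc( Cursor ); // Skip over '{'
  while not ( Ch in [ '}', #0 ]) do begin // This will work nested because we call ReadBracket inside ReadBracket. It will always select the right bracket.
    ReadIdentifier;
    ReadBracket;
  end;
  Inc( Cursor ); // Skip over '}'
end;

procedure ParseInFile; // White space and strings are not handled yet
begin
  while Ch <> #0 do begin
    ReadIdentifier;
    ReadBracket;
  end;
end;
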
Now you should be able to parse everything in the file except the strings. You could solve this by passing a callback function to ReadBracket instead of the predefined behaviour. Then you can write a function for each level, which in turn may call the function for the next level. Make sure to handle spaces, tabs and line breaks. You can do that with a SkipWhiteSpace function that you call before each read. Of course you need to store your data somewhere, but you may start by writing debug info with writeln for everything that was read.
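
As a rough illustration of where this leads, here is a self-contained sketch that adds the SkipWhiteSpace step and the string handling. It is an assumption-laden example, not the code from this post: instead of a callback per level it simply prints what it reads, the helpers Expect and ReadBlock are invented names, and it assumes quotes cannot be escaped inside strings.

```pascal
program RecursiveParseDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  InFile: string;       // whole file contents
  Cursor: Integer = 1;  // 1-based index of the current character

function Ch: Char;
begin
  if Cursor <= Length(InFile) then
    Result := InFile[Cursor]
  else
    Result := #0; // sentinel: end of input
end;

procedure SkipWhiteSpace;
begin
  while Ch in [' ', #9, #10, #13] do
    Inc(Cursor);
end;

// Skips white space, then consumes C or raises if something else is there.
procedure Expect(C: Char);
begin
  SkipWhiteSpace;
  if Ch <> C then
    raise Exception.Create('Expected ' + C + ' at position ' + IntToStr(Cursor));
  Inc(Cursor);
end;

function ReadIdentifier: string;
begin
  SkipWhiteSpace;
  Result := '';
  while Ch in ['A'..'Z', 'a'..'z', '0'..'9', '_'] do
  begin
    Result := Result + Ch;
    Inc(Cursor);
  end;
  if Result = '' then
    raise Exception.Create('Identifier expected at position ' + IntToStr(Cursor));
end;

function ReadString: string;
begin
  Expect('"');
  Result := '';
  while not (Ch in ['"', #0]) do
  begin
    Result := Result + Ch;
    Inc(Cursor);
  end;
  Expect('"');
end;

// Parses "name { ... }". The body holds nested blocks or "key" "value"
// pairs. The indent grows with the recursion depth.
procedure ReadBlock(const Indent: string);
var
  Key, Value: string;
begin
  WriteLn(Indent, ReadIdentifier);
  Expect('{');
  SkipWhiteSpace;
  while not (Ch in ['}', #0]) do
  begin
    if Ch = '"' then
    begin
      Key := ReadString;
      Value := ReadString;
      WriteLn(Indent, '  ', Key, ' = ', Value);
    end
    else
      ReadBlock(Indent + '  ');
    SkipWhiteSpace;
  end;
  Expect('}');
end;

procedure ParseInFile;
begin
  SkipWhiteSpace;
  while Ch <> #0 do
  begin
    ReadBlock('');
    SkipWhiteSpace;
  end;
end;

begin
  InFile := 'NPC_1_Dialog { Data_1 { option1 { "Text" "Test 1" } } }';
  ParseInFile;
end.
```

Because ReadBlock calls itself for every nested name, the matching "}" is found without any counter. Replacing the WriteLn calls with code that creates your own objects would be the next step.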
Lazarus 1.9 with FPC 3.0.4
Target: Manjaro Linux 64 Bit (4.9.68-1-MANJARO)

knuckles

  • Full Member
  • ***
  • Posts: 122
Re: How could I parse this text file?
« Reply #4 on: September 09, 2018, 10:52:01 pm »
This is as good as I can get (see screenshot); it's too complex for me, or I am overthinking it  >:(

You can recreate this by making a new project, dropping 3 listboxes on the form, and adding form create and destroy handlers:

Code: Pascal
unit main;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls, _classes;

type

  { TMainForm }

  TMainForm = class(TForm)
    ListBox1: TListBox;
    ListBox2: TListBox;
    Label2: TLabel;
    Edit2: TEdit;
    ListBox3: TListBox;
    Label1: TLabel;
    Label3: TLabel;
    Label4: TLabel;
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure ListBox1Click(Sender: TObject);
  private
    procedure PopulateDialogsList;
    procedure PopulateDialogEntriesList(ADialog: TDialog);
  public

  end;

var
  MainForm: TMainForm;

implementation

{$R *.lfm}

{ TMainForm }

procedure TMainForm.PopulateDialogsList;
var
  I: Integer;
  Obj: TDialog;
begin
  ListBox1.Items.BeginUpdate;
  try
    ListBox1.Items.Clear;

    for I := 0 to FDialogCollection.Dialogs.Count - 1 do
    begin
      Obj := FDialogCollection.Dialogs.Items[I];
      if Obj <> nil then
      begin
        ListBox1.Items.AddObject(Obj.Name, Obj);
      end;
    end;
  finally
    ListBox1.Items.EndUpdate;
  end;
end;

procedure TMainForm.PopulateDialogEntriesList(ADialog: TDialog);
var
  I: Integer;
  Obj: TDialogEntry;
begin
  ListBox2.Items.BeginUpdate;
  try
    ListBox2.Items.Clear;

    for I := 0 to ADialog.DialogEntries.Count - 1 do
    begin
      Obj := ADialog.DialogEntries.Items[I];
      if Obj <> nil then
      begin
        ListBox2.Items.AddObject(Obj.Name, Obj);
      end;
    end;
  finally
    ListBox2.Items.EndUpdate;
  end;
end;

procedure TMainForm.FormCreate(Sender: TObject);
begin
  FDialogCollection := TDialogCollection.Create;
  FDialogCollection.LoadFromFile('C:\some path\sample file.txt'); // Change this
  PopulateDialogsList;
end;

procedure TMainForm.FormDestroy(Sender: TObject);
begin
  FDialogCollection.Free;
end;

procedure TMainForm.ListBox1Click(Sender: TObject);
var
  Obj: TDialog;
begin
  if ListBox1.ItemIndex <> -1 then
  begin
    Obj := TDialog(ListBox1.Items.Objects[ListBox1.ItemIndex]);
    if Obj <> nil then
    begin
      PopulateDialogEntriesList(Obj);
    end;
  end;
end;

end.

The second unit (_classes) defines the object lists and is where the coding nightmare is:

Code: Pascal
unit _classes;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Dialogs, Fgl;

type

  { TDialogOption }

  TDialogOption = class
  private
    FText: string;
  public
    constructor Create;
    destructor Destroy; override;
    property Text: string read FText write FText;
  end;
  TDialogOptionList = specialize TFPGObjectList<TDialogOption>;

  { TDialogEntry }

  TDialogEntry = class
  private
    FName: string;
    FDialogOptions: TDialogOptionList;
  public
    constructor Create;
    destructor Destroy; override;
    property DialogOptions: TDialogOptionList read FDialogOptions write FDialogOptions;
    property Name: string read FName write FName;
  end;
  TDialogEntryList = specialize TFPGObjectList<TDialogEntry>;

  { TDialog }

  TDialog = class
  private
    FName: string;
    FDialogEntries: TDialogEntryList;
  public
    constructor Create;
    destructor Destroy; override;
    property DialogEntries: TDialogEntryList read FDialogEntries write FDialogEntries;
    property Name: string read FName write FName;
  end;
  TDialogList = specialize TFPGObjectList<TDialog>;

  { TDialogCollection }

  TDialogCollection = class
  private
    FDialogs: TDialogList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure LoadFromFile(FileName: string);
    property Dialogs: TDialogList read FDialogs write FDialogs;
  end;

var
  FDialogCollection: TDialogCollection;

implementation

{ TDialogOption }

constructor TDialogOption.Create;
begin
  inherited Create;
  FText := '';
end;

destructor TDialogOption.Destroy;
begin
  inherited Destroy;
end;

{ TDialogEntry }

constructor TDialogEntry.Create;
begin
  inherited Create;
  FName := '';
  FDialogOptions := TDialogOptionList.Create(True);
end;

destructor TDialogEntry.Destroy;
begin
  FDialogOptions.Free;
  inherited Destroy;
end;

{ TDialog }

constructor TDialog.Create;
begin
  inherited Create;
  FName := '';
  FDialogEntries := TDialogEntryList.Create(True);
end;

destructor TDialog.Destroy;
begin
  FDialogEntries.Free;
  inherited Destroy;
end;

{ TDialogCollection }

constructor TDialogCollection.Create;
begin
  inherited Create;
  FDialogs := TDialogList.Create(True);
end;

destructor TDialogCollection.Destroy;
begin
  FDialogs.Free;
  inherited Destroy;
end;

procedure TDialogCollection.LoadFromFile(FileName: string);
var
  SL: TStringList;
  S: string;
  Level: Integer;
  I: Integer;
  Dialog: TDialog;
  DialogEntry: TDialogEntry;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile(FileName);

    S := '';
    Level := 0;
    for I := 0 to SL.Count - 1 do
    begin
      if Trim(SL.Strings[I]) = '{' then
      begin
        S := SL.Strings[I - 1]; // get the dialog name eg NPC_1_Dialog
        if Level < 1 then
          Inc(Level);
        Continue;
      end;

      while (Level > 0) do
      begin
        Dialog := TDialog.Create;
        Dialog.FName := S;

        if Trim(SL.Strings[I]) = '{' then
        begin
          S := SL.Strings[I - 1]; // get the dialog entry name eg Data_1
          if Level < 2 then
            Inc(Level);
          Continue;
        end;

        if Trim(SL.Strings[I]) = '}' then Dec(Level);
        while (Level > 1) do
        begin
          DialogEntry := TDialogEntry.Create;
          DialogEntry.FName := S;
          Dialog.FDialogEntries.Add(DialogEntry);
          if Trim(SL.Strings[I]) = '}' then Dec(Level);
        end;

        FDialogs.Add(Dialog);
        if Trim(SL.Strings[I]) = '}' then Dec(Level);
        Break;
      end;
    end;
  finally
    SL.Free;
  end;
end;

end.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: How could I parse this text file?
« Reply #5 on: September 09, 2018, 11:07:47 pm »
Even if just to test your assumptions, you could use a TTreeView instead of a TListBox. An advantage of TTreeView is that once you get it right, you can extrapolate (or just plain copy!) your tree to almost any other nested-style structure ... like a TMenu, e.g.  :)

ETA: The process in your case is rather simple: when you find a '{', read the name and add a child node with that name; when you find the closing '}', go up one level in the tree. That's really all.

EATA: Incidentally, if you're sure the file is always relatively small I would read it directly to a string (or a string-stream), discard the line-ends and process it.
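
The "add a child on '{', go up one level on '}'" rule can be sketched without any GUI. In the example below, TNode is a hypothetical stand-in for a tree node so the code runs as a console program; with a real TTreeView the same two steps would use TreeView.Items.AddChild and the node's Parent property. It assumes each brace is on its own line and that the name is the last non-brace line before the "{".

```pascal
program TreeBuildDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, Contnrs;

type
  // Stand-in for a tree node so the sketch runs without the LCL.
  TNode = class
  public
    Name: string;
    Parent: TNode;
    Children: TObjectList; // owns the child nodes
    constructor Create(const AName: string; AParent: TNode);
    destructor Destroy; override;
  end;

constructor TNode.Create(const AName: string; AParent: TNode);
begin
  inherited Create;
  Name := AName;
  Parent := AParent;
  Children := TObjectList.Create(True);
  if AParent <> nil then
    AParent.Children.Add(Self);
end;

destructor TNode.Destroy;
begin
  Children.Free;
  inherited Destroy;
end;

function BuildTree(Lines: TStrings): TNode;
var
  Current: TNode;
  I: Integer;
  S, Pending: string;
begin
  Result := TNode.Create('Root', nil);
  Current := Result;
  Pending := '';
  for I := 0 to Lines.Count - 1 do
  begin
    S := Trim(Lines[I]);
    if S = '{' then
      // Open a block: the name is the last non-brace line seen.
      Current := TNode.Create(Pending, Current)
    else if S = '}' then
    begin
      // Close a block: go up one level (never above the root).
      if Current.Parent <> nil then
        Current := Current.Parent;
    end
    else if (S <> '') and (S[1] = '"') then
      // "key" "value" line: attach it as a leaf of the open block.
      TNode.Create(S, Current)
    else if S <> '' then
      Pending := S;
  end;
end;

procedure PrintTree(Node: TNode; const Indent: string);
var
  I: Integer;
begin
  WriteLn(Indent, Node.Name);
  for I := 0 to Node.Children.Count - 1 do
    PrintTree(TNode(Node.Children[I]), Indent + '  ');
end;

var
  Lines: TStringList;
  Root: TNode;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('NPC_1_Dialog');
    Lines.Add('{');
    Lines.Add('  Data_1');
    Lines.Add('  {');
    Lines.Add('    option1');
    Lines.Add('    {');
    Lines.Add('    "Text" "Test 1"');
    Lines.Add('    }');
    Lines.Add('  }');
    Lines.Add('}');
    Root := BuildTree(Lines);
    try
      PrintTree(Root, '');
    finally
      Root.Free; // frees the whole tree through the owning lists
    end;
  finally
    Lines.Free;
  end;
end.
```

The only state the loop needs is the current node, which is why a tree container makes the bookkeeping simpler than parallel lists.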
« Last Edit: September 09, 2018, 11:21:39 pm by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

knuckles

  • Full Member
  • ***
  • Posts: 122
Re: How could I parse this text file?
« Reply #6 on: September 10, 2018, 12:39:04 am »
OK, I will add a treeview as a parameter and remove all those object lists and things to simplify it, and see if it can help me %)

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: How could I parse this text file?
« Reply #7 on: September 10, 2018, 10:11:41 am »
There are many, many ways to parse and collect the parsed data.
lucamar's suggestion to use a pre-built data container like TTreeView clearly saves the  work of constructing your own database from lists (which are not node-oriented).
One way to use this idea is attached.

Edson

  • Hero Member
  • *****
  • Posts: 1302
Re: How could I parse this text file?
« Reply #8 on: September 10, 2018, 05:03:51 pm »
You can use SynFacilSyn: https://github.com/t-edson/SynFacilSyn as a lexer and you have additionally a highlighter for SynEdit.

The XML syntax file would be something like:

Code: XML
<?xml version="1.0"?>
<Language >
  <Identifiers CharsStart= "A..Za..z_" Content = "A..Za..z0..9_"/>
  <String Start='"' End='"'/>
  <Block Start="{" End="}" Folding="true"/>
</Language>

Lazarus 2.2.6 - FPC 3.2.2 - x86_64-win64 on Windows 10

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 844
Re: How could I parse this text file?
« Reply #9 on: September 11, 2018, 09:44:46 am »
Obvious answer: write a parser. A parser is just a kind of finite-state machine. I would write one for you, but I currently don't have time for this, so I can only give some advice. A parser processes text symbol by symbol. When each symbol is processed, the parser either stays in the same state, switches to another one, or performs some action. First of all you need to define symbol classes: white space, character, number, quote, opening bracket, closing bracket, etc. What states can I suggest?

psWaitForDialogID - skip all white space (space, CRLF) before the first dialog ID; the first character switches the parser to the next state; any non-character should cause an error
psDialogID - store all characters in the dialog ID variable; the first non-character or white space switches the parser to the next state and creates the Dialog object
psWaitForDialogOpeningBracket - skip all white space before the dialog's opening bracket; any character except white space or a bracket causes an error; a closing bracket returns the parser to the psWaitForDialogID state
psWaitForDataID - skip all white space before the data ID, etc.
psDataID - creates the Data object and adds it to the Dialog object
psWaitForDataOpeningBracket - a closing bracket returns the parser to the psWaitForDataID state
psWaitForOptionID
psOptionID - creates the Option object and adds it to the Data object
psWaitForOptionBracket - a closing bracket returns the parser to the psWaitForOptionID state
psWaitForNameOpeningQuote
psName
psWaitForValueOpeningQuote - error on anything but white space or a quote; a quote adds another option to the Data object

Also, this parser is simple but not optimal because, as you can see, the Dialog, Data and Option blocks have a similar structure, so you could also use a more advanced parser: a recursive one.
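
To show the mechanism of staying in a state, switching state, or performing an action, here is a deliberately reduced sketch. It is not the state list above: it uses only three hypothetical states (psWaitForToken, psIdentifier, psQuotedString) and merely splits the input into tokens, which a fuller machine with one state per nesting level would then act on. It assumes quotes cannot be escaped.

```pascal
program StateMachineDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Reduced, hypothetical state set used only for this illustration.
  TParserState = (psWaitForToken, psIdentifier, psQuotedString);

// Processes Source symbol by symbol and returns the tokens found:
// 'id:<name>', 'str:<text>', '{' and '}'.
function Tokenize(const Source: string): TStringList;
var
  State: TParserState;
  Token: string;
  I: Integer;
  C: Char;
begin
  Result := TStringList.Create;
  try
    State := psWaitForToken;
    Token := '';
    I := 1;
    while I <= Length(Source) do
    begin
      C := Source[I];
      case State of
        psWaitForToken:
          case C of
            ' ', #9, #10, #13: ; // white space: stay in this state
            '{', '}': Result.Add(C);
            '"':
              begin
                Token := '';
                State := psQuotedString;
              end;
            'A'..'Z', 'a'..'z', '_':
              begin
                Token := C;
                State := psIdentifier;
              end;
          else
            raise Exception.CreateFmt('Unexpected character at position %d', [I]);
          end;
        psIdentifier:
          if C in ['A'..'Z', 'a'..'z', '0'..'9', '_'] then
            Token := Token + C
          else
          begin
            Result.Add('id:' + Token);
            State := psWaitForToken;
            Continue; // re-process C in the new state without advancing I
          end;
        psQuotedString:
          if C = '"' then
          begin
            Result.Add('str:' + Token);
            State := psWaitForToken;
          end
          else
            Token := Token + C;
      end;
      Inc(I);
    end;
    // Flush or reject whatever state the input ended in.
    if State = psIdentifier then
      Result.Add('id:' + Token)
    else if State = psQuotedString then
      raise Exception.Create('Unterminated string at end of input');
  except
    FreeAndNil(Result);
    raise;
  end;
end;

var
  Tokens: TStringList;
  I: Integer;
begin
  Tokens := Tokenize('Data_1 { option1 { "Text" "Hi!" } }');
  try
    for I := 0 to Tokens.Count - 1 do
      WriteLn(Tokens[I]);
  finally
    Tokens.Free;
  end;
end.
```

Each case branch corresponds to one state, and every symbol either leaves State alone, assigns a new one, or emits a token. The per-level states listed above would follow the same pattern with more branches.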
« Last Edit: September 11, 2018, 10:02:11 am by Mr.Madguy »
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

knuckles

  • Full Member
  • ***
  • Posts: 122
Re: How could I parse this text file?
« Reply #10 on: September 12, 2018, 12:58:27 am »
Sorry for the late reply it took me a while to look over the code howardpc posted, it worked great for most parts so you have my appreciation. However I convinced the person I was doing this for to use a standard format file instead rather than have me hack around trying to read this type of file and running into various problems.

Thanks again all who posted.

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 844
Re: How could I parse this text file?
« Reply #11 on: September 12, 2018, 10:56:47 am »
Here is my parser, if you're still interested. I haven't really tested it, i.e. I have just written it and run it once, but it seems to work. Here is its output:
Code:
Node: Root
Attributes(0):
Child nodes(2):
Node: NPC_1_Dialog
Attributes(0):
Child nodes(3):
Node: Data_1
Attributes(0):
Child nodes(2):
Node: option1
Attributes(1):
Text = Test 1
Child nodes(0):
Node: option2
Attributes(1):
Text = Test 2
Child nodes(0):
Node: Data_2
Attributes(0):
Child nodes(1):
Node: option1
Attributes(1):
Text = Another Test
Child nodes(0):
Node: Data_3
Attributes(0):
Child nodes(2):
Node: option1
Attributes(1):
Text = Hello World
Child nodes(0):
Node: option2
Attributes(1):
Text = Good morning
Child nodes(0):
Node: NPC_2_Dialog
Attributes(0):
Child nodes(2):
Node: Data_1
Attributes(0):
Child nodes(2):
Node: option1
Attributes(1):
Text = Some Message
Child nodes(0):
Node: option2
Attributes(1):
Text = Hi!
Child nodes(0):
Node: Data_2
Attributes(0):
Child nodes(1):
Node: option1
Attributes(1):
Text = Goodbye
Child nodes(0):

P.S. TStrings is used to simplify things. It's better to use your own container. For example using "=" symbol in options may mess things up.
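
The parser's source is not shown here, so the following is only a guess at what the "=" caveat refers to: if attributes are kept as Name=Value entries in a TStrings, the name is cut at the first "=". A minimal sketch of that behaviour, using only TStringList from the Classes unit (the variable Attrs is made up for the example):

```pascal
program NameValueDemo;

{$mode objfpc}{$H+}

uses
  Classes;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    // '=' inside the value is harmless: only the first '=' splits the pair.
    Attrs.Add('Text' + '=' + 'x = y');
    WriteLn(Attrs.Names[0], ' -> ', Attrs.ValueFromIndex[0]); // Text -> x = y

    // '=' inside the name is not: the name is cut at the first '='.
    Attrs.Add('a=b' + '=' + 'value');
    WriteLn(Attrs.Names[1], ' -> ', Attrs.ValueFromIndex[1]); // a -> b=value
  finally
    Attrs.Free;
  end;
end.
```

A container that stores the name and the value in separate fields avoids the problem altogether.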
« Last Edit: September 12, 2018, 11:11:54 am by Mr.Madguy »

guest58172

  • Guest
Re: How could I parse this text file?
« Reply #12 on: September 12, 2018, 01:20:35 pm »
Given the following sample file as an example:

Are there specs for this format? I'm afraid that your example contains only a small subset of what's possible, e.g. something like Object and Data, and we don't know if there can be several Data in each Object, if they are a list (an array literal like in JSON), if identifiers can be UTF-8, if the double quotes are escapable, etc.

Where does this sample come from?

kapibara

  • Hero Member
  • *****
  • Posts: 610
Re: How could I parse this text file?
« Reply #13 on: September 12, 2018, 08:04:17 pm »
Quick & dirty parsing into objects, loaded into a GUI master/detail view. Don't forget to free the objects or there will be a memory leak.

EDIT: for the parsing to work, no empty row is allowed above the NPC row.

OK:

Code: Pascal
    }
NPC_2_Dialog

Not OK:

Code: Pascal
    }

NPC_2_Dialog
« Last Edit: September 13, 2018, 12:47:43 am by kapibara »
Lazarus trunk / fpc 3.2.2 / Kubuntu 22.04 - 64 bit

avra

  • Hero Member
  • *****
  • Posts: 2514
    • Additional info
Re: How could I parse this text file?
« Reply #14 on: September 15, 2018, 12:52:10 am »
Or ... you could build a grammar and use a parser generator like Gold. It took me less than 10 minutes to build and test a Gold grammar for your Dialogue format. From there to using it in your code is not too far.

Code: bnf
"Name"     = 'Dialogue Grammar'
"Author"   = 'Zeljko Avramovic'
"Version"  = '1.0'
"About"    = 'Grammar for Dialogue files as described at https://forum.lazarus.freepascal.org/index.php/topic,42512.msg296818.html#msg296818'

"Start Symbol" = <Dialogues>
"Case Sensitive" = False

! ------------------------------------------------- Sets

{Id Head}        = {Letter} + [_]
{Id Tail}        = {Id Head} + {Digit}
id               = {Id Head}{Id Tail}*
{StringChar}     = {Printable} - ["]                ! " - added commented paired quote for forum BNF formatting to work correctly
string           = '"' {StringChar}* '"'

! ------------------------------------------------- Rules

<Dialogues> ::= <Dialogue>
            |   <Dialogue> <Dialogues>

<Dialogue>  ::= id '{' <Datas> '}'

<Datas>     ::= <Data>
            |   <Data> <Datas>

<Data>      ::= id '{' <Options> '}'

<Options>   ::= <Option>
            |   <Option> <Options>

<Option>    ::= id '{' string string '}'

The above grammar is for fixed 2 strings in each <Option> as in your sample Dialogue file. If for example you need 1 or more strings in each <Option>, then you simply change this line
Code: bnf
<Option>   ::= id '{' string string '}'
into this:
Code: bnf
<Option>   ::= id '{' <Strings> '}'

<Strings>  ::= string
           |   string <Strings>

It's that simple  ;)

http://wiki.freepascal.org/Gold
« Last Edit: September 15, 2018, 01:26:36 am by avra »
ct2laz - Conversion between Lazarus and CodeTyphon
bithelpers - Bit manipulation for standard types
pasettimino - Siemens S7 PLC lib

 
