
Topic: Reading a complex text file in Pascal

MarkMLl

  • Hero Member
  • *****
  • Posts: 8393
Re: Reading a complex text file in Pascal
« Reply #15 on: October 25, 2019, 09:25:56 am »
It isn't my intention to give you a hard time, but one of the important reasons it would be good to parse the file is to ensure that it is structured as desired and won't cause unexpected problems if it isn't.

It's Friday morning, the Sun's not yet shining, and I'm supposed to be working on some legal stuff.

It's at times like this that I wish I could come up with something both memorable and instructive, some riff on the revered:

"I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain. Time to die."

The whole point of a parser is that it's open ended: it won't get broken by a change in the number of records or in the number of content lines in each (which was the original question), and while it might say "I don't recognise this" it won't get thrown or behave unpredictably.

Oh, I've tried hacks over the years. Simple compilers and threaded-interpretive code written in assembler. Pattern recognition. Nested state machines. Probably every bodge and mistaken assumption you could imagine, and a few you shouldn't. But for something like this (nice serialised data with a fairly well-defined structure) there's only one "right" way to do it, and that's using a good old 1960s parser. And one emphatically doesn't have to understand all of the linguistic theory that those not gainfully employed have built up around the field.

MarkMLl


« Last Edit: October 25, 2019, 09:48:20 am by MarkMLl »
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Handoko

  • Hero Member
  • *****
  • Posts: 5425
  • My goal: build my own game engine using Lazarus
Re: Reading a complex text file in Pascal
« Reply #16 on: October 25, 2019, 09:54:29 am »
@maurobio

It seems I'm late, but I hope this can be useful:

Code: Pascal
  program project1;

  {$APPTYPE CONSOLE}
  {$mode objfpc}{$H+}

  uses
    Classes, sysutils;

  type
    TMyRecord = record
      Name:  string;
      Items: TStringList;
    end;

  var
    Data: array of TMyRecord;

  procedure ExpandDataWhenNeeded(RecNo: Integer);
  begin
    if (Length(Data) >= RecNo) then Exit;
    SetLength(Data, RecNo);
    Data[RecNo-1].Items := TStringList.Create;
  end;

  function isRecordHeader
    (const S: string; out RecNo: Integer; out Name: string): Boolean;
  const
    HeaderSymbolBegin = '#';
    HeaderSymbolEnd   = '.';
  var
    Number : string;
    i      : Integer;
  begin
    Result := False;
    // Test for an empty string first: S[1] does not exist when S = ''
    if S.IsEmpty or (S[1] <> HeaderSymbolBegin) then Exit;
    Number := '';
    for i := 2 to S.Length do
    begin
      if (S[i] = HeaderSymbolEnd) and not(Number.IsEmpty) then
      begin
        // TryStrToInt rejects a non-numeric record number instead of raising
        Result := TryStrToInt(Number, RecNo) and (RecNo > 0);
        if Result then
          Name := Copy(S, i+1, S.Length).Trim;
        Exit;
      end;
      Number := Number + S[i];
    end;
  end;

  procedure ReadAllData;
  const
    FileName = 'Data.txt';
  var
    DataFile : TextFile;
    RecNo    : Integer;
    Line     : string;
    Name     : string;
  begin
    // Clear previous data
    for RecNo := 0 to Length(Data)-1 do
      Data[RecNo].Items.Free;
    SetLength(Data, 0);

    // Read data from file
    RecNo := 0;
    AssignFile(DataFile, FileName);
    Reset(DataFile);
    while not EOF(DataFile) do
    begin
      ReadLn(DataFile, Line);
      Line := Line.Trim;
      if Line = '' then Continue;
      // Line is a record header
      if isRecordHeader(Line, RecNo, Name) then
      begin
        ExpandDataWhenNeeded(RecNo);
        Data[RecNo-1].Name := Name;
      end
      else
      // Line is a state
      begin
        if RecNo <= 0 then Continue;
        Data[RecNo-1].Items.Add(Line);
      end;
    end;
    CloseFile(DataFile);

    // Display info
    WriteLn('File name: ' + FileName);
    WriteLn('Contains ' + Length(Data).ToString + ' record(s).');
    WriteLn;
  end;

  procedure ShowARecord;
  var
    RecNo : Integer;
    S     : string;
  begin
    // Read user input
    WriteLn('Please provide the record no: (1..' + Length(Data).ToString + ')');
    ReadLn(S);
    S := S.Trim;
    // Validate input
    if S.IsEmpty then Exit;
    if not(TryStrToInt(S, RecNo)) then
    begin
      WriteLn('Cannot show record, wrong record no.');
      Exit;
    end;
    if (RecNo < 1) or (RecNo > Length(Data)) then
    begin
      WriteLn('No such record');
      Exit;
    end;
    // Show result
    WriteLn('Record no #' + RecNo.ToString + ', ' +
      Data[RecNo-1].Items.Count.ToString + ' states.');
    WriteLn(Data[RecNo-1].Name);
    WriteLn;
    for S in Data[RecNo-1].Items do
      WriteLn(S);
    WriteLn;
  end;

  var
    Input: string;
  begin
    ReadAllData;
    repeat
      WriteLn('Type: r -> Reload, s -> Show Record, q -> Exit');
      ReadLn(Input);
      WriteLn;
      case Input of
        'r': ReadAllData;
        's': ShowARecord;
        'q': Exit;
        else begin
          WriteLn('Unrecognized input: ' + Input);
          WriteLn;
        end;
      end;

    until False;
  end.

I tested the code quickly and it seemed to work, but it may not work correctly, especially if the input file contains bad data.

~edit~
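My code has a potential bug: ExpandDataWhenNeeded only creates a TStringList for the last newly added record, so if the array grows by more than one element at once (for example, when record #3 appears before record #2), the records in between are left without an Items list.

ExpandDataWhenNeeded should be: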

Code: Pascal
  procedure ExpandDataWhenNeeded(RecNo: Integer);
  var
    i, OldLen: Integer;
  begin
    if (Length(Data) >= RecNo) then Exit;
    OldLen := Length(Data);  // first new slot; must be read before SetLength
    SetLength(Data, RecNo);
    for i := OldLen to RecNo-1 do
      Data[i].Items := TStringList.Create;
  end;
« Last Edit: October 25, 2019, 02:07:25 pm by Handoko »

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #17 on: October 25, 2019, 01:01:59 pm »
@Handoko, many thanks for the very pretty and sophisticated piece of code! I will try it; it will no doubt prove to be a great source of good ideas and inspiration.

I have no doubt that, as both @MarkMLl and @440bx have correctly pointed out, a full-fledged parser is the best way of dealing with this matter. But for immediate purposes, a quick hack like the one initially provided by @Winni (or perhaps the more sophisticated, but still not awfully complex, routines provided by @Handoko) will have to suffice.

Perhaps this is the time to mention that the descriptor file I presented when I opened this thread is part of a format called DELTA ("DEscription Language for TAxonomy"), developed in Australia around 1980 for representing and processing biological taxonomic data (see www.delta-intkey.com) and still widely used today by academic zoologists and botanists. The DELTA format is much more complex (but also more flexible!) than the descriptor file alone suggests. To begin with, a complete DELTA dataset consists of three separate files. Besides the descriptor file I have already provided, there is a "data matrix" file, which scores each object against the descriptors defined in the descriptor file, as follows:

# person 1/
1,1-2 2,5 3,- 4,- 5,3 6<an> 7,1 8,7

# person 2/
1,1-2 2,5 3,400 4,- 5,3 6<an> 7,1 8,7

# person 3/
1,2-4 2,1 3,23 4,23-45 5,3 6<adsf> 7,1 8,7

# person 4/
1,2 2,2 3,67 4,12.23-23 5,50 6<COBOL> 7,1 8,7

# person 5/
1,2-6 2,4 3,20-70 4,1.23-3 5,21 6<COBOL> 7,1 8,7

# person 6/
1,2/3 2,3 3,50-100 4,4-5 5,50 6<COBOL> 7,1 8,7

# person 7/
1,2-3 2,1 3,60 4,100-1223 5,5 6<COBOL> 7,1 8,7

# person 8/
1,2-3 2,4 3,23 4,1400 5,50 6<COBOL> 7,1 8,7

and a third file containing metadata, as follows:

*SHOW ~ Dataset specifications.

*DATA BUFFER SIZE 4000

*NUMBER OF CHARACTERS 9

*MAXIMUM NUMBER OF STATES 10

*MAXIMUM NUMBER OF ITEMS 8

*CHARACTER TYPES 2,OM 3,IN 4,RN 5,IN 6,TE 9,OM

*NUMBERS OF STATES 1,8 2,5 8,10

As can be seen in this last file, the DELTA format also includes a command language, with hundreds of commands organized in a hierarchy that determines their order of execution. And there is much, much more...
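A minimal sketch of how directive lines like the ones above might be split into a name and its arguments. It rests on an assumption: the directive name is taken to be the run of purely alphabetic words after the '*', and the argument text starts at the first word containing anything else. This heuristic happens to fit every line shown above, but it is not the rule used by the DELTA programs (which, as far as I can tell, match against a fixed table of directive names). TrySplitDirective and ParsePairs are hypothetical names chosen for illustration.

```pascal
program ParseDeltaDirectives;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

// Heuristic (assumption): after the leading '*', the directive name is the
// run of purely alphabetic words; the argument text starts at the first
// word containing anything else. Returns False for non-directive lines.
function TrySplitDirective(const Line: string; out Name, Args: string): Boolean;
var
  P, Start, K: Integer;
  Tok: string;
  AllLetters: Boolean;
begin
  Name := '';
  Args := '';
  if (Length(Line) < 2) or (Line[1] <> '*') then
    Exit(False);
  P := 2;
  while P <= Length(Line) do
  begin
    while (P <= Length(Line)) and (Line[P] = ' ') do Inc(P);   // skip blanks
    Start := P;
    while (P <= Length(Line)) and (Line[P] <> ' ') do Inc(P);  // read one word
    Tok := Copy(Line, Start, P - Start);
    if Tok = '' then Break;
    AllLetters := True;
    for K := 1 to Length(Tok) do
      if not (Tok[K] in ['A'..'Z', 'a'..'z']) then
        AllLetters := False;
    if not AllLetters then
    begin
      Args := Trim(Copy(Line, Start, MaxInt));  // the rest of the line
      Break;
    end;
    if Name <> '' then Name := Name + ' ';
    Name := Name + UpperCase(Tok);
  end;
  Result := Name <> '';
end;

// Splits blank-separated "number,value" pairs ("2,OM 3,IN") into
// Name=Value entries of Pairs; tokens without a comma are ignored.
procedure ParsePairs(const Args: string; Pairs: TStrings);
var
  Tokens: TStringList;
  I, Comma: Integer;
begin
  Tokens := TStringList.Create;
  try
    Tokens.Delimiter := ' ';
    Tokens.StrictDelimiter := True;
    Tokens.DelimitedText := Args;
    for I := 0 to Tokens.Count - 1 do
    begin
      Comma := Pos(',', Tokens[I]);
      if Comma > 0 then
        Pairs.Add(Copy(Tokens[I], 1, Comma - 1) + '=' +
                  Copy(Tokens[I], Comma + 1, MaxInt));
    end;
  finally
    Tokens.Free;
  end;
end;

const
  Directives: array[0..2] of string = (
    '*SHOW ~ Dataset specifications.',
    '*NUMBER OF CHARACTERS 9',
    '*CHARACTER TYPES 2,OM 3,IN 4,RN 5,IN 6,TE 9,OM');

var
  I: Integer;
  Name, Args: string;
  Pairs: TStringList;
begin
  for I := Low(Directives) to High(Directives) do
    if TrySplitDirective(Directives[I], Name, Args) then
      WriteLn('[', Name, '] [', Args, ']');

  // Look up the type of character 4 in the last directive
  if TrySplitDirective(Directives[2], Name, Args) then
  begin
    Pairs := TStringList.Create;
    try
      ParsePairs(Args, Pairs);
      WriteLn('Character 4 has type ', Pairs.Values['4']);
    finally
      Pairs.Free;
    end;
  end;
end.
```

Storing the pairs as Name=Value entries in a TStringList means Pairs.Values['4'] returns 'RN' without any extra lookup code; a real implementation would probably convert them into typed records instead.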

As it happens, writing a decent parser for the DELTA format has proven to be very hard and time-consuming. The original programs for handling the DELTA format were written in FORTRAN (and are still in use today), with the most recent versions written in Java. Over the years, some have attempted to develop general-purpose parsers for use as software libraries, in C++, Python, and Pascal/Delphi (see freedelta.sourceforge.net). Around 1996-1998 I myself wrote a large library (thousands of lines of code) for reading DELTA files in Pascal/Delphi. That was before this forum (and StackOverflow!) existed, so my implementation was not necessarily the best one. I then gave up and spent the last ten years using Python, but for many reasons outside the scope of this discussion, I have recently decided to return to the good old world of Pascal (especially given the maturity of such a superb free, cross-platform development tool as Lazarus). But instead of simply using my old library "as-is", I would like to both modernize and simplify it, adopting at the same time a more "minimalist" approach (perhaps a side effect of spending ten years with Python!), if possible.
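For the data matrix shown earlier, a small hand-written scanner goes a long way, and in the spirit of the parser argument it reports what it does not recognise rather than guessing. The following is a minimal sketch under stated assumptions: an item header is a line of the form "# name/", and each attribute is a character number followed either by ",value" (the value ending at the next blank) or by "<text>" without nested brackets. Real DELTA allows more than this (comments, nested brackets), so TryParseItemHeader and ParseAttributes are hypothetical illustrations, not a DELTA-conformant parser.

```pascal
program ParseDeltaItems;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // One scored attribute, e.g. "4,12.23-23" -> CharNo = 4, Value = '12.23-23'
  TAttribute = record
    CharNo: Integer;
    Value: string;
  end;
  TAttributes = array of TAttribute;

// "# person 1/" -> Name = 'person 1'
function TryParseItemHeader(const Line: string; out Name: string): Boolean;
var
  S: string;
begin
  Name := '';
  S := Trim(Line);
  Result := (Length(S) >= 2) and (S[1] = '#') and (S[Length(S)] = '/');
  if Result then
    Name := Trim(Copy(S, 2, Length(S) - 2));
end;

// Scans an attribute line character by character, so that text values in
// <...> may contain blanks. Raises an exception on anything unrecognised.
function ParseAttributes(const Line: string): TAttributes;
var
  P, Start: Integer;
  A: TAttribute;
begin
  Result := nil;
  P := 1;
  while P <= Length(Line) do
  begin
    if Line[P] <= ' ' then
    begin
      Inc(P);  // skip blanks between attributes
      Continue;
    end;
    Start := P;  // the character number
    while (P <= Length(Line)) and (Line[P] in ['0'..'9']) do Inc(P);
    if P = Start then
      raise Exception.CreateFmt('Expected a character number at column %d', [P]);
    A.CharNo := StrToInt(Copy(Line, Start, P - Start));
    if (P <= Length(Line)) and (Line[P] = ',') then
    begin
      Inc(P);  // ",value": the value runs up to the next blank
      Start := P;
      while (P <= Length(Line)) and (Line[P] > ' ') do Inc(P);
      A.Value := Copy(Line, Start, P - Start);
    end
    else if (P <= Length(Line)) and (Line[P] = '<') then
    begin
      Inc(P);  // "<text>": the value runs up to the closing bracket
      Start := P;
      while (P <= Length(Line)) and (Line[P] <> '>') do Inc(P);
      if P > Length(Line) then
        raise Exception.Create('Unterminated <...> value');
      A.Value := Copy(Line, Start, P - Start);
      Inc(P);  // skip '>'
    end
    else
      raise Exception.CreateFmt('Unexpected input after character %d', [A.CharNo]);
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := A;
  end;
end;

const
  // Two items taken from the sample data matrix above
  Sample: array[0..3] of string = (
    '# person 1/',
    '1,1-2 2,5 3,- 4,- 5,3 6<an> 7,1 8,7',
    '# person 4/',
    '1,2 2,2 3,67 4,12.23-23 5,50 6<COBOL> 7,1 8,7');

var
  I, J: Integer;
  ItemName: string;
  Attrs: TAttributes;
begin
  for I := Low(Sample) to High(Sample) do
    if TryParseItemHeader(Sample[I], ItemName) then
      WriteLn('Item: ', ItemName)
    else
    begin
      Attrs := ParseAttributes(Sample[I]);
      for J := 0 to High(Attrs) do
        WriteLn('  character ', Attrs[J].CharNo, ' = ', Attrs[J].Value);
    end;
end.
```

Values are kept as raw strings ('1-2', '-', '12.23-23'); interpreting ranges and numeric types would be a separate step, driven by the *CHARACTER TYPES directive.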

So far, the best existing DELTA parser is the C++ version. It would be tempting to translate it into Pascal, or perhaps to turn the current static library into a dynamic one (that is, a Windows DLL or Linux shared library) for use in FPC/Lazarus applications.

All of this is to say that properly following the approach suggested by @MarkMLl and @440bx would be a huge task, one that is unfortunately beyond my available time and resources. The advantage of quick hacks such as the one provided by @Winni is that they can be used to develop prototypes and "proof-of-concept" applications which, hopefully, may evolve into more mature software over time.

Thank you all very much!
UCSD Pascal / Burroughs 6700 / Master Control Program
Delphi 7.0 Personal Edition
Lazarus 3.8 - FPC 3.2.2 on GNU/Linux Mint 19.1/20.3, Windows XP SP3, Windows 7 Professional, Windows 10 Home

MarkMLl

Re: Reading a complex text file in Pascal
« Reply #18 on: October 25, 2019, 01:07:45 pm »
Thanks for the background on DELTA, I'm very glad to see that it's not just something you pulled out of a hat.

Basically, it's a good old punched card format :-)

MarkMLl

maurobio

Re: Reading a complex text file in Pascal
« Reply #19 on: October 25, 2019, 01:19:40 pm »
Glad to have things clarified, @MarkMLl. I surely would not be so disrespectful of such a great community of developers as the FPC/Lazarus community as to come here "pulling things out of a hat" (I don't even use one! ::)).

Yes, DELTA is from the old days of punched cards, but it is still quite readable by humans (as its original developers intended).

Cheers,

Handoko

Re: Reading a complex text file in Pascal
« Reply #20 on: October 25, 2019, 02:06:38 pm »
Oops, I just saw a potential bug in my code. I have edited my earlier post to add the fix:
https://forum.lazarus.freepascal.org/index.php/topic,47187.msg337305.html#msg337305

maurobio

Re: Reading a complex text file in Pascal
« Reply #21 on: October 25, 2019, 08:42:02 pm »
Thanks, @Handoko. I added the modified procedure to the code of your example.

Now, let me take this opportunity to ask a prosaic question: in your code, I noticed that you called class methods on strings (e.g. S := S.Trim), just like in Ruby. I also noticed the use of multiple return parameters (by means of the keyword "out") in the function isRecordHeader. I presume these are recent additions to FPC/Lazarus (and Delphi?). I have never seen such constructs in either former versions of FPC/Lazarus (1.x) or Delphi (the last one I used was Delphi 7). So, where could I find specific documentation on these (and other) new additions?

Thank you, again! :)

dsiders

  • Hero Member
  • *****
  • Posts: 1403
Re: Reading a complex text file in Pascal
« Reply #22 on: October 25, 2019, 09:04:53 pm »
Out parameters are documented here: https://www.freepascal.org/docs-html/ref/refsu66.html. I don't think they are a recent addition, but I don't know when they were introduced.

HTH
Preview the next Lazarus documentation release at: https://dsiders.gitlab.io/lazdocsnext

maurobio

Re: Reading a complex text file in Pascal
« Reply #23 on: October 25, 2019, 09:45:37 pm »
Thanks, @dsiders. According to the documentation, there is little immediate difference between the "out" parameter and the classic "var" parameter (as I suspected).

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Reading a complex text file in Pascal
« Reply #24 on: October 25, 2019, 10:45:37 pm »
Hi!

Oh, oh! Take care and don't mix up out and var!

Think about this simple procedure:

Code: Pascal
  procedure inc(var i: integer);

You pass a value to the procedure, it does something with the parameter, and then you get the result back in the same variable.

With an out parameter you are not allowed to give a value as input. It is only allowed to receive output from the procedure!!

So you have to know exactly when to use out!

Winni

PS: UCSD Pascal? I used it for many years, first on an Apple II clone and then on a Stride 440/460 with 16/32 users!

maurobio

Re: Reading a complex text file in Pascal
« Reply #25 on: October 25, 2019, 11:09:15 pm »
@Winni

Thanks for the clarification on the "out" parameter. As I said, it is new to me. I will be careful.

My first contact with the Pascal language, back in 1983 (!!), was UCSD Pascal running on a Burroughs 6700 mainframe, using punched cards (:o).

Cheers,

440bx

  • Hero Member
  • *****
  • Posts: 5302
Re: Reading a complex text file in Pascal
« Reply #26 on: October 26, 2019, 12:01:51 am »
Quote from: winni
With an out parameter you are not allowed to give a value as input. It is only allowed to receive output from the procedure!!

So you have to know exactly when to use out!
That is not the case at all.  An "out" variable can be initialized before passing it to the procedure/function that uses it and that initial value can be used by the procedure/function before it is modified.

The only thing "out" does in addition to what "var" does is inform the compiler that the parameter is meant to be written to by the function/procedure (hence "out"), nothing else.

Also, the fact that FPC complains when an uninitialized variable is passed to a "var" parameter is non-standard and actually downright incorrect, but it's been made quite clear that this conceptual mistake is here to stay.
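A minimal sketch contrasting how the two parameter modes are typically used; Increment and TryParsePositive are hypothetical names for illustration. The var parameter is read and then updated, so the caller has to give it a meaningful value first. The out parameter is produced by the routine (the same pattern as isRecordHeader above), so the caller can pass a variable it has never initialised.

```pascal
program OutVersusVar;

{$mode objfpc}{$H+}

uses
  SysUtils;

// var: the caller's current value is read AND may be changed (in/out).
procedure Increment(var I: Integer);
begin
  I := I + 1;
end;

// out: the routine's job is to produce Value; the incoming value is not
// meant to be read, so the caller does not need to initialise it.
function TryParsePositive(const S: string; out Value: Integer): Boolean;
begin
  Result := TryStrToInt(S, Value) and (Value > 0);
end;

var
  Counter, N: Integer;
begin
  Counter := 41;                     // must hold a meaningful value first
  Increment(Counter);
  WriteLn(Counter);                  // 42
  if TryParsePositive('17', N) then  // N was never initialised: fine for out
    WriteLn(N);                      // 17
  if not TryParsePositive('abc', N) then
    WriteLn('not a positive number');
end.
```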

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v4.0rc3) on Windows 7 SP1 64bit.

winni

Re: Reading a complex text file in Pascal
« Reply #27 on: October 26, 2019, 01:33:22 am »
@440bx

So you are saying what I always thought:

It is a feature made by compiler builders for compiler builders.

This is a real question!

Winni

440bx

Re: Reading a complex text file in Pascal
« Reply #28 on: October 26, 2019, 02:53:23 am »
It's not "for" the compiler builders, it's for everyone.  For instance, the most appropriate declaration of the Windows API function "GetClientRect" is:
Code: Pascal
  function GetClientRect(Wnd: THANDLE; out Rect: TRECT): BOOL; stdcall; external 'user32.dll';
where "out" is used to indicate that the function sets the Rect parameter.

When used in a function written by the programmer, rather than in an API declaration, "out" enables the compiler to emit a warning if there is some path in the function's logic where the value of the "out" parameter is not set, thereby helping the programmer catch bugs at compile time instead of runtime.

IOW, "out" simply gives more information than "var" to the compiler.  That additional information can be used by the compiler to catch logic errors at compile time.  A nice feature for everyone.

HTH.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Reading a complex text file in Pascal
« Reply #29 on: October 26, 2019, 12:32:23 pm »
Quote from: maurobio
[...] in your code, I noticed that you have called class methods for strings (eg. S := S.Trim), just like in Ruby [...]

Not exactly like in Ruby; in FPC this is done with type helpers, documented in chapter 10 of the reference guide ("Class, Record and Type helpers"). I don't remember offhand when they were introduced (in 3.0?), but yes, they are relatively recent.
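A minimal sketch of both sides of the feature: calling a method supplied by the string helper in SysUtils (S.Trim, as in Handoko's code) and declaring a helper of one's own. TCharType and TCharTypeHelper are hypothetical names for illustration, loosely named after the DELTA type codes seen earlier in the thread. In {$mode objfpc}, using an existing helper needs nothing special, while declaring a helper for a non-class type needs {$modeswitch typehelpers}.

```pascal
program HelperDemo;

{$mode objfpc}{$H+}
{$modeswitch typehelpers}  // needed to DECLARE a helper for a non-class type

uses
  SysUtils;  // provides the string helper behind S.Trim, S.IsEmpty, ...

type
  // Hypothetical enum for illustration, named after DELTA type codes
  TCharType = (ctOM, ctIN, ctRN, ctTE);

  TCharTypeHelper = type helper for TCharType
    function Code: string;
  end;

function TCharTypeHelper.Code: string;
const
  Codes: array[TCharType] of string = ('OM', 'IN', 'RN', 'TE');
begin
  Result := Codes[Self];
end;

var
  S: string;
  T: TCharType;
begin
  S := '  #1. Name  ';
  WriteLn('[', S.Trim, ']');   // method syntax, via the SysUtils string helper
  WriteLn('[', Trim(S), ']');  // the classic function call gives the same result
  T := ctRN;
  WriteLn(T.Code);             // RN
end.
```

The helper adds no data to the type; it only makes Code callable with method syntax, which is why S.Trim and Trim(S) produce the same result.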
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

 
