Author Topic: Reading a complex text file in Pascal  (Read 9122 times)

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #30 on: October 26, 2019, 01:16:15 pm »
@Lucamar,

Thanks. When I compared this feature of FPC (new to me) to Ruby, I referred to usage, not to implementation. It is quite elegant anyway, but I found the documentation very scanty; it would be nice to have a wiki page with examples!  ;D
UCSD Pascal / Burroughs 6700 / Master Control Program
Delphi 7.0 Personal Edition
Lazarus 3.8 - FPC 3.2.2 on GNU/Linux Mint 19.1/20.3, Windows XP SP3, Windows 7 Professional, Windows 10 Home

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Reading a complex text file in Pascal
« Reply #31 on: October 26, 2019, 03:29:16 pm »
@maurobio

Yes, the documentation .....

BUT: look for the Delphi documentation at Embarcadero. They have "hidden" it in a huge website, so search for "Embarcadero filesize" if you want to know how to use FileSize.

And if you are in {$mode Delphi}, you don't have to care about differences in the syntax.

Winni

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #32 on: October 26, 2019, 03:48:54 pm »
@Winni,

Thanks, I usually avoid the clumsy Embarcadero website, but will try and look for more documentation on these new features there.

Cheers.

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Reading a complex text file in Pascal
« Reply #33 on: October 26, 2019, 04:03:26 pm »
@maurobio

I forgot: Do you know the fpc Documentation page?

Here: https://www.freepascal.org/docs.var

And if you have installed the help files, put your cursor on the routine in question in the editor and press F1.

You can also read the source code: again, put your cursor on a procedure/function/var/type and press Alt + Cursor Up, and the source appears in the editor.

Winni

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #34 on: October 26, 2019, 04:21:57 pm »
@Winni,

Yes, sure. But since I'm getting old and lazy, I prefer to look for documentation on a specific function directly on Google rather than browse the documentation...  :-[

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Reading a complex text file in Pascal
« Reply #35 on: October 26, 2019, 04:22:43 pm »
> BUT: look for the Delphi documentation at Embarcadero. They have "hidden" it in a huge website, so search for "Embarcadero filesize" if you want to know how to use FileSize.

Or for "helpers" to find the docs on class and record helpers :)

Note that Delphi calls "record helper" what we call a "type helper"; a "record helper" in Free Pascal applies only to records. The concept is basically the same, only where Delphi declares, for example:

  TStringHelper = record helper for String

Free Pascal has it as:

  TStringHelper = type helper for String
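
For anyone who has not met the syntax yet, here is a minimal sketch of a type helper in Free Pascal. TIntegerHelperDemo and IsEven are made-up names for illustration (they are not part of the RTL), and the modeswitch line is there because the sketch assumes objfpc mode:

```pascal
program HelperDemo;

{$mode objfpc}{$H+}
{$modeswitch typehelpers}

type
  { Hypothetical helper: adds a method to the built-in Integer type }
  TIntegerHelperDemo = type helper for Integer
    function IsEven: Boolean;
  end;

function TIntegerHelperDemo.IsEven: Boolean;
begin
  { Self is the Integer value the method was called on }
  Result := (Self mod 2) = 0;
end;

var
  n: Integer;
begin
  n := 42;
  WriteLn(n.IsEven);  { prints TRUE }
end.
```

The helper is called with ordinary method syntax on a plain Integer variable, which is the same mechanism that makes calls such as token.Split possible on strings.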
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8393
Re: Reading a complex text file in Pascal
« Reply #36 on: October 26, 2019, 06:15:23 pm »
I didn't spot your sig earlier:

> UCSD Pascal / Burroughs 6700 / Master Control Program

I was going to remark that the file format reminded me of the way that Burroughs card readers could flag an "invalid character" at the start, but figured nobody would know what I was talking about... :-)

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Handoko

  • Hero Member
  • *****
  • Posts: 5425
  • My goal: build my own game engine using Lazarus
Re: Reading a complex text file in Pascal
« Reply #37 on: October 26, 2019, 07:20:14 pm »
> ... I prefer to look for documentation for a specific function directly on Google than browse the documentation...  :-[

Me too, too lazy to browse the documentation. Luckily, we have Google. Just put the text "free pascal" and add the keyword you want to search in the Google search box. For example:

free pascal string helper


The first hit on the result list is usually the one you should read.

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #38 on: October 26, 2019, 08:31:47 pm »
@lucamar

Thanks, I am looking into it.

@MarkMLl

We're getting older, but hopefully better! Good old times of punched card stacks and processing jobs!

@Handoko

Again, many thanks! That's just what I was looking for: https://www.freepascal.org/docs-html/rtl/sysutils/tstringhelper.html

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #39 on: October 26, 2019, 09:41:05 pm »
OK, guys, I have another problem with these files in DELTA format and I again ask for your help (even at the risk of being deemed inconvenient and banned from this forum forever!  ;)).

So, I have a file with 'real' descriptions coded in DELTA, as follows (just two species are included in this example). The first line of each record (preceded by a '#' character) gives the species name, and the subsequent lines hold the coded description of that species. The number before each comma is the descriptor number (as listed in another file, formatted just like the one I used as an example when starting this thread), and the number after the comma is the attribute value (or "state") for that descriptor: so, for example, 2,2 codes state 2 for descriptor 2, etc. Attribute-value pairs are separated by spaces, and blank lines separate the records for each species. Notice that the lines can be (and are) broken by carriage returns.

# E. australasicum <(F. Muell.) Koern.>/
1,2-10 2,2 3,2 4,2-6.5 5,1.3-2.2 6,1 7,3-5 8,1 9,2 10,2-10 11,4-6 12,2
13,2 14,14-35 15,2 16,2 17,2-3 18,2.5-3.5 19,1-2 20,4<to broadly
elliptic> 21,1.5-1.75 22,0.75-1 23,3 24,2 25,2 26,1 27,2 28,2
29<narrow->,4 30,1.2-1.6 31,0.5-0.6 32,2 33,2 34,2 35,1 36,1 37,1
38,3/4<rarely> 39,1 40,2 41,2 42,0.5-0.77 43,0.1 44,1 45,2 46,4< one
larger> 47,1 50,2 51,2 52,2 53,5 54,1 56,1/3<upper flowers typically
lacking perianth> 57,1 58,2 59,1 60,2 61,0.9 62,0.01 63,1 64,2 65,1 75,2
76,1 77,0.21-0.32 78,3 79,1 80,0.25-0.28 81,0.175-0.21 82,2 83,5

# E. australe <R. Br.>/
1,33-100 2,2 3,2 4,23-80 5,2-12 6,3 7,8-18 8,1 9,1<at base with dense
long hyaline hairs and scattered hairs along lamina> 10,33-100 11,5-7
12,1 13,2 14,120-150 15,2-1<sparsely> 16,1<to depressed hemispheric>
17,4-8 18,7-10.5 19,4 20,2<broad> 21,1.5-3.5 22,2.5-3.5 23,3 24,1<at
base of outer bracts with hyaline hairs> 25,2 26,2 27,2 28,4 29,1
30,2.5-3 31,1.75-2.5 32,3&4<tip inflexed> 33,1<apically with short white
papillae> 34,2 35,3 36,2 38,4 39,3 40,1 42,1.75-3 43,0.7-1.5 44,5 45,2
46,4 47,1 50,2 51,1<with apical fringe of white hairs> 52,2 53,5 54,2
56,4 57,1 58,1 59,2<, laterals> 60,8&12<with dissected margin> 61,2-3
62,0.75-1.25 63,6 64,1<at base of crest with patch of white hairs,
median sepal linear, 2-2.25 x 0.25-0.6 mm, obtuse, glabrous> 65,4 66,1
67,1 68,1 69,2 70,2-2.5 71,0.1-0.2 72,2 73,1<apically with white hairs,
hyaline hairs scattered on margin> 74,2 75,2 76,1 77,1.25-1.5 78,3 79,1
80,0.75 81,0.45-0.5 82,3<with c. 12 longitudinal rows of peltate hairs>
83,5

So, I want to read the data for each species as a single long line, that is, with the carriage returns removed, so that I can parse it into separate attributes. For example, given the first lines of the first record (for E. australasicum):

1,2-10 2,2 3,2 4,2-6.5 5,1.3-2.2 6,1 7,3-5 8,1 9,2 10,2-10 11,4-6 12,2
13,2 14,14-35 15,2 16,2 17,2-3 18,2.5-3.5 19,1-2 20,4<to broadly
elliptic>

after parsing I should have:

1,2-10
2,2
3,2
4,2-6.5
5,1.3-2.2
6,1
7,3-5
8,1
9,2
10,2-10
11,4-6
12,2
13,2
14,14-35
15,2
16,2
17,2-3
18,2.5-3.5
19,1-2
20,4<to broadly elliptic>

(Notice that '<' and '>' are to be treated as start and end quotes, and the text between them should not be changed.)

As a first attempt at tackling this vexing problem, I wrote the lame lines of code below:

Code: Pascal
program ReadDescriptions;

{$APPTYPE CONSOLE}
{$MODE DELPHI}

uses
  Classes, SysUtils, StrUtils;

var
  infile: TextFile;
  s: string;
  a: TStringArray;
  i: integer;
  recs: integer = 0;

{ Splits a line into attributes on spaces, treating '<' and '>' as quotes }
function ParseAttributes(Line: string): TStringArray;
var
  token: string;
begin
  { Turn embedded line breaks into spaces, then collapse runs of spaces }
  token := StringReplace(StringReplace(Line, #10, #32, [rfReplaceAll]), #13, #32, [rfReplaceAll]);
  token := Trim(DelSpace1(token));
  Result := token.Split(' ', '<', '>');
end;

begin
  AssignFile(infile, 'items_erio');
  Reset(infile);
  recs := 0;
  while not EoF(infile) do
  begin
    ReadLn(infile, s);
    s := Trim(s);
    if Length(s) > 0 then
    begin
      if s[1] = '#' then
      begin
        { Header line: species name }
        Inc(recs);
        WriteLn(s);
      end
      else
      begin
        { Data line: each line is parsed on its own }
        a := ParseAttributes(s);
        for i := 0 to Length(a) - 1 do
          WriteLn(a[i]);
      end;
    end;
  end;
  CloseFile(infile);
  WriteLn(recs, ' records read');
end.

This code works, but does not strip the carriage returns from each line, so data like 20,4<to broadly
elliptic> are being parsed separately as '20,4<to broadly' and 'elliptic>'.

Could you wizards give me a hand with this?

Thanks in advance!

440bx

  • Hero Member
  • *****
  • Posts: 5302
Re: Reading a complex text file in Pascal
« Reply #40 on: October 26, 2019, 11:01:45 pm »
> This code works, but does not strip the carriage returns from each line, so data like 20,4<to broadly
> elliptic> are being parsed separately as '20,4<to broadly' and 'elliptic>'.
>
> Could you wizards give me a hand with this?
>
> Thanks in advance!

I'm going to suggest a "solution".  Note that solution is in quotes which means it is going to be an undesirable, not particularly elegant hack.  With that out of the way, here is the "solution":

1. You need to figure out whether the number of "<" and ">" in the line is equal.  If they appear the same number of times, the text between < and > is not broken across lines.  That is the simple case, which your code already handles.

2. If the number of < and > is not equal, you need to split the input string into 2 pieces.  You can use StrRScan to determine the location of the last <, then scan backwards again from that location to find the last space that occurs before the <.  That gives you the beginning of the token that is broken across two lines.

The first piece goes from the beginning of the string to the last character before that space.  Copy it into a temp string and pass that to your ParseAttributes function.  Save a copy of the remainder of the line in another temp string.

3. Read the next line and "glue" it onto the remainder you saved in the step above (don't forget to add a space between the remainder and the new line), then repeat the whole process starting at step 1.
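
Steps 1 and 2 could be sketched roughly as follows. CountChar and SplitBrokenToken are hypothetical helper names, plain loops stand in for StrRScan, and the sketch assumes that chevrons are never nested:

```pascal
program GlueBrokenComments;

{$mode objfpc}{$H+}

{ Counts the occurrences of C in S }
function CountChar(C: Char; const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if S[i] = C then
      Inc(Result);
end;

{ Returns True if S ends inside an unclosed <...> comment. In that case
  Head receives the complete tokens and Tail the broken token. }
function SplitBrokenToken(const S: string; out Head, Tail: string): Boolean;
var
  p: Integer;
begin
  Head := S;
  Tail := '';
  Result := CountChar('<', S) <> CountChar('>', S);
  if not Result then
    Exit;
  p := Length(S);
  while (p > 0) and (S[p] <> '<') do
    Dec(p);                          { last '<' }
  while (p > 0) and (S[p] <> ' ') do
    Dec(p);                          { last space before it }
  Head := Copy(S, 1, p - 1);
  Tail := Copy(S, p + 1, Length(S));
end;

var
  Head, Tail: string;
begin
  if SplitBrokenToken('19,1-2 20,4<to broadly', Head, Tail) then
  begin
    WriteLn('complete: ', Head);     { complete: 19,1-2 }
    { Step 3: glue the next line onto the saved remainder }
    Tail := Tail + ' ' + 'elliptic> 21,1.5-1.75';
    WriteLn('glued:    ', Tail);
  end;
end.
```

In a real program, Head would go to ParseAttributes and the glued string would be fed back into SplitBrokenToken, since the next line may itself end in an open comment.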

If that sounds more complicated than it should be, you are right.  That is one problem, and likely not the only one, that you'll face as a result of not parsing the input file a character at a time, but that horse is dead.

Quick and dirty hacks often result in long and painful sequences of code.  The above is an example of that.
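
For comparison, a character-at-a-time split of one already-joined record could look something like the sketch below. SplitAttributes is a made-up name, and the code assumes that chevrons are never nested:

```pascal
program ChevronSplit;

{$mode objfpc}{$H+}

uses
  Classes;

{ Splits S on spaces, treating text between '<' and '>' as quoted }
procedure SplitAttributes(const S: string; Tokens: TStrings);
var
  i: Integer;
  InComment: Boolean;
  Tok: string;
begin
  Tokens.Clear;
  InComment := False;
  Tok := '';
  for i := 1 to Length(S) do
  begin
    case S[i] of
      '<': InComment := True;
      '>': InComment := False;
    end;
    if (S[i] = ' ') and not InComment then
    begin
      { A space outside a comment ends the current token }
      if Tok <> '' then
        Tokens.Add(Tok);
      Tok := '';
    end
    else
      Tok := Tok + S[i];
  end;
  if Tok <> '' then
    Tokens.Add(Tok);
end;

var
  L: TStringList;
  i: Integer;
begin
  L := TStringList.Create;
  try
    SplitAttributes('19,1-2 20,4<to broadly elliptic> 21,1.5-1.75', L);
    for i := 0 to L.Count - 1 do
      WriteLn(L[i]);
  finally
    L.Free;
  end;
end.
```

With the sample string this prints three lines: 19,1-2, then 20,4<to broadly elliptic>, then 21,1.5-1.75.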

HTH.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v4.0rc3) on Windows 7 SP1 64bit.

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #41 on: October 26, 2019, 11:54:41 pm »
@440bx

I have written a program more or less along the lines you suggested. It works as expected, correctly handling the carriage returns/ends of lines. But it looks ugly, and seems difficult to integrate with my existing routine for reading the descriptions.

Here it is:

Code: Pascal
program Attributes;

{$APPTYPE CONSOLE}
{$MODE DELPHI}

uses
  Classes, SysUtils, StrUtils;

const
  itemDirective = '*ITEM DESCRIPTIONS';

var
  Infile: TextFile;
  itemsFile, TxtLine, Name, junk, value, token: string;
  number, itemCount: Integer;
  itemsFound, isComment: Boolean;
  ch, aux: Char;

function Frequency(const C: char; const S: string): integer;
var
  i: Integer;
begin
  result := 0;
  for i := 1 to Length(S) do
    if S[i] = C then
      inc(result);
end;

Begin
  itemCount := 0;
  itemsFound := False;
  isComment := False;
  token := '';
  junk  := '';
  value := '';

  itemsFile := 'items_erio';

  { Open the items file }
  If FileExists(itemsFile) Then
  Begin
    AssignFile(Infile, itemsFile);
    Reset(Infile);
  End
  Else Begin
    WriteLn('File Not Found');  { File not found }
    Exit;
  End;

  While (Not itemsFound) And Not EoF(Infile) Do
  Begin
    ReadLn(Infile, TxtLine);

    { Makes sure that the '*ITEM DESCRIPTIONS' directive
      is present in the file }
    If (Pos(itemDirective, TxtLine) > 0) Then
      itemsFound := True;

    { Will run along the ITEMS file until it sees a '#'
      then parse and read in the item descriptions }
    If (itemsFound) Then
    Begin
      While Not EoF(Infile) Do
      Begin
        Read(Infile, ch);

        { Skip the item name }
        If (ch = '#') And (Not isComment) Then
        Begin
          Inc(itemCount);
          Read(Infile, ch);
          ReadLn(Infile, Name);
          WriteLn('==============================');
          WriteLn(Name);
          WriteLn('==============================');
        End;

        Case ch Of
          '<': isComment := True;
          '>': isComment := False;
        End;

        If ((ch = ' ') Or (ch = #10) Or (ch = #13))
           And (Not isComment)
        Then Begin
          token := StringReplace(token, #10, #32, [rfReplaceAll]);
          token := StringReplace(token, #13, #32, [rfReplaceAll]);
          token := Trim(token);
          token := DelSpace1(token);

          If (Frequency(',', token) = 1) Then
          Begin
            aux := token[Pos(',', token) + 1];
            If (aux In ['0'..'9']) Then
            Begin
              junk  := ExtractDelimited(1, token, [',']);
              value := Copy(token, Pos(',', token) + 1, Length(token));
            End Else
            Begin
              junk  := ExtractDelimited(1, token, ['<']);
              value := Copy(token, Pos('<', token), Length(token));
            End;
          End Else
          Begin
            junk  := ExtractDelimited(1, token, ['<']);
            value := Copy(token, Pos('<', token), Length(token));
          End;

          If (Frequency(',', junk) > 0) Then
          Begin
            junk  := ExtractDelimited(1, token, [',']);
            value := Copy(token, Pos(',', token) + 1, Length(token));
          End;

          If (Frequency('<', junk) > 0) Then Begin
            value := Concat(Copy(junk, Pos('<', junk), Length(token)), value);
            junk  := ExtractDelimited(1, junk, ['<']);
          End;

          number := StrToIntDef(junk, 0);
          If (number > 0) Then
            WriteLn(number, ' : ', value);
          token := '';
        End;
        token := Concat(token, ch);
      End;
    End;
  End;
  CloseFile(Infile);

  { Return the number of items in the file, otherwise an error code }
  If itemsFound
  Then WriteLn(itemCount, ' items read')
  Else WriteLn('Invalid Items File');  { Invalid file }
End.  { ReadAttributes }

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Reading a complex text file in Pascal
« Reply #42 on: October 27, 2019, 12:04:23 am »
Hi!

Solution: don't process every single line on its own; concatenate the data lines of each record into one string.
Strings can be of any size and behave normally in this range (a few hundred chars).
(Take care with strings > 1 million chars: they get awfully slow.)

Before we move on to the next record, we split the long string in one go.

Here we go:

Code: Pascal
{ Add to the var section }
var
  all: string = '';

{ Replaces the main loop: collect the data lines of a record in "all" }
while not EoF(infile) do
begin
  ReadLn(infile, s);
  s := Trim(s);
  if Length(s) > 0 then
  begin
    if s[1] = '#' then
    begin
      { New header: first split and print the previous record }
      if Length(all) > 0 then
      begin
        a := ParseAttributes(all);
        for i := 0 to Length(a) - 1 do
          WriteLn(a[i]);
        all := '';
      end;
      Inc(recs);
      WriteLn(s);
    end
    else
      { Join with a space so that the end of one line and the
        start of the next do not fuse into a single token }
      all := all + ' ' + s;
  end;
end;  // while

// The data of the last record is still in "all"
if Length(all) > 0 then
begin
  a := ParseAttributes(all);
  for i := 0 to Length(a) - 1 do
    WriteLn(a[i]);
end;

WriteLn(recs, ' records read');


I never worked with punch cards, but a friend of mine did, on a DEC PDP-7. We used the back of the cards for memos, and always got stuck with the pen in the holes ....

Winni

maurobio

  • Hero Member
  • *****
  • Posts: 640
  • Ecology is everything.
    • GitHub
Re: Reading a complex text file in Pascal
« Reply #43 on: October 27, 2019, 12:30:14 am »
Hi, Winni!

Thank you very much! I will try your code.

Working with punched cards was a sure way to get one fully addicted to coffee.  ;)

Cheers,

MarkMLl

  • Hero Member
  • *****
  • Posts: 8393
Re: Reading a complex text file in Pascal
« Reply #44 on: October 28, 2019, 09:57:46 am »
> Working with punched cards was a sure way to get one fully addicted to coffee.  ;)

And that was back in the era when the best that could be expected was murk from a vending machine, and while customers were usually more friendly, Burroughs certainly expected you to pay for your own refreshments.

I agree with what 440bx has said about parsers and horses. One approach might be to automatically merge the next card whenever you find yourself reading the start of a chevron-quoted string.
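
A rough sketch of that first approach, assuming chevrons are never nested. OpenChevrons is a hypothetical helper, and the "cards" are a few in-memory sample lines shortened from the first record rather than a real file:

```pascal
program MergeCards;

{$mode objfpc}{$H+}

uses
  Classes;

{ Number of '<' in S that have not been closed by a '>' }
function OpenChevrons(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    case S[i] of
      '<': Inc(Result);
      '>': Dec(Result);
    end;
end;

var
  Cards: TStringList;
  Merged: string;
  i: Integer;
begin
  Cards := TStringList.Create;
  try
    Cards.Add('19,1-2 20,4<to broadly');
    Cards.Add('elliptic> 21,1.5-1.75 22,0.75-1 23,3');
    Cards.Add('24,2 25,2');
    i := 0;
    while i < Cards.Count do
    begin
      Merged := Cards[i];
      Inc(i);
      { Keep merging the next card while a comment is still open }
      while (OpenChevrons(Merged) > 0) and (i < Cards.Count) do
      begin
        Merged := Merged + ' ' + Cards[i];
        Inc(i);
      end;
      WriteLn(Merged);
    end;
  finally
    Cards.Free;
  end;
end.
```

Here the first two cards come out as one line and the third is left alone, so each printed line can be split without cutting a comment in half.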

Another might be to pre-scan the entire file so that you can note the start of each record into an array or temporary index file. That would work even if the raw file was large, and would allow you to read each record (i.e. header+data cards) into a string before trying to process it without having to make any assumptions about whether it was correctly formatted.
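
The pre-scan idea might look something like this in miniature. It is a simplification: the lines are held in a TStringList (Lines.LoadFromFile could fill it from disk) and the index stores line numbers rather than file offsets, so it illustrates the two passes but not the large-file case. The sample lines are abbreviated from the data posted earlier:

```pascal
program IndexRecords;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  Lines: TStringList;
  Starts: array of Integer;
  Rec: string;
  i, r, Last: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('# E. australasicum <(F. Muell.) Koern.>/');
    Lines.Add('1,2-10 2,2 20,4<to broadly');
    Lines.Add('elliptic> 21,1.5-1.75');
    Lines.Add('');
    Lines.Add('# E. australe <R. Br.>/');
    Lines.Add('1,33-100 2,2');

    { Pass 1: note the line index of every record header }
    SetLength(Starts, 0);
    for i := 0 to Lines.Count - 1 do
      if (Lines[i] <> '') and (Lines[i][1] = '#') then
      begin
        SetLength(Starts, Length(Starts) + 1);
        Starts[High(Starts)] := i;
      end;

    { Pass 2: join the data lines of each record into one string }
    for r := 0 to High(Starts) do
    begin
      if r < High(Starts) then
        Last := Starts[r + 1] - 1
      else
        Last := Lines.Count - 1;
      Rec := '';
      for i := Starts[r] + 1 to Last do
        if Trim(Lines[i]) <> '' then
          Rec := Rec + ' ' + Trim(Lines[i]);
      WriteLn(Lines[Starts[r]]);
      WriteLn(Trim(Rec));
    end;
  finally
    Lines.Free;
  end;
end.
```

Because each record is assembled from a known range of lines, a malformed record (for example an unclosed chevron) cannot swallow the header of the next one.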