Best way to parse file.

BSaidus

Hero Member
Posts: 545
lazarus 1.8.4 Win8.1 / cross FreeBSD

Best way to parse file.

« on: January 26, 2023, 12:21:20 pm »

Hello.
I have this simple file content (the output of the Unix command netstat -an -f inet):

Name    Mtu   Network     Address              Ipkts Ifail    Opkts Ofail Colls
lo0     32768 <Link>                               0     0        0     0     0
lo0     32768 ::1/128     ::1                      0     0        0     0     0
lo0     32768 fe80::%lo0/ fe80::1%lo0              0     0        0     0     0
lo0     32768 127/8       127.0.0.1                0     0        0     0     0
pcn0    1500  <Link>      08:00:27:01:c1:ab        0     0        1     0     0
pcn0    1500  192.168.1/2 192.168.1.254            0     0        1     0     0
pcn1*   1500  <Link>      08:00:27:99:f5:80        0     0        0     0     0
em0*    1500  <Link>      08:00:27:17:f0:0f        0     0        0     0     0
em1     1500  <Link>      08:00:27:00:0c:c8        4     0        9     0     0
em1     1500  10.0.5/24   10.0.5.15                4     0        9     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0        0     0     0
 

I wonder what the best way is to parse this file into an organized array of this record type:

Code: Pascal

  type 
     if_net = record 
        Name,
        Mtu,
        Network,
        Address,
        Ipkts, 
        Ifail,
        Opkts,
        Ofail,
        Colls : String;     
     end;
 

Thank you.

« Last Edit: January 26, 2023, 12:25:07 pm by BSaidus »


lazarus 1.8.4 Win8.1 / cross FreeBSD
dhukmucmur vernadh!

marcov

Administrator
Hero Member
Posts: 11446
FPC developer.

Re: Best way to parse file.

« Reply #1 on: January 26, 2023, 12:34:23 pm »

That command doesn't work for me on Debian 11 and 12 (https://bugs.launchpad.net/ubuntu/+source/net-tools/+bug/1915903).

But I would probably do something like:

Code:



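{$mode delphi}
uses sysutils;

type
  if_net = record
    Name, Mtu, Network, Address, Ipkts, Ifail, Opkts, Ofail, Colls : String;
  end;

var n : TStringArray;   // the type that split() returns
    s : string;
  anet : if_net;
begin
  readln(s);            // one line of the netstat output
  n:=s.split([' ',#9]);
  if length(n)>1 then   // n[1] is read below, so at least 2 parts are needed
     begin
       anet.name:=n[0];
       anet.mtu:=n[1];  // etc etc.
     end;
 end.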

It might need some fiddling with the split() options (see manual) to get the behaviour for empty fields right, but I can't test due to the bug.

Alternatively, you could store the fields in an array of string inside the record and use properties like

Code:

  property name : string read strarr[0] write strarr[0];

etc., to get human-readable names for the fields.
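A minimal sketch of that array-plus-properties idea, assuming Free Pascal 3.x with {$modeswitch advancedrecords}. Instead of pointing each property directly at an array element, this variant uses index specifiers with a single getter/setter pair; the same pair also backs a positional Fields[] property for filling the record from a split line. TIfNet, GetField and SetField are hypothetical names, not from the thread.

```pascal
program IfNetProps;

{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

type
  // Hypothetical layout: the nine columns live in one array, and named
  // properties map onto fixed indices of it.
  TIfNet = record
  private
    FFields: array[0..8] of string;
    function GetField(AIndex: Integer): string;
    procedure SetField(AIndex: Integer; const AValue: string);
  public
    // Positional access, handy when filling from a split line.
    property Fields[AIndex: Integer]: string read GetField write SetField;
    // Named access; the index value is passed to GetField/SetField.
    property Name: string index 0 read GetField write SetField;
    property Mtu: string index 1 read GetField write SetField;
    property Network: string index 2 read GetField write SetField;
    property Address: string index 3 read GetField write SetField;
    property Ipkts: string index 4 read GetField write SetField;
    property Ifail: string index 5 read GetField write SetField;
    property Opkts: string index 6 read GetField write SetField;
    property Ofail: string index 7 read GetField write SetField;
    property Colls: string index 8 read GetField write SetField;
  end;

function TIfNet.GetField(AIndex: Integer): string;
begin
  Result := FFields[AIndex];
end;

procedure TIfNet.SetField(AIndex: Integer; const AValue: string);
begin
  FFields[AIndex] := AValue;
end;

const
  Sample: array[0..8] of string =
    ('em1', '1500', '10.0.5/24', '10.0.5.15', '4', '0', '9', '0', '0');

var
  Net: TIfNet;
  i: Integer;
begin
  for i := 0 to 8 do
    Net.Fields[i] := Sample[i];                        // fill by position
  WriteLn(Net.Name, ' ', Net.Address, ' ', Net.Opkts);  // read by name
end.
```

Filling by position and reading by name means the parsing loop never has to spell out the nine field names.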


wp

Hero Member
Posts: 11910

Re: Best way to parse file.

« Reply #2 on: January 26, 2023, 02:50:30 pm »

Probably not the best, but the first one that came to my mind: read the file into a stringlist and then split the lines at the fixed positions given by the starts of the header columns, using the good old Copy function.

parse_file.zip (2.82 kB - downloaded 22 times.)
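A sketch of that fixed-position idea (not the code from the attached parse_file.zip): the start position of each header word is recorded once, and every data line is cut there with Copy and trimmed. It assumes every value stays inside its header column, as in the posted output. The sample lines are rebuilt with Format using the column widths of that output so the spacing is exact; HeaderStarts, ParseFixed and Layout are hypothetical names.

```pascal
program FixedColumns;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  if_net = record
    Name, Mtu, Network, Address, Ipkts, Ifail, Opkts, Ofail, Colls: string;
  end;
  TColumnStarts = array[0..8] of Integer;

const
  // Column widths of the posted netstat output; only used to build samples.
  Layout = '%-8s%-6s%-12s%-21s%5s%6s%9s%6s%6s';

// 1-based start position of each of the nine header words.
function HeaderStarts(const Header: string): TColumnStarts;
var
  i, Col: Integer;
begin
  Col := 0;
  for i := 1 to Length(Header) do
    if (Header[i] <> ' ') and ((i = 1) or (Header[i - 1] = ' ')) then
    begin
      if Col > 8 then
        raise Exception.Create('expected nine header columns');
      Result[Col] := i;  // 1-based, as Copy expects
      Inc(Col);
    end;
  if Col <> 9 then
    raise Exception.Create('expected nine header columns');
end;

// Cut one data line at the header positions and strip the padding.
function ParseFixed(const Line: string; const Starts: TColumnStarts): if_net;

  function Column(Idx: Integer): string;
  var
    StopAt: Integer;
  begin
    if Idx < 8 then
      StopAt := Starts[Idx + 1]
    else
      StopAt := Length(Line) + 1;  // last column runs to the end of the line
    Result := Trim(Copy(Line, Starts[Idx], StopAt - Starts[Idx]));
  end;

begin
  Result.Name := Column(0);
  Result.Mtu := Column(1);
  Result.Network := Column(2);
  Result.Address := Column(3);  // all blanks when there is no address
  Result.Ipkts := Column(4);
  Result.Ifail := Column(5);
  Result.Opkts := Column(6);
  Result.Ofail := Column(7);
  Result.Colls := Column(8);
end;

var
  Starts: TColumnStarts;
  Net: if_net;
begin
  Starts := HeaderStarts(Format(Layout, ['Name', 'Mtu', 'Network', 'Address',
    'Ipkts', 'Ifail', 'Opkts', 'Ofail', 'Colls']));
  Net := ParseFixed(Format(Layout, ['lo0', '32768', '<Link>', '',
    '0', '0', '0', '0', '0']), Starts);
  WriteLn('[', Net.Name, '] [', Net.Address, '] [', Net.Colls, ']');
end.
```

In a real program the header would be line 0 of the stringlist and ParseFixed would run over lines 1..Count-1. An empty Address column simply comes back as an empty string, so no field counting is needed.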


Warfley

Hero Member
Posts: 1499

Re: Best way to parse file.

« Reply #3 on: January 26, 2023, 03:05:26 pm »

You can just split the string:

Code: Pascal

// needs: uses SysUtils (TStringArray, Split), StrUtils (IfThen)
function read_config(const line: String): if_net;
var
  parts: TStringArray;
  has_address: boolean;
begin
  parts := line.split([' ', #9], TStringSplitOptions.ExcludeEmpty);
  // 9 parts: Address is filled in; 8 parts: Address is empty
  if (length(parts) < 8) or (length(parts) > 9) then
    raise Exception.CreateFmt('unexpected column count: %d', [length(parts)]);
  has_address := length(parts) = 9;
  With Result do
  begin
    Name := parts[0];
    Mtu := parts[1];
    Network := parts[2];
    // very lazy way to only load address if the boolean is true
    Address := ifthen(has_address, parts[3], '');
    // ord(boolean) = 1 if true, 0 if false, so it will be offset by 1 if there is the address
    Ipkts := parts[3 + ord(has_address)];
    Ifail := parts[4 + ord(has_address)];
    Opkts := parts[5 + ord(has_address)];
    Ofail := parts[6 + ord(has_address)];
    Colls := parts[7 + ord(has_address)];
  end;
end;


GitHub: https://github.com/Warfley

Zvoni

Hero Member
Posts: 2327

Re: Best way to parse file.

« Reply #4 on: January 26, 2023, 03:50:06 pm »

Quote

parts := line.split([' ', #9], TStringSplitOptions.ExcludeEmpty);

Eh? In this case: NO!
The Address column can be empty, so excluding empty parts would mix up the column count.
Though I see that you adjusted for that. But IMO it's way too convoluted.

I'm with wp's approach with the StringList

« Last Edit: January 26, 2023, 03:51:44 pm by Zvoni »


One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Warfley

Hero Member
Posts: 1499

Re: Best way to parse file.

« Reply #5 on: January 26, 2023, 03:53:06 pm »

Quote from: Zvoni on January 26, 2023, 03:50:06 pm

Quote
parts := line.split([' ', #9], TStringSplitOptions.ExcludeEmpty);
Eh? In this case: NO!
The Address column can be empty, so excluding empty parts would mix up the column count.

I'm with wp's approach with the StringList

Yes this is accounted for in the code:

Code: Pascal

  has_address := length(parts) = 9;
...
    // very lazy way to only load address if the boolean is true
    Address := ifthen(has_address, parts[3], '');
    // ord(boolean) = 1 if true, 0 if false, so it will be offset by 1 if there is the address
    Ipkts := parts[3 + ord(has_address)];

If the address is there, it will be loaded into Address and all the following lookups will be offset by one; if it isn't there, Address will be set to the empty string ('') and the following fields are read from index 3 onwards.

As long as only Address can be missing, this is probably the easiest way to parse this data, and it needs only a third of the lines of code of the stringlist approach.
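To get the array of records asked for in the first post, the split-based read_config can be applied to every line after the header. A minimal sketch, assuming FPC 3.2 (for TStringArray and the Split overload taking TStringSplitOptions) and, as said above, that only Address can be missing; ParseTable and TIfNetArray are hypothetical names, and the sample lines stand in for the real file or command output.

```pascal
program NetstatToArray;

{$mode objfpc}{$H+}
{$modeswitch typehelpers}

uses
  Classes, SysUtils, StrUtils;

type
  if_net = record
    Name, Mtu, Network, Address, Ipkts, Ifail, Opkts, Ofail, Colls: string;
  end;
  TIfNetArray = array of if_net;

// Split approach: 9 parts = Address present, 8 parts = Address empty.
function read_config(const line: string): if_net;
var
  parts: TStringArray;
  has_address: Boolean;
  k: Integer;
begin
  parts := line.Split([' ', #9], TStringSplitOptions.ExcludeEmpty);
  if (Length(parts) < 8) or (Length(parts) > 9) then
    raise Exception.CreateFmt('unexpected column count: %d', [Length(parts)]);
  has_address := Length(parts) = 9;
  k := Ord(has_address);  // index offset for the columns after Address
  Result.Name := parts[0];
  Result.Mtu := parts[1];
  Result.Network := parts[2];
  Result.Address := IfThen(has_address, parts[3], '');
  Result.Ipkts := parts[3 + k];
  Result.Ifail := parts[4 + k];
  Result.Opkts := parts[5 + k];
  Result.Ofail := parts[6 + k];
  Result.Colls := parts[7 + k];
end;

// Whole listing to array: line 0 is the header and is skipped,
// blank lines are ignored.
function ParseTable(Lines: TStrings): TIfNetArray;
var
  i, n: Integer;
begin
  Result := nil;
  SetLength(Result, Lines.Count);
  n := 0;
  for i := 1 to Lines.Count - 1 do
    if Trim(Lines[i]) <> '' then
    begin
      Result[n] := read_config(Lines[i]);
      Inc(n);
    end;
  SetLength(Result, n);  // drop the unused slots
end;

var
  Lines: TStringList;
  Nets: TIfNetArray;
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    // In a real program: Lines.LoadFromFile(...) or captured command output.
    Lines.Add('Name    Mtu   Network     Address              Ipkts Ifail    Opkts Ofail Colls');
    Lines.Add('lo0     32768 127/8       127.0.0.1                0     0        0     0     0');
    Lines.Add('em1     1500  10.0.5/24   10.0.5.15                4     0        9     0     0');
    Lines.Add('enc0*   0     <Link>                               0     0        0     0     0');
    Nets := ParseTable(Lines);
    for i := 0 to High(Nets) do
      WriteLn(Nets[i].Name, ' address=[', Nets[i].Address, '] opkts=', Nets[i].Opkts);
  finally
    Lines.Free;
  end;
end.
```

Because the records hold copies of the strings, the stringlist can be freed as soon as the array is built.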

« Last Edit: January 26, 2023, 03:57:31 pm by Warfley »


BSaidus

Hero Member
Posts: 545
lazarus 1.8.4 Win8.1 / cross FreeBSD

Re: Best way to parse file.

« Reply #6 on: January 26, 2023, 04:32:52 pm »

What do you think about using RegExpr?
I'm at work now; I will give you feedback soon.
(Anyone who can help with the regex is welcome.)
For the first line, this regex works well:

Code: Pascal

^(\w+)\s+(\d+)\s+(\W\w+\W)(\w+|\s+)(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)
// works well for this line
lo0     32768 <Link>                               0     0        0     0     0     
// but not for this line :
lo0     32768 fe80::%lo0/ fe80::1%lo0              0     0        0     0     0
 


Warfley

Hero Member
Posts: 1499

Re: Best way to parse file.

« Reply #7 on: January 26, 2023, 04:44:35 pm »

Quote from: BSaidus on January 26, 2023, 04:32:52 pm

What do you think about using RegExpr?
I'm at work now; I will give you feedback soon.
(Anyone who can help with the regex is welcome.)
For the first line, this regex works well:
Code: Pascal
^(\w+)\s+(\d+)\s+(\W\w+\W)(\w+|\s+)(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)
// works well for this line
lo0 32768 <Link> 0 0 0 0 0
// but not for this line :
lo0 32768 fe80::%lo0/ fe80::1%lo0 0 0 0 0 0

\w is not what you think it is. \w matches a single "word" character (a letter, a digit or an underscore), and some characters in those lines are not word characters (/, : and %, for example).
But there is a much simpler alternative: just match spaces vs. non-spaces:

Code: Pascal

^([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)?\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)

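You have a sequence of non-spaces followed by a sequence of spaces, repeated 4 times, where the last non-space sequence (the Address) is optional, and then a sequence of spaces followed by a sequence of digits, repeated 5 times.

You can even make this shorter with the repeat syntax (but note that a repeated group only keeps its last match, so this form checks the layout of a line without giving you each field as a separate group):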

Code: Pascal

^(([^\s]+)\s+){3,4}(\s+(\d+)){5}
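A sketch of using the long expression with the RegExpr unit that ships with FPC: one TRegExpr instance is reused for every line, and Match[1]..Match[9] map onto the record fields. It assumes that an optional group which did not take part in the match (Address) is returned as an empty string; ParseWithRegex and LinePattern are hypothetical names.

```pascal
program RegexParse;

{$mode objfpc}{$H+}

uses
  SysUtils, RegExpr;

type
  if_net = record
    Name, Mtu, Network, Address, Ipkts, Ifail, Opkts, Ofail, Colls: string;
  end;

const
  // Long form from above: one group per column, group 4 (Address) optional.
  LinePattern = '^([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)?\s+' +
                '(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)';

// Returns False when the line does not match (e.g. the header line).
function ParseWithRegex(Re: TRegExpr; const Line: string; out Net: if_net): Boolean;
begin
  Result := Re.Exec(Line);
  if not Result then
    Exit;
  Net.Name := Re.Match[1];
  Net.Mtu := Re.Match[2];
  Net.Network := Re.Match[3];
  Net.Address := Re.Match[4];  // assumed '' when the optional group did not match
  Net.Ipkts := Re.Match[5];
  Net.Ifail := Re.Match[6];
  Net.Opkts := Re.Match[7];
  Net.Ofail := Re.Match[8];
  Net.Colls := Re.Match[9];
end;

var
  Re: TRegExpr;
  Net: if_net;
begin
  Re := TRegExpr.Create;
  try
    Re.Expression := LinePattern;
    if ParseWithRegex(Re, 'lo0     32768 fe80::%lo0/ fe80::1%lo0              0     0        0     0     0', Net) then
      WriteLn(Net.Network, ' | ', Net.Address, ' | ', Net.Ipkts);
    if ParseWithRegex(Re, 'lo0     32768 <Link>                               0     0        0     0     0', Net) then
      WriteLn(Net.Network, ' | [', Net.Address, '] | ', Net.Colls);
    if not ParseWithRegex(Re, 'Name    Mtu   Network     Address              Ipkts Ifail    Opkts Ofail Colls', Net) then
      WriteLn('header skipped');
  finally
    Re.Free;
  end;
end.
```

Note that this pattern relies on an empty Address column leaving at least two blanks between Network and Ipkts, which the padded netstat layout provides. With single-space separation, a line without an address would fail to match, because the optional group leaves nothing for the extra \s+ to consume.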


BSaidus

Hero Member
Posts: 545
lazarus 1.8.4 Win8.1 / cross FreeBSD

Re: Best way to parse file.

« Reply #8 on: January 26, 2023, 04:54:00 pm »

@Warfley Thank you,

you're the man.
Thank you all for the help.
This works for me; I'll proceed with RegEx.
Thanks.


Warfley

Hero Member
Posts: 1499

Re: Best way to parse file.

« Reply #9 on: January 26, 2023, 09:14:30 pm »

Just for fun, and because I am currently working with GOLD and wanted to try some stuff out, I've made a GOLD grammar for building an LALR parser that reads this file:

Code: GOLD grammar

"Start Symbol" = <Table>
 
{ItemChar} = {Printable} - {Whitespace}
{NoNu} = {ItemChar} - {Number}
 
{Whitespace Ch} = {Whitespace} - {CR} - {LF}
 
Whitespace = {Whitespace Ch}+
Newline = {CR}{LF} | {CR} | {LF}
DataField = {ItemChar}*{NoNu}{ItemChar}*
NumberField = {Number}+
 
<Table> ::= <Header> NewLine <TableEntries>
 
<Header> ::= 'Name' 'Mtu' 'Network' 'Address' 'Ipkts' 'Ifail' 'Opkts' 'Ofail' 'Colls'
 
<TableEntries> ::= <TableEntry> NewLine <TableEntries>
                |
 
<TableEntry> ::= <Name> <MTU> <Network> <Address> <IPkts> <IFail> <OPkts> <Ofail> <Colls>
 
<Name> ::= DataField
 
<MTU> ::= NumberField
 
<Network> ::= DataField
 
<Address> ::= DataField
           |
 
<IPkts> ::= NumberField
 
<IFail> ::= NumberField
 
<OPkts> ::= NumberField
 
<Ofail> ::= NumberField
 
<Colls> ::= NumberField

So if you want to completely overengineer this thing, you can use this.


Kays

Hero Member
Posts: 574
Whasup!?

Re: Best way to parse file.

« Reply #10 on: January 27, 2023, 12:56:50 am »

Quote from: BSaidus on January 26, 2023, 12:21:20 pm

I wonder what the best way is to parse this file into an organized array of this record type:

If it is current information retrieved from the same host machine your program is running:

The best way is not to parse it at all. The data you want as a record structure is already there in memory – albeit as numbers, not strings. netstat “simply” converts the numbers into human-readable form, i.e. (primarily) for consumption by humans.

Calling truss reveals what’s going on there:

Code: Bash

truss netstat -an -f inet

In particular there is the system call

Code: Text

__sysctlbyname("net.inet.tcp.pcblist" […]

I recommend retrieving the net.inet.*.pcblist system control values yourself and reading the data in the form in which it is already present.

Now, I’ll need to check out the netstat source code, too, and I admit it’s not as easy as reading a bunch of strings, but it’s definitely the best way to obtain said data.


Yours Sincerely
Kai Burghardt

440bx

Hero Member
Posts: 4014

Re: Best way to parse file.

« Reply #11 on: January 27, 2023, 08:32:41 am »

If you don't mind writing code that actually parses the input, it could be done by parsing only the title line and saving the offset of the start or the end of each field (depending on the field).

In the example you posted, the 0-based offsets are 0, 8, 14 and 26 (starts) for the first 4 fields and 51, 57, 66, 72 and 79 (ends) for the last 5. The following presumes those field offsets do not change between lines.

Save those offsets and the pointer to the first field (which is what follows the title's line terminator). Note also that each line seems to have a constant length, which appears to be the title length.

Now you have the offsets to each field.

The first 4 fields end at the first space found on or after their offset: replace that location with a null, and now you have a null-terminated string whose pointer you can save. The last 5 fields end at their offset (replace the space that follows with a null) and start one character after the first space that precedes them (scan backwards for the first space and save the address of the next character).

IOW, instead of having a record of strings, you have a record of pointers to null-terminated strings. It may sound a bit complicated, but it's actually quite simple, and it will be as fast as possible because parsing is only done once (at least for the first 4 fields) and values are not moved around in memory (pointers to the existing values are saved instead).

Lastly, commenting the way you're going about it can make the "algorithm" used quite obvious. I don't know if that is the best way to parse the file, but it definitely is one of the fastest (if not the fastest).

HTH.
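A sketch of that pointer-based approach, assuming the header and all data lines share one fixed layout: offsets are read once from the header (starts for the first four fields, ends for the last five), then each line is split in place by writing #0 terminators into its buffer, so the record holds PChar pointers instead of copied strings. The buffer string must stay alive and unchanged for as long as those pointers are used. HeaderOffsets, SplitInPlace, TFieldOffsets and TIfNetP are hypothetical names, and the sample line is rebuilt with Format using the posted column widths.

```pascal
program OffsetParse;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // 0-based offsets read once from the header line.
  TFieldOffsets = record
    Starts: array[0..3] of Integer;  // first four fields are left-aligned
    Ends: array[0..4] of Integer;    // last five fields are right-aligned
  end;
  // Pointers into the line buffer instead of copied strings.
  TIfNetP = record
    Fields: array[0..8] of PChar;
  end;

const
  // Column widths of the posted output; only used to build a sample line.
  Layout = '%-8s%-6s%-12s%-21s%5s%6s%9s%6s%6s';

function HeaderOffsets(const Header: string): TFieldOffsets;
var
  i, Count: Integer;
begin
  Count := 0;  // header words seen so far
  for i := 1 to Length(Header) do
  begin
    if Header[i] = ' ' then
      Continue;
    if (i = 1) or (Header[i - 1] = ' ') then
    begin
      Inc(Count);
      if Count > 9 then
        raise Exception.Create('expected nine header columns');
      if Count <= 4 then
        Result.Starts[Count - 1] := i - 1;  // where the word starts
    end;
    if (Count >= 5) and ((i = Length(Header)) or (Header[i + 1] = ' ')) then
      Result.Ends[Count - 5] := i - 1;      // where the word ends
  end;
  if Count <> 9 then
    raise Exception.Create('expected nine header columns');
end;

// Writes #0 terminators into Buf and returns pointers to the fields.
function SplitInPlace(var Buf: string; const Off: TFieldOffsets): TIfNetP;
var
  P: PChar;
  k, i, Len: Integer;
begin
  if Buf = '' then
    raise Exception.Create('empty line');
  UniqueString(Buf);  // we modify the characters, so the string must not be shared
  P := PChar(Buf);
  Len := Length(Buf);
  for k := 0 to 3 do  // start at the offset, end at the next blank
  begin
    i := Off.Starts[k];
    if i > Len then
      i := Len;  // P[Len] is the string's own #0 terminator
    Result.Fields[k] := @P[i];
    while (i < Len) and (P[i] <> ' ') and (P[i] <> #0) do
      Inc(i);
    P[i] := #0;  // a blank at the start offset gives an empty field
  end;
  for k := 0 to 4 do  // end at the offset, start after the preceding blank
  begin
    i := Off.Ends[k];
    if (i >= Len) or (P[i] = ' ') or (P[i] = #0) then
    begin
      Result.Fields[4 + k] := @P[Len];  // nothing there: empty string
      Continue;
    end;
    P[i + 1] := #0;  // i + 1 <= Len, so this stays inside the buffer
    while (i > 0) and (P[i - 1] <> ' ') and (P[i - 1] <> #0) do
      Dec(i);
    Result.Fields[4 + k] := @P[i];
  end;
end;

var
  Off: TFieldOffsets;
  Buf: string;
  Net: TIfNetP;
  k: Integer;
begin
  Off := HeaderOffsets(Format(Layout, ['Name', 'Mtu', 'Network', 'Address',
    'Ipkts', 'Ifail', 'Opkts', 'Ofail', 'Colls']));
  Buf := Format(Layout, ['em1', '1500', '10.0.5/24', '10.0.5.15',
    '4', '0', '9', '0', '0']);
  Net := SplitInPlace(Buf, Off);
  for k := 0 to 8 do
    Write('[', Net.Fields[k], '] ');
  WriteLn;
end.
```

Here each line gets its own buffer string. Applying the same idea to a whole file read into one buffer would only change where the offsets are measured from.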


(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

BSaidus

Hero Member
Posts: 545
lazarus 1.8.4 Win8.1 / cross FreeBSD

Re: Best way to parse file.

« Reply #12 on: January 27, 2023, 03:49:49 pm »

Quote from: marcov on January 26, 2023, 12:34:23 pm

That command doesn't work for me on Debian 11 and 12 (https://bugs.launchpad.net/ubuntu/+source/net-tools/+bug/1915903).

Hi, I executed the command on OpenBSD.


KodeZwerg

Hero Member
Posts: 2050
Fifty shades of code.

Re: Best way to parse file.

« Reply #13 on: January 27, 2023, 03:55:05 pm »

I think your way of doing this is just wrong.
If you cannot gather that information in code, what purpose should it serve?
The way I see it, if I am not able to get the information in code, I would simply display the console output somewhere in my app.


« Last Edit: Tomorrow at 31:76:97 xm by KodeZwerg »

Warfley

Hero Member
Posts: 1499

Re: Best way to parse file.

« Reply #14 on: January 27, 2023, 04:21:47 pm »

Quote from: KodeZwerg on January 27, 2023, 03:55:05 pm

I think your way of doing this is just wrong.
If you cannot gather that information in code, what purpose should it serve?
The way I see it, if I am not able to get the information in code, I would simply display the console output somewhere in my app.

That's the Unix philosophy: instead of having one app with all the functionality, you have multiple specialized programs that each do one thing very well and whose output can be reused by other programs. Just for comparison, the "ifconfig" program has 1000 lines of code. So when you need that functionality in your program, you can decide whether you want a few lines that call ifconfig, or want to implement all of this yourself.

This is also very important for rights management. For example, to send ICMP requests you need root rights, but you might not want to give root rights to every application that needs this. So they call the system applications "ping", "traceroute", etc., which have the required rights and which make it much easier to keep track of security issues, because these applications basically just do one thing. No user input or anything else.

Another example where this is useful is reading out hardware information, which can be quite annoying. Linux systems usually provide pseudo-files for this, but every distro might choose to put the pseudo-files into a different directory. Using the provided system tools (like ip addr, netstat, ifconfig, etc.) gives you the information in a uniform manner.

And the main advantage of this is that it is very simple to debug: as all of these programs give the data in both human- and machine-readable form, you can debug your APIs by simply looking at the program output.

So there are a lot of reasons to do this. It's one of the Windows diseases that Microsoft thought everything must be accessible through code APIs and DLLs whose calls must be implemented in each program that tries to use them. By having different programs provide the data in both human- and machine-readable form, it is much easier to get access to that data and to learn how to use it.
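For the route of calling a system tool instead of reimplementing it, a sketch using RunCommand from FPC's process unit to capture a command's standard output into a TStringList, which any of the parsers in this thread could then consume. The netstat arguments are the ones from the first post and may need adjusting for your OS (a full path to the executable may also be needed); CommandLines is a hypothetical name.

```pascal
program CaptureNetstat;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, Process;

// Runs Exe with Args and returns its standard output as a list of lines.
// The caller owns (and must free) the returned list.
function CommandLines(const Exe: string; const Args: array of string): TStringList;
var
  Captured: string;
begin
  if not RunCommand(Exe, Args, Captured) then
    raise Exception.CreateFmt('running %s failed', [Exe]);
  Result := TStringList.Create;
  Result.Text := Captured;  // splits the output at line breaks
end;

var
  Lines: TStringList;
  i: Integer;
begin
  // Arguments from the first post; adjust them for your system.
  Lines := CommandLines('netstat', ['-an', '-f', 'inet']);
  try
    for i := 0 to Lines.Count - 1 do
      WriteLn(i, ': ', Lines[i]);
  finally
    Lines.Free;
  end;
end.
```

This keeps the privileged or platform-specific work in the system tool, and the program only deals with text.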
