
Topic: Text based log file parser

Rayvenhaus

Text based log file parser
« on: January 16, 2017, 03:50:18 pm »
I've searched high and low for an answer to this. While I have found a ton of specialized parsers online, either I just don't understand them or I can't make them do what I need. I'm writing a log analyzer program. Here's an example of one of the log files I am analyzing:

Quote
  04-Nov-2016 22:18:24 Begin v5.101/16.02.2009, 19:33 (Winter)
  04-Nov-2016 22:18:24 Running under: Windows NT 6.1 (build: 7601, Service Pack 1)
^ 04-Nov-2016 22:18:24 Connect From 75.100.226.242 #24554
  04-Nov-2016 22:18:24 Establishing BinkP transfer protocol
  04-Nov-2016 22:18:24      M_NUL : OPT LST UTF
= 04-Nov-2016 22:18:25    Station : Time Warp of the Future BBS, TCP/IP 1
= 04-Nov-2016 22:18:25    Address : 10:10/1
: 04-Nov-2016 22:18:25      SysOp : Robert E Starr JR from Cougar, WA.  USA
: 04-Nov-2016 22:18:25     Number : time.synchro.net
: 04-Nov-2016 22:18:25      Flags : CM,IBN,ITN,IFC,IFT,IHT,ISE,IMI,INT,TCP,TEL,VMP,BND
= 04-Nov-2016 22:18:25       Time : Sat, 05 Nov 2016 02:18:28 -0700
  04-Nov-2016 22:18:25 Non-password session
  04-Nov-2016 22:18:25 LIST
  04-Nov-2016 22:18:25 Handshake time - 1 seconds
  04-Nov-2016 22:18:25 Nothing for them
  04-Nov-2016 22:18:25 Aborting due to carrier loss
* 04-Nov-2016 22:18:25 Session aborted
  04-Nov-2016 22:18:25 Session traf: in: 0 (0b) out: 0 (0b) [287b/275b]
  04-Nov-2016 22:18:25 Session time: 00:00:01
  04-Nov-2016 22:18:25 End
  04-Nov-2016 22:19:44 Begin v5.101/16.02.2009, 19:33 (Winter)

I need to open the file, read in a line of text, and parse it. For example, the line "^ 04-Nov-2016 22:18:24 Connect From 75.100.226.242 #24554" tells me that the system connected to the address shown. So I want to find the line with the "^", then grab the address and store it. The next one is a line that starts with "=" and contains the name of the system, and so on. I've got the majority of my application working: I can read log files and display them. The next step is parsing the logs.

Can anyone help me get started?

derek.john.evans

Re: Text based log file parser
« Reply #1 on: January 16, 2017, 05:00:53 pm »
StrUtils.ExtractSubstr is a good general-purpose token extractor.

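Code: Pascal
  uses
    StrUtils; // ExtractSubstr lives in the StrUtils unit

  // Splits one log line into marker, date, time, and message.
  procedure Parse(const A: string; out AType, ADate, ATime, AMessage: string);
  var
    LPos: integer;
  begin
    // Start from empty results so an empty line yields defined values.
    AType := '';
    ADate := '';
    ATime := '';
    AMessage := '';
    if Length(A) > 0 then begin
      LPos := 1;
      AType := ExtractSubstr(A, LPos, [' ']);  // marker, e.g. '^' or '='
      ADate := ExtractSubstr(A, LPos, [' ']);
      ATime := ExtractSubstr(A, LPos, [' ']);
      AMessage := Copy(A, LPos, MaxInt);       // rest of the line
    end;
  end;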
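Once a line has been split, the message of a "^" entry still has to be picked apart to get at the address. A minimal sketch, assuming the message always looks like "Connect From <address> #<number>" as in the sample log and that the number after "#" is a port; ParseConnect is a hypothetical helper, not something StrUtils provides:

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: pulls the address and the number after '#' out of a
// message such as 'Connect From 75.100.226.242 #24554'.
// Returns False when the text does not have that shape.
function ParseConnect(const AMessage: string; out AAddress: string;
  out APort: Integer): Boolean;
const
  Prefix = 'Connect From ';
var
  Rest: string;
  HashPos: Integer;
begin
  AAddress := '';
  APort := 0;
  Result := False;
  // The message must start with the fixed prefix.
  if Copy(AMessage, 1, Length(Prefix)) <> Prefix then
    Exit;
  Rest := Copy(AMessage, Length(Prefix) + 1, MaxInt);
  // Everything before '#' is the address, everything after it the port.
  HashPos := Pos('#', Rest);
  if HashPos = 0 then
    Exit;
  AAddress := Trim(Copy(Rest, 1, HashPos - 1));
  APort := StrToIntDef(Trim(Copy(Rest, HashPos + 1, MaxInt)), -1);
  Result := (AAddress <> '') and (APort >= 0);
end;
```

It would be called with the AMessage value of any line whose AType is '^'; other lines simply return False.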

molly

Re: Text based log file parser
« Reply #2 on: January 17, 2017, 02:52:08 am »
Hi Rayvenhaus,

You wrote that you already have your log read and displayed.

How is the log stored in memory? In a memo, perhaps a string list, or did you create your own custom (record) list with entries?

If you wish to parse the log file, what would you like to do with the parsed information?

Are there many entries that need parsing, and how do you wish to store the parsed information? For example, if you already have the log displayed in a memo, would you like something like a find/find next entry, or would you prefer to extract those lines and store them somewhere else?

As you can see for yourself, more questions than answers :-)

Geepster is correct that a line is easily parsed, especially when the log entries have a somewhat fixed format. In that case it is just a matter of counting (and copying) a bunch of characters, recognizing the lines you are interested in, and extracting the relevant information.
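To illustrate the counting-and-copying idea, here is a rough sketch. It assumes the layout seen in the sample log: the marker in column 1, the date in columns 3 to 13, the time in columns 15 to 22, and the message from column 24 on. TLogLine and SplitFixed are made-up names for this example:

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  // Hypothetical record holding the pieces of one log line.
  TLogLine = record
    Marker: Char;     // column 1: ' ', '^', '=', ':' or '*'
    DateStr: string;  // columns 3..13, e.g. '04-Nov-2016'
    TimeStr: string;  // columns 15..22, e.g. '22:18:24'
    Msg: string;      // column 24 onwards, with padding trimmed
  end;

// Cuts a line into its fixed-width parts.
// Returns False when the line is too short to hold marker, date and time.
function SplitFixed(const ALine: string; out AEntry: TLogLine): Boolean;
begin
  AEntry.Marker := ' ';
  AEntry.DateStr := '';
  AEntry.TimeStr := '';
  AEntry.Msg := '';
  Result := Length(ALine) >= 22;
  if not Result then
    Exit;
  AEntry.Marker := ALine[1];
  AEntry.DateStr := Copy(ALine, 3, 11);
  AEntry.TimeStr := Copy(ALine, 15, 8);
  AEntry.Msg := Trim(Copy(ALine, 24, MaxInt));
end;
```

Recognizing the interesting lines is then a matter of checking AEntry.Marker, for example '^' for a connect line or '=' for the station details. If the real files do not keep the marker in column 1, the offsets need adjusting.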

If I had to analyze such a log and knew there were many entries I was interested in (and wanted to be able to reference that information), I would start by defining a structure that can hold all the relevant information, for example:

Code: Pascal
  type
    TConnection = record
      IP : Int64;
      Port : Word;
    end;

    TSession = record
      Session_Start : TDateTime;
      Session_End : TDateTime;
      Connection : TConnection;
     // etc etc.
    end;

  var
    Sessions : Array of TSession;

Then fill in the array based on what was parsed from the log. You can then use the array to search or sort things, store it to disk again in your own format, etc. If more functionality is wanted, turn things into class(es).
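Filling such an array could look roughly like the sketch below. It is only an outline under a few assumptions: a session starts at a "Begin" line and stops at an "End" line, the columns are laid out as in the sample log, and the record is a cut-down stand-in holding just the two timestamps. TSessionTimes, ParseStamp and TrackSession are names invented for this example:

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  // Cut-down, hypothetical stand-in for a session record.
  TSessionTimes = record
    SessionStart: TDateTime;
    SessionEnd: TDateTime;   // stays 0 until an 'End' line is seen
  end;
  TSessionArray = array of TSessionTimes;

// Converts '04-Nov-2016' plus '22:18:24' into a TDateTime.
// Returns False when either part is malformed.
function ParseStamp(const ADate, ATime: string; out AStamp: TDateTime): Boolean;
const
  Months: array[1..12] of string = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec');
var
  D, M, Y, H, N, S, I: Integer;
  DatePart, TimePart: TDateTime;
begin
  AStamp := 0;
  Result := False;
  if (Length(ADate) <> 11) or (Length(ATime) <> 8) then
    Exit;
  D := StrToIntDef(Copy(ADate, 1, 2), 0);
  Y := StrToIntDef(Copy(ADate, 8, 4), 0);
  // Look the month abbreviation up in the table, ignoring case.
  M := 0;
  for I := 1 to 12 do
    if SameText(Copy(ADate, 4, 3), Months[I]) then
      M := I;
  H := StrToIntDef(Copy(ATime, 1, 2), -1);
  N := StrToIntDef(Copy(ATime, 4, 2), -1);
  S := StrToIntDef(Copy(ATime, 7, 2), -1);
  if (M = 0) or (H < 0) or (N < 0) or (S < 0) then
    Exit;
  if not TryEncodeDate(Y, M, D, DatePart) then
    Exit;
  if not TryEncodeTime(H, N, S, 0, TimePart) then
    Exit;
  AStamp := DatePart + TimePart;
  Result := True;
end;

// Appends a session on a 'Begin' line and closes the latest one on 'End'.
// Lines without a valid timestamp, and all other messages, are ignored.
procedure TrackSession(var ASessions: TSessionArray; const ALine: string);
var
  Stamp: TDateTime;
  Msg: string;
begin
  if not ParseStamp(Copy(ALine, 3, 11), Copy(ALine, 15, 8), Stamp) then
    Exit;
  Msg := Trim(Copy(ALine, 24, MaxInt));
  if Copy(Msg, 1, 6) = 'Begin ' then
  begin
    SetLength(ASessions, Length(ASessions) + 1);
    ASessions[High(ASessions)].SessionStart := Stamp;
    ASessions[High(ASessions)].SessionEnd := 0;
  end
  else if (Msg = 'End') and (Length(ASessions) > 0) then
    ASessions[High(ASessions)].SessionEnd := Stamp;
end;
```

Calling TrackSession once per line, in file order, leaves one array entry per session. Growing the array one element at a time is fine for a sketch; for very large logs a list class or growing in chunks would be kinder to the memory manager.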

Thaddy

Re: Text based log file parser
« Reply #3 on: January 17, 2017, 09:26:40 am »
I would design it like this:
Read the content of the log into a :memory: SQLite database with three fields: indicator, datetime, and content, splitting each line at the spaces that separate them.
You can then use SQL statements like "select * from ... where indicator = '^'" to query the log file.
That should not be too hard to do and should be quite fast.

Advantages:
- Minimum code, I estimate around 200 lines.
- Querying the log is done in plain SQL, which you probably already know.

Another solution that I use is to parse the log with regular expressions (RegEx), if you are proficient with them.
That is probably slower, though.
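For the regular expression route, a minimal sketch using the TRegExpr class from Free Pascal's RegExpr unit. The pattern and the MatchLine helper are assumptions based on the sample log in the first post, so treat them as a starting point rather than a finished parser:

```pascal
{$mode objfpc}{$H+}
uses
  RegExpr;

// Hypothetical helper: splits a log line into marker, timestamp and text.
// Returns False when the line does not match the assumed layout.
function MatchLine(const ALine: string;
  out AMarker, AStamp, AText: string): Boolean;
var
  Re: TRegExpr;
begin
  AMarker := '';
  AStamp := '';
  AText := '';
  Re := TRegExpr.Create;
  try
    // Group 1: the single marker character in column 1 (may be a space).
    // Group 2: date and time, e.g. '04-Nov-2016 22:18:24'.
    // Group 3: the rest of the line after any padding spaces.
    Re.Expression := '^(.) (\d\d-[A-Za-z]{3}-\d{4} \d\d:\d\d:\d\d) +(.*)$';
    Result := Re.Exec(ALine);
    if Result then
    begin
      AMarker := Re.Match[1];
      AStamp := Re.Match[2];
      AText := Re.Match[3];
    end;
  finally
    Re.Free;
  end;
end;
```

Creating the TRegExpr object once and reusing it for every line would avoid recompiling the pattern each time, which matters for large logs.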
« Last Edit: January 17, 2017, 12:04:25 pm by Thaddy »

Rayvenhaus

Re: Text based log file parser
« Reply #4 on: January 17, 2017, 02:01:28 pm »
Quote
  You wrote that you already have your log read and displayed. How is the log stored in memory? In a memo, perhaps a string list, or did you create your own custom (record) list with entries?

I currently use a TStream to open the files and display them in memos on a TabSheet.

Quote
  If you wish to parse the log file, what would you like to do with the parsed information? Are there many entries that need parsing, and how do you wish to store the parsed information? For example, if you already have the log displayed in a memo, would you like something like a find/find next entry, or would you prefer to extract those lines and store them somewhere else?

Yes, there can be a lot of entries in a log file, depending on that day's activity. The first thing I want to do is display highlights from the log file, on a label for example. Going forward, I will be using a database of some kind to store connection and other info for statistics. Once I have the database set up, I will generate daily status reports.
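For the highlights step, something along these lines might be a starting point. It is a sketch that assumes the marker sits in column 1 and the message starts at column 24, as in the sample log above; CollectHighlights is a made-up name, and which markers count as highlights is left to the caller:

```pascal
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Hypothetical helper: copies the message part of every line whose marker
// (column 1) is in AMarkers from ALog into AHighlights.
procedure CollectHighlights(ALog, AHighlights: TStrings;
  const AMarkers: TSysCharSet);
var
  I: Integer;
  Line: string;
begin
  AHighlights.Clear;
  for I := 0 to ALog.Count - 1 do
  begin
    Line := ALog[I];
    // Skip lines too short to carry a message after the timestamp.
    if (Length(Line) >= 24) and (Line[1] in AMarkers) then
      AHighlights.Add(Trim(Copy(Line, 24, MaxInt)));
  end;
end;
```

With the log already in a memo, a call such as CollectHighlights(Memo1.Lines, List, ['^', '*']) followed by Label1.Caption := List.Text would show the connect and abort lines; Memo1, List and Label1 are placeholder names here.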

I'll look into the record type and see what info I can dig up.

Thanks for the info and the help.

 
