What intermediate dataformat should I use to analyse a CSV file?

Bart

Hero Member
Posts: 5290

What intermediate dataformat should I use to analyse a CSV file?

« on: February 15, 2020, 03:33:17 pm »

Hi,

I have data in the form of a CSV file.
The file has data like:
Age, Total days of admission, Diagnosis, Barthel index, Destination after discharge, ...

Now I would like to do some statistics on the data, for example: calculate the percentage of patients admitted with diagnosis "CVA", calculate the percentage with destination="Home" among patients with diagnosis="CVA", and calculate the mean Barthel index of patients with diagnosis="CVA" and destination="Home" versus destination<>"Home".

A complication is that not all fields are known for all patients (the field is left blank in that case).

Parsing the CSV file is not a real problem.

My main question is: what kind of internal data structure should I use, so that I can easily "query" it for questions like the above?

And before anyone asks: No, I don't have access to a statistical analysis program (SPSS or the like), neither do I have any database engine installed on the machine in question.
(The data are exported from an Excel file, I don't even have access to MS Access. I work in a nursing home and there are simply no funds for that.)

The dataset is currently limited to about 160 patients, so I could do this all by hand, but the same exercise will have to be done next year (on approx. the same amount of data).
And of course, since I like programming, it will be a nice exercise for me.

Bart

Logged

jamie

Hero Member
Posts: 6131

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #1 on: February 15, 2020, 03:47:15 pm »

There is a TCSVDataSet on the Data access tab.

Drop a TDataSource on the form and some data-aware control (a TDBGrid, for example) to view the data.

You do need to know your field names so you can create the field definitions for the dataset.

« Last Edit: February 15, 2020, 03:49:09 pm by jamie »

Logged

The only true wisdom is knowing you know nothing

fred

Full Member
Posts: 201

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #2 on: February 15, 2020, 03:52:18 pm »

My first thought would be a Sqlite database, ZMSQL or JCSV (Jans CSV Components) and run select count etc.

Logged

Bart

Hero Member
Posts: 5290

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #3 on: February 15, 2020, 04:13:00 pm »

Quote from: fred on February 15, 2020, 03:52:18 pm

My first thought would be a Sqlite database, ZMSQL or JCSV (Jans CSV Components) and run select count etc.

No such software is available on the machine in question, nor will it be installed on my request.

Bart

Logged

wp

Hero Member
Posts: 11923

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #4 on: February 15, 2020, 04:30:47 pm »

Quote from: Bart on February 15, 2020, 04:13:00 pm

Quote from: fred on February 15, 2020, 03:52:18 pm
My first thought would be a Sqlite database, ZMSQL or JCSV (Jans CSV Components) and run select count etc.

No such software is available on the machine in question, nor will it be installed on my request.

Bart

This is not "software" in the sense of external programs - they are just Lazarus packages, except for Sqlite which is a dll to be distributed along with your exe in its folder. If you are allowed to add an exe to the user's computer then you should also be allowed to add sqlite3.dll.

I agree that SQL would be the easiest way to calculate the percentages under some conditions. Otherwise you must put the data into sorted arrays and count yourself - but this should not be too difficult either.
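A minimal sketch of the "count yourself" approach, assuming the parsed CSV ends up in a plain array of records. The record layout, the `HasBarthel` flag and the convention that an empty string means "unknown" are assumptions made for illustration, not something from the original data. For simple counts and means the array does not even need to be sorted.

```pascal
program CvaStats;
{$mode objfpc}{$H+}

type
  // Hypothetical record layout; adapt the fields to the real CSV columns.
  TPatient = record
    Diagnosis: string;     // '' means the CSV cell was blank
    Destination: string;   // '' means the CSV cell was blank
    Barthel: Integer;
    HasBarthel: Boolean;   // False when the Barthel cell was blank
  end;

function MakePatient(const ADiag, ADest: string; ABarthel: Integer;
  AHasBarthel: Boolean): TPatient;
begin
  Result.Diagnosis := ADiag;
  Result.Destination := ADest;
  Result.Barthel := ABarthel;
  Result.HasBarthel := AHasBarthel;
end;

// Percentage of patients with ADiag, counting only rows with a known diagnosis.
function PercentWithDiagnosis(const P: array of TPatient;
  const ADiag: string): Double;
var
  i, Known, Hits: Integer;
begin
  Known := 0;
  Hits := 0;
  for i := 0 to High(P) do
    if P[i].Diagnosis <> '' then
    begin
      Inc(Known);
      if P[i].Diagnosis = ADiag then
        Inc(Hits);
    end;
  if Known = 0 then
    Result := 0
  else
    Result := 100.0 * Hits / Known;
end;

// Mean Barthel index for ADiag, for destination = 'Home' (AtHome = True)
// or destination <> 'Home' (AtHome = False). Rows with a blank destination
// or a blank Barthel index are skipped. N returns the number of rows used.
function MeanBarthel(const P: array of TPatient; const ADiag: string;
  AtHome: Boolean; out N: Integer): Double;
var
  i, Sum: Integer;
begin
  N := 0;
  Sum := 0;
  for i := 0 to High(P) do
    if (P[i].Diagnosis = ADiag) and (P[i].Destination <> '')
      and P[i].HasBarthel
      and ((P[i].Destination = 'Home') = AtHome) then
    begin
      Inc(N);
      Inc(Sum, P[i].Barthel);
    end;
  if N = 0 then
    Result := 0
  else
    Result := Sum / N;
end;

var
  Data: array[0..4] of TPatient;
  N: Integer;
  Mean: Double;
begin
  // Made-up sample rows, only to exercise the functions.
  Data[0] := MakePatient('CVA', 'Home', 15, True);
  Data[1] := MakePatient('CVA', 'Nursing home', 8, True);
  Data[2] := MakePatient('CVA', 'Home', 0, False);      // Barthel unknown
  Data[3] := MakePatient('Hip fracture', 'Home', 17, True);
  Data[4] := MakePatient('', 'Home', 12, True);         // diagnosis unknown

  WriteLn('CVA: ', PercentWithDiagnosis(Data, 'CVA'):0:1, ' %');
  Mean := MeanBarthel(Data, 'CVA', True, N);
  WriteLn('Mean Barthel, CVA, home: ', Mean:0:1, ' (n=', N, ')');
  Mean := MeanBarthel(Data, 'CVA', False, N);
  WriteLn('Mean Barthel, CVA, not home: ', Mean:0:1, ' (n=', N, ')');
end.
```

Whether blank fields should be excluded from the denominator, as done here, or counted as a separate "unknown" category is a choice that depends on the question being asked.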

Logged

fred

Full Member
Posts: 201

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #5 on: February 15, 2020, 04:37:09 pm »

Since the data is about 160 rows even loading it in a stringgrid and running some for loops would be fast enough.

Logged

winni

Hero Member
Posts: 3197

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #6 on: February 15, 2020, 04:48:50 pm »

fred was faster ..

Hi!

I would use a StringGrid with only 160 rows.

To make it reusable make some hardcoded functions which could be used next year.

Winni

Logged

jamie

Hero Member
Posts: 6131

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #7 on: February 15, 2020, 07:38:56 pm »

Since this is nothing more than comma text, you can load the complete source into a TStringList.

Each item would then be one record.

To split a record into its fields you can use Split, or assign the line to a second TStringList: AnotherStringList.CommaText := TheMainStringList.Strings[i];

From that point on, the secondary Stringlist will have the fields broken down, one per item.
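A minimal sketch of the two-stringlist idea, with one caveat: as far as I know, `CommaText` also treats unquoted spaces as separators, so a value such as "Nursing home" would be split in two. The sketch therefore uses `DelimitedText` with `StrictDelimiter := True` instead. The sample lines stand in for a real file, and the file name in the comment is hypothetical.

```pascal
program SplitCsvLines;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  Lines, Fields: TStringList;
  i, j: Integer;
begin
  Lines := TStringList.Create;
  Fields := TStringList.Create;
  try
    // In a real program: Lines.LoadFromFile('patients.csv');  (hypothetical name)
    Lines.Add('Age,Days,Diagnosis,Barthel,Destination');
    Lines.Add('81,42,CVA,15,Home');
    Lines.Add('77,30,Hip fracture,,Nursing home');  // Barthel left blank

    // Split on commas only, so spaces inside a value are preserved.
    Fields.StrictDelimiter := True;
    Fields.Delimiter := ',';

    // Start at 1 to skip the header line.
    for i := 1 to Lines.Count - 1 do
    begin
      Fields.DelimitedText := Lines[i];
      for j := 0 to Fields.Count - 1 do
        if Fields[j] = '' then
          Write('<blank> ')
        else
          Write('[', Fields[j], '] ');
      WriteLn;
    end;
  finally
    Fields.Free;
    Lines.Free;
  end;
end.
```

This simple splitting is only adequate when no field contains a quoted comma or a line break; for such files a real CSV parser is the safer choice.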

Logged

The only true wisdom is knowing you know nothing

Bart

Hero Member
Posts: 5290

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #8 on: February 15, 2020, 10:18:04 pm »

Parsing the CSV is not the problem.

@All: thanks for the suggestions so far.

Bart

Logged

af0815

Hero Member
Posts: 1291

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #9 on: February 15, 2020, 10:24:56 pm »

The first questions for me are always:

How much data do I have today (you write 160)?
How much will I have in 10 years ( :-) )?

How many data fields may I have, and how much memory can I spend?

If the data size you expect in 10 years is no problem for the memory you have today, use collections if you do not want an external DB.
Collections can be sorted, and if you design the compare function right, sorting works more than one level deep. Make a collection with dummy data (10 years' worth) and test the speed of sorting and counting.

Logged

regards
Andreas

Bart

Hero Member
Posts: 5290

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #10 on: February 15, 2020, 10:37:32 pm »

Quote from: af0815 on February 15, 2020, 10:24:56 pm

How much i have in 10 years ( :-) )

Well, approx. 1600 then.
If I'm still doing then what I do now.
It's only data from my own ward.

It's not meant for scientific publication.

In the (distant) future we may gather more data, when we move to a more decent electronic patient dossier.
If we then want to do more substantial work on that dataset, we'll seek collaboration with one of the universities and data processing will be done in an appropriate application (one not written by me).
But that's just distant dreams (or nightmares??).

Could you give a small example on how you would use collections for such data?
I never used that before.

Bart

Logged

winni

Hero Member
Posts: 3197

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #11 on: February 15, 2020, 10:42:56 pm »

Hi!

I have a StringGrid with 75,000 rows and 14 columns.

Sequential searching takes less than 1 second. Sorting is lightning fast, whatever it does internally.

So you should not run into performance problems with your amount of data.

The data on the HD is a simple CSV.

Winni

« Last Edit: February 15, 2020, 10:44:29 pm by winni »

Logged

af0815

Hero Member
Posts: 1291

Re: What intermediate dataformat should I use to analyse a CSV file?

« Reply #12 on: February 15, 2020, 10:47:58 pm »

Quote from: Bart on February 15, 2020, 10:37:32 pm

Could you give a small example on how you would use collections for such data?
I never used that before.

A good starting point is https://wiki.freepascal.org/TCollection

edit:
Something about sorting is here https://forum.lazarus.freepascal.org/index.php?topic=38905.0
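A minimal sketch of how a collection could hold such data, assuming Free Pascal's `TCollection.Sort`, which takes a plain compare function. The class name `TPatientItem`, its properties and the sample values are hypothetical and only illustrate the idea. The compare function sorts two levels deep: by diagnosis first, then by destination.

```pascal
program PatientCollection;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Hypothetical item class; one instance per CSV row.
  TPatientItem = class(TCollectionItem)
  private
    FDiagnosis: string;
    FDestination: string;
    FBarthel: Integer;
  public
    property Diagnosis: string read FDiagnosis write FDiagnosis;
    property Destination: string read FDestination write FDestination;
    // -1 marks an unknown Barthel index (a convention chosen for this sketch).
    property Barthel: Integer read FBarthel write FBarthel;
  end;

// Two-level compare: diagnosis first, destination as tie-breaker.
function CompareDiagThenDest(Item1, Item2: TCollectionItem): Integer;
begin
  Result := CompareText(TPatientItem(Item1).Diagnosis,
                        TPatientItem(Item2).Diagnosis);
  if Result = 0 then
    Result := CompareText(TPatientItem(Item1).Destination,
                          TPatientItem(Item2).Destination);
end;

procedure AddPatient(C: TCollection; const ADiag, ADest: string;
  ABarthel: Integer);
var
  P: TPatientItem;
begin
  P := TPatientItem(C.Add);   // the collection creates and owns the item
  P.Diagnosis := ADiag;
  P.Destination := ADest;
  P.Barthel := ABarthel;
end;

var
  Patients: TCollection;
  P: TPatientItem;
  i, CvaTotal, CvaHome: Integer;
begin
  Patients := TCollection.Create(TPatientItem);
  try
    // Made-up sample rows.
    AddPatient(Patients, 'Hip fracture', 'Home', 17);
    AddPatient(Patients, 'CVA', 'Nursing home', 8);
    AddPatient(Patients, 'CVA', 'Home', 15);
    AddPatient(Patients, 'CVA', 'Home', -1);

    Patients.Sort(@CompareDiagThenDest);

    CvaTotal := 0;
    CvaHome := 0;
    for i := 0 to Patients.Count - 1 do
    begin
      P := TPatientItem(Patients.Items[i]);
      WriteLn(P.Diagnosis, ' / ', P.Destination, ' / ', P.Barthel);
      if P.Diagnosis = 'CVA' then
      begin
        Inc(CvaTotal);
        if P.Destination = 'Home' then
          Inc(CvaHome);
      end;
    end;

    if CvaTotal > 0 then
      WriteLn('CVA discharged home: ', (100.0 * CvaHome / CvaTotal):0:1, ' %');
  finally
    Patients.Free;   // also frees all items
  end;
end.
```

Sorting is not required for the counting itself; it mainly helps when the rows should be listed grouped by diagnosis and destination.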

« Last Edit: February 15, 2020, 10:56:52 pm by af0815 »

Logged

regards
Andreas


Author Topic: What intermediate dataformat should I use to analyse a CSV file? (Read 1176 times)
