Hi,
I have data in the form of a CSV file.
The file has data like:
Age, Total days of admission, Diagnosis, Barthel index, Destination after discharge, ...
Now I would like to do some statistics on the data, such as calculating the percentage of patients admitted with diagnosis "CVA", the percentage with destination="Home" among patients with diagnosis="CVA", and the mean Barthel index of patients with diagnosis="CVA" and destination="Home" versus destination<>"Home".
A complication is that not all fields are known for all patients (the field is left blank in that case).
Parsing the CSV file is not a real problem.
My main question is: what kind of internal data structure should I use, so that I can easily "query" it for questions like the above?
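To make the question concrete, here is a rough sketch of the kind of thing I have in mind, written in Python only as an example language: a plain list of dictionaries, with blank fields stored as None so that they can be skipped in the calculations. The shortened column names, the sample rows and the helper functions load, percentage and mean_of are all made up for illustration; they are not my real data. I am not sure this is the best structure, which is why I am asking.

```python
import csv
import io
from statistics import mean

# Made-up sample rows; the column names are shortened for illustration.
SAMPLE = """Age,Days,Diagnosis,Barthel,Destination
81,30,CVA,12,Home
77,45,CVA,8,Nursing home
69,12,Hip fracture,,Home
88,60,CVA,6,
90,21,CVA,14,Home
"""

def load(text):
    """Parse CSV text into a list of dicts, storing blank fields as None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key: ((value or "").strip() or None) for key, value in raw.items()}
        # Convert the numeric columns, leaving unknown values as None.
        for key in ("Age", "Days", "Barthel"):
            if row[key] is not None:
                row[key] = float(row[key])
        rows.append(row)
    return rows

def percentage(rows, condition, field):
    """Percentage of rows meeting condition, among rows where field is known."""
    known = [r for r in rows if r[field] is not None]
    if not known:
        return None
    return 100.0 * sum(1 for r in known if condition(r)) / len(known)

def mean_of(rows, field):
    """Mean of a numeric field, ignoring rows where it is unknown."""
    values = [r[field] for r in rows if r[field] is not None]
    return mean(values) if values else None

patients = load(SAMPLE)
cva = [p for p in patients if p["Diagnosis"] == "CVA"]

# Percentage of patients admitted with diagnosis "CVA".
pct_cva = percentage(patients, lambda p: p["Diagnosis"] == "CVA", "Diagnosis")

# Percentage discharged home among CVA patients with a known destination.
pct_cva_home = percentage(cva, lambda p: p["Destination"] == "Home", "Destination")

# Mean Barthel index: CVA patients going home versus going elsewhere.
barthel_home = mean_of([p for p in cva if p["Destination"] == "Home"], "Barthel")
barthel_other = mean_of(
    [p for p in cva if p["Destination"] not in (None, "Home")], "Barthel"
)

print(pct_cva, pct_cva_home, barthel_home, barthel_other)
```

In this sketch, patients with an unknown value are left out of the denominator for that particular question. Whether that is the right way to treat the blanks is a separate choice, and I would welcome opinions on that too.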
And before anyone asks: no, I don't have access to a statistical analysis program (SPSS or the like), nor do I have any database engine installed on the machine in question.
(The data are exported from an Excel file; I don't even have access to MS Access. I work in a nursing home and there are simply no funds for that.)
The dataset is currently limited to about 160 patients, so I could do this all by hand, but the same exercise will have to be done next year (on approximately the same amount of data).
And of course, since I like programming, it will be a nice exercise for me.
Bart