Recent

Author Topic: How can I sort String Column Numerically  (Read 2963 times)

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #15 on: July 25, 2024, 02:31:38 am »
Quote
However, TCSVDataset conveniently has another method named LoadCSVFromFile that does leave the definitions intact when loading the data.
Yes, but it does not read the non-string fields.
What do I have to do to make it read the non-string fields?

Quote
In the attachment (csv_fielddefs_3) there is a modified version of the previous demo: At first it should be noted that using LoadCSVFromFile the FileName field must be empty because this locks the file for opening with LoadCSVFromFile.
Ok, understood and thank you for explaining the *reason why*

Quote
Again, the demo creates FieldDefs with an integer and a string field and opens the dataset; the DBGrid is empty initially as there are not data yet. The field types are displayed correctly, however.
I tested your code and yes this is the behaviour.

Quote
Now type in some data and exit the application - because there is no FileName, the records must be saved manually by calling SaveToCSVFile in the OnDestroy handler of the form.
Ah, Thank you again I just learnt something useful.

Quote
Open the generated data file ("data.csv") in an external editor and convince yourself that the data are stored correctly. Close the editor, and run the application again. Now the data file exists and is read via LoadFromCSVFile. The data types are still correct, but you'll notice that the integer column is empty now!
Yes, The data is stored correctly. The data types are correct but the integer column is empty. I have attached a screenshot.

Quote
As I said before: The TCSVDataset is useful only with string fields.
True and agreed. If I go through the TBufDataset source and carefully study how they have done it. How likely is it I may be able to implement the same behaviour in a TCSVDataset? Might be possible?

Quote
If you want to read non-string fields from a CSV file I would use a different type of dataset, maybe TDbf or TBufDataset. In the attachment (csv_fielddefs_4) you find a small application how to do this. The essential part is that you must write a reader for the CSV data yourself, but this is easy - the code shows a way using classic Pascal file access. In case of more complex CSV files (e.g. line breaks in the text fields) I'd recommend to use the TCSVDocument.
Thank you very much @wp I just learnt a whole lot of extremely useful stuff.
« Last Edit: July 25, 2024, 03:17:27 am by Aruna »
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #16 on: July 25, 2024, 02:43:16 am »
@wp:
I understand your reply with alternative solutions (very much appreciated btw) but I have a hunch that the objective of Aruna is not to work around the issue rather to grasp the current situation wrt TCVSDataSet (and see if it perhaps can be improved).
Yes very much appreciate the time and effort @wp spent putting together the alternative solutions. @TRon you would be right I am simply trying to give TCVSDataset the same functionality that TBufDataset has. Unless there is a reason not to do so? I am trying to familiarise myself with the TCVSDataset source first then TBufDataset source to see if I may be able to implement the same functionality in TCVSDataset where non-string fields will respect FieldDefs types.

Quote
I am completely on par with your view that TCVSDataSet has its limitations (hence why I never use it myself) not in the least because it committed itself to a certain RFC.
Which RFC? I am curious.
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #17 on: July 25, 2024, 04:08:43 am »
Which RFC? I am curious.
That would be RFC 4180 CSV.

If you look for that in the source-code then you can find that mentioned.

fwiw: the dataset itself has all the functionality required to do what you want. E.g. you can create a database using all types of available fielddefs (date, string, integer, float) and the TCVSDataSet component will sort accordingly and as intended.

Where things seem to go wrong imho is the reading of the actual data from the file. IMO It should respect the fielddefs (and not "overwrite" them). Perhaps a separate import method or similar could be a possible solution (perhaps a class helper).

Once loaded into memory it is normal that the data in memory does not change its (storage) type so converting things at runtime is not the road to take as that would not be normal database behaviour (whatever is considered normal).

SQL databases for example can influence this by casting/converting the fields from a query to another type (that is a feature of the SQL language). In such cases if you want to change the type then one can simply change the query to match expectations. CVSDataset however does not have such luxury.

If you want more detailed information then I would have to dive even deeper into the sources of CVSDataset but in all honestly I personally find that a bit of a waste of time as it is pretty easy to import a CSV to f.e. a sqlite table though I completely understand your reasoning why you want to improve things (and if possible I can certainly try and see what can be done to help).
All software is open source (as long as you can read assembler)

Zvoni

  • Hero Member
  • *****
  • Posts: 2614
Re: How can I sort String Column Numerically
« Reply #18 on: July 25, 2024, 08:35:41 am »
Maybe as a workaround (and yes: i'm aware that this workaround is basically nonsense for just sorting numeric data):
Load the CSV as you're used to, throw the records into an "InMemory"-SQLite-Database, query from there, and done.

Otherwise, the only way i see it possible to sort numeric Data in numerical form would be to leftPad the numerical Strings with Zeros (Hidden Column?)
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #19 on: July 26, 2024, 02:34:51 pm »
Which RFC? I am curious.
That would be RFC 4180 CSV.
Thank you.
Quote
If you look for that in the source-code then you can find that mentioned.
I will search for it in the source.

Quote
fwiw: the dataset itself has all the functionality required to do what you want. E.g. you can create a database using all types of available fielddefs (date, string, integer, float) and the TCVSDataSet component will sort accordingly and as intended.
I will experiment with this and see.
Quote
Where things seem to go wrong imho is the reading of the actual data from the file. IMO It should respect the fielddefs (and not "overwrite" them). Perhaps a separate import method or similar could be a possible solution (perhaps a class helper).
Yes your correct, it is when you read the file data things go sideways. I have never written a class helper. Would you have any example code on how to write a class helper please?
 
Quote
Once loaded into memory it is normal that the data in memory does not change its (storage) type so converting things at runtime is not the road to take as that would not be normal database behaviour (whatever is considered normal).
Alright, understood.

Quote
SQL databases for example can influence this by casting/converting the fields from a query to another type (that is a feature of the SQL language). In such cases if you want to change the type then one can simply change the query to match expectations. CVSDataset however does not have such luxury.
Thank you and understood again!

Quote
If you want more detailed information then I would have to dive even deeper into the sources of CVSDataset but in all honestly I personally find that a bit of a waste of time as it is pretty easy to import a CSV to f.e. a sqlite table though I completely understand your reasoning why you want to improve things (and if possible I can certainly try and see what can be done to help).
You have been very helpful I have no wish to waste any more of your valuable time. I will do this slowly as and when time permits me. Thank you for all your help, advice and guidance.
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #20 on: July 26, 2024, 05:53:18 pm »
I have never written a class helper. Would you have any example code on how to write a class helper please?
// - https://www.freepascal.org/docs-html/ref/refch10.html
// - https://wiki.freepascal.org/Helper_types

so f.e.:
Code: Pascal  [Select][+][-]
  1. uses
  2.   csvdataset;
  3.  
  4. type
  5.   TCVSDSHelper = class helper for TCSVDataSet
  6.     procedure ImportFromFile(aFilename: string);
  7.   end;
  8.  
  9. procedure TCVSDSHelper.ImportFromFile(aFilename: string);
  10. begin
  11.   // your code here
  12. end;
  13.  
  14. var
  15.   DS:  TCVSDataSet;
  16. begin
  17.   DS := TCVSDataSet.Create(nil);
  18.   // do stuff
  19.   DS.ImportFromFile('somefilename');
  20.   // do stuff
  21.   DS.Free;
  22. end.
  23.  

All software is open source (as long as you can read assembler)

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #21 on: July 27, 2024, 12:41:56 pm »
OK, a proof of concept tells us that that a helper is able to make things work behind the scenes but it is very dumb/stupid in that it loads the data alternatively with a stringlist.

The real culprit seems to be located in the packetreader in unit csvdataset (see highlighted line).

Code: Pascal  [Select][+][-]
  1. procedure TCSVDataPacketReader.LoadFieldDefs(var AnAutoIncValue: integer);
  2. Var
  3.   FN : String;
  4.   I : Integer;
  5.  
  6. begin
  7.   FParser:=TCSVParser.Create;
  8.   FParser.Assign(FOptions);
  9.   FParser.SetSource(Stream);
  10.   FCurrentRow:=0;
  11.   ReadNextRow;
  12.   If Assigned(CreateFieldDefs) then
  13.    begin
  14.    if (CreateFieldDefs.Count<>Fline.Count) then
  15.      DatabaseErrorFmt('CSV File Field count (%d) does not match dataset field count (%d).',[Fline.Count,CreateFieldDefs.Count],Dataset.FieldDefs.Dataset);
  16.    If FOptions.FirstLineAsFieldNames then
  17.      For I:=0 to FLine.Count-1 do
  18.        If (CompareText(FLine[i],CreateFieldDefs[i].Name)<>0) then
  19.          DatabaseErrorFmt('CSV File field %d: name "%s" does not match dataset field name "%s".',[I,FLine[i],CreateFieldDefs[i].Name],Dataset.FieldDefs.Dataset);
  20.    Dataset.FieldDefs.Assign(CreateFieldDefs);
  21.    end
  22.   else if (FLine.Count>0) then
  23.     For I:=0 to FLine.Count-1 do
  24.       begin
  25.       If FOptions.FirstLineAsFieldNames then
  26.         FN:=FLine[i]
  27.       else
  28.         FN:=Format('Column%d',[i+1]);
  29.       Dataset.FieldDefs.Add(FN,ftString,Foptions.DefaultFieldLength);  // <<-- hardcoded ftString
  30.       end;
  31.   if FOptions.FirstLineAsFieldNames then
  32.    ReadNextRow;
  33. end;
  34.  
« Last Edit: July 27, 2024, 12:57:00 pm by TRon »
All software is open source (as long as you can read assembler)

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #22 on: July 29, 2024, 09:43:30 pm »
OK, a proof of concept tells us that that a helper is able to make things work behind the scenes but it is very dumb/stupid in that it loads the data alternatively with a stringlist.

The real culprit seems to be located in the packetreader in unit csvdataset (see highlighted line).

Code: Pascal  [Select][+][-]
  1. procedure TCSVDataPacketReader.LoadFieldDefs(var AnAutoIncValue: integer);
  2. Var
  3.   FN : String;
  4.   I : Integer;
  5.  
  6. begin
  7.   FParser:=TCSVParser.Create;
  8.   FParser.Assign(FOptions);
  9.   FParser.SetSource(Stream);
  10.   FCurrentRow:=0;
  11.   ReadNextRow;
  12.   If Assigned(CreateFieldDefs) then
  13.    begin
  14.    if (CreateFieldDefs.Count<>Fline.Count) then
  15.      DatabaseErrorFmt('CSV File Field count (%d) does not match dataset field count (%d).',[Fline.Count,CreateFieldDefs.Count],Dataset.FieldDefs.Dataset);
  16.    If FOptions.FirstLineAsFieldNames then
  17.      For I:=0 to FLine.Count-1 do
  18.        If (CompareText(FLine[i],CreateFieldDefs[i].Name)<>0) then
  19.          DatabaseErrorFmt('CSV File field %d: name "%s" does not match dataset field name "%s".',[I,FLine[i],CreateFieldDefs[i].Name],Dataset.FieldDefs.Dataset);
  20.    Dataset.FieldDefs.Assign(CreateFieldDefs);
  21.    end
  22.   else if (FLine.Count>0) then
  23.     For I:=0 to FLine.Count-1 do
  24.       begin
  25.       If FOptions.FirstLineAsFieldNames then
  26.         FN:=FLine[i]
  27.       else
  28.         FN:=Format('Column%d',[i+1]);
  29.       Dataset.FieldDefs.Add(FN,ftString,Foptions.DefaultFieldLength);  // <<-- hardcoded ftString
  30.       end;
  31.   if FOptions.FirstLineAsFieldNames then
  32.    ReadNextRow;
  33. end;
  34.  

@TRon thank you very much for searching through the source code to find this. Now I have to find a way to change that so it will not convert everything to ftString on file read and data load. How did you highlight that line in yellow?
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #23 on: July 29, 2024, 10:36:35 pm »
@TRon thank you very much for searching through the source code to find this.
You're welcome.

Quote
Now I have to find a way to change that so it will not convert everything to ftString on file read and data load.
a small excerpt from my very simple (and wrong because does not adhere to ISO standard) stringlist loader:
Code: Pascal  [Select][+][-]
  1. var
  2.   Fields : TStringArray;
  3.   Line : string; // e.g. for  line in stringlist do
  4.   ( Self = dataset )
  5. ...
  6.     Fields := Line.Split([',']);
  7.     if Length(Fields) < Self.FieldDefs.Count then
  8.     begin
  9.       writeln('more fielddefs than there are fields in the file');
  10.       break;
  11.     end;
  12.  
  13.     for idx := 0 to Self.FieldDefs.Count-1 do
  14.     begin
  15.       case Self.FieldDefs[idx].DataType of
  16.         ftInteger  : Self.FieldByName(Self.FieldDefs[idx].Name).AsInteger := Fields[idx].ToInteger;
  17.         ftString   : Self.FieldByName(Self.FieldDefs[idx].Name).AsString  := Fields[idx];
  18.         ftDateTime : Self.FieldByName(Self.FieldDefs[idx].Name).AsDateTime := StrToDateTime(Fields[idx]);
  19.         // TODO: add support for other field types, see https://www.freepascal.org/docs-html/fcl/db/tfieldtype.html
  20.       end;
  21.     end;
  22.  

Quote
How did you highlight that line in yellow?
See here at the very bottom of that paragraph.
All software is open source (as long as you can read assembler)

 

TinyPortal © 2005-2018