Recent

Author Topic: How can I sort String Column Numerically  (Read 2957 times)

Aruna

  • Sr. Member
  • ****
  • Posts: 353
How can I sort String Column Numerically
« on: July 21, 2024, 02:37:42 am »
Hello, If you look at the attached screenshot in the first column ( index ) the data displayed is of type string. When I sort the data how can I get it to sort numerically? I am using a CSVDataset, Datasource and DBGrid.
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #1 on: July 21, 2024, 12:19:31 pm »
Isn't the index column recognized as a number (by the fielddefs) ? E.g. is the index column literally stored as a string (I believe quoted for CSVdataset) ?
All software is open source (as long as you can read assembler)

wp

  • Hero Member
  • *****
  • Posts: 12279
Re: How can I sort String Column Numerically
« Reply #2 on: July 21, 2024, 01:25:53 pm »
TCSVDataset only knows strings as fields. Therefore, it is not possible to switch to a non-string sort order. I personally consider TCSVDataset only as a means to bringt CSV files into the "database world", but would not do any serious work with it. There first step, after reading, should be to copy the TCSVDataset over to another more powerful dataset.

Or, do not use a dataset at all and avoid the database overhead. Read the csv file into a standard TStringGrid (LoadFromCSVFile) where you have the event OnCompareCells to control sort order.

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #3 on: July 21, 2024, 02:14:23 pm »
TCSVDataset only knows strings as fields.
Not very fond to argue with someone more knowledgeable on the subject (as I do  not use csvdataset) but are you sure ?

If I create a csvdataset with an index field with a fielddef as integer, add some backwards iterating index values to the table (they are displayed backwards as inserted when showing the values) , then set the IndexFieldNames to index (indicating to sort ascending) and iterate the table then it iterates the index field from lowest to highest value.

I do not know the details wrt loading and storing values to an actual file so that might perhaps work confusing for our different view on the subject.
« Last Edit: July 21, 2024, 02:18:06 pm by TRon »
All software is open source (as long as you can read assembler)

wp

  • Hero Member
  • *****
  • Posts: 12279
Re: How can I sort String Column Numerically
« Reply #4 on: July 21, 2024, 03:33:39 pm »
TCSVDataset only knows strings as fields.
Not very fond to argue with someone more knowledgeable on the subject (as I do  not use csvdataset) but are you sure ?
100% sure when the header line in the CSV file is used to define the fields (CSVDataset.FirstLineAsFieldNames = true) because then there's nothing which tells the dataset which fields should be numeric, for example.
When the FirstLineAsFieldNames is false, the fields must be defined before loading the CSV File (FieldDefs), but I noticed then when I define a numeric field as ftInteger, it simply is not displayed in a DBGrid. All the simple dataset types of Lazarus (TCSVDataset, TFixDataset, TSdfDataset) are not usable for anything else but strings.

As an alternative you could use a TsWorksheetDataset (in the fpspreadsheet package) which is able to read CSV files (besides "real" spreadsheet" files), but has been made to support the most important field types, including some kind of auto-fieldtype-detection.

Zvoni

  • Hero Member
  • *****
  • Posts: 2614
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #6 on: July 23, 2024, 09:19:31 pm »
Isn't the index column recognized as a number (by the fielddefs) ? E.g. is the index column literally stored as a string (I believe quoted for CSVdataset) ?
This is the default behavior I have seen. By default, if you do not specify a FieldDef for a specific column in your CSV file, TCSVDataSet will recognize that field as a string (ftString). This default behavior is because CSV files store all data as text, and TCSVDataSet reads and interprets each field accordingly.

To explicitly define how fields are recognized and treated by TCSVDataSet, you typically use FieldDefs. As you can see below we can have a ftInteger type field. I am trying to see if this will sort numerically.
Code: Pascal  [Select][+][-]
  1. Common Data Types (TFieldType):
  2.     ftString: Represents string data.
  3.     ftInteger: Represents integer (whole number) data.
  4.     ftFloat: Represents floating-point (decimal) data.
  5.     ftDate: Represents date data.
  6.     ftBoolean: Represents boolean (true/false) data.
  7.     ftDateTime: Represents date and time data
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #7 on: July 23, 2024, 09:42:16 pm »
Does TCSVDataset support schema.ini?
https://learn.microsoft.com/en-us/previous-versions/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-ver16&redirectedfrom=MSDN
No, the TCSVDataSet component in Lazarus (from the CSVDocument package) does not natively support a schema.ini file
for defining field types, formats, or other metadata related to CSV files.

Understanding schema.ini:
In Windows environments, particularly with Microsoft products like Access and Excel, a schema.ini file is used to define the structure of CSV files.
It specifies information such as field names, data types, delimiters, and more. However, this file format and its usage are specific to Windows and Microsoft products.

TCSVDataSet in Lazarus:
Field Definitions: In Lazarus, you define the structure of CSV files using FieldDefs in the TCSVDataSet component.
This approach allows you to specify the field names and their data types programmatically within your Lazarus application.

Alternatives:
If you need to specify CSV file structure externally (similar to schema.ini), you would typically need to implement custom logic to read
such metadata from another file format (like JSON or XML) or from a database. This involves parsing the metadata file and dynamically
setting up FieldDefs in TCSVDataSet accordingly.

Conclusion:
While TCSVDataSet in Lazarus does not support schema.ini files directly, you can achieve similar functionality by programmatically defining FieldDefs within your Lazarus application. This approach gives you flexibility in defining and manipulating CSV file structures based on your specific requirements and use cases.

All the above was generated by chatGPT.  :)
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #8 on: July 23, 2024, 09:51:57 pm »
TCSVDataset only knows strings as fields. Therefore, it is not possible to switch to a non-string sort order. I personally consider TCSVDataset only as a means to bringt CSV files into the "database world", but would not do any serious work with it. There first step, after reading, should be to copy the TCSVDataset over to another more powerful dataset.

Or, do not use a dataset at all and avoid the database overhead. Read the csv file into a standard TStringGrid (LoadFromCSVFile) where you have the event OnCompareCells to control sort order.
The reason I want to fully understand TCVSDataset is because I am thinking of putting together a few demo applications for each component. We do not have any demos that are beginner-friendly or demonstrate clearly, the brick walls one can hit or the weird errors that make no sense until someone more experienced shows you how and why?

I was thinking maybe have code that actually cause and demonstrates errors, then explain what causes them and show how to fix/resolve them? The end-user will have a much faster learning curve than what is available now?  (I could be wrong.)
 
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

wp

  • Hero Member
  • *****
  • Posts: 12279
Re: How can I sort String Column Numerically
« Reply #9 on: July 23, 2024, 11:27:52 pm »
All the above was generated by chatGPT.  :)
As usual, ChatGPT writes a nice story, but it is not the truth...

The attached demo creates a TCSVDataset with two fields, an integer field and a string field, by using the FieldDefs. When you run the demo, the DBGrid attached to the dataset is empty, since the data file does not yet exist, the grid at the right shows that these fields are created correctly. Fine.

Now enter some data into the DBGrid, an integer in the first column and a string in the second column, and save and close. Run the demo a second time. The entered values are loaded from the CSVFile, because there is a data file now, but when you look at the grid at the right you'll see that the integer field has transformed to a string field, ignoring the field type at creation.

Probably, this is a bug and should be reported (CSVDataset should respect previously created fielddefs when loading a data file). But it demonstrates that ChatGPT is talking nonsense.

« Last Edit: July 24, 2024, 12:03:42 am by wp »

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #10 on: July 24, 2024, 02:51:49 pm »
All the above was generated by chatGPT.  :)
Quote
As usual, ChatGPT writes a nice story, but it is not the truth...
I just use it to get an idea of what is out there in the public domain.
And most times it has pointed me in the right direction. Ultimately one has to carefully double-check and test that the code it serves up is accurate and error-free.
Quote
Quote
The attached demo creates a TCSVDataset with two fields, an integer field and a string field, by using the FieldDefs. When you run the demo, the DBGrid attached to the dataset is empty, since the data file does not yet exist, the grid at the right shows that these fields are created correctly. Fine.
Thank you, very nice. I plugged in a routine that sorts the column when a user clicks the DBGrid column title and the first time things work as expected. The ftInteger column respects and behaves as a integer.
But the second time around like you pointed out all the columns in the csv file are loaded as ftString.

Quote
Quote
Now enter some data into the DBGrid, an integer in the first column and a string in the second column, and save and close. Run the demo a second time. The entered values are loaded from the CSVFile, because there is a data file now, but when you look at the grid at the right you'll see that the integer field has transformed to a string field, ignoring the field type at creation.
Yes I see what you mean. I tried to change the field type at runtime back to ftInteger but it does not seem to work. IS it possibe to change a ftstring column to ftInteger at run time?
Quote
Quote
Probably, this is a bug and should be reported (CSVDataset should respect previously created fielddefs when loading a data file).
I want to do some more testing before I file a bug report.
Quote
Quote
But it demonstrates that ChatGPT is talking nonsense.
For me it is simply a tool to be used with caution and when all else fails I allow common sense to prevail. A couple of day sago I asked chatGPT to give me source code for a complete php CRUD application. And guess what? It did exactly that and error free. I was quite stunned and also quite scared afterwords at the possibilities in future. They will not need people like us anymore which will be very sad  :'(

So is there a way to change a CSVDataset column field type at run time from ftString to ftInteger please?
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #11 on: July 24, 2024, 03:57:27 pm »
So is there a way to change a CSVDataset column field type at run time from ftString to ftInteger please?
Sorry for not getting back to you sooner regarding this topic Aruna.

The (short) answer is unfortunately no.

The long answer is, yes you can change the type of the field at runtime but unfortunately that doesn't (automatically) convert the actual contents that was loaded to the correctly typed field-value.

Like wp I am leaning towards believing this being a bug.

When you define the fielddefs and then open/activate a dataset all the definitions are gone (re-init from the stored data from file). However, TCSVDataset conveniently has another method named LoadCSVFromFile that does leave the definitions intact when loading the data. Alas in that case the data (also) does not get automatically converted to the correct/required fieldtype when being loaded (which I would have expected it should).

I currently do not have the time to delve deeper into the reason(s) as of why.
All software is open source (as long as you can read assembler)

Aruna

  • Sr. Member
  • ****
  • Posts: 353
Re: How can I sort String Column Numerically
« Reply #12 on: July 24, 2024, 04:28:34 pm »
Quote
So is there a way to change a CSVDataset column field type at run time from ftString to ftInteger please?
Sorry for not getting back to you sooner regarding this topic Aruna.
Hey TRon, absolutely no worries. There is no rush or urgency. I still want to see if we can get the CSVDataset to recognize and respect the field type defined per column. 

Quote
The (short) answer is unfortunately no.

The long answer is, yes you can change the type of the field at runtime but unfortunately that doesn't (automatically) convert the actual contents that was loaded to the correctly typed field-value.
Alright understood and thank you. So I found this LoadFromCSVFile but I have no idea how to go about trying to implement the required behaviour ( ..yet) I am thinking of going through the source file lcsvutils.pas line 25 and see if I can make heads or tails from the code.

Quote
Like wp I am leaning towards believing this being a bug.
Fair enough. So if I want to fix this bug, how do I go about doing that please?
[
Quote
When you define the fielddefs and then open/activate a dataset all the definitions are gone (re-init from the stored data from file). However, TCSVDataset conveniently has another method named LoadCSVFromFile that does leave the definitions intact when loading the data. Alas in that case the data (also) does not get automatically converted to the correct/required fieldtype when being loaded (which I would have expected it should).
Thank you again for the explanantion. I am wondering how difficult it will be if I was to go about trying to get CSVDataset to automatically convert all fieldtypes to teh correct/required type?

Quote
I currently do not have the time to delve deeper into the reason(s) as of why.
No worries and thank you very much for your time spent on this. Very much appreciated.
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

wp

  • Hero Member
  • *****
  • Posts: 12279
Re: How can I sort String Column Numerically
« Reply #13 on: July 24, 2024, 06:23:34 pm »
However, TCSVDataset conveniently has another method named LoadCSVFromFile that does leave the definitions intact when loading the data.
Yes, but it does not read the non-string fields. In the attachment (csv_fielddefs_3) there is a modified version of the previous demo: At first it should be noted that using LoadCSVFromFile the FileName field must be empty because this locks the file for opening with LoadCSVFromFile. Again, the demo creates FieldDefs with an integer and a string field and opens the dataset; the DBGrid is empty initially as there are not data yet. The field types are displayed correctly, however. Now type in some data and exit the application - because there is no FileName, the records must be saved manually by calling SaveToCSVFile in the OnDestroy handler of the form. Open the generated data file ("data.csv") in an external editor and convince yourself that the data are stored correctly. Close the editor, and run the application again. Now the data file exists and is read via LoadFromCSVFile. The data types are still correct, but you'll notice that the integer column is empty now!

As I said before: The TCSVDataset is useful only with string fields.

If you want to read non-string fields from a CSV file I would use a different type of dataset, maybe TDbf or TBufDataset. In the attachment (csv_fielddefs_4) you find a small application how to do this. The essential part is that you must write a reader for the CSV data yourself, but this is easy - the code shows a way using classic Pascal file access. In case of more complex CSV files (e.g. line breaks in the text fields) I'd recommend to use the TCSVDocument.

TRon

  • Hero Member
  • *****
  • Posts: 3129
Re: How can I sort String Column Numerically
« Reply #14 on: July 25, 2024, 12:54:57 am »
@wp:
I understand your reply with alternative solutions (very much appreciated btw) but I have a hunch that the objective of Aruna is not to work around the issue rather to grasp the current situation wrt TCVSDataSet (and see if it perhaps can be improved).

I am completely on par with your view that TCVSDataSet has its limitations (hence why I never use it myself) not in the least because it committed itself to a certain RFC.
« Last Edit: July 25, 2024, 12:57:27 am by TRon »
All software is open source (as long as you can read assembler)

 

TinyPortal © 2005-2018