Author Topic: Taazz nice work on SDFDataset - and my nasty test code  (Read 26742 times)

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #15 on: September 19, 2012, 03:32:22 pm »
New tests:
- removed some tests with quotes added by taazz that tested csv instead of sdf
- use fixed directory name for output instead of asking with inputbox; although this loses functionality, it allows easier integration into fpc's dbtestframework tests
- removed duplicate tests in tcsdfdata.pp and testsdfmultiline.pas from multiline. tcsdfdata will result in a patch for the dbtestframework test
- added write+read test to check that fields saved to disk are read back with the same content
- changed build modes so that default build mode does not create debug files. The benefit: you can run the tests from within the ide without breaking on assertions/exceptions. If you want to debug the tests, you can run in debug mode
- used the exact test data from bug 19610 to make it clearer what is expected.
- edit: split out corner case as promised, uploaded newtests2.zip

Discussion: I think sdfdataset should read and write sdf data created with strictdelimiter=false, because strictdelimiter is false by default and there is no option in the existing sdfdataset to change strictdelimiter.

In this case, test TestDelphiCompat_DelimiterTrue is not relevant and should be commented out.
(But I think it should be retained as comments in case the situation changes).

Thanks,
BigChimp
« Last Edit: September 19, 2012, 03:40:33 pm by BigChimp »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

taazz

  • Hero Member
  • *****
  • Posts: 5368
« Reply #16 on: September 19, 2012, 03:50:33 pm »
taazz: yes, if you want to write a csv dataset, be my guest, but sdf is different than csv (sigh).

In my opinion, the sdf dataformat is just plain )$*(%#()*$% ugly. However, interoperability with Delphi is therefore (I think) about the only reason anybody would want this abomination.
I would definitely try to support everything a normal Delphi app would spit out - suggested: with strictdelimiter:=false, as that is the default.
Unfortunately, because of lack of a better alternative, loads of people (including me in the beginning) insisted on using sdfdataset to load their csv files, which often works but breaks horribly on boundary conditions where
- sdf specs and/or
- the Delphi implementation which deviates a bit from the spec and/or
- the FPC implementation, which differs rather more from the spec, see bug 19610
differ from what any sane csv format would be.

Any link for the SDF Specs? See at the bottom what I found.


Looking at the Delphi output for bug report 19610:

Code:
normal_string;quoted_string;"quoted;delimiter";quoted and space;"""quoted_and_starting_quote";"""quoted, starting quote, and space";quoted_with_tab character;quoted_multi
line;  UnquotedSpacesInfront;UnquotedSpacesAtTheEnd   ;"  ""Spaces before quoted string""";Spaces after quoted string;   ;

The delimiter is ';' on this record?

gives:
(The numbers below indicate field number)
Code:
Resulting elements with strictdelimiter false:
0normal_string
1quoted_string
2quoted;delimiter
3quoted and space
4"quoted_and_starting_quote
5"quoted, starting quote, and space
6quoted_with_tab character
7quoted_multi
line
8UnquotedSpacesInfront
9UnquotedSpacesAtTheEnd
10Spaces before quoted string
11Spaces after quoted string
12

Well, perhaps supporting the spaces after quoted string thing is too much.

No, that's the easy part. Understanding the sdf specs without reading them is a bit tedious ;)


I'm almost done cleaning up the test cases to closely match the Delphi test program in 19610.
I'll separate out the Spaces after quoted string case, and remove some of your added quote tests.

I don't mind the spaces. I do want to know the quote character used, and the data values both in memory (i.e. how the program sees them) and on file, so I can understand the problem clearly.

From what I have seen so far, by changing the delimiter to ';' all those lines should work as is with the current implementation.


Understand and agree with your further changes, but I think those may actually be better done in a CSV dataset.

I would really like to see an RFC 4180 compliant CSV dataset and I would *strongly* suggest you take a look at combining csvdocument (see the wiki), as its csvparser beautifully supports all the intricacies of RFC 4180, as well as Excel mode etc.
This means we don't need to implement a parser of our own.

I really don't like SDF/CSV datasets. I would never use one. They are a memory hog unless they are used for only a few hundred records, and their performance is terrible because they use strings as the buffer and constantly convert from and to string on each operation.

I haven't looked at csvDocument myself, but when it comes to csv I prefer a record parser that I can use to import the data into a more robust database, be it access/mysql/firebird or any other database engine that does not require me to load all the data in memory to work with.

The only true use of csv files is to exchange data (import and export), and in some extreme cases where the import might take a while, e.g. a dts service with heavy calculations before or during inserts. I had to work with csv documents of 800M to 1.4G each; just loading those into memory (no import, mind you) would take 3x the time it took to import them.


The rest of the dataset support could be built on this, e.g. by using memds or bufdataset or possibly ripping out the sdfdataset code (which I have my doubts about but by now you know much more about it than I).

Writing out the csv to file should once again be easy as csvdocument has a class for that as well.

I'll polish up the test cases for sdfdataset and post them...

Awaiting with interest to hear your opinion!
Thanks,
BigChimp

I'm here to make the sdf dataset work as expected. I'm not going to use it myself at all, so if no one else is going to use it either, I think I will just implement the Delphi compatibility so everyone that uses it can still do so, and leave it at that.

I'm against any kind of memory dataset for anything more than a glorified memory container for extremely complex forms, e.g. booking tickets on a ship with various port stops in between and travel planning with linked ships, flights, trains etc., and those only for performance reasons. This means no data editing, no data inserting, no deleting except in extreme cases.

While bufDataset is nice to have and helps a lot with the SQLDB framework, it needs to be as slim and fast as possible to avoid bottlenecks. I wouldn't build an un-cached dataset on top of it.


Any chance of getting my hands on Delphi documentation for sdf? The following link seems to indicate that an extra file is required with the header information about the file.
http://www.delphigroups.info/2/76/115068.html

Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

BigChimp

« Reply #17 on: September 19, 2012, 04:15:24 pm »
Any link for the SDF Specs? See at the bottom what I found.

bug 19610 has a link to the relevant Delphi help file entry:
http://docwiki.embarcadero.com/VCL/en/Classes.TStrings.DelimitedText,
and del4.zip attachment in 19610 has test code that shows how Delphi (Turbo Delphi, Delphi XE) really handles sdf.

No, that's the easy part. Understanding the sdf specs without reading them is a bit tedious ;)
[..]
I don't mind the spaces. I do want to know the quote character used, and the data values both in memory (i.e. how the program sees them) and on file, so I can understand the problem clearly.

From what I have seen so far, by changing the delimiter to ';' all those lines should work as is with the current implementation.
See above for specs + test program.


I really don't like SDF/CSV datasets. I would never use one. They are a memory hog unless they are used for only a few hundred records, and their performance is terrible because they use strings as the buffer and constantly convert from and to string on each operation.

I haven't looked at csvDocument myself, but when it comes to csv I prefer a record parser that I can use to import the data into a more robust database, be it access/mysql/firebird or any other database engine that does not require me to load all the data in memory to work with.
Sorry, I may not have been clear: CSVDOCUMENT HAS SUCH A PARSER  >:( >:( >:(
Please reread my previous emails, or better, look at tcsvparser in tcsvdocument.pas (if I remember the names correctly).
(Sorry, but it seems like I'm forever repeating to people that sdf is not csv, that you can use the parser or the in-memory csvdocument functionality in csvdocument etc.)
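
As a rough illustration of the cell-by-cell parsing referred to here, a minimal sketch follows. It assumes the TCSVParser class as found in unit csvreadwrite of recent FPC releases; the CsvDocument package of the time may use a different unit name, so treat the unit and the sample data as assumptions to check against your own installation.

```pascal
program csvparserdemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  csvreadwrite; // assumption: unit name in recent FPC; older CsvDocument packages differ

var
  Parser: TCSVParser;
  Source: TStringStream;
begin
  // Made-up sample: two records, one quoted field containing the delimiter
  Source := TStringStream.Create('1;"Smith; John";1980-01-31' + LineEnding +
                                 '2;Jones;1975-12-01' + LineEnding);
  Parser := TCSVParser.Create;
  try
    Parser.Delimiter := ';';
    Parser.SetSource(Source);
    // The parser advances one cell per call, so each value can be handed
    // to a database insert as soon as it has been read.
    while Parser.ParseNextCell do
      WriteLn('row ', Parser.CurrentRow, ', col ', Parser.CurrentCol,
              ': [', Parser.CurrentCellText, ']');
  finally
    Parser.Free;
    Source.Free;
  end;
end.
```

For a large file, a TFileStream could be passed to SetSource in place of the string stream, so the whole file never needs to be held in memory.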

I'm here to make the sdf dataset work as expected. I'm not going to use it myself at all, so if no one else is going to use it either, I think I will just implement the Delphi compatibility so everyone that uses it can still do so, and leave it at that.
Sounds good.

Any chance of getting my hands on Delphi documentation for sdf? The following link seems to indicate that an extra file is required with the header information about the file.
http://www.delphigroups.info/2/76/115068.html
No idea what that is, perhaps some ancient format Delphi used. The docs I linked to above don't mention those schema files.

taazz

« Reply #18 on: September 19, 2012, 04:16:45 pm »
Discussion: I think sdfdataset should read and write sdf data created with strictdelimiter=false, because strictdelimiter is false by default and there is no option in the existing sdfdataset to change strictdelimiter.

It doesn't matter: strictdelimiter is not used at all. tstringlist does not format anything; everything is controlled from the sdfdataset.
Even if tstringlist were responsible for the formatting, you couldn't use it to break up the field values with strictdelimiter = false, because it will create a new line for each space it encounters.
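
The space-splitting effect can be tried with a small sketch like the one below. It only uses TStringList.DelimitedText with the Delimiter, QuoteChar and StrictDelimiter properties; the sample line is made up. Delphi documents that with StrictDelimiter = False unquoted spaces also separate items. No expected output is shown, since this thread notes that FPC's results differed from Delphi's at the time (bug 19610).

```pascal
program strictdelimdemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Split one line with TStringList and print the resulting items.
procedure ShowFields(const Line: string; Strict: Boolean);
var
  Fields: TStringList;
  i: Integer;
begin
  Fields := TStringList.Create;
  try
    Fields.Delimiter := ';';
    Fields.QuoteChar := '"';
    Fields.StrictDelimiter := Strict; // set before assigning DelimitedText
    Fields.DelimitedText := Line;
    WriteLn('StrictDelimiter=', Strict, ': ', Fields.Count, ' item(s)');
    for i := 0 to Fields.Count - 1 do
      WriteLn('  ', i, ' [', Fields[i], ']');
  finally
    Fields.Free;
  end;
end;

const
  // Made-up sample: the last field contains an unquoted space
  Sample = 'normal_string;"quoted;delimiter";two words';
begin
  ShowFields(Sample, False);
  ShowFields(Sample, True);
end.
```

Comparing the two item counts shows whether the unquoted space in the last field was treated as a separator.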

BigChimp

« Reply #19 on: September 19, 2012, 04:27:35 pm »
It doesn't matter: strictdelimiter is not used at all. tstringlist does not format anything; everything is controlled from the sdfdataset.
Even if tstringlist were responsible for the formatting, you couldn't use it to break up the field values with strictdelimiter = false, because it will create a new line for each space it encounters.
Sorry, my scenario:
Delphi application writes data to sdf file using multiple tstringlist.delimitedtext
FPC application reads them using sdfdataset.
Does your reasoning still apply?

Writing out spaces in fields with strictdelimiter off does not generate line endings AFAIR? See the test results in del4.zip

BigChimp

« Reply #20 on: September 19, 2012, 04:31:42 pm »
I haven't looked at csvDocument myself, but when it comes to csv I prefer a record parser that I can use to import the data into a more robust database, be it access/mysql/firebird or any other database engine that does not require me to load all the data in memory to work with.

The only true use of csv files is to exchange data (import and export), and in some extreme cases where the import might take a while, e.g. a dts service with heavy calculations before or during inserts. I had to work with csv documents of 800M to 1.4G each; just loading those into memory (no import, mind you) would take 3x the time it took to import them.
Let me leave you with some thoughts:
1. sqlquery uses bufdataset, right?
2. if you implement bufdataset.loadfromcsv...
3. <fiddling with datatypes>
4. set the SELECT/INSERT SQL statement
5. mark the query dirty or whatever it's called in FPC speak, then apply updates... voila csv import into your database, right?
Of course, as you mention, memory limits as things are buffered in memory.
For smaller sets, quite easy perhaps.

Seem to vaguely remember there is a sqlquery.savetofile/loadfromfile or something functionality... ludob posted about this regarding offline support for datasets. CSV support could tie in there

However, of course you can also roll your own routines etc.

Ok, off for a while.

taazz

« Reply #21 on: September 19, 2012, 04:34:50 pm »
Sorry, my scenario:
Delphi application writes data to sdf file using multiple tstringlist.delimitedtext
FPC application reads them using sdfdataset.
Does your reasoning still apply?

I don't know; I have to check del4.zip to see. Does it have a file included? (Downloading as I type this.)

Writing out spaces in fields with strictdelimiter off does not generate line endings AFAIR? See the test results in del4.zip

No, writing out spaces does not create line endings; reading them in without quotes and with strict = false does, in Delphi too.

EDIT:
The files included seem to have delimiter = ';', not ','. Have you tried the existing tests with the delimiter set to ';' before opening the file?
Testing as we speak.

EDIT2:

There is a bug in sdfdata.pp at line 1258:
Code:
    // Original: the result of AnsiQuotedStr is discarded, so Str is never quoted
    // AnsiQuotedStr(Str, FFieldQuote);
    // Fix: assign the quoted result back to Str
    Str := AnsiQuotedStr(Str, FFieldQuote);
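
Why the original line has no effect can be checked in isolation. The sketch below is a standalone program, not part of sdfdata.pp: AnsiQuotedStr from SysUtils takes its argument by const and returns the quoted copy, so calling it as a statement leaves the variable unchanged. The sample strings are made up.

```pascal
program quotedstrdemo;
{$mode objfpc}{$H+}
{$assertions on}

uses
  SysUtils;

var
  Str: string;
begin
  Str := 'quoted;delimiter';

  // Result discarded: Str itself is left untouched
  AnsiQuotedStr(Str, '"');
  Assert(Str = 'quoted;delimiter');

  // Result assigned: Str is now wrapped in quote characters
  Str := AnsiQuotedStr(Str, '"');
  Assert(Str = '"quoted;delimiter"');

  // Quote characters inside the value are doubled
  Assert(AnsiQuotedStr('say "hi"', '"') = '"say ""hi"""');

  WriteLn(Str);
end.
```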

Also, the sdfdataset has a peculiar way of determining the field size, and once the dataset is opened the size cannot be changed. So change lines 614..616 of the setup procedure to:

Code:
  TestDataset.Schema.Add('ID=20');
  TestDataset.Schema.Add('NAME=100');
  TestDataset.Schema.Add('BIRTHDAY=30');
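
For context, a minimal sketch of how such schema entries are used when reading a file. It assumes the stock TSdfDataSet from unit sdfdata with its FileName, Delimiter, FirstLineAsSchema and Schema properties; the file name and the sample records are invented for illustration.

```pascal
program sdfschemademo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, db, sdfdata;

const
  DataFile = 'schema_demo.csv'; // hypothetical scratch file

var
  Lines: TStringList;
  DS: TSdfDataSet;
begin
  // Write two made-up records without a header line
  Lines := TStringList.Create;
  try
    Lines.Add('1;Alice;1980-01-31');
    Lines.Add('2;A name that needs a generous field size;1975-12-01');
    Lines.SaveToFile(DataFile);
  finally
    Lines.Free;
  end;

  DS := TSdfDataSet.Create(nil);
  try
    DS.FileName := DataFile;
    DS.Delimiter := ';';
    DS.FirstLineAsSchema := False;
    // Each entry is FIELDNAME=SIZE; sizes are fixed once the dataset is open
    DS.Schema.Add('ID=20');
    DS.Schema.Add('NAME=100');
    DS.Schema.Add('BIRTHDAY=30');
    DS.Open;
    while not DS.EOF do
    begin
      WriteLn(DS.FieldByName('ID').AsString, ' | ',
              DS.FieldByName('NAME').AsString, ' | ',
              DS.FieldByName('BIRTHDAY').AsString);
      DS.Next;
    end;
    DS.Close;
  finally
    DS.Free;
  end;

  DeleteFile(DataFile);
end.
```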
« Last Edit: September 19, 2012, 05:51:50 pm by taazz »

BigChimp

« Reply #22 on: September 20, 2012, 07:12:57 am »
EDIT2:

There is a bug in sdfdata.pp at line 1258:
Code:
    // Original: the result of AnsiQuotedStr is discarded, so Str is never quoted
    // AnsiQuotedStr(Str, FFieldQuote);
    // Fix: assign the quoted result back to Str
    Str := AnsiQuotedStr(Str, FFieldQuote);
Yep, that seems nasty. I'll update it in my copy here and wait for more changes ;)

Also, the sdfdataset has a peculiar way of determining the field size, and once the dataset is opened the size cannot be changed. So change lines 614..616 of the setup procedure to:

Code:
  TestDataset.Schema.Add('ID=20');
  TestDataset.Schema.Add('NAME=100');
  TestDataset.Schema.Add('BIRTHDAY=30');
Ok.

Latest version of my files can always be downloaded from
https://bitbucket.org/reiniero/fpc_laz_patch_playground/src
directory sdfdataset

Thanks,
BigChimp
« Last Edit: September 20, 2012, 07:18:51 am by BigChimp »

BigChimp

« Reply #23 on: September 20, 2012, 10:45:25 am »
Sorry, correction to test code - had overlooked some spaces:
sdfdataset test: correction: corner case strict delimiter true has extra spaces in the first empty line
Committed to repository

Also removed some ;s at the end of test cases (which would lead to creation of a new field) but that has no influence on the tests as they were.

ATM writing compiler test cases for tstringlist.delimitedtext to make sure fpc has the right test cases to work against in fixing .delimitedtext/sdf support.
Checking test case output with Turbo Delphi 2006.
Test case (work in progress) can be found in tw19610.pp in repository (root directory)
Edit: got it working; all tests run on Turbo Delphi 2006; opening new thread for feedback requests.
« Last Edit: September 20, 2012, 12:49:48 pm by BigChimp »

taazz

« Reply #24 on: September 20, 2012, 06:59:07 pm »
A small update.

I'm working on the spaces (leading and trailing) problem at the moment. I'm trying various methods to keep them, without much success so far. I'll try to post something new at the weekend; no promises though.

taazz

« Reply #25 on: September 22, 2012, 11:05:59 pm »
Please find attached the latest version.

In this version I have rewritten the field parser.
It now tries to guess if there is a quoted value in a field, and if it finds one it removes the quotes, preserving all the data outside the quotes (before and after).
Currently there is no way to get back only the data inside the quotes, unless the data outside the quotes is only spaces and you have selected to trim them.

Changed the record initialization process to fill the empty space with #1 instead of #32 (' '), making it easier to distinguish between empty record space and spaces that are part of the field's data.
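
The idea behind the #1 filler can be shown with a small standalone sketch. This is not the code from the attached patch; the field width and the helper names StoreField and LoadField are hypothetical. It only demonstrates why a non-space filler lets trailing spaces in the data survive a round trip through a fixed-size buffer.

```pascal
program fillerdemo;
{$mode objfpc}{$H+}
{$assertions on}

const
  FieldSize = 12;  // hypothetical fixed field width
  Filler    = #1;  // marks unused buffer space instead of #32 (' ')

// Pad (or truncate) a value to the fixed field width.
function StoreField(const Value: string): string;
begin
  Result := Copy(Value, 1, FieldSize);
  Result := Result + StringOfChar(Filler, FieldSize - Length(Result));
end;

// Strip only the trailing filler; spaces that belong to the data are kept.
function LoadField(const Buffer: string): string;
var
  Len: Integer;
begin
  Len := Length(Buffer);
  while (Len > 0) and (Buffer[Len] = Filler) do
    Dec(Len);
  Result := Copy(Buffer, 1, Len);
end;

begin
  Assert(Length(StoreField('abc  ')) = FieldSize);
  Assert(LoadField(StoreField('abc  ')) = 'abc  '); // trailing spaces survive
  Assert(LoadField(StoreField('   ')) = '   ');     // an all-space value survives
  Assert(LoadField(StoreField('')) = '');           // an empty field stays empty
  WriteLn('[', LoadField(StoreField('  padded  ')), ']');
end.
```

With #32 as the filler, the second and third checks could not hold, because padding and data would be indistinguishable.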

Introduced 2 new properties TrimTrailingSpaces & TrimLeadingSpaces. The default behavior is not to trim anything.

The current implementation preserves all the data in a file, allowing the end user to delete or trim whatever he wants to.

I didn't have time yet to change how a field's value is quoted when saving; as far as I can see, a value is quoted if any non-standard characters are found in it.

Regards
Jo.

PS.
BC, do you think this is good enough?

BigChimp

« Reply #26 on: September 23, 2012, 05:24:27 pm »
Sorry, Jo, I've been busy. I might have a look at it today or in a day or two.

Have you run the test suite (https://bitbucket.org/reiniero/fpc_laz_patch_playground/src, sdfdataset,testgui.lpi) against your changes?
I think that would be my yardstick to see whether it has improved....

Will get back to you...

BigChimp

« Reply #27 on: September 24, 2012, 07:15:09 am »
Uploaded tests zip for convenience:
https://bitbucket.org/reiniero/fpc_laz_patch_playground/downloads/sdfdataset_tests_24Sept2012.zip
New tests as always are very welcome. Please see the readme.txt for details.

Notes:
1. Noticed gui testrunner Laz trunk+FPC trunk just hung on test run. Added some try..except around writelns that seemed to solve things.


Test results:
1. trunk sdfdata: 4 failures:
TestDelphiCompat_DelimiterFalse
TestDelphiCompat_DelimiterFalseCornerCases
TestDelphiCompat_DelimiterTrue
TestDelphiCompat_DelimiterTrueCornerCases

2. taazz sdfdata: 3 failures:
TestDelphiCompat_DelimiterFalse now works.
The failure in TestDelphiCompat_DelimiterTrue is on a different string than the trunk failure

Delimiter false is the default mode, so that is a big win.

Result XML file and the more useful console output lines with detailed differences are attached to the tests download mentioned above.


Remarks:
1. taazz, perhaps renaming maxintlength etc to a clearer name (SDFDataMaxInt or something) may be a good idea? Otherwise people may perhaps mistake it for a RTL defined constant?
2. TestDelphiCompat_DelimiterFalseCornerCases gives data containing "TRUE" now!?! See console output file. Perhaps you need to fiddle with TrimLeadingSpaces for TestDelphiCompat_DelimiterFalseCornerCases, TestDelphiCompat_DelimiterTrueCornerCases. Could that help?

In general, I'm not worried very much about the cornercases tests as those are just too weird. It would be nice if it could work for delimitertrue as well, but I suppose the code as is is already a major improvement, because delimiter=false is the default in Delphi.

Finally, it's not up to me but to the developers whether they'll commit it... but having a documented set of improvements on test cases is a very good way of proving our point that a patch makes sense.

Thanks,
BigChimp
« Last Edit: September 24, 2012, 05:18:32 pm by BigChimp »

BigChimp

« Reply #28 on: September 24, 2012, 05:56:28 pm »
@taazz: mailing list thread on Delphi compatibility
http://lists.freepascal.org/lists/fpc-pascal/2012-September/034957.html
... apparently it's not necessary!?!?

I'll wait for confirmation. If so, I'll remove the test code. I only wanted to improve sdfdata so it can better interoperate with sdf data. If that is not required, I don't see the need for the test code.

Thanks,
BigChimp

taazz

« Reply #29 on: September 24, 2012, 06:43:20 pm »
Hi BC,

First I would like to thank you for all your support. Your test cases were impressive. This thing started as a simple task (multiline support for the records) and ended with rewriting the record parser to support a lot more. This is the final enhancement to the parsing that I'm going to do.

It now passes all the tests except the one called TestDelphiCompat_DelimiterTrueCornerCases. Although it can parse the 1st field correctly, it will never parse the rest as Delphi does. After the changes I've made, I made sure that the space character is not considered a break character anywhere; spaces can be trimmed if the user selects to, but that's about it.

Find attached the latest changes. In this one I have introduced a new property, AlwaysDeQuote. When true, the parser will remove the quotes regardless of their position in the string; when false, it will remove them only if the string starts with a quote, which is the default behavior for CSV files. The quote remover is smart enough not to lose any data when parsing, and to unquote multiple quoted values in the string correctly.

I don't think that I should try to support the last case; it will never be used and I don't think it is desirable either.
