Recent

Author Topic: Stringgrid vs. LoadFromCSVFile  (Read 51836 times)

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Stringgrid vs. LoadFromCSVFile
« Reply #15 on: June 04, 2012, 03:28:37 pm »
Ah: perhaps the patch in 19610 can help. Been a while, I forgot the details. Will test it.
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #16 on: June 04, 2012, 09:57:31 pm »
I was correct then in thinking SetDelimitedText was broke...

We may want to use DelimitedText when again when this patch has been incorporated in next stable fpc version.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #17 on: June 08, 2012, 08:57:15 am »
Consider this data:

Code: [Select]
   test   ,   "a quote"  ,normal_string

Note the leading and trailing spaces in the first two fields.

How should we treat them?
  • Trim leading and trailing spaces (current behaviour)
  • Preserve spaces
  • Reject data as invalid

CSV data IMO means that only Delimiter (unless in a quoted field) should be taken a field separator. So the spaces would have to be part of the data, although they do not comply with rfc 4180.


Root cause of this problem was/is faulty implementation of TStrings.GetdelimitedText and TStrings.SetdelimitedText, see issue 19610 in the bugtracker.

I tend to re-implement this using the fixes suggested in that issue.

Bart

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Stringgrid vs. LoadFromCSVFile
« Reply #18 on: June 08, 2012, 10:30:10 am »
So the spaces would have to be part of the data, although they do not comply with rfc 4180.
Where do you see this in rfc 4180?
The ABNF grammar says, http://tools.ietf.org/html/rfc4180 
Quote
Code: [Select]
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
So spaces have to be treated as any normal character, quoted or not.

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Stringgrid vs. LoadFromCSVFile
« Reply #19 on: June 08, 2012, 10:34:38 am »
I'd say we follow RFC4180:
http://tools.ietf.org/html/rfc4180
At least it is some kind of standard and you can build applications that interoperate based on that.

Quote
CSV data IMO means that only Delimiter (unless in a quoted field) should be taken a field separator. So the spaces would have to be part of the data, although they do not comply with rfc 4180.
Well, not following RFC4180 means we're defining yet another standard. Not a good idea, I think.
However, see section 2 point 4: "Spaces are considered part of a field and should not be ignored." So, this *does* comply with RFC4180.

Quote
Root cause of this problem was/is faulty implementation of TStrings.GetdelimitedText and TStrings.SetdelimitedText, see issue 19610 in the bugtracker.
Note: DelimitedText deals with *SDF* format, which is very similar to but not necessarily the same as RFC4180.

However, afer implementing the patch in 19610 (which aligns DelimitedText behaviour with Delphi), if we modify the output of DelimitedText to always quote fields that have spaces or quotes or line endings (CR and/or LF), we still maintain Delphi compatibility (see tests in bug report) and we end up outputting both valid SDF and CSV (if you set CommaText to true).
(So yes, I totally agree with your last post in that issue suggesting we quote stuff.)

On the import side, the SDF behaviour of not stripping trailing and leading spaces actually seems to match RFC4180.
After 19610 is implemented, we could take a look at whether reading csv data using delimitedtext works using a test program (which can be submitted to the compiler test program suite). I suspect it won't completely work... and fix that if necessary.

Then we could patch the stringgrid loadfromcsvfile and the save routine, as well as any other csv import/export routine floating around in FPC and Lazarus that we can lay our hands on.

*Edited for clarity
« Last Edit: June 08, 2012, 11:16:50 am by BigChimp »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #20 on: June 08, 2012, 01:48:43 pm »
I'd say we follow RFC4180:
http://tools.ietf.org/html/rfc4180
At least it is some kind of standard and you can build applications that interoperate based on that.

Sounds allright.

Note: DelimitedText deals with *SDF* format, which is very similar to but not necessarily the same as RFC4180.

What is SDF format? (had no luck looking it up in wikipedia)

After 19610 is implemented, we could take a look at whether reading csv data using delimitedtext works using a test program (which can be submitted to the compiler test program suite). I suspect it won't completely work... and fix that if necessary.

We'll have to wait for next stable release with this patch (and even then ifdef it as long as 2.6.0 is supported by Lazarus trunk).

Regarding current behaviour: should we strip leading/trainling spaces (in unquoted fields) (as we do now)?
A quick test with Excel shows that (default setting when importing: do not use #32 as separator) these spaces will become part of the field.

Bart

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Stringgrid vs. LoadFromCSVFile
« Reply #21 on: June 08, 2012, 02:09:22 pm »
Note: DelimitedText deals with *SDF* format, which is very similar to but not necessarily the same as RFC4180.

What is SDF format? (had no luck looking it up in wikipedia)
It's Delphi specific. See the Delphi documentation on delimitedtext (as mentioned in my posts in the bug report)

After 19610 is implemented, we could take a look at whether reading csv data using delimitedtext works using a test program (which can be submitted to the compiler test program suite). I suspect it won't completely work... and fix that if necessary.

We'll have to wait for next stable release with this patch (and even then ifdef it as long as 2.6.0 is supported by Lazarus trunk).
We can ask for a backport to 2.6.1.
In any case, we can put in a conditional test for ({IF FPC_FULLVERSION or something) and get it tested and out the door for SVN and snapshot users.

Regarding current behaviour: should we strip leading/trainling spaces (in unquoted fields) (as we do now)?
A quick test with Excel shows that (default setting when importing: do not use #32 as separator) these spaces will become part of the field.
Well:
I'd say we follow RFC4180:
http://tools.ietf.org/html/rfc4180
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #22 on: June 08, 2012, 11:11:07 pm »
I'd say we follow RFC4180:
http://tools.ietf.org/html/rfc4180

So: spaces are part of the field, and should not be ignored...

I'll make it so.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #23 on: June 09, 2012, 05:23:47 pm »
@BigChimp: I didn't realize you were the same person I was discussing the TStrings.Get/SetDelimitedText issue with in the bugtracker...  :-[

Bart

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Stringgrid vs. LoadFromCSVFile
« Reply #24 on: June 09, 2012, 06:21:46 pm »
No worries, Bart, it's not like the names are the same (the username is but not the displayname in Mantis) ;)
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #25 on: June 09, 2012, 07:33:21 pm »
I adjusted the parsing for LoadfromCsvStream in r37598.

  • leading and trailing spaces are always part of the field
  • the exception is that trailing spaces after the end-quote of a quoted field are trimmed
  • also Tab characters are allowed as whitespace

Please (asdf and anyone else interested), test as much scenarios as one can think of and report any (possible) errors.

Bart

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Stringgrid vs. LoadFromCSVFile
« Reply #26 on: June 10, 2012, 06:23:15 pm »
Quote
test as much scenarios as one can think of and report any (possible) errors.
3 scenarios, 3 errors  :( I stopped thinking for more scenarios.
Code: [Select]
procedure TForm1.Button1Click(Sender: TObject);
var
  mem:TMemoryStream;
begin
  mem:=TMemoryStream.Create;
  try
    mem.WriteAnsiString('" 123 "," 456 "," 789 "'+LineEnding);
    mem.WriteAnsiString(' 123 , 456 , 789 '+LineEnding);
    mem.WriteAnsiString('"",,789'+LineEnding);
    mem.WriteAnsiString('" ", ,789'+LineEnding);
    mem.WriteAnsiString('"123","456","789"'+LineEnding);
    mem.WriteAnsiString('123,456,789'+LineEnding);
    mem.WriteAnsiString('123,456,789');
    mem.Position:=0;
    StringGrid1.LoadFromCSVStream(mem);
  finally
    mem.destroy;
  end;
end;
Result attached.
According rfc4180, line 1=2, line 3=4, line 5=6=7.

Edit: svn 37604.
Edit: precision: line 3 and 4 are not equal but they should result both in 3 fields. Since spaces are not visible in a stringgrid the visible output should be equal.
« Last Edit: June 10, 2012, 07:07:47 pm by ludob »

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #27 on: June 10, 2012, 10:28:47 pm »
Edit: svn 37604.
Edit: precision: line 3 and 4 are not equal but they should result both in 3 fields. Since spaces are not visible in a stringgrid the visible output should be equal.

Once you edit the fields, the spaces are there.
I don't really understand this remark.

I'll look into it.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5713
    • Bart en Mariska's Webstek
Re: Stringgrid vs. LoadFromCSVFile
« Reply #28 on: June 10, 2012, 11:02:49 pm »
Code: [Select]
procedure TForm1.Button1Click(Sender: TObject);
var
  mem:TMemoryStream;
begin
  mem:=TMemoryStream.Create;
  try
    mem.WriteAnsiString('" 123 "," 456 "," 789 "'+LineEnding);
    mem.WriteAnsiString(' 123 , 456 , 789 '+LineEnding);
    mem.WriteAnsiString('"",,789'+LineEnding);
    mem.WriteAnsiString('" ", ,789'+LineEnding);
    mem.WriteAnsiString('"123","456","789"'+LineEnding);
    mem.WriteAnsiString('123,456,789'+LineEnding);
    mem.WriteAnsiString('123,456,789');
    mem.Position:=0;
    StringGrid1.LoadFromCSVStream(mem);
  finally
    mem.destroy;
  end;
end;

This also writes other data (length of the string) to the stream, therefore the output is "weird".
Put the same data in a text file and the result will be OK.

Trye this one instead:
Code: [Select]
procedure TForm1.Button5Click(Sender: TObject);
var
  mem:TMemoryStream;
  S: String;
begin
  mem:=TMemoryStream.Create;
  try
    S := '" 123 "," 456 "," 789 "' + LineEnding +
         ' 123 , 456 , 789 ' + LineEnding +
         '"",,789' + LineEnding +
         '" ", ,789' + LineEnding +
         '"123","456","789"' + LineEnding +
         '123,456,789' + LineEnding +
         '123,456,789';
    mem.Write(S[1],Length(S));
    mem.Position := 0;
    StringGrid1.LoadFromCSVStream(mem,',',true);
  finally
    mem.destroy;
  end;
end;

Result: see attachment.

Bart
« Last Edit: June 10, 2012, 11:06:31 pm by Bart »

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Stringgrid vs. LoadFromCSVFile
« Reply #29 on: June 11, 2012, 08:36:34 am »
This also writes other data (length of the string) to the stream, therefore the output is "weird".
Put the same data in a text file and the result will be OK.
Oops. My bad.
I promise I'll stop assuming that a function WriteAnsiString simply writes an AnsiString and will read the documentation next time ;)

 

TinyPortal © 2005-2018