Author Topic: CSVDocument 0.4 released  (Read 13098 times)

vvzh

  • Jr. Member
  • **
  • Posts: 58
CSVDocument 0.4 released
« on: May 11, 2011, 08:26:27 am »
CSVDocument is a library for processing CSV (comma-separated values) files.

New in this release:
* The TCSVBuilder class replaces the QuoteCSVString function.
* Support for RFC 4180 specification.
* Default settings are now RFC 4180 compliant.
* Performance improvements.
* Lazarus package.
Detailed list of changes

Thanks to LuizAmérico and mattias for their contributions to the new release.

Ask

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 687
Re: CSVDocument 0.4 released
« Reply #1 on: May 11, 2011, 11:40:01 am »
Maybe it would be a good idea to add a TDataset-compatible version of TCSVDocument.
This way, CSV data will be easily accessible, for example, from TDBGrid or TChart.

vvzh

Re: CSVDocument 0.4 released
« Reply #2 on: May 11, 2011, 09:57:39 pm »
As far as I know, TSDFDataset already serves this purpose. I do not use TDataset-like components in my current projects, though, so I know almost nothing about TSDFDataset's features. The only thing I remember is that TSDFDataset requires you to know the maximum field size in advance. That was a drawback for me, and it is why I created TCSVDocument. But this limitation also brings a significant speed and memory usage improvement, and knowing the maximum field size beforehand is quite typical for relational databases, which is what TDataset is designed to work with. So I believe TSDFDataset may be more suitable for this kind of task.

Of course I may be wrong. In that case I would need help/patches from someone who actually uses TDataset-like classes.

VTwin

  • Hero Member
  • *****
  • Posts: 1227
  • Former Turbo Pascal 3 user
Re: CSVDocument 0.4 released
« Reply #3 on: May 21, 2011, 08:51:53 pm »
vvzh,

Thanks for your work on this, it is very useful. I have been using 0.3 and will try out 0.4.

Is there a way to use CSVDocument to sequentially process a file line by line, rather than loading the entire document?

I need to read files that are incorrectly quoted, and may contain commas within the fields. Because of the known format, I can correct them, but have to do it before passing it to CSVDocument.

Thanks,
Frederick
“Talk is cheap. Show me the code.” -Linus Torvalds

Free Pascal Compiler 3.2.2
macOS 15.3.2: Lazarus 3.8 (64 bit Cocoa M1)
Ubuntu 18.04.3: Lazarus 3.8 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 3.8 (64 bit on VBox)

vvzh

Re: CSVDocument 0.4 released
« Reply #4 on: May 22, 2011, 09:39:46 am »
Quote from: VTwin
Is there a way to use CSVDocument to sequentially process a file line by line, rather than loading the entire document?

Well, there is a way to process a file field by field, though not exactly line by line. To detect whether the current field starts a new line, you can use the attached code (see the end of this message).
However, note that the CSV parser does not let you intercept field parsing, so an unquoted delimiter will result in two fields, not one. The only way to handle this is to merge the two fields after the second one is parsed. That requires remembering the first field's content and maintaining separate indexes for the current column and row of the CSVDocument you write the result to.

Quote from: VTwin
I need to read files that are incorrectly quoted, and may contain commas within the fields. Because of the known format, I can correct them, but have to do it before passing it to CSVDocument.

Then I believe you have at least two choices:
1) Use the CSV parser and merge cells that were split by an incorrect separator (as described above);
2) Use TStringList for per-line file correction, then load the result into TCSVDocument.
Which way is easier depends on the format of the incorrect fields (which you rely on for the correction) and on the remaining document content. If the correction can be done with a couple of StringReplace calls and the other fields are simple unquoted fields, I would go for option (2). If the remaining document contains other properly quoted values or multi-line values, I would try option (1).
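
As a rough illustration of option (2), here is a minimal, untested sketch. It assumes a hypothetical three-column layout (id, free text, price) in which only the middle column can contain stray commas. QuoteMiddleField and that layout are invented for the example, and the unit name CsvDocument may differ in your installation. The only TCSVDocument call used is LoadFromStream.

```pascal
program FixAndLoadCsv;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  CsvDocument; // assumed unit name; adjust to your installation

// Hypothetical correction for lines shaped like: id,free text,price
// Everything between the first and the last comma is treated as one field
// and wrapped in quotes, so commas inside it no longer split the field.
function QuoteMiddleField(const Line: string): string;
var
  FirstComma, LastComma: Integer;
  Middle: string;
begin
  Result := Line;
  FirstComma := Pos(',', Line);
  LastComma := LastDelimiter(',', Line);
  if (FirstComma = 0) or (LastComma <= FirstComma) then
    Exit; // fewer than two commas: leave the line untouched
  Middle := Copy(Line, FirstComma + 1, LastComma - FirstComma - 1);
  // double any embedded quote characters, as RFC 4180 expects
  Middle := StringReplace(Middle, '"', '""', [rfReplaceAll]);
  Result := Copy(Line, 1, FirstComma) + '"' + Middle + '"' +
    Copy(Line, LastComma, Length(Line));
end;

var
  Lines: TStringList;
  Source: TStringStream;
  Document: TCSVDocument;
  I: Integer;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: FixAndLoadCsv <file.csv>');
    Halt(1);
  end;
  Lines := TStringList.Create;
  try
    // per-line correction; this cannot handle multi-line quoted values
    Lines.LoadFromFile(ParamStr(1));
    for I := 0 to Lines.Count - 1 do
      Lines[I] := QuoteMiddleField(Lines[I]);
    Document := TCSVDocument.Create;
    try
      Source := TStringStream.Create(Lines.Text);
      try
        Source.Position := 0;
        Document.LoadFromStream(Source);
      finally
        Source.Free;
      end;
      // Document can be used here
    finally
      Document.Free;
    end;
  finally
    Lines.Free;
  end;
end.
```

With this layout, a line such as 1,Smith, John,9.50 becomes 1,"Smith, John",9.50 before it reaches the parser. A real file will need its own correction rule in place of QuoteMiddleField.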

The code for detecting new lines while parsing field by field:
Code:
var
  FileStream: TFileStream;
  Document: TCSVDocument;
  Parser: TCSVParser;
  PrevRow: Integer;
begin
  PrevRow := -1;
  Document := nil;
  Parser := nil;
  FileStream := TFileStream.Create('filename', fmOpenRead);
  try
    Document := TCSVDocument.Create;
    Parser := TCSVParser.Create;
    // todo: set delimiter, quote char, etc.
    Parser.IgnoreOuterWhitespace := False;
    Parser.SetSource(FileStream);
    while Parser.ParseNextCell do
    begin
      // current cell text is contained in Parser.CurrentCellText,
      // current column index is Parser.CurrentCol
      if Parser.CurrentRow > PrevRow then
      begin
        // handle new line
      end;
      // you can write parsed fields to Document using Document.Cells[i, j]
      PrevRow := Parser.CurrentRow;
    end;
    // Document can be used here
  finally
    // release everything even if parsing raises an exception
    FreeAndNil(Parser);
    FreeAndNil(Document);
    FreeAndNil(FileStream);
  end;
end;
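
Building on the loop above, option (1) might be sketched roughly as follows. This is untested and rests on several assumptions: the delimiter is a comma, every row should have exactly ExpectedCols cells, only the column at FreeTextCol can contain stray commas, and assigning to Document.Cells grows the document as needed. ExpectedCols, FreeTextCol, MergeSurplusCells and FlushRow are hypothetical names made up for the example, and depending on your version TCSVParser may live in a different unit than CsvDocument.

```pascal
program MergeSplitCells;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  CsvDocument; // assumed unit name; adjust to your installation

const
  ExpectedCols = 3; // hypothetical: number of cells a correct row has
  FreeTextCol = 1;  // hypothetical: zero-based column that may hold stray commas

// Glue surplus cells back onto the free-text column until the row
// has ExpectedCols cells again.
procedure MergeSurplusCells(Row: TStrings);
begin
  while Row.Count > ExpectedCols do
  begin
    Row[FreeTextCol] := Row[FreeTextCol] + ',' + Row[FreeTextCol + 1];
    Row.Delete(FreeTextCol + 1);
  end;
end;

// Write one collected row into the document and reset the buffer.
procedure FlushRow(Row: TStrings; Document: TCSVDocument; RowIndex: Integer);
var
  Col: Integer;
begin
  MergeSurplusCells(Row);
  for Col := 0 to Row.Count - 1 do
    Document.Cells[Col, RowIndex] := Row[Col];
  Row.Clear;
end;

var
  FileStream: TFileStream;
  Document: TCSVDocument;
  Parser: TCSVParser;
  Row: TStringList;
  PrevRow: Integer;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: MergeSplitCells <file.csv>');
    Halt(1);
  end;
  Document := nil;
  Parser := nil;
  Row := nil;
  FileStream := TFileStream.Create(ParamStr(1), fmOpenRead);
  try
    Document := TCSVDocument.Create;
    Parser := TCSVParser.Create;
    Row := TStringList.Create;
    // keep whitespace so that 'Smith, John' is restored exactly
    Parser.IgnoreOuterWhitespace := False;
    Parser.SetSource(FileStream);
    PrevRow := 0;
    while Parser.ParseNextCell do
    begin
      if Parser.CurrentRow > PrevRow then
      begin
        // a new line started: store the cells collected for the previous one
        FlushRow(Row, Document, PrevRow);
        PrevRow := Parser.CurrentRow;
      end;
      Row.Add(Parser.CurrentCellText);
    end;
    if Row.Count > 0 then
      FlushRow(Row, Document, PrevRow); // last line of the file
    // Document can be used here
  finally
    FreeAndNil(Row);
    FreeAndNil(Parser);
    FreeAndNil(Document);
    FreeAndNil(FileStream);
  end;
end.
```

Collecting a whole row before writing it keeps the merge logic in one place, at the cost of buffering one line's worth of cells.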
« Last Edit: May 22, 2011, 09:42:41 am by vvzh »

VTwin

Re: CSVDocument 0.4 released
« Reply #5 on: May 23, 2011, 01:06:37 am »
vvzh,

That is very helpful. I think the second approach makes sense in this case. Many thanks, I appreciate the assistance.

Cheers,
Frederick

Shebuka

  • Sr. Member
  • ****
  • Posts: 429
Re: CSVDocument 0.4 released
« Reply #6 on: June 07, 2011, 11:34:19 am »
Hi, can I use your component to read a CSV file that is on a remote server? (something like http://my.server.com/addresses.csv)

vvzh

Re: CSVDocument 0.4 released
« Reply #7 on: June 07, 2011, 01:54:33 pm »
No, you will need to download it first, either to a file or to a TMemoryStream. To do so, you can use Ararat Synapse and its HttpGetBinary function:

Code:
uses ..., HttpSend;
...
var
  Document: TCSVDocument;
  Buffer: TMemoryStream;
begin
  Document := TCSVDocument.Create;
  try
    Document.Delimiter := ';';
    Buffer := TMemoryStream.Create;
    try
      // download the whole file into memory
      if not HttpGetBinary('http://my.server.com/addresses.csv', Buffer) then
        raise Exception.Create('Cannot load document from remote server');
      // rewind the stream before parsing it
      Buffer.Position := 0;
      Document.LoadFromStream(Buffer);
    finally
      FreeAndNil(Buffer);
    end;
    ProcessCSVDocument(Document);
  finally
    FreeAndNil(Document);
  end;
end;

The code snippet above is somewhat dirty and completely untested, so you will probably have to adjust it to your needs.

Shebuka

Re: CSVDocument 0.4 released
« Reply #8 on: June 07, 2011, 04:40:59 pm »
Thank you for the code, I'll try it ;)

But is there a way to use lNet instead? I'm already using lNet in my app to connect to my server-side app over TCP, so could there be conflicts with Synapse?

vvzh

Re: CSVDocument 0.4 released
« Reply #9 on: June 07, 2011, 06:10:02 pm »
I have not used lNet in my projects, so I do not have an lNet-based solution and cannot say for sure whether there are any conflicts.

From a quick look at the lNet documentation, it seems to be more low-level than Synapse. It should be possible to implement similar logic on top of lNet, but it would require more code.

I doubt lNet would conflict with Synapse, but it is a good idea to check it first.

 
