Is there a way to use CSVDocument to sequentially process a file line by line, rather than loading the entire document?
Well, there is a way to process a file field by field, though not exactly line by line. To detect whether the current field starts a new line, you can use the code at the end of this message.
However, note that the CSV parser does not let you intercept field parsing, so an unquoted delimiter inside a value will produce two fields, not one. The only way to handle this is to merge the two fields after the second one has been parsed. That requires remembering the content of the first field and maintaining separate column and row indexes for the TCSVDocument you write the result to.
I need to read files that are incorrectly quoted and may contain commas within the fields. Because the format is known, I can correct them, but I have to do it before passing the data to CSVDocument.
Then I believe you have at least two choices:
1) Use the CSV parser and merge cells that were split by an incorrect separator (as described above);
2) Use TStringList for per-line file correction, then load the result into TCSVDocument.
Which way is easier depends on the format of the incorrect field (which you use for the correction) and on the rest of the document. If the correction can be done with a couple of StringReplace calls and the other fields are simple unquoted fields, I would go for option (2). If the rest of the document contains other properly quoted values or multi-line values, I would try option (1).
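Option (2) might look roughly like the sketch below. It is only an illustration: the program name, the file name and the 'Smith, John' replacement are made-up placeholders, and the real correction depends on your known format. It assumes every record fits on one line, since TStringList splits the file on line breaks.

```pascal
program FixThenLoad;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, csvdocument;
var
  Lines: TStringList;
  Document: TCSVDocument;
  i: Integer;
begin
  Lines := TStringList.Create;
  Document := TCSVDocument.Create;
  try
    Lines.LoadFromFile('filename');
    for i := 0 to Lines.Count - 1 do
      // Hypothetical correction: quote a known value that contains a comma
      Lines[i] := StringReplace(Lines[i], 'Smith, John', '"Smith, John"',
        [rfReplaceAll]);
    // todo: set Document.Delimiter, Document.QuoteChar, etc.
    Document.CSVText := Lines.Text;
    // The corrected values are now available via Document.Cells[ACol, ARow]
  finally
    Document.Free;
    Lines.Free;
  end;
end.
```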
The code for detecting the start of a new line with the parser:
// Requires: uses Classes, SysUtils, csvdocument;
var
  FileStream: TFileStream;
  Document: TCSVDocument;
  Parser: TCSVParser;
  PrevRow: Integer;
begin
  PrevRow := -1;
  FileStream := nil;
  Parser := nil;
  Document := TCSVDocument.Create;
  try
    FileStream := TFileStream.Create('filename', fmOpenRead or fmShareDenyWrite);
    Parser := TCSVParser.Create;
    // todo: set delimiter, quote char, etc.
    Parser.IgnoreOuterWhitespace := False;
    Parser.SetSource(FileStream);
    while Parser.ParseNextCell do
    begin
      if Parser.CurrentRow > PrevRow then
      begin
        // handle new line
      end;
      // current cell text is contained in Parser.CurrentCellText,
      // current column index is Parser.CurrentCol;
      // you can write parsed fields to Document using
      // Document.Cells[ACol, ARow] (column index first)
      PrevRow := Parser.CurrentRow;
    end;
    // Document can be used here
  finally
    // Free objects even if opening or parsing the file raises an exception
    FreeAndNil(Parser);
    FreeAndNil(FileStream);
    FreeAndNil(Document);
  end;
end;
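For option (1), the merging could be sketched as follows. This is a simplified illustration under a strong assumption: the field that may contain unquoted commas is the last column, so any extra cell in a row belongs to it. The program name, the LastCol constant and the sample data are hypothetical; a broken field in the middle of a row would need the more elaborate index tracking described above.

```pascal
program MergeSplitCells;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, csvdocument;
const
  LastCol = 2; // hypothetical: rows should have 3 columns (indexes 0..2)
var
  Parser: TCSVParser;
  Document: TCSVDocument;
begin
  Parser := TCSVParser.Create;
  Document := TCSVDocument.Create;
  try
    // Sample input: the second row has an unquoted comma in its last field
    Parser.SetSource('a,b,c' + LineEnding + 'd,e,f,g');
    while Parser.ParseNextCell do
      if Parser.CurrentCol <= LastCol then
        Document.Cells[Parser.CurrentCol, Parser.CurrentRow] :=
          Parser.CurrentCellText
      else
        // Extra cell: glue it back onto the last column, restoring the delimiter
        Document.Cells[LastCol, Parser.CurrentRow] :=
          Document.Cells[LastCol, Parser.CurrentRow] + Parser.Delimiter +
          Parser.CurrentCellText;
    // Document now holds the merged rows
  finally
    Document.Free;
    Parser.Free;
  end;
end.
```

Because the merged text is written straight into the document, the parser's own row and column indexes can be reused here; separate indexes are only needed once a merge can shift the columns that follow it.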