Lazarus

Programming => LCL => Topic started by: SymbolicFrank on March 13, 2023, 02:41:49 pm

Title: TCSVExporter encoding
Post by: SymbolicFrank on March 13, 2023, 02:41:49 pm
If I export a DBF table (encoded in WIN-1252) to CSV, it ends up being a mix of UTF-8 and WIN-1252, which the database doesn't like. Is there an option to have a consistent output format?

Edit: the problem is in line-endings and such in strings. Can those be filtered out? It looks like the functions that do the actual work (in TCSVBuilder) aren't virtual.

From the documentation of the database:
"Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character."
Title: Re: TCSVExporter encoding
Post by: SymbolicFrank on March 13, 2023, 03:46:05 pm
I came up with this:

Code: Pascal  [Select][+][-]
  1. type
  2.   TSpecialFormatSettings = class(TCSVFormatSettings)
  3.   public
  4.     Property UseDisplayText;
  5.   end;
  6.  
  7. var
  8.   csv: TCSVExporter;
  9. begin
  10.   csv := TCSVExporter.Create(nil);
  11.  (csv.FormatSettings as TSpecialFormatSettings).UseDisplayText := False;
  12. end;

But the compiler doesn't like it. I think I have to rewrite this one:

Code: Pascal  [Select][+][-]
  1. TCustomDatasetExporter = Class(TComponent)
  2.   Function FormatField(F : TField) : UTF8String; virtual;
  3. end;

That's where all the action happens.

Title: Re: TCSVExporter encoding
Post by: wp on March 13, 2023, 07:35:38 pm
There is a little-known property in TStringField and a little-known event in TDataset which allow to execute code page conversion: TStringField.Transliterate and TDataset.OnTranslate. I am not sure whether they are supported by all database systems, but at least for dBase they are:
Code: Pascal  [Select][+][-]
  1. procedure TForm1.Button2Click(Sender: TObject);
  2. var
  3.   i: Integer;
  4. begin
  5.   // Open dataset
  6.   Dbf1.Open;
  7.   // Here we prepare the dbf file for code page translation of the string fields:
  8.   // String fields must have Transliterate = true, and there must be an event
  9.   // handler for OnTranslate which actually does the translation.
  10.   for i:=0 to Dbf1.FieldCount-1 do
  11.     if Dbf1.Fields[i] is TStringField then
  12.       TStringField(Dbf1.Fields[i]).Transliterate := true;
  13.   Dbf1.OnTranslate := @Dbf1Translate;
  14.   // Redraw the GUI to redisplay the db-sensitive controls as UTF8.
  15.   Invalidate;
  16. end;
Code: Pascal  [Select][+][-]
  1. function TForm1.Dbf1Translate(Dbf: TDbf; Src, Dest: PChar; ToOem: Boolean): Integer;
  2. var
  3.   s, d: String;
  4.   cp: String;
  5. begin
  6.   // Determine the codepage used by the dbf file
  7.   cp := 'CP' + IntToStr(Dbf.DbfFile.UseCodePage);
  8.   // "ToOem" means: convert from UTF8 to the codepage of the dbf
  9.   if ToOem then
  10.     d := ConvertEncoding(Src, EncodingUTF8, cp)
  11.   else
  12.     // Otherwise: convert from the dbf codepage to UTF8
  13.     d := ConvertEncoding(Src, cp, EncodingUTF8);
  14.   StrCopy(Dest, PChar(d));
  15.   Result := StrLen(Dest);
  16. end;

Have a look at the attached demo.
Title: Re: TCSVExporter encoding
Post by: SymbolicFrank on March 15, 2023, 08:57:43 am
Thanks!
Title: Re: TCSVExporter encoding
Post by: SymbolicFrank on March 21, 2023, 12:35:05 pm
Another question: I'm copying data from one database to another (multiple formats). Binary data always causes an error, because it doesn't fit inside a CSV file. If that happens, I move the data with parameterized insert queries and set the blob-field to null.

Could that be done some way as well?
TinyPortal © 2005-2018