Recent

Author Topic: TCSVExporter encoding  (Read 810 times)

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
TCSVExporter encoding
« on: March 13, 2023, 02:41:49 pm »
If I export a DBF table (encoded in WIN-1252) to CSV, it ends up being a mix of UTF-8 and WIN-1252, which the database doesn't like. Is there an option to have a consistent output format?

Edit: the problem is in line-endings and such in strings. Can those be filtered out? It looks like the functions that do the actual work (in TCSVBuilder) aren't virtual.

From the documentation of the database:
"Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character."
« Last Edit: March 13, 2023, 03:14:43 pm by SymbolicFrank »

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: TCSVExporter encoding
« Reply #1 on: March 13, 2023, 03:46:05 pm »
I came up with this:

Code: Pascal  [Select][+][-]
  1. type
  2.   TSpecialFormatSettings = class(TCSVFormatSettings)
  3.   public
  4.     Property UseDisplayText;
  5.   end;
  6.  
  7. var
  8.   csv: TCSVExporter;
  9. begin
  10.   csv := TCSVExporter.Create(nil);
  11.  (csv.FormatSettings as TSpecialFormatSettings).UseDisplayText := False;
  12. end;

But the compiler doesn't like it. I think I have to rewrite this one:

Code: Pascal  [Select][+][-]
  1. TCustomDatasetExporter = Class(TComponent)
  2.   Function FormatField(F : TField) : UTF8String; virtual;
  3. end;

That's where all the action happens.


wp

  • Hero Member
  • *****
  • Posts: 11916
Re: TCSVExporter encoding
« Reply #2 on: March 13, 2023, 07:35:38 pm »
There is a little-known property in TStringField and a little-known event in TDataset which allow to execute code page conversion: TStringField.Transliterate and TDataset.OnTranslate. I am not sure whether they are supported by all database systems, but at least for dBase they are:
  • After opening the dataset, iterate over the fields, identify stringfields and set their Transliterate to true.
Code: Pascal  [Select][+][-]
  1. procedure TForm1.Button2Click(Sender: TObject);
  2. var
  3.   i: Integer;
  4. begin
  5.   // Open dataset
  6.   Dbf1.Open;
  7.   // Here we prepare the dbf file for code page translation of the string fields:
  8.   // String fields must have Transliterate = true, and there must be an event
  9.   // handler for OnTranslate which actually does the translation.
  10.   for i:=0 to Dbf1.FieldCount-1 do
  11.     if Dbf1.Fields[i] is TStringField then
  12.       TStringField(Dbf1.Fields[i]).Transliterate := true;
  13.   Dbf1.OnTranslate := @Dbf1Translate;
  14.   // Redraw the GUI to redisplay the db-sensitive controls as UTF8.
  15.   Invalidate;
  16. end;
  • Assign a handler to the TDataset event OnTranslate to perform the code page conversion. It is called with different value of the "ToOEM" parameter when string field values are read and written. Do the conversion by means the routines in unit LConvEncoding
Code: Pascal  [Select][+][-]
  1. function TForm1.Dbf1Translate(Dbf: TDbf; Src, Dest: PChar; ToOem: Boolean): Integer;
  2. var
  3.   s, d: String;
  4.   cp: String;
  5. begin
  6.   // Determine the codepage used by the dbf file
  7.   cp := 'CP' + IntToStr(Dbf.DbfFile.UseCodePage);
  8.   // "ToOem" means: convert from UTF8 to the codepage of the dbf
  9.   if ToOem then
  10.     d := ConvertEncoding(Src, EncodingUTF8, cp)
  11.   else
  12.     // Otherwise: convert from the dbf codepage to UTF8
  13.     d := ConvertEncoding(Src, cp, EncodingUTF8);
  14.   StrCopy(Dest, PChar(d));
  15.   Result := StrLen(Dest);
  16. end;

Have a look at the attached demo.
  • Button1 ("Open 'normally'") opens the dbase file without the transliteration - and shows the fields in the DBGrid incorrectly; and an export by means of the TCSVExporter has "incorrect" characters in the resulting csv file (when you open it in Notepad++, however, you can set its code page to OEM850 and get the German umlauts correctly). Probably the situation that you are complaining about.
  • Button2 ("Open & transliterate") activates the transliteration. Now the fields are displayed correctly in the DBGrid, and the CSV export is correct as well.
« Last Edit: March 13, 2023, 07:41:10 pm by wp »

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: TCSVExporter encoding
« Reply #3 on: March 15, 2023, 08:57:43 am »
Thanks!

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: TCSVExporter encoding
« Reply #4 on: March 21, 2023, 12:35:05 pm »
Another question: I'm copying data from one database to another (multiple formats). Binary data always causes an error, because it doesn't fit inside a CSV file. If that happens, I move the data with parameterized insert queries and set the blob-field to null.

Could that be done some way as well?

 

TinyPortal © 2005-2018