Forum > LCL

TCSVExporter encoding

(1/1)

SymbolicFrank:
If I export a DBF table (encoded in WIN-1252) to CSV, it ends up being a mix of UTF-8 and WIN-1252, which the database doesn't like. Is there an option to have a consistent output format?

Edit: the problem is in line-endings and such in strings. Can those be filtered out? It looks like the functions that do the actual work (in TCSVBuilder) aren't virtual.

From the documentation of the database:
"Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character."

SymbolicFrank:
I came up with this:


--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---type  TSpecialFormatSettings = class(TCSVFormatSettings)  public    Property UseDisplayText;  end; var  csv: TCSVExporter;begin  csv := TCSVExporter.Create(nil); (csv.FormatSettings as TSpecialFormatSettings).UseDisplayText := False;end;
But the compiler doesn't like it. I think I have to rewrite this one:


--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---TCustomDatasetExporter = Class(TComponent)  Function FormatField(F : TField) : UTF8String; virtual;end;
That's where all the action happens.

wp:
There is a little-known property in TStringField and a little-known event in TDataset which allow to execute code page conversion: TStringField.Transliterate and TDataset.OnTranslate. I am not sure whether they are supported by all database systems, but at least for dBase they are:

* After opening the dataset, iterate over the fields, identify stringfields and set their Transliterate to true.
--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---procedure TForm1.Button2Click(Sender: TObject);var  i: Integer;begin  // Open dataset  Dbf1.Open;  // Here we prepare the dbf file for code page translation of the string fields:  // String fields must have Transliterate = true, and there must be an event  // handler for OnTranslate which actually does the translation.  for i:=0 to Dbf1.FieldCount-1 do    if Dbf1.Fields[i] is TStringField then      TStringField(Dbf1.Fields[i]).Transliterate := true;  Dbf1.OnTranslate := @Dbf1Translate;  // Redraw the GUI to redisplay the db-sensitive controls as UTF8.  Invalidate;end;
* Assign a handler to the TDataset event OnTranslate to perform the code page conversion. It is called with different value of the "ToOEM" parameter when string field values are read and written. Do the conversion by means the routines in unit LConvEncoding
--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function TForm1.Dbf1Translate(Dbf: TDbf; Src, Dest: PChar; ToOem: Boolean): Integer;var  s, d: String;  cp: String;begin  // Determine the codepage used by the dbf file  cp := 'CP' + IntToStr(Dbf.DbfFile.UseCodePage);  // "ToOem" means: convert from UTF8 to the codepage of the dbf  if ToOem then    d := ConvertEncoding(Src, EncodingUTF8, cp)  else    // Otherwise: convert from the dbf codepage to UTF8    d := ConvertEncoding(Src, cp, EncodingUTF8);  StrCopy(Dest, PChar(d));  Result := StrLen(Dest);end;
Have a look at the attached demo.

* Button1 ("Open 'normally'") opens the dbase file without the transliteration - and shows the fields in the DBGrid incorrectly; and an export by means of the TCSVExporter has "incorrect" characters in the resulting csv file (when you open it in Notepad++, however, you can set its code page to OEM850 and get the German umlauts correctly). Probably the situation that you are complaining about.
* Button2 ("Open & transliterate") activates the transliteration. Now the fields are displayed correctly in the DBGrid, and the CSV export is correct as well.

SymbolicFrank:
Thanks!

SymbolicFrank:
Another question: I'm copying data from one database to another (multiple formats). Binary data always causes an error, because it doesn't fit inside a CSV file. If that happens, I move the data with parameterized insert queries and set the blob-field to null.

Could that be done some way as well?

Navigation

[0] Message Index

Go to full version