
Author Topic: TDbf and Encoding  (Read 6154 times)

bigeno

  • Sr. Member
  • ****
  • Posts: 266
TDbf and Encoding
« on: November 22, 2013, 09:44:03 am »
I have been fighting with encoding in dbf files for a long time, please help. I can't get Polish characters to come out correctly.

I use TableLevel 4. I tried LanguageID values $23, $C8 and $00 (null).
For strings I assigned MyDbf.Fields[i].AsString:= with plain UTF8 and with UTF8ToCP1250, UTF8ToCP852 and UTF8ToANSI.

I checked all combinations (on Windows 8 PL), opening the files in ESRI ArcGIS or OpenOffice, and nothing worked.

The only progress is with LanguageID:=$23 and strings converted with UTF8ToANSI: when the file is opened in OpenOffice as UTF-8, I see correct characters in the data, but not in FieldDefs.

clauslack

  • Sr. Member
  • ****
  • Posts: 275
Re: TDbf and Encoding
« Reply #1 on: November 22, 2013, 01:06:28 pm »
What charset do your dbf files use?

Regards

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: TDbf and Encoding
« Reply #2 on: November 22, 2013, 01:34:47 pm »
UTF-8 or any other Unicode format isn't supported in TDbf (though you might be able to do something with raw bytes; I vaguely remember there is a "none" encoding option).

What FPC version are you using? You might try trunk/2.7.1 as it has some fixes (you could use e.g. fpcup to do a parallel independent install).

I suppose running
Code:
chcp
on the command line gives
Quote
Active code page: 852
for you? Could you check?

Language ID $23 would then make sense, yes, and if your input text is in utf8, I suppose UTF8ToCP852 or UTF8ToANSI should work...
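
A quick way to compare the candidate conversions is to dump the bytes each one produces for a Polish letter. This is only a minimal sketch, assuming the LConvEncoding unit that ships with Lazarus is on the unit path; BytesAsHex is a helper made up for this example, and the sample letter is written as escaped UTF-8 bytes so the source file encoding does not matter.

```pascal
program ShowCodePageBytes;

{$mode objfpc}{$H+}

uses
  SysUtils, LConvEncoding; // LConvEncoding comes with Lazarus (LazUtils)

// Hypothetical helper: returns the bytes of s as space-separated hex.
function BytesAsHex(const s: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(s[i]), 2);
  end;
end;

const
  // UTF-8 bytes of the Polish letter "a with ogonek"
  Sample = #$C4#$85;

begin
  WriteLn('UTF-8:  ', BytesAsHex(Sample));
  WriteLn('CP852:  ', BytesAsHex(UTF8ToCP852(Sample)));
  WriteLn('CP1250: ', BytesAsHex(UTF8ToCP1250(Sample)));
end.
```

If the CP852 and CP1250 lines differ, a reader that expects one code page will show the wrong character for bytes written in the other.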

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

bigeno

  • Sr. Member
  • ****
  • Posts: 266
Re: TDbf and Encoding
« Reply #3 on: November 22, 2013, 05:33:08 pm »
You saved me, big thanks BigChimp.
My active code page is 852.
I use FPC 2.6 fixes.

Now, when I set LanguageID to $23 and use UTF8ToCP852, I see correct characters in Excel (just opening the file), OpenOffice (opened as DOS 852) and ArcGIS (just opening the file).

But not in FieldDefs; maybe I am passing the strings the wrong way?
For FieldDefs I use
Code:
MyDbf.FieldDefs.Add(UTF8ToCP852(s),ftString,40,false)
For data I use
Code:
MyDbf.Fields[i].AsString:=UTF8ToCP852(s)
When I put the same Polish characters in field names and in data and open the file in a hex editor, the hex values are different.

Another question: when I use UTF8ToAnsi I don't get correct results. Why? If my code page is 852, shouldn't the result be the same?!

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: TDbf and Encoding
« Reply #4 on: November 22, 2013, 06:23:12 pm »
You have to distinguish between fieldnames and field data.
Field names are normal Pascal identifiers, and so restricted to the ASCII characters 'A'..'Z', 'a'..'z', '_' and (provided it is not the first character in the name) '0'..'9'. You can have a fieldname 'girl' or 'fille', or 'dziewczyna', but not a fieldname 'Mädchen'. So your problem with fieldnames may not be one of encoding but of illegal characters in the code (characters which would be perfectly OK if they were data).
Encoding is therefore mainly an issue for data, not for names used in Pascal programs, which are completely inflexible in this regard (at least as far as FPC is concerned; I do not know if there are other Pascal compilers that allow non-ASCII characters in identifiers).
The Lazarus editor may encode Pascal identifiers and the rest of your program code as UTF8. However, this does not matter, since the lower UTF8 character range (code points 0 to 127) is identical to ASCII.
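
The rule above can be turned into a small check that runs before a name is passed to FieldDefs.Add. This is a minimal sketch of that rule only; IsPlainFieldName is a name invented for this example and is not part of TDbf, and 'Mädchen' is written as escaped UTF-8 bytes so the source file encoding does not matter.

```pascal
program CheckFieldName;

{$mode objfpc}{$H+}

// Hypothetical helper: True when Name is non-empty, does not start with
// a digit, and contains only 'A'..'Z', 'a'..'z', '0'..'9' and '_'.
function IsPlainFieldName(const Name: string): Boolean;
var
  i: Integer;
begin
  Result := (Name <> '') and not (Name[1] in ['0'..'9']);
  if not Result then
    Exit;
  for i := 1 to Length(Name) do
    if not (Name[i] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) then
      Exit(False);
end;

begin
  WriteLn('dziewczyna: ', IsPlainFieldName('dziewczyna'));
  // 'Maedchen' spelled with a-umlaut, as UTF-8 bytes
  WriteLn('with umlaut: ', IsPlainFieldName('M'#$C3#$A4'dchen'));
end.
```

Any byte above 127 fails the check, so a name containing UTF-8 encoded national characters is rejected regardless of which code page the data uses.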
« Last Edit: November 22, 2013, 06:26:08 pm by howardpc »

bigeno

  • Sr. Member
  • ****
  • Posts: 266
Re: TDbf and Encoding
« Reply #5 on: November 22, 2013, 07:01:32 pm »
OK, that is not a big problem; I can replace national characters with their ASCII equivalents in field names.
And I think I understand the difference between UTF8ToAnsi and UTF8ToCP852: 852 is the console code page, but the Windows GUI code page is 1250.
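
For reference, the combination that worked in this thread can be put together roughly as follows. This is only a sketch under the thread's assumptions (FPC 2.6-era TDbf, TableLevel 4, LanguageID $23, console code page 852); the file name polish.dbf, the field name MIASTO and the sample text are made up for the example, and newer FPC/Lazarus versions may handle string code pages differently.

```pascal
program WritePolishDbf;

{$mode objfpc}{$H+}

uses
  SysUtils, db, dbf, LConvEncoding; // LConvEncoding comes with Lazarus (LazUtils)

const
  // UTF-8 bytes of the sample text (a Polish city name), escaped so the
  // source file encoding does not matter
  SampleUtf8 = #$C5#$81#$C3#$B3'd'#$C5#$BA;

var
  MyDbf: TDbf;
begin
  MyDbf := TDbf.Create(nil);
  try
    MyDbf.FilePathFull := GetCurrentDir;  // write into the current directory
    MyDbf.TableName := 'polish.dbf';      // hypothetical file name
    MyDbf.TableLevel := 4;                // as used in the thread
    MyDbf.LanguageID := $23;              // value reported to work with code page 852
    // Field name stays plain ASCII; only the data carries Polish characters.
    MyDbf.FieldDefs.Add('MIASTO', ftString, 40, False);
    MyDbf.CreateTable;
    MyDbf.Open;
    MyDbf.Append;
    MyDbf.Fields[0].AsString := UTF8ToCP852(SampleUtf8);
    MyDbf.Post;
    MyDbf.Close;
  finally
    MyDbf.Free;
  end;
end.
```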

 
