
Author Topic: Codepage, convert Ansi to Utf8  (Read 21976 times)

Zaher

  • Hero Member
  • *****
  • Posts: 681
    • parmaja.org
Codepage, convert Ansi to Utf8
« on: July 28, 2016, 03:34:18 pm »
Maybe this is related to Lazarus.

I am using FPC 3.0 and the latest revision of Lazarus from the repository.

I have an ANSI string that comes from a database and I want to add it to a memo, but it gets converted to a wrong string. I reproduced it with this code:
Code:
var
  t: AnsiString;
begin
  t := #$D2#$C7#$E5#$D1; // my name, Zaher, in Arabic (Windows-1256 code page)
  Memo1.Lines.Add(UTF8Encode(t)); // adds a wrong string: ���� (or ???? without UTF8Encode)
end;

Windows 8.1

How can I convert it, using the Windows code page or at least one I set myself? I tried to set DefaultSystemCodePage, but it did not help.

Thanks

parcel

  • Full Member
  • ***
  • Posts: 143
Re: Codepage, convert Ansi to Utf8
« Reply #1 on: July 28, 2016, 03:47:57 pm »
There is an "LConvEncoding" unit in Lazarus. It can easily convert between encodings.

Zaher

  • Hero Member
  • *****
  • Posts: 681
    • parmaja.org
Re: Codepage, convert Ansi to Utf8
« Reply #2 on: July 28, 2016, 03:50:55 pm »
It worked before, as I thought, and I should not need any conversion tools for this; I have my own for that.
It should work fine without explicit conversion, for example when assigning an AnsiString to a UTF8String.

Phil

  • Hero Member
  • *****
  • Posts: 2737
Re: Codepage, convert Ansi to Utf8
« Reply #3 on: July 28, 2016, 03:51:49 pm »
The CP1256ToUTF8 function in the LConvEncoding unit is what you want.
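
For reference, a minimal sketch of this suggestion as a console program. It assumes the LazUtils package (which provides LConvEncoding) is on the unit search path; the program and variable names are illustrative only.

```pascal
program cp1256_to_utf8;
{$mode objfpc}{$H+}
uses
  LConvEncoding; // part of LazUtils; assumed to be on the unit search path
var
  t, s: string;
begin
  // Raw Windows-1256 bytes, as in the first post
  t := #$D2#$C7#$E5#$D1;
  // Explicit conversion from Windows-1256 to UTF-8
  s := CP1256ToUTF8(t);
  // In an LCL application the result could be passed on directly,
  // e.g. Memo1.Lines.Add(s);
  WriteLn('Input bytes: ', Length(t), ', UTF-8 bytes: ', Length(s));
end.
```

The call names the source code page explicitly, so the result does not depend on the code page of the machine that runs the program.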


Zaher

  • Hero Member
  • *****
  • Posts: 681
    • parmaja.org
Re: Codepage, convert Ansi to Utf8
« Reply #4 on: July 28, 2016, 04:44:51 pm »
I know that function, but why is it not converted automatically, based on the system code page?

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Codepage, convert Ansi to Utf8
« Reply #5 on: July 28, 2016, 05:50:39 pm »
Quote from: Zaher
How can I convert it, using the Windows code page or at least one I set myself?

Code: Pascal
var
  t: ansistring; { I would use String }
begin
  t := #$D2#$C7#$E5#$D1; // my name, Zaher, in Arabic (Windows-1256 code page)
  // Relabel the bytes as CP1256; Convert = False leaves the bytes untouched
  SetCodePage(RawByteString(t), 1256, False);
  Memo1.Lines.Add(t);
end;

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Codepage, convert Ansi to Utf8
« Reply #6 on: July 28, 2016, 05:53:06 pm »
Add {$CodePage UTF8} to your source and save it with UTF-8 encoding.
Then simply write your name in the source (in Arabic) and call Memo1.Lines.Add(t);

Note that Utf8Encode expects a UnicodeString (or WideString) as input.

If you read strings from a database, you will have to convert them manually, since Lazarus has no way of knowing what the code page should be.

Bart

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Codepage, convert Ansi to Utf8
« Reply #7 on: July 28, 2016, 06:04:18 pm »
Quote from: Zaher
I know that function, but why is it not converted automatically, based on the system code page?

Variable t has code page 0:
Code: Pascal
  Memo1.Lines.Add('CodePage: '+IntToStr(StringCodePage(t)));

Code page 0 (CP_ACP) stands for DefaultSystemCodePage, which in Lazarus is 65001 (UTF-8):
Code: Pascal
  Memo1.Lines.Add('DefaultSystemCodePage: '+IntToStr(DefaultSystemCodePage));

This means no conversion is considered necessary, while in fact variable t contains an ANSI string.
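
The effect can be observed outside the LCL with a small console program. This is only a sketch under assumptions: FPC 3.0 or later, and on Unix the cwstring unit so that code page conversion is available. The program and variable names are illustrative. Note that in a plain console program DefaultSystemCodePage is whatever the RTL detects, not necessarily 65001 as in a Lazarus application.

```pascal
program dynamic_codepage;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  SysUtils;
var
  t: AnsiString;
  u: UTF8String;
begin
  t := #$D2#$C7#$E5#$D1; // Windows-1256 bytes from the first post

  // Code page label of the untouched string (reported as 0 in the post above)
  WriteLn('Label before SetCodePage: ', StringCodePage(t));
  WriteLn('DefaultSystemCodePage:    ', DefaultSystemCodePage);

  // Relabel the bytes as CP1256; Convert = False leaves the bytes untouched
  SetCodePage(RawByteString(t), 1256, False);
  WriteLn('Label after SetCodePage:  ', StringCodePage(t));

  // The conversion to UnicodeString reads the dynamic code page label,
  // so the bytes are now decoded as Windows-1256 before being re-encoded
  u := UTF8Encode(UnicodeString(t));
  WriteLn('UTF-8 byte count: ', Length(u));
end.
```

Each of the four Arabic letters needs two bytes in UTF-8, so a successful conversion doubles the byte count of the input.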

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: Codepage, convert Ansi to Utf8
« Reply #8 on: July 28, 2016, 09:03:04 pm »
Quote from: Zaher
It worked before, as I thought, and I should not need any conversion tools for this; I have my own for that.
It should work fine without explicit conversion, for example when assigning an AnsiString to a UTF8String.

Zaher, the default code page of AnsiString is changed to UTF-8 when using Lazarus 1.6+. This change breaks some code, including yours. The data in your DB uses the Windows system code page, so it must be converted explicitly. You should convert as early as possible and then use UTF-8 everywhere.
A long-term plan should be to get rid of the old Windows system code pages and use Unicode everywhere.
Details:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Zaher

  • Hero Member
  • *****
  • Posts: 681
    • parmaja.org
Re: Codepage, convert Ansi to Utf8
« Reply #9 on: July 28, 2016, 09:17:24 pm »
@engkin Yes, SetCodePage works fine. Now I need a way to do it automatically, maybe by getting the code page from the system (Windows).

@JuhaManninen That is true, I still have a DB that uses old code pages.

Thanks to the others :)

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Codepage, convert Ansi to Utf8
« Reply #10 on: July 28, 2016, 10:53:10 pm »
Quote from: Zaher
Now I need a way to do it automatically, maybe by getting the code page from the system (Windows)

Windows: GetACP
Cross platform: WideStringManager.GetStandardCodePageProc(scpAnsi)
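
A minimal sketch of how the cross-platform call might be used, with the result looked up once and kept in a variable. It assumes FPC 3.0 or later and, on Unix, the cwstring unit. The helper BytesToUTF8 and the variable CachedAnsiCodePage are hypothetical names introduced only for illustration.

```pascal
program ansi_to_utf8;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  SysUtils;

var
  // Illustrative cache: the system ANSI code page, looked up once at startup
  CachedAnsiCodePage: TSystemCodePage;

// Hypothetical helper: interpret Bytes as text in code page CP, return UTF-8
function BytesToUTF8(const Bytes: RawByteString; CP: TSystemCodePage): UTF8String;
var
  tmp: RawByteString;
begin
  tmp := Bytes;
  SetCodePage(tmp, CP, False);              // relabel only, bytes stay as they are
  Result := UTF8Encode(UnicodeString(tmp)); // decode using the label, encode as UTF-8
end;

begin
  CachedAnsiCodePage := WideStringManager.GetStandardCodePageProc(scpAnsi);
  WriteLn('System ANSI code page: ', CachedAnsiCodePage);

  // Explicit code page: independent of the machine running the program
  WriteLn('UTF-8 byte count (CP1256): ',
          Length(BytesToUTF8(#$D2#$C7#$E5#$D1, 1256)));

  // With the cached system value, the result depends on the local settings:
  // BytesToUTF8(SomeBytes, CachedAnsiCodePage)
end.
```

Using the system code page only gives correct text when the data was written on a system with the same ANSI code page, so an explicit code page is the safer choice when the origin of the data is known.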

Zaher

  • Hero Member
  • *****
  • Posts: 681
    • parmaja.org
Re: Codepage, convert Ansi to Utf8
« Reply #11 on: September 20, 2017, 04:03:50 pm »
Sorry to come back to this old topic.
This works fine for me, but why do I need to set the code page to my system's ANSI code page myself? Why is it not set by the system? At the very least, I would like to avoid calling WideStringManager.GetStandardCodePageProc(scpAnsi) every time I load a string.
Code:
var
  s: string;
  t: rawbytestring;
begin
  // t holds the loaded ANSI bytes; label them with the system ANSI code page
  SetCodePage(t, WideStringManager.GetStandardCodePageProc(scpAnsi), False);
  s := AnsiToUtf8(t);
end;
Or is there a cached value of the system ANSI code page, so that I do not have to call WideStringManager.GetStandardCodePageProc(scpAnsi)?
DefaultSystemCodePage does not work for me.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: Codepage, convert Ansi to Utf8
« Reply #12 on: September 20, 2017, 04:58:25 pm »
Zaher, could you possibly convert your DB to use UTF-8?
The Windows code pages can be seen as a historical remnant, still supported for backwards compatibility only.
You cannot represent different languages in one text when they use different code pages. The problem already started to get out of hand in the 1990s, when the internet became more common. Textual data was passed around globally and many characters were often shown as "?".
Then came Unicode and solved all those problems!
It is already late 2017; IMO everybody should be using Unicode by now.

Could you convert all your data with a script at once? After that it would be smooth sailing.

Zaher

  • Hero Member
  • *****
  • Posts: 681
    • parmaja.org
Re: Codepage, convert Ansi to Utf8
« Reply #13 on: September 20, 2017, 05:08:47 pm »
No, this time it is not the DB; I solved that by setting the DB client to do the conversion.
It is an ANSI CSV file that I want to load as UTF-8 (Lazarus).
I solved it by caching WideStringManager.GetStandardCodePageProc(scpAnsi) in SystemAnsiCodePage (my own variable) and then using it:

SetCodePage(t, SystemAnsiCodePage, false);

I am afraid I am duplicating something that Lazarus already does.

This is my code:
https://sourceforge.net/p/minilib/source/ci/master/tree/connection/src/mncCSV.pas#l362

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Codepage, convert Ansi to Utf8
« Reply #14 on: September 20, 2017, 10:26:31 pm »
Using SystemAnsiCodePage := WideStringManager.GetStandardCodePageProc(scpAnsi) assumes that the data in the CSV files was encoded on the same system, or on another system with an identical ANSI code page. Is that the intended usage?

If you save your CSV files in some ANSI code page, at least save which CP you used as well. I see you already have a boolean setting ANSIContents, but I don't see which CP. Maybe I missed it?

But again, as Juha said, convert all your data (CSV, database, etc.). Just curious, why do you still use ANSI?

 
