Author Topic: Codepage, convert Ansi to Utf8  (Read 21802 times)

Zaher

  • Hero Member
  • *****
  • Posts: 679
    • parmaja.org
Re: Codepage, convert Ansi to Utf8
« Reply #15 on: September 20, 2017, 10:34:26 pm »
Quote
But again, as Juha said, convert all your data (CSV, Database, ... etc). Just curious, why do you still use ANSI?
Yes, I did that in my projects, even the old ones.

But for CSV and even ANSI text files (exported from an old project that is not mine), I am trying to mimic the behavior of Notepad++: take the system codepage and use it for the conversion. It works fine without specifying the codepage, because the system already provides it.

Delphi also behaves this way: when converting from ANSI to UTF-8 it takes the codepage from the system, so in Delphi there is no need to call SetCodePage (I am not sure about newer releases; I want to recheck).

Zaher

Re: Codepage, convert Ansi to Utf8
« Reply #16 on: September 22, 2017, 01:41:03 pm »
Ah, I understand it now.

It works on Windows only. On Linux there is no default codepage for ANSI, so applications should ask for the codepage before converting. It is the same difference as between Excel and LibreOffice Calc: LibreOffice Calc asks for the codepage when converting.



JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4458
  • I like bugs.
Re: Codepage, convert Ansi to Utf8
« Reply #17 on: September 22, 2017, 04:20:34 pm »
Quote
But, for CSV and even ansi Text files (exported from old project not mine), ...
You can also convert those exported files to Unicode, can't you?
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 14157
  • Probably until I exterminate Putin.
Re: Codepage, convert Ansi to Utf8
« Reply #18 on: September 22, 2017, 04:27:04 pm »
Quote
But again, as Juha said, convert all your data (CSV, Database, ... etc). Just curious, why do you still use ANSI?
Yes i did it in my projects, even old ones.

But, for CSV and even ansi Text files (exported from old project not mine), I am trying to take the same the behavior of Notepad++ by taking the codepage of system and use it to convert, it is works fine without define the codepage that already given by system.
That's a recipe for disaster. Never mind Delphi. (That also fails)
E.g. I have a lot of Russian (1251 and KOI) files in ANSI. Converting them based on the default system codepage (in my case 1252)? Just curious... :D
Specialize a type, not a var.

Zaher

Re: Codepage, convert Ansi to Utf8
« Reply #19 on: September 22, 2017, 06:03:51 pm »
Yes, I can convert it with an external command.

But looking at AnsiToUTF8, it should at least convert from ANSI Latin 1252 by default, not from UTF-8.

I think there is still a problem in the conversion; I need to reproduce it in a simple program.

Zaher

Re: Codepage, convert Ansi to Utf8
« Reply #20 on: September 22, 2017, 06:36:37 pm »
Hmm, yes, there is a problem.
AnsiString is CP_ACP, which means it uses DefaultSystemCodePage for the conversion.

But on Windows, DefaultSystemCodePage is set to UTF-8 (65001), so AnsiToUTF8 does not work for me: it converts from UTF-8 to UTF-8.
Setting it from the Windows API function GetACP fixes it.
This code works fine:
Code: [Select]
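// Needs the Windows unit in the uses clause for GetACP.
const
  MyName: AnsiString = #$D2#$C7#$E5#$D1; // raw bytes in the system ANSI codepage
begin
  // Reset to the system ANSI codepage before converting.
  DefaultSystemCodePage := GetACP;
  Edit1.Text := AnsiToUtf8(MyName);
end;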

Now, why is DefaultSystemCodePage set to UTF-8 (65001)?
In this file,
\dev\fpc\3.0\src\rtl\win\syswin.inc, the InitWin32Widestrings procedure calls GetACP.

Somewhere DefaultSystemCodePage is overridden and set to UTF-8 on Windows.

JuhaManninen

Re: Codepage, convert Ansi to Utf8
« Reply #21 on: September 22, 2017, 07:08:54 pm »
Zaher, apparently you have not read this:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus
The current Unicode support in Lazarus breaks code that depends on Windows codepages.
One more reason to use Unicode always ...
There is also a section about dealing with Windows codepages:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage

I remember you participated in some Unicode discussion but I may be wrong. It was some other topic.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Codepage, convert Ansi to Utf8
« Reply #22 on: September 22, 2017, 07:10:20 pm »
Quote
Now why DefaultSystemCodePage set to utf-8 65001 ?

That's Lazarus, UTF8 everywhere.

There is an error in your example: you assigned your name to a variable without setting the correct code page. Use SetCodePage. Without the correct code page it will not give the correct text on a Russian system, for instance.

#$D2#$C7#$E5#$D1 has no meaning without the correct code page.

Edit:
AnsiToUtf8 is *not* needed when you set the correct code page. The compiler already checks and converts the text.
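
A minimal sketch of the SetCodePage approach, assuming the four bytes are Windows-1256 text. The program name is made up, the cwstring guard is an assumption for Unix builds (FPC needs a widestring manager there), and the second SetCodePage call does the conversion explicitly instead of relying on an implicit one at assignment:

```pascal
program RelabelDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: widestring manager needed for conversions on Unix
{$endif}

var
  S: RawByteString;
begin
  // Raw bytes as they would come out of a legacy file.
  S := #$D2#$C7#$E5#$D1;

  // Step 1: label the bytes as codepage 1256 without touching the data.
  SetCodePage(S, 1256, False);

  // Step 2: convert the data from 1256 to UTF-8.
  SetCodePage(S, CP_UTF8, True);

  // S now carries UTF-8 data and can be handed to LCL controls.
  WriteLn('Dynamic code page: ', StringCodePage(S));
  WriteLn('Byte length      : ', Length(S));
end.
```

The codepage number is passed explicitly, so the result does not depend on the machine the program runs on.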
« Last Edit: September 22, 2017, 07:16:53 pm by engkin »

Zaher

Re: Codepage, convert Ansi to Utf8
« Reply #23 on: September 22, 2017, 07:26:22 pm »
To understand DefaultSystemCodePage, let me ask:

What is the initial value of DefaultSystemCodePage on Windows?

If it is
DefaultSystemCodePage:=GetACP();

then DefaultSystemCodePage should be 1256 on my system, but it is not; it is UTF-8.


engkin

Re: Codepage, convert Ansi to Utf8
« Reply #24 on: September 22, 2017, 07:34:29 pm »
Quote
to understand DefaultSystemCodePage let me ask.

What the initial value of DefaultSystemCodePage in Windows?

if it
DefaultSystemCodePage:=GetACP();

DefaultSystemCodePage should be 1256 in my system but it is not, it is utf8.
It is 1256 if you don't use LCL. IIRC, LazUTF8 changes it to UTF8. Try a simple console app.

But again, that is an error. Depending on the system code page to convert an ANSI string does not make sense. It might work on your system, but it will not work on a different one.
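
A minimal console sketch for that check, Windows only (the program name is made up). Without LCL in the uses clause both numbers should match; going by this thread, the same two lines inside an LCL application would show 65001 for the second one:

```pascal
program ShowCodePages; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  Windows; // provides GetACP

begin
  // ANSI codepage reported by the operating system.
  WriteLn('GetACP                : ', GetACP);

  // Codepage the RTL assumes for plain AnsiString (CP_ACP) data.
  WriteLn('DefaultSystemCodePage : ', DefaultSystemCodePage);
end.
```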
« Last Edit: September 22, 2017, 07:38:53 pm by engkin »

Zaher

Re: Codepage, convert Ansi to Utf8
« Reply #25 on: September 22, 2017, 07:40:16 pm »
Quote
to understand DefaultSystemCodePage let me ask.

What the initial value of DefaultSystemCodePage in Windows?

if it
DefaultSystemCodePage:=GetACP();

DefaultSystemCodePage should be 1256 in my system but it is not, it is utf8.
It is 1256 if you don't use LCL. IIRC, LazUTF8 changes it to UTF8.

Try a simple console app.

That is it: it was changed to UTF-8. When I changed it back to GetACP (1256), AnsiToUtf8 worked fine, and so did my Lazarus project.

Can I ask why it was changed to UTF-8 in LCL?

JuhaManninen

Re: Codepage, convert Ansi to Utf8
« Reply #26 on: September 22, 2017, 08:44:54 pm »
Quote
can i ask why changed to utf8 in LCL?
Because the UTF16 solution in RTL and other FPC libs is not ready. Changing the default encoding of AnsiString enabled an easy but robust Unicode system. It is also amazingly compatible with Delphi at source code level ... unless you want to use Windows codepages.

Actually converting from a Windows codepage is easy. A SetCodePage() should be enough as explained here:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage
However Windows codepages have inherent problems (not related to Lazarus) which Unicode has solved for well over 20 years.
I still recommend you use Unicode, or ask your customers to use it.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

engkin

Re: Codepage, convert Ansi to Utf8
« Reply #27 on: September 22, 2017, 08:58:42 pm »
Quote
why changed to utf8 in LCL?

Procedures that used to work with [default system] ANSI code page strings had a limited character set. For instance, on your system that meant Arabic and ASCII; you would not be able to use Greek or Hebrew with your default encoding.

On Windows, these same procedures used to work with ANSI system functions, like MessageBoxA. To move to Unicode and use MessageBoxW, a Unicode encoding is needed. UTF-8 is one of them, and it seems to be the default encoding on most Linux-based systems.

The second paragraph titled "RTL with default codepage UTF-8" in the link provided to you by Juha answers your question as well.

Zaher

Re: Codepage, convert Ansi to Utf8
« Reply #28 on: September 22, 2017, 09:56:08 pm »
OK, I got it. I will declare a global variable, the "SystemAnsiCodePage" I mentioned above, and use it in my library before converting to UTF-8.
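
Roughly along these lines, as a sketch only: the unit name, the function name and the 1252 default are placeholders, and only the variable name comes from this thread. The application would set the variable once at startup, for example from Windows.GetACP or from a user setting:

```pascal
unit AnsiImport; // hypothetical unit name

{$mode objfpc}{$H+}

interface

var
  // Codepage assumed for legacy ANSI input; 1252 is only a placeholder default.
  SystemAnsiCodePage: TSystemCodePage = 1252;

// Hypothetical helper: treats Bytes as SystemAnsiCodePage data
// and returns the same text encoded as UTF-8.
function LegacyToUTF8(const Bytes: RawByteString): RawByteString;

implementation

function LegacyToUTF8(const Bytes: RawByteString): RawByteString;
begin
  Result := Bytes;
  // Label the bytes with the configured codepage; the data is not changed.
  SetCodePage(Result, SystemAnsiCodePage, False);
  // Convert the data to UTF-8.
  SetCodePage(Result, CP_UTF8, True);
end;

end.
```

Keeping the codepage in one variable means files from a machine with another codepage can still be read by changing that value, which addresses the objection raised earlier about relying on the local system default.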

Quote
I still recommend you use Unicode, or ask your customers to use it.
Yes, all my new projects are Unicode, but I still need to access CSV files from an old project I don't own. A small job to feed my kids :P


Thank you all

 
