I agree with taazz. If you are using Ansi text data somewhere, you probably have some issues with them.
First of all, when 'UTF-8 in RTL' is activated, the usual FPC/LCL conversion functions most probably no longer work: AnsiToUTF8 / UTF8ToAnsi and SysToUTF8 / UTF8ToSys.
According to the documentation (see http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#RTL_with_default_codepage_UTF-8), these functions must be replaced by: WinCPToUTF8 / UTF8ToWinCP.
So, add LazUTF8 in the 'uses' clause (if not already present), and replace all these old conversion functions. That doesn't seem a big deal, after all.
But it's not as simple ...
Here is an illustration of what I'm beginning to call the Ansi nightmare ...
1/ The test program
Preliminary: this sample is intended to work only with FPC 3.0+.
Start a new Lazarus project with an edit box, a check box and a push button (I'm attaching such a sample project).
Here is the interesting part of the code:
uses
  LazUTF8, Windows;

procedure TForm1.Button1Click(Sender: TObject);
var
  s: string;
  ws: widestring;
  UnicodeEnabledOS: boolean;
begin
  UnicodeEnabledOS := CheckBox1.Checked;
  //
  s := UTF8ToWinCP(Edit1.Text);
  if UnicodeEnabledOS then
  begin
    ws := widestring(s);
    MessageBoxW(0, PWideChar('Hello ' + ws + ' !'), 'Greetings W', 0);
  end
  else
    MessageBoxA(0, PChar('Hello ' + s + ' !'), 'Greetings A', 0);
end;
The code might seem a bit 'strange'; in real code, it would be written differently.
But for demonstration purposes, I had to choose the instructions and their order carefully. Anyway, even if it's 'strange', it's certainly not incorrect.
2/ Tests with no 'UTF-8 in RTL'
Everything is OK.
You can verify it by running the project, entering some non-ASCII characters in the edit box (like "Ändern", for instance) and pressing the push button. In both the Ansi and the Unicode cases, the message box text is OK.
3/ Tests with 'UTF-8 in RTL' activated
This time, it's OK with the Ansi API (by chance, I'm tempted to say), but not with the Unicode (i.e. wide) API, even though the recommended "UTF8ToWinCP" function has been used and even though it returns the correct data.
Ahhh, yes! Currently, the LCL is lying in (almost?) all of its conversion functions. Concretely, it means that the code page attached to the function result is incorrect: the bytes are converted, but the string is not tagged with the code page they are actually in (as with UTF8ToConsole earlier in this topic).
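To see the mismatch for yourself, you can print the code page the RTL believes a converted string has. The following is only a minimal console sketch, not part of the attached project: the program name is made up, it assumes a Windows target and that LazUtils is available to the project so that LazUTF8 can be used, and what it prints depends on your Lazarus/FPC versions and on whether 'UTF-8 in RTL' is active.

```pascal
program ShowResultCodePage;   // hypothetical name, for illustration only
{$mode objfpc}{$H+}
uses
  LazUTF8, Windows;           // assumption: LazUtils is a dependency of the project
var
  u, s: string;
begin
  // "Ändern" written as raw UTF-8 bytes ($C3 $84 = "Ä")
  u := #$C3#$84'ndern';
  // Same conversion as in the test program
  s := UTF8ToWinCP(u);
  // What the RTL uses as default code page for 1-byte strings
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // The Windows Ansi code page, i.e. what the bytes of s are really encoded in
  WriteLn('GetACP:                ', GetACP);
  // The code page the string itself claims to have
  WriteLn('StringCodePage(s):     ', StringCodePage(s));
end.
```

If the last two values differ, the string content and its code page tag disagree, and any implicit conversion the compiler inserts later will start from the wrong assumption.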
It's not a problem. It might be fixed in the LCL in the future, but for now we can easily fix it ourselves. Just add the following instruction:
...
s := UTF8ToWinCP(Edit1.Text);
SetCodepage(rawbytestring(s), GetACP, false); // <--- Add me
if UnicodeEnabledOS then
...
The Unicode case is now working properly: great, we've finally made it!
Ehhh, wait a minute... The Ansi case no longer works!
We are supposed to have just 'fixed' a potential issue. The data returned by the UTF8ToWinCP function are OK, we have now set the correct code page for these data, and it's not working. Uhhh?
Unfortunately, from the compiler's point of view, I'm afraid it's quite "logical and correct" if you look closer at the source code. Of course, as I've written before, I've carefully chosen my sample ...
Just a clue for the explanation of this last problem. Make a test by just modifying the Ansi API call:
...
// MessageBoxA(0,PChar('Hello '+s+' !'),'Greetings A',0); // Modify
MessageBoxA(0,PChar(s),'Greetings A',0); // me
...
And yes, it's working correctly again (the Ansi case, I mean), and it's also "logical and correct": without the concatenation, the bytes of s are passed to the API untouched.
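My understanding of the clue above is that the concatenation is where the compiler inserts an implicit conversion: once s is tagged with the Ansi code page and the default code page is UTF-8, 'Hello '+s gets converted to UTF-8 before MessageBoxA ever sees it. Here is a small console sketch of that idea. It is an illustration under assumptions, not a verified reproduction: the program name is made up, SetMultiByteConversionCodePage(CP_UTF8) is used to emulate what I assume 'UTF-8 in RTL' does at startup, and code page 1252 is hard-coded instead of GetACP.

```pascal
program ConcatCodePageDemo;   // hypothetical name, for illustration only
{$mode objfpc}{$H+}
var
  s, t: AnsiString;
begin
  // Assumption: this emulates the effect of the 'UTF-8 in RTL' option
  SetMultiByteConversionCodePage(CP_UTF8);

  // One byte, $C4, which is "Ä" in Windows-1252
  SetLength(s, 1);
  s[1] := #$C4;
  // Tag the byte as Windows-1252 without converting it (same trick as above)
  SetCodePage(RawByteString(s), 1252, False);

  // Concatenation: the operands carry different code pages
  t := 'Hello ' + s;

  WriteLn('s: code page ', StringCodePage(s), ', length ', Length(s));
  WriteLn('t: code page ', StringCodePage(t), ', length ', Length(t));
end.
```

If the conversion takes place as I suspect, t is longer than 7 bytes, because "Ä" needs two bytes in UTF-8, and those are the bytes an Ansi API would then misread. With PChar(s) alone, no concatenation happens, so nothing is converted.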
4/ My conclusions
Do use only UTF8 text data when activating the 'UTF-8 in RTL' LCL option. This concerns by definition only the 1-byte string variables ("string", "ansistring", "utf8string", ...), of course.
or
Don't activate the 'UTF-8 in RTL' LCL option if you plan to use ANSI text data, etc.
Unless you're perfectly familiar (this is not my case) with the Free Pascal code page support, how it's used in the LCL, and all the consequences of this for the 1-byte strings.