AnsiString to WideString conversion problem
pumuqui:
I have an AnsiString with just one character 'ó' (o with accent, 0xF3), which I assign to a WideString.
On Windows the WideString gets the value 0xF300 (bytes F3 00, i.e. U+00F3 in little-endian order), but on Linux it gets 0x3F00 (U+003F), which is displayed as a question mark.
On both systems the locale is set to es_ES. On Linux (OpenSuse 10.2 64-bit) I have the line
--- Code: ---RC_LANG="es_ES.ISO-8859-1"
--- End code ---
in file /etc/sysconfig/language.
I also tried with "es_ES.UTF-8", but no luck.
The only idea I have for solving the problem is to convert the string to WideChar format manually, but I'm quite sure that I am missing something obvious.
If anyone could give me a hint...
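For reference, a minimal sketch of what such a manual conversion might look like. It assumes the input really is ISO-8859-1, where every byte value equals the Unicode code point of the same number; CP1252 differs in the range $80..$9F, so this would not be correct for that code page. The names Latin1Demo and Latin1ToWideString are made up for illustration.

```pascal
program Latin1Demo;
{$mode objfpc}{$H+}

// Hypothetical helper: widens an ISO-8859-1 string to UTF-16 without
// consulting the locale. Assumes S is ISO-8859-1, not CP1252.
function Latin1ToWideString(const S: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(S));
  for i := 1 to Length(S) do
    Result[i] := WideChar(Ord(S[i]));  // byte value = Unicode code point
end;

var
  A: AnsiString;
  W: WideString;
begin
  A := #$F3;                  // 'ó' in ISO-8859-1
  W := Latin1ToWideString(A);
  WriteLn(Ord(W[1]));         // prints 243 ($F3), i.e. U+00F3
end.
```

Because nothing here depends on the widestring manager or the locale, the result should be the same on Windows and Linux.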
Laksen:
Have you included the cwstring unit when you compile on Linux?
http://www.freepascal.org/docs-html/rtl/cwstring/index.html
ivan17:
--- Code: ---implementation
uses LConvEncoding;
procedure TForm1.Button2Click(Sender: TObject);
var
  s: AnsiString;
begin
  s := #$F3;
  Edit2.Text := CP1252ToUTF8(s);
end;
--- End code ---
pumuqui:
Thanks for the fast answers.
I included the cwstring unit in my program's uses clause, but nothing changed.
The only way to get it to work was with the following code (proposed by ivan17):
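--- Code: ---uses LConvEncoding, lclproc;
// Converts a CP1252-encoded string to UTF-16 by going through UTF-8.
function StringToWideString(const str: String): WideString;
var
  strUTF8: String;
begin
  strUTF8 := CP1252ToUTF8(str);    // LConvEncoding: CP1252 -> UTF-8
  Result := UTF8ToUTF16(strUTF8);  // lclproc: UTF-8 -> UTF-16
end;
--- End code ---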
I don't know why the normal conversion (AnsiToUTF8) doesn't work, since the locale on Linux is set to es_ES.UTF-8 and I thought that AnsiToUTF8 took the system locale into account when converting strings.
--- Code: ---PSERVER:~ # locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=
--- End code ---
On the other hand, a call to GetSystemEncoding returns 'ansi'.
Something is going wrong there. Perhaps there is a Linux setting (OpenSuse 11.2) that I have missed so far?
And another question: I found the function UTF8ToUTF16 in the Lazarus unit lclproc. Isn't there an FPC unit offering the same functionality?
Thanks again.
theo:
A UTF-8 Linux system does not know what your (actually the text file's) ANSI encoding is.
It could be ISO-8859-1, but it could just as well be a Cyrillic encoding such as KOI8-R.
So you have to tell it what to convert from, using CP1252ToUTF8 or one of the similar functions.
Instead of UTF8ToUTF16, you could also use UTF8Decode.
But one question: Lazarus uses UTF-8, not ANSI and not WideString (UTF-16/UCS-2).
Are you sure you need to convert from ANSI to WideString?
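If a WideString is really needed, the two points above could be combined roughly as in the sketch below: name the source encoding explicitly with CP1252ToUTF8, then decode the UTF-8 result with UTF8Decode, which is part of the FPC System unit. This assumes the Lazarus unit LConvEncoding is on the unit path; the names ExplicitDecode and CP1252ToWideString are made up for illustration.

```pascal
program ExplicitDecode;
{$mode objfpc}{$H+}

uses
  LConvEncoding;  // Lazarus unit, provides CP1252ToUTF8

// Hypothetical helper: states the source encoding explicitly instead of
// relying on the system locale, then decodes UTF-8 to UTF-16.
function CP1252ToWideString(const S: AnsiString): WideString;
begin
  Result := UTF8Decode(CP1252ToUTF8(S));  // UTF8Decode is in the System unit
end;

var
  A: AnsiString;
  W: WideString;
begin
  A := #$F3;  // 'ó' in CP1252
  W := CP1252ToWideString(A);
  if Length(W) > 0 then
    WriteLn(Ord(W[1]));  // code unit of the first character
end.
```

Since the source encoding is stated in the code, the conversion does not depend on the locale settings of the machine.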