
AnsiString to WideString conversion problem


pumuqui:
I have an AnsiString with just one character 'ó' (o with accent, 0xF3), which I assign to a WideString.

On Windows the WideString contains the bytes F3 00 (U+00F3, 'ó'), but on Linux it contains 3F 00 (U+003F), which is shown as a question mark.

On both systems the locale is set to es_ES. On Linux (OpenSuse 10.2 64-bit) I have the line

--- Code: ---RC_LANG="es_ES.ISO-8859-1"

--- End code ---
in file /etc/sysconfig/language.
I also tried with "es_ES.UTF-8", but no luck.

The only solution I can think of is to convert the string to WideChar format manually, but I'm quite sure I am missing something obvious.

If anyone could give me a hint...
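A minimal sketch of the manual conversion mentioned in the post, assuming the AnsiString holds ISO-8859-1 (Latin-1) text. In Latin-1 every byte value is also its Unicode code point (U+0000..U+00FF), so each byte can be widened directly to a WideChar. The helper name `Latin1ToWideString` is hypothetical, and the mapping is not correct for CP1252 bytes in the range $80..$9F.

```pascal
program Latin1ToWide;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: converts an ISO-8859-1 (Latin-1) string to UTF-16.
// Each Latin-1 byte value equals its Unicode code point, so no table is needed.
// NOTE: not correct for CP1252 bytes $80..$9F (e.g. $80 is U+20AC in CP1252).
function Latin1ToWideString(const s: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(s));
  for i := 1 to Length(s) do
    Result[i] := WideChar(Ord(s[i]));  // widen the byte to a 16-bit code unit
end;

var
  w: WideString;
begin
  w := Latin1ToWideString(#$F3);  // 'ó' as a single Latin-1 byte
  WriteLn('Length: ', Length(w), ', code point: U+', IntToHex(Ord(w[1]), 4));
end.
```

Run as is, this prints `Length: 1, code point: U+00F3`, independent of the system locale, because no widestring manager is involved.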

Laksen:
Have you included the cwstring unit when you compile on Linux?

http://www.freepascal.org/docs-html/rtl/cwstring/index.html
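A minimal sketch of where cwstring usually goes, assuming FPC on a Unix system: it is listed first in the program's uses clause, guarded by `{$ifdef unix}`. cwstring only makes the implicit AnsiString-to-WideString conversion follow the current locale, so with a UTF-8 locale a lone $F3 byte is still not valid UTF-8 and is not expected to come out as 'ó'. The program name is hypothetical.

```pascal
program CwStringDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,  // installs a libc/iconv-based widestring manager; list it first
  {$endif}
  SysUtils;

var
  a: AnsiString;
  w: WideString;
begin
  a := #$F3;  // 'ó' stored as a single ISO-8859-1 / CP1252 byte
  w := a;     // implicit conversion: bytes are interpreted in the current ANSI code page
  // With a CP1252/Latin-1 code page this should be U+00F3; with a UTF-8 locale the
  // lone $F3 byte is invalid UTF-8, so a substitute such as '?' (U+003F) is expected.
  if Length(w) > 0 then
    WriteLn('U+', IntToHex(Ord(w[1]), 4))
  else
    WriteLn('conversion produced an empty string');
end.
```

The printed code point therefore depends on the locale the program runs under, which matches the difference between Windows and Linux described in the first post.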

ivan17:
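--- Code: ---implementation

uses
  LConvEncoding; // Lazarus unit that provides CP1252ToUTF8

procedure TForm1.Button2Click(Sender: TObject);
var
  s: AnsiString;
begin
  s := #$F3;                      // 'ó' as a single CP1252 / ISO-8859-1 byte
  Edit2.Text := CP1252ToUTF8(s);  // LCL controls expect UTF-8 text
end;

--- End code ---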

pumuqui:
Thanks for the fast answers.

I added the cwstring unit to my program's uses clause, but nothing changed.

The only way to get it to work was with the following code (proposed by ivan17):

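--- Code: ---uses LConvEncoding, lclproc;

function StringToWideString(const str: String): WideString;
var
  strUTF8: String;
begin
  strUTF8 := CP1252ToUTF8(str);    // treat the input bytes as CP1252 and re-encode them as UTF-8
  result := UTF8ToUTF16(strUTF8);  // then convert UTF-8 to UTF-16 (from lclproc)
end;

--- End code ---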
I don't know why the normal conversion (AnsiToUTF8) doesn't work, since the Linux locale is set to es_ES.UTF-8 and I thought AnsiToUTF8 took the system locale into account when converting strings.

--- Code: ---PSERVER:~ # locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=

--- End code ---
On the other hand, a call to GetSystemEncoding returns 'ansi'.
Something is going wrong there. Perhaps there is a Linux setting (OpenSuse 11.2) I have missed so far?
Another question: I found the function UTF8ToUTF16 in the Lazarus unit lclproc; isn't there an FPC unit that offers the same functionality?

Thanks again.

theo:
On a UTF-8 Linux system, the system cannot know what the ANSI encoding of your string (actually, of your text file) is.
It could be ISO-8859-1, but it could also be a Cyrillic encoding such as KOI8-R.
So you have to tell it explicitly what to convert from, using CP1252ToUTF8 etc.

Instead of UTF8ToUTF16, you could also use UTF8Decode.
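A minimal FPC-only sketch of that substitution, assuming the input is already UTF-8 (for example, the output of CP1252ToUTF8): UTF8Decode from the FPC System unit takes the place of lclproc's UTF8ToUTF16. The wrapper name `UTF8ToWideString` is hypothetical.

```pascal
program Utf8DecodeDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical wrapper: the input is assumed to contain UTF-8 bytes already.
// UTF8Decode (System unit) decodes them to UTF-16; no LCL unit is required.
function UTF8ToWideString(const s: AnsiString): WideString;
begin
  Result := UTF8Decode(s);
end;

var
  u: AnsiString;
  w: WideString;
begin
  u := #$C3#$B3;  // 'ó' encoded as UTF-8 (two bytes)
  w := UTF8ToWideString(u);
  WriteLn('Length: ', Length(w), ', code point: U+', IntToHex(Ord(w[1]), 4));
end.
```

Decoding the two UTF-8 bytes $C3 $B3 prints `Length: 1, code point: U+00F3`. The CP1252-to-UTF-8 step still has to come from somewhere (CP1252ToUTF8 or a manual mapping), because UTF8Decode only handles input that is already UTF-8.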

But one question: Lazarus uses UTF-8, not ANSI and not WideString (UTF-16/UCS-2).
Are you sure you need to convert from ANSI to WideString?
