
Author Topic: AnsiString to WideString conversion problem  (Read 22441 times)

pumuqui

AnsiString to WideString conversion problem
« on: November 17, 2010, 01:43:05 pm »
I have an AnsiString with just one character 'ó' (o with accent, 0xF3), which I assign to a WideString.

On Windows the WideString gets the value 0xF300, but on Linux it gets 0x3F00, which is shown as a question mark.

On both systems the locale is set to es_ES. On Linux (OpenSuse 10.2 64-bit) I have the line
Code: [Select]
RC_LANG="es_ES.ISO-8859-1"
in file /etc/sysconfig/language.
I also tried with "es_ES.UTF-8", but no luck.

The only idea I have for solving the problem is to convert the string to WideChar format manually, but I'm quite sure that I am missing something obvious.

If anyone could give me a hint...
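
The manual conversion mentioned above could be sketched as follows, assuming the input really is ISO-8859-1, whose byte values coincide with the first 256 Unicode code points. Latin1ToWide and the program name are hypothetical, not RTL functions. The 1:1 mapping does not hold for CP1252, which assigns other characters to 0x80..0x9F (the euro sign at 0x80, for example).

```pascal
program Latin1ToWideSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-16 without
// consulting the locale or the widestring manager.
function Latin1ToWide(const s: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(s));
  for i := 1 to Length(s) do
    // Byte value NN in ISO-8859-1 is code point U+00NN.
    Result[i] := WideChar(Ord(s[i]));
end;

var
  w: WideString;
begin
  w := Latin1ToWide(#$F3);          // 'ó' as a single ISO-8859-1 byte
  WriteLn(IntToHex(Ord(w[1]), 4));  // prints 00F3
end.
```

Because neither the locale nor the widestring manager is involved, the result should be the same on Windows and Linux.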

Laksen

Re: AnsiString to WideString conversion problem
« Reply #1 on: November 17, 2010, 02:52:31 pm »
Have you included the cwstring unit when you compile on Linux?

http://www.freepascal.org/docs-html/rtl/cwstring/index.html

ivan17

Re: AnsiString to WideString conversion problem
« Reply #2 on: November 18, 2010, 12:04:27 am »
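Code: [Select]
implementation

uses LConvEncoding;

procedure TForm1.Button2Click(Sender: TObject);
var
  s: AnsiString;
begin
  s := #$F3;  // 'ó' as a single CP1252 byte
  // Name the source encoding explicitly instead of relying on the locale.
  Edit2.Text := CP1252ToUTF8(s);
end;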

pumuqui

Re: AnsiString to WideString conversion problem
« Reply #3 on: November 18, 2010, 06:36:50 pm »
Thanks for the fast answers.

I included the cwstring unit in my program's uses clause, but nothing changed.

The only way to get it to work was with the following code (proposed by ivan17):
Code: [Select]
uses LConvEncoding, lclproc;

function StringToWideString(const str: String): WideString;
var strUTF8: String;
begin
  strUTF8 := CP1252ToUTF8(str);
  result := UTF8ToUTF16(strUTF8);
end;
I don't know why the normal conversion (AnsiToUTF8) doesn't work, since the locale on Linux is set to es_ES.UTF-8 and I thought that AnsiToUTF8 takes the system locale into account when converting strings.
Code: [Select]
PSERVER:~ # locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=
On the other hand, a call to GetSystemEncoding returns 'ansi'.
Something is going wrong there. Perhaps it's a Linux setting (OpenSuse 11.2) that I have missed so far?
And another question: I found the function UTF8ToUTF16 in the Lazarus unit lclproc. Isn't there an FPC unit offering the same functionality?

Thanks again.

theo

Re: AnsiString to WideString conversion problem
« Reply #4 on: November 18, 2010, 07:19:59 pm »
A UTF-8 Linux system does not know what your (actually the text file's) ANSI encoding is.
It could be ISO-8859-1, but it could also be a Cyrillic encoding such as KOI8-R.
So you have to tell it which encoding to convert from, using CP1252ToUTF8 etc.

Instead of UTF8ToUTF16, you could also use UTF8Decode.

But one question: Lazarus uses UTF-8, not ANSI and not WideString (UTF-16/UCS-2).
Are you sure you need to convert from ANSI to WideString?
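
A minimal sketch of the UTF8Decode route, assuming plain FPC without any Lazarus units. The program name and the HexUnits helper are made up for illustration. UTF8Decode itself lives in the System unit and, in FPC 2.4 and later, returns a UnicodeString that can be assigned to a WideString.

```pascal
program Utf8DecodeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: renders each UTF-16 code unit as four hex digits.
function HexUnits(const w: WideString): String;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(w) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(w[i]), 4);
  end;
end;

var
  utf8Bytes: AnsiString;
  w: WideString;
begin
  // 'ó' encoded as UTF-8 is the two bytes C3 B3.
  utf8Bytes := #$C3#$B3;
  // UTF8Decode always reads its input as UTF-8, whatever the locale says.
  w := UTF8Decode(utf8Bytes);
  WriteLn(HexUnits(w));  // prints 00F3
end.
```

The input has to be UTF-8 already, so for ISO-8859-1 or CP1252 data this only replaces the UTF8ToUTF16 step, not the CP1252ToUTF8 step.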
« Last Edit: November 18, 2010, 07:21:38 pm by theo »

Zoran

Re: AnsiString to WideString conversion problem
« Reply #5 on: November 18, 2010, 07:20:43 pm »
Quote
I don't know why the normal conversion (AnsiToUTF8) doesn't work, since the locale on Linux is set to es_ES.UTF-8 and I thought that AnsiToUTF8 takes the system locale into account when converting strings.

Your system locale is "es_ES.UTF-8", that is obviously UTF8 encoding, not cp1252.

And here is what the fpc manual says about AnsiToUtf8:
Quote
The current codepage is fetched from the system, if internationalization support is enabled. It can be UTF-8, in which case the function simply returns S.

On the other hand, your character was encoded in CP1252, so you needed the CP1252ToUTF8 function. On Windows AnsiToUTF8 worked for you because CP1252 is your system encoding there.

Quote
On the other side, a call to GetSystemEncoding returns 'ansi'.
Where did you find this function?

Quote
And another question: I found the function UTF8ToUTF16 in the Lazarus unit lclproc, isn't there a fpc unit offering the same functionality?
I am also confused. There is the UTF8Decode function, which, as far as I know, does just that.
« Last Edit: November 18, 2010, 07:22:14 pm by Zoran »

pumuqui

Re: AnsiString to WideString conversion problem
« Reply #6 on: November 19, 2010, 11:21:40 am »
OK, the thing turns out to be a little bit more complicated.
Quote
Your system locale is "es_ES.UTF-8", that is obviously UTF8 encoding, not cp1252.
Yes, you are right, but I wrote in my first post that I also tested with the locale set to es_ES.ISO-8859-1, with the same result. For the tests I describe below I changed the locale back to ISO-8859-1.
Quote
Where do you find this function?
In the same unit as CP1252ToUTF8, LConvEncoding.

I made some more tests and found out something interesting.
The application where I'm getting those problems is a daemon application using the LazDaemon package. It is a synchronization application that transfers data from a database (which delivers the strings in ISO-8859-1 encoding) to Windows CE devices (where the strings are needed in UTF-16 format).
I changed my main lpr file to include some debug output:
Code: [Select]
Program SmartPPCServer;

{$Define _UTFTEST_}

Uses
{$IFDEF UNIX}
{$IFDEF UseCThreads}
  CThreads,
{$ENDIF}
  cwstring,
{$ENDIF}
  LazDaemon,
  DaemonApp
  { add your units here }
  {$IfDef _UTFTEST_}
  , LConvEncoding, StrUtils, lclproc
  {$EndIf}
  , UnitSmartPPCDaemonMapper, UnitSmartPPCDaemon;

{$IFDEF WINDOWS}{$R SmartPPCServer.rc}{$ENDIF}

{$IfDef _UTFTEST_}
var
  strISO8859_1, strUTF8, hexStr: AnsiString;
  wStr: WideString;
{$EndIf}

begin
  {$IfDef _UTFTEST_}
  WriteLn('System encoding: ' + GetSystemEncoding);
  // ISO8859_1 string: 'áéíóúüñÁÉÍÓÚÜѺª'
  strISO8859_1 := #$E1#$E9#$ED#$F3#$FA#$FC#$F1#$C1#$C9#$CD#$D3#$DA#$DC#$D1#$BA#$AA;
  SetLength(hexStr, Length(strISO8859_1) * 2);
  BinToHex(PChar(strISO8859_1), PChar(hexStr), Length(strISO8859_1));
  WriteLn('Source: ' + strISO8859_1 + ' [' + hexStr + ']');
  strUTF8 := AnsiToUTF8(strISO8859_1);
  SetLength(hexStr, Length(strUTF8) * 2);
  BinToHex(PChar(strUTF8), PChar(hexStr), Length(strUTF8));
  WriteLn('AnsiToUTF8: [' + hexStr + ']');
  strUTF8 := CP1252ToUTF8(strISO8859_1);
  SetLength(hexStr, Length(strUTF8) * 2);
  BinToHex(PChar(strUTF8), PChar(hexStr), Length(strUTF8));
  WriteLn('CP1252ToUTF8: [' + hexStr + ']');
  wStr := UTF8ToUTF16(strUTF8);
  SetLength(hexStr, Length(wStr) * 4);
  BinToHex(PChar(PWideChar(wStr)), PChar(hexStr), Length(wStr) * 2);
  WriteLn('UTF8ToUTF16: [' + hexStr + ']');
  wStr := strISO8859_1;
  SetLength(hexStr, Length(wStr) * 4);
  BinToHex(PChar(PWideChar(wStr)), PChar(hexStr), Length(wStr) * 2);
  WriteLn('Assignment: [' + hexStr + ']');
  {$EndIf}

  Application.Title:='SmartPPCServer';
  Application.Initialize;
  Application.Run;
end.

When running this application from the command line, everything is as expected:
Code: [Select]
firebird@PSERVER:~/bin> ./SmartPPCServer -r -t
System encoding: iso88591
Source: áéíóúüñÁÉÍÓÚÜѺª [E1E9EDF3FAFCF1C1C9CDD3DADCD1BAAA]
AnsiToUTF8: [C3A1C3A9C3ADC3B3C3BAC3BCC3B1C381C389C38DC393C39AC39CC391C2BAC2AA]
CP1252ToUTF8: [C3A1C3A9C3ADC3B3C3BAC3BCC3B1C381C389C38DC393C39AC39CC391C2BAC2AA]
UTF8ToUTF16: [E100E900ED00F300FA00FC00F100C100C900CD00D300DA00DC00D100BA00AA00]
Assignment: [E100E900ED00F300FA00FC00F100C100C900CD00D300DA00DC00D100BA00AA00]
Terminado (killed)
Code: [Select]
PSERVER:/ibdata/bin # ps aux | grep SmartPPC
firebird  5611  0.1  0.4  54100  4740 pts/9    Sl+  14:56   0:00 ./SmartPPCServer -r -t

But when starting the daemon, something goes wrong:
Code: [Select]
PSERVER:/ibdata/bin # /etc/init.d/smartppcserver start
Starting smartppcserver System encoding: ansi
Source: áéíóúüñÁÉÍÓÚÜѺª [E1E9EDF3FAFCF1C1C9CDD3DADCD1BAAA]
AnsiToUTF8: [3F3F3F3F3F3F3F3F3F3F3F3F3F3F3F3F]
CP1252ToUTF8: [C3A1C3A9C3ADC3B3C3BAC3BCC3B1C381C389C38DC393C39AC39CC391C2BAC2AA]
UTF8ToUTF16: [E100E900ED00F300FA00FC00F100C100C900CD00D300DA00DC00D100BA00AA00]
Assignment: [3F003F003F003F003F003F003F003F003F003F003F003F003F003F003F003F00]
                                                                                         done
Code: [Select]
PSERVER:/ibdata/bin # ps aux | grep SmartPPC
firebird  5859  0.1  0.4  51620  4608 pts/8    Sl   15:44   0:00 /ibdata/bin/SmartPPCServer -r -t

I don't see the difference: the process runs under the same user in both cases.
The daemon is started the standard OpenSuse way:
Code: [Select]
[...]
SMARTPPCSERVER_BIN=/ibdata/bin/SmartPPCServer
SMARTPPCSERVER_OPTIONS="-r -t"
[...]
        /sbin/startproc -u firebird -e $SMARTPPCSERVER_BIN $SMARTPPCSERVER_OPTIONS
[...]

jhvhs

Re: AnsiString to WideString conversion problem
« Reply #7 on: November 20, 2010, 02:40:50 pm »
Hi.

I haven't got an installation of openSUSE, but it seems that the difference between the two executions is in the environment. Most of the environment is suppressed by the -e switch of startproc. Try omitting it.

HTH

zeljko

Re: AnsiString to WideString conversion problem
« Reply #8 on: November 20, 2010, 06:34:28 pm »
OpenSuse is pretty ugly with locales.
What does echo $LANG say in the console?
Try adding writeln(toSomeLogFile,GetEnvironmentVariable('LANG')) and see whether $LANG in the console differs from LANG in the daemon.
If it does, you must e.g. export LANG=en_US.UTF-8 in your daemon start script.
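
A small sketch of that check, written to standard output for brevity (a daemon would write to its own log file instead). DescribeVar and the program name are hypothetical; GetEnvironmentVariable comes from SysUtils. LC_ALL and LC_CTYPE are included on the assumption that they may matter too, since they take precedence over LANG.

```pascal
program LocaleEnvSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: formats one variable as "NAME: value".
// The value part is empty when the variable is unset.
function DescribeVar(const varName: String): String;
begin
  Result := varName + ': ' + GetEnvironmentVariable(varName);
end;

begin
  WriteLn(DescribeVar('LANG'));
  WriteLn(DescribeVar('LC_ALL'));
  WriteLn(DescribeVar('LC_CTYPE'));
  WriteLn(DescribeVar('LC_MESSAGES'));
end.
```

Running the same binary once from the console and once through the start script and comparing the two outputs should show whether the environments differ.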

pumuqui

Re: AnsiString to WideString conversion problem
« Reply #9 on: November 22, 2010, 08:54:10 am »
Seems that I found the culprit(s).

Part of it was the '-e' switch of startproc, but that was not the only cause.

On OpenSuse 11.2, a program sees the following environment variables when run from the command line and when started as a daemon:
Code: [Select]
PSERVER:/home/firebird # ./utftest
LANG: es_ES.ISO-8859-1
LC_ALL:
LC_MESSAGES:
System encoding: iso88591
PSERVER:/home/firebird # /etc/init.d/utftest start
Starting utftest
LANG: es_ES.ISO-8859-1
LC_ALL: POSIX
LC_MESSAGES:
System encoding: ansi
So, when started as a daemon, the LC_ALL variable is set to POSIX, and this is due to some lines in the script /etc/rc.status, which is sourced from the daemon startup script:
Code: [Select]
# Do _not_ be fooled by non POSIX locale
LC_ALL=POSIX
export LC_ALL
Well, the solution would be to set the LC_ALL variable back to an empty string before starting the program. On the other hand, the OpenSuse people included those lines in their standard startup procedure for some reason. Not sure what would be the best way to proceed...
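
The rule behind this behaviour is the standard POSIX precedence: a non-empty LC_ALL overrides LC_CTYPE, which in turn overrides LANG. A minimal sketch of that rule follows; EffectiveCtype is a hypothetical helper, not an RTL function, and the fallback to POSIX when nothing is set is the usual default rather than something stated in this thread.

```pascal
program EffectiveCtypeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: picks the locale that governs character handling,
// following the POSIX precedence LC_ALL > LC_CTYPE > LANG.
function EffectiveCtype(const lcAll, lcCtype, lang: String): String;
begin
  if lcAll <> '' then
    Result := lcAll
  else if lcCtype <> '' then
    Result := lcCtype
  else if lang <> '' then
    Result := lang
  else
    Result := 'POSIX';  // nothing set: typically the "C"/"POSIX" locale
end;

begin
  // Command line case from the post above: only LANG is set.
  WriteLn(EffectiveCtype('', '', 'es_ES.ISO-8859-1'));       // es_ES.ISO-8859-1
  // Daemon case: rc.status has exported LC_ALL=POSIX, which wins over LANG.
  WriteLn(EffectiveCtype('POSIX', '', 'es_ES.ISO-8859-1'));  // POSIX
  // What the current process actually sees:
  WriteLn(EffectiveCtype(GetEnvironmentVariable('LC_ALL'),
                         GetEnvironmentVariable('LC_CTYPE'),
                         GetEnvironmentVariable('LANG')));
end.
```

This matches the two runs shown above: LANG is identical in both, and only the exported LC_ALL changes the encoding the program detects.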

 
