AnsiString to WideString conversion problem
Zoran:
--- Quote from: pumuqui on November 18, 2010, 06:36:50 pm ---I don't know why the normal conversion (AnsiToUTF8) doesn't work, since the locale in Linux is set to es_ES.UTF-8 and I thought that AnsiToUTF8 takes the system locale into account when converting strings.
--- End quote ---
Your system locale is "es_ES.UTF-8", that is obviously UTF8 encoding, not cp1252.
And here is what the fpc manual says about AnsiToUtf8:
--- Quote ---The current codepage is fetched from the system, if internationalization support is enabled. It can be UTF-8, in which case the function simply returns S.
--- End quote ---
Your character, on the other hand, was encoded in cp1252, so you needed the CP1252ToUTF8 function. On Windows, AnsiToUTF8 worked for you because cp1252 is your system encoding there.
--- Quote from: pumuqui on November 18, 2010, 06:36:50 pm ---On the other side, a call to GetSystemEncoding returns 'ansi'.
--- End quote ---
Where do you find this function?
--- Quote from: pumuqui on November 18, 2010, 06:36:50 pm ---And another question: I found the function UTF8ToUTF16 in the Lazarus unit lclproc, isn't there a fpc unit offering the same functionality?
--- End quote ---
I am also confused. There is the UTF8Decode function, which, as far as I know, does just that.
pumuqui:
OK, the thing turns out to be a little bit more complicated.
--- Quote ---Your system locale is "es_ES.UTF-8", that is obviously UTF8 encoding, not cp1252.
--- End quote ---
Yes, you are right, but I wrote in my first post that I also tested with the locale set to es_ES.ISO-8859-1, with the same result. For the tests I describe below I changed the locale back to ISO-8859-1.
--- Quote ---Where do you find this function?
--- End quote ---
In the same unit as CP1252ToUTF8, LConvEncoding.
I made some more tests and found out something interesting.
The application where I'm getting those problems is a daemon application using the LazDaemon package. It's a synchronization application that transfers data between a database (which delivers the strings in ISO-8859-1 encoding) and Windows CE devices (where the strings are needed in UTF-16 format).
I changed my main lpr file to include some debug output:
--- Code: ---Program SmartPPCServer;
{$Define _UTFTEST_}
Uses
  {$IFDEF UNIX}
  {$IFDEF UseCThreads}
  CThreads,
  {$ENDIF}
  cwstring,
  {$ENDIF}
  LazDaemon,
  DaemonApp
  { add your units here }
  {$IfDef _UTFTEST_}
  , LConvEncoding, StrUtils, lclproc
  {$EndIf}
  , UnitSmartPPCDaemonMapper, UnitSmartPPCDaemon;
{$IFDEF WINDOWS}{$R SmartPPCServer.rc}{$ENDIF}
{$IfDef _UTFTEST_}
var
  strISO8859_1, strUTF8, hexStr: AnsiString;
  wStr: WideString;
{$EndIf}
begin
  {$IfDef _UTFTEST_}
  WriteLn('System encoding: ' + GetSystemEncoding);
  // ISO8859_1 string: 'áéíóúüñÁÉÍÓÚÜѺª'
  strISO8859_1 := #$E1#$E9#$ED#$F3#$FA#$FC#$F1#$C1#$C9#$CD#$D3#$DA#$DC#$D1#$BA#$AA;
  SetLength(hexStr, Length(strISO8859_1) * 2);
  BinToHex(PChar(strISO8859_1), PChar(hexStr), Length(strISO8859_1));
  WriteLn('Source: ' + strISO8859_1 + ' [' + hexStr + ']');
  strUTF8 := AnsiToUTF8(strISO8859_1);
  SetLength(hexStr, Length(strUTF8) * 2);
  BinToHex(PChar(strUTF8), PChar(hexStr), Length(strUTF8));
  WriteLn('AnsiToUTF8: [' + hexStr + ']');
  strUTF8 := CP1252ToUTF8(strISO8859_1);
  SetLength(hexStr, Length(strUTF8) * 2);
  BinToHex(PChar(strUTF8), PChar(hexStr), Length(strUTF8));
  WriteLn('CP1252ToUTF8: [' + hexStr + ']');
  wStr := UTF8ToUTF16(strUTF8);
  SetLength(hexStr, Length(wStr) * 4);
  BinToHex(PChar(PWideChar(wStr)), PChar(hexStr), Length(wStr) * 2);
  WriteLn('UTF8ToUTF16: [' + hexStr + ']');
  wStr := strISO8859_1;
  SetLength(hexStr, Length(wStr) * 4);
  BinToHex(PChar(PWideChar(wStr)), PChar(hexStr), Length(wStr) * 2);
  WriteLn('Assignment: [' + hexStr + ']');
  {$EndIf}
  Application.Title:='SmartPPCServer';
  Application.Initialize;
  Application.Run;
end.
--- End code ---
When running this application from the command line, everything is as expected:
--- Code: ---firebird@PSERVER:~/bin> ./SmartPPCServer -r -t
System encoding: iso88591
Source: áéíóúüñÁÉÍÓÚÜѺª [E1E9EDF3FAFCF1C1C9CDD3DADCD1BAAA]
AnsiToUTF8: [C3A1C3A9C3ADC3B3C3BAC3BCC3B1C381C389C38DC393C39AC39CC391C2BAC2AA]
CP1252ToUTF8: [C3A1C3A9C3ADC3B3C3BAC3BCC3B1C381C389C38DC393C39AC39CC391C2BAC2AA]
UTF8ToUTF16: [E100E900ED00F300FA00FC00F100C100C900CD00D300DA00DC00D100BA00AA00]
Assignment: [E100E900ED00F300FA00FC00F100C100C900CD00D300DA00DC00D100BA00AA00]
Terminado (killed)
--- End code ---
--- Code: ---PSERVER:/ibdata/bin # ps aux | grep SmartPPC
firebird 5611 0.1 0.4 54100 4740 pts/9 Sl+ 14:56 0:00 ./SmartPPCServer -r -t
--- End code ---
But when starting the daemon, something goes wrong:
--- Code: ---PSERVER:/ibdata/bin # /etc/init.d/smartppcserver start
Starting smartppcserver System encoding: ansi
Source: áéíóúüñÁÉÍÓÚÜѺª [E1E9EDF3FAFCF1C1C9CDD3DADCD1BAAA]
AnsiToUTF8: [3F3F3F3F3F3F3F3F3F3F3F3F3F3F3F3F]
CP1252ToUTF8: [C3A1C3A9C3ADC3B3C3BAC3BCC3B1C381C389C38DC393C39AC39CC391C2BAC2AA]
UTF8ToUTF16: [E100E900ED00F300FA00FC00F100C100C900CD00D300DA00DC00D100BA00AA00]
Assignment: [3F003F003F003F003F003F003F003F003F003F003F003F003F003F003F003F00]
done
--- End code ---
--- Code: ---PSERVER:/ibdata/bin # ps aux | grep SmartPPC
firebird 5859 0.1 0.4 51620 4608 pts/8 Sl 15:44 0:00 /ibdata/bin/SmartPPCServer -r -t
--- End code ---
I don't see what differs between the two runs; the process runs under the same user in both cases.
The daemon is started the standard OpenSuse way:
--- Code: ---[...]
SMARTPPCSERVER_BIN=/ibdata/bin/SmartPPCServer
SMARTPPCSERVER_OPTIONS="-r -t"
[...]
/sbin/startproc -u firebird -e $SMARTPPCSERVER_BIN $SMARTPPCSERVER_OPTIONS
[...]
--- End code ---
jhvhs:
Hi.
I haven't got an installation of openSUSE, but it seems that the difference between the two executions is in the environment. Most of the environment is suppressed by the -e switch of startproc. Try omitting it.
HTH
zeljko:
OpenSuse is pretty ugly with locales.
What does echo $LANG print in the console?
Try adding writeln(toSomeLogFile,GetEnvironmentVariable('LANG')) to your program and check whether $LANG in the console differs from LANG as seen by the daemon.
If it does, you have to set it yourself in your daemon start script, e.g. export LANG=en_US.UTF-8.
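The same comparison can also be made from the shell side, without touching the Pascal code. This is only a minimal sketch, assuming a POSIX shell with the usual env, grep and sort tools; dump_locale_env is a hypothetical helper name, not something provided by openSUSE or Lazarus.

```shell
# Hypothetical helper: list the locale-related variables that are exported,
# i.e. the ones a child process started from this shell would inherit.
dump_locale_env() {
  env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' | sort
}

# Print them for the current shell.
dump_locale_env
```

Running it once in the console and once from the start script just before the daemon is launched (redirected to a log file of your choice) gives two listings that can be compared. It shows what the script would pass on; a launcher that strips the environment, as described above for the -e switch, can still hand the daemon less than that.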
pumuqui:
Seems that I found the culprit(s).
Part of it was the '-e' switch of startproc, but that was not the only cause.
In OpenSuse 11.2, these are the environment variables a program sees when it is run from the command line and when it is started as a daemon:
--- Code: ---PSERVER:/home/firebird # ./utftest
LANG: es_ES.ISO-8859-1
LC_ALL:
LC_MESSAGES:
System encoding: iso88591
PSERVER:/home/firebird # /etc/init.d/utftest start
Starting utftest
LANG: es_ES.ISO-8859-1
LC_ALL: POSIX
LC_MESSAGES:
System encoding: ansi
--- End code ---
So, when started as a daemon, the LC_ALL variable is set to POSIX, and this is due to some lines in the script /etc/rc.status, which is sourced from the daemon startup script:
--- Code: ---# Do _not_ be fooled by non POSIX locale
LC_ALL=POSIX
export LC_ALL
--- End code ---
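This matches the standard POSIX precedence for locale variables: a non-empty LC_ALL overrides the individual LC_* categories, which in turn override LANG. That is why LANG can still read es_ES.ISO-8859-1 while the daemon behaves as if it ran in the POSIX locale. A small sketch of that rule, assuming a POSIX shell; effective_ctype is a hypothetical helper written for illustration, and it only mirrors the precedence rule, not what GetSystemEncoding does internally.

```shell
# Hypothetical helper: effective_ctype LC_ALL LC_CTYPE LANG
# Returns the locale name that wins for character handling according to
# the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG, else "POSIX".
effective_ctype() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"
  else
    printf '%s\n' "POSIX"
  fi
}

effective_ctype ""      "" "es_ES.ISO-8859-1"   # console run: es_ES.ISO-8859-1
effective_ctype "POSIX" "" "es_ES.ISO-8859-1"   # daemon run:  POSIX
```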
Well, the solution would be to set the LC_ALL variable back to an empty string before starting the program. On the other hand, the OpenSuse people included those lines in their standard startup procedure for some reason. I'm not sure what the best way to proceed would be...
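One way to keep the change small would be to drop LC_ALL only for the daemon process and leave the rest of the start script under the POSIX locale that rc.status sets. The following is a sketch under that assumption, for a POSIX shell; run_without_lc_all is a hypothetical helper and not part of rc.status or startproc.

```shell
# Hypothetical helper: run one external command with LC_ALL removed from
# its environment. The subshell keeps the change local, so the calling
# script still sees whatever LC_ALL it had before.
run_without_lc_all() {
  (
    unset LC_ALL
    exec "$@"
  )
}

# Example: the child no longer sees LC_ALL, the caller still does.
LC_ALL=POSIX
export LC_ALL
run_without_lc_all sh -c 'echo "child: LC_ALL=${LC_ALL-unset}"'
echo "caller: LC_ALL=$LC_ALL"
```

In the start script the helper would wrap the line that launches the daemon, with the launcher called without the -e switch so that LANG is passed on. Whether this is acceptable depends on why rc.status forces the POSIX locale; presumably it is meant to keep the output of the script's own tools predictable, which this approach leaves untouched.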