Thanks, rvk, for your post, but there was a misunderstanding: the problem is not the additional "0" at the end. The result is wrong because function UnicodeStringToUCS4String() should split the input into its characters (not bytes). That means an input such as the character "ä" (2 bytes: "195 164") should be converted into 1 value, not 2 values.
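To make the expected behavior concrete, here is a minimal sketch of decoding UTF-8 bytes into one code point per character. It is not the RTL's UnicodeStringToUCS4String() (which takes a UnicodeString); DecodeUtf8 and TCodePoints are hypothetical names, and the sketch assumes well-formed UTF-8 input.

```pascal
program Utf8DecodeSketch;
{$mode objfpc}{$H+}

type
  TCodePoints = array of LongWord;

{ Hypothetical helper, not an RTL routine: decodes UTF-8 bytes into one
  code point per character. Assumes well-formed UTF-8 input; malformed
  sequences are not detected. }
function DecodeUtf8(const bytes: array of Byte): TCodePoints;
var
  i, n, extra: Integer;
  b: Byte;
  cp: LongWord;
begin
  SetLength(Result, Length(bytes)); // upper bound: one code point per byte
  n := 0;
  i := 0;
  while i <= High(bytes) do
  begin
    b := bytes[i];
    // The lead byte tells how many continuation bytes follow
    if b < $80 then begin cp := b; extra := 0; end
    else if b < $E0 then begin cp := b and $1F; extra := 1; end
    else if b < $F0 then begin cp := b and $0F; extra := 2; end
    else begin cp := b and $07; extra := 3; end;
    Inc(i);
    // Each continuation byte contributes its low 6 bits
    while (extra > 0) and (i <= High(bytes)) do
    begin
      cp := (cp shl 6) or (bytes[i] and $3F);
      Inc(i);
      Dec(extra);
    end;
    Result[n] := cp;
    Inc(n);
  end;
  SetLength(Result, n); // shrink to the actual character count
end;

var
  cps: TCodePoints;
  k: Integer;
begin
  // 'A', 'ä' (195 164) and 'ß' (195 159): 5 bytes, 3 characters
  cps := DecodeUtf8([65, 195, 164, 195, 159]);
  WriteLn('code points: ', Length(cps));
  for k := 0 to High(cps) do
    Write(cps[k], ' ');
  WriteLn;
end.
```

This prints 3 code points, `65 228 223`: the two bytes 195 164 collapse into the single value 228 ($E4, U+00E4 "ä"), which is the one-value-per-character result described above.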
Thanks a lot, Martin_fr, for your reply. You are right: UnicodeString is not UTF8String, and I did not pay attention to that. As you can see from the output above, string 's' obviously is in UTF-8, not Unicode.
As recommended, I added {$codepage utf8} to the top of my source. But after that, string 's' contained what looks like the Windows character set (ANSI 1252?), which is very strange because I am currently on Linux:
len(s)=14
41 42 20 E4 F6 FC DF 20 C4 D6 DC 20 31 32
len(z)=15
41 42 20 E4 F6 FC DF 20 C4 D6 DC 20 31 32 00

Info: my source file was already in UTF-8.
Then I changed my code to:
...
var
  s0: UTF8String;
  s: UnicodeString;
begin
  s0 := 'AB äöüß ÄÖÜ 12'; // should be stored as UTF-8
  s := UnicodeString(s0); // should convert to Unicode to make UnicodeStringToUCS4String() happy
...
but the result again was:
len(s)=21
41 42 20 C3 A4 C3 B6 C3 BC C3 9F 20 C3 84 C3 96 C3 9C 20 31 32
len(z)=22
41 42 20 C3 A4 C3 B6 C3 BC C3 9F 20 C3 84 C3 96 C3 9C 20 31 32 00

Does anybody know why the above s:=UnicodeString(s0); does not convert from UTF-8 to Unicode?