Correct me if I am wrong, but basically Indy will decode any bytes it reads using the encoding defined by
IOHandler.DefStringEncoding
and then convert them to UTF-16 no matter what. Then, if no encoding has been defined, the output string returned by ReadLn() will use osDefault, or, if defined via
IOHandler.DefAnsiEncoding
the given encoding?
When reading an incoming string, the bytes are decoded straight to UTF-16 using DefStringEncoding (unless overridden by the reading method's optional AByteEncoding parameter). By default, DefStringEncoding is ASCII. If no encoding is specified, the IdGlobal.GIdDefaultTextEncoding encoding is used (also ASCII by default).
If the string being returned is an AnsiString, the UTF-16 is converted to Ansi using DefAnsiEncoding (unless overridden by the reading method's optional ADestEncoding parameter). By default, DefAnsiEncoding is OSDefault (user's locale on Windows, UTF-8 on other systems). If no encoding is specified, the IdGlobal.GIdDefaultTextEncoding encoding is used.
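To make the reading path concrete, here is a minimal sketch using Indy 10's TIdTCPClient (the host and port are hypothetical; DefStringEncoding, ReadLn's AByteEncoding parameter, and IndyTextEncoding_UTF8 are from IdGlobal / the IOHandler as described above):

```pascal
uses
  IdTCPClient, IdGlobal;

var
  Client: TIdTCPClient;
  Line: string;
begin
  Client := TIdTCPClient.Create(nil);
  try
    Client.Host := 'example.com'; // hypothetical server
    Client.Port := 1234;
    Client.Connect;

    // the bytes -> UTF-16 decode uses DefStringEncoding...
    Client.IOHandler.DefStringEncoding := IndyTextEncoding_UTF8;
    Line := Client.IOHandler.ReadLn;

    // ...or can be overridden per call via the AByteEncoding parameter:
    Line := Client.IOHandler.ReadLn(IndyTextEncoding_UTF8);
  finally
    Client.Free;
  end;
end;
```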
The reverse is true with sending strings:
When sending an outgoing string, the characters are encoded from UTF-16 to bytes using DefStringEncoding (unless overridden by the sending method's optional AByteEncoding parameter). If no encoding is specified, the IdGlobal.GIdDefaultTextEncoding encoding is used.
If the string being sent is an AnsiString, the Ansi data is converted to UTF-16 using DefAnsiEncoding (unless overridden by the sending method's optional ASrcEncoding parameter). If no encoding is specified, the IdGlobal.GIdDefaultTextEncoding encoding is used.
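The sending direction can be sketched the same way (a hypothetical helper; WriteLn's AByteEncoding parameter is the per-call override mentioned above):

```pascal
uses
  IdTCPClient, IdGlobal;

// Assumes Client is already connected.
procedure SendUtf8Line(Client: TIdTCPClient; const S: string);
begin
  // the UTF-16 -> bytes encode uses DefStringEncoding...
  Client.IOHandler.DefStringEncoding := IndyTextEncoding_UTF8;
  Client.IOHandler.WriteLn(S);

  // ...or can be overridden per call via the AByteEncoding parameter:
  Client.IOHandler.WriteLn(S, IndyTextEncoding_UTF8);
end;
```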
So, in short: DefStringEncoding is the encoding I am expecting from the server, DefAnsiEncoding is the encoding of the string returned by ReadLn(), and in between it will always be UTF-16 (as I am guessing that is the only encoding that would allow no loss)?
Basically, yes.
To be fair, my UI doesn't expect anything, as I started coding it on Linux and everything has always been fine (including Kanji) without changing the default charset for any of the components. I guess that is because the default charset on Linux supports it. So it has been a tricky problem only on Windows.
Welcome to the wonderful world of legacy Ansi handling
All the more reason why the majority of the world has moved to UTFs. If you recompile Indy with {$MODE DelphiUnicode} or {$MODESWITCH UnicodeStrings} enabled in IdCompilerDefines.inc, the String type becomes UnicodeString, and DefAnsiEncoding and the ASrcEncoding/ADestEncoding parameters disappear. So things will probably start working OK (I haven't tested it, though).
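For reference, the change in IdCompilerDefines.inc would look something like this (a sketch only; the exact placement and surrounding conditionals differ between Indy versions):

```pascal
// In IdCompilerDefines.inc, for a FreePascal build:
{$IFDEF FPC}
  {$MODE DelphiUnicode}        // String becomes UnicodeString
  // or, alternatively:
  // {$MODESWITCH UnicodeStrings}
{$ENDIF}
```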
My code on Linux doesn't define either DefStringEncoding or DefAnsiEncoding (which I have just discovered tonight); it just works.
Even though DefAnsiEncoding defaults to UTF-8 on Linux, I would expect data loss if you don't also set DefStringEncoding to UTF-8 to match, since DefStringEncoding is always US-ASCII by default, so Kanji and such would be lost before being converted to Ansi.
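A sketch of matching the two encodings so nothing is lost in the bytes -> UTF-16 step (IOHandler here is assumed to belong to an already-connected Indy component):

```pascal
// Make the wire encoding agree with the AnsiString encoding:
IOHandler.DefStringEncoding := IndyTextEncoding_UTF8; // bytes on the wire
IOHandler.DefAnsiEncoding   := IndyTextEncoding_UTF8; // AnsiString payload
```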