I meant to explain that the constant cs is by default a system-code-page string (not a RawByteString), but then your first message tells us that it actually is a RawByteString (code page 0). So I am also confused...
What first message? The constant string is not a RawByteString. It is a UTF-8 string, because the source file is encoded as UTF-8.
However, the compiler interprets it as having the system code page, and then does a wrong conversion from the system code page to UTF-8.
{$codepage utf8} or -FcUTF8 overrides that: the compiler then treats constant strings as UTF-8, and the last assignment works correctly.
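To make the wrong conversion concrete, here is a small Python model of it. It is a simulation, not Free Pascal code, and it assumes the system code page is Windows-1252 (cp1252) and that the "last assignment" targets a UTF-8 string variable (the original code is not shown). The literal 'ä' is stored in the UTF-8 source file as two bytes; reading those bytes as cp1252 and re-encoding them as UTF-8 produces four bytes of mojibake.

```python
# Python model of the compile-time conversion; NOT Free Pascal code.
# Assumption: the system (ANSI) code page is Windows-1252 ("cp1252").

SOURCE_BYTES = "ä".encode("utf-8")  # the literal 'ä' as stored in a UTF-8 source file


def literal_to_utf8(source_bytes: bytes, assumed_codepage: str) -> bytes:
    """Model assigning a string constant to a UTF-8 string variable.

    The literal's bytes are decoded with the code page the compiler *believes*
    the source has, then re-encoded as UTF-8.
    """
    text = source_bytes.decode(assumed_codepage)
    return text.encode("utf-8")


# Default: compiler assumes the system code page, so it converts wrongly.
wrong = literal_to_utf8(SOURCE_BYTES, "cp1252")
# With {$codepage utf8} or -FcUTF8: compiler knows the literal is UTF-8.
right = literal_to_utf8(SOURCE_BYTES, "utf-8")

print(wrong)  # b'\xc3\x83\xc2\xa4'  (decodes to "Ã¤")
print(right)  # b'\xc3\xa4'          (decodes to "ä")
```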
The new UTF-8 support in Lazarus works without {$codepage utf8} or -FcUTF8 most of the time. Why? It is rather counter-intuitive.
The reason is that the default String encoding is switched to UTF-8 at run time, yet the constants are evaluated at compile time.
So the compiler (wrongly) thinks the constant String is encoded in the system code page. Then it sees a String variable with the default encoding (which will be changed to UTF-8 at run time, but the compiler does not know that). Both sides have the same default encoding, so no conversion is needed: the compiler happily copies the bytes as they are and everything goes right, even though it was actually fooled twice in the process.
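The "fooled twice" case can be sketched with the same kind of Python model (again a simulation, not Free Pascal code). Assumptions: the system code page is cp1252, the compiler converts only when the two code pages it sees differ, and Lazarus sets the default String code page to UTF-8 when the program starts. The last lines show that declaring the variable as UTF8String makes the code pages differ at compile time and brings the wrong conversion back.

```python
# Python model of the "fooled twice" case; NOT Free Pascal code.
# Assumptions: system code page is cp1252; Lazarus switches the default
# String code page to UTF-8 at program start-up.

def assign_constant(source_bytes: bytes, const_cp: str, var_cp: str) -> bytes:
    """Compile-time model: convert only when the code pages differ."""
    if const_cp == var_cp:
        return source_bytes  # same encoding: bytes copied unchanged
    return source_bytes.decode(const_cp).encode(var_cp)


SOURCE_BYTES = "ä".encode("utf-8")  # literal stored in a UTF-8 source file

# Compile time: the constant AND the plain String variable both look like
# the system code page, so no conversion happens.
stored = assign_constant(SOURCE_BYTES, "cp1252", "cp1252")

# Run time: the default String code page is now UTF-8, so the untouched
# bytes are read as UTF-8 and come out right.
RUNTIME_DEFAULT_CP = "utf-8"
print(stored)                                   # b'\xc3\xa4'
print(stored.decode(RUNTIME_DEFAULT_CP) == "ä")  # True

# A UTF8String variable has a different code page than the (assumed) system
# code page, so the compiler converts, and the result is wrong again.
as_utf8string = assign_constant(SOURCE_BYTES, "cp1252", "utf-8")
print(as_utf8string)                            # b'\xc3\x83\xc2\xa4'
```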
Here are some details about the issue:
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals
This was another example of why not to use UTF8String. At best it leads to useless encoding checks in the generated code, and at worst to a wrong conversion.
People are asking which String type is future-proof. It is plain String. Just pretend it is Delphi-compatible and it works like magic most of the time.
The 2 exceptions are:
1. Input/output string data is in the system encoding. It must then be converted explicitly.
2. Dealing with individual codepoints beyond the ASCII range. Fortunately, that is rarely needed in "normal", typical programming.
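Both exceptions can be illustrated with a short Python sketch (not Free Pascal code; it assumes a cp1252 system code page and a hypothetical legacy file containing "Grüße"). The first part shows the explicit conversion of system-encoded bytes to UTF-8. The second shows why individual codepoints need care: in Free Pascal, Length and S[i] on a UTF-8 String work on bytes, which the byte-level operations below mirror.

```python
# Python illustration of the two exceptions; NOT Free Pascal code.
# Assumption: the system code page is Windows-1252 ("cp1252").

# 1. Data in the system encoding must be converted explicitly.
from_file = b"Gr\xfc\xdfe"  # hypothetical legacy data: "Grüße" as cp1252 bytes
as_utf8 = from_file.decode("cp1252").encode("utf-8")
print(as_utf8)              # b'Gr\xc3\xbc\xc3\x9fe'

# 2. A UTF-8 string is a sequence of bytes, so byte length and byte
#    indexing stop matching codepoints once non-ASCII characters appear.
text = as_utf8.decode("utf-8")
print(len(as_utf8), len(text))  # 7 5  (bytes vs. codepoints)
print(hex(as_utf8[2]))          # 0xc3: only the first byte of 'ü'
```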