I found it helpful to put my UTF-8 and UTF-16 test data into files on disk, which I could check with multiple editors to make sure they were in particular formats and encodings before involving FPC (or Delphi). For me, that obviates the need to declare string constants with complex encodings and to worry about whether Lazarus etc. is modifying the contents as intended. Please see attached -- a selection of 1-2 KB files. The file extension is .utx so you can associate the files with the editor of your choice. Note that tests where the file name itself includes non-Latin letters can prove more difficult, which is why a couple of the file *names* are in Russian and Arabic.
The downside to this approach is that the file I/O has to be rock solid, and that is where I am stuck with my FPC 3.0 tests, which is why I wanted to jump in here. One thing in particular seems "wrong" compared to Delphi's behavior:
* TStringList.LoadFromFile, on a file that starts with a UTF-16 Byte Order Mark ("BOM"), does not seem to be able to load that data directly into a string (UnicodeString). The length of the result is roughly twice what I would expect. Measured with Length(), Delphi gives 21 and FPC gives 43. This is for the attached chinese.16.utx file.
A quote from the Delphi docwiki: "If the Encoding parameter is not given, then the strings are loaded using the appropriate encoding. The value of the encoding is obtained by calling the GetBufferEncoding routine of the TEncoding class." ( http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/Classes_TStrings_LoadFromFile@string.html )
In terms of sanity checking the contents of files, I can highly recommend two editors: TedNPad (for very quick display of encoding and BOM on the status bar) and Unipad (for excellent display of details about individual chars).
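For anyone without those editors, the BOM can also be checked from a few lines of Pascal that do not go through TStringList at all. This is only a minimal sketch: BomName is a hypothetical helper of mine, not an RTL routine, and it looks for just the UTF-8 and UTF-16 marks (a UTF-32LE BOM would be reported as UTF-16LE).

```pascal
program BomCheck;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: names the BOM, if any, at the start of a byte buffer.
function BomName(const B: array of Byte): string;
begin
  Result := 'none';
  if (Length(B) >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := 'UTF-8'
  else if (Length(B) >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := 'UTF-16LE'
  else if (Length(B) >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := 'UTF-16BE';
end;

var
  fs: TFileStream;
  head: array[0..2] of Byte;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: bomcheck <file>');
    Halt(1);
  end;
  fs := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    // Zero-fill first so a file shorter than 3 bytes cannot match by accident
    FillChar(head, SizeOf(head), 0);
    fs.Read(head, SizeOf(head));
    WriteLn('BOM: ', BomName(head), ', size in bytes: ', fs.Size);
  finally
    fs.Free;
  end;
end.
```

Run against chinese.16.utx, this should at least confirm what the editors show before any string conversion gets involved.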
This is my function for loading via TStringList - confirmed working in Unicode Delphi for quite a few years:
function TTest_ucLogFil.TStringList_File_To_String(
  const InFilespec: string): string;
var
  y: TStringList;
begin
  y := nil;
  Result := '';
  try
    y := TStringList.Create;
    y.LoadFromFile(InFilespec);
    // strip the trailing line break that Text appends, which was not in
    // the disk file (CRLF on Windows, LF on other platforms)
    Result := Copy(y.Text, 1, Length(y.Text) - Length(sLineBreak));
  finally
    FreeAndNil(y);
  end;
end;
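As a cross-check that is independent of TStringList, the raw bytes can be copied into a UnicodeString by hand and measured with Length(). The following is a rough sketch under stated assumptions, not a proposed fix: Utf16LEToString is a hypothetical helper, it assumes a UTF-16LE file and a little-endian machine, and it does no validation of surrogate pairs.

```pascal
program Utf16Len;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: copies UTF-16LE bytes into a UnicodeString,
// skipping a leading FF FE byte order mark if one is present.
// Assumes a little-endian host, so code units can be copied as-is.
function Utf16LEToString(const B: array of Byte): UnicodeString;
var
  start, count: Integer;
begin
  Result := '';
  start := 0;
  if (Length(B) >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    start := 2;
  count := (Length(B) - start) div 2;  // number of 16-bit code units
  SetLength(Result, count);
  if count > 0 then
    Move(B[start], Result[1], count * 2);
end;

var
  fs: TFileStream;
  raw: array of Byte;
  s: UnicodeString;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: utf16len <file>');
    Halt(1);
  end;
  fs := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    SetLength(raw, fs.Size);
    if Length(raw) > 0 then
      fs.ReadBuffer(raw[0], Length(raw));
  finally
    fs.Free;
  end;
  s := Utf16LEToString(raw);
  WriteLn('UTF-16 code units: ', Length(s));
end.
```

The count printed here includes any line break stored in the file, so it can be compared with the Length() figures above once that is taken into account.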
To summarize, FPC 3 is not loading the chinese.16.utx file the way I think it should.