This situation is exactly what I hate about the utf8rtl hack. The "ansi" encoding somehow gets lost. Inserting conversions that are not determined at runtime (like cp1252) is a hack, and it is hopeless if your files are aggregates of complex write() commands.
Yes, this is the downside. The solution is to encapsulate the read/write code into some "dirty" functions which can be ported to be compatible with Delphi and the future UTF-16 Lazarus solutions.
Also, it hardcodes an encoding (1252 or whatever), which was not hardcoded before (it was locale dependent), so it is not a direct substitute.
I think WinCPToUTF8() and UTF8ToWinCP() functions solve this.
That and the fact it is compatible with both old AND new (unicode, 2009+) Delphi.
Reading/writing non-UTF-8 data from/to files/streams is not compatible. Once the data is converted (in some encapsulated functions), the solution is amazingly compatible with Unicode Delphis.
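To make the "encapsulated functions" idea concrete, here is a minimal sketch of how I would isolate the conversion at the file boundary. It assumes FPC 3.x with the LazUtils package on the unit path and that WinCPToUTF8() and UTF8ToWinCP() come from the LazUTF8 unit. The helper names ReadLegacyText and WriteLegacyText and the file name legacy.txt are made up for illustration. Treat it as one possible shape, not the official way.

```pascal
program LegacyTextIO;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  LazUTF8; // assumed to provide WinCPToUTF8 / UTF8ToWinCP (LazUtils package)

// Hypothetical helper: read a whole file that was written in the system
// ANSI code page and return its content as UTF-8.
function ReadLegacyText(const FileName: string): string;
var
  Stream: TFileStream;
  Raw: string;
begin
  Raw := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Stream.Size);
    if Length(Raw) > 0 then
      Stream.ReadBuffer(Raw[1], Length(Raw)); // raw bytes, no conversion yet
  finally
    Stream.Free;
  end;
  Result := WinCPToUTF8(Raw); // the only place the ANSI -> UTF-8 step happens
end;

// Hypothetical helper: convert UTF-8 text to the system ANSI code page
// and write the resulting bytes to a file.
procedure WriteLegacyText(const FileName, Utf8Text: string);
var
  Stream: TFileStream;
  Raw: string;
begin
  Raw := UTF8ToWinCP(Utf8Text); // the only place the UTF-8 -> ANSI step happens
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Raw) > 0 then
      Stream.WriteBuffer(Raw[1], Length(Raw));
  finally
    Stream.Free;
  end;
end;

var
  S: string;
begin
  WriteLegacyText('legacy.txt', 'plain ASCII survives unchanged');
  S := ReadLegacyText('legacy.txt');
  WriteLn(S);
end.
```

The rest of the program then only ever sees UTF-8 strings. When porting to Delphi or to a future UTF-16 Lazarus, only these two routines would need to change.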
Is the ShortString UTF-8, or does it support the OEM character set of the installed Windows (1252 etc.)? Is the AnsiString type UTF-8, and therefore different from whatever the user has chosen as the default ANSI encoding in his Windows installation, yes or no? Simple questions. If they are, it is unacceptable behavior for a library.
Yes, AnsiString is UTF-8 as is clearly explained in the wiki page. ShortString is UTF-8, too, which is logical IMO, although constant assignment to ShortString has some issues as explained here:
http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals
Anyway, it is clear that the road ahead is not one I'd like to walk. So I'm out for now. Have fun everyone.
Please think about what the alternatives are. FPC 3.x + Lazarus without -dEnableUTF8RTL leads to many problems, as noticed by ChrisF, wp and many others.
The Delphi compatible UTF-16 solution is still years away. This solution is an evolutionary continuation of the UTF-8 solution that LCL already has. Without it, Lazarus would be doomed either to stay on
FPC 2.6.4 with the explicit conversion function hack for a long time, or to repel still more people with seriously broken Unicode support (ask ChrisF and wp for details).
We truly need a working Unicode solution. Competing languages / IDEs have supported it for ages.
The FUD was mostly about the place you wrote your opinions. This thread is otoien's honest question about Unicode support.
You can start new threads about solving problems in the alternative way of using FPC 3.x without -dEnableUTF8RTL. It is perfectly OK. You can co-operate with ChrisF and wp who seem to have the same goal.