Wow, lots of replies here. This kind of topic seems to have a tendency to become rather heated.
> The only remaining issue for "otoien" is how to read/write non-UTF-8 data. It can be solved easily with WinCPToUTF8() or CP1252ToUTF8() etc., as I wrote earlier.

> This situation is exactly what I hate about the utf8rtl hack. The "ansi" encoding somehow gets lost. Inserting conversions that are not runtime (like cp1252) are hacks and are hopeless if your files are aggregates of complex write() commands. Also it hardcodes an encoding (1252 or whatever, which was not hardcoded (but locale dependent) before, so it is not a direct substitute). ..
Not sure I understand what "hopeless if your files are aggregates of complex write() commands" implies. I specifically want to convert from CP1252 because this is the encoding used by the TP55 application, by my own choice, for the units [for display only, it translates to DOS CP 437/850]. I do not want the codepage of a colleague with, for instance, a Hebrew codepage to be used to read my recorded data. One needs to allow for the situation where data are transported internationally. Also, if I supply an option in the new application to write data files in Windows ANSI (non-UTF-8), I think that will also be done in CP1252. One simply does not want the interpretation of the data (in this case the scientific units) to change depending on where they are read. (For the colleague with the non-CP1252 computer, the solution would be to choose the option to write UTF-8 encoded data files in the new application, as the Windows CP1252 ANSI text would not display correctly on his computer, only be interpreted correctly.)
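To make the point about a fixed source encoding concrete, here is a minimal sketch in Python (used only as a neutral illustration, not Lazarus code). The unit strings and the helper name `read_units` are made up for the example, and CP437 merely stands in for "whatever the reading machine happens to default to"; which characters actually change depends on the pair of code pages involved.

```python
# Bytes as a CP1252-writing program would store two unit strings.
# (Illustrative values: "°C" and "µV" in CP1252.)
stored_units = [b"\xb0C", b"\xb5V"]


def read_units(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode stored unit bytes with an explicitly chosen code page."""
    return raw.decode(codepage)


# Fixed source encoding: the same text on every machine.
fixed = [read_units(raw) for raw in stored_units]

# Hypothetical locale-dependent read, here with a DOS code page:
# the same bytes now turn into box-drawing characters.
drifting = [read_units(raw, "cp437") for raw in stored_units]

# Translating for a CP437 display is a separate, explicit step
# applied to the correctly decoded text.
for_dos_display = [text.encode("cp437") for text in fixed]

print(fixed, drifting, for_dos_display)
```

Decoding with the known source code page and converting for display afterwards keeps the stored data unambiguous, which is the behaviour I am after.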
[As an example of how badly things can go: the first versions of a commercial data acquisition program we used to monitor biorhythms in animals wrote file time stamps in UTC instead of local time. On read, they adjusted the time according to the local time offset of the computer the files were read on. Seems like a perfectly soundly engineered solution? Well, if data recorded here in the US on a day-active animal were read on a European computer, they would be interpreted as if the animal were night-active! Not what one wants for that kind of work.]

> Please read:
> http://wiki.freepascal.org/FPC_Unicode_support#Shortstring
> To make it simple: if EnableUTF8RTL is defined -> UTF-8 encoded by default.
Well I assume that even if my shortstring in the packed record [used to read the binary file] will be considered UTF-8 encoded, if I convert it using CP1252ToUTF8, it will be considered just a stream of bytes that should be converted correctly.
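The idea of treating the shortstring field as a plain stream of bytes can be sketched as follows, again in Python as a neutral illustration rather than Lazarus code. The record layout (a Double followed by a `string[7]`, packed, little-endian) is an assumption invented for the example, not the actual TP55 file format. Python's `struct` format `p` happens to match the shortstring layout of one length byte followed by the characters.

```python
import struct

# Hypothetical packed record: Double + string[7]
# (string[7] = 1 length byte + 7 character bytes), no padding.
RECORD = struct.Struct("<d8p")


def read_record(buf: bytes):
    """Unpack one record and decode the units field from CP1252."""
    value, raw_units = RECORD.unpack(buf)
    # raw_units is just bytes; the source encoding is stated
    # explicitly instead of being taken from the string type.
    return value, raw_units.decode("cp1252")


# Build a sample record the way a CP1252-writing program would.
sample = RECORD.pack(37.5, b"\xb0C")

value, units = read_record(sample)
utf8_bytes = units.encode("utf-8")  # what a UTF-8 string would hold

print(RECORD.size, value, units, utf8_bytes)
```

The conversion only looks at the bytes in the field, so whatever encoding the compiler would otherwise assume for the string type does not enter into it.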
I must say that the situation in Lazarus 1.4.0/FPC 2.6.4 and earlier, with different string encodings in the RTL and the LCL, has been somewhat crazy (but understandable from a historical perspective): one needs to stay aware of where things are called from and keep doing these conversions when handling data internally.
It really seems that the EnableUTF8RTL option is going to make things a lot simpler and cleaner for new applications: one mostly needs to think about possible conversions for different file inputs and perhaps certain data outputs. That is a lot easier to isolate.
One worry I have is what has been written about future development with UTF-16 (I am personally perfectly happy with UTF-8). I really hope this will not be the only option for Windows applications, but that UTF-8 can be selected with switches as the default string encoding, as in the coming version.
I absolutely would not want to write tab-delimited data text files in UTF-16 format. Besides, while my application could change internally, long-term consistency in file formats is imperative in my kind of slow development. Having a split between Unix/Mac on one side and Windows on the other with respect to encoding would also really break cross-platform compatibility.
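As a rough illustration of why the file encoding matters for tab-delimited text, the sketch below (Python, with a made-up data line) encodes the same line both ways. It only shows the byte-level difference and says nothing about what FPC itself will do.

```python
# One made-up line of a tab-delimited data file.
line = "12:00\t37.5\t\u00b0C\n"

as_utf8 = line.encode("utf-8")
as_utf16 = line.encode("utf-16-le")

# In UTF-8 the ASCII part is unchanged, so byte-oriented tools
# can still split on the tab byte.
fields = as_utf8.rstrip(b"\n").split(b"\t")

# In UTF-16 every ASCII character is followed by a zero byte,
# so the file is twice as long and contains NUL bytes.
print(len(as_utf8), len(as_utf16), b"\x00" in as_utf16)
```

Existing readers that expect single-byte tabs and line ends would need changes to handle the UTF-16 variant.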
If there is a possibility that we are going to lose UTF-8 as an optional string default on Windows, then I think this should be clearly stated in the wikis etc., so that new code for file input/output can be prepared with this in mind.
I will have to wait with further consideration/testing until I have a trunk version with FPC 3.x installed.