Duh, UnicodeString as base type.
Duh, and that means only using UTF-16 and again we are stuck with the surrogate pair issue, which FPC doesn't actually help with at all. FPC doesn't know anything about surrogate pairs (as per the recent mailing list conversations).
1. Show me bug reports for specific functionality. Unicode discussions on the mailing list are highly coloured, overly general, and usually not worth the trouble.
2. A lot of 1-byte string usage is not UTF-8 clean either and will need to be cleaned up going forward, with the additional constraint that existing code must keep working (backward compatibility).
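To make both points concrete, here is a minimal sketch in Python (not FPC code; the helper names code_units and has_surrogate_pair are made up for this illustration). It shows that UTF-16 and UTF-8 are both variable-length, so code that assumes one code unit per character breaks in either encoding, just at different code points.

```python
def code_units(text: str) -> dict:
    """Count code points and the code units each encoding needs."""
    return {
        "code_points": len(text),                          # Python strings index by code point
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 2 bytes per UTF-16 code unit
        "utf8_units": len(text.encode("utf-8")),            # 1 byte per UTF-8 code unit
    }


def has_surrogate_pair(text: str) -> bool:
    """True if any code point lies outside the BMP and so needs two UTF-16 units."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # ASCII letter, Latin-1 letter (U+00E9), and an emoji outside the BMP (U+1F600)
    for sample in ("a", "\u00e9", "\U0001F600"):
        print(ascii(sample), code_units(sample), has_surrogate_pair(sample))
```

U+00E9 is one UTF-16 unit but two UTF-8 bytes, while U+1F600 is two UTF-16 units (a surrogate pair) and four UTF-8 bytes. Indexing by code unit is therefore only safe for ASCII in UTF-8 and for the BMP in UTF-16.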
I understand that you want to find some stick to beat UTF-16 with. That is pointless. I don't choose UTF-16 because I think it is superior, but for these reasons:
1. Primarily, Delphi of course. Whatever minor advantages one encoding might have over the other, they are not worth being hampered by incompatibility with an ever-increasing fraction of Delphi users and component builders.
2. The current situation is bad in the sense that with the utf8hack there is no ACS type, and with ACS, UTF-8 is very stilted. This is annoying, though I assume it could be remedied, albeit, again, in a Delphi-compatible way.
3. Yes, the third point is also Delphi related: having a simple test of "compatible or not" saves a lot of discussions and decisions (that turn out to be bad later). The Delphi model is known, flaws and all. Bad choices on a path of our own only emerge over time, which doesn't invite a speedy migration. Moreover, because FPC implementations of major features are often the work of different people in different periods, it avoids the problem that a second, later implementer doesn't know whether something was intended, a temporary shortcut, or an honest mistake.
This is already an increasing problem with FPC extensions (see e.g. the encoding of case of string).
You do also realise I mentioned "developers want to use utf-8 in their applications",
Yeah, and I rejected that in an earlier post in this thread as the result of people confusing API and document encodings, and/or being insensitive to Windows encoding issues.
and using UnicodeString (as bad as the name choice was), is only UTF-16. So no, that is not an option.
You have had 9 years to get over that. Don't you think it is about time you stopped mentioning it in every Unicode post? It is getting old.