[...] Can you ChrisF please test more and update the page with tested examples. [...]
As far as I've tested, all three of these solutions work (BTW, they are general ANSI <-> UTF8 conversions, not limited to text file data):
1/ Change the code page of the variable at run time:

{$if (FPC_FULLVERSION>=20701) and defined(EnableUTF8RTL)}
SetCodePage(rawbytestring(StrIn), 1252, false); // 1252 always, or Windows.GetACP()
{$ifend}
StrOut := SysToUTF8(StrIn);

2/ Declare the variable with a static code page:

{$if (FPC_FULLVERSION>=20701) and defined(EnableUTF8RTL)}
type
  string1252 = type ansistring(1252);
{$ifend}
...
{$if (FPC_FULLVERSION>=20701) and defined(EnableUTF8RTL)}
var StrIn: string1252;
{$else}
var StrIn: string;
{$ifend}
...
StrOut := SysToUTF8(StrIn);

3/ Use the explicit CP1252 conversion function:

var StrIn: string; // or rawbytestring (OK in both cases apparently, but rawbytestring type needs conditional code)
...
StrOut := CP1252ToUTF8(StrIn); // 1252 always, of course

Personally, I prefer the first one, because the same generic source code can be used for any Windows code page (provided you know the value of that code page). Some new generic conversion functions could even be created, such as (it's only a proposal):
{$if (FPC_FULLVERSION>=20701) and defined(EnableUTF8RTL)}
function AnsiCPChange(const s: string): string;
begin
  result := s;
  // Only if code page for string variable has not been already set (by a former AnsiCPChange call, or by user)
  if StringCodePage(s) = CP_UTF8 then
    if DefaultSystemCodePage = CP_UTF8 then
      SetCodePage(rawbytestring(result), Windows.GetACP, false)
    else
      // In case DefaultSystemCodePage has been changed by user
      SetCodePage(rawbytestring(result), DefaultSystemCodePage, false);
end;

function SysToUTF8Ext(const s: string): string;
begin
  result := SysToUTF8(AnsiCPChange(s));
end;

function UTF8ToSysExt(const s: string): string;
begin
  result := AnsiCPChange(UTF8ToSys(s));
end;
{$ifend}
Or could the existing SysToUTF8/UTF8ToSys functions in LazUTF8 possibly be modified instead? That would be great if possible, as it would mean that no change is required in users' current source code (i.e. current function calls would still be OK).
Concerning AnsiToUTF8 and UTF8ToAnsi, I'm afraid they no longer work anyway; not without modifying the code page of ANSI string variables directly in the user's source code.
[...] Your case of a text file with Windows encoding is an exception and requires a little bit of extra code. [...]
Well, if I'm correct, EVERY existing call (i.e. written for up to Lazarus 1.4/Free Pascal 2.6.4) to the following conversion functions needs to be carefully examined in current source code, and most probably modified, whatever the origin of the ANSI data (not only text files, I mean): ANSIToUTF8 and UTF8ToANSI, SysToUTF8 and UTF8ToSys.
I can't imagine any case in which these existing function calls would still work properly with Lazarus 1.5+/Free Pascal 2.7.1+, since string variables containing ANSI data are now always identified as strings with a UTF8 code page by default. Except for ASCII-only text, of course.
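To illustrate why the code page attached to a string matters here, the following is a minimal sketch, not taken from the LCL sources. It assumes a Free Pascal version with code page aware strings (2.7.1 or later); the program name and variable names are made up for the example. It only shows that SetCodePage with Convert = false relabels the bytes without touching them, and that a later assignment to a UTF8String is converted according to that label.

```pascal
program CodePageLabelSketch; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring; // widestring manager, needed for code page conversions on Unix
{$endif}

var
  Raw: RawByteString;
  U: UTF8String;
begin
  // A single byte $E9: "e acute" in Windows-1252, not a valid UTF-8 sequence.
  SetLength(Raw, 1);
  Raw[1] := #$E9;

  // Relabel only (Convert = false): the byte is kept, only the code page changes.
  SetCodePage(Raw, 1252, false);
  WriteLn('code page: ', StringCodePage(Raw), ', length in bytes: ', Length(Raw));

  // Assigning to a UTF8String is expected to convert from the labelled
  // code page (1252 here) to UTF-8.
  U := Raw;
  WriteLn('length in bytes after conversion: ', Length(U));
end.
```

If the relabelling step is left out, the conversion starts from whatever code page the string happens to carry, which is the situation described above for ANSI data that is identified as UTF8 by default.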
In addition:
1/ If DefaultSystemCodePage is now always set to CP_UTF8, it would be nice to have at least one new variable that actually contains the Windows code page, named OSDefaultSystemCodePage or something similar (on the other OSes, OSDefaultSystemCodePage = DefaultSystemCodePage).
It would break Delphi compatibility (Delphi has only one DefaultSystemCodePage value), and it would require source code changes, but at least this new variable could be used in those changes (with conditional directives however, depending on the LCL version).
2/ In new programs (i.e. not talking about the ANSI conversion problem for existing source code), how is it possible to declare a "generic" ANSI string type with LCL 1.5+? I mean a string type with the Windows code page as its static code page value: neither "ansistring" nor "type ansistring(CP_ACP)" works, as CP_ACP now means UTF8.
So, how can we declare a "realansistring" type, usable with any Windows code page (corresponding, of course, to the Windows OS of the target computer)?
NB.
[...] It is important to remember that the "better" Unicode support in LCL makes everything easier when the input data is already UTF-8. [...]
I'm certainly not arguing against "better" Unicode support in the LCL/RTL. I just wanted to identify potential compatibility issues with the new LCL/FPC versions, at least as they are currently implemented in the corresponding trunk versions.