What FPCAdds does is set codepage conversion code in its initialization section (see also https://wiki.freepascal.org/Unicode_Support_in_Lazarus#Technical_implementation). And that gives me an idea: why not call these functions at the beginning of your project with the default Windows ANSI codepage? Then you avoid the UTF8 conversion even if LazUTF8 is in "uses". As shown below, this works in my console program (tested on Win 10).
Hello wp,
I have now tested your suggestion (reply #45) extensively, and I have good news!
In the past I faced a lot of problems and disadvantages with unit LazUTF8:
a) On Windows, in console programs, adding unit LazUTF8 changes the charset of the filenames reported by sysutils.FindFirst() and FindNext(). Without unit LazUTF8 they return the Windows charset (ANSI 1252); with unit LazUTF8 they return UTF8. This difference makes life hard.
b) Ditto for all other procedures and functions that deal with filenames or folders.
c) Ditto for the charset of the results of ParamStr().
d) Ditto for the results of readln().
e) Over the decades I have written a number of common libraries for console programs, some of which have problems with these differing charsets, so I get wrong results with unit LazUTF8 because of the changed charset.
f) For a couple of older programs I still use FPC 2.6.4 (with the same common libraries), and in FPC 2.6.4 the above functions never return UTF8 on Windows.
g) In some cases (but not always), writeln() of an 'ansistring' showed damaged characters for Ä Ö Ü ä ö ü ß when I added unit LazUTF8.
I tested all of the above problems with FPC 3.0.4 and 3.3.1 beta on Windows 7, and with one exception all of them are solved by your suggestion.
Many, many thanks to you for that great and very easy-to-use idea!
The exception is c), the results of ParamStr(). I created a small demo for that (attached as a compilable project):
procedure set_charset_WIN;
{switches the 3 RTL codepages, which unit 'LazUTF8' changes to UTF8,
 back to the Windows system ANSI codepage (e.g. 1252); needs "uses windows"}
var winCP: UINT; {dword}
begin
  winCP:=windows.GetACP; {gets the system ANSI codepage}
  SetMultiByteConversionCodePage(winCP);
  SetMultiByteFileSystemCodePage(winCP);
  SetMultiByteRTLFileSystemCodePage(winCP);
end;
procedure Test_ParamStr;
{shows the charset returned by system.ParamStr() OR objpas.ParamStr(),
 depending on the current compiler $mode.
 Usage: start the program in a Windows 7 console with a command line
 parameter like "äöü".
 If the result is "E4 F6 FC len=3" => WINDOWS charset (ANSI 1252) /
 if the result is "C3 A4 C3 B6 C3 BC len=6" => UTF8 charset}
type ansi_1252 = type AnsiString(1252); {Windows charset}
var sa: ansistring;
    sw: ansi_1252;
    ss: string[255];
    i: integer;
begin
  writeln('Results of ParamStr():');
  ss:=ParamStr(1); // type shortstring:
  write(' - string[255] => ');
  for i:=1 to length(ss) do write(HexStr(ord(ss[i]),2), ' ');
  writeln('len=', length(ss));
  sa:=ParamStr(1); // type ansistring:
  write(' - ansistring => ');
  for i:=1 to length(sa) do write(HexStr(ord(sa[i]),2), ' ');
  writeln('len=', length(sa));
  sw:=ParamStr(1); // type AnsiString(1252):
  write(' - ansi(1252) => ');
  for i:=1 to length(sw) do write(HexStr(ord(sw[i]),2), ' ');
  writeln('len=', length(sw));
end;
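As a hedged companion to the hex dump above (not part of the attached demo): FPC 3.x strings also carry a code page tag, which StringCodePage() returns. The following minimal console program, a sketch assuming FPC 3.0 or later on Windows, prints that tag for ParamStr(1) next to the three global code pages that set_charset_WIN() changes. The program name cp_check is made up for illustration; adding LazUTF8 to its uses clause (with or without a call to set_charset_WIN()) would allow comparing the situations from the table below.

```pascal
program cp_check; { hypothetical name, illustration only }
{$mode objfpc}{$H+}
{ Diagnostic sketch, assuming FPC 3.0 or later on Windows:
  prints the code page tag of the ParamStr(1) result together with the
  global code pages set by SetMultiByteConversionCodePage(),
  SetMultiByteFileSystemCodePage() and SetMultiByteRTLFileSystemCodePage(). }
uses
  windows; { for GetACP }
var
  s: RawByteString; { assigning to RawByteString keeps the source code page, no conversion }
begin
  s := ParamStr(1);
  writeln('GetACP                       = ', windows.GetACP);
  writeln('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  writeln('DefaultFileSystemCodePage    = ', DefaultFileSystemCodePage);
  writeln('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
  writeln('StringCodePage(ParamStr(1))  = ', StringCodePage(s));
end.
```

The byte dump shows what is actually stored, while the tag shows what the RTL believes is stored; assignments such as sw:=ParamStr(1) convert based on that tag.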
Info: {$mode objfpc} causes ParamStr() of unit 'objpas' to be used; {$mode TP} causes ParamStr() of unit 'system' to be used.
The results are (both in FPC 3.0.4 and 3.3.1 beta):
{$mode}   unit LazUTF8   call of set_charset_WIN()   charset of 'ss'+'sa'   charset of 'sw'
-------------------------------------------------------------------------------------------
objfpc    without        no                          WIN                    WIN
objfpc    without        yes                         WIN                    WIN
objfpc    with           no                          UTF8                   WIN
objfpc    with           yes                         UTF8                   UTF8
TP        without        no                          WIN                    WIN
TP        without        yes                         WIN                    WIN
TP        with           no                          UTF8                   UTF8
TP        with           yes                         UTF8                   UTF8

We see that, unfortunately, calling set_charset_WIN() never changes the result from UTF8 to WIN. It makes a difference in only one rare case (type 'AnsiString(1252)' in mode objfpc), but 1) there it changes the result to UTF8, which doesn't help me, and 2) I never use type 'AnsiString(1252)' together with ParamStr(), so this case is not of interest.
Do you (or someone else) have an idea how the charset returned by ParamStr() can be switched from UTF8 to WIN when unit LazUTF8 is included (without breaking all the other cases solved above)? I am looking for a "global" solution like the procedure set_charset_WIN() above, which only has to be called once at the start of the program concerned. Of course I'm not keen on adapting every single use of ParamStr() in my programs and libraries individually (more than 200).
Thanks to all for your help!