German has 7 special characters, the "Umlaute" and the sharp s: "Ä Ö Ü ä ö ü ß". They can occur in filenames.
If I write a simple console program (I use FPC 3.0.4 on Windows 7), then FindFirst / FindNext return filenames in the ANSI charset:
unit unit1;

{$mode objfpc}{$H+}

interface

procedure showfiles(pattern: ansistring);

implementation

uses sysutils;

function hexString(s: ansistring): ansistring;
{returns 's' as a hex-string}
var
  z: ansistring;
  i: longint;
begin
  z := '';
  for i := 1 to length(s) do
    z := z + ' ' + hexStr(ord(s[i]), 2);
  exit(z);
end;

procedure showfiles(pattern: ansistring);
{shows all files which match 'pattern'}
var
  SR: TSearchRec;
begin
  if FindFirst(pattern, faAnyfile, SR) = 0 then
    repeat
      writeln(SR.Name, hexString(SR.Name));
    until FindNext(SR) <> 0;
  FindClose(SR);
end;

end.
program project1;

{$mode objfpc}{$H+}

uses unit1;

begin
  showfiles('d:\tst\xx_*.*');
end.
The result for a file with special characters (file is attached) is ANSI, one byte per special character:
>project1.exe
xx_äöüÄÖÜ.txt 78 78 5F E4 F6 FC C4 D6 DC 2E 74 78 74
But as soon as I write a (minimal) GUI application using the same "unit1", FindFirst / FindNext return filenames in UTF-8 instead:
program project2;

{$mode objfpc}{$H+}
{$apptype console} {necessary for writeln}

uses
  Interfaces, // this includes the LCL widgetset
  unit1;

begin
  showfiles('d:\tst\xx_*.*');
end.
The result for the same file with special characters is now UTF-8, two bytes per special character:
>project2.exe
xx_äöüÄÖÜ.txt 78 78 5F C3 A4 C3 B6 C3 BC C3 84 C3 96 C3 9C 2E 74 78 74
So I have 2 questions:
- What is the minimal unit that "switches" the charset returned by FindFirst / FindNext from ANSI to UTF-8?
- Is there a way (e.g. a function, global variable or conditional) to determine in a common unit (like "unit1") whether this "switching" unit is used anywhere in the whole program, so that FindFirst / FindNext return UTF-8 instead of ANSI?
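A possible starting point for the second question, as a minimal sketch only: FPC 3.x has the global variables DefaultSystemCodePage and DefaultRTLFileSystemCodePage and the constant CP_UTF8 in the System unit. Assuming that FindFirst / FindNext encode SR.Name according to DefaultRTLFileSystemCodePage (this is an assumption and not confirmed here), a common unit could test that variable at run time. The program name "cpcheck" and the helper "NamesAreUTF8" are hypothetical names used only for illustration.

program cpcheck;

{$mode objfpc}{$H+}

function NamesAreUTF8: boolean;
{hypothetical helper: true if the RTL file system code page is UTF-8;
 that this also describes SR.Name is an assumption}
begin
  Result := DefaultRTLFileSystemCodePage = CP_UTF8;
end;

begin
  {print the current code page settings of the RTL}
  writeln('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  writeln('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
  writeln('names are UTF-8: ', NamesAreUTF8);
end.

Building this once as it is and once with "Interfaces" added to a uses clause (as in project2) should show whether the values differ between the two cases.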
I'm new to character sets and code pages. Thanks in advance. I attached my 2 small projects and a demo file with German special characters in the filename.