Author Topic: How to determine the unknown codepage of a textfile?  (Read 3429 times)

LeP

  • Full Member
  • ***
  • Posts: 203
Re: How to determine the unknown codepage of a textfile?
« Reply #15 on: February 08, 2026, 07:31:02 pm »
Quote
As said, (nearly) every text editor faces this problem, so there must be solutions.

I'm not convinced by that assumption. I have used Notepad++ (which I think is among the most flexible text editors), and several times I had to force the encoding to get the correct character set recognized. Even recently, when converting a localized (DOS-style) text file to UTF-8, I had to force the recognition.
Of course, with a much narrower field of choices (2 or 3 codepages) it might be easier to "get it right"...

Roland57

  • Hero Member
  • *****
  • Posts: 586
    • msegui.net
Re: How to determine the unknown codepage of a textfile?
« Reply #16 on: February 08, 2026, 07:53:25 pm »
@Hartmut

Here is something I made while trying to solve a similar problem.

It's an application that tries to detect the encoding of a file, using four methods:

  • GuessEncoding function (from the LConvEncoding Lazarus unit)
  • file command
  • uchardet command
  • Charset Detector library (written in Pascal)

It doesn't seem to work with CP850, but maybe my sample files are not good.

I hope it helps the discussion along.

I would be interested to know how Double Commander does that.  :)
« Last Edit: February 08, 2026, 07:57:28 pm by Roland57 »
My projects are on Codeberg.

CM630

  • Hero Member
  • *****
  • Posts: 1641
  • I am not sure that I understand you.
    • http://sourceforge.net/u/cm630/profile/
Re: How to determine the unknown codepage of a textfile?
« Reply #17 on: February 08, 2026, 08:06:39 pm »
Quote
...
There is an open-source project created using Lazarus called Double Commander...
That seems like a good catch.
I made 2 sample files (attached).
CudaText, Notepad++ and AkelPad cannot guess the encoding.
Double Commander detected the encoding of cp1252.txt, but it failed with the CP850 file.
But if you can detect UTF-8 and CP1252, then CP850 is what is left.
I have not tried the solution of @Roland57 (yet).
« Last Edit: February 10, 2026, 10:14:19 am by CM630 »
Lazarus 4.4 32 bit (sometimes 64 bit); FPC 3.2.2

Roland57

  • Hero Member
  • *****
  • Posts: 586
    • msegui.net
Re: How to determine the unknown codepage of a textfile?
« Reply #18 on: February 09, 2026, 06:45:07 pm »
Quote
CudaText, Notepad++ and AkelPad cannot guess the encoding.
DoubleCommander detected the encoding of the cp1252.txt but it failed with the 850 file.
But when you can detect UTF8 and CP1252, then 850 is what is left.

Similar observations here with Geany, and the same conclusion.

Geany doesn't always guess the encoding successfully. That's why it has a (very useful) "Reload as" function.

Below are the results of my program for your sample files.  :(
« Last Edit: February 09, 2026, 06:49:16 pm by Roland57 »
My projects are on Codeberg.

LV

  • Sr. Member
  • ****
  • Posts: 427
Re: How to determine the unknown codepage of a textfile?
« Reply #19 on: February 09, 2026, 07:37:15 pm »
It can be done for three encodings:

Code: Pascal
program Detect3Enc;

{$mode objfpc}{$H+}
{$codepage utf8}  // assumes this source file is saved as UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // codepage conversion support on Unix
  SysUtils, Classes;

type
  TEnc = (encUTF8, encCP850, encCP1252);

const
  PTChars: UnicodeString =
   'áéíóúâêôãõçÁÉÍÓÚÂÊÔÃÕÇ';  // Portuguese accented letters

  Garbage: UnicodeString =
   '∆µ†¤';  // characters that hint at a wrong decoding

  BoxChars: UnicodeString =
   '░▒▓│┤┼┐└═';  // box-drawing characters, unlikely in plain text

// Reads the whole file into a byte array.
function LoadBytes(const FN: string): TBytes;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FN, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, FS.Size);
    if FS.Size > 0 then  // Result[0] does not exist for an empty file
      FS.ReadBuffer(Result[0], FS.Size);
  finally
    FS.Free;
  end;
end;

// True only if B is structurally valid UTF-8 and contains at least one
// multi-byte sequence (pure ASCII returns False).
function IsRealUTF8(const B: TBytes): Boolean;
var
  i, n: Integer;
  HasMulti: Boolean;
begin
  i := 0;
  HasMulti := False;

  while i < Length(B) do
  begin
    if B[i] < $80 then
      Inc(i)
    else
    begin
      HasMulti := True;

      // n = number of continuation bytes expected after the lead byte
      if (B[i] and $E0) = $C0 then n := 1
      else
      if (B[i] and $F0) = $E0 then n := 2
      else
      if (B[i] and $F8) = $F0 then n := 3
      else Exit(False);

      Inc(i);
      while n > 0 do
      begin
        if (i >= Length(B)) or ((B[i] and $C0) <> $80) then
          Exit(False);
        Inc(i);
        Dec(n);
      end;
    end;
  end;

  Result := HasMulti;
end;

// Decodes B with the given codepage and frees the encoding object,
// which TEncoding.GetEncoding creates for the caller.
function Decode(const B: TBytes; CodePage: Integer): UnicodeString;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(B);
  finally
    Enc.Free;
  end;
end;

// Higher score = the decoded text looks more like Portuguese.
function Score(const S: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;

  for i := 1 to Length(S) do
  begin
    // plain comparisons instead of "in [...]", which narrows WideChar to Char
    if ((S[i] >= 'A') and (S[i] <= 'Z')) or ((S[i] >= 'a') and (S[i] <= 'z')) then
      Inc(Result);
    if Pos(S[i], PTChars) > 0 then Inc(Result, 5);
    if Pos(S[i], Garbage) > 0 then Dec(Result, 15);
    if Pos(S[i], BoxChars) > 0 then Dec(Result, 20);
  end;

  if Pos('ão', S) > 0 then Inc(Result, 20);
  if Pos('de', S) > 0 then Inc(Result, 10);
end;

function DetectEncoding(const FN: string): TEnc;
var
  B: TBytes;
  S850, S1252: UnicodeString;
  A, B2: Integer;
begin
  B := LoadBytes(FN);

  if IsRealUTF8(B) then
    Exit(encUTF8);

  // Decode the same bytes both ways and keep the better-looking result
  S850  := Decode(B, 850);
  S1252 := Decode(B, 1252);

  A  := Score(S850);
  B2 := Score(S1252);

  if B2 > A then
    Result := encCP1252
  else
    Result := encCP850;
end;

procedure Test(const FN: string);
begin
  Write(FN, ' -> ');
  case DetectEncoding(FN) of
    encUTF8:    Writeln('UTF8');
    encCP850:   Writeln('CP850');
    encCP1252:  Writeln('CP1252');
  end;
end;

begin
  Test('cp850.txt');
  Test('cp1252.txt');
  Test('utf8.txt');
  Readln;
end.

Code: Text
cp850.txt -> CP850
cp1252.txt -> CP1252
utf8.txt -> UTF8

LV

  • Sr. Member
  • ****
  • Posts: 427
Re: How to determine the unknown codepage of a textfile?
« Reply #20 on: February 09, 2026, 08:34:15 pm »
Quote
But I don't know the codepage of the textfile, which I must convert if so to UTF8. Possible in the textfile are only these 3 codepages:
 - UTF8
 - cp1252
 - cp850
or plain ASCII.

The original poster also asked for plain ASCII to be handled. I tested the program only on the files cp1252.txt, cp850.txt and utf8.txt, plus a simple ascii.txt file that I added.

Code: Pascal
program Detect4Enc;

{$mode objfpc}{$H+}
{$codepage utf8}  // assumes this source file is saved as UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // codepage conversion support on Unix
  SysUtils, Classes;

type
  TEnc = (encASCII, encUTF8, encCP850, encCP1252);

const
  PTChars: UnicodeString =
   'áéíóúâêôãõçÁÉÍÓÚÂÊÔÃÕÇ';  // Portuguese special characters

  Garbage: UnicodeString =
   '∆µ†¤';  // Characters that indicate wrong encoding (CP850/CP1252 artifacts)

  BoxChars: UnicodeString =
   '░▒▓│┤┼┐└═';  // Box-drawing characters that indicate wrong encoding

// Reads the whole file into a byte array.
function LoadBytes(const FN: string): TBytes;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FN, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, FS.Size);
    if FS.Size > 0 then  // Result[0] does not exist for an empty file
      FS.ReadBuffer(Result[0], FS.Size);
  finally
    FS.Free;
  end;
end;

function IsASCII(const B: TBytes): Boolean;
var
  i: Integer;
begin
  Result := True;
  for i := 0 to High(B) do
    if B[i] > 127 then  // ASCII only uses 0-127 range
    begin
      Result := False;
      Exit;
    end;
end;

function IsRealUTF8(const B: TBytes): Boolean;
var
  i, n: Integer;
  HasMulti: Boolean;
begin
  i := 0;
  HasMulti := False;

  while i < Length(B) do
  begin
    if B[i] < $80 then
      Inc(i)
    else
    begin
      HasMulti := True;

      if (B[i] and $E0) = $C0 then n := 1  // 2-byte UTF-8 sequence
      else
      if (B[i] and $F0) = $E0 then n := 2  // 3-byte UTF-8 sequence
      else
      if (B[i] and $F8) = $F0 then n := 3  // 4-byte UTF-8 sequence
      else Exit(False);  // Invalid UTF-8 start byte

      Inc(i);
      while n > 0 do
      begin
        if (i >= Length(B)) or ((B[i] and $C0) <> $80) then
          Exit(False);  // Invalid continuation byte
        Inc(i);
        Dec(n);
      end;
    end;
  end;

  Result := HasMulti;  // True only if file contains multi-byte UTF-8 sequences
end;

// Decodes B with the given codepage and frees the encoding object,
// which TEncoding.GetEncoding creates for the caller.
function Decode(const B: TBytes; CodePage: Integer): UnicodeString;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(B);
  finally
    Enc.Free;
  end;
end;

function Score(const S: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;

  for i := 1 to Length(S) do
  begin
    // Basic Latin letters (plain comparisons instead of "in [...]",
    // which narrows WideChar to Char)
    if ((S[i] >= 'A') and (S[i] <= 'Z')) or ((S[i] >= 'a') and (S[i] <= 'z')) then
      Inc(Result);
    if Pos(S[i], PTChars) > 0 then Inc(Result, 5);  // Portuguese special characters
    if Pos(S[i], Garbage) > 0 then Dec(Result, 15);  // Penalize garbage characters
    if Pos(S[i], BoxChars) > 0 then Dec(Result, 20);  // Penalize box-drawing characters
  end;

  // Bonus for common Portuguese words/sequences
  if Pos('ão', S) > 0 then Inc(Result, 20);
  if Pos('de', S) > 0 then Inc(Result, 10);
end;

function DetectEncoding(const FN: string): TEnc;
var
  B: TBytes;
  S850, S1252: UnicodeString;
  A, B2: Integer;
begin
  B := LoadBytes(FN);

  // First check ASCII (simplest check)
  if IsASCII(B) then
    Exit(encASCII);

  // Then check UTF-8
  if IsRealUTF8(B) then
    Exit(encUTF8);

  // If not ASCII and not UTF-8, choose between CP850 and CP1252
  S850  := Decode(B, 850);
  S1252 := Decode(B, 1252);

  A  := Score(S850);
  B2 := Score(S1252);

  if B2 > A then
    Result := encCP1252
  else
    Result := encCP850;
end;

procedure Test(const FN: string);
begin
  Write(FN, ' -> ');
  case DetectEncoding(FN) of
    encASCII:   Writeln('ASCII');
    encUTF8:    Writeln('UTF8');
    encCP850:   Writeln('CP850');
    encCP1252:  Writeln('CP1252');
  end;
end;

begin
  Test('cp850.txt');
  Test('cp1252.txt');
  Test('utf8.txt');
  Test('ascii.txt');

  Readln;
end.

Code: Text
cp850.txt -> CP850
cp1252.txt -> CP1252
utf8.txt -> UTF8
ascii.txt -> ASCII

P.S.
1. Start with the simplest check (ASCII) before moving on to the more complex ones (UTF-8 validation, then scoring).
2. ASCII and UTF-8 are yes/no decisions made by validating the bytes; only the ambiguous CP850/CP1252 case is decided by a score, which is a heuristic.
3. The algorithm is tailored to Portuguese text: the character sets and bonus patterns are chosen for that language and would need adapting for others.
4. Penalties for "garbage" characters (box-drawing symbols, encoding artifacts) make it less likely that the wrong encoding wins.
« Last Edit: February 09, 2026, 09:02:17 pm by LV »

Hartmut

  • Hero Member
  • *****
  • Posts: 1103
Re: How to determine the unknown codepage of a textfile?
« Reply #21 on: February 10, 2026, 09:56:10 am »
Thanks to all for your new posts and suggestions.
I'm currently investigating how Double Commander does codepage detection (suggested in reply #13) and am facing some problems.
I will report when I have results.

Hartmut

  • Hero Member
  • *****
  • Posts: 1103
Re: How to determine the unknown codepage of a textfile?
« Reply #22 on: February 11, 2026, 12:06:41 pm »
Sorry for my late reply. I had damaged my Lazarus installation...

Meanwhile I investigated the suggestion from LV to check how Double Commander does codepage detection.
I found its sources at https://github.com/doublecmd/doublecmd.
In the file ufileview.pas I found nothing about codepage detection. The file uShowText.pas does not exist. So I had to do the big search (900 source files with 500000 lines).

I was successful in the folder components/chsdet/, which contains a component that:
 - is a stand-alone component for automatic charset detection of a given text
 - is based on Mozilla's 'universalchardet'
 - was not created by the author of Double Commander
 - doesn't need any external components
 - supports about 30 codepages (including UTF8, cp1252 and ASCII, but not cp850).

I tested it and it was successful for UTF8, cp1252 and ASCII, but with cp850 it returned 'Unknown'.

Additionally there is a unit src/uconvencoding.pas (created by the author of Double Commander) which offers 3 more codepage detection functions: MyDetectCodePageType() and 2 variants of DetectEncoding(). They partly use the 'chsdet' component above and try to do some additional codepage detection, but only for 3 Cyrillic codepages and some additional UTF-8 BOMs. So they are not useful for my case.



Quote
I made 2 sample files (attached).
CudaText, Notepad++ and AkelPad cannot guess the encoding.
DoubleCommander detected the encoding of the cp1252.txt but it failed with the 850 file.
But when you can detect UTF8 and CP1252, then 850 is what is left.
I checked both files. They are very untypical for my text files, which are either in English or in German. German has 7 additional letters, ÄÖÜäöüß (called Umlaute), which occur often in German text files. An author can replace them with Ae Oe Ue ae oe ue ss to avoid any trouble with these letters, so German text files without any ÄÖÜäöüß also exist.

I will check the suggestions from Roland57 and LV next and report afterwards.

Hartmut

  • Hero Member
  • *****
  • Posts: 1103
Re: How to determine the unknown codepage of a textfile?
« Reply #23 on: February 11, 2026, 01:04:47 pm »
Quote
Please excuse my jumping in, but I'm interested: once you've determined the codepage, what next?
As said, after detecting the codepage I need to convert the text to UTF8 to display it in a TMemo.

Quote
Noting https://wiki.freepascal.org/Ansistring, can the codepage of an AnsiString be adjusted "on the fly" so that after something has been read into it from a TextFile, translation to Unicode works correctly?
Until now I have not used this feature (because I have my own functions for that, which predate my use of FPC), so I can't answer this question. An alternative is the unit \components\lazutils\lconvencoding.pas, which contains many conversion functions.
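
For the quoted question about adjusting an AnsiString's codepage "on the fly", a minimal sketch using only the FPC RTL might look like the following. It assumes FPC 3.0 or later, and on Unix it assumes the cwstring unit is available to do the actual conversion. The helper names ToUTF8 and DumpHex and the sample bytes are made up for illustration.

```pascal
program RetagAndConvert;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // conversion support on Unix
  SysUtils;

// Hypothetical helper: interprets the bytes in Raw as text in CodePage
// and returns the same text encoded as UTF-8.
function ToUTF8(const Raw: RawByteString; CodePage: TSystemCodePage): RawByteString;
begin
  Result := Raw;
  SetCodePage(Result, CodePage, False);  // only relabel, bytes stay unchanged
  SetCodePage(Result, CP_UTF8, True);    // now convert the bytes to UTF-8
end;

// Prints the bytes of S in hex, so the console encoding does not matter.
procedure DumpHex(const S: RawByteString);
var
  i: Integer;
begin
  for i := 1 to Length(S) do
    Write(IntToHex(Ord(S[i]), 2), ' ');
  WriteLn;
end;

begin
  DumpHex(ToUTF8('f'#$81'r', 850));   // "für" stored as cp850
  DumpHex(ToUTF8('f'#$FC'r', 1252));  // "für" stored as cp1252
end.
```

If the conversion support is present, both calls should produce the same four bytes, since ü is C3 BC in UTF-8. The first SetCodePage call only changes the label, which is why the detected codepage has to be correct before the second call converts anything.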

I'm not so happy that you jumped into this topic, because it is already (more than) long enough, which makes it hard for any helpers to find the relevant information. Mixing in a second subject will not make that better.
If you have more questions, please create a separate topic. Thanks for your understanding.

CM630

  • Hero Member
  • *****
  • Posts: 1641
  • I am not sure that I understand you.
    • http://sourceforge.net/u/cm630/profile/
Re: How to determine the unknown codepage of a textfile?
« Reply #24 on: February 11, 2026, 03:10:40 pm »
Quote
...
I tested it and it was successful for UTF8, cp1252 and ASCII, but with cp850 it returned 'Unknown'.
...
So what bothers you?
If I have understood you correctly, a file of yours that is neither UTF8 nor cp1252 must be cp850.
For German (there was no capital Eszett when these encodings were defined):
Code: Text
Char   CP850   CP1252
ß      E1      DF
ä      84      E4
Ä      8E      C4
ö      94      F6
Ö      99      D6
ü      81      FC
Ü      9A      DC
You might additionally check whether any of the bytes #$E1, #$84, #$8E, #$94, #$99, #$81, #$9A are present.
If there are no umlauts, there should be nothing above #$7F, so the text is English or the umlauts were written as ae, oe, ue... hopefully.
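
A rough sketch of this byte check, using the two columns of the table above. It only counts how often the German umlaut bytes of each codepage occur and does not decide anything by itself. The names CountUmlautBytes and TUmlautCount are invented for this example.

```pascal
program UmlautBytes;
{$mode objfpc}{$H+}

type
  TUmlautCount = record
    As850, As1252: Integer;
  end;

const
  // Bytes for ß ä Ä ö Ö ü Ü, copied from the table above
  CP850Umlauts:  set of Byte = [$E1, $84, $8E, $94, $99, $81, $9A];
  CP1252Umlauts: set of Byte = [$DF, $E4, $C4, $F6, $D6, $FC, $DC];

// Counts the bytes that would be a German umlaut in each codepage.
function CountUmlautBytes(const Data: array of Byte): TUmlautCount;
var
  i: Integer;
begin
  Result.As850 := 0;
  Result.As1252 := 0;
  for i := 0 to High(Data) do
  begin
    if Data[i] in CP850Umlauts then Inc(Result.As850);
    if Data[i] in CP1252Umlauts then Inc(Result.As1252);
  end;
end;

var
  C: TUmlautCount;
begin
  // "für" stored as cp850: f, $81, r
  C := CountUmlautBytes([Ord('f'), $81, Ord('r')]);
  WriteLn('CP850 hits: ', C.As850, ', CP1252 hits: ', C.As1252);
end.
```

For the sample bytes this prints "CP850 hits: 1, CP1252 hits: 0". The two byte sets do not overlap, so a file whose high bytes all fall into one set is a fairly strong hint, while hits in both sets mean the counts alone cannot settle it.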
« Last Edit: February 11, 2026, 03:14:39 pm by CM630 »
Lazarus 4.4 32 bit (sometimes 64 bit); FPC 3.2.2

Hartmut

  • Hero Member
  • *****
  • Posts: 1103
Re: How to determine the unknown codepage of a textfile?
« Reply #25 on: February 12, 2026, 06:41:21 pm »
Meanwhile I checked encodingexpert-260208.zip from reply #16. It contains 4 methods:
 - LConvEncoding.GuessEncoding() can only return UTF8 or the default of the OS (which is replaced by ISO-8859-1 if this default is UTF8, as is the case on Linux) => not usable for me
 - Linux 'file' command: returns 'unknown-8bit' for both cp1252 and cp850 => not usable for me
 - Linux 'uchardet' command: I did not install it, because cp850 is not supported (according to its documentation and the test by Lutz in reply #10) and 'Charset Detector' is the same
 - Charset Detector: the same as the stand-alone 'chsdet' component included in Double Commander, which I already tested and described in reply #22.
So regrettably no improvement over what I had before.



Then I tested the program Detect4Enc from reply #20. For my 1st file it reported cp850, although this file is cp1252 and contains more than 700 cp1252 characters, which makes it unusable for me.
But I like your idea of checking for 'Garbage' and 'BoxChars' to penalize them. If I were to reinvent the wheel and create my own codepage detector, I would use something similar.

But I definitely do not want to invest all the effort of investigating / creating statistics / experimenting / testing / failing / improving etc. until this detector comes, piece by piece, nearer to reliable results.



Quote
So what bothers you?
If I have understood you correctly, for your files if it is not UTF8 and cp1252 it is cp850.
The risk of treating everything that could not be detected (with enough certainty) as cp850.
If a file is too short and/or does not contain enough characteristic data, it makes sense to report it as 'Unknown' instead of returning some nonsense.
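
One way to express this "Unknown instead of nonsense" rule is a decision step that refuses to answer unless the evidence is both sufficient and one-sided. The sketch below takes two hit counts from any detector; the thresholds MinHits and MinFactor are arbitrary assumptions for illustration, not tuned values, and the names are invented.

```pascal
program GuessWithUnknown;
{$mode objfpc}{$H+}

type
  TGuess = (gUnknown, gCP850, gCP1252);

const
  MinHits   = 3;  // assumed: fewer characteristic bytes than this is "too short"
  MinFactor = 4;  // assumed: the winner needs this many times the loser's hits

// Hits850 and Hits1252 are the evidence counts some detector produced.
function Decide(Hits850, Hits1252: Integer): TGuess;
begin
  Result := gUnknown;
  if (Hits850 >= MinHits) and (Hits850 >= MinFactor * Hits1252) then
    Result := gCP850
  else if (Hits1252 >= MinHits) and (Hits1252 >= MinFactor * Hits850) then
    Result := gCP1252;
end;

begin
  WriteLn(Decide(27, 0));  // plenty of one-sided evidence
  WriteLn(Decide(2, 0));   // too little data
  WriteLn(Decide(10, 8));  // contradictory evidence
end.
```

Only the first call yields a codepage; the other two stay at gUnknown, which a program could then turn into a manual selection by the user.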

Quote
You might additionally check if any of these #$E1; #$84; #$8E; #$94; #$99; #$81; #$9A are present.
If there are no umlauts, there should be nothing above #$7F so it is English or *e was used... hopefully.
Unfortunately this is not true. Besides the "Umlaute" ÄÖÜäöüß there are many special characters which can (but need not) occur in cp850, e.g. ø á ½ « » © µ ± § x² x³, to show only some of them.
And e.g. $F6 is 'ö' in cp1252 and '÷' in cp850.
Or e.g. $FC is 'ü' in cp1252 and '³' in cp850.
I think mature statistics for each codepage are necessary for reliable results.

Meanwhile I made some more tests with the component 'chsdet' (contained in Double Commander) and got too many wrong results:
 - a couple of files which are cp1252 were not reported as cp1252, but as GB18030 (Chinese) or ISO-8859-7 (Greek)
 - a couple of files which are cp850 were not reported as Unknown, but as cp1252 (!) or Shift_JIS (Japanese)
A codepage detector that is so unreliable does not help me.

Summary:
Unfortunately none of the tools mentioned in this topic (until now) gives results which are reliable enough. The component 'chsdet' (contained in Double Commander) still seems to be the best of them overall. But it does not support cp850 at all: because it has no statistics for cp850, it reports such files as various wrong codepages.

I thought that better tools exist, because (nearly) every text editor has to solve this problem.

If no better tool turns up soon, I will proceed with my project this way:
 - if a text file is pure ASCII or UTF8 (both are simple to detect and I found no wrong results with 'chsdet' for either), I will use this codepage
 - in all other cases the user must select the codepage manually (cp1252 or cp850).
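
The plan above can be sketched without any external component. The following is only an illustration of the idea, not the 'chsdet' code: it accepts pure ASCII, accepts well-formed UTF-8 (rejecting overlong forms, surrogates and values above U+10FFFF), and otherwise signals that the user has to choose. The names TVerdict, IsValidUTF8 and Classify are invented for this example.

```pascal
program AutoOrAsk;
{$mode objfpc}{$H+}

type
  TVerdict = (vASCII, vUTF8, vAskUser);

// True if Data is well-formed UTF-8.
function IsValidUTF8(const Data: array of Byte): Boolean;
var
  i, Need: Integer;
  MinB, MaxB: Byte;
begin
  Result := False;
  i := 0;
  while i <= High(Data) do
  begin
    MinB := $80; MaxB := $BF;  // allowed range of the first continuation byte
    case Data[i] of
      $00..$7F: Need := 0;
      $C2..$DF: Need := 1;
      $E0:      begin Need := 2; MinB := $A0; end;  // no overlong forms
      $E1..$EC, $EE..$EF: Need := 2;
      $ED:      begin Need := 2; MaxB := $9F; end;  // no surrogates
      $F0:      begin Need := 3; MinB := $90; end;  // no overlong forms
      $F1..$F3: Need := 3;
      $F4:      begin Need := 3; MaxB := $8F; end;  // not above U+10FFFF
    else
      Exit;  // $80..$C1 and $F5..$FF can never start a sequence
    end;
    Inc(i);
    while Need > 0 do
    begin
      if (i > High(Data)) or (Data[i] < MinB) or (Data[i] > MaxB) then
        Exit;
      MinB := $80; MaxB := $BF;  // later continuation bytes use the full range
      Inc(i);
      Dec(Need);
    end;
  end;
  Result := True;
end;

function Classify(const Data: array of Byte): TVerdict;
var
  i: Integer;
begin
  Result := vASCII;
  for i := 0 to High(Data) do
    if Data[i] >= $80 then
    begin
      // First high byte found: either valid UTF-8 or the user must choose
      if IsValidUTF8(Data) then
        Result := vUTF8
      else
        Result := vAskUser;
      Exit;
    end;
end;

begin
  WriteLn(Classify([Ord('a'), Ord('b')]));  // only bytes below $80
  WriteLn(Classify([$C3, $A4]));            // "ä" as UTF-8
  WriteLn(Classify([$E4]));                 // "ä" as cp1252
  WriteLn(Classify([$84]));                 // "ä" as cp850
end.
```

The last two calls both end in vAskUser, which matches the plan: the byte stream alone does not say whether it is cp1252 or cp850.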

Thanks a lot to all for your help.

Roland57

  • Hero Member
  • *****
  • Posts: 586
    • msegui.net
Re: How to determine the unknown codepage of a textfile?
« Reply #26 on: February 12, 2026, 08:12:08 pm »
@Hartmut

Thank you for the detailed report.

Indeed, I have been disappointed myself by some results of my program.

Would you mind sharing a sample of your files? I am still interested in the challenge.  :)
« Last Edit: February 12, 2026, 08:13:41 pm by Roland57 »
My projects are on Codeberg.

Hartmut

  • Hero Member
  • *****
  • Posts: 1103
Re: How to determine the unknown codepage of a textfile?
« Reply #27 on: February 13, 2026, 10:57:39 am »
Quote
Would you mind sharing a sample of your files? I am still interested in the challenge.  :)

I had to search for some examples again, because I had noted only a couple of results but not the corresponding filenames. I removed some personal content, but the results always stayed the same. I attached them for you (the results are from your 'Charset Detector'):

demo1.txt has 27 "Umlaute" (ÄÖÜäöüß) in cp1252, but is reported as GB18030 (Chinese).
demo2.txt is also cp1252, but is reported as ISO-8859-7 (Greek).
demo3.txt has 36 "Umlaute" (ÄÖÜäöüß) in cp850, but is reported as cp1252.
demo4.txt has 27 "Umlaute" (ÄÖÜäöüß) in cp850, but is reported as Shift_JIS (Japanese).

CM630

  • Hero Member
  • *****
  • Posts: 1641
  • I am not sure that I understand you.
    • http://sourceforge.net/u/cm630/profile/
Re: How to determine the unknown codepage of a textfile?
« Reply #28 on: February 13, 2026, 11:22:22 am »
demo2.txt - the chance of autodetecting it is close to zero. I found only 3 characters above #$7F (in “gemäß” and in “Genauigkeit von ±8 Zeilen á 40..50 Bytes”).
á does not look like German to me.

demo1: manually it is no problem to detect that it is 1252 - read as 850 there are nõchste (instead of nächste), Verschl³sselung (Verschlüsselung), f³r (für), Au▀erdem (...)
also: nächste; für; später; verschlüssen; angehängt

One problem is that there are file paths, which seem to contain anything (d:\FPC\work\Experimente\xx_õ÷³─Í▄.txt).

A possible way is to detect file paths in order to ignore them.
And then if you find ³ surrounded by letters, it is not 850 (³ makes sense after a number).
Maybe the chance of having ▀ in a real text at all is also minimal.
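
The "³ surrounded by letters" rule can be tried out on raw bytes. In cp850 the byte $FC is ³ and $DF is ▀, while in cp1252 the same bytes are ü and ß, so finding them between two ASCII letters speaks against cp850. This is only a sketch of that one rule; the function name is invented, and skipping file paths is left out.

```pascal
program SuperscriptBetweenLetters;
{$mode objfpc}{$H+}

const
  Cube850  = $FC;  // '³' in cp850, 'ü' in cp1252
  Block850 = $DF;  // '▀' in cp850, 'ß' in cp1252

function IsAsciiLetter(B: Byte): Boolean;
begin
  Result := ((B >= Ord('A')) and (B <= Ord('Z')))
         or ((B >= Ord('a')) and (B <= Ord('z')));
end;

// Counts $FC/$DF bytes that have an ASCII letter on both sides.
// Every hit is a spot where a cp850 reading would be implausible.
function CountUnlikelyFor850(const Data: array of Byte): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to High(Data) - 1 do
    if ((Data[i] = Cube850) or (Data[i] = Block850))
       and IsAsciiLetter(Data[i - 1]) and IsAsciiLetter(Data[i + 1]) then
      Inc(Result);
end;

begin
  // f, $FC, r: "für" in cp1252, "f³r" in cp850
  WriteLn(CountUnlikelyFor850([Ord('f'), $FC, Ord('r')]));  // prints 1
  // m, $FC, space: "m³ " is plausible cp850 text
  WriteLn(CountUnlikelyFor850([Ord('m'), $FC, Ord(' ')]));  // prints 0
end.
```

Umlauts at the start or end of a word are not counted by this rule, so a count of zero proves nothing; only a positive count is informative.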

demo3 and demo4: absolutely easy to detect manually.
What about some dictionary brute force? Search for German words in both encodings?
Or maybe do it the other way round: when a word with a character above #$7F is found, look it up in a dictionary in both encodings?
Maybe a spellchecker can be applied...
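
A very small version of the dictionary idea, assuming a hand-picked list of four German words (für, nächste, später, gemäß) stored once as cp850 bytes and once as cp1252 bytes. A real detector would need a far larger list; the word choice and all names here are illustrative only.

```pascal
program TinyDictionary;
{$mode objfpc}{$H+}

const
  // für, nächste, später, gemäß as raw bytes in each codepage
  Words850: array[0..3] of RawByteString =
    ('f'#$81'r', 'n'#$84'chste', 'sp'#$84'ter', 'gem'#$84#$E1);
  Words1252: array[0..3] of RawByteString =
    ('f'#$FC'r', 'n'#$E4'chste', 'sp'#$E4'ter', 'gem'#$E4#$DF);

// Counts how many of the given words occur at least once in Data.
function CountHits(const Data: RawByteString;
  const Words: array of RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(Words) do
    if Pos(Words[i], Data) > 0 then
      Inc(Result);
end;

var
  Sample: RawByteString;
begin
  // "Das ist für die nächste Version." stored as cp1252 bytes
  Sample := 'Das ist f'#$FC'r die n'#$E4'chste Version.';
  WriteLn('CP850 words:  ', CountHits(Sample, Words850));   // prints 0
  WriteLn('CP1252 words: ', CountHits(Sample, Words1252));  // prints 2
end.
```

Because the comparison works on bytes, no conversion is needed before the codepage is known. The weakness is the one already raised in this thread: a short file may contain none of the listed words at all.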


« Last Edit: February 13, 2026, 11:25:50 am by CM630 »
Lazarus 4.4 32 bit (sometimes 64 bit); FPC 3.2.2

Hartmut

  • Hero Member
  • *****
  • Posts: 1103
Re: How to determine the unknown codepage of a textfile?
« Reply #29 on: February 13, 2026, 07:06:23 pm »
As said
Quote

If I would think of reinventing the wheel and would think about creating my own codepage detector then I would use something similar.
...
But I definitely do not want to invest all this effort of investigating / creating statistics / experimenting / testing / failing / improving etc. until this detector piece by piece comes nearer to reliable results.
...
I think for reliable results a mature statistics for each codepage is necessary.
Everything else will never be reliable.

Please accept that I do not want to spend (more) endless time on manually analyzing and improving things for one individual file after another, and on continuing this endlessly for future files.

 
