Hello LeP and JuhaManninen,
of course life would be easier if every text file were in Unicode or UTF-8, but that's not reality. If the solution were as easy as converting a limited number of files once, I would not have started this topic...
If a text file is not in UTF-8, e.g. because it's a log file that a certain program writes or changes often, or because a certain program needs to read / import it, I can't change its codepage. Besides, that should not be the subject of this topic.
Please, back to my question: how to determine the codepage of a text file. As said, fortunately I don't have to distinguish between various Portuguese codepages; only 3 codepages are possible:
- UTF8
- cp1252
- cp850
As said, (nearly) every text editor has to solve this problem, so there must be solutions.
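To make the question concrete: with only these three candidates, a rough heuristic could look like the Python sketch below. It is only an illustration under stated assumptions, not what any particular editor or library does. The function name `guess_encoding` and the letter set `PT_LOWER` are made up for this example. The idea: if the bytes decode as strict UTF-8, take UTF-8 (this also covers pure ASCII). Otherwise decode the same bytes as cp1252 and as cp850 and count how many lowercase accented Portuguese letters each decoding produces; the codepage with the higher count wins.

```python
# Hypothetical helper: guesses between UTF-8, cp1252 and cp850 only.
# Assumption: the text is Portuguese and contains lowercase accented letters.
PT_LOWER = set("áàâãçéêíóôõúü")


def guess_encoding(data: bytes) -> str:
    # Step 1: strict UTF-8 decoding. Text in cp1252 or cp850 with accented
    # letters is very unlikely to form valid UTF-8 byte sequences by accident.
    try:
        data.decode("utf-8")
        return "utf-8"  # pure ASCII also ends up here
    except UnicodeDecodeError:
        pass

    # Step 2: decode with both single-byte codepages and count plausible letters.
    scores = {}
    for name in ("cp1252", "cp850"):
        text = data.decode(name, errors="replace")
        scores[name] = sum(ch in PT_LOWER for ch in text)

    # On a tie (e.g. no accented letters at all) this falls back to cp1252,
    # simply because it is listed first.
    return max(scores, key=scores.get)
```

A call like `guess_encoding(open("some.log", "rb").read())` would return one of the three names (the file name is a placeholder). The heuristic is weak for short files or files with only uppercase accents, so its result is a guess rather than a proof.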
Meanwhile I had a 2nd look at
https://gitlab.freedesktop.org/uchardet/uchardet/-/blob/master/README.md where I saw that UTF-8 and cp1252 are supported, but cp850 does not appear in the long list of supported encodings.
Because trying out uchardet seems to take some effort:
- is anyone sure that uchardet supports cp850?
- can it be used on Linux (Ubuntu 24.04 with KDE Plasma desktop)?
I tried with a CP1252 snippet. It detected that it is not UTF-8, but applied CP1251 to it, which is the ANSI codepage according to my localization.
I believe @Theo is the author of the component, maybe he is reachable.
Thank you for testing this. I understand that you got a certain codepage (CP1251) reported as the result. I will try to do some tests with existing files and, if necessary, try to contact @Theo.