Unicode won't work no matter what I do.

NonSpillable

New Member
Posts: 10

Unicode won't work no matter what I do.

« on: January 14, 2019, 06:03:28 am »

Hi. Since upgrading my Debian installation to the latest Lazarus (from Debian's repo), none of my programs that access files work. Nordic chars like Å, Ä and Ö are replaced by "?" in all components when reading a file name.

I created a new test application to try different things, attached below. Note that my test app adds "åäö" to a TMemo to demonstrate that it's not a font issue. FindFirst/FindFirstUTF8 and every conceivable combination of Utf8ToSys, Ansitowhatever, Utftowhatever do exactly nothing. How do I fix this?

My system: Debian 9 64-bit. Lazarus: 1.6.2+dfsg-2 date 2019-01-12, FPC version 3.0.0.
Output from '$ locale':
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME=en_AU.utf8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Screenshot at 2019-01-14 05-54-36.png (22.87 kB, 618x501 - viewed 284 times.)

UTFExperiment.zip (146.69 kB - downloaded 128 times.)

testdir.zip (0.6 kB - downloaded 119 times.)

Logged

CCRDude

Hero Member
Posts: 600

Re: Unicode won't work no matter what I do.

« Reply #1 on: January 14, 2019, 07:28:53 am »

In your test application, where do you take the string from? If you use it as a const, have you specified {$codepage utf8}?

Logged

egsuh

Hero Member
Posts: 1292

Re: Unicode won't work no matter what I do.

« Reply #2 on: January 14, 2019, 07:45:18 am »

In Windows, there is something called "locale", or "Locale for non-Unicode-supporting applications" in full. Normally this is the same as the operating system language, but it can be set to another language, and then characters are not displayed correctly in some applications. Not sure about Linux.

Logged

NonSpillable

New Member
Posts: 10

Re: Unicode won't work no matter what I do.

« Reply #3 on: January 14, 2019, 09:33:27 am »

Quote from: CCRDude on January 14, 2019, 07:28:53 am

In your test application, where do you take the string from? If you use it as a const, have you specified {$codepage utf8}?

From either a TSearchRec.Name via FindFirst/Next or FindFirstUTF8/NextUTF8 or from a TSearchRecUTF8. And a short constant in the code containing "åäö" is added to the TMemo to demonstrate that it is not a font issue. That is the "åäö" in the first line in the TMemo. As you see, reading file names from disk doesn't work.

Logged

JuhaManninen

Global Moderator
Hero Member
Posts: 4468
I like bugs.

Re: Unicode won't work no matter what I do.

« Reply #4 on: January 14, 2019, 11:20:00 am »

Your code works well but the file and directory names in testdir are not UTF-8.
It feels strange because in a Linux system everything is UTF-8 by default. You must have copied it from a Windows PC.

Logged

Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

NonSpillable

New Member
Posts: 10

Re: Unicode won't work no matter what I do.

« Reply #5 on: January 14, 2019, 11:56:06 am »

Quote from: JuhaManninen on January 14, 2019, 11:20:00 am

Your code works well but the file and directory names in testdir are not UTF-8.
It feels strange because in a Linux system everything is UTF-8 by default. You must have copied it from a Windows PC.

Nope, they were created in Caja (MATE's file manager). No Windows here. If the file names are not UTF-8 (how do you know?), they must be something older, and should therefore work as well. The files were created today, with Caja, on an ext4 file system, on an up-to-date Debian installation.

Edit: Let me upload the exact same files but in a 7z-archive... Edit2: And one file raw.

testdir.7z (0.19 kB - downloaded 137 times.)

Int_filename_åäö.txt (0 kB - downloaded 124 times.)

« Last Edit: January 14, 2019, 11:59:21 am by NonSpillable »

Logged

JuhaManninen

Global Moderator
Hero Member
Posts: 4468
I like bugs.

Re: Unicode won't work no matter what I do.

« Reply #6 on: January 14, 2019, 02:03:31 pm »

Quote from: NonSpillable on January 14, 2019, 11:56:06 am

Nope, they are created in Caja (MATEs file manager). No Windows here. If the file names are not UTF8 (how do you know?),

I know by clicking your package which then opens "Ark" in my Manjaro Linux. Ark comes with KDE.
No need to even extract the files. I can see in the Ark window the encoding is wrong.

Quote

they must be something older, and should therefore work as well.

What do you mean? Things work with UTF-8 encoding. Old or new, doesn't matter.

Quote

Edit: Let me upload the exact same files but in a 7z-archive... Edit2: And one file raw.

The single file has the right encoding. It means your archiving process goes wrong.
Let me guess, you run 7z under Wine. Hah! Why would you do that?

BTW, you should see the wrong encoding right in the Caja window. You don't need a test app for that.
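One way an archive can show wrong names while the raw file is fine, offered as an illustration and not as a diagnosis of the archives in this thread: in the zip format, general purpose bit 11 (0x800) marks an entry name as UTF-8, and names without that flag are nominally CP437. A reader that takes UTF-8 name bytes for CP437 shows garbage. The sketch below is Python rather than Pascal, purely so it can be run anywhere; the helper names (`make_archive`, `stored_names`, `misread_as_cp437`) are made up for this example.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # zip general purpose bit 11: "entry name is UTF-8"

# The raw file name from the thread, written with escapes for a-ring, a-umlaut, o-umlaut.
NAME = "Int_filename_\u00e5\u00e4\u00f6.txt"


def make_archive(name):
    """Build an in-memory zip holding one empty entry called `name`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"")
    return buf.getvalue()


def stored_names(archive_bytes):
    """Return (name, has_utf8_flag) for every entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_NAME_FLAG))
                for info in zf.infolist()]


def misread_as_cp437(name):
    """What a reader displays if it takes UTF-8 name bytes for CP437."""
    return name.encode("utf-8").decode("cp437")
```

Running `stored_names` on a suspect archive shows whether its entries carry the UTF-8 flag; if they do not, different tools may legitimately disagree about how to display the names.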

« Last Edit: January 14, 2019, 02:05:31 pm by JuhaManninen »

Logged

Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

NonSpillable

New Member
Posts: 10

Re: Unicode won't work no matter what I do.

« Reply #7 on: January 14, 2019, 04:23:54 pm »

Quote from: JuhaManninen on January 14, 2019, 02:03:31 pm

Quote from: NonSpillable on January 14, 2019, 11:56:06 am
Nope, they are created in Caja (MATEs file manager). No Windows here. If the file names are not UTF8 (how do you know?),
I know by clicking your package which then opens "Ark" in my Manjaro Linux. Ark comes with KDE.
No need to even extract the files. I can see in the Ark window the encoding is wrong.

Quote
they must be something older, and should therefore work as well.
What do you mean? Things work with UTF-8 encoding. Old or new, doesn't matter.

Quote
Edit: Let me upload the exact same files but in a 7z-archive... Edit2: And one file raw.
The single file has the right encoding. It means your archiving process goes wrong.
Let me guess, you run 7z under Wine. Hah! Why would you do that?
BTW, you should see the wrong encoding right in the Caja window. You don't need a test app for that.

Wine? What is it with people and Windows? Noooo, I haven't run Windows, nor any Windows program, for nearly two decades! I'm using stock zip and stock 7z, from the GUI (Caja in this case). Nothing here has anything to do with Windows, OS X, BSD or any OS other than Debian 9. None of the files I have uploaded here have been anywhere near Windows or Wine, and they were created today or yesterday.

What I meant by "older" was that there was a time before UTF-8, when we had code pages for international characters. No matter *what* encoding the file system uses, Lazarus is the only thing not handling characters correctly. On my disk I have several decades of files, some with Swedish (åäöÅÄÖ) chars, created in different programs and on different OSes (some as old as C64, Amiga, Atari, etc.) and yes, even files created from DOS. I never had any problem with any application until now, when I did a fresh install of Debian 9 AMD64 (I usually use 32-bit) on an i7 with a fresh Lazarus/FPC, and suddenly all my programs stopped working (I had to recompile them, since I migrated from 32- to 64-bit¹). But no other application seems to have any problem, new or old, with new or old files.

Edit: Could there be a problem with ext4, with Linux, or with something other than Lazarus? But Lazarus is the only thing not working (that is, FindFirst/FindNext).

1) Side note: since early 2000 I have always thought that the time was ripe for 64-bit, but no, every single time I try 64-bit Linux it lets me down; something breaks, and breaks badly. This time it was Lazarus. Other times it has been CAD software, visualization software, media players/codecs, etc.

« Last Edit: January 14, 2019, 05:15:08 pm by NonSpillable »

Logged

engkin

Hero Member
Posts: 3112

Re: Unicode won't work no matter what I do.

« Reply #8 on: January 14, 2019, 05:30:55 pm »

Just curious, as I do not use Linux, do you guys need to add a unit like cwstring to have a WideStringManager?

@NonSpillable,

what do you get for:
DefaultSystemCodePage
DefaultFileSystemCodePage
DefaultRTLFileSystemCodePage?

Are you really using Lazarus: 1.6.2 and not 1.8.2 or that's just a typo?

Your testdir.zip and testdir.7z do not look right on my side (Windows, using 7z).

Edit:
Some of your LC_* values are between quotation marks and some without, I don't know if that makes a difference?

Doing a quick search gave me the impression that the values you see for LC_* in a console could be different than their counterpart for a GUI.
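On the quotation marks: as far as I know, the `locale` utility puts double quotes around values that are only implied (inherited from LANG or LC_ALL) and prints explicitly set variables without quotes, which would match the output in the first post, where only LANG, LANGUAGE and LC_TIME are unquoted. The lookup order POSIX describes can be sketched as below. This is Python, not Pascal, and `effective_category` is a hypothetical helper written for this illustration, not part of any library.

```python
import os


def effective_category(category, env=None):
    """Resolve one locale category from environment variables.

    Order described by POSIX: LC_ALL overrides everything, then the
    category's own variable, then LANG; with none set the locale is "C".
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:  # an empty value counts the same as an unset variable
            return value
    return "C"
```

Calling `effective_category("LC_CTYPE")` from inside a GUI program and comparing the result with what a terminal reports is one way to test the suspicion that the two environments differ.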

« Last Edit: January 14, 2019, 05:54:47 pm by engkin »

Logged

JuhaManninen

Global Moderator
Hero Member
Posts: 4468
I like bugs.

Re: Unicode won't work no matter what I do.

« Reply #9 on: January 14, 2019, 08:16:29 pm »

Quote from: NonSpillable on January 14, 2019, 04:23:54 pm

Edit: Could there be a problem with ext4, with Linux, or with something other than Lazarus? But Lazarus is the only thing not working (that is, FindFirst/FindNext).

FindFirst/FindNext work perfectly well. Your directory and file names are just plain wrong.
I don't understand why you don't see it in your Caja file manager.
See the attached screenshot of my Dolphin file manager.

Quote

1) Side note: since early 2000 I have always thought that the time was ripe for 64-bit, but no, every single time I try 64-bit Linux it lets me down; something breaks, and breaks badly. This time it was Lazarus. Other times it has been CAD software, visualization software, media players/codecs, etc.

Nonsense. I have used 64-bit Lazarus on Linux for about 7-8 years. Works well.

files.png (23.11 kB, 418x201 - viewed 290 times.)

« Last Edit: January 14, 2019, 08:32:47 pm by JuhaManninen »

Logged

Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

Global Moderator
Hero Member
Posts: 4468
I like bugs.

Re: Unicode won't work no matter what I do.

« Reply #10 on: January 14, 2019, 08:25:18 pm »

Quote from: engkin on January 14, 2019, 05:30:55 pm

Just curious, as I do not use Linux, do you guys need to add a unit like cwstring to have a WideStringManager?

WideStringManager is only needed on Windows. Why do you mix it here?

Quote

what do you get for:
DefaultSystemCodePage
DefaultFileSystemCodePage
DefaultRTLFileSystemCodePage?

A user does not need to set such things, especially not on Linux. Why do you mix it here?

Quote

Are you really using Lazarus: 1.6.2 and not 1.8.2 or that's just a typo?

Makes no difference. Both use the new UTF-8 system. On Linux it matters even less.

Quote

Your testdir.zip and testdir.7z do not look right on my side (Windows, using 7z).

Exactly, now you are on the right track.

Quote

Some of your LC_* values are between quotation marks and some without, I don't know if that makes a difference?

What LC_* values?

Logged

Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

Global Moderator
Hero Member
Posts: 4468
I like bugs.

Re: Unicode won't work no matter what I do.

« Reply #11 on: January 14, 2019, 08:43:10 pm »

Quote from: egsuh on January 14, 2019, 07:45:18 am

In Windows, there is something called "locale", or "Locale for non-Unicode-supporting applications" in full. Normally this is the same as the operating system language, but it can be set to another language, and then characters are not displayed correctly in some applications. Not sure about Linux.

It is specific to Windows. Linux typically uses UTF-8 everywhere although there is no standard for that.
I don't know why you bring the Locales up in a Linux question.

Logged

Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Bart

Hero Member
Posts: 5290

Re: Unicode won't work no matter what I do.

« Reply #12 on: January 14, 2019, 10:17:32 pm »

Quote from: JuhaManninen on January 14, 2019, 08:25:18 pm

WideStringManager is only needed on Windows. Why do you mix it here?

IIRC, a WideStringManager is needed on Linux as well to have full Unicode support.
Otherwise things like WideCompare* won't work.
See the comments in LazUtf8.
(This also means that any LCL program on *nix already has CWString unit enabled.)

Of course this makes no difference at all if the filename is encoded wrongly in the first place.

Bart

Logged

BeniBela

Hero Member
Posts: 906

Re: Unicode won't work no matter what I do.

« Reply #13 on: January 14, 2019, 10:31:45 pm »

Linux filenames have no encoding. Any byte sequence not containing #0 and '/' is allowed, like an arbitrary C-string.

Btw, I have collected some unusual file names here. Proper file handling needs to support all of them.
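To make the "no encoding" point concrete, here is a small sketch of how the same three letters become different byte sequences depending on the encoding the creating program used, and what a UTF-8 reader then shows. It is Python rather than Pascal, only for easy experimentation, and the helper names are invented for the example. The thread never established which encoding, if any, the problem names actually use, so the encodings listed are just common candidates.

```python
# The three letters from the thread (a-ring, a-umlaut, o-umlaut), written as escapes.
NAME = "\u00e5\u00e4\u00f6"


def name_bytes(text, encodings=("utf-8", "latin-1", "cp437")):
    """Map each encoding to the bytes a file name would contain on disk."""
    return {enc: text.encode(enc) for enc in encodings}


def show_as_utf8(raw):
    """Decode raw name bytes as UTF-8, as a UTF-8-only program would.

    Invalid bytes become U+FFFD, which many user interfaces draw as "?".
    """
    return raw.decode("utf-8", errors="replace")
```

Only the UTF-8 byte sequence survives `show_as_utf8` unchanged; the Latin-1 and CP437 variants come back as replacement characters, which is the kind of "?" output described in the first post.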

Logged

https://www.benibela.de/index_en.html
https://github.com/benibela

Bart

Hero Member
Posts: 5290

Re: Unicode won't work no matter what I do.

« Reply #14 on: January 14, 2019, 10:44:31 pm »

Well, then I mean malformed byte sequences that do not represent any valid UTF8 codepoint.
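A quick way to find such names on disk is to read the directory as raw bytes and try a strict UTF-8 decode on each entry. The sketch below is Python, not Pascal, and `is_valid_utf8` and `non_utf8_names` are hypothetical helpers for this illustration; it says nothing about how the Lazarus routines behave.

```python
import os


def is_valid_utf8(raw):
    """Return True if the bytes are well-formed UTF-8."""
    try:
        raw.decode("utf-8")  # strict decoding raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_names(directory):
    """Return the raw entry names in `directory` that are not valid UTF-8."""
    # Passing a bytes path makes os.listdir return bytes, so the names are
    # inspected exactly as stored, with no decoding step in between.
    raw_dir = os.fsencode(directory)
    return sorted(n for n in os.listdir(raw_dir) if not is_valid_utf8(n))
```

Pointing `non_utf8_names` at the extracted testdir would show whether the names on disk are the problem or only the names inside the archives.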

Do the Lazarus file I/O routines correctly handle your collection of "nasty files"?

Bart

Logged
