Author Topic: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?  (Read 9144 times)

Lutz Mändle

Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #15 on: February 12, 2021, 04:17:21 pm »
Replace TSQLite3Connection in your example with TSQLConnector; that should work.

Then, after DBConnectionX is created, add lines like these:

Code: Pascal
    ....
    DBConnectionX.ConnectorType:='SQLite3';
    DBConnectionX.CharSet:='utf8';
    ....
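
For readers adapting this, here is a fuller sketch of the same idea as a small console program. It assumes the FPC units sqldb and sqlite3conn and an installed SQLite3 client library; the variable names and the file name test.db are made up for illustration.

```pascal
program ConnectorSketch;
{$mode objfpc}{$H+}
uses
  sqldb,
  sqlite3conn; // including this unit registers the 'SQLite3' connector type
var
  Conn: TSQLConnector;
  Trans: TSQLTransaction;
begin
  Conn := TSQLConnector.Create(nil);
  Trans := TSQLTransaction.Create(nil);
  try
    Conn.ConnectorType := 'SQLite3';  // selects the concrete connection class
    Conn.CharSet := 'utf8';           // connection character set
    Conn.DatabaseName := 'test.db';   // hypothetical file name
    Conn.Transaction := Trans;
    Conn.Open;
    WriteLn('Connected: ', Conn.Connected);
    Conn.Close;
  finally
    Trans.Free;
    Conn.Free;
  end;
end.
```

Switching to another database should then mostly be a matter of changing ConnectorType and adding the matching connection unit to the uses clause.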

OH1KH

« Reply #16 on: February 12, 2021, 06:00:25 pm »
I have tried:

MainCon.CharSet:='CP_NONE';   (in my case MariaDB is used)

This compiles fine, but at run time it raises an error:
TMySQL57COnnection: Failed to set connection character set: Can't initialize character set CP_NONE (path: compiled_in)

The problem persists: reading a varchar column from the database and placing it into TEdit.Text now causes an unneeded character conversion with fpc3.2.0. I assumed that I would get a proper view in the TEdit if the charset were CP_NONE, i.e. if no conversion happened.

This is quite funny. Reading from TEdit.Text and writing it to a database column works as before.
But in the other direction an unneeded character conversion is made.
If I write values to the database with the fpc320-compiled version and then open the fpc304-compiled version to read the database, everything is OK, as before.

Why is reading strings from the database distorted by an unneeded conversion when writing has been left untouched?
Well, lucky in a way. Otherwise we would get double trouble!

« Last Edit: February 12, 2021, 06:02:27 pm by OH1KH »
--
Saku

OH1KH

« Reply #17 on: February 12, 2021, 06:21:24 pm »
Quote
@OH1KH and @Hartmut,
have you set the CharSet property of the TSQLConnection to utf8?
I'm working with MySQL and SQLite on Windows and some Linux flavors and have no UTF-8 related problems.

Tried that also:

 MainCon.CharSet:='UTF8';

Compiles and runs OK.
When I set TEdit.Text:='ÖÄÅöäå', push it to the database, and then pull it back, it is still 'ÖÄÅöäå' in TEdit.Text.

But a new problem arises when I start the fpc304-compiled version: then TEdit.Text shows '???'.
And if I run the fpc320-compiled version, all old database entries appear as garbage, while the ones inserted with this compiled version show special characters correctly.
My database has > 10000 entries, and there are thousands of users running this program who will then face the same problem.

It must be backward compatible in all ways!

--
Saku

Hartmut

« Reply #18 on: February 12, 2021, 07:40:41 pm »
Quote
Replace TSQLite3Connection in your example with TSQLConnector, that should work.
Then after the creation of DBConnectionX have a line like this:
Code: Pascal
    DBConnectionX.ConnectorType:='SQLite3';
    DBConnectionX.CharSet:='utf8';
Thank you very much for this helpful example. That does not sound difficult. I will test it when I have the time, so that, if it solves my problem, I can directly adopt it in all programs and common units with static and dynamic DB classes.

Lutz Mändle

« Reply #19 on: February 12, 2021, 10:29:05 pm »
@OH1KH

The CharSet property should be set to a value that matches the charset used by the database. From what you wrote earlier in this thread, I assume that your tables are not configured for utf8. What does the status command of the command-line client show?

It should look something like this:
Code: Text
    MariaDB [(none)]> status
    --------------
    mysql  Ver 15.1 Distrib 10.5.8-MariaDB, for Linux (x86_64) using  EditLine wrapper

    Connection id:          7
    Current database:
    Current user:           dbuser@localhost
    SSL:                    Not in use
    Current pager:          less
    Using outfile:          ''
    Using delimiter:        ;
    Server:                 MariaDB
    Server version:         10.5.8-MariaDB MariaDB package
    Protocol version:       10
    Connection:             Localhost via UNIX socket
    Server characterset:    utf8mb4
    Db     characterset:    utf8mb4
    Client characterset:    utf8
    Conn.  characterset:    utf8
    UNIX socket:            /run/mysql/mysql.sock
    Uptime:                 1 hour 43 min 29 sec

    Threads: 2  Questions: 12  Slow queries: 0  Opens: 16  Open tables: 10  Queries per second avg: 0.001
    --------------

    MariaDB [(none)]>

OH1KH

« Reply #20 on: February 13, 2021, 09:32:06 am »
There are two databases in use:

The first choice is a database running in the user's config folder (mysql_safe):

Connection:      Localhost via UNIX socket
Server characterset:   utf8
Db     characterset:   utf8
Client characterset:   utf8
Conn.  characterset:   utf8
UNIX socket:      /home/saku/.config/cqrlog/database/sock
Uptime:         1 min 6 sec

The second choice (normally used here) is an external MySQL network server:

Server characterset:   latin1
Db     characterset:   utf8
Client characterset:   utf8
Conn.  characterset:   utf8
UNIX socket:      /var/lib/mysql/mysql.sock
Uptime:         3 days 21 hours 38 min 21 sec


In both cases the databases have the UTF-8 charset.

From reading many wiki pages about this problem, I have understood that if the client program and the connected database both use the UTF-8 charset, TMySQLConnection should adopt it automatically, without any specification in the connection source code.

Setting the connection parameter XXXXXX.CharSet:='UTF8'; makes this work OK with both the fpc3.0.4- and fpc3.2.0-compiled versions.
The problem now is just the old database contents written by fpc3.0.4-compiled version(s) that did not have the XXXXXX.CharSet:='UTF8'; line.

If I continue with the CharSet:='UTF8'; line, all future database additions will show special characters correctly, but all past information shows up as garbage.

Perhaps a database update routine has to be added that goes through all old entries and converts them to the "new normal".
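
One way such an update routine could convert a single value, as a minimal sketch. It assumes the old rows are "double-encoded", i.e. UTF-8 bytes that were interpreted as ISO-8859-1 and encoded to UTF-8 a second time on their way to the server; that is an assumption, not something established in this thread. RepairDoubleUtf8 is a made-up helper name, and only RTL routines (UTF8Decode, UTF8Encode, SetCodePage) are used. MySQL's latin1 is closer to Windows-1252 than to ISO-8859-1, so bytes in the $80..$9F range may need a different mapping in practice.

```pascal
program RepairSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: undo one round of "UTF-8 bytes read as ISO-8859-1
// and encoded to UTF-8 again". Returns S unchanged when it does not look
// double-encoded.
function RepairDoubleUtf8(const S: RawByteString): RawByteString;
var
  W: UnicodeString;
  I: Integer;
begin
  W := UTF8Decode(S);
  Result := '';
  SetLength(Result, Length(W));
  for I := 1 to Length(W) do
  begin
    if Ord(W[I]) > 255 then
      Exit(S); // a code point outside Latin-1 cannot come from a single byte
    Result[I] := AnsiChar(Byte(Ord(W[I])));
  end;
  SetCodePage(Result, CP_UTF8, False); // relabel the bytes, do not convert them
  // Round-trip check: bytes that are not well-formed UTF-8 do not survive
  // decoding and re-encoding, so such a value is left as it was.
  if UTF8Encode(UTF8Decode(Result)) <> Result then
    Exit(S);
end;

var
  Stored, Fixed: RawByteString;
begin
  // 'Ö' (UTF-8: C3 96) after one extra encoding round is C3 83 C2 96.
  SetLength(Stored, 4);
  Stored[1] := #$C3; Stored[2] := #$83;
  Stored[3] := #$C2; Stored[4] := #$96;
  Fixed := RepairDoubleUtf8(Stored);
  WriteLn(Length(Stored), ' bytes -> ', Length(Fixed), ' bytes');             // 4 bytes -> 2 bytes
  WriteLn(IntToHex(Ord(Fixed[1]), 2), ' ', IntToHex(Ord(Fixed[2]), 2));       // C3 96
end.
```

The check is only a heuristic, so a real migration would want a backup first and should touch only rows known to have been written by the old builds.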

--
Saku

LacaK

« Reply #21 on: February 13, 2021, 10:17:28 am »
What I can comment:
1. For SQLite3, UTF-8 is hardcoded as the connection charset, because SQLite3 always assumes that character data is UTF-8 encoded. There is no way to change the charset/collation for a database, table, column, or connection.

2. As I understand it, if MySQL is used only with FPC 3.2, everything works as expected. The problem arises when old data written with FPC 3.0.4 is read.
If this data originated in Lazarus, it was UTF-8 encoded. So what happened is that it was translated during transmission to the MySQL database. The logical explanation is that the default connection charset was something other than UTF-8 in the FPC 3.0.4 scenario.
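
To illustrate the kind of translation meant in point 2, here is a minimal sketch, assuming the connection charset was a Latin-1 style single-byte charset (this thread does not confirm which one it was). Latin1BytesToUtf8 is a made-up helper that mimics what such a connection would do to bytes that are already UTF-8; it uses plain ISO-8859-1, whereas MySQL's latin1 differs in the $80..$9F range.

```pascal
program DoubleEncodeDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: treat every byte of S as one ISO-8859-1 character
// and encode the resulting text as UTF-8.
function Latin1BytesToUtf8(const S: RawByteString): RawByteString;
var
  W: UnicodeString;
  I: Integer;
begin
  W := '';
  SetLength(W, Length(S));
  for I := 1 to Length(S) do
    W[I] := WideChar(Ord(S[I])); // one byte becomes one code point
  Result := UTF8Encode(W);
end;

var
  Original, Stored: RawByteString;
  I: Integer;
begin
  // UTF-8 bytes of 'Ö' as an LCL application would hold them.
  SetLength(Original, 2);
  Original[1] := #$C3;
  Original[2] := #$96;
  Stored := Latin1BytesToUtf8(Original);
  WriteLn(Length(Original), ' bytes -> ', Length(Stored), ' bytes'); // 2 bytes -> 4 bytes
  for I := 1 to Length(Stored) do
    Write(IntToHex(Ord(Stored[I]), 2), ' ');                         // C3 83 C2 96
  WriteLn;
end.
```

Read back over a UTF-8 connection, those four bytes show up as two garbage characters instead of one 'Ö'.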

JuhaManninen

« Reply #22 on: February 13, 2021, 10:19:34 am »
OH1KH and Hartmut, just to be sure :
Do you use the Lazarus UTF-8 system? It is supported automatically in LCL applications. In programs you need to use unit LazUTF8.
See
 https://wiki.lazarus.freepascal.org/Unicode_Support_in_Lazarus
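
According to the linked wiki page, the core of this system is that UTF-8 becomes the default code page for String at program start. The sketch below is not LazUTF8 itself; it only uses the RTL routine SetMultiByteConversionCodePage to show the effect on DefaultSystemCodePage, and the variable names are made up.

```pascal
program DefaultCodePageSketch;
{$mode objfpc}{$H+}
var
  Before, After: TSystemCodePage;
begin
  Before := DefaultSystemCodePage;
  // Make UTF-8 the code page assumed for AnsiString/String data.
  SetMultiByteConversionCodePage(CP_UTF8);
  After := DefaultSystemCodePage;
  WriteLn('Before: ', Before); // platform dependent
  WriteLn('After:  ', After);  // 65001, the value of CP_UTF8
end.
```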
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Hartmut

« Reply #23 on: February 13, 2021, 05:19:25 pm »
Quote
OH1KH and Hartmut, just to be sure :
Do you use the Lazarus UTF-8 system? It is supported automatically in LCL applications. In programs you need to use unit LazUTF8.

I am mostly working on console programs, and because I faced endless problems when including unit LazUTF8, I try to avoid this unit whenever possible, because it sets many used charsets upside down. If you are interested in examples, you can find some of them in reply #12 of https://forum.lazarus.freepascal.org/index.php/topic,51558.0.html where I discussed some of those problems with wp. So unit LazUTF8 would not be a solution for me. But I will check before long whether the suggestion of Lutz Mändle in replies #12 and #15 solves the problem, which I hope it does.

JuhaManninen

« Reply #24 on: February 13, 2021, 06:37:59 pm »
Quote
I am mostly working on console programs, and because I faced endless problems when including unit LazUTF8, I try to avoid this unit whenever possible, because it sets many used charsets upside down.
Ok, now we know what causes your Unicode related problems. You must use the Unicode support through LazUTF8 or you are doomed ... as you have noticed yourself.
You could have mentioned initially that you try to avoid our well tested and supported Unicode system. People spent time searching a problem elsewhere.
BTW, it does not set charsets upside down. It sets Unicode with UTF-8 encoding. I don't see any reason not to use Unicode. It is 2021 already. Unicode has been around for 25+ years.
Why did you set your DB for Unicode if you don't want to use Unicode?

Quote
If you are interested in examples you can find some of them in reply #12 of https://forum.lazarus.freepascal.org/index.php/topic,51558.0.html where I had a discussion with wp about some of those problems.
I am not interested. I already know what happens. No need to fight against windmills.

Quote
So unit LazUTF8 would not be a solution for me. But I will check before long if the suggestion of Lutz Mändle in reply #12 and #15 does solve the problem which I hope.
Good luck with that!  ::)

OH1KH, I believe you have the same problem. Didn't you know about the Unicode support thing?
« Last Edit: February 13, 2021, 07:37:26 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Hartmut

« Reply #25 on: February 14, 2021, 11:34:56 am »
Quote
You could have mentioned initially that you try to avoid our well tested and supported Unicode system. People spent time searching a problem elsewhere.
Sorry, I had no idea that this could have an influence. I read the release notes and understood them to mean that FPC 3.2.0 contains a generally code-breaking change, not that the problem only affects people who do not use unit LazUTF8. Again, my plea for better and more helpful release notes.

Quote
Why did you set your DB for Unicode if you don't want to use Unicode?
I never set my SQLite DBs to UTF8. They have always used UTF8 automatically, for all those years since my first FPC version.

Currently unit LazUTF8 is not an option for me. I tried to point you to some of the reasons why. If the suggestion of Lutz Mändle works, I would prefer that solution.

JuhaManninen

« Reply #26 on: February 14, 2021, 04:32:08 pm »
Quote
Sorry, I had no idea that this could have an influence. I read the release notes and understood them to mean that FPC 3.2.0 contains a generally code-breaking change, not that the problem only affects people who do not use unit LazUTF8. Again, my plea for better and more helpful release notes.
From FPC's point of view the Lazarus UTF-8 solution is a hack.
Now I understand the change in FPC 3.2 was consistent with dynamic string codepage assignments elsewhere.
The UTF-8 solution works well when all your data is UTF-8. When you read / write data containing Windows codepage, you must convert to / from Unicode ASAP.
Please read and understand this :
 https://wiki.lazarus.freepascal.org/Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage

Quote
I never set my SQLite DB's to UTF8. They always used UTF8 automatically for all those years since my 1st FPC version.
Ok, I will ask in a different way: Why do you want to use Windows codepages?
For decades they have been a source of pain and sorrow and texts with question marks ('?????????') when transferred between locales.
Unicode solves those problems!
You must be a masochist to still use Windows codepages.

Quote
Currently unit LazUTF8 is not an option for me. I tried to point you to some reasons why. If the suggestion of Lutz Mändle works I would prefer this solution.
Surely it is an option. Just wrap your mind around the issue. It may be simpler than you realize.
« Last Edit: February 14, 2021, 04:41:52 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Hartmut

« Reply #27 on: February 15, 2021, 02:31:51 pm »
Quote
When you read / write data containing Windows codepage, you must convert to / from Unicode ASAP. Please read and understand this :
https://wiki.lazarus.freepascal.org/Unicode_Support_in_Lazarus#Reading_.2F_writing_text_file_with_Windows_codepage
Thanks for that link. I looked into it and added it to my list of UTF-8 related information, in case I need it one day.

Quote
Ok, I will ask in a different way: Why do you want to use Windows codepages?
It's not my declared goal to use Windows charsets everywhere. But large parts of my sources rely on a specific charset (for which they were written and tested): all my sources for lots of programs, including many common units, have grown over nearly 35 years (I started with Turbo Pascal 3). Large parts of these sources would not work with UTF-8.

For me it is important not to lose enormous amounts of time changing things that are already finished, including costly testing (!), and that work perfectly, as long as there is no real need. I'm only a hobbyist, and unfortunately the time for each hobby is limited :-((

Quote
For decades they have been a source of pain and sorrow and texts with question marks ('?????????') when transferred between locales.
I had some cases in the past, but I solved them a long time ago. So this is definitely not my problem. But this is funny: when I *included* unit LazUTF8, one of the problems, which you did not want to read about, was that in some cases '???' appeared instead of 'äöü' etc. But only *with* this unit and definitely *never* without it.

Please let us stop this discussion now. You did not want to read the examples of the problems I faced with unit LazUTF8, which of course you are free to do, but then its disadvantages for me may not be comprehensible to you.

I highly appreciate this forum (including your part) for exchanging technical information, e.g. what is possible or how something can be made to work. But please allow me to be the one who decides which of two possible solutions I prefer in my personal situation, because I am the one who has to shoulder the resulting work.

JuhaManninen

« Reply #28 on: February 16, 2021, 08:34:42 pm »
Quote
I had some cases in the past, but I solved them a long time ago. So this is definitely not my problem. But this is funny: when I *included* unit LazUTF8, one of the problems, which you did not want to read about, was that in some cases '???' appeared instead of 'äöü' etc. But only *with* this unit and definitely *never* without it.
Yes, because a wrong encoding was assumed.
How well does your code work with a phrase in Greek or Russian? ... Or in Hindi or Chinese?

Quote
Please let us stop this discussion now. You did not want to read the examples of the problems I faced with unit LazUTF8, which of course you are free to do, but then its disadvantages for me may not be comprehensible to you.
Actually I read your posts briefly. You had problems iterating UTF-8 codepoints. It could be solved easily.
Do you need to show text outside ASCII in Windows console? That may be the only reason one would omit the Unicode solution.
Your data in DB is Unicode with UTF-8 encoding. It is a perfect match for our UTF-8 system. It means you never need to convert string encoding.
I still encourage you to create a test version of your code with LazUTF8 where you strip all conversions. Just assign data and you are good.
For specific codepoint iteration issues you get help from here. Wiki has info, too :
 https://wiki.lazarus.freepascal.org/UTF8_strings_and_characters
There is even a nice "character" iterator supporting combining codepoints :
 https://wiki.lazarus.freepascal.org/Unicode_Support_in_Lazarus#CodePoint_functions_for_encoding_agnostic_code
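
For comparison, a minimal sketch of code point iteration over a UTF-8 string without LazUTF8. Utf8SeqLen and CountCodePoints are made-up helper names, the input is assumed to be well-formed UTF-8, and combining characters are counted as separate code points here, unlike with the iterator linked above.

```pascal
program CodePointIterSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: byte length of the UTF-8 sequence whose lead byte is B.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then
    Result := 1
  else if (B and $E0) = $C0 then
    Result := 2
  else if (B and $F0) = $E0 then
    Result := 3
  else if (B and $F8) = $F0 then
    Result := 4
  else
    Result := 1; // stray continuation or invalid byte: step over it
end;

// Hypothetical helper: number of code points in S, walking lead byte to lead byte.
function CountCodePoints(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Inc(I, Utf8SeqLen(Ord(S[I])));
    Inc(Result);
  end;
end;

var
  S: RawByteString;
begin
  // 'Öab' in UTF-8: C3 96 61 62
  SetLength(S, 4);
  S[1] := #$C3; S[2] := #$96; S[3] := 'a'; S[4] := 'b';
  WriteLn('Bytes:       ', Length(S));          // 4
  WriteLn('Code points: ', CountCodePoints(S)); // 3
end.
```

The point of the sketch is that Length counts bytes, so any code that treats S[I] as one character needs this kind of stepping once the data is UTF-8.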

Quote
I highly appreciate this forum (including your part) for exchanging technical information, e.g. what is possible or how something can be made to work. But please allow me to be the one who decides which of two possible solutions I prefer in my personal situation, because I am the one who has to shoulder the resulting work.
Yes of course. A valid solution also is to stick with a compiler version that works for you. If it works now, it will continue to work in the future.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

EgonHugeist

« Reply #29 on: February 20, 2021, 09:41:38 am »
@LacaK

I do understand the users who have trouble with FPC 3.2 when not using LazUTF8.

The input/output data should be in the database encoding, unless
Code: Pascal
    T(String/Memo)Field.Transliterate
is set to True. See: https://www.freepascal.org/docs-html/fcl/db/tstringfield.html. That then means each raw "string" should have the code page of the field if Transliterate is disabled. It doesn't matter if the property is declared as "String" and DefaultSystemCodePage <> TField.CodePage. IIRC the FPC default is:
Code:
    T(String/Memo)Field.Transliterate := False;
Did you change that to the Delphi default in the meantime? If not, it's a bug from my POV.

That's just what I'm thinking and how I implemented it in Zeos8. What do you think?
I don't want to comment on the "LCL all UTF-8" design here.

Regards from ZeosDevTeam, Michael

 
