Author Topic: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?  (Read 9077 times)

OH1KH

  • Jr. Member
  • **
  • Posts: 63
UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« on: February 10, 2021, 10:32:52 am »
Hi!

The PC runs Fedora 32 with the environment setting LANG=fi_FI.UTF-8.

I have a program compiled with Laz2.0.8/fpc3.0.4 in which I write a string containing special characters such as ÖÄÅöäå into a TEdit.
The content is then saved to a MySQL varchar column.

If I look at the column content with the console mysql client it is "garbage", i.e. every special character is different and doubled in the database column.
However, when it is read back into the TEdit everything looks the same, ÖÄÅöäå, as it was written.

Without any change I compiled the same source with Laz 2.0.10/fpc3.2.0, wrote a string containing the same special characters ÖÄÅöäå into the TEdit, and saved it to the database.
The console mysql view shows the same "garbage" content in the database column as before.

If the column is read back into the TEdit, it now shows the same garbage, not the proper text as with the version compiled with Laz2.0.8/fpc3.0.4.

What has changed?
I have tried to find a clue. No luck.
It looks like reading from the database does not do the same conversion as with the older Laz/fpc.
--
Saku

trev

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2020
  • Former Delphi 1-7, 10.2 user
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #1 on: February 10, 2021, 11:16:12 am »
Did you check the FPC 3.2.0 release notes: User Changes and New Features?

This breaking change listed under User Changes looks like it may be relevant.

[Edit: Fix URL]
« Last Edit: February 11, 2021, 07:19:31 am by trev »

OH1KH

  • Jr. Member
  • **
  • Posts: 63
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #2 on: February 10, 2021, 02:46:07 pm »
Yes, I did.
Several times, including this part, but I did not understand what to do to restore the old behaviour.

In my testing the change somehow has a "one direction" effect.
Strings written to the database by the Laz2.0.10/fpc3.2.0 build come back completely readable if they are read into the TEdit by the version compiled with Laz2.0.8/fpc3.0.4.

https://github.com/ok2cqr/cqrlog/issues/323
--
Saku

trev

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2020
  • Former Delphi 1-7, 10.2 user
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #3 on: February 11, 2021, 07:30:14 am »
So now you have your answer to what has changed which was documented in the release notes. Those notes are always useful reading.

OH1KH

  • Jr. Member
  • **
  • Posts: 63
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #4 on: February 11, 2021, 03:49:51 pm »
Hi!
There is no answer on how to turn on compatibility with the old behaviour!

Do I really have to change nearly 900 lines of code from XXXXX.AsString to XXXXX.AsBinary and then pass the result to a function that loops over it and builds a string without conversion?

I tested that and it works, but it is a huge job for source code that is fine with the old fpc.
And it is a very stupid requirement, just like a regexp that does not allow empty strings.
--
Saku

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #5 on: February 11, 2021, 04:44:14 pm »
Quote
So now you have your answer to what has changed which was documented in the release notes. Those notes are always useful reading.
trev, your answer was not very helpful. You should explain how to solve the problem if you really know.
I remember other people asking about the same issue, but I don't remember how it was solved. Was it fixed in the FPC 3.2.1 fixes branch? I hope they will release 3.2.2 ASAP.
I don't currently do any programming with DBs, so this bug did not hit me.
In any case, a breaking change like this is bad. A realistic workaround is to stick with FPC 3.0.4 until it gets fixed. You can build the latest Lazarus, including Lazarus trunk, with FPC 3.0.4.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #6 on: February 11, 2021, 06:15:58 pm »
Quote
Is it really so that I have to change nearly 900 lines of code from XXXXX.AsString to XXXXX.AsBinary and then pass result to function loop that creates string without conversion ??
Typecasting the binary value to RawByteString should prevent conversion. But yes, it must be added everywhere. Not an easy solution.
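
A minimal sketch of the "no conversion" idea, assuming the field content is available as raw bytes (for example through TField.AsBytes, which is an assumption here and is not confirmed anywhere in this thread). The helper name BytesToUtf8String is hypothetical. It relies on the RTL procedure SetCodePage with Convert=False, which only changes the code page tag of the string and leaves the bytes as they are:

```pascal
program RawBytesDemo;
{$mode objfpc}{$H+}

uses
  SysUtils; // TBytes

// Hypothetical helper (not part of FCL-DB): copy raw bytes into a string
// and mark them as UTF-8 without converting them.
function BytesToUtf8String(const B: TBytes): RawByteString;
begin
  Result := '';
  SetLength(Result, Length(B));
  if Length(B) > 0 then
    Move(B[0], Result[1], Length(B));
  // Convert=False: only the code page tag changes, the bytes stay untouched
  SetCodePage(Result, CP_UTF8, False);
end;

var
  B: TBytes;
  S: RawByteString;
begin
  // 'Ö' encoded as UTF-8 is the byte pair C3 96
  SetLength(B, 2);
  B[0] := $C3;
  B[1] := $96;
  S := BytesToUtf8String(B);
  WriteLn('length: ', Length(S), ', code page: ', StringCodePage(S));
end.
```

In a real application such a helper would still have to be called at every place that currently uses AsString, so it does not remove the "900 lines" problem; it only keeps each call site short.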
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Hartmut

  • Hero Member
  • *****
  • Posts: 742
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #7 on: February 12, 2021, 09:02:29 am »
Seems that OH1KH has the same problem that I have (on Windows) since I tried FPC 3.2.0:

When I read data from an SQLite DB, for many years (at least from FPC 2.6.4 up to 3.0.4) this data was always in UTF-8, for console and GUI programs alike, regardless of whether the unit LazUTF8 was used or not. But now (if the unit LazUTF8 is not used) this data is instead returned in the Windows charset (ANSI 1252). Yet the SELECT statements, if they include characters like Ä Ö Ü ä ö ü ß, still have to be in UTF-8. Crazy...

Of course I read the above release notes multiple times. But I do not understand much of what is written there, especially not what a solution for existing code could look like, and I'm not a bloody beginner (I started with Turbo Pascal 3 nearly 35 years ago).

I wasted 2 days on fruitless attempts to find a global/central solution that does not require changing endless lines of code in every program and in multiple common units, but I failed. Then I had to stop because I have enough other to-dos with higher priority, and I decided not to use FPC 3.2.0 for now (but that is not the point of a new release).
And this was not the first time that breaking code changes in a new release, combined with insufficient release notes, caused me a lot of time-consuming and frustrating trouble :-((

I appreciate the continuous work of the FPC developers to improve and enhance FPC. Thanks a lot to them. But if they break existing code, I think it is important to write in the release notes not only what they have changed and why. If things are not trivial, as here, there should additionally be clear instructions on how to adapt existing code:
 - step 1: do this
 - step 2: do that
 - step 3: if you have ... then do ... else do ...
If not, the FPC developers do a disservice to a lot of FPC users who have existing code: of course the developer who changes something like this is an expert in this area and is the one who can most easily see the best/easiest way to adapt existing code. A lot of "normal FPC users" / hobbyists are far away from that and are thrown into a lot of trouble, plus the risk of adding unnecessary bugs.


What could / should a solution for this problem look like?
 - it should not be necessary for every FPC user to change "hundreds" of lines of code
 - there should be something like a global/central setting in one of the classes involved in accessing a database (e.g. TSQLConnection?) so that you "only" have to adapt 1 line of code per database
 - and the effort should be reasonable, so that common units can be adapted (e.g. via not too many $IFDEFs) to compile both with the new FPC release and with some older FPC versions.


@JuhaManninen:
I see your name since a long time and assume that you have some influence in the FPC community. Would you please be so kind to forward the blue and green parts to the FPC developers? I hope that this could help
 - that release notes in the future become more expressive and helpful - especially for breaking code changes
 - that this current issue will be fixed in a wise way.
Thanks a lot to you.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #8 on: February 12, 2021, 10:06:45 am »
Quote
@JuhaManninen:
I see your name since a long time and assume that you have some influence in the FPC community. Would you please be so kind to forward the blue and green parts to the FPC developers? I hope that this could help
 - that release notes in the future become more expressive and helpful - especially for breaking code changes
 - that this current issue will be fixed in a wise way.
Thanks a lot to you.
I am a Lazarus developer but not really involved in FPC development. I haven't even followed fpc-dev and fpc-pascal mailing lists diligently recently.
Some FPC developers do follow this forum.
Does nobody really know a solution for this problem? Will the next FPC 3.2.2 fix it?
« Last Edit: February 13, 2021, 10:39:01 am by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Lutz Mändle

  • Jr. Member
  • **
  • Posts: 65
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #9 on: February 12, 2021, 10:18:57 am »
@OH1KH and @Hartmut,
have you set the charset property of the TSQLConnection to utf8?
I'm working with mysql and sqlite on windows and some linux flavors and have no utf8 related problems.

Hartmut

  • Hero Member
  • *****
  • Posts: 742
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #10 on: February 12, 2021, 11:42:08 am »
Quote
@OH1KH and @Hartmut,
have you set the charset property of the TSQLConnection to utf8?
I'm working with mysql and sqlite on windows and some linux flavors and have no utf8 related problems.
Thanks Lutz for your reply. Here is how I did the DB initialization:

Code: Pascal
var DataSourceX: TDataSource;
    DBConnectionX: TSQLite3Connection;
    SQLQueryX: TSQLQuery;
    SQLTransactionX: TSQLTransaction;
...
   DataSourceX:=TDataSource.Create(nil);           {create all vars: }
   DBConnectionX:=TSQLite3Connection.Create(nil);
   DBConnectionX.CharSet:='UTF8'; {tried other spellings, see below}
   SQLQueryX:=TSQLQuery.Create(nil);
   SQLTransactionX:=TSQLTransaction.Create(nil);

   DBConnectionX.Transaction:=SQLTransactionX;     {connect all vars: }
   SQLQueryX.Database:=DBConnectionX;
// SQLQueryX.Transaction:=SQLTransactionX; {happens automatically}
   DataSourceX.Dataset:=SQLQueryX;

   DBConnectionX.Name:='DBConnection';
   DBConnectionX.DatabaseName:='mydb.sqlite';

I tried some other spellings like 'cpUTF8', 'utf8', 'CPUTF8', 'CP_UTF8' because, as so often, I could not find documentation of the allowed names with reasonable effort (even the Object Inspector has no ComboBox for that property where you could see the spellings).
But all this changed nothing.

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #11 on: February 12, 2021, 11:56:50 am »
As was already mentioned, the TSQLConnection.CharSet property plays a major role here.
Try setting it to 'utf8mb4' for MySQL:
https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html

Lutz Mändle

  • Jr. Member
  • **
  • Posts: 65
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #12 on: February 12, 2021, 01:11:38 pm »
I'm using a TSQLConnector and set its ConnectorType property to 'SQLite3'. DatabaseName is the file name of the SQLite DB; HostName, UserName and Password are left empty.
According to the sources, the spellings 'utf8', 'utf-8' and 'utf8mb4' for the CharSet property should have the same effect.

In TSQLite3Connection the method GetConnectionCharSet is overridden and always returns 'utf8', no matter how the CharSet property is set.
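
For readers who have not used TSQLConnector, here is a rough sketch of the setup described above. It takes the place of a concrete connection class such as TSQLite3Connection and is wired to the transaction and query in the same way. The program name and the test query are made up for illustration, the file name 'mydb.sqlite' is borrowed from the earlier post, and running it requires the SQLite client library to be installed. Treat it as an untested outline and not as a confirmed fix:

```pascal
program ConnectorSketch;
{$mode objfpc}{$H+}

uses
  sqldb,
  sqlite3conn; // including this unit registers the 'SQLite3' connector type

var
  Conn: TSQLConnector;
  Trans: TSQLTransaction;
  Query: TSQLQuery;
begin
  Conn := TSQLConnector.Create(nil);
  Trans := TSQLTransaction.Create(nil);
  Query := TSQLQuery.Create(nil);
  try
    Conn.ConnectorType := 'SQLite3';
    Conn.DatabaseName := 'mydb.sqlite'; // HostName, UserName, Password stay empty
    Conn.CharSet := 'utf8';             // per the post above, SQLite always reports 'utf8'
    Conn.Transaction := Trans;
    Query.DataBase := Conn;

    Conn.Open;
    Query.SQL.Text := 'SELECT 1 AS n';  // illustrative query only
    Query.Open;
    WriteLn(Query.Fields[0].AsString);
    Query.Close;
    Conn.Close;
  finally
    Query.Free;
    Trans.Free;
    Conn.Free;
  end;
end.
```

For MySQL the same pattern would apply with the matching connector unit and type name, and there the CharSet value is the part that matters.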

Hartmut

  • Hero Member
  • *****
  • Posts: 742
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #13 on: February 12, 2021, 02:31:55 pm »
Quote
As it was already mentioned TSQLConnection.CharSet property plays here major role. Try set it to 'utf8mb4' for MySQL
Thanks. I tested this in a console program with FPC 3.2.0 on Windows 7 32-bit, but unfortunately nothing changed.

Quote
I'm using a TSQLConnector and set the property ConnectorType to 'SQLite3', DatabaseName is the filename of the sqlitedb, Hostname, UserName and Password leave empty. According to the sources the spellings 'utf8', 'utf-8' and 'utf8mb4' for the charset property should have the same effect.
Thanks for this suggestion. Until now I had never heard of TSQLConnector and must first find out how it fits into the existing combination of the 4 classes above that I normally use for DBs, which is not trivial for me (in some programs I used the Object Inspector to set up static classes, while in others I needed dynamic classes as shown above). Unfortunately I don't have more time to dive into this these days, but I will try it later. How sure are you that it will solve the problem?

Quote
In TSQLite3Connection the method GetConnectionCharSet is overridden and always returns 'utf8' no matter how the charset property is set.
If I understand this correctly, all my attempts to set 'TSQLite3Connection.CharSet', some months ago and now again, were doomed from the start and wasted time. I had tried it because that is how I understood the release notes above, which say:
Quote
New behaviour: When those fields are created there can be specified CodePage (if none specified CP_ACP is assumed), which defines encoding of character data presented by this field. In case of sqlDB TSQLConnection.CharSet is usualy used.

Again, I am very unhappy with how the release notes handled such a non-trivial case :-((

ASBzone

  • Hero Member
  • *****
  • Posts: 678
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #14 on: February 12, 2021, 02:43:39 pm »
Quote
Hi !
There is no answer how to turn on compatibility to old way!

Is it really so that I have to change nearly 900 lines of code from XXXXX.AsString to XXXXX.AsBinary and then pass result to function loop that creates string without conversion ??

I test that and it works, but it is a huge job to do for source that is ok with old fpc.
And very stupid requirement just like regexp that does not allow empty strings.

Are you able to explicitly set the code page in your application so that it returns it to that code page when reading from SQL?
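
One possible reading of this suggestion, sketched for a console program that does not use LazUTF8: the RTL procedure SetMultiByteConversionCodePage changes DefaultSystemCodePage, which is the code page that plain AnsiString values are assumed to use. Whether this brings back the old database behaviour is not confirmed anywhere in this thread, so it is an experiment to try and not a known fix. The program name is made up for illustration:

```pascal
program ForceUtf8;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // installs a real code page conversion backend on Unix
{$endif}

begin
  WriteLn('default code page before: ', DefaultSystemCodePage);
  // Treat plain AnsiString data as UTF-8 from here on
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('default code page after:  ', DefaultSystemCodePage);
end.
```

If it helps at all, the call would need to run once at program start, before any database field is read, which would at least match the wish for a single central setting.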
-ASB: https://www.BrainWaveCC.com/

Lazarus v2.2.7-ada7a90186 / FPC v3.2.3-706-gaadb53e72c
(Windows 64-bit install w/Win32 and Linux/Arm cross-compiles via FpcUpDeluxe on both instances)

My Systems: Windows 10/11 Pro x64 (Current)

 
