
Author Topic: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?  (Read 10788 times)

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #30 on: March 01, 2021, 09:37:41 am »
It may be that the implementation of the Transliterate property / Translate method is incomplete.
I remember studying it years ago, but I never fully understood it.
So I cannot comment on this ... but I think that most users are not aware of this Transliterate property ...

EgonHugeist

  • Jr. Member
  • **
  • Posts: 78
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #31 on: March 09, 2021, 06:52:34 pm »
Laco, I just want to help. Honestly, what you/others are doing is a bug for me (and the reporters). Dealing with DBs vs. String+StringCodePage doesn't mean you should convert something by default just because the codepage-aware compiler expects a string with DefaultSystemCodePage. It's 100% acceptable if the output/input string has the codepage of the database column. What I'm trying to say: you need an indicator to do a CP conversion; otherwise it's unacceptable behavior that nobody is happy about.

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #32 on: March 20, 2021, 11:43:48 am »
I have the same problem. I have a huge database filled with strings by programs compiled with fpc 3.0.4 and earlier. I had accepted that, in my opinion, fpc 3.0.4 did the encoding wrong, because other programs like HeidiSQL or SQL-Workbench did not show the special characters correctly.
Starting with fpc 3.2.0 the strings are also displayed wrongly. So I needed the old behavior ... but when I try to compile the code with Laz 2.0.8 & fpc 3.0.4 the strings are also displayed wrongly,
and now I have a problem.
btw: No console program, the data goes TMySQL57Connection -> TSQLQuery -> TDataset -> TDBGrid
So no manual fiddling with the data.

I tried LazUTF8 -> no change.
I tried Charset to whatever -> no change.
Any help is welcome.
I am on Laz 2.0.8 fpc 3.0.4
« Last Edit: March 20, 2021, 01:50:35 pm by jc99 »
OS: Win XP x64, Win 7, Win 7 x64, Win 10, Win 10 x64, Suse Linux 13.2
Laz: 1.4 - 1.8.4, 2.0
https://github.com/joecare99/public
'~|    /''
,_|oe \_,are
If you want to do something for the environment: Twitter: #reduceCO2 or
https://www.betterplace.me/klimawandel-stoppen-co-ueber-preis-reduzieren

dseligo

  • Hero Member
  • *****
  • Posts: 1458
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #33 on: March 20, 2021, 04:03:59 pm »
Quote
I have the same problem. [...] Starting with fpc 3.2.0 the strings are also displayed wrongly. So I needed the old behavior ... but when I try to compile the code with Laz 2.0.8 & fpc 3.0.4 the strings are also displayed wrongly, and now I have a problem.
[...]
I am on Laz 2.0.8 fpc 3.0.4

So, you say that with 3.0.4 and 3.2.0 your data isn't displayed correctly? And previously with 3.0.4 it was displayed correctly? Then you probably changed something in your project.
What is the character encoding in your database?
Can you show some examples of how your data is shown now and how it should be shown (incorrect data - correct data)?

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #34 on: March 20, 2021, 04:29:50 pm »
Lazy people like me would be motivated to try to help if you posted a simple project showing the problem.

dseligo

  • Hero Member
  • *****
  • Posts: 1458
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #35 on: March 20, 2021, 05:25:41 pm »
Quote
Lazy people like me, would be motivated to try to help if you post a simple project showing the problem.

It would be best for me if he prepared test data: create a copy of a table with one row of sample data and export it with mysqldump. He also needs to say what the data should look like.
Then we could analyze how the string is written in his database (and propose a solution).

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #36 on: March 20, 2021, 05:37:09 pm »
Quote
It would be best for me if he would prepare test data. [...]
Then we could analyze how is string written in his database (and propose a solution).

A test project without data is not a test project, and I will not be motivated without data; I am that lazy. And I was (and still am) too lazy to mention it in my previous post.

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #37 on: March 26, 2021, 08:45:24 am »
Sorry, I was busy for some days ...
A test project is hard, because you need a DB filled in the old manner; my actual DB is ~4 GB with ~50k main entries and lots of (very personal) data.
I'll try, maybe share a GitHub link ...
(btw some of the code is published on GitHub)
I'll set up a VM with Laz 2.0.8 ... to test (and to serve as a workaround)

 

jcmontherock

  • Sr. Member
  • ****
  • Posts: 278
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #38 on: March 26, 2021, 10:35:26 am »
What I use is the SQL statement "SET NAMES ...", with the value depending on the encoding of the DB, executed right after connecting to the server. This statement sets the encoding that client and server use to talk to each other on that connection. All statements sent after that should use the same encoding.
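A minimal sketch of that idea with SQLdb, assuming the same TMySQL57Connection that jc99 mentions. The host name, user name and password are placeholders, and 'utf8' is taken from the table definitions posted in this thread; whether this alone fixes the display problem is not confirmed here.

```pascal
program SetNamesSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, sqldb, mysql57conn;

var
  Conn: TMySQL57Connection;
  Trans: TSQLTransaction;
begin
  Conn := TMySQL57Connection.Create(nil);
  Trans := TSQLTransaction.Create(nil);
  try
    // Placeholder connection settings (assumptions, adjust to your server)
    Conn.HostName := 'localhost';
    Conn.UserName := 'user';
    Conn.Password := 'secret';
    Conn.DatabaseName := 'rnz-traueranzeigen';
    Conn.Transaction := Trans;

    Conn.Open;
    // Tell the server which encoding this connection sends and expects.
    // Run this before any query is opened.
    Conn.ExecuteDirect('SET NAMES utf8');

    // ... open TSQLQuery components that use Conn here ...

    Conn.Close;
  finally
    Trans.Free;
    Conn.Free;
  end;
end.
```

The statement only affects the current connection, so it has to be repeated after every reconnect.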
Windows 11 UTF8-64 - Lazarus 4.0RC2-64 - FPC 3.2.2

dseligo

  • Hero Member
  • *****
  • Posts: 1458
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #39 on: March 26, 2021, 10:38:59 am »
Quote
A test-procect is hard, cause you need a DB filled in old manner, my actual DB is ~4Gb ~50k Main-Entries, lot's of (very personal) data.

I didn't mean the whole database. One word would possibly be enough. You can create a new table (create table new_table like your_table), insert one row from the existing table, delete the data that isn't relevant from that row, and export only that table.

Seenkao

  • Hero Member
  • *****
  • Posts: 652
    • New ZenGL.
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #40 on: March 26, 2021, 10:53:48 am »
As I understand it, the conversion to UTF-8 used to happen automatically? And now you are forced to convert to UTF-8 manually?
I strive to create applications that are minimal and reasonably fast.
Working on ZenGL

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #41 on: March 26, 2021, 02:07:41 pm »
Quote
[...]
So, you say that with 3.0.4 and 3.2.0 your data isn't displayed correct? And previously with 3.0.4 it was displayed correct? Then you probably changed something in your project.

I thought so too, ... (but I have found nothing yet)
 
Quote
What is character encoding in your database?

--> UTF8

Quote
Can you show some example how your data is shown now and how it should be shown (incorrect data - correct data)?

Wälde -> Wälde
Günther -> Günther
Maaß -> Maaß
DB-Code (SQL)
Code: SQL
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET NAMES utf8 */;
/*!50503 SET NAMES utf8mb4 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;

CREATE DATABASE IF NOT EXISTS `rnz-traueranzeigen` /*!40100 DEFAULT CHARACTER SET utf8 */;
USE `rnz-traueranzeigen`;

CREATE TABLE IF NOT EXISTS `anzeigen` (
  `idAnzeige` BIGINT(20) NOT NULL AUTO_INCREMENT COMMENT 'ID des Eintrags',
  `Auftrag` VARCHAR(20) NOT NULL COMMENT 'Auftragsnummer ',
  `Stichwort` VARCHAR(100) DEFAULT NULL COMMENT 'Ausgelesenes Stichwort',
  `Nachname` VARCHAR(100) DEFAULT NULL COMMENT 'Eingegebener Nachname (bei Personen)',
  `Vorname` VARCHAR(100) DEFAULT NULL COMMENT 'Eingegebener Vorname  (bei Personen)',
--[...]
  `Pfad` VARCHAR(100) DEFAULT NULL COMMENT 'rel. Pfad zu den Daten',
  `Bild` longblob COMMENT 'Bild-Daten',
  `LinkID` BIGINT(20) DEFAULT NULL COMMENT 'ID der "Hauptanzeige" (bei Nachrufen, Danksagungen und Errinnerungen)',
  `ProfileImg` VARCHAR(50) DEFAULT NULL COMMENT 'Bild der Person (Profilbild)',
  `ProfImgCount` SMALLINT(6) NOT NULL DEFAULT '-1' COMMENT 'Anzahl der PDF-Bilder',
  `TimeStamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Zeitstempel der letzten Änderung',
  PRIMARY KEY (`idAnzeige`),
  UNIQUE KEY `Auftrag` (`Auftrag`),
  KEY `Stichwort` (`Stichwort`),
--[...]
  KEY `ixLinkID` (`LinkID`),
  KEY `ixTimeStamp` (`TimeStamp`)
) ENGINE=InnoDB AUTO_INCREMENT=48862 DEFAULT CHARSET=utf8 COMMENT='RNZ-Anzeigen ab Mrz 2015 ';
« Last Edit: March 26, 2021, 02:24:20 pm by jc99 »

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #42 on: March 26, 2021, 02:21:59 pm »
The Places-Table would be:
Code: SQL
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET NAMES utf8 */;
/*!50503 SET NAMES utf8mb4 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;

CREATE DATABASE IF NOT EXISTS `rnz-traueranzeigen` /*!40100 DEFAULT CHARACTER SET utf8 */;
USE `rnz-traueranzeigen`;

CREATE TABLE IF NOT EXISTS `orte` (
  `idOrte` INT(11) NOT NULL COMMENT 'ID',
  `Ortname` VARCHAR(100) NOT NULL COMMENT 'Kurzname',
  `LangName` VARCHAR(300) DEFAULT NULL COMMENT 'Voll qualifizierter Name',
  `idOrte_Lnk` VARCHAR(300) DEFAULT NULL COMMENT 'Bei Kopie Verweiss auf (haupt-)Ort',
  `Longitude` VARCHAR(20) DEFAULT NULL COMMENT 'Längengrad des Orts',
  `Latitude` VARCHAR(20) DEFAULT NULL COMMENT 'Breitengrad des Orts',
  PRIMARY KEY (`idOrte`),
  KEY `Ortname` (`Ortname`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/*!40000 ALTER TABLE `orte` DISABLE KEYS */;
INSERT INTO `orte` (`idOrte`, `Ortname`, `LangName`, `idOrte_Lnk`, `Longitude`, `Latitude`) VALUES
        (2, 'Schönbrunn', 'Schönbrunn, Rhein-Neckar-Kreis, Baden-Württemberg, Germany', NULL, 'E8,9284', 'N49,4123');
/*!40000 ALTER TABLE `orte` ENABLE KEYS */;

/*!40101 SET SQL_MODE=IFNULL(@OLD_SQL_MODE, '') */;
/*!40014 SET FOREIGN_KEY_CHECKS=IF(@OLD_FOREIGN_KEY_CHECKS IS NULL, 1, @OLD_FOREIGN_KEY_CHECKS) */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
btw: the name of the place is Schönbrunn !

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #43 on: March 26, 2021, 03:56:57 pm »
To get the following results:
Wälde -> Wälde
Günther -> Günther
Maaß -> Maaß

The string data is UTF-8, but its codepage is marked as cp1252.

For instance, try:
Code: Pascal
s := 'Maaß';      // s is declared as String
memo1.Append(s);  // gives the correct word

// Relabel the same bytes as cp1252 without converting them
SetCodePage(RawByteString(s), 1252, False);
memo1.Append(s);  // gives the wrong word

Notice that, in this example, the data itself is correct; only the codepage is wrong.

The more examples of wrong conversions you give, the better we can confirm the real codepage (cp1252). What is the codepage of your system?
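To make the mechanism visible without any GUI or console encoding in the way, here is a small sketch that prints the bytes as hex. It assumes the LazUtils unit LConvEncoding is on the unit path (for example by adding LazUtils as a project dependency); the program name and the HexBytes helper are made up for this illustration.

```pascal
program MojibakeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, LConvEncoding;

// Hypothetical helper: returns the bytes of s as space-separated hex,
// so the console encoding cannot distort what we see.
function HexBytes(const s: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    Result := Result + IntToHex(Ord(s[i]), 2) + ' ';
end;

var
  Good, Broken, Repaired: string;
begin
  // UTF-8 bytes of "Maaß", written as escapes so the result does not
  // depend on the encoding of this source file.
  Good := 'Maa'#$C3#$9F;

  // Treat those UTF-8 bytes as if they were cp1252 text and convert
  // them to UTF-8 again. Each byte of the "ß" becomes its own character.
  Broken := CP1252ToUTF8(Good);

  // Going back from UTF-8 to cp1252 undoes that one wrong conversion.
  Repaired := UTF8ToCP1252(Broken);

  WriteLn('good:     ', HexBytes(Good));
  WriteLn('broken:   ', HexBytes(Broken));
  WriteLn('repaired: ', HexBytes(Repaired));
end.
```

The "broken" line should show C3 83 C5 B8 in place of C3 9F, which is exactly the two-character garbage seen instead of "ß" in the examples above, and the "repaired" line should match the "good" line again.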
« Last Edit: March 26, 2021, 04:09:52 pm by engkin »

dseligo

  • Hero Member
  • *****
  • Posts: 1458
Re: UTF-8 and Laz 2.0.10/fpc 3.2.0 difference?
« Reply #44 on: March 26, 2021, 09:02:22 pm »
Try this:
Code: Pascal
uses lazutf8;
...
var s:String;
...
s:='Wälde'; // -> Wälde
Memo1.Append(s+' --> '+Utf8ToWinCP(s));
s:='Günther'; // -> Günther
Memo1.Append(s+' --> '+Utf8ToWinCP(s));
s:='Maaß'; // -> Maaß
Memo1.Append(s+' --> '+Utf8ToWinCP(s));

 
