Lazarus

Programming => Databases => Topic started by: marclebrun on June 23, 2021, 08:20:20 am

Title: Firebird database with charset NONE
Post by: marclebrun on June 23, 2021, 08:20:20 am
Hi  :)

There is a 20+ years old database in a company.
The server is Ubuntu 12.04 using Firebird 2.5.
And all the applications are running on Windows 10.

The database has been created without specifying any charset.
Code: MySQL  [Select][+][-]
  1. SELECT rdb$character_set_name FROM rdb$database; => "NONE"

There is an application written in Lazarus 1.4.4 using the old FBLib library from Alessandro Batisti https://github.com/graemeg/firebird-lib (https://github.com/graemeg/firebird-lib)

When connecting, this application specifies a charset of ISO8859_1

Code: Pascal  [Select][+][-]
  1. DB := TFBLDatabase.Create(nil);
  2. DB.CharacterSet := 'ISO8859_1';
  3. ...

So far so good, but...

I want to migrate this application from Lazarus 1.4.4 to Lazarus 2.0.12.

No matter what charset I use when connecting, the result is the same : all characters with accents are replaced with question marks.
Exemple :
Code: [Select]
H?l?ne
J?r?me
Jo?l

I also tried using the TIBConnection from the Lazarus components but the result is exactly the same.

It looks like something has changed between Lazarus 1.4 and Lazarus 2.x regarding character encoding...?
I guess I will need to recreate this (big) database and find a way to convert the existing data from its original charset to UTF8.
Is it possible to convert the charset while restoring a backup ?
Is there another solution ?

Many thanks for your answers :D

Marc
Title: Re: Firebird database with charset NONE
Post by: Zvoni on June 23, 2021, 08:33:58 am
Maybe this can shed some light?
https://stackoverflow.com/questions/32249389/convert-character-set-from-iso8859-1-to-utf8
Title: Re: Firebird database with charset NONE
Post by: Fantablup on June 23, 2021, 08:47:59 am
Look at this link

http://www.destructor.de/firebird/gbak.htm (http://www.destructor.de/firebird/gbak.htm)

They added two repair fixes in 2.5
Look at "Repair malformed"

Hope it helps. I have not ried it.
Seems like the only solution.
Title: Re: Firebird database with charset NONE
Post by: marclebrun on June 23, 2021, 09:00:53 am
Thank you both for your replies.

I had already seen that discussion on stackoverflow but the database is so big that I cannot just recreate some columns and migrate the data...

I will check the page about gbak, I hope I will find something useful.

I also read about FBCopy, do you guys know if it works well for this kind of issue ?

Thanks  :D

Marc
Title: Re: Firebird database with charset NONE
Post by: Fantablup on June 23, 2021, 09:08:45 am
Before version 2.5, they had the fix outside Gbak, but has included it in 2.5.
It says that should fix with a database with character set NONE.
Title: Re: Firebird database with charset NONE
Post by: Zvoni on June 23, 2021, 09:20:00 am
Thank you both for your replies.

I had already seen that discussion on stackoverflow but the database is so big that I cannot just recreate some columns and migrate the data...

Marc
Why not? Since it's a one-time thing...
As they wrote on SO:
Export the DDL,
tweak the DDL with the "new" charset
create the new Database
dump the old data (data only) from the old DB
import it into the new DB
Test/QA the new Database
Roll out
Title: Re: Firebird database with charset NONE
Post by: marclebrun on June 23, 2021, 10:11:02 am
Why not? Since it's a one-time thing...

Because the database has 362 tables, and some of them have more than 200 fields, ...
I know it's not well designed  :)

What tool would you use to dump the data from the old DB to the new one ?
Title: Re: Firebird database with charset NONE
Post by: Zvoni on June 23, 2021, 10:22:50 am
Since it's FireBird, FlameRobin should be the first try
Otherwise DBeaver

EDIT: 362 tables?
That's..... cute....
Just looking at the main schema of our IBM DB2 i see 1100 physical Tables (not counting views and such)
and some 700 Schemas, each with its own number of tables.....
Title: Re: Firebird database with charset NONE
Post by: devEric69 on June 23, 2021, 11:37:00 am
Hello,

To export (in *.sql text-file) just the DDL statements of the database structure (synopsis of the database's scheme), and then to export the structure and its data in order to reimported those data into a new database structure - for example - I use fbsql (incuded in IBX for Lazarus, an IT solution from MWA Software, dedicated to firebird development). It works well for me (but I've never tried it on such a big database).

Edit (important): I've only tested this solution with firebird 3 (I see @marclebrun, that you are still in version 2.5)!
Title: Re: Firebird database with charset NONE
Post by: tonyw on June 23, 2021, 01:09:35 pm
 FBSQL is an example utility provided as source code with the IBX release. It probably won't do what you want without a small change to the source code.

What FBSQL can do is to dump an entire database as metadata plus data into a text file, blobs and arrays included. This means that your entire database can be rendered as text and hence edited.

What I suggest you do is to get hold of the IBX source and compile the FBISQL example, but changing line 407 of fbsql.lpr from

 FIBDatabase.Params.Values['lc_ctype'] := 'UTF8';

to

 FIBDatabase.Params.Values['lc_ctype'] := 'ISO8859_1';

This is so that your database is opened using its proper character set. You can then use this version to dump your database to a textfile using fbisql from the command line using (e.g.)

fbsql -A -g -u <USERNAME> -p <PASSWORD> -o <output file> <database connect string>

If the database is local then <database connect string> is just the path to the database, otherwise it should include the domain name of the server ':' database name - any valid connect string will do.

The above should dump the whole database to one big text file in UTF8 encoding.

You should now edit the text file to replace all "Character Set none" to "Character set UTF8". Also edit the create database statement to what is needed to define the replacement database.

Restore line 407 of fbisql.lpr to its original value and recompile. You can now create the replacement database using:

fbsql -b -e  -u <USERNAME> -p <PASSWORD> -i <file containing edited text dump of database>

fbsql should now read in the text dump and recreate your database for you.

If you are using FB4 then you should get IBX from the MWA software SVN repository. The zip has a bug in it affecting the post RC1 version of FB4 and which is only fixed in SVN.

Title: Re: Firebird database with charset NONE
Post by: devEric69 on June 23, 2021, 04:26:15 pm
Thank you very much for this FBSQL's charset tip, Tony.
Title: Re: Firebird database with charset NONE
Post by: tonyw on June 23, 2021, 04:28:03 pm
Thank you very much for this FBSQL's charset tip, Tony.
In the development version, I have already added a "-c" command line switch to allow the connection character set to be set at run time - just like ISQL.
Title: Re: Firebird database with charset NONE
Post by: devEric69 on June 23, 2021, 04:37:44 pm
In the development version, I have already added a "-c" command line switch to allow the connection character set to be set at run time - just like ISQL.

OK, that's good to know. For information, I'm using your tool (FBSQL) in order to create and manage differential *.sql patches.
Title: Re: Firebird database with charset NONE
Post by: zgabrovski on June 26, 2021, 08:45:16 am
Guys it is very easy to use a FB Database with no character set specified as is.
Simple set the TIBConnection property "UseConnectionCharSetIfNone" to true and then set your desired character set.
I had exactly the the same problem before and we developed this.
Title: Re: Firebird database with charset NONE
Post by: marclebrun on June 27, 2021, 08:57:18 am
Thank you all for your great help, I have made some very good progress :)


Now there is something I don't understand very well :

In FlameRobin, I'm using ISO8859_1 to connect to the new database, everything looks perfect.
Then I change the connection settings to use UTF8, I only have a warning that the charset is different from the database, but then everything still looks perfect. I didn't expect that.

BUT :

In my Lazarus application, things are different :

That's strange, isn't it ?
Is there something I should be careful about ?

Again, many thanks to all for your great advice so far !!! :-D

Marc
Title: Re: Firebird database with charset NONE
Post by: tonyw on June 27, 2021, 09:54:29 am
When you create a Firebird database, you specify a default character set. This is used as the storage character set unless you explicitly specify a character set for a text column.

When you connect to a database you specify the character set used for the connection. Firebird will attempt to transliterate, if possible, to and from the stored characters into this character set for read/write operations. If transliteration is not possilbe then you get the notorious arihmetic overflow or string conversion error.

Lazarus works internlly in UTF8 for the AnsiString type and a further transliteration is required when the data is received if UTF8 is not used for the connection character set. IBX does this automatically. i cannot speak for other database connectors.

When Character set NONE is speciified as the storage character set, a text string is stored as it was received i.e. using the connection character set. When read back, no transliteration takes place and is interpreted using the connection character set. It works fine as long as the same connection character set is used for read/write.

For Lazarus, I would always use UTF8 in all cases as this avoids the need for transliteration. UTF8 can store all unicode characters and avoids the risk of tranliteration errors. The only downside is that storage space is always four times the characater width. Your choice depends on how much of an issue this is compared to the limitations and issues that arise from use of a single byte "codepage"character set.
Title: Re: Firebird database with charset NONE
Post by: Fantablup on June 28, 2021, 05:43:01 am
Why don't you create the database with UTF8 and collations with UNICODE_CI or UNICODE_CI_AI.
These two is case insentive, so it is easy to do searches.

You will then remove any character problems.
TinyPortal © 2005-2018