Author Topic: Problem with utf8 encoding in database and Lazarus, solution  (Read 5719 times)


Problem with utf8 encoding in database and Lazarus, solution
« on: October 22, 2009, 12:54:27 pm »
Anyone who uses UTF-8 encoding in databases should read this.

If you use UTF-8 encoded varchar (string) fields in your database and you use persistent fields (fields created at design time) in Lazarus, you can run into problems in your application. Your dataset can act strangely: data can appear cut off in your database components, some data might not appear at all, and you might not be able to post changes to datasets (and the debugger won't tell you why; some DB exception is raised with a misleading message). Worst of all, you might not notice anything for a while. Your application can appear to work correctly, and then at some point, perhaps after it has been in users' hands and functioning for some time, strange problems can arise. The origin of these problems is not easy to trace. I eventually managed to solve this, so here I am going to explain it.

A dataset's persistent (design-time defined) string fields automatically set their Size property to the length of the corresponding fields defined in the database. In UTF-8, some characters take one byte and others take more than one byte. Hence, a string can have more bytes (not more characters, but more bytes) than the defined length. The problem arises at runtime when a persistent field receives such data; the corrupted data can even spill over into the next dataset record. Somehow, the problem affects only persistent fields; fields created automatically at runtime seem to handle this correctly.
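The byte/character distinction is easy to see outside Pascal, too. As a quick illustration (a Python sketch, not part of the Lazarus code; the sample string is my own), counting the same string in characters and in UTF-8 bytes gives different numbers:

```python
# Illustration: a UTF-8 string can need more bytes than it has characters.
s = "Žuti"                     # 4 characters; 'Ž' takes 2 bytes in UTF-8
print(len(s))                  # character count: 4
print(len(s.encode("utf-8")))  # byte count: 5
```

A field defined as varchar(4) would hold this value as 4 characters, but the persistent field's Size of 4 is too small for its 5 bytes.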

The solution:
If you use UTF-8 encoded string fields in your database, then after you create persistent fields in your dataset, you should manually change each string field's Size property. If you don't expect any non-European characters, it should be enough to change the Size to 2 * the original size, because all European characters (all Latin, Cyrillic and Greek letters) use at most two bytes in UTF-8. If you do expect some Far East characters, you should set the field's Size property to 3 *, or perhaps even 4 *, the original size.
When you change the field's Size property, the field's DisplayWidth property automatically changes to the value you entered for Size. You should manually set DisplayWidth back to the original value, because you don't need a wider field display: your field only needs to cope with more bytes, not more characters.
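To see where the 2 * / 3 * / 4 * multipliers come from, here is a small Python sketch (the sample characters are my own, illustrative choices) printing the UTF-8 byte length of one character from each script:

```python
# Worst-case UTF-8 bytes per character, by script (sample characters).
samples = [
    ("Latin",    "é"),   # U+00E9
    ("Cyrillic", "ж"),   # U+0436
    ("Greek",    "Ω"),   # U+03A9
    ("CJK",      "日"),  # U+65E5
    ("Emoji",    "😀"),  # U+1F600
]
for name, ch in samples:
    print(name, len(ch.encode("utf-8")))
```

European scripts stay within 2 bytes per character, which is why 2 * Size suffices for them; CJK characters take 3 bytes, and supplementary-plane characters such as emoji take 4, hence the larger multipliers.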

That's all.

If you have many datasets with many fields, read on; you may find the following tip useful:

Tip: Instead of manually changing each field's Size and then setting its DisplayWidth back, use this:
In a unit visible to all your forms, put the following procedure:

Code: [Select]
procedure SetStringFieldSizes(DS: TDataSet);
var
  I, W: Integer;
  SF: TStringField;
begin
  for I := 0 to DS.FieldCount - 1 do
    if DS.Fields[I] is TStringField then
    begin
      SF := DS.Fields[I] as TStringField;
      W := SF.DisplayWidth;  // remember the original display width
      // 2 * should work for all Latin, Cyrillic or Greek letters,
      // but if you expect non-European characters, put 3 * or
      // maybe 4 * instead
      SF.Size := 2 * W;
      SF.DisplayWidth := W;  // more bytes, not a wider display
    end;
end;


Then leave the automatically assigned sizes as they are at design time, but in each form's (and data module's) OnCreate event handler put something like this:

Code: [Select]
procedure TForm2.FormCreate(Sender: TObject);
var
  I: Integer;
begin
  for I := 0 to ComponentCount - 1 do
    if Components[I] is TDataSet then
      SetStringFieldSizes(Components[I] as TDataSet);

  //... your other form-creating code
end;
« Last Edit: October 22, 2009, 01:42:10 pm by Zoran »

