Recent

Author Topic: MySQL UTF8 problem with solution  (Read 3571 times)

jollytall

  • Sr. Member
  • ****
  • Posts: 319
MySQL UTF8 problem with solution
« on: December 28, 2021, 04:34:08 pm »
I had a very strange problem today. After a while I solved it, although still do not 100% understand it. I hope it can help someone later, reading it.

I had two very similar (freepascal, command line) programs, both working on the same database. I did insert/update with tSQLQuery using :param  and Params.ParamByName('param').AsString := s;

In one of the programs UTF8 characters worked as it should, but in the other they did not. I tried all the tricks said in different places in the wiki/forum, like making a "set names 'utf8'", "set character set 'utf8'", set CharSet to 'UTF8', 'utf8', 'utf8mb4' but nothing worked.
I also tried that instead of using :param, just entering the value directly in the SQL.Text and then it works! Still this is not OK for SQL injection, so I went on.

Then I noticed that the working program had LazUtils as a dependency set, so I also made that, but still not working. Then I moved over the units used and when I added laz2_XmlRead then it started to work.
Since the second program does not do any XML activity, I thought it is one of the other units implicitly linked in when laz2_XmlRead is added, so logically I replaced it with lazutf8 and it works with that.

It is still strange to me, that even if I added uses lazutf8; to a units implementation section and the actual query is in another unit, it still works. So, it is clearly not a function calling reference (anyway the program compiles without LazUtils and lazutf8 as well), but a link time something.

So, this is the solution. Nonetheless, I would be happy if somebody could explain to me:
- why is it needed?
- why does any LazUtils unit work?
- How can it work from another unit?
- How is it possible that utf8 support works with SQL.Text but does not work with Params.ParamByName (unless LazUtils is linked)?

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: MySQL UTF8 problem with solution
« Reply #1 on: January 03, 2022, 01:30:46 pm »
Can you print value of system variable DefaultSystemCodePage in case LazUtf8 is NOT used and then in case LazUtf8 is used.
I think, that when LazUtf8 is used then some procedure in initialization section of LazUtf8 unit is called, which causes altered behavior you observed ...

jollytall

  • Sr. Member
  • ****
  • Posts: 319
Re: MySQL UTF8 problem with solution
« Reply #2 on: January 03, 2022, 05:05:44 pm »
Thank you, you are right.

If I have no lazutf8 linked, then DefaultSystemCodePage is zero, but when I add it anywhere, it gets 65001.
What is strange I checked the Initialization section, it looks rather innocent. Only two lines:
  InitFPUpchars;
  InitLazUtf8; 
where the first makes an array for all upcase characters and the second is an empty procedure.

Can it be, that some other places I call a function/procedure what has an overwritten version in  lazutf8 (though I did not find such a method either)?

Still, if lazutf8 is added it works, if not, the program still complies, runs, but the SQL table is spammed with garbage.

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: MySQL UTF8 problem with solution
« Reply #3 on: January 03, 2022, 06:17:54 pm »
Yes 65001 is CP_UTF8. It affect default string code page. So if set correctly then strings gets transliterated correctly.

DefaultSystemCodePage is changed by call to SetMultiByteConversionCodePage.
This is presented in components/lazutils/fpcadds.pas (used in LazUtf8)
There is also modified widestring manager in winlazutf8.inc in procedure InitLazUtf8;

See also note: https://wiki.freepascal.org/Unicode_Support_in_Lazarus#Using_UTF-8_in_non-LCL_programs

 

TinyPortal © 2005-2018