Forum > Databases
MySQL UTF8 problem with solution
(1/1)
jollytall:
I had a very strange problem today. After a while I solved it, although still do not 100% understand it. I hope it can help someone later, reading it.
I had two very similar (freepascal, command line) programs, both working on the same database. I did insert/update with tSQLQuery using :param and Params.ParamByName('param').AsString := s;
In one of the programs UTF8 characters worked as it should, but in the other they did not. I tried all the tricks said in different places in the wiki/forum, like making a "set names 'utf8'", "set character set 'utf8'", set CharSet to 'UTF8', 'utf8', 'utf8mb4' but nothing worked.
I also tried that instead of using :param, just entering the value directly in the SQL.Text and then it works! Still this is not OK for SQL injection, so I went on.
Then I noticed that the working program had LazUtils as a dependency set, so I also made that, but still not working. Then I moved over the units used and when I added laz2_XmlRead then it started to work.
Since the second program does not do any XML activity, I thought it is one of the other units implicitly linked in when laz2_XmlRead is added, so logically I replaced it with lazutf8 and it works with that.
It is still strange to me, that even if I added uses lazutf8; to a units implementation section and the actual query is in another unit, it still works. So, it is clearly not a function calling reference (anyway the program compiles without LazUtils and lazutf8 as well), but a link time something.
So, this is the solution. Nonetheless, I would be happy if somebody could explain to me:
- why is it needed?
- why does any LazUtils unit work?
- How can it work from another unit?
- How is it possible that utf8 support works with SQL.Text but does not work with Params.ParamByName (unless LazUtils is linked)?
LacaK:
Can you print value of system variable DefaultSystemCodePage in case LazUtf8 is NOT used and then in case LazUtf8 is used.
I think, that when LazUtf8 is used then some procedure in initialization section of LazUtf8 unit is called, which causes altered behavior you observed ...
jollytall:
Thank you, you are right.
If I have no lazutf8 linked, then DefaultSystemCodePage is zero, but when I add it anywhere, it gets 65001.
What is strange I checked the Initialization section, it looks rather innocent. Only two lines:
InitFPUpchars;
InitLazUtf8;
where the first makes an array for all upcase characters and the second is an empty procedure.
Can it be, that some other places I call a function/procedure what has an overwritten version in lazutf8 (though I did not find such a method either)?
Still, if lazutf8 is added it works, if not, the program still complies, runs, but the SQL table is spammed with garbage.
LacaK:
Yes 65001 is CP_UTF8. It affect default string code page. So if set correctly then strings gets transliterated correctly.
DefaultSystemCodePage is changed by call to SetMultiByteConversionCodePage.
This is presented in components/lazutils/fpcadds.pas (used in LazUtf8)
There is also modified widestring manager in winlazutf8.inc in procedure InitLazUtf8;
See also note: https://wiki.freepascal.org/Unicode_Support_in_Lazarus#Using_UTF-8_in_non-LCL_programs
Navigation
[0] Message Index