
Author Topic: [Solved] Zeos adding an extra #0 at the end of a UTF8 string  (Read 1057 times)

JohnvdWaeter

  • Full Member
  • ***
  • Posts: 185
    • http://www.jvdw.nl
[Solved] Zeos adding an extra #0 at the end of a UTF8 string
« on: February 13, 2024, 10:10:12 am »
Hi All,
Banging my head over this...
In my Firebird 2.5 database all varchar fields are UTF-8.
The DB connection is UTF-8.
I use Zeos components in a console/server application in Free Pascal.
The ZConnection has all properties set to UTF-8.

The application receives a JSON message with a text containing UTF-8 characters.
This text should be posted into a Firebird table via a Zeos ZQuery that calls a stored procedure.
The text arrives at the application intact, as checked with writeln(atext);
Then I want to put it in the ZQuery's parameter:

params.parambyname('atext').asString:=atext;

Wrong result: garbage text where the special characters are. But I thought all Free Pascal strings {$H+} are now UTF8?

Ok, next try:

params.parambyname('atext').asUnicodeString:=atext;

Almost good: the text appears correctly in the table (checked with IBExpert), BUT a chr(0) is added at the end.

I can't figure out where the #0 comes from...


« Last Edit: February 14, 2024, 03:45:00 am by JohnvdWaeter »

zeljko

  • Hero Member
  • *****
  • Posts: 1686
    • http://wiki.lazarus.freepascal.org/User:Zeljan
Re: Zeos adding an extra #0 at the end of a UTF8 string
« Reply #1 on: February 13, 2024, 11:31:30 am »
Which Zeos version are you using? I don't have such problems here.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11985
  • FPC developer.
Re: Zeos adding an extra #0 at the end of a UTF8 string
« Reply #2 on: February 13, 2024, 11:35:14 am »
OS? On Windows, UTF-8 is the default only for Lazarus GUI apps, to my knowledge.

cdbc

  • Hero Member
  • *****
  • Posts: 1760
    • http://www.cdbc.dk
Re: Zeos adding an extra #0 at the end of a UTF8 string
« Reply #3 on: February 13, 2024, 11:57:43 am »
Hi
The extra #0 comes from the double #0#0 terminator in PWideChar... methinks...
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE5 -> FPC 3.2.2 -> Lazarus 2.2.6 up until Jan 2024 from then on it's: KDE5/QT5 -> FPC 3.3.1 -> Lazarus 3.0

Thaddy

  • Hero Member
  • *****
  • Posts: 16348
  • Censorship about opinions does not belong here.
Re: Zeos adding an extra #0 at the end of a UTF8 string
« Reply #4 on: February 13, 2024, 01:15:18 pm »
Quote: "But I thought all Free Pascal strings {$H+} are now UTF8?"
That is only true for Lazarus applications. In FPC the string type is Ansi, not UTF-8. (Is your app a console app?)
The extra zero does indeed come from WideString/PWideChar.
There is nothing wrong with being blunt. At a minimum it is also honest.
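
A quick way to see which case applies is to print the code page the RTL is using. The following is only a minimal diagnostic sketch, not something posted in the thread; the program name is made up, and the values printed depend on the platform, the locale and the units pulled in, so no expected output is given.

```pascal
program CheckCodePage;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
var
  s: String;  // with {$H+} this is an AnsiString
begin
  s := 'abc';
  // Code page the RTL assumes for plain String/AnsiString data
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  // Code page stored with this particular string value
  WriteLn('StringCodePage(s)     = ', StringCodePage(s));
end.
```

If DefaultSystemCodePage is not 65001 (CP_UTF8), assigning UTF-8 bytes through a plain String is likely to be misinterpreted when they are converted to another string type.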

tonyw

  • Sr. Member
  • ****
  • Posts: 331
    • MWA Software
Re: Zeos adding an extra #0 at the end of a UTF8 string
« Reply #5 on: February 13, 2024, 01:37:24 pm »
You have fallen foul of this bug:

https://gitlab.com/freepascal.org/fpc/source/-/issues/39746

I came across it when testing IBX and added the following workaround:

Code: Pascal
  {Requires SysUtils in the uses clause for AnsiUpperCase}
  function SafeAnsiUpperCase(const s: AnsiString): AnsiString;
  var i: integer;
  begin
    {Guard against string length being increased by 1 and the trailing zero
     being incorporated in the string - see
     https://gitlab.com/freepascal.org/fpc/source/-/issues/39746}
    Result := AnsiUpperCase(s);
    i := length(Result);
    while (i > 0) and (Result[i] = #0) do Dec(i);
    SetLength(Result,i);
  end;

The function comes from IBUtils.pas, where its body occupies lines 2194 to 2203. As the name suggests, it has to be called instead of AnsiUpperCase, and it applies when international character sets are enabled.
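
The same guard can be applied to a UnicodeString before it is handed to a query parameter. This is only a sketch of the idea, not code from IBX or Zeos: StripTrailingNuls and the program name are hypothetical, and it treats the symptom rather than the cause.

```pascal
program StripNulDemo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

// Hypothetical helper: returns s without any trailing #0 characters.
function StripTrailingNuls(const s: UnicodeString): UnicodeString;
var
  i: Integer;
begin
  i := Length(s);
  while (i > 0) and (s[i] = #0) do
    Dec(i);
  Result := Copy(s, 1, i);
end;

var
  u: UnicodeString;
begin
  u := 'caf';
  // Append U+00E9 and then a stray terminator that became part of the string
  u := u + WideChar($00E9) + WideChar(0);
  WriteLn(Length(u));                     // prints 5
  WriteLn(Length(StripTrailingNuls(u)));  // prints 4
end.
```

In the situation from the first post, the result of such a helper would be what gets assigned to asUnicodeString.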

JohnvdWaeter

  • Full Member
  • ***
  • Posts: 185
    • http://www.jvdw.nl
Re: Zeos adding an extra #0 at the end of a UTF8 string
« Reply #6 on: February 14, 2024, 03:44:32 am »
It is a console/server application for Linux (Debian)
Lazarus version = 2.12, FPC 3.2.0, running on Linux
Zeos version 7.2.14

Your responses put me on the right track!
The solution was to know what to Google for...

Simply putting LazUTF8 in the main program's uses clause, getting rid of all the UTF8String, WideString, AnsiString, UnicodeString etc. declarations, and replacing them with plain String did the job!
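
For readers who want to try this fix, here is a minimal sketch of a console program set up that way. It assumes the LazUtils package (which provides the LazUTF8 unit) is on the compiler's unit search path; the program name and the sample text are made up. The UTF-8 text is built from explicit bytes so the result does not depend on how the source file is saved.

```pascal
program Utf8Console;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
uses
  LazUTF8;  // from LazUtils; assumed to be on the unit search path
var
  s: String;
begin
  // 'café' as UTF-8 bytes: the é is encoded as $C3 $A9
  s := 'caf'#$C3#$A9;
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  WriteLn('Length(s)     = ', Length(s));      // prints 5 (bytes)
  WriteLn('UTF8Length(s) = ', UTF8Length(s));  // prints 4 (code points)
end.
```

With every variable declared as plain String, the text stays in a single type from the JSON input to the asString parameter assignment, so the AnsiString/UnicodeString conversions that caused trouble earlier in the thread are not needed.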

Thanks :)

 
