Topic: Lazarus in non-UTF8 mode (-dDisableUTF8RTL): does anyone still use this?

Bart

Hi,

A long time ago, before codepage-aware ANSI strings existed in FPC, all strings in Lazarus were treated as being UTF-8 encoded.
With the introduction of codepage-aware strings (FPC 3.0 and later), the LazUTF8 unit sets DefaultSystemCodePage to CP_UTF8. This ensures that "String" in an LCL application has CP_UTF8 as its code page.
At the same time we introduced the possibility to build Lazarus (and the LCL etc.) with a define (DisableUTF8RTL) that disables this "initialization" to CP_UTF8.
When using this define, "String" in an application will have CP_ACP as its code page.
IIRC this was done for Lazarus users who used databases that were encoded in a Windows code page.
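
For readers who want to see the mechanism, here is a minimal sketch, assuming FPC 3.0 or later. It is not the actual LazUTF8 initialization code; it only shows what the RTL call SetMultiByteConversionCodePage does to DefaultSystemCodePage. The helper name ForceUtf8 is made up for illustration.

```pascal
program ShowDefaultCodePage;
{$mode objfpc}{$H+}

// Hypothetical helper: switch the default code page of "String" to UTF-8
// and return the resulting value so the caller can check it.
function ForceUtf8: TSystemCodePage;
begin
  SetMultiByteConversionCodePage(CP_UTF8);
  Result := DefaultSystemCodePage;
end;

begin
  // Value chosen by the RTL at startup (platform dependent).
  WriteLn('At startup:   ', DefaultSystemCodePage);
  // After the call, CP_ACP strings are interpreted as UTF-8.
  WriteLn('After switch: ', ForceUtf8);
end.
```

Building with -dDisableUTF8RTL roughly corresponds to skipping such a switch, so "String" keeps the code page the system reports.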

The Lazarus team is thinking about completely removing this workaround.
Of course it may very well be the case that users still depend on building LCL applications with the DisableUTF8RTL define.
If that is the case, we would ask you to please report this in this thread.

Bart

Thaddy

« Reply #1 on: February 04, 2025, 08:41:34 am »
I used to hate UTF-8 on Windows, since it was not native to the OS, which since the NT/XP days has used either UCS-2 or UTF-16. Since Windows 10 that has become transparent, so I have no objections: nowadays I explicitly specify my string type at declaration (shortstring, ansistring, utf8string, unicodestring, etc.).
The ambiguity of plain "string" does not affect me (and that habit does indeed stem from my former dislike of UTF-8 on Windows).

Of course, if the define is dropped, I expect Lazarus to honor the specified string types when they are used. The latter makes me slightly worried... but as long as this keeps working at unit level:
Code: Pascal
type
  string = type shortstring; // or any specified string type
then it's fine with me.
« Last Edit: February 04, 2025, 10:23:43 am by Thaddy »
But I am sure they don't want the Trumps back...

teco

« Reply #2 on: February 04, 2025, 09:39:08 am »
Removing this "workaround" would make Lazarus/Free Pascal mostly unusable for us.

- The database engine we use works with UTF-16 or a code page. Only with this define can it be used without issues.
- We import records from different systems. Without it, the import would cause issues because of code tables, etc.

abouchez

« Reply #3 on: February 04, 2025, 10:16:33 am »
In most projects, "string" is used as an alias for "TFileName", and wherever the RTL has to call the OS.

At the RTL level, these are not the same: you have SetMultiByteConversionCodePage() and SetMultiByteRTLFileSystemCodePage().

In all my projects, and even in mormot.core.os.pas initialization, I force:
Code: Pascal
  {$ifdef ISFPC27}
  // we force UTF-8 everywhere on FPC for consistency with Lazarus
  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);
  {$endif ISFPC27}
This is done even for console/server applications, for simplicity and consistency.

IMHO CP_UTF8 is much better than UTF-16, even on Windows.
https://utf8everywhere.org/
The time needed for a UTF-8/UTF-16 translation when calling the API is negligible.
And I would not trust Microsoft to have properly implemented CP_UTF8 at system level. Just use the existing code with UTF-16 parameters.

I would not change any default behavior. Removing -dDisableUTF8RTL would make no difference to me.
Keep it as such, with the SetMultiByte*() functions, unless you want string = UnicodeString like in Delphi; in that case, just use the new mode available in trunk.
But I suspect no one is using -dDisableUTF8RTL anywhere except in some very old and unmaintained project, which could stay on FPC 3.2 anyway.
« Last Edit: February 04, 2025, 10:23:15 am by abouchez »

abouchez

« Reply #4 on: February 04, 2025, 10:21:18 am »
Quote from: teco
- The database engine we use works with UTF-16 or a code page. Only with this define can it be used without issues.
- We import records from different systems. Without it, the import would cause issues because of code tables, etc.

I don't see how those two points would be affected.

Even removing this -dDisableUTF8RTL support won't change anything in code using code pages.
If I understand correctly, it is not about removing code pages, it is about removing a potential RTL without CP_UTF8 for strings, when used with Lazarus.
I already thought it was the case: I expected Lazarus to require CP_UTF8 (or CP_UTF16 in the trunk in Unicode mode).
« Last Edit: February 04, 2025, 10:26:38 am by abouchez »

Thaddy

« Reply #5 on: February 04, 2025, 10:28:34 am »
Hence I mentioned that on unit level the string type should be definable and on project level the string type should work with explicit string types.
As long as that remains the case there really are no(t many) problems.

teco

« Reply #6 on: February 04, 2025, 10:30:22 am »
ElevateDB was originally developed for Delphi.

With Free Pascal it has issues with German umlauts (äöüß).

The manufacturer's solution is to use -dDisableUTF8RTL or to switch to Delphi.

abouchez

« Reply #7 on: February 04, 2025, 10:34:42 am »
Quote from: teco
The manufacturer's solution is to use -dDisableUTF8RTL or to switch to Delphi.

Maybe switching to the new FPC Unicode mode would be a good option.
That is, remove "-dDisableUTF8RTL" once FPC Unicode mode is finalized.

paweld

« Reply #8 on: February 04, 2025, 11:48:17 am »
Quote from: teco
Removing this "workaround" would make Lazarus/Free Pascal mostly unusable for us.

- The database engine we use works with UTF-16 or a code page. Only with this define can it be used without issues.
- We import records from different systems. Without it, the import would cause issues because of code tables, etc.

In the LCL you have the LConvEncoding unit, which allows conversion between multiple code pages.
I use Lazarus for applications using the MSSQL database server, and most of the databases have the collation set to Polish_CI_AS and it works very well with Lazarus using the CP1250ToUTF8 and UTF8ToCP1250 functions. To handle UTF-16 there is UTF8ToUTF16 from the LazUTF8 unit.
I also use these functions when importing data from various files.
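
The round trip described above can be sketched as follows. This is an illustration only, not code from a real application: it assumes the LazUtils package (which provides LConvEncoding) is on the unit search path, and DumpBytes is a made-up helper that prints bytes in hex so the output does not depend on the console encoding.

```pascal
program Cp1250RoundTrip;
{$mode objfpc}{$H+}

uses
  LConvEncoding; // from the LazUtils package that ships with Lazarus

// Hypothetical helper: print the bytes of a string in hex.
procedure DumpBytes(const ALabel, S: string);
var
  I: Integer;
begin
  Write(ALabel, ':');
  for I := 1 to Length(S) do
    Write(' ', HexStr(Ord(S[I]), 2));
  WriteLn;
end;

var
  FromDb, AsUtf8, BackToDb: string;
begin
  FromDb := #$B9;                    // the byte CP1250 uses for "ą"
  AsUtf8 := CP1250ToUTF8(FromDb);    // convert right after reading
  BackToDb := UTF8ToCP1250(AsUtf8);  // convert right before writing
  DumpBytes('CP1250', FromDb);
  DumpBytes('UTF-8', AsUtf8);
  DumpBytes('CP1250 again', BackToDb);
end.
```

The point of the pattern is that only the two conversion calls know about the database code page; everything in between works on UTF-8.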
Best regards / Pozdrawiam
paweld

Thaddy

« Reply #9 on: February 04, 2025, 11:53:15 am »
Simply try {$mode delphiunicode} in the units where code is affected.
As I understand it, UTF-8 strings should be assignment-compatible with UTF-16 strings.
But the problem you ran into is exactly why I initially hated Lazarus' default UTF-8.

JuhaManninen

« Reply #10 on: February 04, 2025, 12:40:15 pm »
Quote from: abouchez
I would not change any default behavior. Removing -dDisableUTF8RTL would make no difference to me.
Keep it as such, with the SetMultiByte*() functions, unless you want string = UnicodeString like in Delphi; in that case, just use the new mode available in trunk.

Nobody is planning to change the default behavior now. The question is about removing the DisableUTF8RTL define and old codepage support.

Default UTF-16 in LCL using UnicodeString will require big changes. I don't even know how it can be done.
At that point however the old codepage support will complicate things further. I hope it gets removed before that.

Quote from: teco
ElevateDB was originally developed for Delphi.
With Free Pascal it has issues with German umlauts (äöüß).
The manufacturer's solution is to use -dDisableUTF8RTL or to switch to Delphi.

OK, that is a valid reason to use -dDisableUTF8RTL, meaning it cannot be removed now.
You should however push the ElevateDB developers to improve their Unicode support. Having issues with German umlauts (äöüß) probably means they treat strings as having fixed-width chars. That means they also treat UTF-16 as fixed-width, which it is not. With UTF-8 you must do it correctly because codepoint lengths vary so often.
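
To illustrate why UTF-16 is not fixed-width, here is a small RTL-only sketch. It says nothing about ElevateDB's actual code; CountCodePoints is a made-up helper that treats a high surrogate followed by a low surrogate as one code point.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Count Unicode code points in a UTF-16 string by treating a high
// surrogate followed by a low surrogate as a single code point.
function CountCodePoints(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)  // surrogate pair: two code units, one code point
    else
      Inc(I);    // BMP character (or unpaired surrogate): one code unit
    Inc(Result);
  end;
end;

var
  Sample: UnicodeString;
begin
  SetLength(Sample, 3);
  Sample[1] := WideChar($00E4); // U+00E4 "ä", one UTF-16 code unit
  Sample[2] := WideChar($D83D); // high surrogate of U+1F600
  Sample[3] := WideChar($DE00); // low surrogate of U+1F600
  WriteLn('Code units:  ', Length(Sample));
  WriteLn('Code points: ', CountCodePoints(Sample));
end.
```

Length() reports code units, so any code that equates it with the number of characters breaks as soon as a surrogate pair appears.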

Another thought: Maybe you could easily solve the problem in your own code using explicit conversion functions as paweld explained.
Convert the database values to Unicode as soon as possible after reading and convert them to the database encoding as late as possible before writing. That is what everybody else does (AFAIK).
If the database already has UTF-16, then just convert it (automatically) to UTF-8. And back to UTF-16.
If it does not work, please explain to me why.
« Last Edit: February 04, 2025, 01:05:17 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Thaddy

« Reply #11 on: February 04, 2025, 01:10:32 pm »
Yes, it looks like they still treat UTF-16 as UCS-2, which was correct a long time ago, but has not been for the past 20 years. They seem to reserve two bytes per character instead of four, which is what a single code point can need in UTF-8, UTF-16 and UTF-32 at the storage level, and there is no escaping that in a database setting.

JuhaManninen

« Reply #12 on: February 04, 2025, 01:22:43 pm »
IMO all textual data in databases and all code manipulating it should be converted to Unicode, regardless of the encoding.
A locale-dependent code page is a truly inferior system. Somebody else's data will show up as "????????".
I thought we got rid of that in the 1990s. But no!

teco

« Reply #13 on: February 04, 2025, 01:55:52 pm »

Quote from: JuhaManninen
Another thought: Maybe you could easily solve the problem in your own code using explicit conversion functions as paweld explained.
Convert the database values to Unicode as soon as possible after reading and convert them to the database encoding as late as possible before writing. That is what everybody else does (AFAIK).
If the database already has UTF-16, then just convert it (automatically) to UTF-8. And back to UTF-16.
If it does not work, please explain to me why.

The catch is that ours is not the only software accessing the ElevateDB database. We write data into ElevateDB, and another program, which is Delphi-based, does further processing. We must make sure that the Delphi software sees the right information, so we cannot solve this independently on our side alone. The manufacturer says that Delphi accounts for most of their sales of the Desktop and Server Edition and that Free Pascal is only a small part. To small for additional work to increase compatibility.

Thaddy

« Reply #14 on: February 04, 2025, 02:02:14 pm »
Quote from: teco
To small for additional work to increase compatibility.

Yes, that makes sense: that is your job. (btw it is too, double o)
FPC supports many conversions.