Lazarus in non-UTF8 mode (-dDisableUTF8RTL): does anyone still use this?
Bart:
Hi,
A long time ago, before codepage-aware ANSI strings existed in fpc, all strings in Lazarus were treated as being UTF8 encoded.
With the introduction of codepage-aware strings (fpc 3.0 and later) the LazUTF8 unit sets DefaultSystemCodePage to CP_UTF8. This ensures "String" in an LCL application will have CP_UTF8 as its codepage.
At the same time we introduced a possibility to build Lazarus (and the LCL etc.) with a define (DisableUTF8RTL) that disabled this "initialization" to CP_UTF8.
When using this define, "String" in an application will have CP_ACP as its codepage.
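To check which of the two situations applies to a given program, a minimal console sketch like the following can print the RTL's DefaultSystemCodePage. This is only an illustration, not part of Lazarus, and the program name is made up. It relies only on the System unit's DefaultSystemCodePage variable and the CP_UTF8 constant; in a plain console program that does not use LazUTF8, the value simply reflects whatever the RTL picked up from the system.

```pascal
program ShowDefaultCodePage; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
begin
  // DefaultSystemCodePage is the code page the RTL assumes for plain
  // "String"/AnsiString data; CP_ACP is a placeholder that resolves to it.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  if DefaultSystemCodePage = CP_UTF8 then
    WriteLn('String is treated as UTF-8')
  else
    WriteLn('String is treated as a non-UTF-8 (system) code page');
end.
```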
IIRC this was done for Lazarus users who used databases that were encoded in a Windows codepage.
The Lazarus team is thinking about completely removing this workaround.
Of course it may very well be the case that users still depend on building LCL applications with the DisableUTF8RTL define.
If that is the case, we would ask you to please report this in this thread.
Bart
Thaddy:
I used to hate UTF-8 on Windows, since it was not native to the OS, which since the NT/XP days has been either UCS-2 or UTF-16. Since Windows 10 that has become transparent, so I have no objections. Nowadays I explicitly specify my string type at declaration: shortstring, ansistring, utf8string, unicodestring, etc.
The ambiguity of just "string" does not affect me. (and that behavior stems indeed from my former dislike of utf8 on Windows)
Of course, if the define is dropped, I expect Lazarus to still honor the specified string types when they are used. That makes me slightly worried... but as long as this keeps working at unit level:
--- Code: Pascal ---
type string = type shortstring; // or any specified string type
--- End code ---
then it's fine with me.
teco:
Removing this "workaround" would make Lazarus/Free Pascal mostly unusable for us.
- The Database engine we are using works with UTF16 or Codepage. Only with this it can be used without issues.
- We import Records from different systems. Import would cause issues without this because of codetables, etc.
abouchez:
In most projects, "string" is mainly used for file names (as an alias of "TFileName") and wherever the RTL has to call the OS.
At RTL level, these are not the same: you have SetMultiByteConversionCodePage() and SetMultiByteRTLFileSystemCodePage().
In all my projects, and even in mormot.core.os.pas initialization, I force:
--- Code: Pascal ---
{$ifdef ISFPC27}
// we force UTF-8 everywhere on FPC for consistency with Lazarus
SetMultiByteConversionCodePage(CP_UTF8);
SetMultiByteRTLFileSystemCodePage(CP_UTF8);
{$endif ISFPC27}
--- End code ---
This is done even for console/server applications, for simplicity and consistency.
IMHO CP_UTF8 is much better than UTF-16, even on Windows.
https://utf8everywhere.org/
The time needed for a UTF8/UTF16 translation around each API call is negligible.
And I would not trust Microsoft to have properly implemented the CP_UTF8 at system level. Just use the existing code with UTF-16 parameters.
I would not change any default behavior. Removing -dDisableUTF8RTL would make no difference to me.
Keep it as such, with the SetMultiByte*() functions, unless you want string = UnicodeString like in Delphi and in that case, just use the new mode available in trunk.
But I suspect no-one is using this -dDisableUTF8RTL anywhere except in some very old and unmaintained projects, which could stay with FPC 3.2 anyway.
abouchez:
--- Quote from: teco on February 04, 2025, 09:39:08 am ---
- The Database engine we are using works with UTF16 or Codepage. Only with this it can be used without issues.
- We import Records from different systems. Import would cause issues without this because of codetables, etc.
--- End quote ---
I don't see how those two points would be affected.
Even removing this -dDisableUTF8RTL support won't change anything in code using code pages.
If I understand correctly, it is not about removing code pages, it is about removing a potential RTL without CP_UTF8 for strings, when used with Lazarus.
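A minimal sketch of that distinction, assuming a hypothetical import of Windows-1252 data (the program and type names are made up for illustration): a string type with an explicitly declared code page is converted by the RTL on assignment, whatever the default code page of plain "String" happens to be. On Unix-like systems the sketch assumes the cwstring unit is needed so that the RTL has a conversion backend.

```pascal
program ImportLegacyText; // hypothetical example, not from any real project
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring; // installs a widestring manager so code page conversion works
{$endif}

type
  TWin1252String = type AnsiString(1252); // declared code page: Windows-1252

var
  Legacy: TWin1252String;
  Converted: UTF8String;
  i: Integer;
begin
  // Simulate one byte read from a legacy source: $E9 is "é" in Windows-1252.
  SetLength(Legacy, 1);
  Legacy[1] := #$E9;

  // Assigning between strings with different declared code pages makes the
  // RTL convert the data; this does not rely on DefaultSystemCodePage.
  Converted := Legacy;

  // Dump the resulting bytes so the conversion can be inspected.
  Write('UTF-8 bytes:');
  for i := 1 to Length(Converted) do
    Write(' $', HexStr(Ord(Converted[i]), 2));
  WriteLn;
end.
```

Code written this way names its code pages explicitly, so it should keep behaving the same when plain "String" is UTF-8.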
I already thought it was the case: I expected Lazarus to require CP_UTF8 (or CP_UTF16 in the trunk in Unicode mode).