
[solved] Case with unicode switch and -FcUTF8


Ocye:

--- Quote from: JuhaManninen on October 14, 2015, 02:28:55 pm ---It apparently causes a swamp of nasty issues...
--- End quote ---
And I'm completely confused now ;-)
Is there a 'small' compiler switch like {$UTF8+/-}?


--- Quote from: GetMem on October 14, 2015, 02:43:02 pm ---That case hurts my eyes...
--- End quote ---
True, but it reads better. And at some point I have to convert the drop-down selection anyway (it is filled dynamically).
Anyway, my question is really about the limitations of UTF-8 in the RTL.


--- Quote from: Roland Chastain on October 14, 2015, 02:46:30 pm ---
--- Code: Pascal ---
'Fran'#$C3#$A7'ais': writeln('Salute Monde');
--- End code ---
--- End quote ---
And now for Greek (Ελληνικά)  ;D


--- Quote from: Bart on October 14, 2015, 03:11:27 pm ---What is the encoding of the source-file in question?
--- End quote ---
How do I check that? Notepad++ tells me the file is UTF-8 without BOM (explicitly saving it as UTF-8 makes no difference). I usually copy files from Linux to Windows.
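Notepad++ is usually right about this, but the check can also be done from code. Below is a minimal Free Pascal sketch; the program and function names (CheckUtf8, HasUtf8Bom, IsValidUtf8, LoadFileBytes) are hypothetical and not part of the RTL or LazUtils. It reports whether a file starts with the UTF-8 byte order mark (EF BB BF) and whether all of its bytes form well-formed UTF-8.

```pascal
program CheckUtf8;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ True if Bytes starts with the UTF-8 byte order mark EF BB BF. }
function HasUtf8Bom(const Bytes: RawByteString): Boolean;
begin
  Result := (Length(Bytes) >= 3) and (Bytes[1] = #$EF) and
            (Bytes[2] = #$BB) and (Bytes[3] = #$BF);
end;

{ True if every byte sequence is well-formed UTF-8: no stray continuation
  bytes, no truncated or overlong sequences, no surrogates, nothing
  above U+10FFFF. }
function IsValidUtf8(const Bytes: RawByteString): Boolean;
var
  I, Len, Need: Integer;
  B: Byte;
  CP, MinCP: LongWord;
begin
  Result := False;
  I := 1;
  Len := Length(Bytes);
  while I <= Len do
  begin
    B := Ord(Bytes[I]);
    if B < $80 then
    begin
      Inc(I);                  { plain ASCII byte }
      Continue;
    end
    else if (B and $E0) = $C0 then
    begin Need := 1; CP := B and $1F; MinCP := $80; end
    else if (B and $F0) = $E0 then
    begin Need := 2; CP := B and $0F; MinCP := $800; end
    else if (B and $F8) = $F0 then
    begin Need := 3; CP := B and $07; MinCP := $10000; end
    else
      Exit;                    { stray continuation byte or invalid lead byte }
    if I + Need > Len then
      Exit;                    { sequence cut off at end of data }
    while Need > 0 do
    begin
      Inc(I);
      B := Ord(Bytes[I]);
      if (B and $C0) <> $80 then
        Exit;                  { expected a continuation byte 10xxxxxx }
      CP := (CP shl 6) or (B and $3F);
      Dec(Need);
    end;
    if (CP < MinCP) or (CP > $10FFFF) or ((CP >= $D800) and (CP <= $DFFF)) then
      Exit;
    Inc(I);
  end;
  Result := True;
end;

{ Reads the whole file into a byte string without any code page conversion. }
function LoadFileBytes(const FileName: string): RawByteString;
var
  FS: TFileStream;
begin
  Result := '';
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, FS.Size);
    if FS.Size > 0 then
      FS.ReadBuffer(Result[1], FS.Size);
  finally
    FS.Free;
  end;
end;

var
  Data: RawByteString;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: checkutf8 <file>');
    Halt(1);
  end;
  Data := LoadFileBytes(ParamStr(1));
  WriteLn('UTF-8 BOM:   ', HasUtf8Bom(Data));
  WriteLn('Valid UTF-8: ', IsValidUtf8(Data));
end.
```

A file that contains only ASCII passes the test but is equally valid in any ANSI code page, so the result is only informative once the file contains non-ASCII characters such as ç or Greek letters.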

JuhaManninen:

--- Quote from: Ocye on October 14, 2015, 03:17:00 pm ---
--- Quote from: JuhaManninen on October 14, 2015, 02:28:55 pm ---It apparently causes a swamp of nasty issues...
--- End quote ---
And I'm completely confused now ;-)
Is there a 'small' compiler switch like {$UTF8+/-}?

--- End quote ---

There is a small define:
 -dEnableUTF8RTL
My plan is to replace it with -dDisableUTF8RTL for people who want to use the FPC system codepage string as default.
Without any defines, a Lazarus project compiled with FPC 3.x will then use the new UTF-8 system.

See issue:
 http://bugs.freepascal.org/view.php?id=26453
and its related issues. This is really nasty; as Michl figured out, it may be a compiler bug.
Our new UTF-8 system solves those problems. It is still a hack, but less of a hack than the UTF-8 hack currently in use.

Later when FPC, RTL and other libs are ready, we will implement a Delphi compatible UTF-16 support, too.
Before that, let's try to make things work without UTF-16.

Roland57:

--- Quote from: Ocye on October 14, 2015, 03:17:00 pm ---And now for Greek (Ελληνικά)  ;D
--- End quote ---

Here you are.


--- Code: Pascal ---
{$codepage utf8}

procedure Greetings(aLanguage: string);
begin
  case aLanguage of
    'English': WriteLn('Hello World');
    'Deutsch': WriteLn('Hallo Welt');
    {'Français'}'Fran'#$C3#$A7'ais': WriteLn('Salute Monde');
    {'Русский'}#$D0#$A0#$D1#$83#$D1#$81#$D1#$81#$D0#$BA#$D0#$B8#$D0#$B9: WriteLn('приве́т мир');
    {'Ελληνικά'}#$CE#$95#$CE#$BB#$CE#$BB#$CE#$B7#$CE#$BD#$CE#$B9#$CE#$BA#$CE#$AC: WriteLn('Ελληνικά');
  end;
end;

begin
  Greetings('English');
  Greetings('Français');
  Greetings('Русский');
  Greetings('Ελληνικά');
  ReadLn;
end.
--- End code ---
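Typing those #$.. byte sequences by hand is error-prone. The sketch below generates them from a readable string; it assumes FPC 3.x and a source file saved as UTF-8, and the names Utf8Escapes and ToByteEscapes are hypothetical. Under {$codepage utf8} the compiler decodes the literal into a UnicodeString, and UTF8Encode (from the System unit) turns it back into UTF-8 bytes, each of which is printed as a Pascal character constant.

```pascal
program Utf8Escapes;
{$mode objfpc}{$H+}
{$codepage utf8}

uses
  SysUtils;

{ Formats every byte of S as a Pascal character constant such as #$C3. }
function ToByteEscapes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + '#$' + IntToHex(Ord(S[I]), 2);
end;

var
  Name: UnicodeString;
begin
  Name := 'Ελληνικά';   { decoded from the UTF-8 source at compile time }
  WriteLn(ToByteEscapes(UTF8Encode(Name)));
end.
```

Compiled from a UTF-8 source file, it should print #$CE#$95#$CE#$BB#$CE#$BB#$CE#$B7#$CE#$BD#$CE#$B9#$CE#$BA#$CE#$AC, the same sequence used for the Greek case label above.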

Ocye:

--- Quote from: JuhaManninen on October 14, 2015, 04:44:13 pm ---See issue :
 http://bugs.freepascal.org/view.php?id=26453...
--- End quote ---
I always struggle with these issues. As a non-professional who doesn't really understand the codepage stuff and usually codes on Linux, this is one of the major obstacles to getting code working cross-platform in multiple languages.


--- Quote from: Roland Chastain on October 14, 2015, 08:55:21 pm ---
--- Code: Pascal ---
{$codepage utf8}
    {'Русский'}#$D0#$A0#$D1#$83#$D1#$81#$D1#$81#$D0#$BA#$D0#$B8#$D0#$B9: WriteLn('приве́т мир');
    {'Ελληνικά'}#$CE#$95#$CE#$BB#$CE#$BB#$CE#$B7#$CE#$BD#$CE#$B9#$CE#$BA#$CE#$AC: WriteLn('Ελληνικά');
--- End code ---
--- End quote ---
Hm... could indeed be a solution, at least temporarily. Thanks for the suggestion.

JuhaManninen:

--- Quote from: Ocye on October 16, 2015, 10:57:13 am ---
--- Quote from: JuhaManninen on October 14, 2015, 04:44:13 pm ---See issue :
 http://bugs.freepascal.org/view.php?id=26453...
--- End quote ---
I always struggle with these issues. As a non-professional who doesn't really understand the codepage stuff and usually codes on Linux, this is one of the major obstacles to getting code working cross-platform in multiple languages.

--- End quote ---

Same here. That's why we have created the new "UTF-8 hack":
 http://wiki.freepascal.org/Better_LCL_Unicode_Support
It works amazingly well. Yes, it has issues too, but they are predictable and understandable.
The bug report I mentioned happens only when the new UTF-8 system is NOT used.


--- Quote from: Ocye on October 16, 2015, 10:57:13 am ---Hm... could indeed be a solution, at least temporarily. Thanks for the suggestion.

--- End quote ---

No, a proper solution is the new UTF-8 system. Why don't you want to use it?

From your first post:

--- Quote ---Having the option -FcUTF8 set (RTL with UTF8 support), the compiler complains ...

--- End quote ---

You have misunderstood the meaning of -FcUTF8. It does not make RTL support UTF-8. It only makes the compiler assume that source files have UTF-8 encoding.
-dEnableUTF8RTL changes the default encoding of the String type.
My plan is to remove -dEnableUTF8RTL and make the new UTF-8 system the default behavior for all Lazarus projects. As you have seen, String with system codepage + FPC 3.x + LCL with UTF-8 is a SWAMP.
Maybe I should do this change ASAP to avoid more confusion. There will be -dDisableUTF8RTL for people who must use system code page strings.
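The difference between the two switches can be sketched in a plain FPC 3.x program without the LCL. The plain String type carries the placeholder code page CP_ACP, which resolves at run time to DefaultSystemCodePage. It is an assumption here that calling SetMultiByteConversionCodePage(CP_UTF8) at startup approximates what the new Lazarus UTF-8 system does; the actual Lazarus code may differ. After the call, assigning a UTF8String to a String should copy the bytes without conversion.

```pascal
program ShowCodePage;
{$mode objfpc}{$H+}

var
  U: UTF8String;
  S: string;
begin
  WriteLn('DefaultSystemCodePage at start: ', DefaultSystemCodePage);

  { Assumption: approximates the startup step of the Lazarus UTF-8 system.
    It makes CP_UTF8 the run-time meaning of CP_ACP, i.e. of plain String. }
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('DefaultSystemCodePage now:      ', DefaultSystemCodePage);

  { 'ç' (U+00E7) encoded as UTF-8: the two bytes $C3 $A7. }
  U := UTF8Encode(UnicodeString(WideChar($E7)));

  { Both sides now use UTF-8, so no conversion is expected here. }
  S := U;
  WriteLn('Length(S): ', Length(S));
end.
```

Note that -FcUTF8 plays no part in this: it only tells the compiler how to read the source file, while the run-time code page decides how String contents are interpreted.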

And, before anybody asks:
Delphi compatible UTF-16 support will be made later when FPC and its libs are ready.
This new UTF-8 system is much better than the old UTF-8 hack with all those UTF8... functions. It is less of a hack.
In fact it is almost Delphi compatible at source level for a lot of code. Reading or writing non-UTF-8 streams, files or DBs needs changes, which must be documented somehow.
