
Topic: [solved] Case with unicode switch and -FcUTF8

Ocye

  • Hero Member
  • *****
  • Posts: 518
    • Scrabble3D
[solved] Case with unicode switch and -FcUTF8
« on: October 14, 2015, 11:29:45 am »
I use a case statement with string labels in my program, something like this:
Code: Pascal
procedure Greetings(aLanguage: string);
begin
  case aLanguage of
    'English': writeln('Hello World');
    'Deutsch': writeln('Hallo Welt');
    'Français': writeln('Salute Monde'); //error
    'Русский': writeln('приве́т мир'); //error
  end;
end;
Having the option -FcUTF8 set (RTL with UTF8 support), the compiler complains "Constant and CASE types do not match" for the labels that contain non-ASCII characters. Do I really need to typecast or convert them?

Lazarus 1.5 rUnknown FPC 3.0.0 i386-win32-win32/win64
« Last Edit: October 16, 2015, 10:57:37 am by Ocye »
Lazarus 1.7 (SVN) FPC 3.0.0

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4474
  • I like bugs.
Re: Case with unicode switch and -FcUTF8
« Reply #1 on: October 14, 2015, 02:28:55 pm »
http://wiki.freepascal.org/Better_LCL_Unicode_Support

We must also support programs / apps with the default string type (local code page), without the new UTF-8 system.
It apparently causes a swamp of nasty issues, but they must be listed somewhere and solved as well as possible.
I will try to improve the wiki page in the near future ...
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

balazsszekely

  • Guest
Re: Case with unicode switch and -FcUTF8
« Reply #2 on: October 14, 2015, 02:43:02 pm »
That case hurts my eyes, but I guess it's just personal preference:

Code: Pascal
type
  TLanguage = (EN, DE, FR, RU);

procedure Greetings(const aLanguage: TLanguage);
begin
  case aLanguage of
    EN: WriteLn('Hello World');
    DE: WriteLn('Hallo Welt');
    FR: WriteLn('Salute Monde');
    RU: WriteLn('приве́т мир');
  end;
end;

//...
Greetings(EN);

Roland57

  • Sr. Member
  • ****
  • Posts: 423
    • msegui.net
Re: Case with unicode switch and -FcUTF8
« Reply #3 on: October 14, 2015, 02:46:30 pm »
Hello! This seems to work:

Code: Pascal
'Fran'#$C3#$A7'ais': writeln('Salute Monde');
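Writing such escapes by hand gets tedious for longer words. The following is a minimal sketch of a helper that builds the escaped literal from the UTF-8 bytes of a label. The names EscapeUtf8Label and EscapeNonAscii are made up for this illustration, and the helper assumes its input already holds UTF-8 bytes (the test input is therefore given in escaped form, so the example does not depend on the encoding of its own source file).

```pascal
program EscapeUtf8Label;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: turns a byte string into a Pascal literal in which
// every byte >= $80 is written as #$XX and ASCII runs stay quoted.
function EscapeNonAscii(const S: RawByteString): string;
var
  i: Integer;
  InQuote: Boolean;
begin
  Result := '';
  InQuote := False;
  for i := 1 to Length(S) do
    if Ord(S[i]) < $80 then
    begin
      // Open a quoted run if we are not inside one yet
      if not InQuote then
      begin
        Result := Result + '''';
        InQuote := True;
      end;
      // A single quote inside a Pascal literal must be doubled
      if S[i] = '''' then
        Result := Result + ''''''
      else
        Result := Result + S[i];
    end
    else
    begin
      // Close the quoted run before emitting a byte escape
      if InQuote then
      begin
        Result := Result + '''';
        InQuote := False;
      end;
      Result := Result + '#$' + IntToHex(Ord(S[i]), 2);
    end;
  if InQuote then
    Result := Result + '''';
end;

begin
  // Input is 'Français' as UTF-8 bytes, written with escapes on purpose
  WriteLn(EscapeNonAscii('Fran'#$C3#$A7'ais'));
end.
```

For this input the program should print the same literal that is used as the case label above. An empty input yields an empty result, which is not a valid literal; the sketch does not handle that case.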
My projects are on Gitlab and on Codeberg.

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Case with unicode switch and -FcUTF8
« Reply #4 on: October 14, 2015, 03:11:27 pm »
What is the encoding of the source-file in question?
The description leads me to believe the source file is not UTF8-encoded. If so, 'Français' is presumably a malformed string once -FcUTF8 is specified, and the compiler cannot handle it as a case label.

Bart

Ocye

Re: Case with unicode switch and -FcUTF8
« Reply #5 on: October 14, 2015, 03:17:00 pm »
Quote from: JuhaManninen
It apparently causes a swamp of nasty issues...

And I'm completely confused now ;-)
Is there a 'small' compiler switch like {$UTF8+/-}?

Quote from: balazsszekely
That case hurts my eyes...

True, but the string labels are more readable. And at some point I have to convert the drop-down selection (which is filled dynamically) anyway.
Anyway, the question is rather about the restrictions of UTF8 in the RTL.

Quote from: Roland57
Code: Pascal
'Fran'#$C3#$A7'ais': writeln('Salute Monde');

And now for Greek (Ελληνικά)  ;D

Quote from: Bart
What is the encoding of the source-file in question?

How do I check that? Notepad++ tells me the file is UTF8 without BOM (saving it explicitly as UTF8 makes no difference). I usually copy the files from Linux to Windows.
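One rough way to check the encoding independently of an editor is to look at the raw bytes of the file. The sketch below is only an illustration and not part of FPC or Lazarus: the program name CheckUtf8 and the helper LooksLikeUtf8 are made up here, and the check is simplified (it verifies lead and continuation bytes but does not reject overlong forms or surrogates).

```pascal
program CheckUtf8;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Hypothetical helper: True if Buf consists only of structurally valid
// UTF-8 sequences (simplified; overlong forms are not rejected).
function LooksLikeUtf8(const Buf: RawByteString): Boolean;
var
  i, Need: Integer;
  b: Byte;
begin
  Result := False;
  i := 1;
  while i <= Length(Buf) do
  begin
    b := Ord(Buf[i]);
    if b < $80 then Need := 0                   // plain ASCII
    else if (b and $E0) = $C0 then Need := 1    // 2-byte sequence
    else if (b and $F0) = $E0 then Need := 2    // 3-byte sequence
    else if (b and $F8) = $F0 then Need := 3    // 4-byte sequence
    else Exit;                                  // invalid lead byte
    Inc(i);
    while Need > 0 do
    begin
      // Every continuation byte must look like 10xxxxxx
      if (i > Length(Buf)) or ((Ord(Buf[i]) and $C0) <> $80) then
        Exit;
      Inc(i);
      Dec(Need);
    end;
  end;
  Result := True;
end;

var
  Stream: TFileStream;
  Data: RawByteString;
  Len: LongInt;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: checkutf8 <file>');
    Halt(1);
  end;
  Data := '';
  Stream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyNone);
  try
    Len := Stream.Size;
    SetLength(Data, Len);
    if Len > 0 then
      Stream.ReadBuffer(Data[1], Len);
  finally
    Stream.Free;
  end;
  // A UTF-8 BOM is the byte sequence EF BB BF at the start of the file
  if (Length(Data) >= 3) and (Data[1] = #$EF) and (Data[2] = #$BB)
     and (Data[3] = #$BF) then
    WriteLn('UTF-8 BOM present')
  else
    WriteLn('no BOM');
  if LooksLikeUtf8(Data) then
    WriteLn('bytes are structurally valid UTF-8')
  else
    WriteLn('bytes are NOT valid UTF-8 (probably a single-byte code page)');
end.
```

Note that a file containing only ASCII characters passes this check whatever encoding the editor claims, so the result is only meaningful for files that contain characters such as ç or Cyrillic letters.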

JuhaManninen

Re: Case with unicode switch and -FcUTF8
« Reply #6 on: October 14, 2015, 04:44:13 pm »
Quote from: Ocye
It apparently causes a swamp of nasty issues...
And I'm completely confused now ;-)
Is there a 'small' compiler switch like {$UTF8+/-}?

There is a small define:
 -dEnableUTF8RTL
My plan is to replace it with -dDisableUTF8RTL for people who want to use the FPC system codepage string as the default.
Without any defines, a Lazarus project compiled with FPC 3.x will then use the new UTF-8 system.

See issue :
 http://bugs.freepascal.org/view.php?id=26453
and its related issues. This is really nasty; it may be a compiler bug, as Michl figured out.
Our new UTF-8 system solves those problems. It is still a hack but less of a hack than the currently used UTF-8 hack is.

Later, when FPC, the RTL and other libs are ready, we will implement Delphi-compatible UTF-16 support, too.
Before that, let's try to make things work without UTF-16.

Roland57

Re: Case with unicode switch and -FcUTF8
« Reply #7 on: October 14, 2015, 08:55:21 pm »
Quote from: Ocye
And now for Greek (Ελληνικά)  ;D

Here you are.

Code: Pascal
{$codepage utf8}

procedure Greetings(aLanguage: string);
begin
  case aLanguage of
    'English': WriteLn('Hello World');
    'Deutsch': WriteLn('Hallo Welt');
    {'Français'}'Fran'#$C3#$A7'ais': WriteLn('Salute Monde');
    {'Русский'}#$D0#$A0#$D1#$83#$D1#$81#$D1#$81#$D0#$BA#$D0#$B8#$D0#$B9: WriteLn('приве́т мир');
    {'Ελληνικά'}#$CE#$95#$CE#$BB#$CE#$BB#$CE#$B7#$CE#$BD#$CE#$B9#$CE#$BA#$CE#$AC: WriteLn('Ελληνικά');
  end;
end;

begin
  Greetings('English');
  Greetings('Français');
  Greetings('Русский');
  Greetings('Ελληνικά');
  ReadLn;
end.

Ocye

Re: Case with unicode switch and -FcUTF8
« Reply #8 on: October 16, 2015, 10:57:13 am »
Quote from: JuhaManninen
See issue :
 http://bugs.freepascal.org/view.php?id=26453...

I always struggle with those issues. For a non-professional who has never really understood the codepage stuff and who usually codes on Linux, it is one of the major obstacles to getting code to work cross-platform for multiple languages.

Quote from: Roland57
Code: Pascal
{$codepage utf8}
    {'Русский'}#$D0#$A0#$D1#$83#$D1#$81#$D1#$81#$D0#$BA#$D0#$B8#$D0#$B9: WriteLn('приве́т мир');
    {'Ελληνικά'}#$CE#$95#$CE#$BB#$CE#$BB#$CE#$B7#$CE#$BD#$CE#$B9#$CE#$BA#$CE#$AC: WriteLn('Ελληνικά');

Hm... could indeed be a solution, at least temporarily. Thanks for the suggestion.

JuhaManninen

Re: Case with unicode switch and -FcUTF8
« Reply #9 on: October 16, 2015, 11:52:52 am »
Quote from: Ocye
See issue :
 http://bugs.freepascal.org/view.php?id=26453...
I always struggle with those issues. For a non-professional who has never really understood the codepage stuff and who usually codes on Linux, it is one of the major obstacles to getting code to work cross-platform for multiple languages.

Same here. That's why we have created the new "UTF-8 hack":
 http://wiki.freepascal.org/Better_LCL_Unicode_Support
It works amazingly well. Yes it has issues, too, but they are predictable and understandable.
The bug report I mentioned happens only when the new UTF-8 system is NOT used.

Quote from: Ocye
Hm... could indeed be a solution, at least temporarily. Thanks for the suggestion.

No, a proper solution is the new UTF-8 system. Why don't you want to use it?

From your first post:
Quote
Having the option -FcUTF8 set (RTL with UTF8 support), the compiler complains ...

You have misunderstood the meaning of -FcUTF8. It does not make the RTL support UTF-8. It only makes the compiler assume that source files are UTF-8 encoded.
-dEnableUTF8RTL changes the default encoding of the String type.
My plan is to remove -dEnableUTF8RTL and make the new UTF-8 system the default behavior for all Lazarus projects. As you have seen, String with system codepage + FPC 3.x + LCL with UTF-8 is a SWAMP.
Maybe I should make this change ASAP to avoid more confusion. There will be -dDisableUTF8RTL for people who must use system code page strings.
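The difference between the two settings can be made visible with a small sketch. It assumes FPC 3.x, where the System unit exposes DefaultSystemCodePage, CP_UTF8 and SetMultiByteConversionCodePage. The program name ShowCodePages is made up, and the claim that the new UTF-8 system performs a call like this at startup is an assumption based on the wiki page, not something shown in this thread.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}
{$codepage utf8}  // compile-time only: tells the compiler how to read THIS
                  // source file, which is what -FcUTF8 does for all files

begin
  // DefaultSystemCodePage is the run-time code page the RTL assumes for
  // String/AnsiString data. The source code page setting does not change it.
  WriteLn('Default code page at startup: ', DefaultSystemCodePage);

  // Assumption: the new Lazarus UTF-8 system switches the run-time default
  // in a similar way when the program starts.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('Default code page now:        ', DefaultSystemCodePage);
  WriteLn('CP_UTF8 is:                   ', CP_UTF8);
end.
```

The first value depends on the platform and its settings, so no fixed output is given here. The point is only that the source file encoding and the run-time default string encoding are two separate settings.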

And, before anybody asks:
Delphi compatible UTF-16 support will be made later when FPC and its libs are ready.
This new UTF-8 system is much better than the old UTF-8 hack with all those UTF8... functions. It is less of a hack.
In fact it is almost Delphi compatible at source level for lots of code. Reading/writing non-UTF-8 streams, files or DBs needs changes, which must be documented somehow.
« Last Edit: October 16, 2015, 12:04:17 pm by JuhaManninen »

 
