
Author Topic: Lazarus doesn't let me write in spanish áéíóú and ^  (Read 7612 times)

Bogen85

  • Hero Member
  • *****
  • Posts: 595
Re: Lazarus doesn't let me write in spanish áéíóú and ^
« Reply #30 on: February 19, 2023, 07:09:05 pm »
https://en.cppreference.com/w/c/language/identifier
Quote
Identifier
An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters specified using \u and \U escape notation (since C99), of class XID_Continue (since C23). A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character (since C99)(until C23), or Unicode character of class XID_Start) (since C23)). Identifiers are case-sensitive (lowercase and uppercase letters are distinct). Every identifier must conform to Normalization Form C. (since C23)

https://en.cppreference.com/w/cpp/language/identifiers
Quote
Identifiers
An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and most Unicode characters. A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode character of class XID_Start) and may contain non-digit characters, digits, and Unicode characters of class XID_Continue in non-initial positions. Identifiers are case-sensitive (lowercase and uppercase letters are distinct), and every character is significant. Every identifier must conform to Normalization Form C.

Note: Support of Unicode identifiers is limited in most implementations, e.g. gcc (until 10).

Specifically for C++:
Quote
An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and most Unicode characters.
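
To illustrate the \u and \U escape notation mentioned in the C quote above, here is a minimal sketch. It is not taken from cppreference; the function name scaled_value and the variable names are made up for illustration. The two locals are spelled with universal character names, so the source file itself stays pure ASCII even though the identifiers contain U+00E1 and U+00F1. It assumes a C99-or-later compiler.

```c
#include <assert.h>

/* Hypothetical helper: multiplies a fixed value by `factor`.
   The locals are spelled with universal character names:
   \u00e1 is U+00E1 (a with acute), \u00f1 is U+00F1 (n with tilde). */
int scaled_value(int factor) {
    assert(factor >= 0);             /* illustrative precondition only */
    int \u00e1 = 4;                  /* identifier written as a UCN */
    int \u00f1ame_count = factor;    /* a UCN followed by ASCII letters */
    return \u00e1 * \u00f1ame_count;
}
```

Calling scaled_value(3) multiplies 4 by 3. Nothing in the file needs a particular source encoding, which is the point of the escape notation.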

But this post is about not being able to type non-ASCII in Lazarus (for Free Pascal literals and comments?)

Not for me to defend myself against false accusations of lying on this public forum... (sorry to hijack it for this purpose...)



Jonas Maebe

  • Hero Member
  • *****
  • Posts: 1059
« Reply #31 on: February 19, 2023, 08:45:55 pm »
Going back to the original remark/question: so-called "dead keys" have indeed been broken on macOS since the switch of the LCL from Carbon to Cocoa, and this needs to be fixed. I don't know if there is an open bug report about this.

dsiders

  • Hero Member
  • *****
  • Posts: 1084
« Reply #32 on: February 19, 2023, 09:09:22 pm »
Quote
Going back to the original remark/question: so-called "dead keys" have indeed been broken on macOS since the switch of the LCL from Carbon to Cocoa, and this needs to be fixed. I don't know if there is an open bug report about this.

I found a closed report about "back ticks" and tilde keys. https://gitlab.com/freepascal.org/lazarus/lazarus/-/issues/35788

Otherwise, it doesn't appear to be reported.
Preview Lazarus 3.99 documentation at: https://dsiders.gitlab.io/lazdocsnext

Jonas Maebe

« Reply #33 on: February 19, 2023, 09:12:09 pm »
Indeed, that bug report seems to be about the same issue. Maybe it broke again?

eljo

  • Sr. Member
  • ****
  • Posts: 468
« Reply #34 on: February 19, 2023, 10:02:01 pm »
Quote
I don't know if there is an open bug report about this.

See reply #17 by circular.

tetrastes

  • Sr. Member
  • ****
  • Posts: 483
« Reply #35 on: February 20, 2023, 12:01:27 am »
Quote
Are you saying that the clang and gcc installed on my system are lying to me about non-ASCII being allowed in identifiers?
And that these two sites are lying?
https://en.cppreference.com/w/cpp/language/identifiers
https://en.cppreference.com/w/c/language/identifier

What is the blatant lie? Someone in this thread said C/C++ don't support non-ASCII in identifiers, so I checked.
They do, and the two sites above confirm that. (As does my example in reply #23)

I could be mistaken, it might not be me you are accusing of lying...

This is implementation-defined. Even the proposed C23 standard requires only the basic character set (which is a subset of ASCII) and universal character names (which are written \unnnn or \Unnnnnnnn, so obviously ASCII as well) to be implemented. This is a quote from https://open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf:

Quote
5.2 Environmental considerations
5.2.1 Character sets
1 Two sets of characters and their associated collating sequences shall be defined: the set in which source
files are written (the source character set), and the set interpreted in the execution environment (the
execution character set). Each set is further divided into a basic character set, whose contents are given
by this subclause, and a set of zero or more locale-specific members (which are not members of the
basic character set) called extended characters. The combined set is also called the extended character
set. The values of the members of the execution character set are implementation-defined.
...
5 The universal character name construct provides a way to name other characters.

As for implementations: gcc v11.3.0 (on Ubuntu 22.04.1) compiles UTF-8 sources, but not UTF-16 or extended ASCII.
cl.exe v19.31 (from VS 2022) silently compiles UTF-16 and extended ASCII (which it interprets as being in the system code page); it compiles UTF-8 with the /source-charset:utf-8 option. Maybe gcc has an analog of this option; I didn't try to find it.
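
For anyone wanting to check which of these encodings a source file is in before feeding it to a compiler, here is a rough sketch that looks only at the byte-order mark. The names bom_kind and detect_bom are made up for this example. It is an assumption-laden shortcut: a file without a BOM can still be UTF-8, and the UTF-32 marks are not handled (a UTF-32 LE file would be reported as UTF-16 LE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a buffer by its leading byte-order mark. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

bom_kind detect_bom(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    /* UTF-8 BOM: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    /* UTF-16 little-endian BOM: FF FE */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    /* UTF-16 big-endian BOM: FE FF */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    /* No mark: could be ASCII, BOM-less UTF-8, or a legacy code page. */
    return BOM_NONE;
}
```

Read the first few bytes of the file into a buffer and pass them in. Extended ASCII in a system code page has no mark at all, which is presumably why a compiler has to guess or be told.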

Bogen85

« Reply #36 on: February 20, 2023, 01:33:41 am »
Quote
As for implementations: gcc v11.3.0 (on Ubuntu 22.04.1) compiles UTF-8 sources, but not UTF-16 or extended ASCII.
cl.exe v19.31 (from VS 2022) silently compiles UTF-16 and extended ASCII (which it interprets as being in the system code page); it compiles UTF-8 with the /source-charset:utf-8 option. Maybe gcc has an analog of this option; I didn't try to find it.

https://en.cppreference.com/w/c/language/identifier
Unicode non-ASCII has been allowed since C99; something changed in C23
Quote
A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character (since C99)(until C23)

https://en.cppreference.com/w/cpp/language/identifiers
Appears to be at least since C++11.

Already quoted both enough times... (and Microsoft's documentation as well).

I don't see off hand in reading those where it is optional...

Since English is my primary language, I have nothing personal to gain if Free Pascal were to adopt such a thing. And in languages that allow non-ASCII identifiers I don't create them myself, as I have no need to.

« Last Edit: February 20, 2023, 09:59:33 am by Bogen85 »

tetrastes

« Reply #37 on: February 20, 2023, 09:23:56 am »
Quote
What is the blatant lie? Someone in this thread said C/C++ don't support non-ASCII in identifiers, so I checked.
They do, and the two sites above confirm that. (As does my example in reply #23)

I could be mistaken, it might not be me you are accusing of lying...

I didn't write these words. Please correct your post.

Quote
Unicode non-ASCII has been allowed since C99; something changed in C23

The C99 standard says the same about character sets: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf.
The change is the addition of terms such as XID_Start and XID_Continue characters, and the disallowing of some characters, e.g. emojis.
As for C++, C++17 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf and C++20 https://isocpp.org/files/papers/N4860.pdf do not even mention a source extended character set.

So non-ASCII chars in identifiers are allowed, but they are not mandatory and are implementation-dependent. If you want your code to be portable, don't use them.
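
If you want to enforce that rule mechanically, a crude check is to scan the source text for any byte outside the ASCII range. The sketch below is only that: first_non_ascii is a made-up name, and it flags non-ASCII anywhere in the buffer, including string literals and comments, not just identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the 0-based offset of the first byte
   with the high bit set (>= 0x80), or -1 if the buffer is pure ASCII. */
long first_non_ascii(const char *text, size_t len) {
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        /* Cast first: plain char may be signed, so bytes >= 0x80
           could otherwise compare as negative values. */
        if ((unsigned char)text[i] >= 0x80)
            return (long)i;
    }
    return -1;
}
```

Run it over each source file in a pre-commit step or a build script and report the offset when the result is not -1.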

« Last Edit: February 20, 2023, 09:29:49 am by tetrastes »

tetrastes

« Reply #38 on: February 20, 2023, 10:00:46 am »
Quote
I don't see off hand in reading those where it is optional...

On your https://en.cppreference.com/w/c/language/identifier page, don't you see the words "implementation-defined" three times?

Bogen85

« Reply #39 on: February 20, 2023, 10:01:30 am »
Quote
I didn't write these words. Please correct your post.

I did correct my post. Not sure what happened. Was reading the emails and thought the wrong thing and got confused.

Bogen85

« Reply #40 on: February 20, 2023, 10:05:40 am »
Quote
I don't see off hand in reading those where it is optional...

On your https://en.cppreference.com/w/c/language/identifier page, don't you see the words "implementation-defined" three times?

Yes, I see it now.
Yeah, as far as portability goes, many projects (not all, I know there are other C/C++ compilers...) just stick to gcc/clang/msvc, and those allow it. If someone already relies on a lot of gcc-specific things (which clang supports), then they only support gcc and clang. Not always the best, but it is common.
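
For what it's worth, code that only targets those three compilers usually tells them apart with predefined macros. A small sketch (compiler_family is a made-up name; __clang__, __GNUC__ and _MSC_VER are the macros the compilers themselves define):

```c
/* Hypothetical helper: names the compiler family this file was built with. */
const char *compiler_family(void) {
#if defined(__clang__)
    /* Checked first: clang also defines __GNUC__ for compatibility. */
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "other";
#endif
}
```

A project that only supports gcc and clang could turn the last branch into an #error instead of returning "other".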

dbannon

  • Hero Member
  • *****
  • Posts: 2802
    • tomboy-ng, a rewrite of the classic Tomboy
« Reply #41 on: February 20, 2023, 11:39:40 am »
Quote
gcc, clang, g++, and clang++ all compile (without any warnings or errors) and run the following (without any extra compile flags):

Code: C
#include <stdio.h>

void grëéþt() {
        int á = 4;
        char ñame[] = "wórød";
        printf("hélló %s:%d\n", ñame, á);
}

int main () {
        grëéþt();
        return 0;
}

https://en.cppreference.com/w/cpp/language/identifiers
https://en.cppreference.com/w/c/language/identifier

Both the C and C++ standards allow for non-ASCII Unicode characters in identifiers.

The above code does not compile using gcc on an Ubuntu 20.04 system, which, while a few years old, is still an "officially supported" OS for some time yet. So, as tetrastes said, it might work on your desktop, but it might not work on someone else's! And what use is code if it's not portable?

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon
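
For comparison, here is one way the quoted snippet could be spelled so that it does not depend on the compiler accepting non-ASCII source. This is only a sketch and not something anyone in the thread posted: format_greeting is a made-up name, the identifiers are plain ASCII, and the accented text is kept only in string literals, written as UTF-8 byte escapes. It assumes the output is meant to be UTF-8.

```c
#include <stdio.h>

/* Hypothetical ASCII-only variant of the quoted example.
   Writes the greeting into `out` and returns the byte count that
   snprintf reports (excluding the terminating null). */
int format_greeting(char *out, size_t cap) {
    int a = 4;
    /* UTF-8 bytes of the original name. The literal is split before
       the final "d" so that 'd' is not parsed as part of the \x escape. */
    const char name[] = "w\xc3\xb3r\xc3\xb8" "d";
    return snprintf(out, cap, "h\xc3\xa9ll\xc3\xb3 %s:%d\n", name, a);
}
```

The trade-off is readability: the strings are harder to read than in the original, but the file contains no bytes above 0x7F, so the source encoding no longer matters.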

lainz

  • Hero Member
  • *****
  • Posts: 4470
    • https://lainz.github.io/
« Reply #42 on: February 20, 2023, 04:14:28 pm »
Quote
The above code does not compile using gcc on an Ubuntu 20.04 system, which, while a few years old, is still an "officially supported" OS for some time yet. So, as tetrastes said, it might work on your desktop, but it might not work on someone else's! And what use is code if it's not portable?

Davo

The same happens with versions of FPC: code written for newer versions can't be compiled with old versions. So it is not portable in your terms either.

We're talking about new stuff, to clarify...

Bogen85

« Reply #43 on: February 20, 2023, 05:08:48 pm »
Quote
gcc, clang, g++, and clang++ all compile (without any warnings or errors) and run the following (without any extra compile flags):

Code: C
#include <stdio.h>

void grëéþt() {
        int á = 4;
        char ñame[] = "wórød";
        printf("hélló %s:%d\n", ñame, á);
}

int main () {
        grëéþt();
        return 0;
}

https://en.cppreference.com/w/cpp/language/identifiers
https://en.cppreference.com/w/c/language/identifier

Both the C and C++ standards allow for non-ASCII Unicode characters in identifiers.

The above code does not compile using gcc on an Ubuntu 20.04 system, which, while a few years old, is still an "officially supported" OS for some time yet. So, as tetrastes said, it might work on your desktop, but it might not work on someone else's! And what use is code if it's not portable?

As tetrastes pointed out, it is up to the implementation for C. For C++ I don't see the same wording, and it appears that it has been allowed since C++11.

But as I pointed out earlier
Quote
And in languages that allow non-ASCII identifiers I don't create them myself, as I have no need to
(because for me they would add no value)

I have to admit, I overreacted a bit, but the standards do allow for them, less so for C, but more so for C++ since C++11. Since 3 major compilers support them, I don't consider that fringe, and that is one of the things I was reacting to.

As far as portability goes: I've been on projects where generally only recent toolchains (from the past year or so, or the latest stable release) are supported. And yeah, I know it makes things difficult when such a mindset spills into too many projects.


 
