Author Topic: Identifiers character set  (Read 1856 times)

gidesa

  • Full Member
  • ***
  • Posts: 141
Identifiers character set
« on: October 30, 2024, 12:59:23 pm »
Hello,
I notice that the character set for Free Pascal identifiers is restricted to the letters a-z and A-Z, the digits 0-9, and the underscore _. Delphi, by contrast, accepts Unicode characters with fewer restrictions.
For example "§", "°", "£" are valid identifiers in Delphi, but not in Free Pascal.
These "strange" characters are sometimes useful, for example when you want to write quasi-mathematical formulas: "§()" could be a summation procedure on a vector, matrix, etc.
What is the reason for the restriction? Is it because such characters are difficult to handle on different platforms?
Is it possible to extend the identifier character set?
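
To make the restriction concrete, here is a minimal sketch of the ASCII-only rule described above, written in Python rather than Pascal so it can be run anywhere. The function name is made up for illustration, and the "no leading digit" part is an assumption taken from the usual Pascal rule, not from the compiler source.

```python
import re

# ASCII letters, digits and underscore only.
# Assumption: the first character may not be a digit (usual Pascal rule).
CLASSIC_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_classic_identifier(name: str) -> bool:
    """Return True if `name` fits the ASCII-only identifier rule."""
    return CLASSIC_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Sum", "_tmp1", "1abc", "§", "°", "£"]:
        print(repr(candidate), is_classic_identifier(candidate))
```

Under this rule "§", "°" and "£" are all rejected, which matches what the compiler does today.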

TRon

  • Hero Member
  • *****
  • Posts: 3623
This tagline is powered by AI (AI advertisement: Free Pascal the only programming language that matters)

Thaddy

  • Hero Member
  • *****
  • Posts: 16168
  • Censorship about opinions does not belong here.
Re: Identifiers character set
« Reply #2 on: October 30, 2024, 01:26:40 pm »
There is a patch in the pipeline (working, I tested it) made by a Chinese Free Pascal fanatic
(variables in Simplified Chinese etc.; it makes the compiler support UTF-8 for source code, not only for literals)
« Last Edit: October 30, 2024, 03:39:52 pm by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

gidesa

  • Full Member
  • ***
  • Posts: 141
Re: Identifiers character set
« Reply #3 on: October 30, 2024, 01:27:28 pm »
Ok, I see. There is a big debate.
But there is also a working compiler change.
I will try it.
Thanks

MarkMLl

  • Hero Member
  • *****
  • Posts: 8024
Re: Identifiers character set
« Reply #4 on: October 30, 2024, 01:29:25 pm »
So how do we know, unambiguously, whether a string of characters is a valid identifier? At present the rule is clear: must start with a character in the range [A..Z,a..z,_].

Speaking as somebody who would like to see more flexibility in operator definition, this is something to which I've given a fair amount of thought: I don't think it's doable, unless the fundamental lexer behaviour varies by locale. Should this be per-unit or per-project? If per unit, what happens when one unit doesn't like an identifier exported by another?

We have Perl 6 as a cautionary tale...

MarkMLl

p.s. I've posted before about a hack which allows one to say REDUCE + for vector etc. summation.

MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

gidesa

  • Full Member
  • ***
  • Posts: 141
Re: Identifiers character set
« Reply #5 on: October 30, 2024, 02:19:33 pm »
I have done a quick test on various languages: Python; GCC C++; a Rexx interpreter; the old XDS Modula-2 compiler; and the JavaScript console in a browser. None of them accepts identifiers beginning with special characters such as "§".
So Delphi's behavior is an oddity, not the rule.
Conclusion: Free Pascal is fine as it is :-)

(*) Note: by "summation" I mean the mathematical sigma symbol, that is, the sum of all elements of a vector or matrix. "§" is an approximation of the sigma character.
I know that with advanced records one can overload many operators between records of the same type, such as "+" for the sum. And I have done this.
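
The Python part of the quick test can be reproduced without writing any assignment, since `str.isidentifier()` applies the same rule the Python parser uses. This is only a sketch of that one check; the helper name is made up, and it says nothing about the other languages listed.

```python
def python_accepts(name: str) -> bool:
    """Return True if Python would accept `name` as an identifier."""
    return name.isidentifier()


if __name__ == "__main__":
    # "§" is the section sign, "Σ" the Greek capital sigma (a letter),
    # "∑" the n-ary summation operator (a math symbol).
    for candidate in ["§", "°", "£", "Σ", "∑", "sum_1"]:
        print(repr(candidate), python_accepts(candidate))
```

So the real sigma letter "Σ" is accepted while "§" and the summation operator "∑" are not: Python allows Unicode letters but not symbols or punctuation.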
« Last Edit: October 30, 2024, 02:26:01 pm by gidesa »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8024
Re: Identifiers character set
« Reply #6 on: October 30, 2024, 02:40:49 pm »
Quote
I have done a quick test on various languages: Python; GCC C++; a Rexx interpreter; the old XDS Modula-2 compiler; and the JavaScript console in a browser. None of them accepts identifiers beginning with special characters such as "§".
So Delphi's behavior is an oddity, not the rule.
Conclusion: Free Pascal is fine as it is :-)

Maybe. It /might/ be possible to use DUCET as an authority as to what is genuinely a letter hence permissible in an identifier, but the scope of the decision remains: it might be necessary to have a ground-up build for each combination.
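
As a rough illustration of "ask an authority what is genuinely a letter", here is a sketch that uses the Unicode general category from Python's `unicodedata` module. Note that this is not DUCET (which is a collation table and is not in the Python standard library); the general category is swapped in as a stand-in, and both function names are made up for the example.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if the Unicode general category of `ch` is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def is_letter_based_identifier(name: str) -> bool:
    """Letter or underscore first, then letters, decimal digits or underscores.

    Assumption: this mirrors the classic Pascal shape with "letter" widened
    to any Unicode letter; it is not the rule of any actual compiler.
    """
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first == "_" or is_letter(first)):
        return False
    return all(
        ch == "_" or is_letter(ch) or unicodedata.category(ch) == "Nd"
        for ch in rest
    )


if __name__ == "__main__":
    for candidate in ["Σ", "§", "Ąžuolas1", "1x", "_x"]:
        print(repr(candidate), is_letter_based_identifier(candidate))
```

A rule like this is locale-independent, because the category is a property of the character itself. It does not settle the scope question (per unit or per project), nor how case-insensitive comparison should work outside ASCII.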

Quote
(*) Note: by "summation" I mean the mathematical sigma symbol, that is, the sum of all elements of a vector or matrix. "§" is an approximation of the sigma character.
I know that with advanced records one can overload many operators between records of the same type, such as "+" for the sum. And I have done this.

I know perfectly well what you mean, and I'm not talking about advanced records: I'm talking about reduction as done in Vector Pascal or APL.

MarkMLl

WooBean

  • Sr. Member
  • ****
  • Posts: 277
Re: Identifiers character set
« Reply #7 on: October 30, 2024, 02:49:49 pm »
Quote
I have done a quick test on various languages: Python; GCC C++; a Rexx interpreter; the old XDS Modula-2 compiler; and the JavaScript console in a browser. None of them accepts identifiers beginning with special characters such as "§".
So Delphi's behavior is an oddity, not the rule.
Conclusion: Free Pascal is fine as it is :-)

...

I have jumped to Embarcadero's site to see their position on identifiers - https://docwiki.embarcadero.com/RADStudio/Athens/en/Identifiers

It looks at least unclear; the safe interpretation is that the old Pascal rule is still valid.
Platforms: Win7/64, Linux Mint Ulyssa/64

LV

  • Full Member
  • ***
  • Posts: 153
Re: Identifiers character set
« Reply #8 on: October 30, 2024, 02:56:06 pm »
https://gitlab.com/freepascal.org/fpc/source/-/issues/40933

That's great! Thank you for sharing the link.
It appears that most programming languages now support a limited set of Unicode characters.
https://rosettacode.org/wiki/Unicode_variable_names

Thaddy

  • Hero Member
  • *****
  • Posts: 16168
  • Censorship about opinions does not belong here.
Re: Identifiers character set
« Reply #9 on: October 30, 2024, 02:57:35 pm »
Quote
Ok, I see. There is a big debate.
But there is also a working compiler change.
I will try it.
Thanks

The debate is not without reason: it is pretty hefty from a Pascal language point of view.
But it works! It is there and if you want to try you can do so.
I was really impressed after I tried it (with Lithuanian), but I also understand the sentiments against applying it.

(A, Ą, B, C, Č, D, E, Ę, Ė, F, G, H, I, Į, Y, J, K, L, M, N, O, P, R, S, Š, T, U, Ų, Ū, V, Z, Ž, a, ą, b, c, č, d, e, ę, ė, f, g, h, i, į, y, j, k, l, m, n, o, p, r, s, š, t, u, ų, ū, v, z, ž, 0..9, _)
« Last Edit: October 30, 2024, 03:11:02 pm by Thaddy »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8024
Re: Identifiers character set
« Reply #10 on: October 30, 2024, 03:00:02 pm »

Quote
I have jumped to Embarcadero's site to see their position on identifiers - https://docwiki.embarcadero.com/RADStudio/Athens/en/Identifiers

It looks at least unclear; the safe interpretation is that the old Pascal rule is still valid.

The reference to http://archives.ecma-international.org/2003/TG1/2003tg1-014.pdf Annex A looks like it might be useful though.

MarkMLl

TRon

  • Hero Member
  • *****
  • Posts: 3623
Re: Identifiers character set
« Reply #11 on: October 30, 2024, 03:07:47 pm »
Quote
I have done a quick test on various languages: Python; GCC C++; a Rexx interpreter; the old XDS Modula-2 compiler; and the JavaScript console in a browser. None of them accepts identifiers beginning with special characters such as "§".
So Delphi's behavior is an oddity, not the rule.
Conclusion: Free Pascal is fine as it is :-)

...

I have jumped to Embarcadero's site to see their position on identifiers - https://docwiki.embarcadero.com/RADStudio/Athens/en/Identifiers

It looks at least unclear; the safe interpretation is that the old Pascal rule is still valid.

Be careful making such assumptions:

athens fundamental syntactic elements
Quote
Identifiers denote constants, variables, fields, types, properties, procedures, functions, programs, units, libraries, and packages. An identifier can be any length, but only the first 255 characters are significant. An identifier must begin with an alphabetic character, a Unicode character, or an underscore (_) and cannot contain spaces (or Unicode characters considered as whitespace). Alphanumeric characters, Unicode characters, digits (including Unicode characters representing numerals), and underscores are allowed after the first character. Reserved words cannot be used as identifiers. Since the Delphi Language is case-insensitive, an identifier like CalculateValue could be written in any of these ways:

@LV:
YW  :)
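
The two comparison rules in the Delphi text quoted above (only the first 255 characters are significant, and case does not matter) can be sketched like this. It is a Python illustration only: the function names are made up, `str.lower()` is an assumed stand-in for whatever case mapping the Delphi compiler really applies, and Python counts code points where Delphi may count UTF-16 units.

```python
SIGNIFICANT_CHARS = 255  # figure taken from the quoted documentation


def identifier_key(name: str) -> str:
    """Key under which two spellings count as the same identifier."""
    # Assumption: lower() approximates the compiler's case-insensitive match.
    return name[:SIGNIFICANT_CHARS].lower()


def same_identifier(a: str, b: str) -> bool:
    """True if `a` and `b` would name the same thing under the quoted rules."""
    return identifier_key(a) == identifier_key(b)


if __name__ == "__main__":
    print(same_identifier("CalculateValue", "CALCULATEVALUE"))
    # Two names that differ only after the 255th character.
    print(same_identifier("a" * 255 + "x", "a" * 255 + "y"))
```

Once identifiers go beyond ASCII, the choice of case mapping is exactly the kind of detail a compiler has to pin down, which is part of why the patch is being debated.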

Thaddy

  • Hero Member
  • *****
  • Posts: 16168
  • Censorship about opinions does not belong here.
Re: Identifiers character set
« Reply #12 on: October 30, 2024, 03:14:16 pm »
Just tried: it also supports classical Greek. Now that really looks like code obfuscation, even more so than Chinese does to my eyes.
I can read ancient Greek, but not Chinese, or only to a very, very, very limited extent.
« Last Edit: October 30, 2024, 03:15:50 pm by Thaddy »

LV

  • Full Member
  • ***
  • Posts: 153
Re: Identifiers character set
« Reply #13 on: October 30, 2024, 03:14:58 pm »

Thaddy

  • Hero Member
  • *****
  • Posts: 16168
  • Censorship about opinions does not belong here.
Re: Identifiers character set
« Reply #14 on: October 30, 2024, 03:35:13 pm »
Well, the cat is out of the box and is alive; let's see how things develop.
Loads more sophisticated than, ahum, this: https://forum.lazarus.freepascal.org/index.php/topic,36887.msg246265.html#msg246265
 8-) :D

One note of warning: branch it. I forgot that first.
« Last Edit: October 30, 2024, 03:38:01 pm by Thaddy »

 
