
Author Topic: Identifiers character set  (Read 1873 times)


TRon

  • Hero Member
  • *****
  • Posts: 3631
Re: Identifiers character set
« Reply #16 on: October 30, 2024, 03:45:36 pm »
Well, the cat is out of the box, let's see how things develop.
Loads more sophisticated than, ahem, this: https://forum.lazarus.freepascal.org/index.php/topic,36887.msg246265.html#msg246265
 8-) :D
Well, if Unicode identifiers are allowed then I fail to see why reserved/key words can't be as well. Consistency is key, no?  >:D
This tagline is powered by AI (AI advertisement: Free Pascal the only programming language that matters)

Thaddy

  • Hero Member
  • *****
  • Posts: 16184
  • Censorship about opinions does not belong here.
Re: Identifiers character set
« Reply #17 on: October 30, 2024, 05:16:42 pm »
Well, definition, lexer, parser.
Depending on these you restrict. >:D
But you know that.
If I smell bad code it usually is bad code and that includes my own code.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5755
  • Compiler Developer
Re: Identifiers character set
« Reply #18 on: October 30, 2024, 09:57:00 pm »
Quote from: TRon
Well, the cat is out of the box, let's see how things develop.
Loads more sophisticated than, ahem, this: https://forum.lazarus.freepascal.org/index.php/topic,36887.msg246265.html#msg246265
 8-) :D
Well, if Unicode identifiers are allowed then I fail to see why reserved/key words can't be as well. Consistency is key, no?  >:D

Keywords should not use anything other than ASCII, because that way it's still possible to use the whole language on systems where Unicode is not easily usable (e.g. FreeDOS).

MarkMLl

  • Hero Member
  • *****
  • Posts: 8032
Re: Identifiers character set
« Reply #19 on: October 30, 2024, 10:30:29 pm »
Quote from: PascalDragon
Keywords should not use anything other than ASCII, because that way it's still possible to use the whole language on systems where Unicode is not easily usable (e.g. FreeDOS).

I for one am fine with that. Now about identifiers...

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

LV

  • Full Member
  • ***
  • Posts: 153
Re: Identifiers character set
« Reply #20 on: October 30, 2024, 11:15:26 pm »
Let's examine the use of mathematical symbols in the Julia programming language (no ads) as an example.

Warfley

  • Hero Member
  • *****
  • Posts: 1761
Re: Identifiers character set
« Reply #21 on: October 30, 2024, 11:43:07 pm »
I mean the way Julia allows for operators would generally be possible, just add those symbols to the Lexer. Currently the Lexer just hardcodes all operators, so you could just extend the list of operators.
The problem currently is that there is a 1-1 mapping of operators to node types, which means adding all those operators would completely blow up the code for handling addnodes (which is the node type for any binary operator).
So while certainly not that "difficult" to implement in the current FPC design, it would be very tedious and clutter up the code immensely.

What I find even more interesting is Haskell where you can define any arbitrary operator from any combination of symbol characters. For example:
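Code: Haskell
    -- Assumed data type: the original snippet uses these constructors without declaring them.
    data ClosedRange a = Range a a | EmptyRange deriving Show

    -- Build a range from a to b; the range is empty when a > b.
    (...) :: Ord a' => a' -> a' -> ClosedRange a'
    (...) a b
        | a <= b    = Range a b
        | otherwise = EmptyRange
    infix 5 ...

This defines a new infix operator, a ... b, which returns a range from a to b.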

This allows you to create any custom operator without having them hardcoded. The problem is that FPC currently does not support this at all, as every operator is hardcoded.

Also generally, the FPC is a bit lacking when it comes to operators as there is currently no associativity defined for the operators.
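
To make the "operators in a table instead of hardcoded" idea concrete, here is a rough Python sketch. It is not FPC code and not how Haskell implements fixity declarations; the names OPERATORS, define_operator and evaluate are made up for this illustration. Each operator carries a precedence and an associativity, and a small precedence-climbing parser consults the table. Registering `...` as left-associative is an assumption (Haskell's plain `infix` is non-associative).

```python
import re

# Operator table: symbol -> (precedence, associativity, implementation).
# All entries are illustrative; nothing here is taken from FPC.
OPERATORS = {
    "+": (6, "left", lambda a, b: a + b),
    "-": (6, "left", lambda a, b: a - b),
    "*": (7, "left", lambda a, b: a * b),
    "^": (8, "right", lambda a, b: a ** b),
}


def define_operator(symbol, precedence, assoc, func):
    """Register a binary operator without changing the lexer or parser."""
    OPERATORS[symbol] = (precedence, assoc, func)


def tokenize(text):
    """Split text into integer literals and known operator symbols."""
    # Longest symbols first, so "..." is never split into shorter operators.
    symbols = sorted(OPERATORS, key=len, reverse=True)
    token = re.compile(r"\s*(\d+|" + "|".join(map(re.escape, symbols)) + ")")
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = token.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate an expression by precedence climbing over the table."""
    tokens = tokenize(text)
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = int(tokens[pos])  # operands are plain integers in this sketch
        pos += 1
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            prec, assoc, func = OPERATORS[tokens[pos]]
            if prec < min_prec:
                break
            pos += 1
            # Left-associative operators must not recurse into themselves.
            right = parse(prec + 1 if assoc == "left" else prec)
            left = func(left, right)
        return left

    return parse(0)


# A user-defined range operator, loosely mirroring the Haskell example.
define_operator("...", 5, "left",
                lambda a, b: list(range(a, b + 1)) if a <= b else [])
```

With this table, `evaluate("10 - 4 - 3")` groups to the left and `evaluate("2 ^ 3 ^ 2")` groups to the right, which is exactly the information that is missing when no associativity is defined.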
« Last Edit: October 30, 2024, 11:51:49 pm by Warfley »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8032
Re: Identifiers character set
« Reply #22 on: October 31, 2024, 08:40:31 am »
Well, considering Julia etc. the big question remains: given an arbitrary character set, how do we distinguish between identifiers (which currently start with a letter or underscore), numbers (which currently start with a digit or radix indicator) and operators (which are at present constrained to a small fixed range, but potentially need not be)?
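
One possible way to draw those boundaries is to decide by Unicode character properties rather than by fixed ASCII ranges. The Python sketch below is only an illustration under several assumptions: it leans on Python's own `str.isidentifier` (which follows the XID_Start/XID_Continue identifier rules) for the identifier test, treats `$`, `%` and `&` as Pascal-style radix prefixes, and treats general category "Sm" (math symbol) plus a few ASCII characters as operator starts. The function name classify_start is hypothetical.

```python
import unicodedata

# Assumed Pascal-style radix prefixes ($ hex, % binary, & octal).
RADIX_PREFIXES = set("$%&")
# ASCII operator characters whose Unicode category is not "Sm".
ASCII_OPERATORS = set("-*/")


def classify_start(ch):
    """Guess which token class a single character could start."""
    # A one-character string is an identifier only if the character
    # is allowed at the start of an identifier (letter-like or "_").
    if ch.isidentifier():
        return "identifier"
    # "Nd" is the general category for decimal digits in any script.
    if ch in RADIX_PREFIXES or unicodedata.category(ch) == "Nd":
        return "number"
    # "Sm" covers +, <, =, > as well as symbols such as ∈ and ≤.
    if ch in ASCII_OPERATORS or unicodedata.category(ch) == "Sm":
        return "operator"
    return "other"
```

The point of the sketch is that the three classes stay disjoint as long as each is tied to a different character property, so the lexer can still decide on the first character alone.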

MarkMLl

Warfley

  • Hero Member
  • *****
  • Posts: 1761
Re: Identifiers character set
« Reply #23 on: October 31, 2024, 09:19:54 am »
The Unicode consortium actually does the job for us, in that they publish documents describing all that stuff. I think the specification for identifiers was already linked in this thread

So when there is a new version of the Unicode standard, the Unicode consortium will also update these specifications and those changes can then be carried over into the software that uses them.
There are so many other issues with Unicode that no single person can be tasked with keeping track of them all. Normalization is one example: the letter ä can be represented either as a single precomposed character or as an a followed by a combining diaeresis. The Unicode consortium has a spec on normalization that defines four normalization forms, so that different representations of the same character can be compared.
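
As a small illustration of why this matters for identifiers, here is a Python sketch using the standard `unicodedata` module, which exposes the normalization forms NFC, NFD, NFKC and NFKD. The helper same_identifier is a made-up name, and which form a compiler should pick is left open here.

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS


def same_identifier(a, b, form="NFC"):
    """Compare two spellings after bringing both into the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ even though they render identically.
raw_equal = precomposed == decomposed
normalized_equal = same_identifier(precomposed, decomposed)
```

Here `raw_equal` is False while `normalized_equal` is True, so a compiler that compares identifiers without normalizing would treat two visually identical names as different symbols.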
« Last Edit: October 31, 2024, 12:26:42 pm by Warfley »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8032
Re: Identifiers character set
« Reply #24 on: October 31, 2024, 10:26:31 am »
Hence also DUCET, which I mentioned earlier, since AIUI this is also relevant to matched bracketing characters etc.

However "There is an accepted standard" and "The Pascal community agrees to comply" are several leagues apart.

MarkMLl

gidesa

  • Full Member
  • ***
  • Posts: 145
Re: Identifiers character set
« Reply #25 on: October 31, 2024, 12:23:51 pm »
Quote from: LV
Let's examine the use of mathematical symbols in the Julia programming language (no ads) as an example.

Exactly, that can be useful, although it needs a convenient keyboard remapping. What editor is it?
But for symbols such as "§", "£", "°" there are physical keys on many standard keyboards, so they are
easy to use.

LV

  • Full Member
  • ***
  • Posts: 153
Re: Identifiers character set
« Reply #26 on: October 31, 2024, 12:33:13 pm »
Quote from: gidesa
Exactly, that can be useful, although it needs a convenient keyboard remapping. What editor is it?
But for symbols such as "§", "£", "°" there are physical keys on many standard keyboards, so they are
easy to use.

This is Visual Studio Code. By the way, Unicode characters can be easily inserted in Lazarus: Edit->Insert from Character Map->Range :)
« Last Edit: October 31, 2024, 12:36:24 pm by LV »

Warfley

  • Hero Member
  • *****
  • Posts: 1761
Re: Identifiers character set
« Reply #27 on: October 31, 2024, 12:38:57 pm »
Quote from: gidesa
Exactly, that can be useful, although it needs a convenient keyboard remapping. What editor is it?
But for symbols such as "§", "£", "°" there are physical keys on many standard keyboards, so they are
easy to use.

An alternative is something I have seen in Isabelle (a very interesting system): they use mathematical symbols, but in the actual sources these are represented as ASCII escape sequences. For example, the editor shows the Unicode symbol "∃" but the real text content is "\<exists>", and there are macros for writing them (e.g. <=> is automatically converted to the equivalence symbol), or you can use autocomplete on the \ commands.
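
A minimal Python sketch of that display/storage split, under stated assumptions: the symbol table below is a tiny hand-picked subset, and the functions render and encode are hypothetical names, not part of Isabelle or any editor. The file on disk stays pure ASCII and only the view shows the Unicode symbols.

```python
import re

# Small illustrative table; a real system would ship a much larger one.
SYMBOLS = {
    "exists": "\u2203",  # ∃
    "forall": "\u2200",  # ∀
    "in": "\u2208",      # ∈
    "and": "\u2227",     # ∧
    "or": "\u2228",      # ∨
}
INVERSE = {symbol: name for name, symbol in SYMBOLS.items()}

# Matches escapes of the form \<name>.
ESCAPE = re.compile(r"\\<([A-Za-z]+)>")


def render(source):
    """Turn ASCII escapes into symbols for display; unknown escapes stay as-is."""
    return ESCAPE.sub(lambda m: SYMBOLS.get(m.group(1), m.group(0)), source)


def encode(display):
    """Turn displayed symbols back into ASCII escapes for storage."""
    return "".join(f"\\<{INVERSE[ch]}>" if ch in INVERSE else ch
                   for ch in display)
```

Because encode is the inverse of render for every symbol in the table, the compiler never has to see anything but ASCII, which sidesteps the keyboard and FreeDOS concerns raised earlier in the thread.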

That said, for fun I once defined a bunch of mathematical operators, like ∈ for testing the contents of a set, and while it makes code very nice to look at, it's really annoying to type.
Which is why most custom operators in Haskell are usually written in ASCII.

What I think would be even more interesting, and would not require any change to FPC, would be ligature support for Lazarus. You can take a look at Fira Code, a ligature-based coding font with rules that, for example, turn the two characters => into a single arrow symbol, or make <= and >= appear as ≤ and ≥.
It also has some other rules, like centering colons or applying different spacing depending on which symbols are used.

It's really nice, and in editors that support it, it's now my default font. For anyone interested, I recommend just downloading the font and using it in an editor that supports ligatures, like VSCode or Notepad++ (you need to enable them in the options first, see the Fira Code wiki), and then opening your source files with it.
At least for me it just makes it a joy to look at the code and it works well with nearly any language.

LV

  • Full Member
  • ***
  • Posts: 153
Re: Identifiers character set
« Reply #28 on: October 31, 2024, 12:53:14 pm »
Thank you for the very useful information.

gidesa

  • Full Member
  • ***
  • Posts: 145
Re: Identifiers character set
« Reply #29 on: October 31, 2024, 12:55:23 pm »

Quote from: Warfley
What I think would be even more interesting, and would not require any change to FPC, would be ligature support for Lazarus. You can take a look at Fira Code, a ligature-based coding font with rules that, for example, turn the two characters => into a single arrow symbol, or make <= and >= appear as ≤ and ≥.
It also has some other rules, like centering colons or applying different spacing depending on which symbols are used.

It's really nice, and in editors that support it, it's now my default font. For anyone interested, I recommend just downloading the font and using it in an editor that supports ligatures, like VSCode or Notepad++ (you need to enable them in the options first, see the Fira Code wiki), and then opening your source files with it.
At least for me it just makes it a joy to look at the code and it works well with nearly any language.

Thanks, very interesting.
I usually use gVim as my editor outside the IDE. It has an extended character input method called "digraphs": you type Ctrl-K and then a two-character code, such as S* for Greek sigma.
But all those systems, like the ones in the Lazarus or Delphi IDE, are annoying, as you say.
I remember that there are also programs to remap physical keys to arbitrary characters. I'll have to look up their names.

 
