
Author Topic: Pascal Security  (Read 20274 times)

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Pascal Security
« Reply #30 on: November 11, 2021, 10:37:20 am »
Hi!

The feature of Unicode identifiers is again a mistake.

It will prohibit the international exchange of source code.

I am not able to translate identifiers - as shown in the link - from Persian, Malayalam or Sinhala to ASCII.

Instead of simplifying things, it seems to make them worse.

Winni


PierceNg

  • Sr. Member
  • ****
  • Posts: 435
    • SamadhiWeb
Re: Pascal Security
« Reply #31 on: November 11, 2021, 01:24:25 pm »
Sure, if you enable internationalization (i18n). But if you develop an application as a local, only for other locals, i18n just adds work and debugging overhead. Personally, I write most of my applications in English and don't care about i18n, because it is simply a lot of extra work that I don't think is justified for my one-man projects.
And even without strings, there are still comments. I regularly see non-English comments; in one Korean project, all the comments were in Korean. Generally speaking, people just like to use their native language.

SQL too.

Code: SQL
  % sqlite3 /tmp/ml.db
  SQLite version 3.27.2 2019-02-25 16:06:06
  Enter ".help" for usage hints.
  sqlite> .header on
  sqlite> .schema
  CREATE TABLE 表一 (键一 INTEGER PRIMARY KEY, 列二 text DEFAULT '中文');
  sqlite> SELECT * FROM 表一;
  键一|列二
  1|中文
  2|值二
  sqlite>

In Japan, I watched some Open University classes on TV where they taught PostgreSQL, and similar to my Chinese example above, table names, column names etc were all in Japanese.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Pascal Security
« Reply #32 on: November 11, 2021, 01:35:35 pm »
In Japan, I watched some Open University classes on TV where they taught PostgreSQL, and similar to my Chinese example above, table names, column names etc were all in Japanese.

...where I don't think that (modern) Chinese or Japanese have directionality issues.

Also there's a distinction to be made between keywords, table/column identifiers and identifiers which identify something executable... which gets "interesting" in the case of e.g. Pascal where a variable (which could fairly be expected to be named using a localised language) points to a function/procedure (which might arguably be subject to more restrictive naming, for reasons discussed at the start of the thread).

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

PascalDragon

  • Hero Member
  • *****
  • Posts: 6396
  • Compiler Developer
Re: Pascal Security
« Reply #33 on: November 11, 2021, 01:44:41 pm »
It will prohibit the international exchange of source code.

I am not able to translate identifiers - as shown in the link - from Persian, Malayalam or Sinhala to ASCII.

Not all source code needs to be exchanged internationally, however. A small company in Malaysia developing local software might prefer to use native identifiers, because a) the (native) developers can maybe express themselves better in their native language and b) the terms of the problem domain can be used (e.g. if local terms for some processes need to be used that would be hard to translate).

Also people like to program in their native language. There are many programs out there that are written with German identifiers (I just saw one in one of the other threads). People that use the Latin alphabet can do that, but everyone else is precluded from that with FPC. So simply for this inclusivity FPC should support this as well, it's only fair.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12896
  • FPC developer.
Re: Pascal Security
« Reply #34 on: November 11, 2021, 02:06:47 pm »
Hi!

The feature of Unicode identifiers is again a mistake.

It will prohibit the international exchange of source code.

I am not able to translate identifiers - as shown in the link - from Persian, Malayalam or Sinhala to ASCII.

And you will be able to translate those identifiers if they are somehow superimposed onto the Latin alphabet? Keep in mind that limiting identifiers to ASCII doesn't mean they will be English; languages like Turkish and Malay/Indonesian use the Latin alphabet too.

In short, I think Unicode identifiers are inevitable, and as PascalDragon says, Delphi changed a decade ago. But it is a major undertaking, and half solutions are not really desirable. Either do it right, or wait.


kupferstecher

  • Hero Member
  • *****
  • Posts: 618
Re: Pascal Security
« Reply #35 on: November 11, 2021, 02:22:00 pm »
...where I don't think that (modern) Chinese or Japanese have directionality issues.
You mean right-to-left writing? There are rare occasions for that in Chinese: see the attached picture and compare the characters on the left and right side of the aircraft. Both are written from front to back of the aircraft instead of left to right.

(Picture source:
https://aircraftphotos.top/2020/07/06/%E4%B8%AD%E5%9B%BD%E4%B8%9C%E6%96%B9%E8%88%AA%E7%A9%BA%E4%BA%91%E5%8D%97%E5%85%AC%E5%8F%B8-b737-700-b-5826/
https://aircraftphotos.top/2020/08/03/%e4%b8%9c%e6%96%b9%e8%88%aa%e7%a9%ba737-700-b-5276-%e5%ad%94%e9%9b%80%e6%b6%82%e8%a3%85/)

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Pascal Security
« Reply #36 on: November 11, 2021, 02:58:45 pm »
...where I don't think that (modern) Chinese or Japanese have directionality issues.
You mean right-to-left writing? There are rare occasions for that in Chinese: see the attached picture and compare the characters on the left and right side of the aircraft. Both are written from front to back of the aircraft instead of left to right.

Yes, but that would imply direction-changing escapes at the start of a literal or (perhaps) an identifier, not embedded in it and definitely not with multiple embedded escapes.

There's already a rule that identifiers have to start in a certain way that differs from what might follow (i.e. the acceptability of digits) and that could potentially be extended to when escapes were permitted.

Should two identifiers which comprised the same character sequence but rendered in opposing directions be equivalent or distinct? Should a reserved word rendered in the unconventional direction be considered equivalent?

MarkMLl

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: Pascal Security
« Reply #37 on: November 11, 2021, 05:06:20 pm »
Hi!

In the early Delphi days I was happy to find some solutions in a Russian Delphi forum. I did not understand a word of the communication, but the code was (mostly) written in some kind of "Pascal English".

If the code had been in Cyrillic, it wouldn't have helped me.

Winni

kupferstecher

  • Hero Member
  • *****
  • Posts: 618
Re: Pascal Security
« Reply #38 on: November 11, 2021, 05:45:14 pm »
Should two identifiers which comprised the same character sequence but rendered in opposing directions be equivalent or distinct?
Good question. One might search for a term (in the source code) and expect it to be typed as seen, or one might search for a term without even knowing that it is present in reversed form. In the second case the equivalent representation could be helpful. Anyway, I don't think that the reversed direction is required at source level. I just wanted to show an example of an unexpected right-to-left occurrence.

Warfley

  • Hero Member
  • *****
  • Posts: 2066
Re: Pascal Security
« Reply #39 on: November 11, 2021, 11:12:59 pm »
Yes, but that would imply direction-changing escapes at the start of a literal or (perhaps) an identifier, not embedded in it and definitely not with multiple embedded escapes.

There's already a rule that identifiers have to start in a certain way that differs from what might follow (i.e. the acceptability of digits) and that could potentially be extended to when escapes were permitted.
Well, but the normal syntax constructs are left to right and Arabic text is right to left, so an Arabic identifier must always start with a direction switch; similarly, it always has to end with one to start the parameter list. Also, it is common in such languages to embed English words in the Latin alphabet in their sentences. And identifiers are usually not words but phrases: a function like "PerformRESTRequest" could be written as the Arabic phrase for "perform request" plus the Latin word REST, so in such a case you need at least 3 direction changes, possibly 4.

Honestly, if languages like Arabic are to be supported, I see no way around allowing arbitrary direction changes.

Should two identifiers which comprised the same character sequence but rendered in opposing directions be equivalent or distinct? Should a reserved word rendered in the unconventional direction be considered equivalent?

MarkMLl
Well, pretty simple: an identifier is referenced by the characters it is made of, not by how the characters are displayed. Comparing renderings would be completely impractical: you would have to render every identifier and compare the bitmaps or something similar. And then the question is, which rendering engine should be used for this?
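To make "compare the characters, not the rendering" concrete, here is a minimal Python sketch. It is only an illustration, not how FPC does or would do it: the helper name `same_identifier` is made up, and applying NFC normalization before comparing is an assumption about one possible policy (a compiler could also compare the raw code points).

```python
import unicodedata


def same_identifier(a, b):
    """Compare two identifiers by their characters, never by their rendering.

    Assumption: both names are normalized to NFC first, so that a precomposed
    letter and its "base letter + combining mark" spelling count as equal.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


latin = "data"                  # all Latin letters
lookalike = "d\u0430t\u0430"    # U+0430 is CYRILLIC SMALL LETTER A

# The two names look alike in most fonts but are different character sequences.
print(same_identifier(latin, lookalike))            # False

# Same letter, two encodings: U+00E9 versus "e" followed by U+0301.
print(same_identifier("caf\u00e9", "cafe\u0301"))   # True
```

No rendering engine is involved at any point; the decision depends only on the code points in the source file.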

The thing is, yes, Unicode is complicated and no one likes internationalization. But it's the only viable solution if you don't want to exclude over half the planet.

Hi!

In the early Delphi days I was happy to find some solutions in a Russian Delphi forum. I did not understand a word of the communication, but the code was (mostly) written in some kind of "Pascal English".

If the code had been in Cyrillic, it wouldn't have helped me.

Winni
Forcing people to use a language you understand helps you to understand their code... What a revelation.

But you know, this goes the other way around: while it makes it easier for you to understand the code, for the Russians the simple fact of learning Pascal then requires them to learn English, which makes it much harder than if they could use Russian and look at other Russian code.
I wrote my first programs (Delphi) at the age of 7-8, when I did not understand a word of English. I really started learning programming at the age of 12/13, when I knew a little bit of English but wasn't that confident with it.
If I couldn't have used German identifier names, I would simply not have learned Pascal.

Restricting identifiers to the English alphabet will simply result in people from other countries not using the language, but rather languages that support their alphabet. And having fewer Pascal programmers overall won't help you find more code online either.
« Last Edit: November 11, 2021, 11:19:23 pm by Warfley »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Pascal Security
« Reply #40 on: November 12, 2021, 08:48:06 am »
Well, but the normal syntax constructs are left to right, Arabic text is right to left, so if you have an Arabic identifier it must always start with a direction switch,

Yes, START WITH is the point I'm trying to emphasise. That doesn't mean that such things remain legal after the initial direction indicator.

Quote
Well pretty simple, an identifier is referenced by the characters it is made of, not how the characters are displayed.

I'm unhappy with all possibilities. It is not desirable for somebody to be able to define <right-to-left>iP as 3.0 locally.

MarkMLl

kupferstecher

  • Hero Member
  • *****
  • Posts: 618
Re: Pascal Security
« Reply #41 on: November 12, 2021, 11:07:14 am »
Perhaps some Arabic speaker could shed light on what they would see as a useful way of using right-to-left in source code. I can't really imagine right-to-left identifiers being readable in source code that is generally left-to-right.

PascalDragon

  • Hero Member
  • *****
  • Posts: 6396
  • Compiler Developer
Re: Pascal Security
« Reply #42 on: November 12, 2021, 03:37:45 pm »
Yes, but that would imply direction-changing escapes at the start of a literal or (perhaps) an identifier, not embedded in it and definitely not with multiple embedded escapes.

There's already a rule that identifiers have to start in a certain way that differs from what might follow (i.e. the acceptability of digits) and that could potentially be extended to when escapes were permitted.

Should two identifiers which comprised the same character sequence but rendered in opposing directions be equivalent or distinct? Should a reserved word rendered in the unconventional direction be considered equivalent?

Please take a look at the Unicode Identifier and Pattern Syntax (UAX #31), which is what we'd follow.
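As a rough illustration of what such rules accept and reject, here is a small Python sketch. It uses Python's own `str.isidentifier()`, whose identifier definition is based on the XID_Start/XID_Continue properties from that document; this shows Python's behaviour only and is not a statement about what FPC will implement. The sample names are made up for the example.

```python
# Python's identifier rule is derived from the Unicode identifier syntax
# (XID_Start for the first character, XID_Continue for the rest).
# Assumption: this is only a stand-in for whatever profile FPC would pick.
samples = {
    "Latin with umlaut": "Gr\u00f6\u00dfe",
    "Chinese": "\u8868\u4e00",
    "Arabic + Latin mix": "\u0637\u0644\u0628REST",
    "starts with a digit": "1abc",
    "leading RIGHT-TO-LEFT OVERRIDE": "\u202eiP",
    "embedded RIGHT-TO-LEFT OVERRIDE": "P\u202ei",
}

for label, name in samples.items():
    # ascii() keeps the output readable on terminals without Unicode fonts.
    print(f"{label:32} {ascii(name):28} {name.isidentifier()}")
```

The direction-override control character is a format character, not a letter, so a name containing it is not an identifier under these rules; a visually reversed "iP" therefore cannot be declared this way in Python.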

In the early Delphi days I was happy to find some solutions in a Russian Delphi forum. I did not understand a word of the communication, but the code was (mostly) written in some kind of "Pascal English".

If the code had been in Cyrillic, it wouldn't have helped me.

And if the code had been in Latinized Russian, it wouldn't have helped you either. In the same way, a Pascal program written with German identifiers won't help an Arabic or Chinese speaker (unless they are able to understand German).

Warfley

  • Hero Member
  • *****
  • Posts: 2066
Re: Pascal Security
« Reply #43 on: November 12, 2021, 04:05:14 pm »
Yes, START WITH is the point I'm trying to emphasise. That doesn't mean that such things remain legal after the initial direction indicator.

But that is not quite right either. As I said, names in the Latin character set are kept in the Latin character set. So consider these three very normal function identifier names (I use English terms because I don't speak Arabic, but it explains the concept):
PerformRESTRequestQueryRESTAPI, RESTQueryGET, RESTRequest
One or more names (highlighted in red in the original post) can occur at any place. They can be at the start, so the identifier would not start with a right-to-left char. They can be at the end, so an identifier might not end with a left-to-right char. They can be enclosed by natural language or can encompass it.
Basically, a left-to-right and a right-to-left switch can occur at any place. The only rule is that if a right-to-left one occurs, there must always be a left-to-right one at some point afterwards. But that's pretty much it.
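A small Python sketch of how the direction changes inside such a mixed name can be counted. It relies on `unicodedata.bidirectional()`, which reports each character's Unicode bidi class, so the direction here comes from the characters themselves rather than from explicit control characters. The helper names and the Arabic placeholder letters are made up for the example, and this is a simplification, not the full Unicode bidirectional algorithm.

```python
import unicodedata
from itertools import groupby


def direction(ch):
    """Classify one character by its Unicode bidi class (simplified)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):   # Hebrew-style and Arabic-style letters
        return "RTL"
    if bidi == "L":           # Latin, Cyrillic, CJK, ...
        return "LTR"
    return "other"            # digits, punctuation, marks, controls


def direction_runs(identifier):
    """Split an identifier into maximal runs of equal direction."""
    return [(d, "".join(chars)) for d, chars in groupby(identifier, key=direction)]


# Hypothetical name: Arabic letters, then the Latin acronym REST, then Arabic again.
mixed = "\u0637\u0644\u0628REST\u0637\u0644\u0628"

for run_direction, text in direction_runs(mixed):
    print(run_direction, ascii(text))
```

For the hypothetical name above this yields three runs (RTL, LTR, RTL), and a purely Latin name such as `RESTRequest` yields a single LTR run.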

I'm unhappy with all possibilities. It is not desirable for somebody to be able to define <right-to-left>iP as 3.0 locally.

MarkMLl
Sure, internationalization is not great, but it's the best possible solution. And your example isn't actually that bad. Sure, it will look weird, but all the code support (like Ctrl+click/go to definition) will not consider it whenever you write pi.

I mean, we aren't exactly inventing something new here. Pretty much all of the most popular languages (C++, Java, Python, C#, etc.) have supported Unicode identifiers for years, in some cases for more than a decade, and it works fine. In the vast majority of cases it will change absolutely nothing for most of the programmers here.
I've been using C++ and Java for a long time now, and I must say, I never had any problems related to Unicode in the code.
« Last Edit: November 12, 2021, 04:12:46 pm by Warfley »

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1315
Re: Pascal Security
« Reply #44 on: November 12, 2021, 04:27:48 pm »
At some point, Microsoft decided it would be a good idea to translate their programming languages. The result is mostly unreadable, even for native speakers. You would have to redesign the whole language and syntax; simply translating the words doesn't work.

And a hodgepodge of English programming syntax and native identifiers isn't better. Even more so: you should never put the content of strings directly in the code, especially if you want it to work in different languages.

Ergo: keep the programming language, syntax and identifiers in English. That automatically prevents characters that look the same but aren't, as it isn't easy to tell those apart in Unicode. In other words: you might think you're typing the exact same word, but you aren't. And the only way to figure out how a certain glyph is encoded is with a hex editor or by copy-pasting it into a program that tells you.
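For what it's worth, such a "program that tells you" can be a few lines of Python. This is only a sketch: the function name `describe` and the sample word are made up for the example.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Looks like "pascal" in most fonts, but the second letter is Cyrillic.
suspicious = "p\u0430scal"

for code_point, name in describe(suspicious):
    print(code_point, name)
```

Pasting a copied identifier into the `suspicious` string shows at a glance whether it mixes scripts or hides a control character.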

 
