Author Topic: Pascal Security  (Read 20271 times)

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 379
Pascal Security
« on: November 01, 2021, 08:48:11 pm »

AlexTP

  • Hero Member
  • *****
  • Posts: 2713
    • UVviewsoft
Re: Pascal Security
« Reply #1 on: November 01, 2021, 08:57:44 pm »
> uses Unicode directionality override characters

I read about this five years ago on the Russian site Habr.com, and added special handling of these characters to CudaText: it displays them specially.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Pascal Security
« Reply #2 on: November 01, 2021, 09:39:56 pm »
https://www.schneier.com/blog/archives/2021/11/hiding-vulnerabilities-in-source-code.html - is FPC across these issues?

Since you've not summarised that: some languages accept Unicode for function names and for things that might be used in the context of a reserved word. Does FPC?

https://www.freepascal.org/docs-html/current/ref/refse4.html#x15-140001.4 provides no useful information other than that the first character of a function (etc.) name must be an underscore or a letter.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

AlexTP

  • Hero Member
  • *****
  • Posts: 2713
    • UVviewsoft
Re: Pascal Security
« Reply #3 on: November 01, 2021, 10:00:20 pm »

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Pascal Security
« Reply #4 on: November 01, 2021, 10:04:24 pm »
> https://www.freepascal.org/docs-html/current/ref/refse4.html#x15-140001.4 provides no useful information other than that the first character of a function (etc.) name must be an underscore or a letter.

That page makes clear that only letters, digits and _ are valid characters in identifiers and reserved words; digits may not be the first character and & is accepted only as an initial identifier character. In other words FPC accepts only a fairly restricted ANSI 7-bit alphabet for identifiers (and likewise for operators and separators).

Recent Delphis, on the other hand, accept other characters in identifiers, I believe.
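
The identifier rule described above can be illustrated with a minimal sketch. `IsPlainIdentifier` is a hypothetical helper written for this example, not the compiler's own scanner, and it only mirrors the rule as stated in this post: letters, digits and `_`, no leading digit, and `&` accepted only as the first character.

```pascal
program IdentCheck;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of FPC): True if s follows the rule described above.
function IsPlainIdentifier(const s: AnsiString): Boolean;
var
  i, first: Integer;
begin
  Result := False;
  first := 1;
  if (Length(s) > 0) and (s[1] = '&') then
    first := 2;                       // '&' may only lead the identifier
  if Length(s) < first then
    Exit;                             // empty string, or '&' on its own
  if not (s[first] in ['A'..'Z', 'a'..'z', '_']) then
    Exit;                             // must start with a letter or underscore
  for i := first + 1 to Length(s) do
    if not (s[i] in ['A'..'Z', 'a'..'z', '0'..'9', '_']) then
      Exit;                           // any other byte, including UTF-8 bytes, fails
  Result := True;
end;

begin
  WriteLn(IsPlainIdentifier('my_var1'));
  WriteLn(IsPlainIdentifier('1abc'));
  // 'caf' followed by the two UTF-8 bytes of U+00E9
  WriteLn(IsPlainIdentifier('caf' + #$C3#$A9));
end.
```

Because every byte of a multi-byte UTF-8 sequence is above 127, a check of this shape rejects any non-ASCII character, invisible ones included.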
« Last Edit: November 01, 2021, 10:06:14 pm by howardpc »

dbannon

  • Hero Member
  • *****
  • Posts: 3825
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Pascal Security
« Reply #5 on: November 02, 2021, 12:18:05 am »
Easily tested.

On the other hand, testing against ALL UTF-8 characters might be a bit harder ....

EDIT: No, it's not about 'funny' characters in identifier names:
Quote
By injecting Unicode Bidi override characters into comments and strings, an adversary can produce syntactically-valid source code in most modern languages for which the display order of characters presents logic that diverges from the real logic.

That could be a bit more interesting .....
« Last Edit: November 02, 2021, 12:51:46 am by dbannon »
Lazarus 4, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 379
Re: Pascal Security
« Reply #6 on: November 02, 2021, 01:27:39 am »
Since this affects any parsed text at all, it's not just compilers. Does anyone have a routine at hand to say whether a Unicode code point is one of the offending characters?

dbannon

  • Hero Member
  • *****
  • Posts: 3825
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Pascal Security
« Reply #7 on: November 02, 2021, 05:14:48 am »
> Since this affects any parsed text at all, it's not just compilers. Does anyone have a routine at hand to say whether a Unicode code point is one of the offending characters?

Or, perhaps, one that can tell whether the "offending characters" are being used legitimately or not?

Davo

AlexTP

  • Hero Member
  • *****
  • Posts: 2713
    • UVviewsoft
Re: Pascal Security
« Reply #8 on: November 02, 2021, 06:55:52 am »
> Does anyone have a routine at hand to say whether a Unicode code point is one of the offending characters?

A simple case block over these code points:

(U+202A, U+202B, U+202C, U+202D, U+202E, U+2066, U+2067, U+2068, U+2069, U+061C, U+200E and U+200F)
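
A case block over the code points listed above might look like the following minimal sketch. `IsBidiControl` is a hypothetical name chosen for this example, and the caller is assumed to have already decoded the text into code points.

```pascal
program BidiCheck;
{$mode objfpc}{$H+}

// Hypothetical helper: True if cp is one of the twelve code points listed above.
function IsBidiControl(cp: Cardinal): Boolean;
begin
  case cp of
    $061C,            // ALM
    $200E, $200F,     // LRM, RLM
    $202A..$202E,     // LRE, RLE, PDF, LRO, RLO
    $2066..$2069:     // LRI, RLI, FSI, PDI
      Result := True;
  else
    Result := False;
  end;
end;

begin
  WriteLn(IsBidiControl($202E));     // RLO is in the list
  WriteLn(IsBidiControl(Ord('A')));  // a plain letter is not
end.
```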
« Last Edit: November 02, 2021, 07:06:10 am by Alextp »

dbannon

  • Hero Member
  • *****
  • Posts: 3825
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Pascal Security
« Reply #9 on: November 02, 2021, 07:19:50 am »
The paper highlights two (main) exploits -

Invisible characters in identifiers - FPC rejects them (I think).

Bidi override characters used in comments (or string literals?) to terminate the comment early, leaving content that appears to be a comment (or a string literal) as compilable code.  Unless we decide never to allow such (Unicode) characters anywhere in a source file, not much can be done.

But the paper does mention programmers' editors making these characters visible. Maybe (as Alextp is clearly thinking) just give a reviewer a 'heads up' whenever a file loaded into an editor contains one of the risky characters?  This does not stop the problem but provides an easy audit tool.

Code: Pascal
showmessage('warning line 34, pos 52 contains U+202B Right-to-Left Embedding character');

I guess it will annoy hell out of a programmer who does have a legitimate reason for such content, even more than the "inline is not inlined" thingo, but it could be turned off when developing your own code and only turned on when reviewing someone else's?  An editor option, not a project one.

I think it would be a nice feather in Lazarus's hat if it included such an audit tool. And maybe FPC could issue a warning if it sees such a character too?

As Alextp noted, the characters themselves would be easy enough to search for, each individually. It would use a bit of time, but that would be OK at load time only, I guess.
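
A load-time scan of the kind described above could be sketched as follows. `AuditSource` and `IsBidiControl` are hypothetical names, the text is assumed to be UTF-8, and positions are counted in UTF-16 code units after decoding (enough to locate the twelve characters, which all lie in the Basic Multilingual Plane). A real editor would report through its own UI rather than WriteLn.

```pascal
program BidiAudit;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: True for the twelve Bidi control characters named in this thread.
function IsBidiControl(c: WideChar): Boolean;
begin
  case Ord(c) of
    $061C, $200E, $200F, $202A..$202E, $2066..$2069:
      Result := True;
  else
    Result := False;
  end;
end;

// Prints one warning per Bidi control character and returns how many were found.
function AuditSource(const utf8Text: RawByteString): Integer;
var
  w: UnicodeString;
  i, line, col: Integer;
begin
  Result := 0;
  w := UTF8Decode(utf8Text);
  line := 1;
  col := 0;
  for i := 1 to Length(w) do
    if w[i] = #10 then
    begin
      Inc(line);     // a line feed starts a new line
      col := 0;
    end
    else
    begin
      Inc(col);
      if IsBidiControl(w[i]) then
      begin
        Inc(Result);
        WriteLn('warning: line ', line, ', pos ', col,
                ' contains U+', IntToHex(Ord(w[i]), 4));
      end;
    end;
end;

begin
  // #$E2#$80#$AE is the UTF-8 encoding of U+202E (RLO).
  AuditSource('begin' + #10 + '  x := 1; // ' + #$E2#$80#$AE + 'comment' + #10 + 'end.');
end.
```

Returning the count as well as printing lets a caller decide whether to show anything at all, which fits the idea of an optional editor setting.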

Davo
« Last Edit: November 02, 2021, 07:23:09 am by dbannon »

AlexTP

  • Hero Member
  • *****
  • Posts: 2713
    • UVviewsoft
Re: Pascal Security
« Reply #10 on: November 02, 2021, 07:53:16 am »
Audit tool, another dialog in the IDE? Better not, just add to SynEdit the special highlight of those chars, that's it. (Optional, or not, I don't care)

Grahame Grieve

  • Sr. Member
  • ****
  • Posts: 379
Re: Pascal Security
« Reply #11 on: November 02, 2021, 08:29:46 am »
Well, here's some source material to help things along. The idea is that you either ban bidi characters altogether (using hasUnicodeBiDiChars) or at least ensure that whitespace, strings and comments are well-formed Unicode (using checkUnicodeWellFormed). I think it's solid, though I'll post updates as I hear about issues (I have a Java port of the same code):

Code: Pascal
unit UnicodeUtilities;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Contnrs;

type
  // not declared in the original post; the array constants below need it
  TUCharArray = array of UnicodeChar;

const
  LRE = #$202a;
  RLE = #$202b;
  PDF = #$202c;
  LRO = #$202d;
  RLO = #$202e;
  LRI = #$2066;
  RLI = #$2067;
  FSI = #$2068;
  PDI = #$2069;
  LRM = #$200E;
  RLM = #$200F;
  ALM = #$061C;
  PARA = #10;

  ALL_BIDI_CHARS : TUCharArray = (LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI, LRM, RLM, ALM, PARA);
  CONTROL_CHARS_1 : TUCharArray = (LRE, RLE, LRO, RLO, LRM, RLM, ALM);
  CONTROL_CHARS_2 : TUCharArray = (LRI, RLI, FSI);

type

  { TUnicodeUtilities }

  TUnicodeUtilities = class (TObject)
  public
    class function hasUnicodeBiDiChars(src : String) : boolean;
    class function describe(c : UnicodeChar) : String;
    class function replaceBiDiChars(src : String) : String;

    // returns '' if src is well formed, or a description of a structure problem with bi-directional characters
    class function checkUnicodeWellFormed(src : String) : String;
  private
    FList : TObjectList;
    constructor Create;
    function checkWellFormed(src : String) : String;
    procedure popJustOne(chars : TUCharArray);
    procedure popOneAndOthers(chars, others : TUCharArray);
    function summary : String;
  public
    destructor Destroy; override;
  end;


implementation

uses
  LazUTF8; // UTF8CodepointToUnicode (Lazarus LazUtils package)

type

  { TStateStack }

  // one entry on the stack of open bidi controls: the character and where it was seen
  TStateStack = class
  private
    c : UnicodeChar;
    i : integer;
    constructor create(aC: UnicodeChar; aI : integer);
  end;

// decodes a UTF-8 string into an array of code points
// (code points above U+FFFF are truncated by the UnicodeChar cast)
function unicodeChars(s : String) : TUCharArray;
var
  i, l, cl : integer;
  ch : UnicodeChar;
  p: PChar;
begin
  l := length(s);
  SetLength(result, l); // maximum possible length
  i := 0;
  p := PChar(s); // safe for an empty string, unlike @s[1]
  while l > 0 do
  begin
    ch := UnicodeChar(UTF8CodepointToUnicode(p, cl));
    result[i] := ch;
    inc(i);
    dec(l, cl);
    inc(p, cl);
  end;
  SetLength(result, i);
end;


function InSet(c : UnicodeChar; arr : TUCharArray) : boolean;
var
  t : UnicodeChar;
begin
  result := false;
  for t in arr do
    if t = c then
      exit(true);
end;


{ TStateStack }

constructor TStateStack.create(aC: UnicodeChar; aI: integer);
begin
  inherited Create;
  c := aC;
  i := aI;
end;


{ TUnicodeUtilities }

class function TUnicodeUtilities.hasUnicodeBiDiChars(src: String): boolean;
var
  c : UnicodeChar;
begin
  result := false;
  for c in unicodeChars(src) do
    if inSet(c, ALL_BIDI_CHARS) then
      exit(true);
end;

class function TUnicodeUtilities.describe(c: UnicodeChar): String;
begin
  case c of
    LRE : result := 'LRE';
    RLE : result := 'RLE';
    PDF : result := 'PDF';
    LRO : result := 'LRO';
    RLO : result := 'RLO';
    LRI : result := 'LRI';
    RLI : result := 'RLI';
    FSI : result := 'FSI';
    PDI : result := 'PDI';
    LRM : result := 'LRM';
    RLM : result := 'RLM';
    ALM : result := 'ALM';
    PARA : result := 'PARA';
  else
    result := c;
  end;
end;

class function TUnicodeUtilities.replaceBiDiChars(src: String): String;
var
  b : TStringBuilder;
  c : UnicodeChar;
begin
  b := TStringBuilder.create;
  try
    for c in unicodeChars(src) do
      if inSet(c, ALL_BIDI_CHARS) then
        b.append('|'+describe(c)+'|')
      else
        b.append(TEncoding.UTF8.getString(TEncoding.UTF8.GetBytes(c)));
    result := b.toString();
  finally
    b.free;
  end;
end;

class function TUnicodeUtilities.checkUnicodeWellFormed(src: String): String;
var
  this : TUnicodeUtilities;
begin
  this := TUnicodeUtilities.create;
  try
    result := this.checkWellFormed(src);
  finally
    this.free;
  end;
end;

constructor TUnicodeUtilities.Create;
begin
  inherited Create;
  FList := TObjectList.create;
  FList.OwnsObjects := true;
end;

function TUnicodeUtilities.checkWellFormed(src: String): String;
var
  i : integer;
  c : UnicodeChar;
begin
  i := 0;
  for c in unicodeChars(src) do
  begin
    inc(i);
    if inSet(c, ALL_BIDI_CHARS) then
    begin
      // note: LRE and RLE have no branch here, so they are never pushed
      case c of
        PARA:
          FList.clear(); // a new paragraph resets the state
        LRO, RLO:
          FList.add(TStateStack.create(c, i));
        PDF:
          popJustOne(CONTROL_CHARS_1);
        LRI, RLI, FSI:
          FList.add(TStateStack.create(c, i));
        PDI:
          popOneAndOthers(CONTROL_CHARS_2, CONTROL_CHARS_1);
        LRM, RLM, ALM:
          FList.add(TStateStack.create(c, i));
      end;
    end;
  end;
  if (FList.Count = 0) then
    result := ''
  else
    result := summary;
end;

// removes the top entry if it is one of chars
procedure TUnicodeUtilities.popJustOne(chars: TUCharArray);
begin
  if (FList.count > 0) and InSet(TStateStack(Flist.Last).c, chars) then
    FList.Delete(FList.count-1);
end;

// if any entry is one of chars, pops entries from the top until one of chars has been removed
procedure TUnicodeUtilities.popOneAndOthers(chars, others: TUCharArray);
var
  found, done : boolean;
  i : integer;
begin
  found := false;
  for i := 0 to FList.count - 1 do
  begin
    if InSet(TStateStack(Flist[i]).c, chars) then
    begin
      found := true;
      break;
    end;
  end;

  if (found) then
  begin
    while (FList.count > 0) and (InSet(TStateStack(Flist.Last).c, chars) or InSet(TStateStack(Flist.Last).c, others)) do
    begin
      done := InSet(TStateStack(Flist.Last).c, chars);
      FList.Delete(FList.count-1);
      if (done) then
        break;
    end;
  end;
end;

function TUnicodeUtilities.summary: String;
begin
  result := 'Unicode Character '+describe(TStateStack(Flist.Last).c)+' at index '+inttostr(TStateStack(Flist.Last).i)+' has no terminating match';
end;

destructor TUnicodeUtilities.Destroy;
begin
  FList.Free;
  inherited Destroy;
end;

end.
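
A possible way to call the unit above, as a sketch only: it assumes the unit compiles in your setup (it relies on `UTF8CodepointToUnicode` from the Lazarus LazUTF8 unit and on dynamic array constants), and the sample string is made up for illustration.

```pascal
program CheckSource;
{$mode objfpc}{$H+}

uses
  SysUtils, UnicodeUtilities;

var
  sample, problem: String;
begin
  // #$E2#$80#$AE is the UTF-8 encoding of U+202E (RLO); the text itself is invented.
  sample := 'if isAdmin then // ' + #$E2#$80#$AE + ' check';

  // Option 1: refuse any text that contains a listed character.
  if TUnicodeUtilities.hasUnicodeBiDiChars(sample) then
    WriteLn('contains BiDi characters: ', TUnicodeUtilities.replaceBiDiChars(sample));

  // Option 2: accept the characters, but only when they are properly terminated.
  problem := TUnicodeUtilities.checkUnicodeWellFormed(sample);
  if problem <> '' then
    WriteLn(problem);
end.
```

One thing to watch: ALL_BIDI_CHARS includes PARA (#10), so hasUnicodeBiDiChars returns True for any text containing a line feed. Checking one line at a time, as here, avoids that.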
« Last Edit: November 02, 2021, 08:31:30 am by Grahame Grieve »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: Pascal Security
« Reply #12 on: November 02, 2021, 09:25:20 am »
> In other words FPC accepts only a fairly restricted ANSI 7-bit alphabet for identifiers (and likewise for operators and separators).

Where is that documented? And to what extent is it enforced?

I don't see anything like that in either the formal documentation or in the wiki page (pointing out firmly that the wiki is not formal documentation).

MarkMLl

AlexTP

  • Hero Member
  • *****
  • Posts: 2713
    • UVviewsoft
Re: Pascal Security
« Reply #13 on: November 02, 2021, 09:32:31 am »
> Where is that documented? And to what extent is it enforced?

https://wiki.freepascal.org/Identifiers
It says exactly that, i.e. which characters are allowed in identifiers.

Bart

  • Hero Member
  • *****
  • Posts: 5727
    • Bart en Mariska's Webstek
Re: Pascal Security
« Reply #14 on: November 02, 2021, 09:42:47 am »
> I don't see anything like that in either the formal documentation or in the wiki page (pointing out firmly that the wiki is not formal documentation).

Official documentation: https://www.freepascal.org/docs-html/current/ref/refse4.html#x15-140001.4.

Bart

 
