Recent

Author Topic: implicit domain-check in case-statement  (Read 478 times)

nanobit

  • New member
  • *
  • Posts: 30
implicit domain-check in case-statement
« on: May 24, 2019, 11:36:11 am »
Current situation:
"case cv of ..." can crash (SIGSEGV) if cv is not in cv-type domain.
Example: (compiler optimization level > 0),
cv-size is 1 byte (256 values), but cv-type defines only 15 values (0..14),
and cv can hold outlier (undefined) after stream.read(var buf;) or typecasting.
https://bugs.freepascal.org/view.php?id=35603

Better (predictable) reaction:
Outliers should be catchable in else-branch, like known from:
if (cv = lbl1) then do1
else if (cv = lbl2) then do2
else doElse; // filter out

Arguments for implicit domain-check:
1) Delphi compatible
2) Docs can be understood to handle outliers:
  Else-branch means "outside" by not-inside (logical negation).
  Thus outside of type-domain is outside as well.
  Thirdparty code (years old or new) might use this assumption.
  I would assume, most don't even know the current limit.
3) if absence of implicit domain-check is intentional, then
  {$R+} ERangeError would need to be supported instead of arbitrary crash.
  This would be an improvement, but doesn't solve the first arguments.

Other constructs which also deal with outliers:
e := TEnum(17); if (e in setOf15)  then ... else ...
f := Nan; if isNumber( f) then ... else ...

Thaddy

  • Hero Member
  • *****
  • Posts: 7963
Re: implicit domain-check in case-statement
« Reply #1 on: May 24, 2019, 12:49:00 pm »
Oh, well...
Use rangechecks.... during development.
Hamlet 1.4 (nothing wrong with the Danish, btw)

sash

  • Sr. Member
  • ****
  • Posts: 251
Re: implicit domain-check in case-statement
« Reply #2 on: May 24, 2019, 02:17:49 pm »
cv-size is 1 byte (256 values), but cv-type defines only 15 values (0..14),

Shouldn't be your enumerated-type variable explicitly checked for validity at the moment of assignment (knowing that manipulations with pointers/buffers are unsafe by definition) instead?
Lazarus 2.0.2 FPC 3.0.4 x86_64-linux-gtk2 -- Ubuntu 19.04 XFCE

howardpc

  • Hero Member
  • *****
  • Posts: 3009
Re: implicit domain-check in case-statement
« Reply #3 on: May 24, 2019, 02:30:37 pm »
It is the programmer's responsibility to check for invalid data, not the compiler's.
Sure the language and the compiler can and does help in numerous ways, and you can add range checking during development to add a further layer of implicit checking. However, you can never avoid explicit data checks altogether if you want robust programs, most especially if you indulge in typecasts on enumerated values with restricted ranges (and similar hacks).

nanobit

  • New member
  • *
  • Posts: 30
Re: implicit domain-check in case-statement
« Reply #4 on: May 24, 2019, 02:58:07 pm »
I agree with all here who say, own checking is required before fpc case-statement.
But in Delphi this was not required, and as seen in several given fpc bug reports,
devs assume fpc to be Delphi compatible, if docs don't cite the difference of behavior.

I'm not asking for a solution in my own code,
but that undocumented, diverging behaviour is to be removed
to be Delphi-compatible.

The Delphi solution is reasonable and I also cited three other
fpc outlier-catching constructs which work without extra checking.
So there is no natural law that a case-statement must stick out here with a crash.

The case-statement crash is fpc specific and not giving ERangeError (if {R+}). Fpc bug reports (see links) confirm this. Others are unnoticed during development, because of the rare condition (compiler cfg + type definition + label usage + runtime value). The solution is easy only (no further bug reports in the future) if implemented in fpc.

nanobit

  • New member
  • *
  • Posts: 30
Re: implicit domain-check in case-statement
« Reply #5 on: May 25, 2019, 08:05:07 am »
Shouldn't be your enumerated-type variable explicitly checked for validity at the moment of assignment (knowing that manipulations with pointers/buffers are unsafe by definition) instead?

in FPC, Yes. But consider the following example,
which is not supported by fpc (but by Delphi):

An enumeration type can also have the meaning of a dataView (subset):
means something like (inside/outside), (namedValues/unnamedValues).
One can have multiple views to the same data, for example:
case tview1(char) of ... end;
case tview2(char) of ... end;
This is a valid approach if the case-statement can handle out-of-view:
outside-value goes to else-branch or is ignored (if else-branch is absent).
The else-branch can also contain own checking.
But in fpc, a check is needed before the whole case-statement.
Case-statement means value-checking, but it's less complete in fpc.

***************************
Another difference, already mentioned before:
In Delphi, one can convert

if (cv = lbl1) then do1
else if (cv = lbl2) then do2
else doElse; // filter out

to a case-statement. The old else-branch can
contain own checking, the new else-branch is the same.
(but in fpc, one needs to move a check before the case-statement).

case cv of
lbl1: do1;
lbl2: do2;
else doElse; // filter out
end;

**********************
The main challenge is, which code is used by the application,
which thirdparty code is using fpc incompatible case-statements.
This is not easy to see.

sash

  • Sr. Member
  • ****
  • Posts: 251
Re: implicit domain-check in case-statement
« Reply #6 on: May 25, 2019, 09:07:49 am »
case tview1(char) of ... end;
case tview2(char) of ... end;
This is a valid approach if the case-statement can handle out-of-view:

Of course, this one is not valid.

As I said before, you're converting logically incompatible (default-typecasted for binary compatibility at best) types without range-checking/proper conversion function/or whatever must be done to ensure your "char/pointer"  is actually valid TYourEnumTypeVariable.

Instead, you're proposing such checks should be done elsewhere.
Lazarus 2.0.2 FPC 3.0.4 x86_64-linux-gtk2 -- Ubuntu 19.04 XFCE

nanobit

  • New member
  • *
  • Posts: 30
Re: implicit domain-check in case-statement
« Reply #7 on: May 25, 2019, 11:59:17 am »
Instead, you're proposing such checks should be done elsewhere.

This is the consequence. The case-statement receives a byte value;
after that, the case-statement will deduce something based on the type,
and the Delphi case-statement will decide:
Use the else-branch (or ignore it (if else-branch absent)).

In other words, the case-statement itself is used as the value-checker.
(it's a value-checking construct anyway). And this result is compatible
with the "if .... else if ... else ...". Here the input can be undefined too,
which is supported by fpc.

But the whole point is: I don't propose a short coding style,
I just don't want arbitrary crashes from thirdparty code used in lazarus apps.
That's all I want. The bug reporters confirm the unawareness, not surprisingly.
And making all source-code out there compatible with the fpc case-statement is not realistic.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 635
Re: implicit domain-check in case-statement
« Reply #8 on: May 25, 2019, 02:28:06 pm »
First of all, regarding the FPC behaviour:
* Internally, FPC has always reasoned the same about enumerations and subranges: it assumes that only values within the type range are valid. This allows us to give warnings about incomplete case-statements (added in trunk), perform bitpacking (since then fewer bits will be available, and values outside the declared range would be silently cut off), and also perform various optimisations
* the fact that a particular expression seems to work correctly even when you feed it (as far as FPC is concerned) invalid input, does not mean it correctly handles out-of-range values. You are merely observing a detail of the code generation and optimisation passes in this case. It's (again, as far as FPC is concerned) similar to how writing and reading values past the end of an allocated memory block may seem to work fine, simply because the allocation was rounded up to a multiple of X bytes and you are writing in that padding space at the end. It does not mean the program will keep working in the future, because the padding may change or a checksum may be added by a memory manager.

The above is also what you observe with the case-statement: it's a change in code generation that triggers this particular issue now, but storing out-of-range values in enumerations has always resulted in undefined behaviour in FPC. E.g. in case of inlining and constant propagation, the compiler has also always (well, since a very long time at least) optimized away if-then-else checks when it could prove that you are comparing an enumeration or subrange value with a value that's out of the range. It indeed cannot reason about chained if-then-else statements, but e.g. LLVM can and once we pass on the enumeration range information to it, such constructs will also be optimised out. Specifically regarding the case statement, the trunk compiler now at least warns that the "else" branch of such a case statement is unreachable.

That said, I understand this is an issue with code coming from Delphi. In many cases, we can solve such issues by having different behaviour in FPC and Delphi modes, but here that would make the behaviour only more unpredictable because you can mix code in FPC and Delphi modes in the same program:
* a first option would be to define the behaviour of the enumeration type based on the syntax mode of the unit in which it is defined. This would cause issues since both the RTL and (much of?) the LCL are compiled in mode (obj)fpc, so Delphi code using these units would still work in a different way
* a second option would be to define the behaviour of the enumeration type based on the syntax mode of the current unit. However, this means that if a {$mode objfpc} unit uses an enumeration variable set to an out-of-range value by a {$mode delphi} unit, the FPC unit's code will still break

So there is no way to fix this other than by basically ignoring the range information from the declaration, like Delphi does (it does use the range information to a limited extent, but for the most part it has to ignore it since the variable can contain anything). This makes enumeration types much less useful and less safe. The caveat is indeed that this is only true as long as you don't bypass the type checking of the compiler by explicitly typecasting invalid values, but that goes for any type: accessing a pointer that was set to pointer(1), or an ansistring that was set to ansistring(pointer(1)), cannot be caught by any compiler check either, even though the value 1 can also be represented exactly by a pointer variable. Therefore I would consider this to be a very big loss for the language.

While I understand this is not a trivial solution for existing code since you have to look at all of its enumerations and how they are used, enumerations easily support having an "invalid" value by simply declaring an enumeration value for this purpose. After all, that's also what you do when you manually use an integer in this sense, except that the compiler cannot check that it does not overlap with an actual member of the enumeration.

nanobit

  • New member
  • *
  • Posts: 30
Re: implicit domain-check in case-statement
« Reply #9 on: May 25, 2019, 02:40:54 pm »
Ok, it's your decision. But nevertheless, thanks for all your work.