Recent

Author Topic: implicit domain-check in case-statement  (Read 1266 times)

nanobit

  • New Member
  • *
  • Posts: 40
implicit domain-check in case-statement
« on: May 24, 2019, 11:36:11 am »
Current situation:
"case cv of ..." can crash (SIGSEGV) if cv is not in cv-type domain.
Example: (compiler optimization level > 0),
cv-size is 1 byte (256 values), but cv-type defines only 15 values (0..14),
and cv can hold outlier (undefined) after stream.read(var buf;) or typecasting.
https://bugs.freepascal.org/view.php?id=35603

Better (predictable) reaction:
Outliers should be catchable in else-branch, like known from:
if (cv = lbl1) then do1
else if (cv = lbl2) then do2
else doElse; // filter out

Arguments for implicit domain-check:
1) Delphi compatible
2) Docs can be understood to handle outliers:
  Else-branch means "outside" by not-inside (logical negation).
  Thus outside of type-domain is outside as well.
  Thirdparty code (years old or new) might use this assumption.
  I would assume, most don't even know the current limit.
3) if absence of implicit domain-check is intentional, then
  {$R+} ERangeError would need to be supported instead of arbitrary crash.
  This would be an improvement, but doesn't solve the first arguments.

Other constructs which also deal with outliers:
e := TEnum(17); if (e in setOf15)  then ... else ...
f := Nan; if isNumber( f) then ... else ...

Thaddy

  • Hero Member
  • *****
  • Posts: 8973
Re: implicit domain-check in case-statement
« Reply #1 on: May 24, 2019, 12:49:00 pm »
Oh, well...
Use rangechecks.... during development.
Most people that want to use threading should learn to patch their jeans first: use a needle.

sash

  • Sr. Member
  • ****
  • Posts: 266
Re: implicit domain-check in case-statement
« Reply #2 on: May 24, 2019, 02:17:49 pm »
cv-size is 1 byte (256 values), but cv-type defines only 15 values (0..14),

Shouldn't be your enumerated-type variable explicitly checked for validity at the moment of assignment (knowing that manipulations with pointers/buffers are unsafe by definition) instead?
Lazarus 2.0.2 FPC 3.0.4 x86_64-linux-gtk2 -- Ubuntu 19.04 XFCE

howardpc

  • Hero Member
  • *****
  • Posts: 3158
Re: implicit domain-check in case-statement
« Reply #3 on: May 24, 2019, 02:30:37 pm »
It is the programmer's responsibility to check for invalid data, not the compiler's.
Sure the language and the compiler can and does help in numerous ways, and you can add range checking during development to add a further layer of implicit checking. However, you can never avoid explicit data checks altogether if you want robust programs, most especially if you indulge in typecasts on enumerated values with restricted ranges (and similar hacks).

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #4 on: May 24, 2019, 02:58:07 pm »
I agree with all here who say, own checking is required before fpc case-statement.
But in Delphi this was not required, and as seen in several given fpc bug reports,
devs assume fpc to be Delphi compatible, if docs don't cite the difference of behavior.

I'm not asking for a solution in my own code,
but that undocumented, diverging behaviour is to be removed
to be Delphi-compatible.

The Delphi solution is reasonable and I also cited three other
fpc outlier-catching constructs which work without extra checking.
So there is no natural law that a case-statement must stick out here with a crash.

The case-statement crash is fpc specific and not giving ERangeError (if {R+}). Fpc bug reports (see links) confirm this. Others are unnoticed during development, because of the rare condition (compiler cfg + type definition + label usage + runtime value). The solution is easy only (no further bug reports in the future) if implemented in fpc.

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #5 on: May 25, 2019, 08:05:07 am »
Shouldn't be your enumerated-type variable explicitly checked for validity at the moment of assignment (knowing that manipulations with pointers/buffers are unsafe by definition) instead?

in FPC, Yes. But consider the following example,
which is not supported by fpc (but by Delphi):

An enumeration type can also have the meaning of a dataView (subset):
means something like (inside/outside), (namedValues/unnamedValues).
One can have multiple views to the same data, for example:
case tview1(char) of ... end;
case tview2(char) of ... end;
This is a valid approach if the case-statement can handle out-of-view:
outside-value goes to else-branch or is ignored (if else-branch is absent).
The else-branch can also contain own checking.
But in fpc, a check is needed before the whole case-statement.
Case-statement means value-checking, but it's less complete in fpc.

***************************
Another difference, already mentioned before:
In Delphi, one can convert

if (cv = lbl1) then do1
else if (cv = lbl2) then do2
else doElse; // filter out

to a case-statement. The old else-branch can
contain own checking, the new else-branch is the same.
(but in fpc, one needs to move a check before the case-statement).

case cv of
lbl1: do1;
lbl2: do2;
else doElse; // filter out
end;

**********************
The main challenge is, which code is used by the application,
which thirdparty code is using fpc incompatible case-statements.
This is not easy to see.

sash

  • Sr. Member
  • ****
  • Posts: 266
Re: implicit domain-check in case-statement
« Reply #6 on: May 25, 2019, 09:07:49 am »
case tview1(char) of ... end;
case tview2(char) of ... end;
This is a valid approach if the case-statement can handle out-of-view:

Of course, this one is not valid.

As I said before, you're converting logically incompatible (default-typecasted for binary compatibility at best) types without range-checking/proper conversion function/or whatever must be done to ensure your "char/pointer"  is actually valid TYourEnumTypeVariable.

Instead, you're proposing such checks should be done elsewhere.
Lazarus 2.0.2 FPC 3.0.4 x86_64-linux-gtk2 -- Ubuntu 19.04 XFCE

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #7 on: May 25, 2019, 11:59:17 am »
Instead, you're proposing such checks should be done elsewhere.

This is the consequence. The case-statement receives a byte value;
after that, the case-statement will deduce something based on the type,
and the Delphi case-statement will decide:
Use the else-branch (or ignore it (if else-branch absent)).

In other words, the case-statement itself is used as the value-checker.
(it's a value-checking construct anyway). And this result is compatible
with the "if .... else if ... else ...". Here the input can be undefined too,
which is supported by fpc.

But the whole point is: I don't propose a short coding style,
I just don't want arbitrary crashes from thirdparty code used in lazarus apps.
That's all I want. The bug reporters confirm the unawareness, not surprisingly.
And making all source-code out there compatible with the fpc case-statement is not realistic.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 659
Re: implicit domain-check in case-statement
« Reply #8 on: May 25, 2019, 02:28:06 pm »
First of all, regarding the FPC behaviour:
* Internally, FPC has always reasoned the same about enumerations and subranges: it assumes that only values within the type range are valid. This allows us to give warnings about incomplete case-statements (added in trunk), perform bitpacking (since then fewer bits will be available, and values outside the declared range would be silently cut off), and also perform various optimisations
* the fact that a particular expression seems to work correctly even when you feed it (as far as FPC is concerned) invalid input, does not mean it correctly handles out-of-range values. You are merely observing a detail of the code generation and optimisation passes in this case. It's (again, as far as FPC is concerned) similar to how writing and reading values past the end of an allocated memory block may seem to work fine, simply because the allocation was rounded up to a multiple of X bytes and you are writing in that padding space at the end. It does not mean the program will keep working in the future, because the padding may change or a checksum may be added by a memory manager.

The above is also what you observe with the case-statement: it's a change in code generation that triggers this particular issue now, but storing out-of-range values in enumerations has always resulted in undefined behaviour in FPC. E.g. in case of inlining and constant propagation, the compiler has also always (well, since a very long time at least) optimized away if-then-else checks when it could prove that you are comparing an enumeration or subrange value with a value that's out of the range. It indeed cannot reason about chained if-then-else statements, but e.g. LLVM can and once we pass on the enumeration range information to it, such constructs will also be optimised out. Specifically regarding the case statement, the trunk compiler now at least warns that the "else" branch of such a case statement is unreachable.

That said, I understand this is an issue with code coming from Delphi. In many cases, we can solve such issues by having different behaviour in FPC and Delphi modes, but here that would make the behaviour only more unpredictable because you can mix code in FPC and Delphi modes in the same program:
* a first option would be to define the behaviour of the enumeration type based on the syntax mode of the unit in which it is defined. This would cause issues since both the RTL and (much of?) the LCL are compiled in mode (obj)fpc, so Delphi code using these units would still work in a different way
* a second option would be to define the behaviour of the enumeration type based on the syntax mode of the current unit. However, this means that if a {$mode objfpc} unit uses an enumeration variable set to an out-of-range value by a {$mode delphi} unit, the FPC unit's code will still break

So there is no way to fix this other than by basically ignoring the range information from the declaration, like Delphi does (it does use the range information to a limited extent, but for the most part it has to ignore it since the variable can contain anything). This makes enumeration types much less useful and less safe. The caveat is indeed that this is only true as long as you don't bypass the type checking of the compiler by explicitly typecasting invalid values, but that goes for any type: accessing a pointer that was set to pointer(1), or an ansistring that was set to ansistring(pointer(1)), cannot be caught by any compiler check either, even though the value 1 can also be represented exactly by a pointer variable. Therefore I would consider this to be a very big loss for the language.

While I understand this is not a trivial solution for existing code since you have to look at all of its enumerations and how they are used, enumerations easily support having an "invalid" value by simply declaring an enumeration value for this purpose. After all, that's also what you do when you manually use an integer in this sense, except that the compiler cannot check that it does not overlap with an actual member of the enumeration.

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #9 on: May 25, 2019, 02:40:54 pm »
Ok, it's your decision. But nevertheless, thanks for all your work.

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #10 on: July 05, 2019, 02:59:28 pm »
I've just read a few mails in the dev list about the same topic and I think, my notes above could benefit from more words (just for completeness, could be useful one day):

Enum variables may contain invalid values (outliers).
Therefore, value-comparing statements (if, case, in, is) should simply continue in the else-branch.
And more general, also convenient: the explicit typecast "case tenum(cv) of ... end",
as I said: tenum as view (selector, filter) of input data (cv).
This is supported in Delphi as well.

The user knows, the enum variable has sizeof( tenum),
which is at least minimum size {$packenum byteCount}.
Only in bitpacked containers, a slim enum (bits as needed for high(tenum))
can be enforced, but this is not relevant to the discussed issue.

According to the slim enum policy in FPC,
calling stream.read( enum, sizeof( tenum)) is unsupported,
because invalidStream.read() can produce an unsupported value (beyond slim enum).
This makes FPC software unsafer than Delphi software, because outliers
can lead to undefined behavior already in basic operations (crash in case statement).
According to the slim enum policy, users need to read the byte untyped:
stream.read( someByte, sizeof( byte)) and check the value later, which can be quite extensive;
much more convenient would be a case-statement like in Delphi.

A slim enum policy is not sufficient, but also the size is defined:
Users need to deal with enum variables and they know the size,
depending on {$packenum byteCount} and high(tenum).
Calling stream.write( enum, sizeof(tenum)) is safe (lossless), 
and srcStream.read( enum, sizeof( tenum)) is a safe (lossless) typecast:
enum := Tenum(src); safe because of (sizeof(enum) = sizeof(src)).

In assign, (sizeof(enum) > sizeof(src)) allows a safe (lossless) typecast too.
But (sizeof(enum) < sizeof(src)) would be downcasting, which truncates higher bits (if existing), thus an invalid enum (with more bytes) can convert to a valid enum (with less bytes), the user should not use such assigns.
The point is, the user knows, independently of compiler version, which typecast is safe, and which not.

And generally, Language should be defined first (for user convenience),
and then optimization is restricted by language.

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #11 on: July 07, 2019, 09:17:31 am »
Another note, because this issue is still open and can be solved easily :)

The case-statement needs to handle undefined values.
First justification: The internal treatment should and can be better than a crash.
Another: Just include this tiny check, to be Delphi compatible.
And if reading time allows, more details:

Regardless of enumsize, the variable may contain an invalid value.
Even in bitpacked storage: tenum (0,1,2), bitpacked to 2 bits,
means we have the undefined value 3 ((True, True), but together invalid).
Only one bit and a "half" bit are defined by tenum.
But in most examples, users have unpacked enums of sizeof(tenum)
and if they want to check the value, they can do it in the next operation.
And the user knowingly abstains from size-reducing conversions prior to the check.
If even in this situation, nontransparent optimizations reverse the value validness
(remove all invalid bits) prior to the check, then we have a compiler mistake.
The compiler has to disable nontransparent optimizations on invalid enums,
because the language has larger priority. If there is no bitpacked container,
then the compiler should just transfer the whole enum (sizeof(tenum)) around.
It could not be simpler.

The existence of undefined values requires treatment:
Undefined behavior for undefined values should mean in our case:
The value-specific behavior is undefined, but defined is a common default handling
for all undefined values in value-testing operations (if, case, in, is)
These operations all check for equality (a specific case of inside) or inside subsets.
It's a law of logic that they should use the else-branch for not-inside.

The variable may contain undefined bits (maybe half bit) outside of type range.
Generally, ALL bits of an enum variable (storage (could be even persistent))
belong to its value, therefore operations need to copy and handle all bits.
(This is like with a valid bytevalue represented by a word with higher zerobits)
For outer values, the operation needs to detect ONLY that the value is not inside, thus outside.
The operation does not know the meaning of the outer value, but only its existence.
That's all what the operation can safely deduce from such typed value.
Such inside-test is fully sufficient for operations like (if, case, in, is).
And users can use this simple ability also for intentional inside/outside tests:

case tenum(src) of ...
The user says, the case statement should receive a value with userdefined sizeof(tenum).
Tenum(src) means, the case statement will silently (unchecked) get the src value.
The user should ensure (sizeof(enum) >= sizeof(src)), to keep all source bits.
The case statement receives the value and the inside test is applied.
« Last Edit: July 07, 2019, 10:59:56 am by nanobit »

Thaddy

  • Hero Member
  • *****
  • Posts: 8973
Re: implicit domain-check in case-statement
« Reply #12 on: July 07, 2019, 01:03:31 pm »
Tenum(src)  is a hard-cast.
Hard -cast means to the compiler: "programmer says he knows better, so let him be...." and all bets are off for that reason....

As already written by others: if you do something silly like that it is always the programmers responsibility..
Most people that want to use threading should learn to patch their jeans first: use a needle.

nanobit

  • New Member
  • *
  • Posts: 40
Re: implicit domain-check in case-statement
« Reply #13 on: July 09, 2019, 09:39:32 am »
This explicit typecast is not knowing better, but defining the parameter type of the case-statement.
And casting integers to enums is supported in Pascal and comparable languages.
If unsupported, then it would be a serious bug.

Even in C# is known, callees have to assume that enum type arguments can be any integer value,
regardless of whether the value is defined in the enumeration. From C# docs:
"The set of values that an enum type can take on is not limited by its enum members."
"Enum values can be converted to integral values and vice versa using type casts."

Enum validation is needed in the callee, especially if the callee is a dll,
which uses an older enum type with less enums (but compatible).
The question is where to validate. Naturally where the user knows
which value he wants or not. The C# switch statement will go to "default:"

Thaddy

  • Hero Member
  • *****
  • Posts: 8973
Re: implicit domain-check in case-statement
« Reply #14 on: July 09, 2019, 01:14:06 pm »
The C# switch statement will go to "default:"
The Pascal case switch goes to else..... Which is the same...
Most people that want to use threading should learn to patch their jeans first: use a needle.