Lazarus

Free Pascal => General => Topic started by: nanobit on March 22, 2019, 05:23:42 pm

Title: Boolean32
Post by: nanobit on March 22, 2019, 05:23:42 pm
https://www.freepascal.org/docs-html/ref/refsu4.html
In the last column of that table, there is a difference between Boolean32 (1) and LongBool (nonzero).
The question is about Boolean32:
Either the value 1 is the recommended use (but not enforced), or it is wrongly documented, or I misunderstand. Or does the table refer only to the compiler's default True (currently -1 for longbool, though we should not assume that forever)?

My understanding is: the value 1 is created in the write direction (b32 := True), similar to -1 for bLong := True.
But in the read direction, any nonzero Boolean32 resolves to True (within the target type).
ord(boolean32) ranges from low(dword) to high(dword), while
ord(longbool) ranges from low(longint) to high(longint).
The similar behaviour of both types is useful.

But the table shows a difference which, so far, I have been unable to confirm.
Here are some "random" checks, none of which confirm the table:

Code: Pascal
{$ASSERTIONS ON} // make assert active without needing -Sa
var
  bLong: longbool;
  b32, b32Copy: boolean32;
  b: boolean;

begin
  bLong := True;
  assert( bLong); // ok
  assert( longint(bLong) = -1); // ok
  assert( ord(bLong) = -1); // ok
  // bLong := 1000; // disallowed: incompatible type
  bLong := longbool(1000);
  assert( longint(bLong) = 1000); // ok
  b := bLong;
  assert( longint(b) = 1); // ok

  b32 := True;
  assert( b32); // ok
  assert( longint(b32) = 1); // ok
  b32 := boolean32(1000);
  assert( b32); // ok
  assert( ord(b32) = 1000); // ok
  b := b32; // b has its own True
  assert( longint(b) = 1); // ok
  b32Copy := b32;
  assert( longint(b32Copy) = 1000); // ok
  bLong := b32; // bLong has its own True (-1)
  assert( longint(bLong) = -1); // ok
  bLong := longbool(1000);
  b32 := bLong; // b32 has its own True (1)
  assert( longint(b32) = 1); // ok
end.
Title: Re: Boolean32
Post by: Jonas Maebe on March 22, 2019, 05:46:21 pm
Pascal-style boolean types are enumeration types declared as (false, true), with a storage size ({$packenum x}) set to 1, 2, 4 or 8. Hence, any value outside the false (0) - true (1) range is undefined.
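For illustration, a minimal sketch of that definition; the TFourByteFlag enum and its value names are hypothetical stand-ins (the real declaration uses the reserved words false and true):
Code: Pascal
{$PACKENUM 4}
type
  // declared like a Pascal-style boolean, but stored in 4 bytes
  TFourByteFlag = (flagFalse, flagTrue);
begin
  writeln(SizeOf(TFourByteFlag)); // 4, due to {$PACKENUM 4}
  writeln(SizeOf(Boolean32));     // also 4
  writeln(Ord(True));             // 1: the only defined ordinal value for true
end.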

Some cases where you can notice this:
Code: Pascal
type
  ta = array[boolean32] of byte;
var
  a: ta;
  b: boolean16;
begin
  writeln(sizeof(ta));
  word(b):=high(longint);
  a[b]:=5;  // writing to random memory
end.
Range checking cannot even catch this, because explicit typecasts disable it (so it cannot be caught during the assignment to b), and indexing operations only perform range checking when needed, i.e. when the value/variable you use for indexing has a different type than the range type, while here they are the same.
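A minimal sketch of that behaviour with an ordinary subrange index type, assuming {$R+}; the TIdx name and the ParamCount trick (to keep the value opaque at compile time) are just illustration:
Code: Pascal
{$R+}
type
  TIdx = 0..3;
var
  a: array[TIdx] of byte;
  i: TIdx;
  n: longint;
begin
  byte(i) := 10;        // the explicit typecast disables range checking here
  a[i] := 1;            // i already has the index type: no check inserted,
                        // out-of-bounds write
  n := 10 + ParamCount; // value not known at compile time
  a[n] := 1;            // n's type differs from TIdx: a check is inserted,
                        // run-time error 201
end.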

Similarly,
Code: Pascal
type
  tbparray = bitpacked array[0..7] of boolean8;
begin
  writeln(sizeof(tbparray));
end.
This prints 1, because the compiler knows that the only valid values for boolean8 are 0 and 1, and hence it allocates only one bit to store them. If all values between 0 and 255 were valid, then the size of that array would have to be 8 bytes (and the same goes for fields in bitpacked records).
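The same effect can be seen for fields in a bitpacked record; a minimal sketch (the tbprec name and its fields are hypothetical):
Code: Pascal
type
  tbprec = bitpacked record
    f0, f1, f2, f3, f4, f5, f6, f7: boolean8;
  end;
begin
  writeln(sizeof(tbprec)); // prints 1: each boolean8 field occupies a single bit
end.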

That said: even if you could not write a program right now in which the difference is observable, forcing variables to contain out-of-range values can never be supported, and may lead to different behaviour in the future (or to different behaviour on different platforms, or when different optimization settings are enabled). The reason is that the compiler uses type information to optimize the code it generates. If this type information does not correspond to the actual values the variables contain, all bets are off.
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 12:28:12 am
Range checking cannot even catch this, because explicit typecasts disable it
<snip>
forcing variables to contain out-of-range values can never be supported, and may lead to different behaviour in the future (or to different behaviour on different platforms, or when different optimization settings are enabled). The reason is that the compiler uses type information to optimize the code it generates. If this type information does not correspond to the actual values the variables contain, all bets are off.
That is one of the contradictions that cause problems in FPC. Through casting, the compiler allows out-of-range values to be assigned to a data type; consequently it cannot give itself the luxury of later assuming the value is in the declared range of the type. Yet it does, and it generates code based on that mistaken assumption.



Title: Re: Boolean32
Post by: ASerge on March 23, 2019, 11:46:47 am
Code: Pascal
type
  ta = array[boolean32] of byte;
var
  a: ta;
  b: boolean16;
begin
  writeln(sizeof(ta));
  word(b):=high(longint);
  a[b]:=5;  // writing to random memory
end.
Interesting code. On my machine, in R+ mode it doesn't even compile. In R- mode it compiles, but the variable b is converted to 1 when used as an index, i.e. the compiler even provides for this case:
Code: ASM
...
# Var b located in register ax
.Ll4:
# [11] word(b):=high(longint);
        movw    $65535,%ax
.Ll5:
# [12] a[b]:=5;  // writing to random memory
        orw     %ax,%ax
        setneb  %al
# PeepHole Optimization,var9
# PeepHole Optimization,var1
        andl    $255,%eax
        leaq    U_$P$PROGRAM_$$_A(%rip),%rdx
        movb    $5,(%rdx,%rax,1)
Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 11:54:52 am
Range checking cannot even catch this, because explicit typecasts disable it
<snip>
forcing variables to contain out-of-range values can never be supported, and may lead to different behaviour in the future (or to different behaviour on different platforms, or when different optimization settings are enabled). The reason is that the compiler uses type information to optimize the code it generates. If this type information does not correspond to the actual values the variables contain, all bets are off.
That is one of the contradictions that cause problems in FPC. Through casting, the compiler allows out-of-range values to be assigned to a data type,
That is what explicit typecasting means: ignore all language and compiler checks, just do this. It is more or less the signature of the Borland-style Pascal family: strict type checking on the one hand (most of it at compile time, though some kinds, such as certain range and overflow checks, can only be performed at run time), while at the same time the programmer has very low-level access and can explicitly circumvent/disable the type checking (usually, but not always, for performance reasons). In that second case, it is indeed the programmer's responsibility not to break the type system, or anything else for that matter.

There is no way around this in a typed language that also provides low-level functionality. You can do the same thing without typecasting through inline assembly, pointers, variant records, "absolute", and probably a dozen other language features I'm not thinking of right now. It's just like the compiler assuming that an ansistring always contains either nil or a pointer that points right past the end of a tansirec, even though through all of the aforementioned language features you can put any random data in one. If the compiler cannot assume that data is valid, it cannot generate code, because it has no idea what effect any instruction will have.
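A minimal sketch using "absolute", one of the features mentioned above (the variable names are just illustration):
Code: Pascal
var
  b: boolean;
  raw: byte absolute b; // overlays raw on b's storage, bypassing the type system
begin
  raw := 255;           // b now holds a bit pattern outside (false, true)
  if b then             // whatever happens from here on is undefined
    writeln('true?');
end.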

FWIW, Delphi will have the same issue with an array[boolean] if you put the value 255 in a boolean variable using one of these language features (they don't have a boolean32 type afaik).
Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 11:57:30 am
Code: Pascal
type
  ta = array[boolean32] of byte;
var
  a: ta;
  b: boolean16;
begin
  writeln(sizeof(ta));
  word(b):=high(longint);
  a[b]:=5;  // writing to random memory
end.
Interesting code. On my machine, in R+ mode it doesn't even compile. In R- mode it compiles, but the variable b is converted to 1 when used as an index, i.e. the compiler even provides for this case:
Code: ASM
...
# Var b located in register ax
.Ll4:
# [11] word(b):=high(longint);
        movw    $65535,%ax
.Ll5:
# [12] a[b]:=5;  // writing to random memory
        orw     %ax,%ax
        setneb  %al
# PeepHole Optimization,var9
# PeepHole Optimization,var1
        andl    $255,%eax
        leaq    U_$P$PROGRAM_$$_A(%rip),%rdx
        movb    $5,(%rdx,%rax,1)

You're right; the code generator actually still contains a workaround for this. It dates back to the '90s, when FPC still tried to be code-generator-compatible with Turbo Pascal. And because we did that at one point, this hack will probably remain in the compiler forever, making all code that uses booleans for indexing arrays horribly inefficient forever.
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 12:40:04 pm
That is what explicit typecasting means: ignore all language and compiler checks, just do this.
And the compiler should respect that, but it doesn't. Instead, for some constructs it decides to do something else, which is not what the programmer told it to do. That's a problem.

There is a difference between a compiler doing range checking and a compiler downright imposing some range it arbitrarily chose, particularly when the programmer didn't even ask for range checking.




Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 12:52:54 pm
That is what explicit typecasting means: ignore all language and compiler checks, just do this.
And the compiler should respect that but, it doesn't.
It does: whenever you perform an explicit typecast, the compiler will at most warn that what you are doing is wrong (unless it has no idea how to perform the typecast at all, like typecasting between records of different sizes). It won't forbid you from doing it. That's all that checks can do: warn, or forbid something.

Quote
Instead, for some constructs it decides to do something else, which is not what the programmer told it to do. That's a problem.

There is a difference between a compiler doing range checking and a compiler downright imposing some range it arbitrarily chose, particularly when the programmer didn't even ask for range checking.
I really don't know how else to explain to you that there is nothing arbitrary about the array example you are so hung up on. Maybe this:

Code: Pascal
type
  arrayrange = 0..0;
  tarray1 = array[arrayrange] of byte;
  tarray2 = array[0..0] of byte;
The above declarations are equivalent. Declaring an array type always implies declaring two types: a range/index type, and the array type itself.

When you index an array, the index is always converted to the range type. This is not an explicit type conversion but an implicit one. The compiler will not perform any range checking for this conversion unless range checking is enabled; however, the conversion itself always happens, regardless of whether the index type is boolean, byte, 0..0 or ansichar. This conversion does the following:
* it checks type compatibility between the index you passed and the declared type. This is what causes a compile-time type error if you try to index an array with e.g. an ansistring. If no implicit type conversion were inserted, such errors would not be caught.
* it performs a range check if range checking is enabled

After that, the resulting value is considered to be a valid value of the range type. In the above example, it is hence assumed to contain a value in the range 0..0, so the compiler will generate code that is valid as long as the index lies within this range. At this point, the compiler no longer has any clue what the original type of the index variable was.

Again, this is unrelated to range checking or explicit typecasts. And it is not arbitrary, because the compiler is using literally the range type that the programmer told it to use for this array.
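A self-contained sketch of that implicit conversion, restating the declarations above (the variable names are illustrative):
Code: Pascal
type
  arrayrange = 0..0;
  tarray1 = array[arrayrange] of byte;
var
  a: tarray1;
  i: longint;
begin
  i := 0;
  a[i] := 1;      // i is implicitly converted to arrayrange before indexing
  // a['x'] := 1; // would not compile: char is not compatible with arrayrange
end.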
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 02:24:05 pm
It does: whenever you perform an explicit typecast, the compiler will at most warn that what you are doing is wrong (unless it has no idea how to perform the typecast at all, like typecasting between records of different sizes). It won't forbid you from doing it. That's all that checks can do: warn, or forbid something.
It would be _great_ if it gave at least a warning or downright forbade it, instead of silently producing incorrect code.

I really don't know how else to explain to you that there is nothing arbitrary about the array example you are so hung up on.
At least we've managed to have a meeting of the minds on that one.

Maybe this:

Code: Pascal
type
  arrayrange = 0..0;
  tarray1 = array[arrayrange] of byte;
  tarray2 = array[0..0] of byte;
The above declarations are equivalent. Declaring an array type always implies declaring two types: a range/index type, and the array type itself.
So far, so good; but as you yourself pointed out below, one of those types is an implied type, not a declared type, and I'm sure you know I am referring to the implied type of the array range.

When you index an array, the index is always converted to the range type. This is not an explicit type conversion but an implicit one.

This is where things start going awfully wrong. The index has a declared type, and the compiler, in the specific example we are both thinking about, chose to override an explicit type declaration with an incorrect data-type assumption it made.

There are two things that are awfully wrong with that:

1. an implied type cannot override a declared type. That doesn't make any sense; it's illogical, unjustifiable and unreasonable.
2. a data type cannot be derived from the constant zero. It is beyond wrong for the compiler to presume that zero implies an unsigned type. It is mathematically wrong.

And just as a bonus: Delphi, various flavors of C and various flavors of C++ don't override a variable's declared type with what is actually a _presumed_ type, and they don't produce a different result depending on bitness, as FPC does. That alone should be a clear tell-tale sign that something isn't the way it should be.

The compiler will not perform any range checking for this conversion unless range checking is enabled.
It did worse than that: it forced a signed data type to become an unsigned type, based on the completely incorrect assumption that the constant zero indicates a Pascal unsigned type, which it does not.

This conversion does the following:
* it checks type compatibility between the index you passed and the declared type. This is what causes a compile-time type error if you try to index an array with e.g. an ansistring. If no implicit type conversion were inserted, such errors would not be caught.
* it performs a range check if range checking is enabled
That's all fine and dandy, but the conversion and its result are driven by the declared types of the _variables_, and in the cases where a constant drives the conversion, it _promotes_ the variables to a superset type, e.g. int32 to int64. A constant can never demote a variable's declared type, which is what FPC did in the specific case we are talking about.

In the above example, it is hence assumed to contain a value in the range 0..0, so the compiler will generate code that is valid as long as the index lies within this range. At this point, the compiler no longer has any clue what the original type of the index variable was.
The compiler cannot assume that a variable does or does not contain a value the range denotes. It can check that the variable is in the range if, and only if, the programmer has told it to do so. Otherwise, the compiler's job is to calculate the referenced address using the old "address(array_variable) + (n * elementsize)", where n is the programmer-specified index, not some value the compiler decided to use because it mistakenly believes the constant zero is a member only of the unsigned Pascal data types (we both know that zero belongs to both signed and unsigned types; that alone tears down every argument you've presented to justify that FPC bug).
 
Again, this is unrelated to range checking or explicit typecasts. And it is not arbitrary, because the compiler is using literally the range type that the programmer told it to use for this array.
I'm guessing that applies to the Boolean thing this thread is actually about. I have no comment on that.

It would be really nice if the FPC bug (we both know which one) got fixed. I'll throw in a Boolean48 in the deal if that's what it takes.

Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 03:52:52 pm
It does: whenever you perform an explicit typecast, the compiler will at most warn that what you are doing is wrong (unless it has no idea how to perform the typecast at all, like typecasting between records of different sizes). It won't forbid you from doing it. That's all that checks can do: warn, or forbid something.
It would be _great_ if it gave at least a warning or downright forbade it, instead of silently producing incorrect code.
This would mean warning for, or forbidding, every implicit conversion between integer types. Every single real-world program would throw hundreds if not thousands of such warnings. As mentioned in the other thread, Ada follows this approach and gives an error whenever one integer type needs to be converted to another and the target type cannot represent the entire range of the source type (you then need to use an explicit cast instead). It is a valid approach, but not how Pascal is defined by any official or de facto standard. In Pascal such checks are optional, at run time, through range checking.

Quote
Maybe this:

Code: Pascal
type
  arrayrange = 0..0;
  tarray1 = array[arrayrange] of byte;
  tarray2 = array[0..0] of byte;
The above declarations are equivalent. Declaring an array type always implies declaring two types: a range/index type, and the array type itself.
So far, so good; but as you yourself pointed out below, one of those types is an implied type, not a declared type
The conversion is implicit, but not the type itself or its declaration. Look at the formal definition of an array: https://www.freepascal.org/docs-html/current/ref/refsu14.html . The range type is simply an ordinal type; it is part of the declaration of the array type. The type conversions to the range type are implicit, but the same goes for any time you pass a parameter, assign a value to another variable/field, etc. (all of which also have declared types). And in all of those cases, if you pass an out-of-range value, the result is undefined.

Quote
When you index an array, the index is always converted to the range type. This is not an explicit type conversion but an implicit one.

This is where things start going awfully wrong. The index has a declared type, and the compiler, in the specific example we are both thinking about, chose to override an explicit type declaration with an incorrect data-type assumption it made.
You use an expression of type A in a context that expects type B; hence, the expression gets converted to type B. That is all there is to it, and that is how it works everywhere in the language, from indexing arrays to passing parameters and assigning values. No assumption is made, and there are no implicit/explicit type declarations or assumptions.

Quote
The compiler will not perform any range checking for this conversion unless range checking is enabled.
It did worse than that: it forced a signed data type to become an unsigned type, based on the completely incorrect assumption that the constant zero indicates a Pascal unsigned type, which it does not.
It's not the constant zero, it's the subrange type 0..0. These are completely different things. And the compiler assumes that an expression of this type can have no valid value other than 0, which is correct according to the declaration.
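A minimal sketch of the distinction (the TZero name is hypothetical; with range checking off, the cast simply stores an out-of-range bit pattern):
Code: Pascal
{$R-}
type
  TZero = 0..0; // a subrange type whose only valid value is 0
var
  z: TZero;
  n: longint;
begin
  z := 0;        // fine: 0 is the one defined value
  n := -5;
  z := TZero(n); // the explicit cast circumvents all checks: z is now undefined
end.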

This whole discussion reminds me a bit of the ruckus that arose when Clang came out, because Clang actively exploits undefined behaviour: as soon as it can prove that something is undefined according to the C standard, it will usually throw that code, and anything that depends on it, away. After all, doing nothing is just as undefined as doing something. The reasoning behind this was that it makes errors easier to notice, instead of having an expression behave slightly differently when compiled with a single compiler on a single platform with a particular optimization option.

Other compilers used to be far more lenient about undefined behaviour, so many people were upset when their code broke when compiled with Clang and considered it a buggy compiler. In the meantime, however, Clang has come to be regarded as one of the better compilers to use when you want to stomp out code-quality issues (I don't claim FPC is best in class, and it does not actively exploit undefined behaviour; this is just for context). A very good blog post about this is http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html . A similar one should probably be written about what is undefined in Pascal (Pascal and C do not regard the same things as undefined). In fact, once I start adding range information to the LLVM backend of FPC, a lot more wrong code will break when using that code generator (but correct code will be optimized better). For the record, the existing code generators will stay; you will never be forced to switch to the LLVM backend.

Quote
In the above example, it is hence assumed to contain a value in the range 0..0, so the compiler will generate code that is valid as long as the index lies within this range. At this point, the compiler no longer has any clue what the original type of the index variable was.
The compiler cannot assume that a variable does or does not contain a value the range denotes.
On the contrary, it must assume that. It is the whole and only point of type information.

And now I'm going back to actually working on the compiler and making it better (or worse, if you want undefined behaviour to have a particular defined result, since I'm working on the LLVM backend).
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 04:28:29 pm
On the contrary, it must assume that. It is the whole and only point of type information.
I was hoping that, since you have had a few days to think about it, you might at least have some doubts (and you should!!)

I give up, but it's obvious that the compiler is incorrectly assuming that the numeral zero indicates an unsigned data type, which is mathematically incorrect. No way around it. Demoting a declared variable from signed to unsigned is nothing less than an atrocity.

But that is the way it is. Unfortunately!

Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 04:47:22 pm
On the contrary, it must assume that. It is the whole and only point of type information.
I was hoping that since you had had a few days to think about it,  you might have at least some doubts (and you should!!)
The Pascal standard (http://pascal-central.com/iso7185.html#6.4.3.2%20Array-types) says I shouldn't, especially regarding arrays:

Quote
Let i denote a value of the index-type ; let Vi denote a value of that component of the array-type that corresponds to the value i by the structure of the array-type ; let the smallest and largest values specified by the index-type be denoted by m and n, respectively ; and let k = (ord(n)-ord(m)+1) denote the number of values specified by the index-type ; then the values of the array-type shall be the distinct k-tuples of the form

    (Vm ,...,Vn ).

NOTE -- 2 A value of an array-type does not therefore exist unless all of its component-values are defined. If the component-type has c values, then it follows that the cardinality of the set of values of the array-type is c raised to the power k.
While TP/Delphi/FPC-style Pascal is not the same as ISO Standard Pascal, the definition of how regular arrays behave is still the same.
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 05:33:49 pm
The Pascal standard (http://pascal-central.com/iso7185.html#6.4.3.2%20Array-types) says I shouldn't, especially regarding arrays:

Quote
Let i denote a value of the index-type ; let Vi denote a value of that component of the array-type that corresponds to the value i by the structure of the array-type ; let the smallest and largest values specified by the index-type be denoted by m and n, respectively ; and let k = (ord(n)-ord(m)+1) denote the number of values specified by the index-type ; then the values of the array-type shall be the distinct k-tuples of the form

    (Vm ,...,Vn ).

NOTE -- 2 A value of an array-type does not therefore exist unless all of its component-values are defined. If the component-type has c values, then it follows that the cardinality of the set of values of the array-type is c raised to the power k.
Rather basic and plain-vanilla stuff that doesn't support or justify your argument. However, what is not stated is interesting. Among the omissions: the compiler being free to _demote_ variable types... assuming that the range is of a particular type, or "assigning" it one. Also conspicuously missing is any statement that zero is a member of positive types only. IOW, there are quite a few things in your flawed argument that the standard (unsurprisingly) does not "address".

Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 05:47:32 pm
The Pascal standard (http://pascal-central.com/iso7185.html#6.4.3.2%20Array-types) says I shouldn't, especially regarding arrays:

Quote
Let i denote a value of the index-type ; let Vi denote a value of that component of the array-type that corresponds to the value i by the structure of the array-type ; let the smallest and largest values specified by the index-type be denoted by m and n, respectively ; and let k = (ord(n)-ord(m)+1) denote the number of values specified by the index-type ; then the values of the array-type shall be the distinct k-tuples of the form

    (Vm ,...,Vn ).

NOTE -- 2 A value of an array-type does not therefore exist unless all of its component-values are defined. If the component-type has c values, then it follows that the cardinality of the set of values of the array-type is c raised to the power k.
Rather basic and plain-vanilla stuff that doesn't support or justify your argument.
It says that only the array elements defined by its index-type exist. So in an array[0..0], only element 0 exists. Addressing a non-existent element in an array is an error.
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 06:20:41 pm
It says that only the array elements defined by its index-type exist. So in an array[0..0], only element 0 exists. Addressing a non-existent element in an array is an error.
In that case, there are more bugs in FPC, as the code snippet below demonstrates:
Code: Pascal
{$APPTYPE CONSOLE}
program RangeChecks;

const
  SPACE = ' ';

var
  AnArray : array[0..0] of ansichar;

begin
  AnArray[0]  := SPACE;      // no problem here

  {$ifdef FPC}
    // -------------------------------------------------------------------------
    // case 1.
    // Delphi emits an error for this expression (as it should)
    // FPC emits a warning but, at least, generates CORRECT code.

    AnArray[5]  := SPACE;    // Delphi won't compile this (which is correct)

    // -------------------------------------------------------------------------
    // case 2.
    // same as above for this case

    AnArray[-5] := SPACE;    // nor this, but FPC does. At least, in this case
                             // it generates CORRECT code for it.
  {$endif}

  // etc
end.
Based on your last statement, compiling case 1 and case 2 are errors, yet FPC compiles them (and it should not). It does emit a warning, which is nice, but that doesn't justify compiling those erroneous statements.

It should also be noted that, while FPC emits a warning in the above cases, when it decided to unceremoniously mutilate a signed variable into an unsigned one, it did so silently, with no warning; and to compound the problem, it then attempted to access the array element with that "mutated" variable which, since it no longer represented the correct value, caused an exception.

You can't have it both ways. If indexing out of range is an error, then FPC should not compile the statements in the code snippet above. If it compiles them, then it should produce code that is semantically correct, in this case address(array) + (n * elementsize).

To its credit, even though it did compile those incorrect statements, it generated semantically correct code; but the warnings don't mitigate the fact that it should _not_ have compiled those statements.

Since you justify _forcing_ variables to be in the declared range of an array, how do you justify that FPC did _not_ turn the indexing numerals 5 and -5 into zero? Or is mutilation a "feature" reserved for variables only?

It seems FPC is rather choosy and unpredictable when it comes to complying with Pascal standards.



Title: Re: Boolean32
Post by: Jonas Maebe on March 23, 2019, 07:06:03 pm
It says that only the array elements defined by its index-type exist. So in an array[0..0], only element 0 exists. Addressing a non-existent element in an array is an error.
In that case, there are more bugs in FPC, as the code snippet below demonstrates.
You are correct that FPC does not generate compile-time errors in all cases where it detects undefined behaviour. I explained why in the other thread: compatibility with older Delphi versions. This will probably be changed into an error at some point, at the very least for the delphiunicode mode (which corresponds to Delphi 2009+) and for the FPC/ObjFPC modes.

Quote
You can't have it both ways. If indexing out of range is an error, then FPC should not compile the statements in the code snippet above. If it compiles them, then it should produce code that is semantically correct, in this case address(array) + (n * elementsize).
As the Pascal standard quoted above explains, that is not the semantics of the array indexing operation in Pascal. It's the semantics of array indexing in C/C++.

Quote
when it decided to unceremoniously mutilate a signed variable into an unsigned one, it did so silently, with no warning
Why do you keep repeating this? By now I must have answered it about five times: with the explanation of how type checking and type conversions work, with the fact that your programs would get hundreds of warnings if we warned in such scenarios, with the comparison with Ada, and with the concept of range checking. The program will not silently go wrong if you enable range checking. That is the whole and sole purpose of range checking: emitting errors at run time for cases that cannot be checked at compile time.

Quote
Since you justify _forcing_ variables to be in the declared range of an array, how do you justify that FPC did _not_ turn the indexing numerals 5 and -5 into zero? Or is mutilation a "feature" reserved for variables only?
Indexing an array outside its bounds is an error. The result of erroneous code is undefined. Turning such indices into zero would therefore be "correct". Not turning them into zero is equally correct (that's why Delphi's behaviour is also correct). It's the difference between undefined (anything can happen at any time) and implementation-defined (the compiler can choose what happens, but it needs to be consistent). Undefined behaviour is unpredictable by its very nature.

That said, I fully agree that consistent behaviour trumps inconsistent behaviour any day, regardless of what the standard allows. This is inherent in trying to be compatible with another compiler's undefined behaviour (which we tried in the past, hence the allowed out-of-range constants), and it's why doing so is such a bad idea (people will expect you to stay compatible with it in all possible forms and future versions, which is perfectly understandable but equally untenable).
Title: Re: Boolean32
Post by: 440bx on March 23, 2019, 10:40:07 pm
You are correct that FPC does not generate compile-time errors in all cases where it detects undefined behaviour. I explained why in the other thread: compatibility with older Delphi versions. This will probably be changed into an error at some point, at the very least for the delphiunicode mode (which corresponds to Delphi 2009+) and for the FPC/ObjFPC modes.
You can't quote standards and then wrap them in a sheet of Delphi compatibility to justify bugs. Obviously, no standard can be used to redefine proven mathematical theory and justify the totally erroneous decision that zero (0) is a positive-only number.

In the particular case we are talking about, the only appropriate error message the compiler could emit would be something along the lines of: "Error: per a creative interpretation of a Pascal standard, the compiler is going to put your 32-bit signed variable into a 64-bit register without extending the sign, because zero (0) is a member of Pascal unsigned types."

At the very least, emphasis on least, FPC should not compile a statement indexed by a constant that is out of range. That is a bug, and it cannot be hidden under the Delphi-compatibility rug, because Delphi doesn't compile it and it obviously violates the Pascal standard too.

As the Pascal standard quoted above explains, that is not the semantics of the array indexing operation in Pascal. It's the semantics of array indexing in C/C++.
The standards passage you quoted does not say anything about how arrays and their elements are arranged in memory, nor how the addresses of said elements are determined. Its main point is the determination of the number of elements in the array and their applicable range. I will also note that it says nothing at all about the type of the range, which you have been using to justify that bug in FPC (or, more accurately and worse, to claim it is not a bug).

Why do you keep repeating this? By now I must have answered it about five times: with the explanation of how type checking and type conversions work, with the fact that your programs would get hundreds of warnings if we warned in such scenarios, with the comparison with Ada, and with the concept of range checking. The program will not silently go wrong if you enable range checking.
I keep repeating it because you are grossly misusing range and type checking to justify FPC's incorrect code generation. An array whose range consists of constants cannot always be assigned a type unambiguously, and when the range is 0..0, no type can be assigned to it at all. Assigning a type to it is mathematically incorrect; forcing that incorrect type onto a variable that _does_ have a declared type is beyond incorrect, it's dismal.
 
That is the whole and sole purpose of range checking: emitting errors at run time for cases that cannot be checked at compile time.
The compiler didn't do any range checking (it wasn't asked to do any, either), but it turned a 32-bit signed integer into a 64-bit unsigned integer. No definition of range checking covers that kind of behavior.

If the compiler had generated code that compared the value (after properly sign-extending the 32-bit value into the 64-bit register) against zero (0), that would be range checking. Mutilating variables is not range checking.

Indexing an array outside its bounds is an error.
That almost sounds reasonable, but then the existence of open arrays would be an error in itself. Those arrays don't even have a declared range, much less a type associated with one. To make matters even worse, their number of elements can vary at run time. That definitely violates the standards passage you quoted.

And if it is an error, then FPC should not compile a reference to an array with a declared range using a constant that is outside that range, yet it does (and, as stated before, Delphi doesn't).

The result of erroneous code is undefined. Turning such indices into zero would therefore be "correct". Not turning them into zero is equally correct (that's why Delphi's behaviour is also correct). It's the difference between undefined (anything can happen at any time) and implementation-defined (the compiler can choose what happens, but it needs to be consistent). Undefined behaviour is unpredictable by its very nature.
You cannot, as you are attempting to, claim that AnArray[SomeIndex] is erroneous code. If it were "erroneous", as you pretend, then indexing open arrays would be erroneous (by the definition you are attempting to slide in, not mine). With a "normal" array, it simply may or may not be correct. The compiler's job is to compute the referenced address _correctly_, emphasis on _correctly_, which FPC fails to do. And it fails to do that not because it didn't generate the correct code to calculate the address, but because it mutilated the indexing variable.

This is inherent in trying to be compatible with another compiler's undefined behaviour (which we tried in the past, hence the allowed out-of-range constants), and it's why doing so is such a bad idea (people will expect you to stay compatible with it in all possible forms and future versions, which is perfectly understandable but equally untenable).
You can't hide that bug behind the incorrect claim of "undefined behavior". If it were "undefined", as you pretend, why do most compilers not only get it right but do it the same way?

FPC had a simple task: take the address of the array and add (index * elementsize) to it. It would have gotten it right had it not "decided" to turn a signed type into an unsigned type, thereby generating an address that was not within the array's allocated memory.

When a compiler generates the wrong address for an array element, that is a bug. That is what FPC does in this case, and there is nothing "erroneous" in the statement "AnArray[AnIndex]", that is, until FPC decides to chop off the index's sign because the index range is "positive" (great: someone should inform mathematicians around the world that zero is positive and correct "a few" books; no doubt Carl Friedrich Gauss would have been impressed).

All that said, it's as unfortunate as it is evident that the bug is here to stay.

Let's agree to disagree.
Title: Re: Boolean32
Post by: ASBzone on March 24, 2019, 09:01:09 pm
It says that only the array elements defined by its index-type exist. So in an array[0..0], only element 0 exists. Addressing a non-existent element in an array is an error.
In that case, there are more bugs in FPC, as the code snippet below demonstrates ...

Doesn't this simply fall into the category of undefined behavior? As in: it cannot be relied upon in any specific way, because the output is not defined?
Title: Re: Boolean32
Post by: ASBzone on March 24, 2019, 09:23:58 pm
Based on your last statement, compiling case 1 and case 2 are errors, yet FPC compiles them (and it should not). It does emit a warning, which is nice, but that doesn't justify compiling those erroneous statements.
...
It seems FPC is rather choosy and unpredictable when it comes to complying with Pascal standards.

With range checking on, FPC does not compile that.
Title: Re: Boolean32
Post by: 440bx on March 24, 2019, 11:52:16 pm
With range checking on, FPC does not compile that.
That demonstrates that FPC is mixing apples and oranges. Range checking isn't meant to change the compile-time data type of a variable. Its purpose is to tell the compiler to generate code that verifies a _variable_ is within the specified range at _run time_, not compile time (since it obviously cannot do so at compile time when a variable is used). Pascal's strong type checking is neither associated with nor dependent on any runtime feature such as range checking.

In a definition such as:
Code: Pascal
type
  TRange = 0..0;
and in any range definition in Pascal, the lower and upper bounds are known, and they limit the acceptable values of the underlying type, which the compiler is forced to _assume_. In this particular example, it is forced to assume the underlying type is integer.

now, in a statement such as:
Code: Pascal
AnArray[-5]
that is, first and foremost, a data-type violation. No correctly implemented Pascal compiler would compile that. Range checking is not necessary for the compiler to determine that there is a data-type violation there. That is as incorrect as
Code: Pascal
AnArray['somestring']

Now, when using a variable instead of a constant, in a statement such as:
Code: Pascal
var
  AnIndex : integer;
  ..
  ..
  AnArray[AnIndex]
the type of AnIndex is assignment-compatible with the compiler's _assumed_ type of the range; therefore, the statement is perfectly valid. Of course, when using a variable, the compiler does not have enough information at compile time to enforce the range boundaries. That's where range checking comes into play: when it is enabled, the compiler can generate code to verify that the variable, in addition to being assignment-compatible (which it checked at compile time against the _assumed_ underlying type of the range), holds a value that is within the range.

The crucial thing to understand is that in no way does this change the code that is generated. The compiler generates the same code to calculate the address the expression refers to and, when range checking is enabled, it also generates code to enforce the range boundaries. The claim that the code is erroneous when using a variable is false, but it is true when using a constant that is not in the range; yet FPC compiles it, and that by itself is a compiler bug.

In that specific case, FPC compiles code that is erroneous (AnArray[-5]) and generates erroneous code for a valid Pascal statement (AnArray[AnIndex]).

Doesn't this simply fall into the category of undefined behavior? As in: it cannot be relied upon in any specific way, because the output is not defined?
The claim that there is something undefined here is yet another instance of mixing apples and oranges. The result when the variable is not within the range is undefined, but there is nothing undefined as far as code generation is concerned.

An example will make this clearer:
Code: Pascal
var
  AShortInt : shortint;
  AnInteger, AnotherInteger : integer;

begin
  AnInteger      := 240;
  AnotherInteger := 240;
  AShortInt := AnInteger + AnotherInteger;

  // etc
The statement
Code: Pascal
  AShortInt := AnInteger + AnotherInteger;
isn't wrong, erroneous or invalid in any way. The compiler generates code to add the two variables, AnInteger and AnotherInteger, and stores the result in AShortInt. The one thing that is legitimately undefined is what the result is going to be, since the sum causes an overflow; but ensuring there is no overflow is the programmer's responsibility, not the compiler's, and much less a justification for the compiler to get "creative" and truncate the values stored in AnInteger and AnotherInteger to force the result to fit in a shortint.

When overflow checking is off, the result will be whatever bit pattern the sum produces, truncated to a shortint; and the compiler does not generate different code when overflow checking is on. It simply adds instructions to verify that the result is within the range of the variable, and it does so only if the programmer requests it.
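For what it's worth, in FPC the check that catches this particular narrowing assignment is range checking ({$R+}) rather than overflow checking ({$Q+}), since the 32-bit addition itself does not overflow. A minimal sketch of the point being made, namely that the checks are merely added on top of the same assignment code:
Code: Pascal
{$R-}
var
  AShortInt: shortint;
  AnInteger, AnotherInteger: integer;
begin
  AnInteger := 240;
  AnotherInteger := 240;
  AShortInt := AnInteger + AnotherInteger; // 480 truncated to 8 bits: -32
  writeln(AShortInt);
  {$R+}
  AShortInt := AnInteger + AnotherInteger; // same add and store, plus a check
                                           // that raises run-time error 201
end.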

That's exactly what it should do, and rather unfortunately isn't doing, when indexing an array with a variable.

When variables are used to access memory, it is the programmer's responsibility to ensure that those references point to the locations they are supposed to reference. It is most definitely not the compiler's prerogative to mutilate variables to force-fit references into a range it incorrectly determined.

Unfortunately, nothing short of a miracle is going to get that bug corrected. Maybe the way to have it fixed is to go to church, or to Lourdes.







