Lazarus

Free Pascal => FPC development => Topic started by: 440bx on March 13, 2019, 06:07:23 am

Title: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 06:07:23 am
Hello,

In 64bit, when using a negative integer index to index into an array, the index isn't loaded into a 64bit register, instead it is loaded into a 32bit register which is then later used as a 64bit register in a load effective address instruction.  This causes the integer index to always be used as if it were a positive displacement.

for the line:
Code: Pascal  [Select]
  1.   LineIdxPtr^[RowIdx][I + 1] := LineAreaPtr;
  2.  
in 32bit, FPC generates :
Code: Pascal  [Select]
  1. // (from GDB) - 32bit code generated  (this works)
  2.  
  3. SignExtend.lpr:132                LineIdxPtr^[RowIdx][I + 1] := LineAreaPtr;
  4. 0040198F 8b1520804100             mov    0x418020,%edx
  5. 00401995 a130804100               mov    0x418030,%eax
  6. 0040199A 8b0d40804100             mov    0x418040,%ecx
  7. 004019A0 8d0482                   lea    (%edx,%eax,4),%eax
  8. 004019A3 8b1510804100             mov    0x418010,%edx
  9. 004019A9 891488                   mov    %edx,(%eax,%ecx,4)      

which works correctly because there is no need to sign extend the integer index RowIdx.

in 64bit, FPC generates:
Code: Pascal  [Select]
  1. // (from GDB) - 64bit code generated (this does NOT work)
  2.  
  3. SignExtend.lpr:132                        LineIdxPtr^[RowIdx][I + 1] := LineAreaPtr;
  4. 0000000100001B02 488b0517e50100           mov    0x1e517(%rip),%rax        # 0x100020020
  5. 0000000100001B09 8b1521e50100             mov    0x1e521(%rip),%edx        # 0x100020030
  6. 0000000100001B0F 48630d2ae50100           movslq 0x1e52a(%rip),%rcx        # 0x100020040
  7. 0000000100001B16 488d04d0                 lea    (%rax,%rdx,8),%rax
  8. 0000000100001B1A 488b15efe40100           mov    0x1e4ef(%rip),%rdx        # 0x100020010
  9. 0000000100001B21 488914c8                 mov    %rdx,(%rax,%rcx,8)
  10.  
in that sequence of code, RowIdx is being loaded into edx, instead of rdx with sign extension.  This causes the instruction
Code: Pascal  [Select]
  1. 0000000100001B16 488d04d0                 lea    (%rax,%rdx,8),%rax
to compute an address which is incorrect.

The attached program works when compiled for 32bit but causes an access violation when compiled for 64bit.

Changing the line
Code: Pascal  [Select]
  1.   LineIdxPtr^[RowIdx][I + 1] := LineAreaPtr;
  2.  
to include a typecast to ptrint or int64:
Code: Pascal  [Select]
  1.   LineIdxPtr^[ptrint(RowIdx)][I + 1] := LineAreaPtr;
  2.  
causes the correct code to be generated.

Just in case, I thought I'd wait for comments before reporting this on Mantis.







Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Jonas Maebe on March 13, 2019, 09:00:40 am
The reason the compiler uses a load without sign extension is that your arrays are declared as having only valid unsigned index types (0..0 and 1..1). I know this is a common way to declare variable sized arrays, but it's still hack and these statements are all invalid (they cause range errors, which means the result is undefined).

The correct way to declare such arrays is to either declare them as array[low(ptrint) div sizeof(elementsize)..high(ptrint) div sizeof(elementsize)], or to use a pointer type to the element size (you can index pointer types like arrays both in FPC and in Delphi).
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 09:06:37 am
The reason the compiler uses a load without sign extension is that your arrays are declared as having only valid unsigned index types (0..0 and 1..1). I know this is a common way to declare variable sized arrays, but it's still hack and these statements are all invalid (they cause range errors, which means the result is undefined).

The correct way to declare such arrays is to either declare them as array[low(ptrint) div sizeof(elementsize)..high(ptrint) div sizeof(elementsize)], or to use a pointer type to the element size (you can index pointer types like arrays both in FPC and in Delphi).
I don't buy that explanation but, leaving that aside.  The fact that the code behaves differently when compiled for 32bit than in 64bit is clearly a problem.   The result should _not_ depend on the bitness.  Either there is a bug when compiling for 64bit or there is a bug when compiling for 32bit.  You choose.

Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: PascalDragon on March 13, 2019, 10:05:04 am
No, it's not a problem, because as Jonas said you've entered the realm of undefined behaviour and it's nowhere written that this undefined behaviour needs to be the same on all platforms.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 11:00:51 am
No, it's not a problem, because as Jonas said you've entered the realm of undefined behaviour and it's nowhere written that this undefined behaviour needs to be the same on all platforms.
RowIdx is an integer.  Specifically, a 32bit integer.  If it is going to be moved into a 64bit register then it _must_ be sign extended.  Anything else is incorrect and cannot be justified.

It is not reasonable to claim that is some kind of "undefined behavior".

That is a bug, a nasty one at that, because the result depends on the target bitness. 



Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: BeniBela on March 13, 2019, 12:38:24 pm
Undefined behaviour? In Pascal??  :o So all the crap is copied from C, and the useful things are left off
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 12:54:13 pm
Undefined behaviour? In Pascal??  :o So all the crap is copied from C, and the useful things are left off

In Pascal "undefined behavior" is now even better than in C, C++ and any other language.
Code: Pascal  [Select]
  1. var
  2.   Index32 : int32 = -1;
  3.   Index64 : int64 = -1;
  4. ...
  5. ...
  6. AnArray[Index32] <> AnArray[Index64];
That really takes "undefined behavior" to a new level.   
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: JernejL on March 13, 2019, 02:47:51 pm
For some reason i find this last example absolutely hilarious :) i love it when logic and common sense falls apart.
 
I do think that this should be consistient behavior - and work same on all 64 and 32 bit platforms, at minimum, if this should not be fixed for some reason - this warrants at least a compiler warning that the result is undefined and that either array should be made start with negative range low, or that index variable should be of typecasted to correct (unsigned) type, as such code could be strangely failing on 64 bit platforms and give people porting issues.
 
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Thaddy on March 13, 2019, 03:33:25 pm
Undefined behaviour? In Pascal??  :o So all the crap is copied from C, and the useful things are left off

In Pascal "undefined behavior" is now even better than in C, C++ and any other language.
Code: Pascal  [Select]
  1. var
  2.   Index32 : int32 = -1;
  3.   Index64 : int64 = -1;
  4. ...
  5. ...
  6. AnArray[Index32] <> AnArray[Index64];
That really takes "undefined behavior" to a new level.
Think hex, ffíng idiot. $FFFFFFFFFFFFFFFF<> $FFFFFFFF, of course. In any pattern it is undefined
Also in the GNU compiler suite and -let me guess - Visual studio.
You can add more f's to taste....
$FFFFFFFFFFFFFFFF<> $00000000FFFFFFFF in a well behaved compiler, which al three are...
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 06:53:17 pm
Think hex, ffíng idiot. $FFFFFFFFFFFFFFFF<> $FFFFFFFF, of course. In any pattern it is undefined
Also in the GNU compiler suite and -let me guess - Visual studio.
You can add more f's to taste....
$FFFFFFFFFFFFFFFF<> $00000000FFFFFFFF in a well behaved compiler, which al three are...
Thaddy, you poor thing, at least try to fake some self control.  I'd ask you to think but, that would really be undefined behavior for you.  Maybe someone close to you can hand hold you to -1 = -1 regardless of how it is represented.  If not, consider applying for government help.

I don't want to traumatize you more than you already are but, Delphi Seattle gives the same result in both, 32 and 64bit.  In your case, it is probably necessary to mention that the result is not the same as FPC's and, that's not all, other compilers, unlike FPC, including C and C++ also produce the same result for 32 and 64bit.  Such a coincidence that time and production tested compilers all have the same "undefined behavior".




Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Jonas Maebe on March 13, 2019, 08:27:41 pm
No, it's not a problem, because as Jonas said you've entered the realm of undefined behaviour and it's nowhere written that this undefined behaviour needs to be the same on all platforms.
RowIdx is an integer.  Specifically, a 32bit integer.  If it is going to be moved into a 64bit register then it _must_ be sign extended.
When you index an array with any expression, this expression gets type-converted to the range type of the array. This is also the step at which range checking gets performed (if enabled). So what gets loaded in a 64bit registers is not RowIdx, but implicit_cast(RowIdx, array_range_type). And array_range_type cannot have negative values according to its declaration.

Quote
The fact that the code behaves differently when compiled for 32bit than in 64bit is clearly a problem.
The code gets interpreted the same on both platforms by the compiler (index = unsigned). However, because on 32 bit platforms an address register is only 32 bit long, you don't notice a difference there.

In general, when code contains a range error, it can even behave differently when compiled with a different compiler version, with different optimization settings, or for a different architecture (even if it is also 32/64 bit). This is not done "because we can", but simply because compiler uses the type information it got from the program (and that is also the only information it can use). What we could do, is remove the taking into account of the upper/lower bound of subrange types to determine whether a type is signed or not (and always consider them signed unless the upper bound is > high(longint) on 32 bit or > high(int64) on 64 bit -- which may be what Delphi does, and which means the behaviour would still be different on 32 and 64 bit, but in a different way), but that would require a complete audit of places in the compiler that use is_signed() would have to be checked to see whether they need to be changed. E.g. all of the bitpacking code will need to be changed, ssince if you interpret 0..3 as a signed 2-bit type, then 3 will be "sign-extended" to -1 when loaded from a bitpacked location if you just make is_signed() return true for this type. It may also result in additional (unwanted) warnings about mixing signed/unsigned types in expressions.

Finally, as you undoubtedly know, range errors can be caught with {$r+} or by using the -Cr command line option. C and C++ have no concept of range checking (although recent versions of clang are finally introducing some support for it). They don't even have a real concept of arrays. E.g., arrayvar[1] and 1[arrayvar] mean exactly the same in C/C++, because they define arrayvar[arrayindex] as being equivalent to *(arrayvar + arrayindex). If you use the Pascal equivalent of these expressions expression (declare arrayvar as a pointer and use either arrayvar[arrayindex] or (arrayvar + arrayindex)^, you also get that same behaviour.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 10:25:42 pm
@Jonas

The range checking argument you are putting forward, simply does not work and it is not even applicable.   A range checking argument cannot be put forward to justify turning a signed type into an unsigned type.  That is simply incorrect and, the short demo program in this post proves it conclusively.

When you index an array with any expression, this expression gets type-converted to the range type of the array. This is also the step at which range checking gets performed (if enabled). So what gets loaded in a 64bit registers is not RowIdx, but implicit_cast(RowIdx, array_range_type). And array_range_type cannot have negative values according to its declaration.
If FPC was doing range checking the way you describe then it would _not_ compile the example program shown below. (cases 1 and 2.)


The code gets interpreted the same on both platforms by the compiler (index = unsigned). However, because on 32 bit platforms an address register is only 32 bit long, you don't notice a difference there.
That is not a valid argument.  The index variable is a signed integer.  The compiler cannot decide to turn it into an unsigned type because it's being used to access an array which can be indexed with an unsigned type.  If you apply what you are saying then the compiler should not even accept a signed type as an index for such an array.

simply because compiler uses the type information it got from the program (and that is also the only information it can use).
You can't use an argument that the compiler "uses type information it got from the program" when the compiler is dropping the sign of a signed data type.  The compiler has been told the indexing variable is a signed integer, it cannot simply turn that into an unsigned integer behind the programmer's back.  That is incorrect.

Even with range checking enabled, the compiler cannot simply decide to change the data type(s) the programmer declared.   It can give an error, a warning, a hint or whatever but, no compiler can simply choose to turn a signed type into an unsigned type. 

here is a sample program which illustrates some of the problems FPC's incorrect sign extension handling/range checking causes:

Code: Pascal  [Select]
  1. {$APPTYPE CONSOLE}
  2.  
  3. {-----------------------------------------------------------------------------}
  4. { NOTE: this code tested with FPC v3.0.4 and Delphi 10 (Seattle)              }
  5. {-----------------------------------------------------------------------------}
  6.  
  7. program RangeChecks;
  8.  
  9. const
  10.   SPACE = ' ';
  11.  
  12. var
  13.   AnArray : array[0..0] of ansichar;
  14.  
  15.   b       : integer;
  16.   AnInt   : integer;         // 32bit integer
  17.  
  18.   AnInt64 : int64;           // 64bit integer
  19.  
  20. begin
  21.   AnArray[0]  := SPACE;      // no problem here
  22.  
  23.   {$ifdef FPC}
  24.     // -------------------------------------------------------------------------
  25.     // case 1.
  26.     // Delphi emits an error for this expression (as it should)
  27.     // FPC emits a warning but, at least, generates CORRECT code.
  28.  
  29.     AnArray[5]  := SPACE;    // Delphi won't compile this (which is correct)
  30.  
  31.  
  32.     // -------------------------------------------------------------------------
  33.     // case 2.
  34.     // same as above for this case
  35.  
  36.     AnArray[-5] := SPACE;    // nor this but, FPC does. At least, in this case
  37.                              // it generates CORRECT code for it.
  38.   {$endif}
  39.  
  40.   // if FPC did range checking the way it should be done, the above statements
  41.   // would NOT compile.  A warning is not good enough.
  42.   //
  43.   // to FPC's credit, while it compiles those incorrect statements, the code
  44.   // it generates for them is correct. (the compiler did what the programmer
  45.   // told it to do.)
  46.  
  47.   // ---------------------------------------------------------------------------
  48.   // case 3.
  49.   // Delphi emits neither a warning nor an error but, generates correct code.
  50.   // if/when runtime range checking is enabled then the problem will be
  51.   // reported.
  52.  
  53.   // FPC neither emits a warning nor a hint and generates INCORRECT code.
  54.  
  55.   // in this case, range checking should be done at runtime and, only if the
  56.   // programmer requested range checking.
  57.  
  58.   b := 3;
  59.   AnInt := (b * 2) div 3;
  60.  
  61.   AnArray[AnInt] := SPACE;   // FPC generates INCORRECT code (no sign extension)
  62.                              // effectively turning a signed int into an
  63.                              // unsigned int.   That is INCORRECT (64bit) and
  64.                              // cannot be justified with a range-checking
  65.                              // argument.
  66.  
  67.                              // Delphi generates correct code.
  68.  
  69.   // ---------------------------------------------------------------------------
  70.   // case 4
  71.   // same as case 3 but using an int64 instead of an int32
  72.  
  73.   AnInt64 := (b * 2) div 3;  // for this expression FPC generates correct code
  74.   AnArray[AnInt64] := SPACE; // and doesn't complain about it.
  75.  
  76.   // ---------------------------------------------------------------------------
  77.   // CONCLUSION:
  78.  
  79.   // FPC compiles statements that it shouldn't compile (cases 1 and 2)
  80.   // FPC's incorrect handling of sign extension makes
  81.   //
  82.   // AnArray[AnInt] <> AnArray[AnInt64]
  83.   //
  84.   // there is no correct argument that can justify that behavior.
  85.  
  86.  
  87.   writeln('press enter/return to end this program');
  88.   readln;
  89. end.  

I sincerely hope this bug is corrected.  The difference in results between 32bit and 64bit have nothing to do with platform in this case, it has everything to do with the fact that in 64bit, FPC decides to turn a signed type into an unsigned type.

FPC is the only compiler I've run into where the particular type of an integer makes a difference when indexing an array.  For the same ordinal value (in this particular example, obviously limited to the range of an int8), AnArray[int8] should equal AnArray[int16] should equal AnArray[int32] should equal AnArray[int64] and, that has nothing to do with range checking, given the same ordinal value, the result should be the same.

Please, we the users, depend on you guys to fix these bugs.  FPC should do sign extension like every other correct compiler does it.  This FPC bug makes porting C/C++ and Delphi code more difficult, time consuming and riskier than it should be.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: marcov on March 13, 2019, 10:37:27 pm
@Jonas

The range checking argument you are putting forward, simply does not work and it is not even applicable.   A range checking argument cannot be put forward to justify turning a signed type into an unsigned type. 

He doesn't say that. He says you pass an signed value to a range that is only positive. That should generate a runtime error/exception when range checking is on (and also Delphi does so, e.g. change  b:=3 to b:=-3 and add {$R+} and uses sysutils)

If you turn off range checks, you are in undefined territory, and anything may happen. The exact codegeneration depends on target.



If you
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 10:59:38 pm
He doesn't say that. He says you pass an signed value to a range that is only positive. That should generate a runtime error/exception when range checking is on (and also Delphi does so, e.g. change  b:=3 to b:=-3 and add {$R+} and uses sysutils)
He is not saying it as clearly as I have because the range checking argument is obviously not applicable but, it is being used to justify incorrect code generation.

By the way, Delphi does not need range checking enabled to refuse the statements AnArray[5] and AnArray[-5] in the example I provided.   How can range checking be used to justify dropping the sign of a signed data type when it compiles statements where the range is obviously wrong ?, not in a correctly formed argument.

In your own words: when range checking is on , it is totally reasonable and expected to get an error but, not so when range checking is off.  Even when range checking is on, Delphi - nor any other properly functioning compiler - turns a signed type into an unsigned type which is what FPC is doing.

If you turn off range checks, you are in undefined territory, and anything may happen. The exact codegeneration depends on target.
That is not the case at all.  For a value n, whether positive or negative, the memory referenced by array[n] is addressof(array) + (n * sizeof(arrayelement)).  There is no "undefined territory" in any of this. 

The range checking argument is neither here nor there.  FPC is dropping the sign of a signed type.  Even with range checking enabled it should NOT do that.  It could report and error but, only if the programmer enables range checking, not otherwise.


Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: ASBzone on March 13, 2019, 11:05:31 pm
Question...


Code: Pascal  [Select]
  1.     AnArray[5]  := SPACE;    // Delphi won't compile this (which is correct)
  2.  
  3.     AnArray[-5] := SPACE;    // nor this but, FPC does. At least, in this case
  4.                              // it generates CORRECT code for it.
  5.  
  6.   b := 3;
  7.   AnInt := (b * 2) div 3;
  8.  
  9.   AnArray[AnInt] := SPACE;   // FPC generates INCORRECT code (no sign extension)
  10.  


@440bx

For that construct, why does AnArray[number_index] not work when AnArray[variable_index] does?
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Jonas Maebe on March 13, 2019, 11:17:29 pm
When you index an array with any expression, this expression gets type-converted to the range type of the array. This is also the step at which range checking gets performed (if enabled). So what gets loaded in a 64bit registers is not RowIdx, but implicit_cast(RowIdx, array_range_type). And array_range_type cannot have negative values according to its declaration.
If FPC was doing range checking the way you describe then it would _not_ compile the example program shown below. (cases 1 and 2.)
I think you are mixing up range checking and type conversions.

FPC, and Delphi, always insert necessary implicit type conversions, regardless of the state of range checking. They are fundamental part of the language. Without them, most code would not compile at all.

If you enable range checking, then whenever an implicit ordinal type conversion is inserted whereby the target type cannot hold every valid value of the source type, the compiler will insert extra code that checks at run time whether this conversion is in fact valid. However, it will not change the behaviour of the implicit type conversion itself, nor will it insert extra type conversions.

Regarding Delphi giving errors even when range checking is not enabled, while FPC only prints warnings: when this was implemented, the then-current version of Delphi (Delphi 7) did not print anything at all when compiling that program, even when range checking was enabled. We tried to both keep compatibility with Delphi code (i.e., compile the code), and give feedback to the programmer about the issue (print a warning when range checking is disabled, print an error when it is enabled). Delphi's behaviour obviously has changed in the mean time, and maybe we should now also always give an error instead (although you can bet that would also result in complaints and bug reports).

The code gets interpreted the same on both platforms by the compiler (index = unsigned). However, because on 32 bit platforms an address register is only 32 bit long, you don't notice a difference there.
That is not a valid argument.  The index variable is a signed integer.  The compiler cannot decide to turn it into an unsigned type because it's being used to access an array which can be indexed with an unsigned type.
[/quote]
It is a basic property of typed languages: if you use an entity of type A to perform an operation that expects an entity of type B, a compiler will always insert a type conversion. After that type conversion (which may simply reinterpret an existing bit pattern, e.g. when converting an integer to another integer type of the same size, or transform it in a more complex way, e.g. when converting an ansichar into a unicodestring), the result will be seen as an entity of this new type rather than of the original type.

See e.g. https://en.wikibooks.org/wiki/Computer_Programming/Type_conversion for more information about the concept of type conversions.

Quote
  If you apply what you are saying then the compiler should not even accept a signed type as an index for such an array.
It should, because type conversions between different integer types are defined in the language. It is possible for signed types to contain values that are also valid for unsigned types (any value that is 0 or positive). If it contains an invalid value, then the result is undefined, except if you enable range checking (in which case a range check error/exception will be thrown).

Quote
Even with range checking enabled, the compiler cannot simply decide to change the data type(s) the programmer declared.
The programmer has declared two different data types:
* the type of the indexing variable
* the type of the array's range type

When using the indexing variable to index the array, the indexing variable gets converted to the indexing type of the array. That is simply how type conversions work. Also in Delphi. It is even that way in C/C++. This particular scenario with array indexing is, however, irrelevant in C/C++ because there array indexing is simply an alias for pointer arithmetic (unlike in Pascal).

In the end, the issue is not whether implicit converting from signed to unsigned types should be possible. If it weren't, you would not be able to assign a longint to a byte or cardinal variable without a warning/error or explicit typecast (if you want a language that forbids this, have a look at e.g. Ada, but it takes quite a bit of getting used to this rigidity if you come from Pascal).

The issue is about what the index/range type of an array[0..0] should be. If it would be signed rather than unsigned, then your code would work the same as under Delphi. On the other hand, the efficiency of some code that is type correct would be lowered (unnecessary sign extensions), and as mentioned before it would require of the compiler's code to be audited as well.

However, regardless of the above, your code will always be invalid according to any Pascal standard, because you are accessing an array using an invalid index.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 13, 2019, 11:18:41 pm
Question...


Code: Pascal  [Select]
  1.     AnArray[5]  := SPACE;    // Delphi won't compile this (which is correct)
  2.  
  3.     AnArray[-5] := SPACE;    // nor this but, FPC does. At least, in this case
  4.                              // it generates CORRECT code for it.
  5.  
  6.   b := 3;
  7.   AnInt := (b * 2) div 3;
  8.  
  9.   AnArray[AnInt] := SPACE;   // FPC generates INCORRECT code (no sign extension)
  10.  


@440bx

For that construct, why does AnArray[number_index] not work when AnArray[variable_index] does?
Because when a number is used, the programmer has given the responsibility to do range checking to the compiler.  When a variable is used it is the programmer's responsibility unless and until the programmer chooses to relinquish that responsibility by enabling runtime range checking. 

Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Martin_fr on March 13, 2019, 11:19:06 pm
He is not saying it as clearly as I have because the range checking argument is obviously not applicable but, it is being used to justify incorrect code generation.

While I actually think, it would be nice to be able to do what your example tries, Jonas is right as far as the current state goes.

Jonas did not say (or mean to say), that the compiler casts the index value to unsigned.
Jonas said that the compiler omits the code for extending the value as signed value.

The compiler knows that the range for the index goes from 0..0 (that is the declaration). None of those values is negative. Therefore signed expansion is never needed, and the code not generated.

As far as the compiler goes: If you then later in your code present any value outside that range, then this is an invalid value and the program does not need to work for incorrect input. More to the point the compiler can not make it work for such a case, because the compiler would not know what the intention is.

That the variable holding the index (which the compiler expects to be 0) is of signed type does not matter. After all it is valid to store 0 in a signed variable.

Declare an array as "foo: array [-1..-1] of bar" and signextension should work.
Unfortunately, then the first element is -1. So if you want to access the element before the declared min-index that becomes -2. (And likely that breaks your code too)

---------------
Now for the case [0..0] it is easy to solve by declaring the variable as "foo: PBar // PBar = ^Bar"

But if you want a 1-based array [1..1], then there is no solution.

So it would be nice, if there was a feature that would allow to write arrays like that, and at the same time tell the compiler what the range should be.
The problem is, if you want the first element to be at 0 (or 1) then you can not have a negative element in the defined index range. Yet that is what you currently need.

Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: marcov on March 13, 2019, 11:28:00 pm
That is not the case at all.  For a value n, whether positive or negative, the memory referenced by array[n] is addressof(array) + (n * sizeof(arrayelement)). 

No, since the array[n]  already implies the conversion to the array index range. So even if it were true, it would be

Code: Pascal  [Select]
  1. addressof(array) + (n' * sizeof(arrayelement)).  

with n' the conversion of n to the range of the array

That in some cases, without range checking turned off, that conversion might turn out differently, is unfortunate, but FPC is a compatible compiler, not an emulator for bad code.

Quote
The range checking argument is neither here nor there.  FPC is dropping the sign of a signed type.  Even with range checking enabled it should NOT do that.  It could report and error but, only if the programmer enables range checking, not otherwise.

Rangechecks do not exist for purity contests only. They guard the boundaries of the typesystem, and with reason. Turn them off, and you better know what you are doing.

In this case, the argument to make an exception is even weaker, since code like this is mostly pre D2009 whatever-works-x86-code, since D2009, pointermath allows to solve the problems more logically.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: ASBzone on March 13, 2019, 11:44:36 pm
Because when a number is used, the programmer has given the responsibility to do range checking to the compiler.  When a variable is used it is the programmer's responsibility unless and until the programmer chooses to relinquish that responsibility by enabling runtime range checking.

Interesting.   Thank you kindly for the response.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 14, 2019, 12:07:29 am
@Martin
That the variable holding the index (which the compiler expects to be 0) is of signed type does not matter. After all it is valid to store 0 in a signed variable.
if the argument that has been presented so far held any water then, the compiler should just move zero (or xor a register) and use that as the index value but, we both know it isn't doing that.

IOW, if type conversion is going to be used to justify what FPC is doing then, the only acceptable "conversion" is to set the index variable to zero.

There is no valid and sensible argument to justify what FPC is doing.


@Marco
No, since the array[n]  already implies the conversion to the array index range. So even if it were true, it would be

Code: Pascal  [Select]
  1. addressof(array) + (n' * sizeof(arrayelement)).  

with n' the conversion of n to the range of the array

That in some cases, without range checking turned off, that conversion might turn out differently, is unfortunate, but FPC is a compatible compiler, not an emulator for bad code.
If that were true then the compiler should simply use 0 since that is the only valid index for the array.  Claiming that the compiler should convert the indexing variable to the index type is absurd since there isn't a data type for every possible combination of start/end array range.


@ASBzone
Interesting.   Thank you kindly for the response.
You're most welcome.  It was a good question, I like those. :)


@Jonas
When using the indexing variable to index the array, the indexing variable gets converted to the indexing type of the array.
You propose that statement as an argument to justify dropping the sign of a signed type while invoking Delphi compatibility to justify compiling AnArray[-5].  I should note that {$MODE DELPHI} was not specified in the code to justify that behavior.

One thing that is rather nice in a computer language and, required in most, is consistency.

In the end, the issue is not whether implicit converting from signed to unsigned types should be possible. If it weren't, you would not be able to assign a longint to a byte or cardinal variable without a warning/error or explicit typecast (if you want a language that forbids this, have a look at e.g. Ada, but it takes quite a bit of getting used to this rigidity if you come from Pascal).
Type conversion should be done when necessary, not when unnecessary and particularly not when it is incorrect to do so (as FPC is doing in this case.)

The bottom line is:

1. Delphi does it correctly.  C/C++ do it correctly.  FPC does not. 
2. with Delphi and C/C++ the results obtained in 32bit and 64bit are the same, not so in FPC.

Also part of the bottom line is: 

This discussion is futile.  It has become patently obvious that this bug is here to stay no matter how much effort someone puts into explaining why it is wrong.  Therefore, I am done discussing this subject.

Thank you to all who have participated in the discussion and, for presenting your viewpoint.   I appreciate it in spite of the fact that, unfortunately, the arguments are rather flawed.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Martin_fr on March 14, 2019, 12:22:44 am
@Martin
That the variable holding the index (which the compiler expects to be 0) is of signed type does not matter. After all it is valid to store 0 in a signed variable.
if the argument that has been presented so far held any water then, the compiler should just move zero (or xor a register) and use that as the index value but, we both know it isn't doing that.
Two different issues. The absence of signed extension also holds true for an array declared [0..99999]
And for that the index is not constant.

Quote
There is no valid and sensible argument to justify what FPC is doing.
This discussion is futile.  It has become patently obvious that this bug is here to stay no matter how much effort someone puts into explaining why it is wrong.  Therefore, I am done discussing this subject.

As I said I support the idea of this being changed.

But it is not a bug. It is a missing feature.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: ASerge on March 14, 2019, 08:56:12 pm
In order for the initial program to compile in Delphi (the latest version), need to replace PtrUInt with NativeUInt. To make it work without errors, disable range check and replace Char/PChar with AnsiChar/PAnsiChar.
In order for the program to work correctly in FPC, need to replace Integer with NativeInt (in FPC, SizeInt instead of NativeInt is most often used).

Now for RangeChecks. It is clear that the range check program does not work. But I will notice that with R+ the compiler issues not a warning, but an error. In addition, I have generated code does not match the comment:
Code: Pascal  [Select]
  1.   AnArray[-5] := SPACE; // nor this but, FPC does. At least, in this case
  2.   // it generates CORRECT code for it.
Code: ASM  [Select]
  1. # [27] AnArray[-5] := SPACE; // nor this but, FPC does. At least, in this case
  2.         movb    $32,U_$P$RANGECHECKS_$$_ANARRAY+251(%rip)

Code: ASM  [Select]
  1. # [45] b := 3;
  2.         movl    $3,%eax
  3. # Var AnInt located in register eax
  4. .Ll7:
  5. # [46] AnInt := (b * 2) div 3;
  6.         movl    $2,%eax
  7. .Ll8:
  8. # [47] AnArray[AnInt] := SPACE; // FPC generates INCORRECT code (no sign extension)
  9.         movb    $32,U_$P$RANGECHECKS_$$_ANARRAY+2(%rip)
  10. # Var AnInt64 located in register rax
  11. .Ll9:
  12. # [56] AnInt64 := (b * 2) div 3; // for this expression FPC generates correct code
  13.         movq    $2,%rax
  14. .Ll10:
  15. # [57] AnArray[AnInt64] := SPACE; // and doesn't complain about it.
  16.         movb    $32,U_$P$RANGECHECKS_$$_ANARRAY+2(%rip)
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 14, 2019, 10:33:53 pm
In order for the initial program to compile in Delphi (the latest version), need to replace PtrUInt with NativeUInt. To make it work without errors, disable range check and replace Char/PChar with AnsiChar/PAnsiChar.
In order for the program to work correctly in FPC, need to replace Integer with NativeInt (in FPC, SizeInt instead of NativeInt is most often used).
Yes, the initial program does need some minor modifications to make it run under Delphi (Seattle for me.)

The second example I provided runs under both, FPC and Delphi (Seattle) without modification.

As you pointed out, if the RowIdx is declared as ptrint/nativeint then the code FPC generates is correct (it doesn't have to do sign extension since the programmer already did it for it.)  If RowIdx is a 32bit signed integer in a 64bit program then the code it generates is incorrect (as is more than known by now, it fails to sign extend the 32bit signed integer.)

Just for completeness, I've attached the versions I used to test with Delphi (Seattle)

Lesson: in FPC declare all integers as ptrint or ptruint or, NativeInt and NativeUInt (or some other equivalent such as sizeint.)  No bad surprises that way.

Thank you for looking into the original code and mentioning the modifications it requires to make it work with a recent version of Delphi.


Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: ASerge on March 16, 2019, 12:03:29 am
Just for completeness, I've attached the versions I used to test with Delphi (Seattle)
I suggest a shorter example instead of SigExtend (with same idea)
Code: Pascal  [Select]
  1. {$APPTYPE CONSOLE}
  2. {$RANGECHECKS OFF}
  3.  
  4. type
  5.   PCardinalIndexArray = ^TCardinalIndexArray;
  6.   TCardinalIndexArray = array[0..2] of Integer;
  7.   TIntegerIndexArray = array[-1..1] of Integer;
  8.  
  9. const
  10.   A: TIntegerIndexArray = (-1, 0, 1);
  11. var
  12.   P: PCardinalIndexArray;
  13.   Index: Integer;
  14. begin
  15.   P := Pointer(@A[0]);
  16.   Index := 0;
  17.   Dec(Index);
  18.   Writeln(P^[Index]);
  19.   Readln;
  20. end.
It works (showing -1) with FPC 32-bit, Delphi (RIO) 32-bit, Delphi 64-bit, but does not with FPC 64-bit (SIGSEGV).
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 16, 2019, 12:36:22 am
It works (showing -1) with FPC 32-bit, Delphi (RIO) 32-bit, Delphi 64-bit, but does not with FPC 64-bit (SIGSEGV).
I think that is a nice and succinct example.  Thank you Serge.

Unfortunately, it looks like that bug won't be corrected but, at least some people will now be aware that, in FPC, they cannot expect a 64bit program to behave as it does in 32bit.

These "little details" make porting code from other languages (or even other flavors of Pascal) more work than it should be.  Very unfortunate.

Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: ASerge on March 16, 2019, 01:32:59 am
It works (showing -1) with FPC 32-bit, Delphi (RIO) 32-bit, Delphi 64-bit, but does not with FPC 64-bit (SIGSEGV).
These "little details" make porting code from other languages (or even other flavors of Pascal) more work than it should be.  Very unfortunate.
When porting to a 64-bit platform, the code sometimes requires conversion. For example, replacing Integer with SizeInt. In this example, this solves the problem.
If you look at RTL, there and it is made.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 16, 2019, 01:42:16 am
When porting to a 64-bit platform, the code sometimes requires conversion. For example, replacing Integer with SizeInt. In this example, this solves the problem.
If you look at RTL, there and it is made.
That's true but, it shouldn't be necessary to change the type of an index.  It is not justifiable that AnArray[int32] <> AnArray[int64] for the same index value still within the range of an int32.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Martin_fr on March 16, 2019, 01:54:57 am
When porting to a 64-bit platform, the code sometimes requires conversion. For example, replacing Integer with SizeInt. In this example, this solves the problem.
If you look at RTL, there and it is made.
That's true but, it shouldn't be necessary to change the type of an index.  It is not justifiable that AnArray[int32] <> AnArray[int64] for the same index value still within the range of an int32.

The problem is that "array [0..0] of" is interpreted as array[cardinal] (or array[qword])
Unfortunately that is even true for "array[integer(0)..integer(0)]".

Again, it would be nice, if at least the latter would be treated different. (or both)
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 16, 2019, 04:22:44 am
The problem is that "array [0..0] of" is interpreted as array[cardinal] (or array[qword])

That by itself is incorrect.  Zero (0) does not have a sign, it is neither positive nor negative.  No correctly implemented compiler can "decide" that zero is positive or negative.  That is simply totally incorrect.

Mathematically:
Quote
An integer is a whole number that can be either greater than 0, called positive, or less than 0, called negative. Zero is neither positive nor negative.

To further aggravate matters, after FPC decides that zero (0) is positive, it forces a variable that has been declared as signed to be unsigned.  That is beyond incorrect. 

And to put a cherry on top of that moldy cake, it only forces the variable to be positive when compiling for 64bit and not for 32bit.

Imagine if in an expression such as:
Code: Pascal  [Select]
  1. var ANumber : integer;
  2.   ...
  3.   ...
  4.   ANumber + 5;
  5.  
FPC decided to "convert" ANumber into an unsigned type because the constant 5 is positive.  THAT is effectively what FPC is doing when openly ignoring and overriding the declared type of the indexing variable.  As obviously ludicrous as doing something like that would be, it would actually have greater justification since, at least the numeral 5, unlike zero (0), is positive.

Again, it would be nice, if at least the latter would be treated different. (or both)
I have to agree, it would be rather nice if FPC did it correctly.  Unfortunately, it looks like neither "nice" nor correct is going to happen in this case.

It seems that, no matter how many different ways I explain why what FPC is doing is simply wrong, it is for naught. What does surprise me is that the majority of mainstream compilers do it correctly which should really clue the developers that their decision to make FPC behave the way it does is, putting it diplomatically, "probably" incorrect.

Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Martin_fr on March 16, 2019, 05:07:06 am
Afaik, cardinal is "unsigned" not "positive".

In any case, "cardinal" contains 0. Yes, so does (signed-)integer.
So from that declaration, either can be used. Both will contain all the possible indexes.

So (for the discussions sake) cardinal is chosen (actually the sub-range 0..0 is chosen, and it is chosen as a subrange of cardinal).

Now (with range check, and other checks off)
  SomeCardinal := SomeIntegerVariable // where SomeIntegerVariable = -5
will assign $ffff...fb  as a positive SomeCardinal.

Equally you can also assign numbers to a variable of subrange type, even if the number is outside that range.
  type TNum = 1..5;
  ANum := SomeIntegerVariable // where SomeIntegerVariable = -5
Not tested, but on 64 bit that probably also give $00000000fffffffb
If it does not, that would IMHO be inconsistent.



So using
  SomeArray[SomeInteger]
where the index is assumed (subrange of) cardinal does the same.

In 32 bit the index value is also treated as cardinal.
Only
  SomeAddressOfElemenZero + $fffffffB
is the same as
  SomeAddressOfElemenZero - 5

---------------------
Again, I am not saying that I agree with this design decision. Just pointing out my understanding of the current state.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Martin_fr on March 16, 2019, 05:18:34 am
Imagine if in an expression such as:
Code: Pascal  [Select]
  1. var ANumber : integer;
  2.   ...
  3.   ...
  4.   ANumber + 5;
  5.  
FPC decided to "convert" ANumber into an unsigned type because the constant 5 is positive.  THAT is effectively what FPC is doing when openly ignoring and overriding the declared type of the indexing variable.  As obviously ludicrous as doing something like that would be, it would actually have greater justification since, at least the numeral 5, unlike zero (0), is positive.

It is not the same. 5 is within the range of integer. No conversion needed.
But:
Code: Pascal  [Select]
  1. AnOtherInteger := AnInteger + cardinal($F1234567)
That does not fit into int.

Afaik Fpc converts both numbers into int64, but the result then get truncated into integer.

Of course do the above, but
Code: Pascal  [Select]
  1. AnOtherInt64 := AnInt64 + qword($F123456701234567)
Then afaik, the qword becomes an int64.

Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 16, 2019, 06:28:10 am
Afaik, cardinal is "unsigned" not "positive".
Formally, any value other than zero (0) is signed.  Cardinal/word/dword/qword/etc means a non-negative-only value which causes the interpretation of the high bit to change thereby allowing higher magnitudes and, that interpretation is valid because the compiler has been explicitly told that there are no negative values in the type/range.  It should also be pointed out that non-negative-only includes zero (0) since it is neither negative nor positive.

In any case, "cardinal" contains 0. Yes, so does (signed-)integer.
Int/Int32/int64/integer also contain zero (0), therefore it is not possible to assign a sign to zero since both "unsigned" and signed data types include that value.  It is incorrect to simply choose, or presume, that zero is positive which is what FPC is doing by "concluding" that sign extension should not be done.

In other words, if you see a statement like this:
Code: Pascal  [Select]
  1. A := 0;
It is not possible to state that A is a signed or unsigned data type because zero (0) does not provide enough information to draw a correct conclusion but, FPC is drawing a conclusion from that case and, as expected, gets it wrong because it cannot be done.

So (for the discussions sake) cardinal is chosen (actually the sub-range 0..0 is chosen, and it is chosen as a subrange of cardinal).
0 is also in a subrange of int16/int32/int64/etc, therefore, it is incorrect to presume it is _only_ in a subrange of an unsigned type such as Cardinal.

  SomeCardinal := SomeIntegerVariable // where SomeIntegerVariable = -5
will assign $ffff...fb  as a positive SomeCardinal.

Equally you can also assign numbers to a variable of subrange type, even if the number is outside that range.
  type TNum = 1..5;
  ANum := SomeIntegerVariable // where SomeIntegerVariable = -5
Not tested, but on 64 bit that probably also give $00000000fffffffb
If it does not, that would IMHO be inconsistent.
But in that example, you are declaring a type and in addition to that, you have values other than zero (0) which provide enough information to unambiguously state that the type/range does not include negative values.

That example does _not_ reflect the situation that causes the bug.

Again, I am not saying that I agree with this design decision. Just pointing out my understanding of the current state.
I understand you are not saying you are in agreement with the decision. I am clear on that. 

What I'd like to make people understand is why what FPC is doing is a legitimate bug.  It is simply wrong.  The "current state", as you put it, is simply incorrect. It's wrong.  It's not a matter of choice, being nice, implementation or "undefined territory", it is provably wrong.  Zero cannot be presumed to be positive nor negative, it is neither.  No type information can be derived from it.

HTH.

 
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 16, 2019, 07:12:06 am
Just to leave no doubt that FPC's behavior in that case is a bug, I wrote the following simple program:

Code: Pascal  [Select]
  1. {$APPTYPE CONSOLE}
  2.  
  3. program SignOfZero;
  4.  
  5. uses
  6.   Math
  7.   ;
  8.  
  9. type
  10.   TZero   = 0..0;
  11.  
  12. var
  13.   AQword  : uint64 = 0;
  14.   AnInt64 : int64  = 0;
  15.  
  16.   ATZero  : TZero  = 0;
  17.  
  18. begin
  19.   {$ifdef VER300}
  20.     writeln('Sign of AQword  : ', Sign(AQword));  // no overloads in FPC for unsigned
  21.                                                   // types, Delphi has them.
  22.   {$endif}
  23.  
  24.   writeln('Sign of AnInt64 : ',  Sign(AnInt64));
  25.  
  26.   writeln('Sign of ATZero  : ',  Sign(ATZero));
  27.  
  28.  
  29.   writeln('press enter/return to end this program');
  30.   readln;
  31. end.
The result is zero in all cases which means neither signed nor unsigned.  The bug mentioned in the first post is caused by FPC "concluding" that zero (0) is  positive which, I hope it is clear now, is incorrect.

Because of that "conclusion" it fails to sign extend a variable which is explicitly declared as signed. 

HTH.

ETA:

Quote
It is not the same. 5 is within the range of integer. No conversion needed.
but, it is also in the range of a qword yet, FPC didn't decide to "convert" the variable into a qword, which is what it is doing when failing to sign extend an integer because it is assuming a data type for a constant.

It's fine and rather commonplace to assume a data type for a constant, it is done all the time when calculating the value of expressions but, that _assumed_ type cannot be forced onto variables (which is what FPC is doing when failing to sign extend a signed integer when moving it into a 64bit register.) Particularly not when the value is the constant zero (0).
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: Akira1364 on March 28, 2019, 02:43:38 am
I'm not going to comment on all the rest, but in general: when indexing into arrays/pointers/e.t.c, you should always use one of the SizeInt, NativeInt, or PtrInt aliases. Using "integer" (which is really 32-bit longint of course) is not really actually "ok" to do on 64-bit.

I know people do it all the time, and it often works, but issues like this are the precise reason you're not supposed to. Same reason it's a bad idea to use "int" instead of "size_t" in C++, for example. Especially if you're declaring specifically positive index ranges, but going negative with them anyways. (Which isn't particularly advisable either IMO.)

If your program is just changed at line 36 like this, as you know I guess:

Code: Pascal  [Select]
  1.   RowIdx      : SizeInt;         // row index
  2.   I           : SizeInt;         // column index

It works fine, as the indices are then always pointer-size.
Title: Re: Likely incorrect 64bit code generation in FPC v3.0.4
Post by: 440bx on March 28, 2019, 03:38:42 am
If your program is just changed at line 36 like this, as you know I guess:

Code: Pascal  [Select]
  1.   RowIdx      : SizeInt;         // row index
  2.   I           : SizeInt;         // column index

It works fine, as the indices are then always pointer-size.
Yes, that is correct and that is the workaround.  As far as indexing an array, as long as the index is numeric and the compiler works as it is supposed to, doing sign extension on signed types for instance, then the size of the integer in use is irrelevant.

There are countless millions of lines of code out there that use int and integer (C/Pascal), no one is going to change all that code to NativeInt, SizeInt, ptrint or any other int because some compiler has "decided" it won't do sign extension when it should.

I believe the advice you gave is good but, it shouldn't be used to circumvent compiler bugs.  It should be used because if someday we go from 64bit to (presumably) 128bit then, those types automatically get the new size which, for things like handles is rather important.

Sadly, that is a bug that won't be fixed. I am most definitely following that advice, I've been following it since I figured out that sign extension occasionally doesn't agree with FPC.