Topic: Untyped hex literals should be non-negative

nanobit

  • Jr. Member
  • **
  • Posts: 86
Untyped hex literals should be non-negative
« on: August 10, 2020, 02:44:24 pm »
Current situation: Almost all (*) untyped hex literals found in source code
are non-negative, because in FPC a hex literal is negative only if it fills 8 bytes and its highest bit (bit 63) is 1.
(Extending this rule further would be less consistent: 64-bit and 128-bit literals could be negative, all others not.)
(*) For this statistic, an array of literals is counted as one entity.

I found some time to write a summary of how untyped hex literals should behave:

First of all: use untyped literals only if you rely on automatic sizing (sizeof(constName)).
Untyped hexadecimal literals should represent only non-negative numbers (as in Delphi, C++ and other languages). "Non-negative" refers to the literal ($...) only. Literals with a sign prefix (-$...) can be stored as a result value or as an expression.

The literal $FFFFFFFFFFFFFFFF should be equal to high(uint64) (unlike in FPC3.2),
and the minimal storage type for this large value is uint64, followed by int128 and uint128.
(A large negative number such as -$FFFFFFFFFFFFFFFF would need to be stored in int128.)

Hexadecimal numbers (e.g. "$100") and decimal numbers (e.g. "256") are at the same abstraction level (input numbers),
which is higher than that of bit patterns (the level of bit operations).

Fortunately, the underlying uint64 bit pattern is also $FFFFFFFFFFFFFFFF.
In theory, though, the untyped hex number $FFFFFFFFFFFFFFFF can be represented by multiple bit patterns (e.g. BCD or a sufficiently wide floating-point type), and the number can also be read as a stored byte array (the lowest abstraction level).

Because the two formats coincide, a hex literal (value 0..high(uint64)) looks identical to its
underlying bit pattern, which has an unsigned type or belongs to the non-negative part of a signed type.
The typecast int64($FFFFFFFFFFFFFFFF) can be used to get the reinterpretation (-1).
About typecasts (explicit or implicit) to a typed target:
Pascal's convention is to use an explicit typecast;
all other assignments are subject to range checking (error if the value is outside the target range).

The new concept needs a decision about the implicit cast from uint64 to int64:
i64 := $FFFFFFFFFFFFFFFF;  // Delphi shows out-of-range warning, but error for all smaller types.
The more consistent (stricter) solution would be:
i64 := int64($FFFFFFFFFFFFFFFF); // explicit typecast always required
i64 := -$1; // is in target range [-$8000000000000000..$7FFFFFFFFFFFFFFF]
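
A minimal sketch of the explicit-typecast style discussed above, assuming Free Pascal in objfpc mode (QWord is FPC's name for uint64). The program name is only illustrative. Because every literal with bit 63 set is wrapped in a cast, the result should not depend on whether the compiler treats the bare literal as signed or unsigned:

```pascal
program HexCastSketch;
{$mode objfpc}
var
  i64: Int64;
  u64: QWord;
begin
  // Explicit reinterpretation of the bit pattern as a signed value
  i64 := Int64($FFFFFFFFFFFFFFFF);
  // Explicit unsigned interpretation of the same bit pattern
  u64 := QWord($FFFFFFFFFFFFFFFF);
  WriteLn('signed:   ', i64);
  WriteLn('unsigned: ', u64);

  // Range-compliant alternative: a sign prefix instead of a bit pattern
  i64 := -$1;
  WriteLn('signed:   ', i64);
end.
```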

To help port older source code, FPC could also demand an explicit typecast
on every u64 hex literal (not only in assignments) that fulfills this condition:
(u64 > high(int64)) and ((targetType <> uint64) or (targetType indefinite)).
Here (u64 > high(int64)) means that byte 7 (of bytes 7..0) is in the range $80..$FF.

Example of noticeable difference:

i64 := -1;
// in Delphi is:
assert( i64 <> $FFFFFFFFFFFFFFFF);
assert( i64 = int64($FFFFFFFFFFFFFFFF));

// in FPC3.2 is:
assert( i64 = $FFFFFFFFFFFFFFFF);

// Delphi calls overloaded functions for uint64 (instead of int64):
writeSomeInt( $FFFFFFFFFFFFFFFF);

********************************************************
Advice for programmers:
If you have untyped hex literals with (bit63 = 1) in your source code:
[$8000000000000000..$FFFFFFFFFFFFFFFF], then declare them with typecasting
to make your intended interpretation portable between compilers:
int64($Fxxxxxxxxxxxxxxx) // this is interpretation of untyped literal in FPC3.2
uint64($Fxxxxxxxxxxxxxxx) // this is interpretation in Delphi and possibly future FPC.
The untyped literal was auto-size, thus you may use (if value allows) a smaller type:
Example: longint($8xxxxxxx) instead of int64($FFFFFFFF8xxxxxxx)
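
The advice above might look like this in practice. This is only a sketch for Free Pascal; all constant names are hypothetical, and the concrete values are arbitrary examples with bit 63 (or bit 31) set:

```pascal
program PortableHexConstSketch;
{$mode objfpc}
const
  // Hypothetical names; each cast states the intended interpretation explicitly.
  SignedTop   = Int64($8000000000000000);  // signed reading of the bit pattern
  UnsignedTop = QWord($8000000000000000);  // unsigned reading of the bit pattern
  // A smaller type can carry the same negative value:
  SmallNeg    = LongInt($80000000);
  WideNeg     = Int64($FFFFFFFF80000000);
begin
  WriteLn(SignedTop);
  WriteLn(UnsignedTop);
  // Compares the 32-bit and the 64-bit spelling of the same negative number
  WriteLn(SmallNeg = WideNeg);
end.
```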

If you are looking for even more fun:
Negative values could also be represented as expressions:
(-$mag) with (mag := (not negInt64Bitpattern) + 1):
-$1   { = int64($FFFFFFFFFFFFFFFF) = -1}
-$2   { = int64($FFFFFFFFFFFFFFFE) = -2}
-$7FFFFFFFFFFFFFFF  { = int64($8000000000000001) = -2**63 + 1}
-$8000000000000000  { = int64($8000000000000000) = -2**63 = low(int64)}

Warning: the last expression, "-$8000000000000000", might not be supported everywhere yet.
Reason: Negate(u64) may technically fail for (u64 > high(int64)).
And as long as the untyped $8000000000000000 has a negative interpretation,
we would actually need "-uint64($8000000000000000)".
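
The magnitude rule (mag := (not negInt64Bitpattern) + 1) can be checked with a small sketch, assuming Free Pascal. MagnitudeOf is a hypothetical helper, not part of any library; overflow and range checks are switched off because the wrap-around is intended, and all bit patterns are cast to QWord so the bare literal's signedness does not matter:

```pascal
program NegMagnitudeSketch;
{$mode objfpc}
{$Q-}{$R-}  // wrap-around arithmetic is intended here

// Hypothetical helper: magnitude of the negative int64 value whose
// two's-complement bit pattern is passed in `bits`.
function MagnitudeOf(bits: QWord): QWord;
begin
  Result := (not bits) + QWord(1);
end;

begin
  // Each line prints the hex magnitude to write after "-$"
  WriteLn(HexStr(MagnitudeOf(QWord($FFFFFFFFFFFFFFFF)), 16));  // 0000000000000001
  WriteLn(HexStr(MagnitudeOf(QWord($FFFFFFFFFFFFFFFE)), 16));  // 0000000000000002
  WriteLn(HexStr(MagnitudeOf(QWord($8000000000000001)), 16));  // 7FFFFFFFFFFFFFFF
  WriteLn(HexStr(MagnitudeOf(QWord($8000000000000000)), 16));  // 8000000000000000
end.
```

The last line shows why low(int64) is the awkward case: its magnitude does not fit into int64, only into an unsigned 64-bit value.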

********************************************************
http://docwiki.embarcadero.com/RADStudio/Rio/en/Declared_Constants
Delphi has implied max-type uint64 for untyped hex-literals ($FFFFFFFFFFFFFFFF = high(uint64)),
but certain typed locations also accept untyped negative int64Bitpatterns (warning might occur, but no error):

const i: int64 = $FFFFFFFFFFFFFFFF;  { targettype int64, becomes (-1)}
const a: array[0..3] of int64 = ($FFFFFFFFFFFFFFFF, ...);
int64Var := $FFFFFFFFFFFFFFFF; { targettype int64}
case int64Var of $FFFFFFFFFFFFFFFF:...; else .. end;
for int64Var := 0 downto $FFFFFFFFFFFFFFFF do ...;
writeSomeInt64( $FFFFFFFFFFFFFFFF); // calls the int64 function only if all preferred
                                    // (uint64, float) overloaded functions are absent

The reason is Delphi's implicit typecast from uint64 to int64:
the input literal is accepted as (-1), as intended, but it would be better (stricter) to
typecast these literals explicitly as well or, even better, to provide range-compliant values:
const i: int64 = -$1; // in target range [-$8000000000000000..$7FFFFFFFFFFFFFFF]

********************************************************
A method to find hex-literals of range [$8000000000000000..$FFFFFFFFFFFFFFFF]:
Lazarus -> Search -> Find in Files:
Enable "Regular expressions" for "Text to find":
\$[8-9a-fA-F][0-9a-fA-F]{15}
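
Outside the IDE, the same pattern could be tried with the RegExpr unit that ships with Free Pascal. This is a sketch under that assumption; the sample string is made up for illustration. Note that the pattern only catches literals written with exactly 16 hex digits starting at the "$", so spellings with leading zeros are not found:

```pascal
program FindHighBitHexSketch;
{$mode objfpc}{$H+}
uses
  RegExpr;
var
  re: TRegExpr;
  sample: string;
begin
  // Made-up sample line; only the first and third literal have bit 63 set
  sample := 'a := $FFFFFFFFFFFFFFFF; b := $7FFFFFFFFFFFFFFF; ' +
            'c := $8000000000000000; d := $80;';
  re := TRegExpr.Create;
  try
    re.Expression := '\$[8-9a-fA-F][0-9a-fA-F]{15}';
    if re.Exec(sample) then
      repeat
        // Match[0] is the whole match, MatchPos[0] its position in the input
        WriteLn(re.Match[0], ' at position ', re.MatchPos[0]);
      until not re.ExecNext;
  finally
    re.Free;
  end;
end.
```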
« Last Edit: August 18, 2020, 01:14:52 pm by nanobit »

nanobit

  • Jr. Member
  • **
  • Posts: 86
Re: Untyped hex literals should be non-negative
« Reply #1 on: August 12, 2020, 10:48:08 am »
Example with many negative hex literals:
https://bugs.freepascal.org/view.php?id=37554
The hex literals there are untyped (default type).
Currently they would remain int64 values even if only the array type were changed:
const CmpArray: array[0..88] of uint64 = (...);

If old Delphi versions really had int64 literals too (I don't know, cannot check),
then new Delphi made a hard switch (reinterpretation) here:
The untyped hex literals are and remain always uint64 regardless of array type.

const CmpArray: array[0..88] of int64 = (...);
The Delphi programmer is warned about the literals (that they are > high(int64)).
Still, the compiler imports the declared values into the array:
$FFFFFFFFFFFFFFFF becomes -1 (i.e. a typecast from uint64 to int64).
« Last Edit: August 12, 2020, 03:19:22 pm by nanobit »

Thaddy

  • Hero Member
  • *****
  • Posts: 10526
Re: Untyped hex literals should be non-negative
« Reply #2 on: August 12, 2020, 01:56:14 pm »
As stated before, literal untyped consts are signed by definition.
You can force an untyped const with a cast, though.

MarkMLl

  • Hero Member
  • *****
  • Posts: 1374
Re: Untyped hex literals should be non-negative
« Reply #3 on: August 12, 2020, 03:18:07 pm »
OP does deserve credit for the thoroughness of his bug report. However, this is something that has been discussed many times, and even if there were consensus that this should be fixed, it's fair to assume that finding all affected areas of the compiler would be non-trivial.

(Tongue firmly in cheek) Of course, it is refreshing to find that the core developers acknowledge that C has the right idea, and that without qualification all types should be considered unsigned.

MarkMLl
Turbo Pascal v1 on CCP/M-86, multitasking with LAN and graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.

nanobit

  • Jr. Member
  • **
  • Posts: 86
Re: Untyped hex literals should be non-negative
« Reply #4 on: August 12, 2020, 09:36:17 pm »
I had only limited time to research options for moving to a default of uint64.
But my early conclusion is that the potential problems are not hidden: all
potential culprits are directly visible (as negative literals) in the source code.

Attention must focus on:
1) In expressions and calls: change unnamed negative
literals $FFFFFFFFFFFFFFFF to: int64($FFFFFFFFFFFFFFFF)
2) Change untyped negative constants (const c64 = $FFFFFFFFFFFFFFFF;) to:
const c64 = int64($FFFFFFFFFFFFFFFF);

Other places of $FFFFFFFFFFFFFFFF may also use this explicit typecast,
but they would also work correctly due to implicit typecast (to int64) as in Delphi.
Example: In Delphi, in the previous array example, the array got the right values
without my intervention, in spite of different default (uint64),
and the literals have no other use than for array loading.
I see large potential, but more investigation is needed.

Hopefully further investigations won't dry up before seeing all options.
Delphi also found a solution for this.

MarkMLl

  • Hero Member
  • *****
  • Posts: 1374
Re: Untyped hex literals should be non-negative
« Reply #5 on: August 12, 2020, 09:52:33 pm »
It is not safe to assume that stuff internal to the compiler doesn't use literal-hex flags or bitmaps.

ASBzone

  • Sr. Member
  • ****
  • Posts: 476
  • Automation leads to relaxation...
    • Free Console Utilities for Windows from BrainWaveCC
Re: Untyped hex literals should be non-negative
« Reply #6 on: August 12, 2020, 10:09:03 pm »
Quote from: MarkMLl on August 12, 2020, 03:18:07 pm
OP does deserve credit for the thoroughness of his bug report.

I concur.

<backs away slowly from the rest of the post>  :-X
-ASB: https://www.BrainWaveCC.com

Lazarus v2.0.11 r64032 / FPC v3.2.1-r47152 (via FpcUpDeluxe) -- Windows 64-bit install w/32-bit cross-compile
Primary System: Windows 10 Pro x64, Version 2009 (Build 19042.572)
Other Systems: Windows 10 Pro x64, Version 2004 or greater

PascalDragon

  • Hero Member
  • *****
  • Posts: 2275
  • Compiler Developer
Re: Untyped hex literals should be non-negative
« Reply #7 on: August 12, 2020, 10:13:35 pm »
Quote from: MarkMLl on August 12, 2020, 03:18:07 pm
(Tongue firmly in cheek) Of course, it is refreshing to find that the core developers acknowledge that C has the right idea, and that without qualification all types should be considered unsigned.

Source please. I can't find anything about any of the core devs acknowledging that in the recent threads.

MarkMLl

  • Hero Member
  • *****
  • Posts: 1374
Re: Untyped hex literals should be non-negative
« Reply #8 on: August 12, 2020, 11:14:41 pm »
Quote from: MarkMLl on August 12, 2020, 03:18:07 pm
(Tongue firmly in cheek) Of course, it is refreshing to find that the core developers acknowledge that C has the right idea, and that without qualification all types should be considered unsigned.

Quote from: PascalDragon on August 12, 2020, 10:13:35 pm
Source please. I can't find anything about any of the core devs acknowledging that in the recent threads.

My recollection (from when I have raised this in the past) is that 64-bit hex without a qword() type transfer is assumed to be signed. If my recollection is correct, I'd suggest that's implicit acknowledgement that C's got it right.

PascalDragon

  • Hero Member
  • *****
  • Posts: 2275
  • Compiler Developer
Re: Untyped hex literals should be non-negative
« Reply #9 on: August 12, 2020, 11:40:59 pm »
Quote from: MarkMLl on August 12, 2020, 03:18:07 pm
(Tongue firmly in cheek) Of course, it is refreshing to find that the core developers acknowledge that C has the right idea, and that without qualification all types should be considered unsigned.

Quote from: PascalDragon on August 12, 2020, 10:13:35 pm
Source please. I can't find anything about any of the core devs acknowledging that in the recent threads.

Quote from: MarkMLl on August 12, 2020, 11:14:41 pm
My recollection (from when I have raised this in the past) is that 64-bit hex without a qword() type transfer is assumed to be signed. If my recollection is correct, I'd suggest that's implicit acknowledgement that C's got it right.

Yes, a 64-bit hexadecimal pattern is assumed to be signed (that's after all what the recent threads (including this one) are about). But why should that be an implicit acknowledgement that C got it right considering you wrote "that without qualification all types should be considered unsigned" (emphasis mine)?

MarkMLl

  • Hero Member
  • *****
  • Posts: 1374
Re: Untyped hex literals should be non-negative
« Reply #10 on: August 13, 2020, 08:56:25 am »
Come on, I /did/ say tongue in cheek :-)

PascalDragon

  • Hero Member
  • *****
  • Posts: 2275
  • Compiler Developer
Re: Untyped hex literals should be non-negative
« Reply #11 on: August 13, 2020, 08:59:46 am »
Quote from: MarkMLl on August 13, 2020, 08:56:25 am
Come on, I /did/ say tongue in cheek :-)

With written text it's not really easy to decide what the sarcastic, etc. part was. Not to mention that despite me having a rather good grasp of written English it is not my mother tongue ;)

MarkMLl

  • Hero Member
  • *****
  • Posts: 1374
Re: Untyped hex literals should be non-negative
« Reply #12 on: August 13, 2020, 09:23:05 am »
Quote from: MarkMLl on August 13, 2020, 08:56:25 am
Come on, I /did/ say tongue in cheek :-)

Quote from: PascalDragon on August 13, 2020, 08:59:46 am
With written text it's not really easy to decide what the sarcastic, etc. part was. Not to mention that despite me having a rather good grasp of written English it is not my mother tongue ;)

I've just read back and spotted my typo... I did, of course, in a humorous fashion intend to suggest that the FPC developers intended that without qualification numbers should be considered /signed/, rather than unsigned.

PascalDragon

  • Hero Member
  • *****
  • Posts: 2275
  • Compiler Developer
Re: Untyped hex literals should be non-negative
« Reply #13 on: August 13, 2020, 09:30:57 am »
Quote from: MarkMLl on August 13, 2020, 08:56:25 am
Come on, I /did/ say tongue in cheek :-)

Quote from: PascalDragon on August 13, 2020, 08:59:46 am
With written text it's not really easy to decide what the sarcastic, etc. part was. Not to mention that despite me having a rather good grasp of written English it is not my mother tongue ;)

Quote from: MarkMLl on August 13, 2020, 09:23:05 am
I've just read back and spotted my typo... I did, of course, in a humorous fashion intend to suggest that the FPC developers intended that without qualification numbers should be considered /signed/, rather than unsigned.

Ah, that explains that better ;)

MarkMLl

  • Hero Member
  • *****
  • Posts: 1374
Re: Untyped hex literals should be non-negative
« Reply #14 on: August 13, 2020, 09:36:34 am »
Yeah, total "facepalm" on my part... you'll have to excuse me, /modern/ isn't my native language :-)
