
Author Topic: Type inference and constant values  (Read 3747 times)

440bx

  • Hero Member
  • *****
  • Posts: 3944
Type inference and constant values
« on: August 06, 2020, 09:22:59 pm »
Hello,

I was reading the thread https://forum.lazarus.freepascal.org/index.php/topic,50896.msg372511.html#msg372511, which is now locked, and wanted to provide an explanation that hopefully makes it clear why the compiler behaves the way it does, i.e., why it requires an _apparently_ superfluous typecast.

Consider this:
Code: Pascal  [Select][+][-]
const
  ACONSTANT = $FF;

var
  Variable : integer = 100;

begin
  Variable := Variable + ACONSTANT;
end.
Should the result of that expression be 355 (100 + 255) or 99 (100 - 1)? Without knowing the type associated with the constant, the compiler finds the expression ambiguous, and there is no way for it to resolve the ambiguity. The problem is that there is no way to decide whether $FF should be interpreted as signed or unsigned; there is simply not enough information in $FF to make that decision unambiguously.
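
A small demonstration of my own (not from the thread) makes the two readings concrete: an explicit cast selects each interpretation of the same bit pattern.
Code: Pascal  [Select][+][-]
program CastDemo;
var
  Variable : integer;
begin
  Variable := 100;
  { read the $FF bit pattern as unsigned: 100 + 255 }
  writeln(Variable + byte($FF));      // 355
  { read the same bit pattern as signed: 100 + (-1) }
  writeln(Variable + shortint($FF));  // 99
end.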

Now, what happens if, instead of a constant, the declaration were a typed variable? Like this:
Code: Pascal  [Select][+][-]
var
  AByte     : byte = $FF;

  Variable  : integer = 100;

begin
  Variable := Variable + AByte;
end.
In this case there is _apparently_ no problem. AByte should be 255, but there is still a problem: in the constant $FF there is simply no type information, so it could be interpreted either as -1 (which would be a range error in this case) or as 255.

The different ways of seeing what $FF is occur in the scanning portion of the compiler, _not_ the parser. When the scanner finds $FF, it has to _somehow_ return that value to the parser. In the case of FPC, it picks an integer type (IntXX/int32/int64) and tells the parser that there is a "symbol_numeral, value = -1". IOW, the parser gets whatever type the scanning portion of the compiler decided to use to represent that bit sequence which, as the example above shows, isn't necessarily the desired interpretation (in the example above, -1 is a range error, since a "byte" cannot accommodate a negative value.)

It's important to realize that the scanner doesn't parse. The scanner has _no_ clue whatsoever that the target type is "byte". All it knows is that there is a hex constant in the source and that it has to produce and send an "equivalent" value to the parser. FPC's implementation picks "integer", produces the equivalent value based on that interpretation, and hands it over to the parser.

Hopefully, that makes it clear why the additional, apparently superfluous, cast is necessary.

HTH.




(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Type inference and constant values
« Reply #1 on: August 06, 2020, 09:40:26 pm »
And once again: it should be adapting to the type on the left side.
Untyped constants are just that, untyped, and when there is no guidance from the left it should default to what the compiler wants, and that is an int.

It's basic logic, and Delphi gets it right every time.

That's OK, I think I am giving up on it anyway.

If only Lazarus could compile with and use the Delphi compiler, it would be perfect.

The only thing missing in the Delphi compiler is the ability to inherit a helper.

Meanwhile, a lot of my coding time has been spent using Delphi. Oh well.

The only true wisdom is knowing you know nothing

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Type inference and constant values
« Reply #2 on: August 06, 2020, 09:56:23 pm »
And once again. It should be adapting to the type on the left side.
The problem is, the scanner cannot adapt to the type on the left side because the scanner doesn't have a clue what type $FF is.

There is a way of handling this problem (I don't know if, and how, Delphi does it): one way to solve it is for the scanner to emit a "symbol_bitsequence_<size>, value = bitsequence", which allows the parser, which knows the type (it parsed it), to interpret that bit sequence correctly.

That works BUT it requires the compiler to have an "internal type" that is not part of the language itself. It also affects the algorithms used for type compatibility and the operators usable in an expression. For instance, a "byte_sequence" would be valid in an expression that does binary operations, e.g., "a and b and $FF", but would not work in integer expressions, because it is not possible to compute the expression's value correctly without having an integer type associated with the $FF constant.
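
Purely for illustration, a minimal sketch of what such a token could look like; the names and layout are my own invention, not FPC internals:
Code: Pascal  [Select][+][-]
program BitSeqDemo;
type
  { the scanner emits only the raw pattern and its width }
  TBitSequence = record
    Bits : qword;  // uninterpreted bit pattern
    Size : byte;   // significant bits: 8 for $FF, 16 for $FFFF, ...
  end;

{ the parser, which knows the target type, interprets the pattern }
function AsByte(const B: TBitSequence): byte;
begin
  Result := byte(B.Bits);  // 255 for $FF; no signedness is ever implied
end;

var
  Seq : TBitSequence;
begin
  Seq.Bits := $FF;
  Seq.Size := 8;
  writeln(AsByte(Seq));  // 255
end.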

Interestingly, FPC sees $FF (and its longer siblings) in an expression as an unsigned constant.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Type inference and constant values
« Reply #3 on: August 07, 2020, 01:58:47 am »
There would be a few better solutions (at least in my opinion) to that problem:

1. Literals like in C and C++, where $FFU would be unsigned, $FFUL would be unsigned long, and so on. Then it can be defined that constants are always signed 32 bits unless they have a suffix identifying them otherwise.

2. Use another intermediate representation for the lexer and let the parser decide (a rough sketch follows this list). I think this is how LLVM handles it: internally it simply uses arbitrary-precision integers and does the type casting at the very end, when all the necessary information is available (it also has something to do with optimization, I guess).

3. Simply treat everything without a - as positive. $FF would always be 255, because if you wanted -1 you would have written -1. This is, in my opinion, the most intuitive solution: if you write a positive number you get a positive number, and if you want a negative number you write a negative number.
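
A rough sketch of the second idea (all names are mine, purely illustrative): the lexer hands over the raw magnitude and the parser range-checks it against the target type before committing to a value. It also shows the range check the third idea implies for $FF in a signed 8-bit context.
Code: Pascal  [Select][+][-]
program AdaptDemo;
uses SysUtils;

{ adapt the raw magnitude to the target type's range;
  Raw is never negative here, so only the upper bound can fail }
function AdaptToRange(Raw: qword; LoBound, HiBound: int64): int64;
begin
  if Raw > qword(HiBound) then
    raise ERangeError.Create('literal does not fit the target type');
  Result := int64(Raw);
end;

begin
  writeln(AdaptToRange($FF, Low(byte), High(byte)));  // 255: fits a byte
  try
    writeln(AdaptToRange($FF, Low(shortint), High(shortint)));
  except
    on E: ERangeError do
      writeln(E.Message);  // $FF does not fit a signed 8-bit integer
  end;
end.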

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Type inference and constant values
« Reply #4 on: August 07, 2020, 04:44:01 am »
There would be a few better solutions (at least in my opinion) to that problem
There is always more than one way to skin a cat, but the solutions you propose all have the same deficiency: they all assume that the hex literal will be used as a number (signed or unsigned) instead of possibly as a bitmask.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: Type inference and constant values
« Reply #5 on: August 07, 2020, 10:05:38 am »
440bx: and on top of that (your first post), the computation must happen in a common type, and if one operand is signed, there is no common type that covers both the largest signed and the largest unsigned integer.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Type inference and constant values
« Reply #6 on: August 07, 2020, 01:32:02 pm »
There is always more than one way to skin a cat, but the solutions you propose all have the same deficiency: they all assume that the hex literal will be used as a number (signed or unsigned) instead of possibly as a bitmask.

But bitmasks are numbers. Take, for example, my second suggestion: any value would be handed from the lexer to the parser as an arbitrary-precision integer. The parser can then look at the target type and create a typed value from it. This way $FF would be an unsigned 8-bit integer when the context is byte, a signed 8-bit integer when the context is shortint, etc.

The first approach makes it even easier: you just define a suffix for every possible type, so the programmer has full control over which type is used.

The last approach would in the end boil down to the second approach, but would allow for error checking: trying to put $FF into a signed 8-bit integer would be a range check error.

All in all, the fact that you needed to write this post to explain the behaviour of FPC seems to me an indicator that the system, as it currently stands, is pretty bad and unintuitive.

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Type inference and constant values
« Reply #7 on: August 07, 2020, 02:10:36 pm »
But bitmasks are numbers.
They _can_ be interpreted as numbers, but doing so is often not convenient (and/or problematic); in addition, whenever they are interpreted as numbers, they must be assigned either a signed or an unsigned type, and there are always cases where the sign assigned is not the desirable (or convenient) one. Even though a bitmap can be, and very often is, expressed as a number, a bitmap is a structural map, not really a number.

All in all, the fact that you needed to write this post to explain the behaviour of FPC seems to me an indicator that the system, as it currently stands, is pretty bad and unintuitive.
It's not just FPC that behaves in somewhat less than desirable ways when it comes to compiler constants. Pascal's ability to define and manipulate compile-time constants is not particularly stellar.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

process_1

  • Guest
Re: Type inference and constant values
« Reply #8 on: August 07, 2020, 02:14:15 pm »
All in all, the fact that you needed to write this post to explain the behaviour of FPC seems to me an indicator that the system, as it currently stands, is pretty bad and unintuitive.

It is always a problem when some arbitrary rule makes life worse than it should be. To be honest, this is the first time I have seen in the FPC documentation that it implicitly converts unsigned numbers to signed ones. Surprisingly, positive constant literals >= 2^63 suddenly become negative!?

And that behavior is documented as being by design on 64-bit systems as well? Ouch!
« Last Edit: August 07, 2020, 02:16:30 pm by process_1 »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: Type inference and constant values
« Reply #9 on: August 07, 2020, 02:18:29 pm »
There is always more than one way to skin a cat, but the solutions you propose all have the same deficiency: they all assume that the hex literal will be used as a number (signed or unsigned) instead of possibly as a bitmask.

But bitmasks are numbers.

Actually, the Wirthian school usually sees bitmasks as sets. E.g., BITSET is predefined in Modula-2, and the logical operators work only on them.

The "map everything onto INT" is a C-ism stemming from C's untyped early history, mostly to conserve compiler memory so as to compile larger files, most notably on a PDP-7. (Unix work was done on both the PDP-11 and the PDP-7, the latter being the weaker one.)

There would be a few better solutions (at least in my opinion) to that problem:

1. Literals like in C and C++, where $FFU would be unsigned, $FFUL would be unsigned long, and so on. Then it can be defined that constants are always signed 32 bits unless they have a suffix identifying them otherwise.

Well, that is how FPC works too, except that the syntax is more cast-like than suffix-like:

Code: Pascal  [Select][+][-]
const
  x = qword($FFFFFFFFFFFFFFFF);

Quote from: Warfley

2. Use another intermediate representation for the lexer and let the parser decide. I think this is how LLVM handles it: internally it simply uses arbitrary-precision integers and does the type casting at the very end, when all the necessary information is available (it also has something to do with optimization, I guess).

Yeah, but keep in mind that a constant in C is a #define, i.e. a preprocessor substitution, while Pascal exports a constant via the unit system. Context-dependent processing is not exactly a clean design principle, but being clean is a ship that sailed for C/C++ a long time ago.

For Borland-derived dialects too, btw, just in different ways.

Quote
3. Simply treat everything without a - as positive. $FF would always be 255, because if you wanted -1 you would have written -1. This is, in my opinion, the most intuitive solution: if you write a positive number you get a positive number, and if you want a negative number you write a negative number.

Of course, you can redo the entire type system and make another language to fix one warning or typecast involving the highest unsigned integer type, and fork what remains of the language into oblivion!

But that is simply not realistic.
« Last Edit: August 07, 2020, 02:58:42 pm by marcov »

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Type inference and constant values
« Reply #10 on: August 07, 2020, 02:24:55 pm »
They _can_ be interpreted as numbers, but doing so is often not convenient (and/or problematic); in addition, whenever they are interpreted as numbers, they must be assigned either a signed or an unsigned type, and there are always cases where the sign assigned is not the desirable (or convenient) one. Even though a bitmap can be, and very often is, expressed as a number, a bitmap is a structural map, not really a number.
Ok, now I get what you are saying, but in that case using hex numbers poses the exact same problem, because hex is just another way to write numbers. If one wants to decouple the number aspect from the set aspect of bitsets, it should be done at a higher language level, for example using sets (which Pascal does support). That way it is up to the compiler to decide how to implement them, and I as a user do not need to think about things like datatypes, signedness, etc.
But as soon as you use hexadecimal, you have brought numbers, and all the problems they entail, into the mix.

I think numbers should be treated as numbers and sets should be treated as sets, which is one of the reasons I like Python: it got rid of all the bitness of numbers and simply has arbitrary-precision integers everywhere. This way a number is nothing more than a number.
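
As a small illustration of that point, Pascal's own sets express bitmask intent with no signedness question in sight (a generic example, not code from this thread):
Code: Pascal  [Select][+][-]
program SetDemo;
type
  TFlag  = (flRead, flWrite, flExec);
  TFlags = set of TFlag;
var
  Flags : TFlags;
begin
  Flags := [flRead, flWrite];  // no $FF, no sign, just membership
  Include(Flags, flExec);
  if flRead in Flags then
    writeln('readable');
  writeln(flExec in Flags);    // TRUE
end.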

ASBzone

  • Hero Member
  • *****
  • Posts: 678
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Type inference and constant values
« Reply #11 on: August 10, 2020, 06:56:36 am »
They _can_ be interpreted as numbers, but doing so is often not convenient (and/or problematic); in addition, whenever they are interpreted as numbers, they must be assigned either a signed or an unsigned type, and there are always cases where the sign assigned is not the desirable (or convenient) one. Even though a bitmap can be, and very often is, expressed as a number, a bitmap is a structural map, not really a number.
Ok, now I get what you are saying, but in that case using hex numbers poses the exact same problem, because hex is just another way to write numbers. If one wants to decouple the number aspect from the set aspect of bitsets, it should be done at a higher language level, for example using sets (which Pascal does support). That way it is up to the compiler to decide how to implement them, and I as a user do not need to think about things like datatypes, signedness, etc.
But as soon as you use hexadecimal, you have brought numbers, and all the problems they entail, into the mix.

I think numbers should be treated as numbers and sets should be treated as sets, which is one of the reasons I like Python: it got rid of all the bitness of numbers and simply has arbitrary-precision integers everywhere. This way a number is nothing more than a number.


I suspect more people would be happier if bitmaps needed to be explicitly cast, while other numbers were more naturally determined.


These (multiple) threads on this issue have been enlightening for me.  I wonder what the ramifications are of:

A - Changing nothing;  vs
B - Making things Delphi compatible in Delphi mode only;  vs
C - Making them more generally Delphi compatible


I'm sure there are considerable implications either way.
-ASB: https://www.BrainWaveCC.com/

Lazarus v2.2.7-ada7a90186 / FPC v3.2.3-706-gaadb53e72c
(Windows 64-bit install w/Win32 and Linux/Arm cross-compiles via FpcUpDeluxe on both instances)

My Systems: Windows 10/11 Pro x64 (Current)

FPK

  • Full Member
  • ***
  • Posts: 118
Re: Type inference and constant values
« Reply #12 on: August 13, 2020, 12:23:34 pm »
The only good solution is to set the type of a constant explicitly if one really depends on it. Everything else will have unexpected side effects sooner or later.

Just take the case of floating-point constants: people working with hardware whose FPU does only single in hardware and double in software will be very unhappy if floating-point constants are suddenly double and not the smallest possible type ...
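
A trivial sketch of that advice (my example, not FPK's): typing the constant explicitly removes any dependence on what the compiler would otherwise infer.
Code: Pascal  [Select][+][-]
program TypedConstDemo;
const
  ScaleS : single = 0.5;  // explicitly single: stays cheap on a single-only FPU
  ScaleD : double = 0.5;  // explicitly double: the programmer opted in
  Mask = byte($FF);       // the ordinal case from this thread, made explicit
begin
  writeln(ScaleS, ' ', ScaleD, ' ', Mask);
end.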

And about Delphi compatibility: it applies in general only to documented behaviour, not to the results of random test programs, because nobody can tell from a test program whether the behavior is accidental or on purpose.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: Type inference and constant values
« Reply #13 on: August 13, 2020, 12:43:08 pm »
And once again. It should be adapting to the type on the left side.
The problem is, the scanner cannot adapt to the type on the left side because the scanner doesn't have a clue what type $FF is.

Check whether the first char = $, and if so, use an unsigned type for the val test of the literal instead of a signed one, and/or do multiple val tests to see what fits best. Store the resulting signedness of the literal in the token. But that is basically a 65-bit type already.

But that latter bit requires all places where literals are potentially consumed (including the various trajectories that simplify expressions in constants) to be adapted to use this 65-bit type. It is still slightly better than a full 128-bit type, since in many cases it can simply do:

Code: Pascal  [Select][+][-]
if signed then
  <signed 64-bit trajectory>
else
  <unsigned 64-bit trajectory>

without redefining all operations and builtins for all types. OTOH, I expect some unsigned 64-bit builtins to be missing.

And such a hack could be reused for the next larger unsigned type if the whole system were adapted to, e.g., an emulated int128.
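
A hedged sketch of that idea, with names of my own invention (this is not FPC source): try the unsigned reading first for $-prefixed literals and carry the signedness alongside the 64 bits.
Code: Pascal  [Select][+][-]
program ScanDemo;
uses SysUtils;

type
  TLiteralToken = record
    Bits   : qword;    // the raw 64-bit pattern
    Signed : boolean;  // the "65th bit": whether a signed reading is meant
  end;

function ScanLiteral(const S: string): TLiteralToken;
var
  U : qword;
  I : int64;
begin
  { $-prefixed: try the unsigned val first, as suggested above }
  if (S <> '') and (S[1] = '$') and TryStrToQWord(S, U) then
  begin
    Result.Bits   := U;
    Result.Signed := False;
  end
  else if TryStrToInt64(S, I) then
  begin
    Result.Bits   := qword(I);
    Result.Signed := I < 0;  // simplification: signed iff written negative
  end
  else
    raise EConvertError.Create('not a numeric literal: ' + S);
end;

var
  T : TLiteralToken;
begin
  T := ScanLiteral('$FF');
  writeln(T.Bits, ' signed=', T.Signed);  // 255 signed=FALSE
  T := ScanLiteral('-1');
  writeln(T.Bits, ' signed=', T.Signed);  // 18446744073709551615 signed=TRUE
end.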

But in Delphi the language implementation (the compiler) and the language it consumes are separate; in FPC they are not, since FPC is written in itself. Adding an int128 type needs bootstrapping workarounds for several years, until the int128 type is released and replaces the bootstrap compiler.

So my guess is that the C++ dialect the Delphi compiler uses (LLVM based now?) has some int128 emulation, and it was an easy fix for them, with only some runtime penalties that might not even be measurable. It is not so simple in a self-bootstrapping system, and somebody has to put in a lot of evenings to get that fixed.
« Last Edit: August 13, 2020, 02:00:05 pm by marcov »

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Type inference and constant values
« Reply #14 on: August 13, 2020, 01:33:24 pm »
Hands up those who've used negative numbers in any base other than 10.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

 
