
Author Topic: Discussion: Integer as base type  (Read 4662 times)

440bx

  • Hero Member
  • *****
  • Posts: 6480
Re: Discussion: Integer as base type
« Reply #15 on: April 23, 2020, 02:02:44 pm »
Comparing int to uint is even more dubious.
Why do you think that? They are two integers, nothing wrong with comparing them.

It should be expected that the programmer knows what is being compared.
From a programmer's viewpoint it's quite reasonable to see it as two integers being compared.  In a high level language - something that C may not be - it is quite reasonable to presume that a numeric comparison consistent with the variable type declarations is being made not one that is inconsistent with the variable data types.

IMO giving unsigned precedence over signed is preferred, because the computer doesn't know about signed.
The computer does know about signed and unsigned.  That's why there are "jump on above" and "jump on greater", etc.  The programmer's responsibility is to provide enough information to the compiler for it to make the determination and, in the case you presented, the programmer clearly stated the nature of both variables, one a signed integer and the other an unsigned integer.  That's enough information for the compiler to compare integers without data type loss.

Humans think differently,
True but, most CPUs accommodate the way humans think in that situation by providing different instructions to account for the different ways a bit pattern can be interpreted.

For example, many still do not understand why ABS(-32768) returns -32768 (unless the programming language decides to return something else).
Personally, I consider that an error and so does the CPU actually.  The overflow flag is set to indicate that the result is out of range.
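A minimal C sketch of the ABS(-32768) case, assuming 16-bit values. The helper names (negate_bits16, as_signed16, wrapping_abs16) are made up for illustration. The negation is done on the unsigned bit pattern because that wraps in a defined way, whereas overflowing a signed value in C is not defined.

```c
#include <stdint.h>
#include <string.h>

/* Negate a 16-bit pattern using unsigned arithmetic, which wraps modulo
   2^16 and is therefore well defined in C. */
static uint16_t negate_bits16(uint16_t bits)
{
    return (uint16_t)(0u - bits);
}

/* Reinterpret a 16-bit pattern as a signed value. int16_t is required to be
   two's complement without padding, so memcpy gives a defined result. */
static int16_t as_signed16(uint16_t bits)
{
    int16_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: a 16-bit ABS that wraps the way the hardware does. */
static int16_t wrapping_abs16(int16_t x)
{
    uint16_t bits;
    memcpy(&bits, &x, sizeof bits);
    return x < 0 ? as_signed16(negate_bits16(bits)) : x;
}
```

Negating the pattern 0x8000 gives 0x8000 again, so wrapping_abs16(-32768) comes back as -32768: +32768 simply has no 16-bit signed representation.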

To follow on Marco's comments, if a programmer wants the CPU to consider both variables unsigned - instead of as they were declared as - then the programmer should typecast the signed type to DWORD (or whatever is appropriate.)  In the case you showed and in C specifically, the compiler is giving itself latitude the programmer didn't give.
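To make the three options concrete, here is a small C sketch. The function names are invented for this example. The first function shows what C does on its own (the int operand is converted to unsigned), the second compares by value without losing the sign, and the third is the explicit "typecast to DWORD" request.

```c
#include <stdbool.h>

/* What C does by default: the usual arithmetic conversions turn the int
   into unsigned int, so a negative value becomes a huge positive one. */
static bool less_c_default(int i, unsigned int u)
{
    return i < u;
}

/* Value-preserving comparison: a negative int is below every unsigned value. */
static bool less_by_value(int i, unsigned int u)
{
    return i < 0 || (unsigned int)i < u;
}

/* Explicit request to compare both operands as unsigned. */
static bool less_as_unsigned(int i, unsigned int u)
{
    return (unsigned int)i < u;
}
```

With i = -1 and u = 1, less_c_default and less_as_unsigned both answer false, while less_by_value answers true. The difference is only whether the programmer or the language made that choice.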



FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #16 on: April 23, 2020, 02:53:23 pm »
Incorrect. Computers also handle signed. More importantly, there is no real reason to use unsigned, except in narrow parts of the program that do bare metal hardware interfacing.

No. When converting binary numbers to base 10, the result is always [0..max_word_size]. It is by agreement that the left-most bit is chosen as the sign-bit. This becomes apparent when converting numbers to string. Simple conversion does not result in a negative representation. By agreement, when the left-most bit is set, the number is interpreted as negative for integer types. This is why different routines are necessary to convert numbers to strings: one for int and one for uint.
« Last Edit: April 23, 2020, 03:27:24 pm by Munair »
It's only logical.

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #17 on: April 23, 2020, 03:26:14 pm »
The computer does know about signed and unsigned.  That's why there are "jump on above" and "jump on greater", etc.  The programmer's responsibility is to provide enough information to the compiler for it to make the determination and, in the case you presented, the programmer clearly stated the nature of both variables, one a signed integer and the other an unsigned integer.  That's enough information for the compiler to compare integers without data type loss.

Let's put a few things in perspective here. The computer doesn't know anything. All it can do is send (combined) signals, either '1' or '0'. How these signals are interpreted is human business. Instructions are used to interpret the signals and different instructions are needed to interpret specific bit-patterns differently. Instructions named with 'greater' and 'less' were designated to interpret bit-patterns as signed (by giving special meaning to the left-most bit), whereas instructions named with 'above' and 'below' were chosen to ignore the sign bit and just compare the bit pattern as it is, which is the computer's natural or uninterpreted approach. Without interpretation the computer will recognize -1 as 4294967295 (with 32-bit words).
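The "greater" versus "above" distinction can be sketched in C on one and the same 32-bit pattern. The helper names are assumptions for this example, and which jump or set instruction a compiler actually emits is up to the compiler.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as signed; int32_t is two's complement. */
static int32_t as_signed32(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Unsigned reading of the bits: on x86 typically an "above"/"below" test. */
static bool is_above(uint32_t a, uint32_t b)
{
    return a > b;
}

/* Signed reading of the same bits: typically a "greater"/"less" test. */
static bool is_greater(uint32_t a, uint32_t b)
{
    return as_signed32(a) > as_signed32(b);
}
```

For the pattern 0xFFFFFFFF against 1, is_above says true (4294967295 > 1) and is_greater says false (-1 > 1).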

It is up to a compiler to choose what type has precedence. This is a choice whereby the one is not better than the other. The problem with higher level languages is that programmers think in human terms rather than computer terms. It is a nice attempt to raise the computer to the level of humans, but it will never work. Therefore, programmers should know at least in basic terms how computers operate. It would save them many questions regarding the output of certain computations, both integer and real.
It's only logical.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12847
  • FPC developer.
Re: Discussion: Integer as base type
« Reply #18 on: April 23, 2020, 03:27:19 pm »
Incorrect. Computers also handle signed. More importantly, there is no real reason to use unsigned, except in narrow parts of the program that do bare metal hardware interfacing.

No. When converting binary numbers to base 10, the result is always [0..max_word_size].

Textual representation of numbers is for humans only. It has no relevance to the machine state. The whole textual representation is a convention created by library functions. There is no low-level asm instruction to convert integers to decimal string representation.

(well strictly there is decimalization as part of BCD, but negative conventions there are different, and this is x86 only anyway)



munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #19 on: April 23, 2020, 03:31:00 pm »
Incorrect. Computers also handle signed. More importantly, there is no real reason to use unsigned, except in narrow parts of the program that do bare metal hardware interfacing.

No. When converting binary numbers to base 10, the result is always [0..max_word_size].

Textual representation of numbers is for humans only. It has no relevance on the machine state. The whole textual representation is a convention.
So it is misleading to state that computers handle signed. By default, they do not. Statements like these only add to confusion and keep people wondering about  the outcome of computations involving different types. Integers and UIntegers are not "just integers".

If only I could change the software that handles my bank-account replacing all integers by unsigned.  :D
« Last Edit: April 23, 2020, 03:34:06 pm by Munair »
It's only logical.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12847
  • FPC developer.
Re: Discussion: Integer as base type
« Reply #20 on: April 23, 2020, 03:32:26 pm »
Let's put a few things in perspective here. The computer doesn't know anything. All it can do is send (combined) signals, either '1' or '0'. How these signals are interpreted is human business.

But how multiple lines are wired together to form registers and do arithmetic on them _IS_ machine business.

Basically the instruction sets are engineered in a way that both signed and unsigned operation is possible, which is where two's complement comes into play. To say the upper bit is 2^31 is as arbitrary as saying it is the sign bit.

And there is explicit support for sign operations, e.g. special signed shifting operations and even a special sign bit in the flags register to aid in operating on signed values as much as unsigned using a common core of logic.

Saying that the machine is unsigned, but humans make it signed is a misinterpretation of circuitry. It is designed to do both with a common core, without much preference even.

Even the address logic often interprets the top bit as special (e.g. for kernelspace in both x86 and x86_64), and some older machines wired up higher address lines to memory bankswitching logic.
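A short C sketch of the "common core" point, under the assumption of 32-bit two's complement values. The function name is hypothetical. Converting a signed value to uint32_t is defined as reduction modulo 2^32, so the same unsigned addition serves both readings of the bits.

```c
#include <stdint.h>

/* Add two signed values by running their bit patterns through unsigned
   addition; conversion to uint32_t is defined as reduction modulo 2^32. */
static uint32_t add_via_unsigned(int32_t a, int32_t b)
{
    return (uint32_t)a + (uint32_t)b;
}
```

add_via_unsigned(-5, 3) yields the pattern 0xFFFFFFFE, which reads as 4294967294 unsigned and as -2 signed. The adder does not need to know which reading is intended.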
« Last Edit: April 23, 2020, 03:35:38 pm by marcov »

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #21 on: April 23, 2020, 03:39:11 pm »
Saying that the machine is unsigned
A computer IS unsigned. All it knows is '1' and '0'. Interpretation comes with the introduction of programming languages, the first of which is ASM. Well, one could still try to program in bits of course - on that level literally nothing is defined.
It's only logical.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12847
  • FPC developer.
Re: Discussion: Integer as base type
« Reply #22 on: April 23, 2020, 03:51:20 pm »
Saying that the machine is unsigned
A computer IS unsigned. All it knows is '1' and '0'. Interpretation comes with the introduction of programming languages, the first of which is ASM. Well, one could still try to program in bits of course - on that level literally nothing is defined.

Reread. The decision to interpret as unsigned is as much an interpretation as signed. Only a few cases (like shift) and div/idiv really matter anyway.
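Division is one of the cases where the reading does matter. A minimal C sketch, with made-up function names and assuming a nonzero divisor (and, for the signed version, not INT32_MIN divided by -1):

```c
#include <stdint.h>

/* Unsigned reading of the operands (on x86 this is what div computes). */
static uint32_t div_unsigned(uint32_t a, uint32_t b)
{
    return a / b;
}

/* Signed reading of the operands (on x86 this is what idiv computes). */
static int32_t div_signed(int32_t a, int32_t b)
{
    return a / b;
}
```

The pattern 0xFFFFFFFE divided by 2 gives 0x7FFFFFFF when read as unsigned, but -2 divided by 2 gives -1 (pattern 0xFFFFFFFF) when read as signed, so here two different operations really are needed.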
« Last Edit: April 23, 2020, 03:58:28 pm by marcov »

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #23 on: April 23, 2020, 04:02:26 pm »
Saying that the machine is unsigned
A computer IS unsigned. All it knows is '1' and '0'. Interpretation comes with the introduction of programming languages, the first of which is ASM. Well, one could still try to program in bits of course - on that level literally nothing is defined.

Reread. The decision to interpret as unsigned is as much an interpretation as signed. Only a few cases (like shift), really matter anyway.

No, it is not. This becomes apparent when converting numbers to string (as I said earlier). Without interpretation the value -1 will be converted to string as 4294967295. It requires an additional test to interpret the bit pattern as negative and put the minus sign in front:

Code: ASM
  test    eax, eax                   ; test sign bit
  jnl     .print_uint
  neg     eax                        ; negate value
  mov     byte [ebx], '-'            ; put '-' sign in front
.print_uint:
  ; ...

This is why it is generally advised to use unsigned types if negative values are not used, because signed types take an extra test here and there.
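The same idea as the assembly fragment, sketched in C. The routine names u32_to_str and i32_to_str are invented for this example, and the caller is assumed to supply a buffer of at least 12 bytes. The signed routine is the unsigned one plus the sign test.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the decimal digits of an unsigned value into buf (at least 11 bytes);
   returns the number of characters written, excluding the terminator. */
static size_t u32_to_str(uint32_t value, char *buf)
{
    char tmp[10];               /* a 32-bit value has at most 10 digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0u);
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i]; /* digits were produced in reverse order */
    buf[n] = '\0';
    return n;
}

/* Signed variant (buf at least 12 bytes): the extra sign test, followed by
   the unsigned routine. */
static size_t i32_to_str(int32_t value, char *buf)
{
    if (value < 0) {
        buf[0] = '-';
        /* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
        return 1 + u32_to_str(0u - (uint32_t)value, buf + 1);
    }
    return u32_to_str((uint32_t)value, buf);
}
```

Feeding the pattern 0xFFFFFFFF to u32_to_str produces "4294967295", while i32_to_str(-1, buf) produces "-1".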
« Last Edit: April 23, 2020, 05:22:33 pm by Munair »
It's only logical.

440bx

  • Hero Member
  • *****
  • Posts: 6480
Re: Discussion: Integer as base type
« Reply #24 on: April 23, 2020, 04:19:12 pm »
Let's put a few things in perspective here.
Perspective ... good idea.

The computer doesn't know anything.
Since you are splitting hairs, it is true that the computer doesn't know anything - just a bunch of gates.  However, the folks who designed the CPU _wired_ some of their knowledge into the CPU instructions.  That's why the CPU has different instructions for signed and unsigned comparisons.

It is up to a compiler to choose what type has precedence.
No.  It is not.  It is the programmer's responsibility to tell the compiler what he/she wants and it is the compiler's responsibility (and raison d'etre) to translate the programmer's instructions into the equivalent CPU instructions.

The compiler isn't allowed to choose anything.  It is only allowed to translate high level instructions into the _equivalent_ CPU instructions.  When a programmer specifies that a variable is signed, there is no "precedence", the variable is signed, period.  That said, that's how compilers designed and implemented by homo sapiens work; those designed and implemented by ancestors of the missing link seem to work differently (which shouldn't be much of a surprise.)

The problem with higher level languages is that programmers think in human terms rather than computer terms.
I'm pleased you mention that.  If the programmer wants to "think" in computer terms then the programmer is responsible for telling the compiler to treat the signed variable as an unsigned one.  A compiler doesn't read minds and it's not supposed to either.  If the programmer declares a variable as signed then the compiler, which is not a clairvoyant device, cannot know that in one specific instance, the programmer wants to treat it as unsigned.

It is a nice attempt to raise the computer to the level of humans, but it will never work.
You're right, that will never work but, one thing that should work is, for the human being to specify what he/she wants and not expect the compiler to be clairvoyant.

Therefore, programmers should know at least in basic terms how computers operate.
That's a reasonable expectation.  Those who know would typecast the signed variable to a DWORD.

It would save them many questions regarding the output of certain computations, both integer and real.
It might even improve their perspective.  All good stuff.
« Last Edit: April 23, 2020, 04:24:16 pm by 440bx »
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12847
  • FPC developer.
Re: Discussion: Integer as base type
« Reply #25 on: April 23, 2020, 04:21:39 pm »
This is why it is generally advised to use unsigned types if negative values are not used, because signed types take an extra test here and there.

A range of 4 billion is 8 divisions and a +/- 40% chance of a 9th, when all values are uniformly distributed.

A range of 2 billion is 8 divisions and a +/- 20% chance of a 9th.

So that "extra" neg test can actually replace a 20% chance of a div with a neg, so it might actually be faster on balance.

But even without this fun wordplay, basing such computer architecture "principle" on one implementation of one routine which prepares for human, not machine representation is a weird argument to make.
« Last Edit: April 23, 2020, 04:29:37 pm by marcov »

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #26 on: April 23, 2020, 04:29:21 pm »
It is up to a compiler to choose what type has precedence.
No.  It is not.  It is the programmer's responsibility to tell the compiler what he/she wants and it is the compiler's responsibility (and raison d'etre) to translate the programmer's instructions into the equivalent CPU instructions.

Yes it is. That is why C gives a different result than Pascal. With the SharpBASIC compiler I could simply change the order of datatype precedence and put integer at the top, which would 'raise' unsigned to signed in expressions. However, the wise decision was made to put unsigned integer at the top.  ;)
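The two precedence choices can be written out in C for 32-bit operands. Function names are hypothetical. The first puts unsigned at the top, which is also C's own rule. The second widens both operands to a type that can hold every value of either one; whether a particular Pascal compiler does exactly this is an assumption here, it is just one way to get the mathematically expected answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned has precedence: the signed operand is reinterpreted modulo 2^32. */
static bool less_unsigned_wins(int32_t i, uint32_t u)
{
    return (uint32_t)i < u;
}

/* Widen both operands to 64-bit signed, which holds all values of both types. */
static bool less_widened(int32_t i, uint32_t u)
{
    return (int64_t)i < (int64_t)u;
}
```

For -1 against 1 the first answers false and the second answers true, so the result really does depend on the rule the language designer picked.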
It's only logical.

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Discussion: Integer as base type
« Reply #27 on: April 23, 2020, 04:33:19 pm »
This is why it is generally advised to use unsigned types if negative values are not used, because signed types take an extra test here and there.

A range of 4 billion is 8 divisions and a +/- 40% chance of a 9th, when all values are uniformly distributed.

A range of 2 billion is 8 divisions and a +/- 20% chance of a 9th.

So that "extra" neg test can actually replace a 20% chance of a div with a neg, so it might actually be faster on balance.

But even without this fun wordplay, basing such computer architecture "principle" on one implementation of one routine which prepares for human, not machine representation is a weird argument to make.

Not sure what you mean by word play. As I said, "it is generally advised", but of course, you are entitled to your own opinion. This "one implementation of one routine" demonstrates that by default a bit-pattern is not negative. There is no interpretation with unsigned types in this respect.
« Last Edit: April 23, 2020, 04:53:27 pm by Munair »
It's only logical.

Warfley

  • Hero Member
  • *****
  • Posts: 2056
Re: Discussion: Integer as base type
« Reply #28 on: April 23, 2020, 06:52:32 pm »
It is up to a compiler to choose what type has precedence. This is a choice whereby the one is not better than the other. The problem with higher level languages is that programmers think in human terms rather than computer terms. It is a nice attempt to raise the computer to the level of humans, but it will never work. Therefore, programmers should know at least in basic terms how computers operate. It would save them many questions regarding the output of certain computations, both integer and real.

This is completely wrong. You need to know what the specification of your language says, not how you think the computer operates. Case in point: C/C++. What do you think this code does:
Code: C
  for (int i=1; i>0; i++) ...
It's UB, because the type int in C is only guaranteed to hold values from -(2^15-1) to 2^15-1. The standard does not define what happens when a signed int overflows, nor which internal representation is used. It could be internally implemented as a struct with a bool for the sign and a 43 bit integer. It could be implemented using 80 bit floats, one's complement, two's complement or sign-and-magnitude integers, or even with unicorn farts. The fact of the matter is that you as a programmer do not know how it will be executed afterwards, except for what's written in the specification. In fact, if you write code like this, the optimizer can detect that this is UB and turn it into:
Code: C
  while (true) ...
Because overflow is not defined, the compiler may assume that adding 1 never makes i smaller.
This is by design: sure, on x86_64 CPUs the best way to implement it is the two's complement number representation supported by the ALU, but C is a high level language that should work on any CPU, not only x86.
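For comparison, a sketch of how such a loop can be written without ever executing an overflow. The helpers checked_increment and count_until_overflow are made up for this example. The check happens before the increment, so the behaviour is defined on every conforming compiler.

```c
#include <limits.h>
#include <stdbool.h>

/* Increment *i only if the result is representable; returns false instead
   of overflowing, so no undefined behaviour is ever executed. */
static bool checked_increment(int *i)
{
    if (*i == INT_MAX)
        return false;
    ++*i;
    return true;
}

/* Count how many increments are possible from `start` before INT_MAX. */
static unsigned long count_until_overflow(int start)
{
    unsigned long steps = 0;
    int i = start;
    while (checked_increment(&i))
        ++steps;
    return steps;
}
```

Starting from INT_MAX - 3 this performs exactly 3 increments and then stops, and the optimizer has no licence to turn it into an endless loop.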

When using a high level language it's your job to write code for the compiler, and it's the job of the compiler to make it into a running program according to its specification. If you start assuming how the computer will process it internally, you are writing broken code, because then it is highly likely that it will not work on any other machine. Sure, sometimes this is necessary, for example when writing really low level code, and sometimes it's also a fun exercise to optimize a program like this, but in general, if you assume things about the underlying execution, you write broken code.

For example, the Apollo Guidance Computer flown on the Apollo missions had 3 different number types: 15 bit one's complement integers, 30 bit fixpoint integers with 2 sign bits (yeah that's really fucked up) and 33 bit unsigned integers. A high level language should be able to compile for such a target without any changes to the code.
That's the case with ISO C++. If you write your code using only ISO C++, it will work on any machine using any compiler that supports ISO C++.

So I would challenge the idea that a programmer should need to know how a computer works internally. While it is helpful and most certainly interesting, this knowledge is not needed when using a higher level language. If you assume things about how it will work after compilation, you constrain the possibilities of the compiler and optimizer and you will most certainly write broken code.

PS: For C++20 this example with int is, by the way, partly outdated, as they have now finally standardized two's complement number representation (signed overflow itself remains UB, though). Before that, C++ already had std::int32_t, which standardizes the exact width (32 bit, not just at least 16 like int) and the use of two's complement. In C, however, this is still UB, and if you have code that relies on the exact width or overflow of int, you simply wrote broken code.
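If code really does need exact width and wraparound, the fixed-width types plus unsigned arithmetic give that without UB. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>
#include <string.h>

/* Unsigned arithmetic is defined to wrap modulo 2^32 for uint32_t. */
static uint32_t next_wrapping(uint32_t x)
{
    return x + 1u;
}

/* Wrapping increment of a signed value: do the arithmetic on the unsigned
   bit pattern, then reinterpret it (int32_t is two's complement). */
static int32_t next_wrapping_signed(int32_t x)
{
    uint32_t bits = (uint32_t)x + 1u;
    int32_t out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Here next_wrapping(UINT32_MAX) is 0 and next_wrapping_signed(INT32_MAX) is INT32_MIN, both by specification rather than by assumption about the CPU.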
« Last Edit: April 23, 2020, 09:53:11 pm by Warfley »

 
