Recent

Author Topic: */div and shr/shl  (Read 4751 times)

Okoba

  • Hero Member
  • *****
  • Posts: 660
*/div and shr/shl
« on: March 16, 2022, 12:28:41 pm »
Hello,

Are these two codes equivalent all the time? If so, why not FPC optimize it, as shr appears to be 2x faster than div in microbenchmarks? If not, a sample would be appreciated.

Code: Pascal  [Select][+][-]
  1. V := I div 2;

Code: Pascal  [Select][+][-]
  1. V := I shr 1;

Although these two take the same time:

Code: Pascal  [Select][+][-]
  1. V := I shl 1;

Code: Pascal  [Select][+][-]
  1. V := I * 2;

Tested on Trunk, Win64.

MathMan

  • Hero Member
  • *****
  • Posts: 518
Re: */div and shr/shl
« Reply #1 on: March 16, 2022, 12:49:23 pm »
Hello Okoba,

To answer your first question - no they are not egivalent all the time!

If I, V are signed integers then "div" <> "shr". On some processors there exist assembler instructions which are eqivalent - e.g. on x86 this is "sar" (shift arithmetic right). But I don't know offhand if this is available as an intrinsic in Free Pascal.

Cheers,
MathMan

PS - removed a wrong "=" sign which had slipped in
« Last Edit: March 16, 2022, 01:04:02 pm by MathMan »

Okoba

  • Hero Member
  • *****
  • Posts: 660
Re: */div and shr/shl
« Reply #2 on: March 16, 2022, 01:04:24 pm »
Can you give me a sample? I saw many uses cases that shl 1 is used instead of * 2 or shr 1 instead of div 2 for performance needs.
And I like to be more aware of pitfalls.

ccrause

  • Hero Member
  • *****
  • Posts: 1119
Re: */div and shr/shl
« Reply #3 on: March 16, 2022, 01:12:06 pm »
If I, V are signed integers then "div" <> "shr". On some processors there exist assembler instructions which are eqivalent - e.g. on x86 this is "sar" (shift arithmetic right). But I don't know offhand if this is available as an intrinsic in Free Pascal.
Note that shift arithmetic right is available in FPC, but this is not a direct equivalent to divide by two of a negative number.

Kays

  • Hero Member
  • *****
  • Posts: 632
  • Whasup!?
    • KaiBurghardt.de
Re: */div and shr/shl
« Reply #4 on: March 16, 2022, 01:13:26 pm »
Write what you mean. The compiler will optimize multiplying/dividing by a constant factor 2ⁿ accordingly.

Are these two codes equivalent all the time?
No. For instance, a recently discussed issue … shl 64.
Yours Sincerely
Kai Burghardt

ccrause

  • Hero Member
  • *****
  • Posts: 1119
Re: */div and shr/shl
« Reply #5 on: March 16, 2022, 01:21:03 pm »
Can you give me a sample? I saw many uses cases that shl 1 is used instead of * 2 or shr 1 instead of div 2 for performance needs.
And I like to be more aware of pitfalls.
A rather trivial example:
Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. var
  4.   i, j: longint;
  5.  
  6. begin
  7.   j := -1;
  8.   i := j shr 1;
  9.   writeln('j shr 1 = ', i);
  10.   i := j div 2;
  11.   writeln('j div 2 = ', i);
  12.   i := SarLongint(j, 1);
  13.   writeln('SarLongint(j, 1) = ', i);
  14.   readln;
  15. end.

Generated output:
Code: Text  [Select][+][-]
  1. j shr 1 = 2147483647
  2. j div 2 = 0
  3. SarLongint(j, 1) = -1

Okoba

  • Hero Member
  • *****
  • Posts: 660
Re: */div and shr/shl
« Reply #6 on: March 16, 2022, 02:06:11 pm »
@ccrause Thank you for the sample. So it only matters when the value is negative. Is it true that positive values are always equivalent?

@Kays, FPC only optimizes them when it is set to -O4, not -O3.

I hope one of the compiler devs like @PascalDragon, can clarify why it is not happening in the normally default -O3 (at least in Lazarus), and only happens in -O4 (with the beware! note). Is there something else that may be dangerous?

Is it best to always write shl/shr when you know the number is not negative, or should you write */div and set the compiler on -O4?

I would like to know what the best approach is.

MathMan

  • Hero Member
  • *****
  • Posts: 518
Re: */div and shr/shl
« Reply #7 on: March 16, 2022, 02:56:03 pm »
If I, V are signed integers then "div" <> "shr". On some processors there exist assembler instructions which are eqivalent - e.g. on x86 this is "sar" (shift arithmetic right). But I don't know offhand if this is available as an intrinsic in Free Pascal.
Note that shift arithmetic right is available in FPC, but this is not a direct equivalent to divide by two of a negative number.

Thanks for reminding me - I forgot the exceptional case of -1 and the trickery with rounding.

Cheers,
MathMan

ccrause

  • Hero Member
  • *****
  • Posts: 1119
Re: */div and shr/shl
« Reply #8 on: March 16, 2022, 03:08:31 pm »
@ccrause Thank you for the sample. So it only matters when the value is negative. Is it true that positive values are always equivalent?
The one exception I can think of for positive or unsigned numbers is that shl does not check for overflow, while the multiplication will (if overflow checking is enabled):
Code: Pascal  [Select][+][-]
  1. var i, j: longint;
  2.  
  3.   j := MaxInt;
  4.   i := j shl 1;  // i = -2
  5.   i := j * 2;    // This results in an arithmetic overflow exception if overflow check is enabled, else result is also -2
In all other cases (I think) for a positive or unsigned number x: x * n = x shl ln(n) and x div n = x shr ln(n), with n an integer power of 2.

Okoba

  • Hero Member
  • *****
  • Posts: 660
Re: */div and shr/shl
« Reply #9 on: March 16, 2022, 03:19:36 pm »
Thank you very much @ccrause.

Kays

  • Hero Member
  • *****
  • Posts: 632
  • Whasup!?
    • KaiBurghardt.de
Re: */div and shr/shl
« Reply #10 on: March 16, 2022, 03:23:52 pm »
@Kays, FPC only optimizes them when it is set to -O4, not -O3.
On an x86 Linux target I could not see idiv/imul in the assembly output, regardless of the optimization level. Maybe it depends on {$optimization fastMath} on Win64 targets.
Yours Sincerely
Kai Burghardt

Okoba

  • Hero Member
  • *****
  • Posts: 660
Re: */div and shr/shl
« Reply #11 on: March 16, 2022, 04:32:37 pm »
That is interesting.
Maybe some of the FPC team members can explain the behavior better.

PascalDragon

  • Hero Member
  • *****
  • Posts: 6398
  • Compiler Developer
Re: */div and shr/shl
« Reply #12 on: March 17, 2022, 12:53:31 pm »
I can not reproduce. FPC 3.2.2 converts the following example to code without any div instructions on x86_64 and i386 without any optimizations enabled:

Code: Pascal  [Select][+][-]
  1. program ttest;
  2.  
  3. var
  4.   i: LongInt;
  5.   u: LongWord;
  6. begin
  7.   i := i div 2;
  8.   i := i div -2;
  9.   u := u div 2;
  10.   u := u div -2;
  11. end.

Generated assembly:

Code: ASM  [Select][+][-]
  1. # [7] i := i div 2;
  2.         movslq  U_$P$TTEST_$$_I(%rip),%rax
  3.         movq    %rax,%rdx
  4.         shrq    $63,%rdx
  5.         addq    %rdx,%rax
  6.         sarq    $1,%rax
  7.         movl    %eax,U_$P$TTEST_$$_I(%rip)
  8. # [8] i := i div -2;
  9.         movslq  U_$P$TTEST_$$_I(%rip),%rax
  10.         movq    %rax,%rdx
  11.         shrq    $63,%rdx
  12.         addq    %rdx,%rax
  13.         sarq    $1,%rax
  14.         negq    %rax
  15.         movl    %eax,U_$P$TTEST_$$_I(%rip)
  16. # [9] u := u div 2;
  17.         movl    U_$P$TTEST_$$_U(%rip),%eax
  18.         shrl    $1,%eax
  19.         movl    %eax,U_$P$TTEST_$$_U(%rip)
  20. # [10] u := u div -2;
  21.         movl    U_$P$TTEST_$$_U(%rip),%eax
  22.         movq    %rax,%rdx
  23.         shrq    $63,%rdx
  24.         addq    %rdx,%rax
  25.         sarq    $1,%rax
  26.         negq    %rax
  27.         movl    %eax,U_$P$TTEST_$$_U(%rip)
  28.  

If you have another experience then please state your FPC version and platform and provide a full, compilable, self contained example that shows the problem.

Okoba

  • Hero Member
  • *****
  • Posts: 660
Re: */div and shr/shl
« Reply #13 on: March 17, 2022, 01:22:31 pm »
Result in: Lazarus 2.3.0 (rev main-2_3-1428-g9f142eef14) FPC 3.3.1 x86_64-win64-win32/win64 without optimization
Code: Pascal  [Select][+][-]
  1. program project1;
  2. var
  3.   I, V: Cardinal;
  4. begin
  5.   I := 0;
  6.   V := I shr 1;
  7.   V := I div 2;
  8. end.          
  9.  
Code: ASM  [Select][+][-]
  1. # [7] V := I shr 1;
  2.         shrl    $1,%eax
  3.         movl    %eax,U_$P$PROJECT1_$$_V(%rip)
  4. .Ll4:
  5. # [8] V := I div 2;
  6.         movl    U_$P$PROJECT1_$$_I(%rip),%eax
  7.         shrl    $1,%eax
  8.         movl    %eax,U_$P$PROJECT1_$$_V(%rip)
  9.  
Same instruction.

Changing the type to Integer:
Code: Pascal  [Select][+][-]
  1. program project1;
  2.  
  3. var
  4.   I, V: Integer;
  5. begin
  6.   I := 0;
  7.   V := I shr 1;
  8.   V := I div 2;
  9. end.          
  10.  
Code: ASM  [Select][+][-]
  1. # [7] V := I shr 1;
  2.         shrl    $1,%eax
  3.         movl    %eax,U_$P$PROJECT1_$$_V(%rip)
  4. .Ll4:
  5. # [8] V := I div 2;
  6.         movslq  U_$P$PROJECT1_$$_I(%rip),%rax
  7.         movq    %rax,%rdx
  8.         shrq    $63,%rdx
  9.         addq    %rdx,%rax
  10.         sarq    $1,%rax
  11.         movl    %eax,U_$P$PROJECT1_$$_V(%rip)
  12.  

I can understand this, compiler thinks that as the value is integer, it may be negative so it does not use the same simple shr.

Now for this code and even with -O4 the results will be:
Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. uses
  4.   SysUtils;
  5.  
  6.   procedure TestDiv;
  7.   var
  8.     I, V: Integer;
  9.   begin
  10.     for I := 1 to 1000 * 1000 * 1000 do
  11.       V := I div 2;
  12.     WriteLn(V);
  13.   end;
  14.  
  15.   procedure TestSHR;
  16.   var
  17.     I, V: Integer;
  18.   begin
  19.     for I := 1 to 1000 * 1000 * 1000 do
  20.       V := I shr 1;
  21.     WriteLn(V);
  22.   end;
  23.  
  24. var
  25.   T: QWord;
  26. begin
  27.   T := GetTickCount64;
  28.   TestDiv;
  29.   WriteLn('div: ', GetTickCount64 - T);
  30.  
  31.   T := GetTickCount64;
  32.   TestSHR;
  33.   WriteLn('shr: ', GetTickCount64 - T);
  34.   ReadLn;
  35. end.                        


Code: ASM  [Select][+][-]
  1. # [11] V := I div 2;
  2.         movslq  %eax,%rdx
  3.         movq    %rdx,%rcx
  4.         shrq    $63,%rcx
  5.         addq    %rcx,%rdx
  6.         sarq    $1,%rdx
  7.         movl    %edx,%ebx
  8.         addl    $1,%eax
  9.         cmpl    $1000000000,%eax
  10.         jng     .Lj5

Code: ASM  [Select][+][-]
  1. # [20] V := I shr 1;
  2.         movl    %eax,%ebx
  3.         shrl    $1,%ebx
  4.         addl    $1,%eax
  5.         cmpl    $1000000000,%eax
  6.         jng     .Lj10
  7.  

and the times are:
Quote
div: 469
shr: 218

and if I change integer to cardinal:

Code: ASM  [Select][+][-]
  1. # [11] V := I div 2;
  2.         movl    %eax,%ebx
  3.         shrl    $1,%ebx
  4.         addl    $1,%eax
  5.         cmpl    $1000000000,%eax
  6.         jna     .Lj5

Code: ASM  [Select][+][-]
  1. # [20] V := I shr 1;
  2.         movl    %eax,%ebx
  3.         shrl    $1,%ebx
  4.         addl    $1,%eax
  5.         cmpl    $1000000000,%eax
  6.         jna     .Lj10
  7.  

And the times:
Quote
div: 219
shr: 219


As in the integer sample, the loop is clearly in positive values, why compiler does not optimize it? Is it to be safe or something else?


PascalDragon

  • Hero Member
  • *****
  • Posts: 6398
  • Compiler Developer
Re: */div and shr/shl
« Reply #14 on: March 18, 2022, 04:34:35 pm »
As in the integer sample, the loop is clearly in positive values, why compiler does not optimize it? Is it to be safe or something else?

The compiler only considers the type of the left side of the div (which is signed) and not what range it might have. Feel free to file a feature request to improve the data flow analysis so that it might take into account this as well.

 

TinyPortal © 2005-2018