Recent

Author Topic: [SOLVED] Is there something wrong in this code?  (Read 4473 times)

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
[SOLVED] Is there something wrong in this code?
« on: May 12, 2020, 03:43:37 pm »
Freepascal generated executable is taking 2x+ more time to do the same task than other compilers:

Code: [Select]
C:        7.691s
C++:      7.752s
Assembly: 7.909s (handwritten)
D:        8.073s
Rust:     8.974s
Pascal:   20.074s

The sample program outputs a given count of prime numbers. I am not able to make it faster.

Output SHA256 for 1 000 000 prime numbers must be: f13156e206e68386cb86b13093520acc5da04c875926411bd4df4e76590e81cf

If you can make it faster while retaining the same ouput it would be nice:

Code: Pascal  [Select][+][-]
  1. program prime_numbers;
  2.  
  3. uses
  4.   SysUtils, math;
  5.  
  6. (** ----------------------------------------------------------------------------------------------------
  7.     is_prime:
  8.  
  9.       Checks whether a number is prime.
  10.  
  11.       Parameters:
  12.  
  13.         n -> The number to be checked.
  14.  
  15.       Return value:
  16.  
  17.         Returns TRUE (1) if number is prime and FALSE (0) otherwise.
  18.     ---------------------------------------------------------------------------------------------------- **)
  19. function is_prime(const n: longword): boolean;
  20. const
  21.   forever = FALSE;
  22. var
  23.   start: longword;
  24.   limit: longword;
  25. begin
  26.   RESULT := FALSE;
  27.  
  28.   // if 2 or 3 then is prime
  29.   if (n = 2) or (n = 3) then
  30.     RESULT := TRUE
  31. else
  32.   // if even or less than 2 is not prime
  33.   if ((n and $01) = $00) or (n < 2) then
  34.     RESULT := FALSE
  35. else
  36.   begin
  37.     start := 3;
  38.     limit := floor(sqrt(n));
  39.  
  40.     repeat
  41.       // check for divisibility by current number
  42.       if (n mod start = 0) then
  43.       begin
  44.         RESULT := FALSE;
  45.  
  46.         break;
  47.       end
  48.     else if (start >= limit) then
  49.       begin
  50.         RESULT := TRUE;
  51.  
  52.         break;
  53.       end
  54.     else
  55.       inc(start, 2); // a prime number cannot be even at this stage
  56.     until forever;
  57.   end;
  58. end;
  59.  
  60.  
  61.  
  62. (***************************************************)
  63. (* ---------------- Main function ---------------- *)
  64. (***************************************************)
  65. var
  66.   prime_count: longword;
  67.   number     : longword;
  68.   lp0        : longword;
  69. begin
  70.   prime_count := StrToInt(ParamStr(1));
  71.  
  72.   number := 0;
  73.   lp0    := 0;
  74.  
  75.   // write 'prime_count' count of prime numbers
  76.   while (lp0 < prime_count) do
  77.   begin
  78.     if is_prime(number) then
  79.     begin
  80.       WriteLn(number);
  81.  
  82.       // one more prime number found
  83.       inc(lp0, 1);
  84.  
  85.       // check next number
  86.       inc(number, 1);
  87.     end
  88.       else // current number is not a prime, skip it...
  89.     inc(number, 1);
  90.   end;
  91. end.
  92.  

Run the program with:
Code: [Select]
./prime_numbers 1000000
« Last Edit: May 13, 2020, 02:14:11 pm by qq9ebg9acvzx »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12563
  • FPC developer.
Re: Is there something wrong in this code?
« Reply #1 on: May 12, 2020, 03:48:21 pm »
Easiest is to test with 32-bit and 64-bit. If 64-bit is faster, then it is probably the x87 usage of 32-bit windows FPC

Or try passing -Cfsse2 or -Cfcoreavx

Possibly it is also in the Floor(), FPC's roundings are very accurate, but sometimes slightly slower as result, and the floor(sqrt()) is probably the core speed factor here.

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: Is there something wrong in this code?
« Reply #2 on: May 12, 2020, 04:08:03 pm »
Easiest is to test with 32-bit and 64-bit. If 64-bit is faster, then it is probably the x87 usage of 32-bit windows FPC

Or try passing -Cfsse2 or -Cfcoreavx

Possibly it is also in the Floor(), FPC's roundings are very accurate, but sometimes slightly slower as result, and the floor(sqrt()) is probably the core speed factor here.
I am compiling 64-bit from 64-bit Linux. Both -Cfsse2 and -Cfcoreavx are returning Error: Illegal parameter.

If that is the case, is there a way to solve without inline assembly?

Bart

  • Hero Member
  • *****
  • Posts: 5642
    • Bart en Mariska's Webstek
Re: Is there something wrong in this code?
« Reply #3 on: May 12, 2020, 04:08:58 pm »
The C code etc. uses the same algorithm I assume?

Bart

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12563
  • FPC developer.
Re: Is there something wrong in this code?
« Reply #4 on: May 12, 2020, 04:12:10 pm »
Easiest is to test with 32-bit and 64-bit. If 64-bit is faster, then it is probably the x87 usage of 32-bit windows FPC

Or try passing -Cfsse2 or -Cfcoreavx

That should have been
-Cfsse3  or -Cfavx

seems the floating point names are slightly different from the integer ones.

And of course -O4.
« Last Edit: May 12, 2020, 06:05:45 pm by marcov »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8515
Re: Is there something wrong in this code?
« Reply #5 on: May 12, 2020, 04:21:46 pm »
Assuming that the other implementations really do use the same algorithm, are they smart enough to optimise an integer-only equivalent of  floor(sqrt(n));  ?

I was doing something not dissimilar myself a few weeks ago and found useful code at https://rosettacode.org/wiki/Category:PrimTrial which I distilled to

Code: [Select]
const
  cntsmallPrimes = 6;
  smallPrimes : array[0..cntsmallPrimes-1] of NativeUint = (2,3,5,7,11,13);
  wheelSize = (2-1)*(3-1)*(5-1)*(7-1)*(11-1)*(13-1);

var
  primesInitialised: boolean= false;
  deltaWheel : array[0..wheelSize-1] of byte;


procedure InitWheel;
//search for numbers that are no multiples of smallprimes
//saving only the distance, to keep size small
var
  p0,p1,i,d,res : NativeUint;
Begin
  p0 := 1;d := 0;p1 := p0;
  repeat
    Repeat
      p1 := p1+2;// only odd
      i := 1;
      repeat
        res := p1 mod smallPrimes[i];
        inc(i)
      until (res =0)OR(i >= cntSmallPrimes);
      if res <> 0 then
      Begin
        deltaWheel[d] := p1-p0;
        inc(d);
        break;
      end;
    until false;
    p0 := p1;
  until d >= wheelSize;
end;


function biggerFactor(p: NativeUint):NativeUint;
//trial division by wheel numbers
//reduces count of divisions from 1/2 = 0.5( only odd numbers )
//to 5760/30030 = 0.1918
var
  sp : NativeUint;
  d  : NativeUint;
  r  : NativeUint;
Begin
  if not primesInitialised then begin
    initWheel;
    primesInitialised := true
  end;
  sp := 1;d := 0;
  repeat
    sp := sp+deltaWheel[d];
    r := p mod sp;
    d := d+1;
    //IF d = WheelSize then d := 0;
    d := d AND NativeUint(-ord(d<>WheelSize));
    IF r = 0 then
      BREAK;
  until p < sp*sp;
  IF r = 0  then
    result := sp
  else
    result := p;
end;


function SmallFactor(pr: NativeUint):NativeUint;
//checking numbers omitted by biggerFactor
var
  k : NativeUint;
Begin
  if not primesInitialised then begin
    initWheel;
    primesInitialised := true
  end;
  result := pr;
  IF byte(pr) in [2,3,5,7,11,13] then
    EXIT;
  IF NOT(ODD(pr))then Begin result := 2; EXIT end;
  For k := 1 to cntSmallPrimes-1 do
  Begin
    IF pr Mod smallPrimes[k] = 0 then
    Begin
      result := smallPrimes[k];
      EXIT
    end;
  end;
  k  := smallPrimes[cntsmallPrimes-1];
  IF pr>k*k then
    result := biggerFactor(pr);
end;


(* Return true if the parameter is a prime number. This is available to check a
  newly-hashed passphrase on an occasional basis, and is no claims are made as
  to its efficiency.
*)
function IsPrime(lw: longword): boolean;

(* Liberated from https://rosettacode.org/wiki/Category:PrimTrial               *)

Begin
  initWheel;
  if lw > 1 then
    isPrime := smallFactor(lw) = lw
  else
    isPrime := false;
end { IsPrime } ;

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: Is there something wrong in this code?
« Reply #6 on: May 12, 2020, 04:25:27 pm »
That should have been
-Cfsse3  or -Cfavx

seems the floating point names are slightly different from the integer ones.

And of course -O4.
My CPU is not AVX capable, I tried both -Cfsse3 and -Cfavx and -O4. Practically the same results.

The C code etc. uses the same algorithm I assume?

Bart
Yes. The same.

Thaddy

  • Hero Member
  • *****
  • Posts: 18490
  • Here stood a man who saw the Elbe and jumped it.
Re: Is there something wrong in this code?
« Reply #7 on: May 12, 2020, 04:32:11 pm »
inc(value,1) everywhere does the same as plain inc(value). That should be faster. Also define {$I-} this has a speed advantage. as well as  compiling for release instead of debug. o-CfSSE2 with {$codealign 16}. See user manual.
« Last Edit: May 12, 2020, 04:50:00 pm by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8515
Re: Is there something wrong in this code?
« Reply #8 on: May 12, 2020, 04:34:00 pm »
Yes. The same.

It used to be common to use the Sieve of Eratosthenes as a compiler benchmark, until compilers started recognising what they were being asked to compile and used a more efficient algorithm.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: Is there something wrong in this code?
« Reply #9 on: May 12, 2020, 04:56:01 pm »
It used to be common to use the Sieve of Eratosthenes as a compiler benchmark, until compilers started recognising what they were being asked to compile and used a more efficient algorithm.

MarkMLl
Interesting. Thanks!

inc(value,1) everywhere does the same as plain inc(value). That should be faster. Also define {$I-} this has a speed advantage. as well as  compiling for release instead of debug.
I tried that. No noticeable effects.

I am compiling by using these options (on 64-bit Linux): -Mobjfpc -l -O3 -Sih -viewnh -Xs
« Last Edit: May 12, 2020, 04:59:45 pm by qq9ebg9acvzx »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12563
  • FPC developer.
Re: Is there something wrong in this code?
« Reply #10 on: May 12, 2020, 06:06:06 pm »
Then you really need generated asm inspection.

ASBzone

  • Hero Member
  • *****
  • Posts: 733
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Is there something wrong in this code?
« Reply #11 on: May 12, 2020, 06:06:19 pm »
Freepascal generated executable is taking 2x+ more time to do the same task than other compilers:

Code: [Select]
C:        7.691s
C++:      7.752s
Assembly: 7.909s (handwritten)
D:        8.073s
Rust:     8.974s
Pascal:   20.074s

The sample program outputs a given count of prime numbers. I am not able to make it faster.



Have you compared the difference between the hand-crafted assembly and what is generated by the Pascal compiler?
-ASB: https://www.BrainWaveCC.com/

Lazarus v4.3.0.0 (bcf314a670) / FreePascal v3.2.3-46-g77716a79dc (aka fixes)
(Windows 64-bit install w/Win32 and Linux on ARM and x64 cross-compilers via FpcUpDeluxe)

My Systems: Windows 10/11 Pro x64 (Current)

440bx

  • Hero Member
  • *****
  • Posts: 5886
Re: Is there something wrong in this code?
« Reply #12 on: May 12, 2020, 06:28:47 pm »
You can't draw conclusions about any language from that piece of code.

On line 80 you have a call to writeln which will be executed for every prime in the range sqrt(userspecifiedcount).

What you're testing there is, at most, the performance of one I/O routine in the selected languages.

FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

ASBzone

  • Hero Member
  • *****
  • Posts: 733
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Is there something wrong in this code?
« Reply #13 on: May 12, 2020, 06:57:53 pm »
Freepascal generated executable is taking 2x+ more time to do the same task than other compilers:

Code: [Select]
C:        7.691s
C++:      7.752s
Assembly: 7.909s (handwritten)
D:        8.073s
Rust:     8.974s
Pascal:   20.074s


I ran your code on my Ryzen 2700+ system and it took 12.507 seconds to output the content to a text file.  (I redirected the output to a file).

I ran it with -O4 -OpCOREAVX2 -CfSSE42 on Windows x64 using the FPC/Lazarus config in my signature.


I hashed the output that was generated, and did not get the same SHA256 value that you did, probably because I'm running it on Windows and have different line endings.




My own (other) prime number generator that is based on the Generate Prime Numbers via Segmented Sieve Method and crank them out faster.


Code: [Select]

+>primegentest 15485863


Max Search ......... 15,485,863 (0xEC4BA7)
Total Primes .......  1,000,000 (0xF4240)
Highest Prime ...... 15,485,863 (0xEC4BA7)
---------------------------------------------------------------------------
Start Time ......... 05/12/2020 12:54:13.786
End Time ........... 05/12/2020 12:54:14.005
Elapsed Time .......             0:00:00.219

---------------------------------------------------------------------------

When redirecting the output to a file, and actually printing the resulting values, it takes 4 seconds:



Code: [Select]

+>primegentest 15485863 -s 2> C:\Temp\1M-PrimeNumbers.TXT
Max Search ......... 15,485,863 (0xEC4BA7)
Total Primes .......  1,000,000 (0xF4240)
Highest Prime ...... 15,485,863 (0xEC4BA7)
---------------------------------------------------------------------------
Start Time ......... 05/12/2020 12:55:29.862
End Time ........... 05/12/2020 12:55:34.027
Elapsed Time .......             0:00:04.165

---------------------------------------------------------------------------
-ASB: https://www.BrainWaveCC.com/

Lazarus v4.3.0.0 (bcf314a670) / FreePascal v3.2.3-46-g77716a79dc (aka fixes)
(Windows 64-bit install w/Win32 and Linux on ARM and x64 cross-compilers via FpcUpDeluxe)

My Systems: Windows 10/11 Pro x64 (Current)

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: Is there something wrong in this code?
« Reply #14 on: May 12, 2020, 07:58:30 pm »
Then you really need generated asm inspection.
OK.

Have you compared the difference between the hand-crafted assembly and what is generated by the Pascal compiler?
It seems FPC is far from the ideal code for this. Other compilers solved it quite differently.

You can't draw conclusions about any language from that piece of code.

On line 80 you have a call to writeln which will be executed for every prime in the range sqrt(userspecifiedcount).

What you're testing there is, at most, the performance of one I/O routine in the selected languages.
If I remove the WriteLn line, the time difference is negligible in all other languages (redirecting output to file or /dev/null).

It takes more time in the is_prime function, apparently.

I ran your code on my Ryzen 2700+ system and it took 12.507 seconds to output the content to a text file.  (I redirected the output to a file).

I ran it with -O4 -OpCOREAVX2 -CfSSE42 on Windows x64 using the FPC/Lazarus config in my signature.

I hashed the output that was generated, and did not get the same SHA256 value that you did, probably because I'm running it on Windows and have different line endings.

My own (other) prime number generator that is based on the Generate Prime Numbers via Segmented Sieve Method and crank them out faster.
Code: [Select]
+>primegentest 15485863


Max Search ......... 15,485,863 (0xEC4BA7)
Total Primes .......  1,000,000 (0xF4240)
Highest Prime ...... 15,485,863 (0xEC4BA7)
---------------------------------------------------------------------------
Start Time ......... 05/12/2020 12:54:13.786
End Time ........... 05/12/2020 12:54:14.005
Elapsed Time .......             0:00:00.219

---------------------------------------------------------------------------
When redirecting the output to a file, and actually printing the resulting values, it takes 4 seconds:
Code: [Select]

+>primegentest 15485863 -s 2> C:\Temp\1M-PrimeNumbers.TXT
Max Search ......... 15,485,863 (0xEC4BA7)
Total Primes .......  1,000,000 (0xF4240)
Highest Prime ...... 15,485,863 (0xEC4BA7)
---------------------------------------------------------------------------
Start Time ......... 05/12/2020 12:55:29.862
End Time ........... 05/12/2020 12:55:34.027
Elapsed Time .......             0:00:04.165

---------------------------------------------------------------------------
Cool, this algorithm is much faster. And your CPU is much faster than mine as well (it is an Ivy Bridge Intel Celeron).

About the different line ending, just replace WriteLn line with Write(number, #10);
« Last Edit: May 12, 2020, 08:01:55 pm by qq9ebg9acvzx »

 

TinyPortal © 2005-2018