Recent

Author Topic: [SOLVED] Is there something wrong in this code?  (Read 4471 times)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12562
  • FPC developer.
Re: Is there something wrong in this code?
« Reply #15 on: May 12, 2020, 10:19:05 pm »
Then, it is probably the frequent flushing, which is mainly meant for easy piping.

See the answer to this question

MarkMLl

  • Hero Member
  • *****
  • Posts: 8515
Re: Is there something wrong in this code?
« Reply #16 on: May 12, 2020, 10:31:58 pm »
Have you compared the difference between the hand-crafted assembly and what is generated by the Pascal compiler?
It seems FPC is far from the ideal code for this. Other compilers solved it quite differently.

What do you mean? It's hardly the job of the compiler to solve the problem, it's there to generate an accurate implementation of the algorithm you've specified... "warts and all".

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

ASBzone

  • Hero Member
  • *****
  • Posts: 733
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Is there something wrong in this code?
« Reply #17 on: May 13, 2020, 01:29:44 am »
Have you compared the difference between the hand-crafted assembly and what is generated by the Pascal compiler?
It seems FPC is far from the ideal code for this. Other compilers solved it quite differently.

I don't think we have enough info to blame the compiler.  We *might* if you showed us the code that was used for the other languages, as well as any compile options.

Without that, and without a comparison of the assembler that was generated in any of the other cases, we have no way of drilling down to where the issue might be.

I did change the WRITELN statements, and the hash matches:

filehash -p C:\Temp\FasterPrime.TXT
SHA2_256: f13156e206e68386cb86b13093520acc5da04c875926411bd4df4e76590e81cf  {File}  C:\Temp\FasterPrime.TXT

Still ~12.4 seconds.

My fast prime generator is using a segmented sieve routine.  Blisteringly fast.   My normal prime generator completes the same 1M prime numbers in ~13 seconds, without printing anything, or ~22 when printing.  It could be faster, but it check for CTRL-C and CTRL-BREAK handling, among other things, to be able to provide a status is you prematurely abort operation.

-ASB: https://www.BrainWaveCC.com/

Lazarus v4.3.0.0 (bcf314a670) / FreePascal v3.2.3-46-g77716a79dc (aka fixes)
(Windows 64-bit install w/Win32 and Linux on ARM and x64 cross-compilers via FpcUpDeluxe)

My Systems: Windows 10/11 Pro x64 (Current)

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: Is there something wrong in this code?
« Reply #18 on: May 13, 2020, 02:40:53 am »
Then, it is probably the frequent flushing, which is mainly meant for easy piping.

See the answer to this question
I tested with no output at all and the time is nearly the same.

It seems FPC is far from the ideal code for this. Other compilers solved it quite differently.
I don't think we have enough info to blame the compiler.  We *might* if you showed us the code that was used for the other languages, as well as any compile options.

Without that, and without a comparison of the assembler that was generated in any of the other cases, we have no way of drilling down to where the issue might be.
...
Here is the C code of the same program:
Code: C  [Select][+][-]
  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <math.h>
  4.  
  5. /** ------------------------------------------------------------------------------------------------
  6.     is_prime:
  7.  
  8.       Checks whether a number is prime.
  9.  
  10.       Parameters:
  11.  
  12.         n -> The number to be checked.
  13.  
  14.       Return value:
  15.  
  16.         Returns true (1) if number is prime and false (0) otherwise.
  17.     ------------------------------------------------------------------------------------------------ */
  18. _Bool is_prime(const unsigned int n)
  19. {
  20.   if ((n == 2) || (n == 3))
  21.   {
  22.     // number is 2 or 3, is prime
  23.     return 1;
  24.   }
  25.   else if (((n & 0x01) == 0x00) || (n < 2))
  26.   {
  27.     // number is even and not 2 or is lower than 2, not a prime
  28.     return 0;
  29.   }
  30.   else
  31.   {
  32.     unsigned int start = 3;              // start dividing by 3
  33.     unsigned int limit = floor(sqrt(n));
  34.  
  35.     for(;;)
  36.     {
  37.       if ((n % start) == 0)
  38.       {
  39.         // divisible by a number smaller than itself, not a prime
  40.         return 0;
  41.       }
  42.       else if (start >= limit)
  43.       {
  44.         // not divisible by anything except 1 and itself, is a prime
  45.         return 1;
  46.       }
  47.       else
  48.       {
  49.         start += 2;
  50.       }
  51.     }
  52.   }
  53. }
  54.  
  55.  
  56.  
  57. /***************************************************/
  58. /* ---------------- Main function ---------------- */
  59. /***************************************************/
  60. unsigned int prime_count;
  61.  
  62. int main(int argc, char **argv)
  63. {
  64.   unsigned int lp0;
  65.   unsigned int number = 0;
  66.  
  67.  
  68.   // write 'prime_count' count of prime numbers
  69.   prime_count = atoi(argv[1]);
  70.   lp0         = prime_count;
  71.  
  72.   while (lp0 > 0)
  73.   {
  74.     if (is_prime(number))
  75.     {
  76.       // print current prime number
  77.       printf("%u\n", number);
  78.  
  79.       // one more prime number found...
  80.       lp0--;
  81.  
  82.       // check next number
  83.       number++;
  84.     }
  85.     else
  86.     {
  87.       // number is not prime, skip it
  88.       number++;
  89.     }
  90.   }
  91.  
  92.  
  93.   return 0;
  94. }
  95.  

Compile flags (GCC): -O3 -lm

And here is the assembly of the is_prime function (generated by FPC):
Code: ASM  [Select][+][-]
  1. .section .text.n_p$prime_numbers_$$_is_prime$longword$$boolean
  2.         .balign 16,0x90
  3. .globl  P$PRIME_NUMBERS_$$_IS_PRIME$LONGWORD$$BOOLEAN
  4.         .type   P$PRIME_NUMBERS_$$_IS_PRIME$LONGWORD$$BOOLEAN,@function
  5. P$PRIME_NUMBERS_$$_IS_PRIME$LONGWORD$$BOOLEAN:
  6. .Lc1:
  7.         pushq   %rbx
  8.         pushq   %r12
  9.         pushq   %r13
  10.         leaq    -32(%rsp),%rsp
  11. .Lc3:
  12.         movl    %edi,%ebx
  13.         movb    $0,%r12b
  14.         cmpl    $2,%ebx
  15.         je      .Lj7
  16.         cmpl    $3,%ebx
  17.         jne     .Lj8
  18. .Lj7:
  19.         movb    $1,%r12b
  20.         jmp     .Lj12
  21. .Lj8:
  22.         movl    %ebx,%eax
  23.         andl    $1,%eax
  24.         testl   %eax,%eax
  25.         je      .Lj13
  26.         cmpl    $2,%ebx
  27.         jnb     .Lj14
  28. .Lj13:
  29.         movb    $0,%r12b
  30.         jmp     .Lj18
  31. .Lj14:
  32.         movl    $3,%r13d
  33.         movl    %ebx,16(%rsp)
  34.         movl    %ebx,%eax
  35.         movl    %eax,24(%rsp)
  36.         movl    $0,28(%rsp)
  37.         fildq   24(%rsp)
  38.         fsqrt
  39.         fstpt   (%rsp)
  40.         call    MATH_$$_FLOOR$EXTENDED$$LONGINT
  41.         movl    %eax,%ecx
  42.         .balign 8,0x90
  43. .Lj25:
  44.         movl    %ebx,%eax
  45.         movl    %r13d,%esi
  46.         cqto
  47.         idivq   %rsi
  48.         testq   %rdx,%rdx
  49.         jne     .Lj29
  50.         movb    $0,%r12b
  51.         jmp     .Lj27
  52. .Lj29:
  53.         cmpl    %ecx,%r13d
  54.         jnae    .Lj34
  55.         movb    $1,%r12b
  56.         jmp     .Lj27
  57. .Lj34:
  58.         addl    $2,%r13d
  59.         jmp     .Lj25
  60. .Lj27:
  61. .Lj18:
  62. .Lj12:
  63.         movzbl  %r12b,%eax
  64.         leaq    32(%rsp),%rsp
  65.         popq    %r13
  66.         popq    %r12
  67.         popq    %rbx
  68.         ret
  69. .Lc2:
  70. .Le0:
  71.         .size   P$PRIME_NUMBERS_$$_IS_PRIME$LONGWORD$$BOOLEAN, .Le0 - P$PRIME_NUMBERS_$$_IS_PRIME$LONGWORD$$BOOLEAN
  72.  

And here is the assembly of the is_prime function (generated by GCC):
Code: ASM  [Select][+][-]
  1.         .p2align 4,,15
  2.         .globl  is_prime
  3.         .type   is_prime, @function
  4. is_prime:
  5. .LFB22:
  6.         .cfi_startproc
  7.         leal    -2(%rdi), %eax
  8.         cmpl    $1, %eax
  9.         jbe     .L27
  10.         testb   $1, %dil
  11.         je      .L24
  12.         cmpl    $1, %edi
  13.         je      .L24
  14.         movl    %edi, %eax
  15.         pxor    %xmm0, %xmm0
  16.         pxor    %xmm1, %xmm1
  17.         subq    $24, %rsp
  18.         .cfi_def_cfa_offset 32
  19.         cvtsi2sdq       %rax, %xmm0
  20.         ucomisd %xmm0, %xmm1
  21.         sqrtsd  %xmm0, %xmm2
  22.         ja      .L28
  23. .L8:
  24.         movsd   .LC2(%rip), %xmm1
  25.         movapd  %xmm2, %xmm3
  26.         movsd   .LC1(%rip), %xmm4
  27.         movapd  %xmm2, %xmm0
  28.         andpd   %xmm1, %xmm3
  29.         ucomisd %xmm3, %xmm4
  30.         ja      .L29
  31. .L9:
  32.         movl    %edi, %eax
  33.         cvttsd2siq      %xmm0, %rcx
  34.         movl    $-1431655765, %edx
  35.         mull    %edx
  36.         shrl    %edx
  37.         movl    %ecx, %esi
  38.         leal    (%rdx,%rdx,2), %eax
  39.         cmpl    %eax, %edi
  40.         je      .L10
  41.         cmpl    $3, %ecx
  42.         jbe     .L11
  43.         movl    $3, %ecx
  44.         jmp     .L12
  45.         .p2align 4,,10
  46.         .p2align 3
  47. .L20:
  48.         cmpl    %ecx, %esi
  49.         jbe     .L11
  50. .L12:
  51.         addl    $2, %ecx
  52.         movl    %edi, %eax
  53.         xorl    %edx, %edx
  54.         divl    %ecx
  55.         testl   %edx, %edx
  56.         jne     .L20
  57. .L10:
  58.         xorl    %eax, %eax
  59.         addq    $24, %rsp
  60.         .cfi_def_cfa_offset 8
  61.         ret
  62.         .p2align 4,,10
  63.         .p2align 3
  64. .L24:
  65.         xorl    %eax, %eax
  66.         ret
  67.         .p2align 4,,10
  68.         .p2align 3
  69. .L11:
  70.         .cfi_def_cfa_offset 32
  71.         movl    $1, %eax
  72.         addq    $24, %rsp
  73.         .cfi_remember_state
  74.         .cfi_def_cfa_offset 8
  75.         ret
  76.         .p2align 4,,10
  77.         .p2align 3
  78. .L29:
  79.         .cfi_restore_state
  80.         cvttsd2siq      %xmm2, %rax
  81.         pxor    %xmm0, %xmm0
  82.         andnpd  %xmm2, %xmm1
  83.         cvtsi2sdq       %rax, %xmm0
  84.         orpd    %xmm1, %xmm0
  85.         jmp     .L9
  86.         .p2align 4,,10
  87.         .p2align 3
  88. .L27:
  89.         .cfi_def_cfa_offset 8
  90.         movl    $1, %eax
  91.         ret
  92. .L28:
  93.         .cfi_def_cfa_offset 32
  94.         movl    %edi, 12(%rsp)
  95.         movsd   %xmm2, (%rsp)
  96.         call    sqrt@PLT
  97.         movl    12(%rsp), %edi
  98.         movsd   (%rsp), %xmm2
  99.         jmp     .L8
  100.         .cfi_endproc
  101. .LFE22:
  102.         .size   is_prime, .-is_prime
  103.         .section        .rodata.str1.1,"aMS",@progbits,1
  104. .LC3:
  105.         .string "%u\n"
  106.         .section        .text.startup,"ax",@progbits
  107.         .p2align 4,,15
  108.         .globl  main
  109.         .type   main, @function
  110.  

MathMan

  • Sr. Member
  • ****
  • Posts: 443
Re: Is there something wrong in this code?
« Reply #19 on: May 13, 2020, 09:02:47 am »
This, I think, explains it. The C compiler generated code for a 32 bit architecture, while FPC generated code for 64 bit arch.

That generates a hughe difference in the execution time for the cenral IDIV instruction. You can look up the timings for different processor types under http://http://instlatx64.atw.hu/ - search for "Ivy Bridge" download the 'InstLatX64' file and have a look for IDIV instruction timings.

Cheers,
MathMan

PascalDragon

  • Hero Member
  • *****
  • Posts: 6229
  • Compiler Developer
Re: Is there something wrong in this code?
« Reply #20 on: May 13, 2020, 09:15:17 am »
This, I think, explains it. The C compiler generated code for a 32 bit architecture, while FPC generated code for 64 bit arch.

No, the C compiler generated 64-bit code as well. There are enough %rxx usages in there and FPC uses %exx when appropriate as well.

The problem could indeed be the Floor(Sqrt(...)) call, as the Sqrt results in a x87 FPU instruction and the call to Floor passes the value using the x87 FPU stack as well.

MathMan

  • Sr. Member
  • ****
  • Posts: 443
Re: Is there something wrong in this code?
« Reply #21 on: May 13, 2020, 09:55:53 am »
This, I think, explains it. The C compiler generated code for a 32 bit architecture, while FPC generated code for 64 bit arch.

No, the C compiler generated 64-bit code as well. There are enough %rxx usages in there and FPC uses %exx when appropriate as well.

The problem could indeed be the Floor(Sqrt(...)) call, as the Sqrt results in a x87 FPU instruction and the call to Floor passes the value using the x87 FPU stack as well.

I stand corrected re 64 bit architecture - I focussed too heavily on the core loop, which checks the modulo operation. But there are two things that still convince me that this is the root cause

1. the computation of Floor(Sqrt(...)) happens only once in a call to is_prime, while the core loop is executed zillions of times
2. both - instlat and Agner Fog - report the substantial timing difference between IDIV r64 and IDIV r32 on Ivy Bridge. The former ranges from 26-88 cycles while the latter is 8-11 cycles (as per Agner Fogs instruction timings).

Cheers,
MathMan

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: Is there something wrong in this code?
« Reply #22 on: May 13, 2020, 02:03:24 pm »
This, I think, explains it. The C compiler generated code for a 32 bit architecture, while FPC generated code for 64 bit arch.

No, the C compiler generated 64-bit code as well. There are enough %rxx usages in there and FPC uses %exx when appropriate as well.

The problem could indeed be the Floor(Sqrt(...)) call, as the Sqrt results in a x87 FPU instruction and the call to Floor passes the value using the x87 FPU stack as well.

I stand corrected re 64 bit architecture - I focussed too heavily on the core loop, which checks the modulo operation. But there are two things that still convince me that this is the root cause

1. the computation of Floor(Sqrt(...)) happens only once in a call to is_prime, while the core loop is executed zillions of times
2. both - instlat and Agner Fog - report the substantial timing difference between IDIV r64 and IDIV r32 on Ivy Bridge. The former ranges from 26-88 cycles while the latter is 8-11 cycles (as per Agner Fogs instruction timings).

Cheers,
MathMan
MathMan, it seems you are right.

I managed to insert an inline assembly to divide 32-bit and look the result:
Code: [Select]
Pascal: 8.860s

Edit: Compiled with -Mobjfpc -l -O3 -Sih -viewnh -Xs

Here is the code:
Code: Pascal  [Select][+][-]
  1. program prime_numbers;
  2.  
  3. uses
  4.   SysUtils, math;
  5.  
  6. (** ----------------------------------------------------------------------------------------------------
  7.     is_prime:
  8.  
  9.       Checks whether a number is prime.
  10.  
  11.       Parameters:
  12.  
  13.         n -> The number to be checked.
  14.  
  15.       Return value:
  16.  
  17.         Returns TRUE (1) if number is prime and FALSE (0) otherwise.
  18.     ---------------------------------------------------------------------------------------------------- **)
  19. function is_prime(const n: longword): boolean;
  20. const
  21.   forever = FALSE;
  22. var
  23.   start: longword;
  24.   limit: longword;
  25.   check: longword;
  26. begin
  27.   RESULT := FALSE;
  28.  
  29.   // if 2 or 3 then is prime
  30.   if (n = 2) or (n = 3) then
  31.     RESULT := TRUE
  32. else
  33.   // if even or less than 2 is not prime
  34.   if ((n and $01) = $00) or (n < 2) then
  35.     RESULT := FALSE
  36. else
  37.   begin
  38.     start := 3;
  39.     limit := floor(sqrt(n));
  40.  
  41.     repeat
  42.       // check for divisibility by current number
  43.       {$ASMMODE intel}
  44.       asm
  45.         mov edx,0
  46.         mov eax,n
  47.         mov esi,start
  48.         div esi
  49.         mov check,edx
  50.       end;
  51.       if check = 0 then
  52.       //if (n mod start = 0) then
  53.       begin
  54.         RESULT := FALSE;
  55.  
  56.         break;
  57.       end
  58.     else if (start >= limit) then
  59.       begin
  60.         RESULT := TRUE;
  61.  
  62.         break;
  63.       end
  64.     else
  65.       inc(start, 2); // a prime number cannot be even at this stage
  66.     until forever;
  67.   end;
  68. end;
  69.  
  70.  
  71.  
  72. (***************************************************)
  73. (* ---------------- Main function ---------------- *)
  74. (***************************************************)
  75. var
  76.   prime_count: longword;
  77.   number     : longword;
  78.   lp0        : longword;
  79. begin
  80.   prime_count := StrToInt(ParamStr(1));
  81.  
  82.   number := 0;
  83.   lp0    := 0;
  84.  
  85.   // write 'prime_count' count of prime numbers
  86.   while (lp0 < prime_count) do
  87.   begin
  88.     if is_prime(number) then
  89.     begin
  90.       Write(number, #10);
  91.  
  92.       // one more prime number found
  93.       inc(lp0, 1);
  94.  
  95.       // check next number
  96.       inc(number, 1);
  97.     end
  98.       else // current number is not a prime, skip it...
  99.     inc(number, 1);
  100.   end;
  101. end.
  102.  

Well, I can say this is solved.

Thanks to all who have contributed!
« Last Edit: May 13, 2020, 02:22:06 pm by qq9ebg9acvzx »

MarkMLl

  • Hero Member
  • *****
  • Posts: 8515
Re: [SOLVED] Is there something wrong in this code?
« Reply #23 on: May 13, 2020, 02:17:03 pm »
Well done. So it's fair to ask why FPC is generating a 64-bit divide opcode for 32-bit operands.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

MathMan

  • Sr. Member
  • ****
  • Posts: 443
Re: [SOLVED] Is there something wrong in this code?
« Reply #24 on: May 13, 2020, 02:33:38 pm »
@qq9ebg9acvzx

Glad to be of help.

- you might want to tell the compiler which registers you used in your assembler part - replace 'end' with 'end [eax,edx,esi]'.
- another thing that might address the floor(sqrt(...)) issue is - define a 'var temp:single;' and instead of directly calculating the value do 'temp := sqrt( n ); limit := floor( temp);'

Cheers,
MathMan

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: [SOLVED] Is there something wrong in this code?
« Reply #25 on: May 13, 2020, 04:21:42 pm »
@qq9ebg9acvzx

Glad to be of help.

- you might want to tell the compiler which registers you used in your assembler part - replace 'end' with 'end [eax,edx,esi]'.
- another thing that might address the floor(sqrt(...)) issue is - define a 'var temp:single;' and instead of directly calculating the value do 'temp := sqrt( n ); limit := floor( temp);'

Cheers,
MathMan
The code does not compile when I do that, it says Error: Invalid register name.

avk

  • Hero Member
  • *****
  • Posts: 824
Re: [SOLVED] Is there something wrong in this code?
« Reply #26 on: May 13, 2020, 08:19:25 pm »
... So it's fair to ask why FPC is generating a 64-bit divide opcode for 32-bit operands...
It seems yes.

@qq9ebg9acvzx, I would implement such an algorithm(taking into account the disadvantages of the FPC optimizer) like this:
Code: Pascal  [Select][+][-]
  1. program prime_numbers;
  2. {$mode objfpc}
  3. uses
  4.   SysUtils, math;
  5. (** ----------------------------------------------------------------------------------------------------
  6.     is_prime:
  7.  
  8.       Checks whether a number is prime.
  9.  
  10.       Parameters:
  11.  
  12.         n -> The number to be checked.
  13.  
  14.       Return value:
  15.  
  16.         Returns TRUE (1) if number is prime and FALSE (0) otherwise.
  17.     ---------------------------------------------------------------------------------------------------- **)
  18. function is_prime(const n: longword): boolean;
  19. var
  20.   start, limit, modulus: longword;
  21. begin
  22.   start := 3;
  23.   limit := Trunc(Sqrt(n));
  24.   while start <= limit do
  25.     begin
  26.       // modulus for divisibility by current number
  27.       {$asmmode intel}
  28.       asm
  29.         mov eax, n
  30.         xor edx, edx
  31.         div start
  32.         mov modulus, edx
  33.       end['eax', 'edx'];
  34.       if modulus = 0 then exit(False);
  35.       inc(start, 2); // a prime number cannot be even at this stage
  36.     end;
  37.   Result := True;
  38. end;
  39. (***************************************************)
  40. (* ---------------- Main function ---------------- *)
  41. (***************************************************)
  42. var
  43.   prime_count, number, lp0: longword;
  44.   Buffer: array[0..511] of QWord;
  45. begin
  46.   if (ParamCount = 0) or not(TryStrToDWord(ParamStr(1), prime_count)) or (prime_count = 0) then
  47.     begin
  48.       Write('Usage: prime_numbers REQUIRED_COUNT(>0)');
  49.       Halt;
  50.     end;
  51.   case prime_count of
  52.     1: Write(2, #10);
  53.     2:
  54.       begin
  55.         Write(2, #10);
  56.         Write(3, #10);
  57.       end;
  58.   else
  59.     SetTextBuf(Output, Buffer, SizeOf(Buffer));
  60.     Write(2, #10);
  61.     Write(3, #10);
  62.     number := 5;
  63.     lp0    := 2;
  64.     // write 'prime_count' count of prime numbers
  65.     while (lp0 < prime_count) do
  66.       begin
  67.         if is_prime(number) then
  68.            begin
  69.              Write(number, #10);
  70.              // one more prime number found
  71.              inc(lp0);
  72.            end;
  73.         inc(number, 2);
  74.       end;
  75.   end;
  76. end.
  77.  
« Last Edit: May 13, 2020, 08:44:37 pm by avk »

qq9ebg9acvzx

  • New Member
  • *
  • Posts: 21
Re: [SOLVED] Is there something wrong in this code?
« Reply #27 on: May 14, 2020, 03:43:50 am »
... So it's fair to ask why FPC is generating a 64-bit divide opcode for 32-bit operands...
It seems yes.

@qq9ebg9acvzx, I would implement such an algorithm(taking into account the disadvantages of the FPC optimizer) like this:
...
The result was much better:
Code: [Select]
Pascal: 8.088s

Thank you!

 

TinyPortal © 2005-2018