Recent

Author Topic: [CLOSED-Possible bug in RTL PopCnt] - Critical question about RTL performance  (Read 5258 times)

julkas

  • Sr. Member
  • ****
  • Posts: 431
  • KISS principle / Lazarus 2.0.0 / FPC 3.0.4
Re: [CLOSED] Critical question about RTL performance
« Reply #15 on: June 25, 2019, 10:41:21 am »
May be. But this does not explain strange behavior with if t <> 0 then t := PopCnt(t);
The comparison test is a single cycle on most cpu's , popcnt takes at least three, so testing for zero optimizes a little here.
Yes - a little, but not from 15s to 6.5s. It's impossible.
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
  mov rax, a
  mov rdx, b
  mul rdx
  mov [clo], rax
  mov [chi], rdx
end;

MathMan

  • Full Member
  • ***
  • Posts: 166
Re: [CLOSED] Critical question about RTL performance
« Reply #16 on: June 25, 2019, 01:50:13 pm »
May be. But this does not explain strange behavior with if t <> 0 then t := PopCnt(t);
The comparison test is a single cycle on most cpu's , popcnt takes at least three, so testing for zero optimizes a little here.
Yes - a little, but not from 15s to 6.5s. It's impossible.

Hi julkas - from my perspective it is very well possible, as the difference simply relates to the number of 0s detected. You seem to calculate the hamming distance over two arrays. If the arrays are held on heap and are not preset with random values prior to calculation then the run-time will be erratic, depending on the values found in the uninitialized heap. If your code uses variant arrays which get sized but again not preset prior to calculation then the array content is explicitely 0. There are other, more theoretical, explanations on the runtime difference.

julkas

  • Sr. Member
  • ****
  • Posts: 431
  • KISS principle / Lazarus 2.0.0 / FPC 3.0.4
Re: [CLOSED] Critical question about RTL performance
« Reply #17 on: June 25, 2019, 02:48:24 pm »
May be. But this does not explain strange behavior with if t <> 0 then t := PopCnt(t);
The comparison test is a single cycle on most cpu's , popcnt takes at least three, so testing for zero optimizes a little here.
Yes - a little, but not from 15s to 6.5s. It's impossible.

Hi julkas - from my perspective it is very well possible, as the difference simply relates to the number of 0s detected. You seem to calculate the hamming distance over two arrays. If the arrays are held on heap and are not preset with random values prior to calculation then the run-time will be erratic, depending on the values found in the uninitialized heap. If your code uses variant arrays which get sized but again not preset prior to calculation then the array content is explicitely 0. There are other, more theoretical, explanations on the runtime difference.
@MathMan thank you for reply.
But, I have only one line of code with RTL PopCnt - time is 15s
Code: Pascal  [Select]
  1. Inc(c, PopCnt(a.bs[i] and b.bs[i]));
I replace call to RTL PopCnt with my own pure Pascal (NOT ASM) function PopCount - now time is 7s.
Code: Pascal  [Select]
  1. function PopCount(x: QWORD): SizeInt; inline;
  2. begin
  3.   Result := 0;
  4.   while x <> 0 do
  5.   begin
  6.     Inc(Result);
  7.     x := x and (x - 1);
  8.   end;
  9. end;
  10. ...
  11. Inc(c, PopCount(a.bs[i] and b.bs[i]));
Is it normal behavoir ? And finally, RTL PopCnt with if t <> 0 then t := PopCnt(t); test runs as my PopCount version (without if test).
If test excludes zero value call (PopCnt(0) = 0)!
Thanks.
« Last Edit: June 25, 2019, 02:49:59 pm by julkas »
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
  mov rax, a
  mov rdx, b
  mul rdx
  mov [clo], rax
  mov [chi], rdx
end;

MathMan

  • Full Member
  • ***
  • Posts: 166
Hi julkas

No you do not only have one line of code (you popcnt function) which is relevant for your measurement. You must have some surrounding code where you initialize the the arrays a & b, do the popcnt loop over the arrays etc.

Without having seen this I can not comment on whether this is "normal" behaviour or not. Pls. show the complete, compilable program of your popcnt test.

BTW - this was indeed the first question to you at the beginning of this thread.

MathMan

julkas

  • Sr. Member
  • ****
  • Posts: 431
  • KISS principle / Lazarus 2.0.0 / FPC 3.0.4
Hi julkas

No you do not only have one line of code (you popcnt function) which is relevant for your measurement. You must have some surrounding code where you initialize the the arrays a & b, do the popcnt loop over the arrays etc.

Without having seen this I can not comment on whether this is "normal" behaviour or not. Pls. show the complete, compilable program of your popcnt test.

BTW - this was indeed the first question to you at the beginning of this thread.

MathMan
Hi, @MathMan!
Link to the problem - https://www.spoj.com/problems/ADAFUROW/.

Regards.



procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
  mov rax, a
  mov rdx, b
  mul rdx
  mov [clo], rax
  mov [chi], rdx
end;

MathMan

  • Full Member
  • ***
  • Posts: 166
Hi, @MathMan!
Link to the problem - https://www.spoj.com/problems/ADAFUROW/.

Regards.

Well, thanks. Unfortunately this doesn't help me to help you with your problem. Again - I can only judge if there is a real issue or everythings behaves as expected if you show me your compilable sources.

Regards,
MathMan

julkas

  • Sr. Member
  • ****
  • Posts: 431
  • KISS principle / Lazarus 2.0.0 / FPC 3.0.4
Hi, @MathMan!
Link to the problem - https://www.spoj.com/problems/ADAFUROW/.

Regards.

Well, thanks. Unfortunately this doesn't help me to help you with your problem. Again - I can only judge if there is a real issue or everythings behaves as expected if you show me your compilable sources.

Regards,
MathMan

Thanks @MathMan. I solved this problem in C and Pascal. Link is just for reference.
So, you know my approach and you can try solve it online or generate your own test cases (random or not) for this problem and check PopCnt performance (if you want).

procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
  mov rax, a
  mov rdx, b
  mul rdx
  mov [clo], rax
  mov [chi], rdx
end;