Recent

Author Topic: For Loops and Registers  (Read 3627 times)

syntonica

  • Full Member
  • ***
  • Posts: 120
For Loops and Registers
« on: December 09, 2019, 06:08:18 am »

I've been playing with small snippets and compiler optimizations. Nothing I specified seemed to do any good, so I finally looked at the assembly. The loop variables are being stored in memory slots and not registers!  >:(


Here is the code:
Code: Pascal  [Select][+][-]
  1. program p;
  2.  
  3.  
  4. uses SysUtils, DateUtils, Math;
  5.  
  6.  
  7. var
  8.   i, j: Int64;
  9.   r: Single;
  10.   t1, t2: TDateTime;
  11. begin
  12.   r := 0;
  13.   t1 := Now();
  14.   for j:=1 to 10000 do begin
  15.     for i:=1 to 100 do begin
  16.       r := r + (power((1+SQRT(5)) * 0.5,i)-power((1-SQRT(5)) * 0.5,i))/SQRT(5)
  17.     end;
  18.   end;
  19.   t2 := Now();
  20.   WriteLn(r,' ',MilliSecondsBetween(t2, t1));
  21. end.
  22.  


Here are my compiler switches:
Code: Pascal  [Select][+][-]
  1. fpc -O4 -a -CpCOREI -CfSSSE3 -Sv "%f"
  2.  


What am I doing wrong?  :'(

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: For Loops and Registers
« Reply #1 on: December 09, 2019, 09:30:23 am »
You need to put your code into a function/procedure and call that. If the code is part of the main program block then the variables are global variables which are considered volatile and thus are not optimized into registers by design.

syntonica

  • Full Member
  • ***
  • Posts: 120
Re: For Loops and Registers
« Reply #2 on: December 09, 2019, 10:34:36 am »
Thanks for the tip! I was wondering about that.  Here's Mark II:
Code: Pascal  [Select][+][-]
  1.  
  2. program p;
  3.  
  4.  
  5. uses SysUtils, DateUtils, Math;
  6.  
  7.  
  8. var
  9.   t1, t2: TDateTime;
  10.   r: Single;
  11.  
  12.  
  13. function calc() : Single;
  14.   var
  15.     i, j: Integer;
  16.   begin
  17.     calc := 0;
  18.     for j:=1 to 10000 do begin
  19.       for i:=1 to 100 do begin
  20.         calc := calc + (power((1+SQRT(5))/2,i)-power((1-SQRT(5))/2,i))/SQRT(5)
  21.       end;
  22.     end;
  23.   end;
  24.  
  25.  
  26. begin
  27.   t1 := Now();
  28.   r := calc;
  29.   t2 := Now();
  30.         WriteLn(r,' ',MilliSecondsBetween(t2, t1));
  31. end.
  32.  


I gained about 8ms on my machine.  As it stands, Pascal runs at about 140ms avg and my C version compiled with clang is at about 90ms avg per run, both optimized with -O4.


I have to say, the disparity is very disheartening. The DSP code I wish to port over relies on tight loops that cannot be vectorized due to the complex calculations within.  If the results I have been seeing in my comparison tests were in the neighborhood, I'd take the plunge, but it looks like FP is not for my use case. And I really like the idea of being able to cross-compile the Windows version on my Mac! If I was making a command-line or desktop application, Pascal would be a no-brainer, but right now, I have a need for speed.


Unless somebody has any suggestions (other than raw assembly), I will probably have to move on.

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: For Loops and Registers
« Reply #3 on: December 09, 2019, 11:38:14 am »
Why do you do not pre-calculated consts?
Code: Pascal  [Select][+][-]
  1. program p;
  2. {$mode delphi}{$I-}
  3. uses
  4.   SysUtils, DateUtils, Math;
  5.  
  6. var
  7.   t1, t2: TDateTime;
  8.   r: Single;
  9.  
  10. function calc: Single;inline;
  11. const
  12.   sqrt5 = SQRT(5);
  13.   delta = (1+sqrt5)/2;
  14. var
  15.   i, j: Integer;
  16. begin
  17.   result:=0;
  18.   for j:=1 to 10000 do
  19.     for i:=1 to 100 do
  20.       result := result + (power(delta,i)-power(delta,i))/sqrt5
  21. end;
  22.  
  23. begin
  24.   t1 := Now;
  25.   r := calc;
  26.   t2 := Now;
  27.   WriteLn(r,' ',MilliSecondsBetween(t2, t1));
  28. end.

(And why do you count from 1, but that's another matter)
 
« Last Edit: December 09, 2019, 11:45:04 am by Thaddy »
Specialize a type, not a var.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: For Loops and Registers
« Reply #4 on: December 09, 2019, 11:43:52 am »
I checked, and power() is not inline, and double only on win64. I think the constants are inline.

The generated code itself doesn't look really shabby, nor really neat. Well, that is the state of FPC.


Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: For Loops and Registers
« Reply #5 on: December 09, 2019, 11:58:30 am »
I have a slight improvement on ARMHF with my code.
Usually for real-time DSP you would use an approximation function instead of Power().
Specialize a type, not a var.

syntonica

  • Full Member
  • ***
  • Posts: 120
Re: For Loops and Registers
« Reply #6 on: December 09, 2019, 03:02:57 pm »
Why do you do not pre-calculated consts?
 
(And why do you count from 1, but that's another matter)


The compiler should be calculating the constants and substituting the result.


(And counting from 1 because I was using the code for something else with divides, I think.)


syntonica

  • Full Member
  • ***
  • Posts: 120
Re: For Loops and Registers
« Reply #7 on: December 09, 2019, 03:23:52 pm »
Also, even with you zeroing out the function, there's no speed up in the code.  :D

winni

  • Hero Member
  • *****
  • Posts: 3197
Re: For Loops and Registers
« Reply #8 on: December 09, 2019, 04:46:13 pm »
HI!

Dont call the function calc while the rest of the programm is still busy with initialization.
It makes a huge difference.

I called the function calc in an OnClick event and here are  the values from my maschine:

Direct started by program: 457 ms
Started by OnClick:            212 ms

Winni

PS Linux 64 gtk2

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: For Loops and Registers
« Reply #9 on: December 09, 2019, 04:49:13 pm »
(And why do you count from 1, but that's another matter)

That's probably because x^0 is fairly predictable.

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: For Loops and Registers
« Reply #10 on: December 09, 2019, 05:45:31 pm »
The compiler should be calculating the constants and substituting the result.
Although the compiler can often do such things - nowadays - it also shows the capability of the programmer...
A real programmer would have analyzed the issue and - using whatever language at hand - optimized it by improving the algorithm.
That does not mean your code becomes faster, it means that you understand your code.
Specialize a type, not a var.

syntonica

  • Full Member
  • ***
  • Posts: 120
Re: For Loops and Registers
« Reply #11 on: December 09, 2019, 06:37:08 pm »
Although the compiler can often do such things - nowadays - it also shows the capability of the programmer...
A real programmer would have analyzed the issue and - using whatever language at hand - optimized it by improving the algorithm.
That does not mean your code becomes faster, it means that you understand your code.
A "real" programmer wouldn't have screwed up refactoring a function either, so let's leave fallacies and ad hominems out of it and stick to the matter at hand.

The whole formula is just giving the CPU something to do. Like I said, The compiler should be calculating the constants and substituting the result, which it was doing.  I'm looking at the optimizer to see what it can (and cannot) do.

At this point, I'm going to have to test out a much larger, more complicated example. My little one-offs are not capable of much optimization, so I'm going to hit it with a low-pass filter and run some samples through it.

syntonica

  • Full Member
  • ***
  • Posts: 120
Re: For Loops and Registers
« Reply #12 on: December 09, 2019, 09:40:29 pm »
Hooray! It looks like Pascal is going to work!


Here is a small snippet of the type of processing (wavefolding) I need to do:
Code: Pascal  [Select][+][-]
  1.  
  2.  
  3.  
  4. program test5;
  5.  
  6.  
  7. Uses Math, SysUtils, DateUtils;
  8.  
  9.  
  10. var
  11.   w1, w2, w3: array [0 .. 1023] of Double;
  12.   k1: Double;
  13.   j, k, l: Integer;
  14.   t1, t2: TDateTime;
  15.   x: Int64;
  16.  
  17. begin
  18.   { stuff some random values to look like audio samples }
  19.   for j := 0 to 1023 do
  20.     begin
  21.       w2[j] := Random * 2.0 - 1.0;
  22.       w3[j] := Random * 2.0 - 1.0;
  23.     end;
  24.   l  := Round(Random * 1023.0);
  25.   k1 := Random * 1023.0;
  26.   { time the actual work }
  27.   t1 := Now();
  28.   for x:=1 to 1000 do
  29.     for j := 0 to 1023 do
  30.       begin
  31.         k := (j + l) and 1023;
  32.         w1[j] := (w2[j] * (1.0 - k1) + w3[k] * k1) * 2.0;
  33.         if Abs(w1[j]) > 1.0 then w1[j] := Sign(w1[j]) * 2.0 - w1[j];
  34.         w1[j] := (w3[j] * (1.0 - k1) + w2[k] * k1) * 2.0;
  35.         if Abs(w1[j]) > 1.0 then w1[j] := Sign(w1[j]) * 2.0 - w1[j];
  36.       end;
  37.   t2 := Now();
  38.   WriteLn(MilliSecondsBetween(t2, t1));
  39. end.
  40.  
FPC ranges from 17 to 20ms.  C-code with GCC averages between 18 and 19ms.
Looks like y'all are stuck with me.  :P

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: For Loops and Registers
« Reply #13 on: December 09, 2019, 09:47:58 pm »
I've been translating a FFT routine to assembler in lost moments the last few weeks.  Looks a bit like this code. Though this code makes no sense since it assigns values to w1[j] that are provenly never used?

syntonica

  • Full Member
  • ***
  • Posts: 120
Re: For Loops and Registers
« Reply #14 on: December 09, 2019, 10:15:26 pm »
I've been translating a FFT routine to assembler in lost moments the last few weeks.  Looks a bit like this code. Though this code makes no sense since it assigns values to w1[j] that are provenly never used?
It's just heavily stripped down from my codebase, so none of it will probably make sense to anybody. I chose at random one of ~75 procedures that stuff an array to be used to generate the final audio. The C-code test is uglier since there's no native signum function.  C has a couple of really weird insufficiencies.

Does your FFT routine use complex numbers?  I have a routine that does it in place in a 1024 sample window, no complex numbers needed.

Edit: And I just realized I did that one Random wrong.  Still getting the hang of things...
« Last Edit: December 09, 2019, 10:17:29 pm by syntonica »

 

TinyPortal © 2005-2018