Lazarus

Free Pascal => FPC development => Topic started by: OkobaPatino on June 28, 2020, 12:16:47 am

Title: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 12:16:47 am
Can anyone tell me why these two loops have different times? They should be the same. Is there a missed optimization?
It is the same in Delphi but not C.

Code: Pascal
program Project1;

uses
  SysUtils;

type
  TTest = record
    P: int64;
  end;

  procedure Test;
  var
    V: TTest;
    P: int64;
    T: UInt64;
    i, C: integer;
  begin
    C := 1000 * 1000 * 1000;

    T := GetTickCount64;
    P := 1;
    for i := 1 to C do
      Inc(P);
    WriteLn(GetTickCount64 - T); //250

    T := GetTickCount64;
    V.P := 1;
    for i := 1 to C do
      Inc(V.P);
    WriteLn(GetTickCount64 - T);

    T := GetTickCount64;
    V.P := 1;
    P := V.P;
    for i := 1 to C do
      Inc(P);
    P := V.P;
    WriteLn(GetTickCount64 - T); //1400
  end;

begin
  Test;
  ReadLn;
end.
Title: Re: Procedure optimization problem with local variable
Post by: Martin_fr on June 28, 2020, 12:53:05 am
Assuming -O4?
And assuming the bigger time is for the loop in the middle, not the last one?

Use -al to see assembler.

Fpc (at least 3.0.4) optimizes the first loop, by using a register for "P".

But it does not optimize the 2nd loop. I guess it's because it's a record. V.P remains in memory, so it is slower.
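
For anyone unsure how to get the listing, here is a minimal sketch. It assumes the source is saved as project1.pas (a hypothetical file name) and compiled from the command line; -O4 and -al are the options mentioned above. With Lazarus the same options would go into the project's custom compiler options, and the .s file may land in the unit output directory instead.

```pascal
program Project1;

{$mode objfpc}

// Hypothetical build line (the file name is an assumption):
//   fpc -O4 -al project1.pas
// -a keeps the generated assembler file (project1.s) and the "l" adds
// the source lines to it, so each loop can be located in the listing.

procedure Test;
var
  P: Int64;
  i: Integer;
begin
  P := 1;
  for i := 1 to 1000 do
    Inc(P);          // search for this source line in project1.s
  WriteLn(P);        // uses P so the loop is not dead code
end;

begin
  Test;
end.
```

In the listing, a register-based loop increments a register operand, while the slow variant increments a stack location on every iteration.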
Title: Re: Procedure optimization problem with local variable
Post by: 440bx on June 28, 2020, 01:03:25 am
I was going to post exactly what Martin_fr said above including his corrections about the loop timings you presented.


Title: Re: Procedure optimization problem with local variable
Post by: josh on June 28, 2020, 01:03:47 am
What is odd though,
Tested on the latest 64-bit trunk on Windows, with the 32-bit cross compiler.

if you compile for 32bit the values are very close,
but compiling for win64 the values are way off.



Title: Re: Procedure optimization problem with local variable
Post by: 440bx on June 28, 2020, 01:07:21 am
Quote
if you compile for 32bit the values are very close,
but compiling for win64 the values are way off.
The reason is that in 64-bit the Int64 type fits in a register, so the variable can be placed in a register, whereas in 32-bit it cannot (too big). In 32-bit the compiler is therefore somewhat forced into a more "pedestrian" way of incrementing the variable.

IOW, in 64-bit there is a big difference between incrementing a register and incrementing the value in a memory location. In 32-bit, it is always incrementing the value at a memory location, which causes the measurements to always be within the margin of error.
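
A quick way to check this for a given target is sketched below. It uses SizeOf(Pointer) as a stand-in for the width of a general-purpose register; that equivalence is an assumption which holds on i386 and x86_64 but is not guaranteed everywhere.

```pascal
program RegSize;

{$mode objfpc}

begin
  // SizeOf(Pointer) approximates the native register width here
  // (an assumption, valid for i386 and x86_64).
  WriteLn('Int64 size:   ', SizeOf(Int64), ' bytes');
  WriteLn('Pointer size: ', SizeOf(Pointer), ' bytes');
  if SizeOf(Int64) <= SizeOf(Pointer) then
    WriteLn('Int64 fits in one general-purpose register')
  else
    WriteLn('Int64 does not fit in one general-purpose register');
end.
```

Compiled for win64 this should report the first case, and with the 32-bit cross compiler the second.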


Title: Re: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 01:20:24 am
@Martin_fr and @440bx Yes, the bigger time is for the record, and yes, optimization is -O4, 64-bit, and FPC trunk.
So could this be optimized, as it is with 32-bit or in the C compiler (Clang)?

@josh interesting point, thanks for the input.

Updated code:
Code: Pascal
program Project1;

uses
  SysUtils;

type
  TTest = record
    P: int64;
  end;

  procedure Test;
  var
    V: TTest;
    P: int64;
    T: UInt64;
    i, C: integer;
  begin
    C := 1000 * 1000 * 1000;

    T := GetTickCount64;
    P := 1;
    for i := 1 to C do
      Inc(P);
    WriteLn(GetTickCount64 - T); //266

    T := GetTickCount64;
    V.P := 1;
    for i := 1 to C do
      Inc(V.P);
    WriteLn(GetTickCount64 - T); //1400

    T := GetTickCount64;
    V.P := 1;
    P := V.P;
    for i := 1 to C do
      Inc(P);
    P := V.P;
    WriteLn(GetTickCount64 - T);  //250
  end;

begin
  Test;
  ReadLn;
end.
Title: Re: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 03:01:59 am
@Martin_fr I tried to export the assembly but couldn't find a solution.
Title: Re: Procedure optimization problem with local variable
Post by: Martin_fr on June 28, 2020, 04:05:16 am
Quote
@Martin_fr I tried to export the assembly but couldn't find a solution.

The asm is only to see what happens.

Afaik there is no solution, other than hoping that fpc 3.4 (in 2 or 3 years?) will support this (or using trunk, when and if this gets implemented).
From what I know (but I am not one of the fpc core team, so this is 2nd-hand knowledge), the fact that there is a record blocks the register optimizer (even though the value would fit in a register in 64-bit).

NOT tested, but an idea that you could try
Code: Pascal
var
  Accessor: int64 absolute V.P;

May need some syntax fixes....
Maybe that way the compiler can ignore the record, and use a register.... Maybe.


Btw:
Quote
So could this be optimized, as it is with 32-bit
The 32-bit compilation did not optimize any loop (from what I read).

Title: Re: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 04:32:42 am
I tested the absolute approach and it did not help. Also, I am using trunk, so is this already a reported bug, or should I report it? How could I find out?

I tried changing the asm code for a better result, but I would like more opinions on this, so that I can solve such problems with the help of asm for now, until fpc supports this optimization.
Title: Re: Procedure optimization problem with local variable
Post by: 440bx on June 28, 2020, 04:46:36 am
Quote
I tried changing the asm code for a better result, but I would like more opinions on this, so that I can solve such problems with the help of asm for now, until fpc supports this optimization.
You already found a reasonably good solution, which is the last test case in your test program. Just assign the record field to a variable, use that variable (which the compiler will place in a register) and, when the loop is done, move the value of the variable back into the record's field. It's a bit "pedestrian" but I believe it is preferable to using assembler.

Placing a comment stating the reason for the "acrobatics" with the record's field might be a good idea if someone other than yourself may need to maintain that code.

HTH.
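
The workaround described in this post can be sketched as follows. This is a reduced version of the thread's test program with a small iteration count and no timing; the program name is made up. Note that it writes the local back with V.P := P after the loop, whereas the test program above has P := V.P at that point, which only re-reads the field.

```pascal
program FieldToLocal;

{$mode objfpc}

type
  TTest = record
    P: Int64;
  end;

procedure Test;
var
  V: TTest;
  P: Int64;
  i: Integer;
begin
  V.P := 1;

  // Copy the field into a plain local. According to the timings in this
  // thread the compiler keeps such a local in a register (not verified here).
  P := V.P;
  for i := 1 to 1000 do
    Inc(P);
  // Write the result back; without this the record never sees the update.
  V.P := P;

  WriteLn(V.P);  // prints 1001
end;

begin
  Test;
end.
```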
Title: Re: Procedure optimization problem with local variable
Post by: ASerge on June 28, 2020, 05:51:02 am
And you need to optimize small procedures. For example, let's add this procedure to our sample:
Code: Pascal
procedure Dummy(var Value: Int64); inline;
begin
end;
It's even inline! However, if you add a Dummy(P) to the end of the Test procedure, the compiler will stop putting the variable P in the register, even in version 3.3.1 with the optimization level -O4.
Title: Re: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 12:10:47 pm
@440bx thanks for the suggestion, although I like to find a cleaner way as this loop will be used a lot.
@ASerge I'm afraid I didn't understand exactly your point.
Title: Re: Procedure optimization problem with local variable
Post by: ASerge on June 28, 2020, 04:21:26 pm
Quote
@ASerge I'm afraid I didn't understand exactly your point.
I took your last example. I got results:
Quote
343
2496
327
Then I added the procedure I had indicated to the end. I got results:
Quote
2496
2480
2325
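
A reduced, self-contained version of this experiment might look like the sketch below. It keeps only the first loop of the test program; the program name and the {$inline on} directive are additions, and the timings will differ per machine and compiler version.

```pascal
program DummyDemo;

{$mode objfpc}
{$inline on}

uses
  SysUtils;

// Empty inline procedure taking its argument by reference.
procedure Dummy(var Value: Int64); inline;
begin
end;

procedure Test;
var
  P: Int64;
  T: QWord;
  i, C: Integer;
begin
  C := 1000 * 1000 * 1000;

  T := GetTickCount64;
  P := 1;
  for i := 1 to C do
    Inc(P);
  WriteLn(GetTickCount64 - T);

  // Passing P as a var parameter is what reportedly keeps it in memory.
  // Comment this call out, rebuild with -O4 and compare the timing.
  Dummy(P);
end;

begin
  Test;
end.
```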
Title: Re: Procedure optimization problem with local variable
Post by: PascalDragon on June 28, 2020, 04:22:50 pm
Quote
And you need to optimize small procedures. For example, let's add this procedure to our sample:
Code: Pascal
procedure Dummy(var Value: Int64); inline;
begin
end;
It's even inline! However, if you add a Dummy(P) to the end of the Test procedure, the compiler will stop putting the variable P in the register, even in version 3.3.1 with the optimization level -O4.

For passing the Value parameter the compiler needs a memory location. Apparently it doesn't correctly recognize that, after inlining, it doesn't need to handle it as a memory value. Would you please report this as a bug with a self-contained example?
Title: Re: Procedure optimization problem with local variable
Post by: ASerge on June 28, 2020, 04:26:16 pm
Quote
Would you please report this as a bug with a self-contained example?
In my opinion, this is not a bug. You can't force the compiler to optimize everywhere and always, something must remain for the developer.
Title: Re: Procedure optimization problem with local variable
Post by: PascalDragon on June 28, 2020, 04:34:44 pm
Quote
Would you please report this as a bug with a self-contained example?
In my opinion, this is not a bug. You can't force the compiler to optimize everywhere and always, something must remain for the developer.

You should leave that to e.g. Florian to decide. After all, with inline the compiler might discover better optimization opportunities, but if something like this stops it from doing better, then what use is it? E.g. if the inlined function does a simple Inc(Value), the result would be worse than without the inlining.
Title: Re: Procedure optimization problem with local variable
Post by: ASerge on June 28, 2020, 05:46:32 pm
Quote
You should leave that to e.g. Florian to decide.
Ok - 37282 (https://bugs.freepascal.org/view.php?id=37282).
Title: Re: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 06:08:34 pm
@PascalDragon the first problem of optimizing usage of a record value needs reporting too?
Title: Re: Procedure optimization problem with local variable
Post by: BeniBela on June 28, 2020, 06:47:17 pm
Quote
You already found a reasonably good solution, which is the last test case in your test program. Just assign the record field to a variable, use that variable (which the compiler will place in a register) and, when the loop is done, move the value of the variable back into the record's field. It's a bit "pedestrian" but I believe it is preferable to using assembler.

Placing a comment stating the reason for the "acrobatics" with the record's field might be a good idea if someone other than yourself may need to maintain that code.


You cannot do those acrobatics when the loop is a for..in loop with a record enumerator.


Quote
@PascalDragon the first problem of optimizing usage of a record value needs reporting too?

They did not like it: https://bugs.freepascal.org/view.php?id=34915
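
To illustrate the for..in remark above, here is a minimal sketch of a record enumerator. TRange and TRangeEnumerator are hypothetical names made up for this example. The loop state lives in the fields of the enumerator record, and the compiler generates the MoveNext/Current calls itself, so there is no place in user code to copy those fields into a local before the loop and write them back afterwards.

```pascal
program EnumDemo;

{$mode objfpc}
{$modeswitch advancedrecords}

type
  // Hypothetical enumerator: its state is held in record fields.
  TRangeEnumerator = record
  private
    FCurrent, FLast: Int64;
  public
    function MoveNext: Boolean; inline;
    property Current: Int64 read FCurrent;
  end;

  // Hypothetical container that can be used in a for..in loop.
  TRange = record
    First, Last: Int64;
    function GetEnumerator: TRangeEnumerator;
  end;

function TRangeEnumerator.MoveNext: Boolean;
begin
  Inc(FCurrent);                 // increments a record field on every step
  Result := FCurrent <= FLast;
end;

function TRange.GetEnumerator: TRangeEnumerator;
begin
  Result.FCurrent := First - 1;  // MoveNext is called before the first item
  Result.FLast := Last;
end;

var
  R: TRange;
  Value, Sum: Int64;
begin
  R.First := 1;
  R.Last := 10;
  Sum := 0;
  for Value in R do              // hidden enumerator record drives the loop
    Sum := Sum + Value;
  WriteLn(Sum);                  // prints 55
end.
```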
Title: Re: Procedure optimization problem with local variable
Post by: PascalDragon on June 28, 2020, 07:03:08 pm
Quote
@PascalDragon the first problem of optimizing usage of a record value needs reporting too?

They did not like it: https://bugs.freepascal.org/view.php?id=34915

Jonas simply said to move the discussion about this to the mailing list and not to have it in the bugtracker. As you can see the bug was set to "suspended", not "won't fix", "no change required" or something like that.
Title: Re: Procedure optimization problem with local variable
Post by: OkobaPatino on June 28, 2020, 07:25:01 pm
I guessed there should be a bug report already, thanks @BeniBela. Did you find any workaround for now?
Title: Re: Procedure optimization problem with local variable
Post by: 440bx on June 28, 2020, 07:46:07 pm
Quote
You cannot do those acrobatics when the loop is a for..in loop with a record enumerator.
I am not surprised there are cases where that method won't produce the desired results.

Personally, I would try different ways for each case and hopefully find one that leads the compiler to produce the code I want. Other than that, hand-coded assembly seems to be the only way to get the code optimized as desired.