measuring fpc codegen speed

fcu

Jr. Member
Posts: 90

hi
i wanted to see the difference between fpc releases in term of godegen speed , so i tried measuring png image loading time ( decoding time ) using fcl-image , the png is ~5mb , the thing is there is no significant difference , only tiny milliseconds ( fpc versions was 2.6 till 3.3.1 ) , so i am wondering if fpc codegen optimizer has been improved over the releases !

Logged

Martin_fr

Administrator
Hero Member
Posts: 9869
Debugger - SynEdit - and more

Re: measuring fpc codegen speed

« Reply #1 on: March 03, 2023, 09:30:34 pm »

Well... There is work on the optimizer in 3.3.1 (i.e. current git main branch). Afaik a lot on generating better assembler code. Not sure what/if changes on the high level.
And well, you may or may not benefic from it.

There is a lot more (a hell lot more) to the speed of your app, than the speed at which the code itself can be processed.

To start with, modern cpu do quite some part of the optimization themself. That is, even if you have unoptimized code your cpu may run it as fast as optimized code in some (many?) cases.... Though of course that depends on the optimizations that are performed. Better assembler code makes often just marginal differences.

What can differ (and I don't know where FPC stands on it) is optimizing the logic/program flow. Such as moving code that is inside a loop, to outside the loop. In other words rewrite your code for you.

But more - and I guess your png example may partly fall into this - code is not the only factor. Data needs to be optimized too. If there is huge amount of data to be processed then it needs to be loaded (from mem to cache). And if the code run faster (or could run faster) than the time needed to load the data into the cache (and if potentially you have lots of cache misses) then speeding up the code does nothing. The data must be stored in a way that it makes maximum use of each cache line.

And lots of other stuff.
A fun bit of background https://www.youtube.com/watch?v=r-TLSBdHe1A

Logged

From the wiki: Ide Tools, Code completion and more / IDE cool features / Debugger Status

Leledumbo

Hero Member
Posts: 8757
Programming + Glam Metal + Tae Kwon Do = Me

Re: measuring fpc codegen speed

« Reply #2 on: March 03, 2023, 10:07:37 pm »

Quote from: fcu on March 03, 2023, 08:42:52 pm

so i tried measuring png image loading time ( decoding time ) using fcl-image

Kinda bad selection, the package's slow implementation will bottleneck it. No codegen improvement can save an intentionally chosen to be slow (for portability reason) implementation. If you want to see the actual improvement, use CPU bound codes. After all, codegen is about what the CPU will execute when the code is run. You can pick the benchmark game codes instead. You also don't mention any use of optimization switches, the default is -O0 that doesn't do any optimization, almost like a 1:1 mapping, will not even optimize x := x + 1 statement:

Code: [Select]

 # default -O0
  24   │ # [4] x := x + 1;
  25   │     movl    U_$P$PROGRAM_$$_X,%eax
  26   │     leal    1(%eax),%eax
  27   │     movl    %eax,U_$P$PROGRAM_$$_X
 # -O1
  24   │ # [4] x := x + 1;
  25   │     addl    $1,U_$P$PROGRAM_$$_X

Try at least -O2 (-O3 will be better, but -O4 may cause unwanted side effects), 2.6 and 3.2 will show quite some improvements in certain types of codes.

Logged

Follow this if you want me to answer: http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

http://pascalgeek.blogspot.com
https://bitbucket.org/leledumbo
https://github.com/leledumbo
Code first, think later - Natural programmer B)

fcu

Jr. Member
Posts: 90

Re: measuring fpc codegen speed

« Reply #3 on: March 04, 2023, 10:38:51 am »

thanks
i choose fcl-image because its pure pascal and png decoder has alot of calculations, and the implementation still the same , so if the compiler is improved sure it will do better work and the speed would be tangible.

but iam not sure if fcl-image package was compiled with optimization enabled or not !

@Leledumbo the test was done with -O3 switch

Logged

Martin_fr

Administrator
Hero Member
Posts: 9869
Debugger - SynEdit - and more

Re: measuring fpc codegen speed

« Reply #4 on: March 04, 2023, 12:05:51 pm »

Most of the optimizations I have seen being added in 3.3.1 are for better assembler code.

E.g
- replacing "conditional jump" with "conditional set value" (which means the cpu wont have to predict the jump)
- changing the order of 2 statements, to allow the CPU to compute ahead more statements (register-rename, pipeline-stalls, ...)

All those will gain time, if the CPU did not find ways to optimize the lesser code on it's own.

And also the time gain is not that big. Well, if you put a statement that benefits, into a loop (1 to 1 million), and nothing else is in that loop, then it's noticeable. In real life, the code that benefits makes a few percent of your app's code, and if a few percent are speed up just a little, the overall app wont have much of a measurable gain.

The other form of *code* optimization is changes to the actual code before (during) compiling it.

Code: Pascal [Select][+]

for a := 1 to 100000 do begin b := a * x; writeln(b); end;

could become

Code: Pascal [Select][+]

b := x; for a := 1 to 100000 do begin writeln(b); b := b + 1; end;

The addition computes faster than the multiplication.
(In the example the slow "writeln" will eat most of the time / In real life this may gain some speed, but depends... If the cpu was able to do other work of the loop, while the multiplication was done, then the benefit may be small(er))

Such code exists for example when accessing array elements.

I don't know what FPC does on this sort of code.... It is possible that there is still some potential for fpc improvements.

Long ago, I did this by hand (among other stuff). https://gitlab.com/freepascal.org/fpc/source/-/issues/10275
There are rewritten code examples, that manually do such implementations. Of course read-ability of the code suffers a lot.
But that particular code gained IIRC approx 40% speed.
(Though that was a very old fpc version, and I tricked fpc into doing some register optimization, that it may nowadays do without tricks, yet some of those code changes may still make a difference on similar code when using a current fpc)

Then there is stuff the compiler generally wont do for you. The choice of algorithm. Using sorted data and do binary search, or even doing hash lookups => that can speed up an app by several orders of magnitude.

And yet then again, as I said: data in memory needs to be optimized too. Or the order in which it is accessed. And I don't know if any compiler will do much of that for you.

As an example google "optimizing matrix multiplication".
If you just do nested loops over the data, you get a lot of access to memory addresses far away from each other. That means you completely loose the benefit from holding data in the cpu cache. And any memory operations that can not be cached, is slow. For large data that can be truly significant.

Having said all that, there is room in fpc for more optimizations.
And maybe, or maybe not, there are a few things left that may gain you more than 2 or 3 percent (on very specific code).

But my experience is that today's fpc allows you to write very well performing code already.

« Last Edit: March 04, 2023, 12:09:26 pm by Martin_fr »

Logged

From the wiki: Ide Tools, Code completion and more / IDE cool features / Debugger Status

Awkward

Full Member
Posts: 135

Re: measuring fpc codegen speed

« Reply #5 on: March 04, 2023, 12:23:30 pm »

Quote from: Martin_fr on March 04, 2023, 12:05:51 pm

Code: Pascal [Select][+][-]
for a := 1 to 100000 do begin b := a * x; writeln(b); end;
could become
Code: Pascal [Select][+][-]
b := x; for a := 1 to 100000 do begin writeln(b); b := b + 1; end;

sorry, did you noticed what your 2nd code line is not equivalent of 1st code line?

Logged

Martin_fr

Administrator
Hero Member
Posts: 9869
Debugger - SynEdit - and more

Re: measuring fpc codegen speed

« Reply #6 on: March 04, 2023, 12:34:11 pm »

Quote from: Awkward on March 04, 2023, 12:23:30 pm

Quote from: Martin_fr on March 04, 2023, 12:05:51 pm

Code: Pascal [Select][+][-]
for a := 1 to 100000 do begin b := a * x; writeln(b); end;
could become
Code: Pascal [Select][+][-]
b := x; for a := 1 to 100000 do begin writeln(b); b := b + 1; end;

sorry, did you noticed what your 2nd code line is not equivalent of 1st code line?

Ok, typo, should be "+x" instead of "+1"
But the idea is still the same.

Logged

From the wiki: Ide Tools, Code completion and more / IDE cool features / Debugger Status

Paolo

Hero Member
Posts: 510

Re: measuring fpc codegen speed

« Reply #7 on: March 04, 2023, 12:36:38 pm »

maybe the second one should be...

Code: Pascal [Select][+]

b := x; 
for a := 1 to 100000 do begin 
  writeln(b); 
  b := b + X; //<---- add X here
end;
 

Logged

marcov

Administrator
Hero Member
Posts: 11452
FPC developer.

Re: measuring fpc codegen speed

« Reply #8 on: March 04, 2023, 12:46:34 pm »

Also, the fcl-image loaders are very generic, and therefore won't benefit (much) from optimization

Logged

dseligo

Hero Member
Posts: 1221

Re: measuring fpc codegen speed

« Reply #9 on: March 04, 2023, 01:17:48 pm »

Quote from: Paolo on March 04, 2023, 12:36:38 pm

maybe the second one should be...

Code: Pascal [Select][+][-]
b := x;
for a := 1 to 100000 do begin
writeln(b);
b := b + X; //<---- add X here
end;

It should be like this to be equivalent

Code: Pascal [Select][+]

b := 0; 
for a := 1 to 100000 do begin 
  b := b + x;
  writeln(b); 
end;

Logged

Paolo

Hero Member
Posts: 510

Re: measuring fpc codegen speed

« Reply #10 on: March 04, 2023, 06:07:01 pm »

yes.
the code was equivalent for the output purpose, not for the B content

Logged

korba812

Sr. Member
Posts: 394

Re: measuring fpc codegen speed

« Reply #11 on: March 04, 2023, 06:08:09 pm »

Quote from: fcu on March 03, 2023, 08:42:52 pm

hi
i wanted to see the difference between fpc releases in term of godegen speed , so i tried measuring png image loading time ( decoding time ) using fcl-image , the png is ~5mb , the thing is there is no significant difference , only tiny milliseconds ( fpc versions was 2.6 till 3.3.1 ) , so i am wondering if fpc codegen optimizer has been improved over the releases !

Maybe the deflate implementation is not efficient and any optimization on the compiler side won't help anything?

Logged

lagprogramming

Sr. Member
Posts: 406

Re: measuring fpc codegen speed

« Reply #12 on: March 05, 2023, 11:29:49 am »

1/4 packages/fcl-image/src/fpreadpng.pp contains the following procedure:

Code: Pascal [Select][+]

procedure TFPReaderPNG.InternalRead (Str:TStream; Img:TFPCustomImage);
begin
  {$ifdef FPC_Debug_Image}
  if Str<>TheStream then
    writeln('WARNING: TFPReaderPNG.InternalRead Str<>TheStream');  
  {$endif}
  with Header do
    Img.SetSize (Width, Height);
  ZData := TMemoryStream.Create;
  try
    EndOfFile := false;
    while not EndOfFile do
      begin
      ReadChunk;
      HandleChunk;
      end;
    ZData.position:=0;
    Decompress := TDecompressionStream.Create (ZData);
    try
      DoDecompress;
    finally
      Decompress.Free;
    end;
  finally
    ZData.Free;
    if not img.UsePalette and assigned(FPalette) then
      begin
      FreeAndNil(FPalette);
      end;
  end;
end;

I think the source code can be improved by replacing the "while" loop with a "repeat" loop like:

Code: Pascal [Select][+]

repeat
ReadChunk;
HandleChunk;
until EndOfFile;

2/4 There are a couple of reasons why fcl-image is slow, few of them having to do with compiler optimizations. A significant improvement would be to get rid of those "swap" functions in fpimgcmn.pp and "swap32" in qoicomn.pas by replacing their usage with fpc's optimized versions named "SwapEndian". The code would be cleaner, faster and easier to maintain. It shouldn't be a big deal to make those changes. Most likely, a bigger problem is to test those changes in order to make sure everything is working properly as new bugs can be easily added.
As a side note, the implementation part of packages/fcl-image/src/fpreadxwd.pas starts with:

Code: Pascal [Select][+]

//==============================================================================
// Endian utils
//
// Copied from LCLProc unit
//==============================================================================
{$push}{$R-}
function BEtoN(const AValue: DWord): DWord;
begin
  {$IFDEF ENDIAN_BIG}
    Result := AValue;
  {$ELSE}
    Result := (AValue shl 24)
           or ((AValue and $0000FF00) shl 8)
           or ((AValue and $00FF0000) shr 8)
           or (AValue shr 24);
  {$ENDIF}
end;
{$pop}  

Obviously, same as with the "swap" functions, an improvement would be to get rid of this one, too.

3/4 packages/fcl-image/src/fpreadpng.pp contains class function TFPReaderPNG.InternalSize(Str: TStream): TPoint; and function TFPReaderPNG.InternalCheck (Str:TStream) : boolean;. These two functions compare 8 byte blocks byte by byte using inefficient "for" loops.

4/4 packages/fcl-image/src/fpreadpng.pp contains procedure TFPReaderPNG.HandleAlpha;, which calls the "move" procedure to copy word sized values. The preparation to call the "move" procedure is more expensive than the actual 2 bytes copy.

There are additional ways to make fcl-image run faster without improving the compiler but as we can see, significant code changes are needed.

Logged

Lazarus

Bookstore

Search

Recent

Author Topic: measuring fpc codegen speed (Read 1071 times)

fcu

measuring fpc codegen speed

Martin_fr

Re: measuring fpc codegen speed

Leledumbo

Re: measuring fpc codegen speed

fcu

Re: measuring fpc codegen speed

Martin_fr

Re: measuring fpc codegen speed

Awkward

Re: measuring fpc codegen speed

Martin_fr

Re: measuring fpc codegen speed

Paolo

Re: measuring fpc codegen speed

marcov

Re: measuring fpc codegen speed

dseligo

Re: measuring fpc codegen speed

Paolo

Re: measuring fpc codegen speed

korba812

Re: measuring fpc codegen speed

lagprogramming

Re: measuring fpc codegen speed

	Computer Math and Games in Pascal (preview)
	Lazarus Handbook