Useful optimizations for a video game project

Martin_fr

Administrator
Hero Member
Posts: 9855
Debugger - SynEdit - and more

Re: Useful optimizations for a video game project

« Reply #15 on: June 23, 2022, 06:34:08 pm »

So here is an example of the 32-byte alignment effect.

"foo" gets whatever alignment the surrounding code gives it, and its loop is additionally offset by the code in front of it.
It takes 4000 ms (on my PC, an i7-8700).

Then, with the loop aligned to exactly 32 bytes (foo2): 3640 ms, almost 10% faster.
The loop at an offset of 32+8 (foo3) is also fast, so the relevant code inside the loop must have just hit the right alignment.

The loop that is intentionally placed at 32+16 (foo4) takes 4000 ms again.

So, on a modern CPU, just adding the right alignment can make a noticeable difference.
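
Instead of hand-written "asm .align 32 end;" blocks, FPC also offers a code-alignment directive and a matching command-line option. A minimal sketch, assuming FPC 3.x syntax; please check the $CODEALIGN entry in the Programmer's Guide and the -Oa option in the Users' Guide for what your compiler version accepts. Note that it aligns every procedure or loop to the same boundary, so it cannot reproduce deliberate offsets such as 32+8 or 32+16:

```pascal
{ Put these near the top of the program or unit, before the routines
  they should affect (assumed FPC 3.x syntax). }
{$CODEALIGN PROC=32}   // start each procedure/function on a 32-byte boundary
{$CODEALIGN LOOP=32}   // align the start of loops to 32 bytes

{ Assumed command-line equivalent:
    fpc -OaPROC=32 -OaLOOP=32 project1.lpr }
```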

And since functions are aligned to 16 bytes, where a loop ends up depends on where the previous function ended, so the same code can be fast in one build and slow in another.
This also means that when you benchmark and change code in one place, code in another place may be realigned and become faster or slower. Your total benchmark may then change more because of the accidental alignment change than because of the change you tried to measure.
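
One way to keep run-to-run noise out of such measurements is to time each routine several times and keep the fastest run. A minimal sketch; TBenchProc, Work and BestOf are hypothetical names, not part of the test program. It only filters noise: an accidental alignment change is baked into the compiled binary, so it shows up in every run and still has to be ruled out separately.

```pascal
program BestOfRuns;

{$mode objfpc}{$H+}

uses
  SysUtils; // GetTickCount64

type
  TBenchProc = procedure; // hypothetical: any parameterless routine to time

const
  N = 50 * 1024 * 1024;

var
  a, c: array of Byte;

{ Hypothetical workload, similar in spirit to the loops in the test program. }
procedure Work;
var
  i: Integer;
begin
  for i := 1 to N - 1 do
    c[i] := a[i] xor c[i - 1];
end;

{ Runs Proc Repeats times and returns the fastest run in milliseconds.
  Taking the minimum filters out one-off noise such as other processes
  or the CPU clock ramping up; it does not remove alignment effects. }
function BestOf(Proc: TBenchProc; Repeats: Integer): QWord;
var
  r: Integer;
  t: QWord;
begin
  Result := High(QWord);
  for r := 1 to Repeats do
  begin
    t := GetTickCount64;
    Proc();
    t := GetTickCount64 - t;
    if t < Result then
      Result := t;
  end;
end;

begin
  SetLength(a, N);
  SetLength(c, N);
  WriteLn('Work, best of 5: ', BestOf(@Work, 5), ' ms');
end.
```

GetTickCount64 returns milliseconds and on Windows its resolution is typically in the 10 to 16 ms range, which is fine for loops that run for seconds, as here, but too coarse for short routines.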

See https://lists.freepascal.org/pipermail/fpc-devel/2022-January/044336.html
It includes a very interesting video presentation on the topic.

Code: Pascal

program Project1;
 
{$mode objfpc}{$H+}
 
uses
  {$IFDEF UNIX}
  cthreads,
  {$ENDIF}
  Classes, SysUtils
  { you can add units after this };
 
{$R *.res} // Lazarus project resource; drop this line when compiling the file outside Lazarus
 
const
  N = 150*1024*1024;
var
  a, b, c: array of byte;
 
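// foo: no explicit alignment; the loop lands wherever the surrounding code
// happens to put it. The statement is repeated three times only to give the
// loop body more work.
procedure foo;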
var
  i: Integer;
begin
  c[0] := (a[0] + b[0]) div 2;
 
  for i := 1 to N-1 do begin
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
  end;
end;
 
procedure foo2;
var
  i: Integer;
begin
  c[0] := (a[0] + b[0]) div 2;
 
  // Pad to the next 32-byte boundary just before the loop code.
  asm
  .align 32
  end;
 
  for i := 1 to N-1 do begin
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
  end;
end;
 
procedure foo3;
var
  i: Integer;
begin
  c[0] := (a[0] + b[0]) div 2;
 
  // Align to 32 bytes, then add 8 one-byte NOPs: offset 32+8.
  asm
  .align 32
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  end;
 
  for i := 1 to N-1 do begin
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
  end;
end;
 
procedure foo4;
var
  i: Integer;
begin
  c[0] := (a[0] + b[0]) div 2;
 
  // Align to 32 bytes, then add 16 one-byte NOPs: offset 32+16.
  asm
  .align 32
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  nop
  end;
 
  for i := 1 to N-1 do begin
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
    c[i] := ( (a[i] + b[i]) div 2) xor c[i-1];
  end;
end;
 
var
  t: QWord;
  i: Integer;
begin
  SetLength(a, N);
  SetLength(b, N);
  SetLength(c, N);
  for i := 0 to N-1 do begin
    a[i] := Random(256); // Random(256) yields 0..255, the full byte range
    b[i] := Random(256);
  end;
 
 
  // Each routine is timed twice to show the run-to-run variation.
  t := GetTickCount64;
  foo;
  t := GetTickCount64 -t;
  writeln(t);
 
  t := GetTickCount64;
  foo;
  t := GetTickCount64 -t;
  writeln(t);
 
 
  t := GetTickCount64;
  foo2;
  t := GetTickCount64 -t;
  writeln(t);
 
  t := GetTickCount64;
  foo2;
  t := GetTickCount64 -t;
  writeln(t);
 
 
  t := GetTickCount64;
  foo3;
  t := GetTickCount64 -t;
  writeln(t);
 
  t := GetTickCount64;
  foo3;
  t := GetTickCount64 -t;
  writeln(t);
 
 
  t := GetTickCount64;
  foo4;
  t := GetTickCount64 -t;
  writeln(t);
 
  t := GetTickCount64;
  foo4;
  t := GetTickCount64 -t;
  writeln(t);
 
 
  readln;
end.
 

From the wiki: Ide Tools, Code completion and more / IDE cool features / Debugger Status

furious programming

Hero Member
Posts: 858

Re: Useful optimizations for a video game project

« Reply #16 on: June 23, 2022, 09:52:18 pm »

Thank you very much for the example. I will definitely check this trick in the future.

But I just tested your test program on my Intel® Core™ i7-640LM (which is quite old) and I can't reproduce your results. The aligned code is slightly faster in the debug build mode (the one generated in the project options window); below are the results:

But in the release build mode (also generated by Lazarus), there is no gain; the aligned code is actually slower than the unaligned code:

It looks like the optimizations themselves give the best performance in this case.

« Last Edit: June 23, 2022, 10:09:33 pm by furious programming »

Lazarus 3.2 with FPC 3.2.2, Windows 10 — all 64-bit

Working solo on an arcade action/adventure game in retro style (pixel art), programming the engine and shell from scratch using Free Pascal and SDL. Release planned for 2026.

PascalDragon

Hero Member
Posts: 5462
Compiler Developer

Re: Useful optimizations for a video game project

« Reply #17 on: June 24, 2022, 09:04:09 am »

Quote from: furious programming on June 23, 2022, 05:32:52 pm

Quote
I would assume that you'll need some tight kernel here programmed in asm which fully uses AVX2 / AVX512 (or comparable) capabilities to get somewhere.

I can always restrict the calculations to integers (as in the good old days), because high precision will not be required; after all, the image will be highly pixelated. But there will be time for that.

SIMD instruction sets are not restricted to floating-point values; they can be used with integers as well. Thus, if you have multiple equivalent integer operations that can be done in parallel (e.g. adding two vectors element by element), you can use SIMD.
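
As an illustration of parallel integer work, here is a sketch that averages bytes eight at a time. Note that it swaps in a different technique: SWAR ("SIMD within a register"), which uses plain 64-bit integer operations rather than real SSE2/AVX2 instructions, so it only shows the idea. AvgBytes8 is a hypothetical helper. It covers only the (a + b) div 2 part of the test loop; the xor with c[i-1] makes every element depend on the previous one, so that chain cannot be split up this way.

```pascal
program SwarAverage;

{$mode objfpc}{$H+}

{ Averages eight unsigned bytes packed into one QWord. Each result byte is
  (a + b) div 2 for the matching bytes of x and y, using the identity
  (a + b) div 2 = (a and b) + ((a xor b) shr 1).
  Clearing bit 0 of every byte before the shift stops bits from leaking
  into the neighbouring byte, and the final add cannot carry between
  bytes because each true average is at most 255. }
function AvgBytes8(x, y: QWord): QWord;
const
  LowBits = QWord($0101010101010101); // bit 0 of every byte
begin
  Result := (x and y) + (((x xor y) and not LowBits) shr 1);
end;

const
  Count = 64; // a multiple of 8 keeps the sketch simple

var
  a, b, c: array[0..Count - 1] of Byte;
  qa, qb, qc: QWord;
  i, errors: Integer;
begin
  for i := 0 to Count - 1 do
  begin
    a[i] := Random(256);
    b[i] := Random(256);
  end;

  { Process eight bytes per step. Move copies raw bytes, so byte order
    does not matter: every byte is handled independently. }
  i := 0;
  while i < Count do
  begin
    Move(a[i], qa, 8);
    Move(b[i], qb, 8);
    qc := AvgBytes8(qa, qb);
    Move(qc, c[i], 8);
    Inc(i, 8);
  end;

  { Compare against the plain byte-by-byte calculation. }
  errors := 0;
  for i := 0 to Count - 1 do
    if c[i] <> (a[i] + b[i]) div 2 then
      Inc(errors);
  WriteLn('mismatches: ', errors);
end.
```

The program should print "mismatches: 0". With real SIMD on x86, SSE2's PAVGB averages 16 bytes at once, but it rounds up ((a + b + 1) shr 1), so it is not a drop-in replacement for div 2.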

BrunoK

Sr. Member
Posts: 452
Retired programmer

Re: Useful optimizations for a video game project

« Reply #18 on: June 24, 2022, 09:58:38 am »

Quote from: furious programming on June 23, 2022, 09:52:18 pm

Thank you very much for the example. I will definitely check this trick in the future.

But I just tested your test program on my Intel® Core™ i7-640LM (which is quite old) and I can't reproduce your results.

I can't reproduce the results either.

11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz (laptop):

Code: Text

C:\fpc-laz\fpc\3.2.2-git\bin\i386-win32\ppc386.exe
-MObjFPC
-Scghi
-O1
-gw2
-godwarfsets
-gl
-l
-vewnhibq
-Filib\i386-win32
-Fu.
-FUlib\i386-win32
-FE.
-opgmSpeedTest.exe
-OoREGVAR

Compiling with -O1 -OoREGVAR gives very satisfactory speed while keeping debugging reasonable.
Times for i386 and x86_64 are very similar.

Timings, Win10, FPC 3.2.2 i386 (ms, each routine run twice):
foo:  719, 734
foo2: 1063, 1062
foo3: 1079, 1062
foo4: 1078, 1063
Trying to align the code seems to be counterproductive with -O1 -OoREGVAR.

What is strange is that my times are lower than Martin's, on a fairly low-end laptop (and on my desktop as well).

furious programming

Hero Member
Posts: 858

Re: Useful optimizations for a video game project

« Reply #19 on: June 24, 2022, 10:46:50 am »

Quote from: PascalDragon on June 24, 2022, 09:04:09 am

Thus if you have multiple, equivalent integer operations that can be done in parallel (e.g. adding a vector) you can utilize SIMD.

This is why using SIMD will not be possible: there will not be many identical operations to perform in parallel. And even if I wanted to process the frame that way, the whole process would be much more complicated and much harder to implement than generating each pixel separately.

The initial idea is to use a thread pool where each thread casts one ray and uses it to compute the final color of a single pixel. When a thread is done, it takes the next pixel to generate, all the way to the end of the frame. Finally, the buffer is streamed to the SDL texture; this is the only solution (and, in my case, a very convenient one), as SDL does not support multi-threaded rendering.
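
The pixel-by-pixel work distribution described above can be sketched with TThread and an atomic counter. This is only an illustration under assumptions: TPixelWorker, TracePixel (a stand-in for casting a ray and shading a pixel), the frame size and the worker count are all hypothetical, and the SDL upload is left out.

```pascal
program PixelWorkers;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}
  cthreads, // thread manager needed on Unix-like systems
  {$ENDIF}
  Classes, SysUtils;

const
  FrameWidth = 320;   // hypothetical frame size
  FrameHeight = 180;
  PixelCount = FrameWidth * FrameHeight;
  WorkerCount = 4;    // hypothetical pool size

type
  TPixelWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Frame: array[0..PixelCount - 1] of LongWord; // one 32-bit color per pixel
  NextPixel: LongInt = -1; // index of the last pixel handed out

{ Hypothetical stand-in for casting one ray and computing its color. }
function TracePixel(x, y: Integer): LongWord;
begin
  Result := LongWord((x xor y) and $FF);
end;

procedure TPixelWorker.Execute;
var
  idx: LongInt;
begin
  repeat
    { Atomically claim the next pixel; InterLockedIncrement returns the
      incremented value, so no two threads get the same index. }
    idx := InterLockedIncrement(NextPixel);
    if idx >= PixelCount then
      Break;
    Frame[idx] := TracePixel(idx mod FrameWidth, idx div FrameWidth);
  until False;
end;

var
  Workers: array[0..WorkerCount - 1] of TPixelWorker;
  i: Integer;
begin
  for i := 0 to WorkerCount - 1 do
    Workers[i] := TPixelWorker.Create(False); // start immediately
  for i := 0 to WorkerCount - 1 do
  begin
    Workers[i].WaitFor;
    Workers[i].Free;
  end;
  { Every pixel is now written; the buffer would be streamed to the SDL
    texture from the main thread here (not shown). }
  WriteLn('pixels rendered: ', PixelCount);
end.
```

Each pixel costs one atomic increment on a shared counter. If that ever shows up in a profile, a common variation is to hand out a whole row (or tile) per increment instead.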

Quote from: BrunoK on June 24, 2022, 09:58:38 am

What is strange is that my times are lower than those of Martin on a fairly low range laptop (and also my desktop).

We do not know which optimizations Martin used, although I assume he used the defaults. That would explain why both your laptop and my 8-year-old Lenovo X201 Tablet give better results. But that's not important; what matters is that manual code alignment doesn't give us any gain when strong optimizations are used (or at least not always). I will have to look into this topic more and check, with suitable code, what the performance looks like with and without code alignment.

Martin_fr

Administrator
Hero Member
Posts: 9855
Debugger - SynEdit - and more

Re: Useful optimizations for a video game project

« Reply #20 on: June 24, 2022, 11:19:16 am »

Quote from: BrunoK on June 24, 2022, 09:58:38 am

What is strange is that my times are lower than those of Martin on a fairly low range laptop (and also my desktop).

It seems that, while I used -O3 (which, as far as I know, includes -Or), I left the other settings at their defaults, mainly -Criot (range, I/O, overflow and stack checking), and those take time.
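
-Criot turns on range (-Cr), I/O (-Ci), overflow (-Co) and stack (-Ct) checking. If you want to keep those checks in a debug build but not pay for them in a hot loop, they can be switched off locally. A minimal sketch, assuming FPC 2.6 or later for {$PUSH}/{$POP}; Blend is a hypothetical routine modelled on the test loop:

```pascal
program LocalChecks;

{$mode objfpc}{$H+}
{$R+}{$Q+} // range and overflow checks on, as in a typical debug build (-Cr -Co)

const
  N = 1000;

var
  a, b, c: array of Byte;

{$PUSH}  // save the current local switches
{$R-}    // no range checks inside this routine
{$Q-}    // no overflow checks inside this routine
procedure Blend;
var
  i: Integer;
begin
  c[0] := (a[0] + b[0]) div 2;
  for i := 1 to N - 1 do
    c[i] := ((a[i] + b[i]) div 2) xor c[i - 1];
end;
{$POP}   // back to {$R+}{$Q+} for the rest of the program

var
  i: Integer;
begin
  SetLength(a, N);
  SetLength(b, N);
  SetLength(c, N);
  for i := 0 to N - 1 do
  begin
    a[i] := Random(256);
    b[i] := Random(256);
  end;
  Blend;
  WriteLn('c[N-1] = ', c[N - 1]);
end.
```

{$I-} and {$S-} can be used the same way for I/O and stack checking.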

About the speed difference: I think the presence of asm code can affect the optimizer, so that example did not (fully) show my point.

Actually, in my original example, ignoring the first (non-asm) routine, I got two different timings for routines with different alignment.
After removing -Criot, I no longer get that difference, so maybe the code is too simple for the CPU.

But in the mail thread I linked, I did have an example, and at that time I also found documentation that mentioned the alignment effect.

Paul_

Full Member
Posts: 143

Re: Useful optimizations for a video game project

« Reply #21 on: August 02, 2022, 05:03:46 pm »

Just wondering what type of game it is?

furious programming

Hero Member
Posts: 858

Re: Useful optimizations for a video game project

« Reply #22 on: August 02, 2022, 11:04:52 pm »

Quote from: Paul_ on August 02, 2022, 05:03:46 pm

Just wondering what type of game it is?

It will be an action/adventure game with mechanics and projection similar to The Legend of Zelda: A Link to the Past (1991, SNES), but much more extensive, with much nicer graphics (using low-resolution pixel art and special filters) and a couch co-op mode. PCs are thousands of times more powerful than the SNES, so there are practically no limits and I can extend it as much as I want.

I am currently working on the foundations of the game, i.e. window handling and video modes, plus advanced input mapping. Then I will take care of fonts and create controls for the game's UI (something like a mini-LCL). After that, in 2-3 months, I will move on to the engine. I hope to have a working prototype of the engine by the end of this year.

« Last Edit: August 02, 2022, 11:08:04 pm by furious programming »
