Author Topic: Useful optimizations for a video game project  (Read 1264 times)

furious programming

  • Hero Member
  • *****
  • Posts: 578
  • I click a little.
    • TreeStructInfo — format for text and binary configuration files
Useful optimizations for a video game project
« on: June 22, 2022, 05:41:18 pm »
I'm working on an (ultimately) large video game project. I have defined different optimizations for the different build modes, and in release mode I would like to enable whatever optimizations make the generated machine code run as fast as possible (executable size and memory consumption are not a concern). Some information about the project:
  • the source code is procedural, using only simple records,
  • data is passed using pointers (same as in SDL API),
  • memory is allocated and deallocated manually (GetMem, AllocMem, ReallocMem and FreeMem are used everywhere),
  • data is encapsulated in records, and access to them is only possible through getters and setters (as global functions), so I often use inline.
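To make the last point concrete, here is a minimal sketch of a record that is reached only through global inline accessors. The names TSprite, SpriteGetX and SpriteSetX are hypothetical and not taken from the project; note also that `inline` is only a request, and the compiler may still decide not to inline a routine.

```pascal
program InlineAccessors;

{$MODE OBJFPC}
{$INLINE ON}

type
  // Hypothetical record; the project's real types are not shown in this thread.
  PSprite = ^TSprite;
  TSprite = record
    X, Y: LongInt;
  end;

// Getter as a global function, marked inline so the call can be replaced
// by a plain field read.
function SpriteGetX(ASprite: PSprite): LongInt; inline;
begin
  Result := ASprite^.X;
end;

// Setter as a global procedure, also marked inline.
procedure SpriteSetX(ASprite: PSprite; AValue: LongInt); inline;
begin
  ASprite^.X := AValue;
end;

var
  Sprite: PSprite;
begin
  Sprite := AllocMem(SizeOf(TSprite)); // zero-filled allocation
  SpriteSetX(Sprite, 42);
  WriteLn(SpriteGetX(Sprite));
  FreeMem(Sprite);
end.
```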
I am currently using strong optimizations for the release mode (level 3):

Code: Pascal
  {$IFDEF GAME_BUILD_DEBUG}
    // debug mode settings (not important)
  {$ELSE}
    {$INLINE       ON}
    {$SMARTLINK    ON}
    {$OPTIMIZATION LEVEL3}

    {$S-}
    {$IOCHECKS       OFF}
    {$RANGECHECKS    OFF}
    {$OVERFLOWCHECKS OFF}
    {$ASSERTIONS     OFF}
    {$OBJECTCHECKS   OFF}
  {$ENDIF}

I know that, in addition to the above, there are many other optimizations that can be enabled in the code, and some of them may be useful in my case, given the specifics of the project. Does anyone have an idea what else can be enabled to make the resulting machine code faster?

Note that I am asking in general (ahead of time), not because I currently have slow code that I want to speed up with compiler optimizations. If anyone has additional questions, feel free to ask.
« Last Edit: June 22, 2022, 05:43:47 pm by furious programming »
Lazarus 2.2.0 with FPC 3.2.2 (2022-01-02), Windows 10 — all 64-bit

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 7957
  • Debugger - SynEdit - and more
    • wiki
Re: Useful optimizations for a video game project
« Reply #1 on: June 22, 2022, 07:00:25 pm »
First of all, the most optimization potential lies in clever design of your code....

But, if you know there are no exceptions, and if you do use managed types (AnsiString / dyn array)
Code: Text
  {$ImplicitExceptions off}

But, if an exception occurs this will leak memory. And as memory gets eaten up, side effects will get noticeable.

Avoid managed types, if you don't need them.



Choose a modern cpu type (CoreAvx or CoreI) if you don't need to support older cpu. (project options / target)



If you have a tight loop, in the middle of a long(er) routine, move the loop into a sub-routine (inlined) and call it.

At least in the past, this has sometimes helped the optimizer to do a better job with register allocations.
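A minimal sketch of that suggestion, with hypothetical names (SumBytes, LongRoutine). Whether it actually improves register allocation depends on the compiler version and the surrounding code, so it is something to measure rather than assume.

```pascal
program LoopExtraction;

{$MODE OBJFPC}
{$INLINE ON}

// The hot loop lives in its own small routine, so it only has to deal with
// its own few variables instead of all the locals of the long caller.
function SumBytes(AData: PByte; ACount: SizeInt): QWord; inline;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 0 to ACount - 1 do
    Inc(Result, AData[I]);
end;

// Stands in for a long routine with a tight loop somewhere in the middle.
function LongRoutine: QWord;
var
  Buffer: array[0..255] of Byte;
  I: Integer;
begin
  // ... imagine many statements and local variables before the loop ...
  for I := 0 to High(Buffer) do
    Buffer[I] := I;

  Result := SumBytes(@Buffer[0], Length(Buffer));
  // ... and many more after it ...
end;

begin
  WriteLn(LongRoutine); // sum of 0..255 = 32640
end.
```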




The classic: Move calculation out of the loop. Pre-calculate partial expressions, and only keep parts in the loop that depend on the loop counter.

For "SomeFoo[LoopCounter]" => use a pointer, and increase the pointer to the next item.
(though that one is in some cases redundant on modern cpu)
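Both ideas in one small sketch (the function names and the data are made up for illustration). SumIndexed recomputes a loop-invariant product and indexes the array; SumPointer computes the product once and walks the items with a pointer. An optimizer may already do part of this on its own, so treat it as something to benchmark.

```pascal
program HoistAndPointer;

{$MODE OBJFPC}

// Indexed access; AScale * AOffset does not depend on I but sits in the loop body.
function SumIndexed(AItems: PLongInt; ACount: SizeInt; AScale, AOffset: LongInt): Int64;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 0 to ACount - 1 do
    Result := Result + AItems[I] + AScale * AOffset;
end;

// Same result: the invariant is computed once and the items are walked by pointer.
function SumPointer(AItems: PLongInt; ACount: SizeInt; AScale, AOffset: LongInt): Int64;
var
  Item, Stop: PLongInt;
  Bias: LongInt;
begin
  Result := 0;
  Bias := AScale * AOffset;   // hoisted out of the loop
  Item := AItems;
  Stop := AItems + ACount;    // one past the last item
  while Item < Stop do
  begin
    Result := Result + Item^ + Bias;
    Inc(Item);                // advances by SizeOf(LongInt)
  end;
end;

const
  Data: array[0..7] of LongInt = (1, 2, 3, 4, 5, 6, 7, 8);
begin
  WriteLn(SumIndexed(@Data[0], Length(Data), 3, 2)); // 36 + 8 * 6 = 84
  WriteLn(SumPointer(@Data[0], Length(Data), 3, 2)); // 84
end.
```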




The very tricky bit: if you can align the start of small but high-iteration-count loops to a 32-byte boundary, that can gain or lose a two-digit percentage in speed.

Unfortunately, even functions are only aligned to 16 bytes.

I have benchmarked code myself in the past, and just by changing the order in which procedures were declared (no other change), the speed varied by about 20% to 30%.

At least on Intel, because Intel has some caches (IIRC for micro-code) that rely on 32-byte bounds.
So if you iterate some 1000 times over a loop, and that loop has 32 or 64 bytes of code, then it runs fastest if it starts exactly on a 32-byte bound.

Unfortunately there is no option to enforce this. Maybe it can be done with asm blocks.

440bx

  • Hero Member
  • *****
  • Posts: 2835
Re: Useful optimizations for a video game project
« Reply #2 on: June 22, 2022, 09:20:14 pm »
As Martin_fr stated, most really significant speed gains come from the design of the code.

You mentioned using GetMem, FreeMem and other heap functions to manage memory.  Very significant gains can be had by doing your own memory management. 

This means allocating your own blocks based on the usage of the data and carving the block yourself.  With the proper design, it is often possible to remove the need for critical sections (or other synch method) to allocate and deallocate memory blocks. 

Depending on how often the program needs to allocate and free memory blocks, doing your own management can make a very noticeable difference but, that requires memory allocation/deallocation design upfront designed specifically to accommodate the application's needs throughout its execution.
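As a rough illustration of "allocating your own blocks and carving the block yourself", here is a minimal bump (arena) allocator sketch. All names (TArena, ArenaInit, ArenaAlloc, ...) are hypothetical, the 16-byte rounding is an arbitrary choice, and the sketch is single-threaded; a real design would be shaped around how the application actually uses its data.

```pascal
program ArenaSketch;

{$MODE OBJFPC}

type
  // Hypothetical type; nothing here comes from the RTL or from the project.
  TArena = record
    Base: PByte;    // start of the single large block
    Size: PtrUInt;  // total bytes in the block
    Used: PtrUInt;  // bytes handed out so far
  end;

procedure ArenaInit(out AArena: TArena; ASize: PtrUInt);
begin
  AArena.Base := GetMem(ASize); // one heap request up front
  AArena.Size := ASize;
  AArena.Used := 0;
end;

// Carves ASize bytes (rounded up to a multiple of 16) out of the block.
// Returns nil when the arena is exhausted.
function ArenaAlloc(var AArena: TArena; ASize: PtrUInt): Pointer;
begin
  ASize := (ASize + 15) and not PtrUInt(15);
  if ASize > AArena.Size - AArena.Used then
    Exit(nil);
  Result := AArena.Base + AArena.Used;
  Inc(AArena.Used, ASize);
end;

// "Frees" everything at once, e.g. at the end of a frame or a level.
procedure ArenaReset(var AArena: TArena);
begin
  AArena.Used := 0;
end;

procedure ArenaDone(var AArena: TArena);
begin
  FreeMem(AArena.Base);
  AArena.Base := nil;
end;

var
  Arena: TArena;
  A, B: PLongInt;
begin
  ArenaInit(Arena, 1024);
  A := ArenaAlloc(Arena, SizeOf(LongInt));
  B := ArenaAlloc(Arena, SizeOf(LongInt));
  A^ := 1;
  B^ := 2;
  WriteLn(Arena.Used); // two allocations rounded up to 16 bytes each = 32
  ArenaReset(Arena);
  ArenaDone(Arena);
end.
```

Individual allocations are never freed one by one here; that restriction is what makes the allocation path this cheap.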

Depending on how you implement it, there can be another very significant advantage.  If every block to be carved is requested directly from the O/S (instead of a heap) then, an external memory viewer can be used to inspect the blocks.  When debugging, this makes memory inspection independent of the current instruction, i.e, the pointers to blocks don't have to be in the current scope, once you know the address, they can always be inspected using an external memory viewer.

HTH.
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

furious programming

Re: Useful optimizations for a video game project
« Reply #3 on: June 22, 2022, 10:24:07 pm »
Thanks for the answers and advice.

Quote
But, if you know there are no exceptions, and if you do use managed types (AnsiString / dyn array)

I do not need exceptions at all; everything will be handled by error codes (like in SDL or pure WinAPI), so it would be good to remove everything related to exceptions from the code. There are appropriate tools (such as debug mode or the HeapTrc unit) to check that the code works correctly, and the release build should be as efficient as possible, without unnecessary instructions and additional checks.

Quote
But, if an exception occurs this will leak memory. And as memory gets eaten up, side effects will get noticeable.

This is interesting. Ideally, an exception should not be created at all. I wonder what happens if a program tries to perform an illegal operation, for example accessing memory through a nil pointer or dividing by 0. Is it possible to arrange things so that such an operation does not cause any error and does not change the flow of control?

Quote
Avoid managed types, if you don't need them.

Theoretically, I might not use them, but I have a problem with strings. SDL uses C-style strings (PAnsiChar only), and working with them is a bit of a problem, especially when it comes to concatenating and converting them. There are few built-in functions that support them, and virtually none for conversion.
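For reference, a small sketch of concatenation and number-to-text conversion on a fixed PAnsiChar buffer without any managed string. It assumes the RTL's Strings unit (StrCopy, StrCat, StrLen, StrEnd, StrPCopy) and uses a ShortString as the intermediate for Str; the buffer size and the text are made up.

```pascal
program CStringConcat;

{$MODE OBJFPC}

uses
  Strings; // RTL routines for null-terminated strings

var
  Buffer: array[0..63] of AnsiChar; // fixed buffer, no heap, no ref-counting
  NumText: ShortString;             // ShortString is not a managed type
begin
  StrCopy(Buffer, 'save_');         // Buffer = 'save_'
  StrCat(Buffer, 'slot');           // Buffer = 'save_slot'

  Str(7, NumText);                  // integer -> text without SysUtils

  // Neither StrCat nor StrPCopy checks the destination size,
  // so the bounds check has to be done by hand.
  if StrLen(Buffer) + Length(NumText) < SizeOf(Buffer) then
    StrPCopy(StrEnd(Buffer), NumText);

  WriteLn(PAnsiChar(@Buffer[0]));   // save_slot7
  WriteLn(StrLen(Buffer));          // 10
end.
```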

Quote
Choose a modern cpu type (CoreAvx or CoreI) if you don't need to support older cpu. (project options / target)

I don't have these CPU types in the project settings window. Initially, I only care about modern, 64-bit processors. Older processors (including 32-bit) will certainly not be supported.

Quote
For "SomeFoo[LoopCounter]" => use a pointer, and increase the pointer to the next item.
(though that one is in some cases redundant on modern cpu)

I will be running a lot of tests and choosing the best solutions. Iterated pointer access is actually faster than indexed access — I tested it some time ago. I learned a lot of interesting things after watching the video "How I program C" by Eskil Steenberg. I highly recommend listening — it doesn't matter it's C, because in Pascal we have the same.



Quote
You mentioned using GetMem, FreeMem and other heap functions to manage memory.  Very significant gains can be had by doing your own memory management.

I believe you, but I'd rather focus on writing the right code for the project and not go that low-level. I don't know whether something like this will be needed in my case.

Martin_fr

Re: Useful optimizations for a video game project
« Reply #4 on: June 22, 2022, 11:54:22 pm »
Quote
This is interesting. Ideally, an exception should not be created at all. I wonder what happens if a program tries to perform an illegal operation, for example accessing memory through a nil pointer or dividing by 0. Is it possible to arrange things so that such an operation does not cause any error and does not change the flow of control?

A "div 0" can be caught as an exception. And I think even some access violations can, but I'm not sure.
But in any case, I usually take care not to have those in the first place, rather than ask whether my code would still work if I had them.


If you don't use "raise" and don't have any "try except" blocks (and neither does any code you use from frameworks etc.), then "{$ImplicitExceptions off}" should be OK.

Code: Pascal
  procedure foo;
  var s: ansistring;
  begin
    s := getVal;
    // do some stuff
  end;

FPC will insert code at the end of that procedure to do "s := ''", i.e. decrease the ref-count and free the memory of the string if it is not held by other variables.

That will always happen.

But FPC also wraps the entire procedure in a "try finally" block, to make sure "s" is freed even if an exception occurs.
With "{$ImplicitExceptions off}" the "try finally" is not inserted.
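Written out by hand, the default behaviour looks roughly like FooExpanded below. This is only an approximation for illustration, not the code the compiler really generates, and GetVal is a stand-in defined here so that the example compiles.

```pascal
program ImplicitFinallySketch;

{$MODE OBJFPC}

// Stand-in for the getVal used in the example above.
function GetVal: AnsiString;
begin
  Result := 'value';
end;

// What you write.
procedure Foo;
var
  S: AnsiString;
begin
  S := GetVal;
  WriteLn(Length(S));
end;

// Roughly what happens by default: the body is guarded so that the
// string reference is released even when an exception passes through.
procedure FooExpanded;
var
  S: AnsiString;
begin
  S := '';
  try
    S := GetVal;
    WriteLn(Length(S));
  finally
    S := ''; // decrease the ref-count, free the memory if it drops to zero
  end;
end;

begin
  Foo;         // prints 5
  FooExpanded; // prints 5
end.
```

With the implicit frame switched off, only the final release remains, which is why an exception passing through such a routine leaks the string.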




Martin_fr

Re: Useful optimizations for a video game project
« Reply #5 on: June 23, 2022, 12:15:17 am »
Quote
I don't have these CPU types in the project settings window. Initially, I only care about modern, 64-bit processors. Older processors (including 32-bit) will certainly not be supported.

My IDE shows them... But anyway. From
https://www.freepascal.org/docs-html/user/userap1.html
Quote
-Cp<x>     Select instruction set; see fpc -i or fpc -ic for possible values

Fpc 64 bit 3.2.0 to 3.3.1 on Windows all show
Code: Text
  ATHLON64
  COREI
  COREAVX
  COREAVX2

Afaik you can also enable different avx, and with that maybe get some of the extra registers used (though I am not sure....)



Do you follow the fpc mail list? Just in case, there have been various optimizer improvements in fpc 3.3.1


440bx

Re: Useful optimizations for a video game project
« Reply #6 on: June 23, 2022, 02:24:19 am »
Quote
I learned a lot of interesting things after watching the video "How I program C" by Eskil Steenberg. I highly recommend listening — it doesn't matter it's C, because in Pascal we have the same.

I watched a little over an hour of it and will watch the rest later (I don't have the time right now) but, I agree with you.  He really gives a lot of good advice (advice I've been giving for a very long time!... and nobody listens...  :D)

Quite a few times during the video, I thought, this guy should program in Pascal, he'd realize what a sh*tty language C is but, aside from his very poor choice of a programming language, he cares about making his programs as consistent, easy to understand and maintainable as possible (he is the unicorn of C programmers!)

Thank you for the link.



Thaddy

  • Hero Member
  • *****
  • Posts: 11629
Re: Useful optimizations for a video game project
« Reply #7 on: June 23, 2022, 08:01:16 am »
Using a couple of WPO cycles can speed up code too. And that is also true for procedural programming, although less so.
Black themes should be banned.

PascalDragon

  • Hero Member
  • *****
  • Posts: 4137
  • Compiler Developer
Re: Useful optimizations for a video game project
« Reply #8 on: June 23, 2022, 08:55:19 am »
Quote
Code: Pascal
  {$OPTIMIZATION LEVEL3}

Please note that this is different from passing -O3 on the command line. It's essentially equivalent to -Oolevel3. It's currently not possible to enable all optimizations that are part of a specific level in code. You need to enable each optimization you want by hand.

Quote
But, if an exception occurs this will leak memory. And as memory gets eaten up, side effects will get noticeable.

Quote
This is interesting. Ideally, an exception should not be created at all. I wonder what happens if a program tries to perform an illegal operation, for example accessing memory through a nil pointer or dividing by 0. Is it possible to arrange things so that such an operation does not cause any error and does not change the flow of control?

Not trivially, no. You'd need to hook yourself into the RTL's exception handling. Search for ErrorProc and ExceptProc if you want to go down this rabbit hole (but note that there'll always be an exception triggered by the processor and handled by the OS, the only part you can influence is how it's handled inside your program).

MathMan

  • Sr. Member
  • ****
  • Posts: 262
Re: Useful optimizations for a video game project
« Reply #9 on: June 23, 2022, 11:43:26 am »
Quote
I'm working on an (ultimately) large video game project. ...
  • the source code is procedural, using only simple records,
  • data is passed using pointers (same as in SDL API),
  • memory is allocated and deallocated manually (GetMem, AllocMem, ReallocMem and FreeMem are used everywhere),
  • data is encapsulated in records, and access to them is only possible through getters and setters (as global functions), so I often use inline.

I personally would also spend some/substantial time on looking at point [iii] from your list above. If possible try to isolate cases where you can get away with allocating one large block of mem initially and then organize this internally by passing pointers (plus the required pointer math) when calling sub-functions etc.

I know that this is tedious / tricky work, but from my own experience it is worthwhile.

Cheers,
MathMan

furious programming

Re: Useful optimizations for a video game project
« Reply #10 on: June 23, 2022, 12:56:06 pm »
Quote
Fpc 64 bit 3.2.0 to 3.3.1 on Windows all show
Code: Text
  ATHLON64
  COREI
  COREAVX
  COREAVX2

I'm using the stable versions of Lazarus and FPC (2.2.0 and 3.2.2 respectively) and I have only ATHLON64 in this combobox.

Quote
Do you follow the fpc mail list? Just in case, there have been various optimizer improvements in fpc 3.3.1

No, I don't follow it. But there is also no need to rush, because my project will be in development for a few more years (3-4), so I will update Lazarus and FPC more than once. For now, I also don't have much code to optimize; I'm just asking in advance.



Quote
Using a couple of WPO cycles can speed up code too. And that is also true for procedural programming, although less so.

Can you write something more about it?



Quote
Please note that this is different from passing -O3 on the command line. It's essentially equivalent to -Oolevel3. It's currently not possible to enable all optimizations that are part of a specific level in code. You need to enable each optimization you want by hand.

Interesting. From what I can see in the $OPTIMIZATION documentation, there is no information on this. What exactly do I have to do, i.e. which optimizations do I have to declare additionally to get the equivalent of -O3? And besides those, which ones are worth looking at?

Quote
(but note that there'll always be an exception triggered by the processor and handled by the OS, the only part you can influence is how it's handled inside your program).

This is what I wanted to know. Thanks for the clarification.

I will test the source code thoroughly anyway, so that there are no unexpected exceptions and memory leaks caused by incorrectly written code. However, if exceptions were to occur in some case that I did not catch, it would be better for the release version to use incorrect data (e.g. causing glitches) than for control flow to become unpredictable or for the process to be killed.



Quote
I personally would also spend some/substantial time on looking at point [iii] from your list above. If possible try to isolate cases where you can get away with allocating one large block of mem initially and then organize this internally by passing pointers (plus the required pointer math) when calling sub-functions etc.

If necessary, I will definitely try to limit the dynamic allocation and deallocation of memory as much as possible.

However, at the moment I don't think memory operations are going to be a bottleneck, especially since the game engine will ultimately preload as much data as possible into memory so that everything is available while the game is running. Mainly I mean map and object data, fully represented by an Octree, which shouldn't require more than 1GB of memory. Operations on the Octree shouldn't be too problematic either.

A performance problem will definitely arise in rendering, because I want to use multi-threaded, purely software ray tracing (for a very low resolution frame), where any saving of cycles will have a significant impact on performance. But programming the renderer is still a long way off.

PascalDragon

Re: Useful optimizations for a video game project
« Reply #11 on: June 23, 2022, 01:57:05 pm »
Quote
Please note that this is different from passing -O3 on the command line. It's essentially equivalent to -Oolevel3. It's currently not possible to enable all optimizations that are part of a specific level in code. You need to enable each optimization you want by hand.

Quote
Interesting. From what I can see in the $OPTIMIZATION documentation, there is no information on this. What exactly do I have to do, i.e. which optimizations do I have to declare additionally to get the equivalent of -O3? And besides those, which ones are worth looking at?

You simply list the desired optimizations as mentioned in the documentation you linked to. A list of supported optimizations is available when you run fpc -i. Please note that this list is specific to each CPU architecture.
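In source form that could look like the fragment below. The switch names are only examples of the kind that fpc -i lists and should be checked against the output for your own compiler and target; which switches together make up -O3 is not stated in this thread, so the selection is an assumption, not an -O3 equivalent.

```pascal
{$IFNDEF GAME_BUILD_DEBUG}
  // Only the LEVEL3 switch itself, not everything that -O3 enables.
  {$OPTIMIZATION LEVEL3}

  // Individual optimizations, one directive per switch.
  // Assumption: each name appears in the "fpc -i" list for the target CPU.
  {$OPTIMIZATION REGVAR}
  {$OPTIMIZATION PEEPHOLE}
  {$OPTIMIZATION LOOPUNROLL}
  {$OPTIMIZATION TAILREC}
{$ENDIF}
```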

Quote
I will test the source code thoroughly anyway, so that there are no unexpected exceptions and memory leaks caused by incorrectly written code. However, if exceptions were to occur in some case that I did not catch, it would be better for the release version to use incorrect data (e.g. causing glitches) than for control flow to become unpredictable or for the process to be killed.

I personally prefer to kill the program than have it continue with incorrect data (that's why, if no SysUtils unit is used, the default is to simply terminate the application if an error occurred).

MathMan

Re: Useful optimizations for a video game project
« Reply #12 on: June 23, 2022, 02:58:56 pm »
Quote
I personally would also spend some/substantial time on looking at point [iii] from your list above. If possible try to isolate cases where you can get away with allocating one large block of mem initially and then organize this internally by passing pointers (plus the required pointer math) when calling sub-functions etc.

Quote
If necessary, I will definitely try to limit the dynamic allocation and deallocation of memory as much as possible.

However, at the moment I don't think memory operations are going to be a bottleneck, especially since the game engine will ultimately preload as much data as possible into memory so that everything is available while the game is running. Mainly I mean map and object data, fully represented by an Octree, which shouldn't require more than 1GB of memory. Operations on the Octree shouldn't be too problematic either.

A performance problem will definitely arise in rendering, because I want to use multi-threaded, purely software ray tracing (for a very low resolution frame), where any saving of cycles will have a significant impact on performance. But programming the renderer is still a long way off.

I can only state that I had some nasty surprises wrt dynamic memory allocation in some recursive procedures. Of course this was special as the allocations became smaller & smaller with each recursion level, but I was only able to get this to decent speeds after I completely removed allocs from the recursive procedure.

Regarding ray-tracing - that really depends on what you define as "very low resolution". I would assume that you'll need some tight kernel here, programmed in asm, which fully uses AVX2 / AVX512 (or comparable) capabilities to get somewhere.

Cheers,
MathMan

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 971
Re: Useful optimizations for a video game project
« Reply #13 on: June 23, 2022, 04:46:54 pm »
Optimize your time budget and don't try to optimize the 90% of your code that is not time-critical.

furious programming

Re: Useful optimizations for a video game project
« Reply #14 on: June 23, 2022, 05:32:52 pm »
Quote
You simply list the desired optimizations as mentioned in the documentation you linked to. A list of supported optimizations is available when you run fpc -i. Please note that this list is specific to each CPU architecture.

I checked this option and there are many features available, thanks. It is interesting that there are more instruction sets available:

Code: Pascal
  Supported CPU instruction sets:
    ATHLON64,COREI,COREAVX,COREAVX2

but in the project settings window I only have ATHLON64. Weird.

Quote
I personally prefer to kill the program than have it continue with incorrect data (that's why, if no SysUtils unit is used, the default is to simply terminate the application if an error occurred).

For now, I do not anticipate any unexpected errors, and I will try to write the code so that it does not cause exceptions. If they do occur, the most sensible solution will be selected.



Quote
Regarding ray-tracing - that really depends on what you define as "very low resolution".

The internal back buffer will have a resolution of 288×240 pixels, which will require a color calculation for 69,120 pixels in each game frame (using as many threads as there are logical processors). The game will ultimately use pixelart graphics, i.e. it will be in a retro style. For now, it's hard to say if I can achieve enough performance to take advantage of software ray-tracing on low-end PCs (which I care about), so standard rasterization will be the default rendering method (faster but poorer).

Quote
I would assume that you'll need some tight kernel here, programmed in asm, which fully uses AVX2 / AVX512 (or comparable) capabilities to get somewhere.

I can always use calculations only on integers (as in the good old days), because high precision of calculations will not be required — after all, the image will be highly pixelated. But there will be time for that.



Quote
Optimize your time budget and don't try to optimize the 90% of your code that is not time-critical.

Good point. I don't have code like this yet, but the sooner I find out about the possibilities, the more time I will save in the future. Thanks for the answers.