
Useful optimizations for a video game project


furious programming:
I'm working on a video game project that will ultimately be large. I have defined various optimization settings for the different build modes, and in release mode I would like to enable whatever optimizations make the output machine code run as fast as possible (executable size and memory consumption don't matter). Some information about the project:

* the source code is procedural, using only simple records,
* data is passed using pointers (as in the SDL API),
* memory is allocated and deallocated manually (GetMem, AllocMem, ReallocMem and FreeMem are used everywhere),
* data is encapsulated in records and accessed only through getters and setters (global functions), so I use inline a lot.

For the release mode I currently use strong optimizations (level 3):


--- Code: Pascal ---
{$IFDEF GAME_BUILD_DEBUG}
  // debug mode settings (not important)
{$ELSE}
  {$INLINE         ON}      // honour routines marked "inline"
  {$SMARTLINK      ON}      // allow the linker to drop unused code
  {$OPTIMIZATION   LEVEL3}
  {$S-}                     // stack checking off
  {$IOCHECKS       OFF}
  {$RANGECHECKS    OFF}
  {$OVERFLOWCHECKS OFF}
  {$ASSERTIONS     OFF}
  {$OBJECTCHECKS   OFF}
{$ENDIF}
--- End code ---
I know that, besides the above, there are many other optimization switches you can add to the code yourself, and some of them may be useful given the specifics of the project. Does anyone have any idea what else can be enabled to make the resulting machine code faster?

Note that I'm asking in general (ahead of time), not because I currently have slow code that I want to speed up with compiler optimizations. If anyone has additional questions, feel free to ask.

Martin_fr:
First of all, the biggest optimization potential lies in the clever design of your code...

But if you know there will be no exceptions, and you do use managed types (AnsiString / dyn array), you can turn off the implicit exception frames:

--- Code: Pascal ---
{$ImplicitExceptions off}
--- End code ---
But if an exception does occur, this will leak memory, and as memory gets eaten up, the side effects will become noticeable.

Avoid managed types if you don't need them.

Choose a modern CPU type (CoreAvx or CoreI) if you don't need to support older CPUs (Project Options / Target).

If you have a tight loop in the middle of a long(er) routine, move the loop into an (inlined) subroutine and call it.

At least in the past, this has sometimes helped the optimizer do a better job with register allocation.
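A minimal sketch of that refactoring, assuming a hypothetical `UpdateFrame` routine whose hot loop scales an array of `Single` values (`ScaleValues` and `UpdateFrame` are illustrative names, not from this thread). Whether the extracted loop actually gets better register allocation depends on the FPC version and target, so it is worth benchmarking both forms.

```pascal
program InlinedLoopDemo;

{$mode objfpc}
{$INLINE ON}
{$POINTERMATH ON}

// Hypothetical hot loop, extracted from a longer routine into a small
// inlined procedure so the optimizer handles it in isolation.
procedure ScaleValues(AData: PSingle; ACount: Integer; AFactor: Single); inline;
var
  I: Integer;
begin
  for I := 0 to ACount - 1 do
    AData[I] := AData[I] * AFactor;
end;

// Hypothetical long routine: setup work, the hot loop, then more work.
procedure UpdateFrame(AData: PSingle; ACount: Integer);
begin
  // ... lots of unrelated setup code ...
  ScaleValues(AData, ACount, 0.5);
  // ... lots of unrelated follow-up code ...
end;

var
  Values: array[0..3] of Single = (2.0, 4.0, 6.0, 8.0);
begin
  UpdateFrame(@Values[0], Length(Values));
  WriteLn(Values[0]:0:1, ' ', Values[3]:0:1);  // prints: 1.0 4.0
end.
```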


The classic: move calculations out of the loop. Precalculate partial expressions, and keep only the parts that depend on the loop counter inside the loop.
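As an illustration (hypothetical `TPoint2D`, `RotateSlow` and `RotateFast`), here is a rotate-and-scale loop before and after hoisting. `Cos`, `Sin` and the multiplication by the scale factor do not depend on the loop counter, so in the second version they are computed once, before the loop.

```pascal
program HoistDemo;

{$mode objfpc}

type
  TPoint2D = record
    X, Y: Single;
  end;

// Before: Cos/Sin of the same angle and the scaling are redone every iteration.
procedure RotateSlow(var APoints: array of TPoint2D; AAngle, AScale: Single);
var
  I: Integer;
  OldX: Single;
begin
  for I := 0 to High(APoints) do
  begin
    OldX := APoints[I].X;
    APoints[I].X := (OldX * Cos(AAngle) - APoints[I].Y * Sin(AAngle)) * AScale;
    APoints[I].Y := (OldX * Sin(AAngle) + APoints[I].Y * Cos(AAngle)) * AScale;
  end;
end;

// After: everything that does not depend on I is precalculated.
procedure RotateFast(var APoints: array of TPoint2D; AAngle, AScale: Single);
var
  I: Integer;
  OldX, C, S: Single;
begin
  C := Cos(AAngle) * AScale;
  S := Sin(AAngle) * AScale;
  for I := 0 to High(APoints) do
  begin
    OldX := APoints[I].X;
    APoints[I].X := OldX * C - APoints[I].Y * S;
    APoints[I].Y := OldX * S + APoints[I].Y * C;
  end;
end;

var
  P: array[0..0] of TPoint2D;
begin
  P[0].X := 1.0;
  P[0].Y := 0.0;
  RotateFast(P, 0.0, 2.0);
  WriteLn(P[0].X:0:1, ' ', P[0].Y:0:1);  // prints: 2.0 0.0
end.
```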

Instead of "SomeFoo[LoopCounter]", use a pointer and increment it to the next item
(though on modern CPUs that one is in some cases redundant).
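A sketch of both variants over a hypothetical `TParticle` record array (all names are illustrative). `Inc` on a typed pointer advances it by `SizeOf(TParticle)`, so the second version walks the array instead of addressing each item through the index. As said above, a modern CPU and optimizer may make the two equally fast, so measure before committing to either.

```pascal
program PointerWalkDemo;

{$mode objfpc}
{$POINTERMATH ON}

type
  PParticle = ^TParticle;
  TParticle = record
    X, VX: Single;
  end;

// Indexed access: each item is addressed as Particles[I].
procedure MoveIndexed(Particles: PParticle; Count: Integer; DT: Single);
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
    Particles[I].X := Particles[I].X + Particles[I].VX * DT;
end;

// Pointer walk: P points at the current record and Inc(P) moves it
// forward by SizeOf(TParticle) bytes.
procedure MovePointer(Particles: PParticle; Count: Integer; DT: Single);
var
  P: PParticle;
  I: Integer;
begin
  P := Particles;
  for I := 1 to Count do
  begin
    P^.X := P^.X + P^.VX * DT;
    Inc(P);
  end;
end;

var
  Parts: array[0..1] of TParticle;
begin
  Parts[0].X := 0.0;
  Parts[0].VX := 1.0;
  Parts[1].X := 10.0;
  Parts[1].VX := -2.0;
  MoveIndexed(@Parts[0], Length(Parts), 0.5);  // X becomes 0.5 and 9.0
  MovePointer(@Parts[0], Length(Parts), 0.5);  // X becomes 1.0 and 8.0
  WriteLn(Parts[0].X:0:1, ' ', Parts[1].X:0:1);  // prints: 1.0 8.0
end.
```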


The very tricky bit: if you can align the start of small but high-iteration-count loops to a 32-byte boundary, that can gain (or lose) a double-digit percentage in speed.

Unfortunately, even functions are only aligned to 16 bytes.

I have benchmarked code myself in the past, and just by changing the order in which procedures were declared (no other change), the speed varied by nearly 20% to 30%.

At least on Intel, because Intel CPUs have some caches (IIRC for micro-ops) that rely on 32-byte boundaries.
So if you iterate some 1000 times over a loop that has 32 or 64 bytes of code, it runs fastest if it starts exactly on a 32-byte boundary.

Unfortunately, there is no option to enforce this. Maybe it can be done with asm blocks.
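For what it's worth, the Free Pascal Programmer's Guide documents a `$CODEALIGN` directive (with a matching `-Oa` command-line option) whose `PROC` and `LOOP` parameters request alignment of procedure entry points and loop starts. The sketch below assumes those settings are honoured by your compiler version and target; whether they actually produce the 32-byte loop starts described above is something to verify in the generated assembler (for example by keeping it with `-a`) and by benchmarking. `SumBytes` is a hypothetical hot loop.

```pascal
program AlignDemo;

{$mode objfpc}
{$POINTERMATH ON}
// Assumption: these alignment requests are honoured by your FPC version
// and target; check the assembler output to see where the loop starts.
{$CODEALIGN PROC=32}
{$CODEALIGN LOOP=32}

// Hypothetical small, high-iteration-count loop.
function SumBytes(AData: PByte; ACount: Integer): LongWord;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to ACount - 1 do
    Inc(Result, AData[I]);
end;

var
  Buf: array[0..255] of Byte;
  I: Integer;
begin
  for I := 0 to High(Buf) do
    Buf[I] := I;
  WriteLn(SumBytes(@Buf[0], Length(Buf)));  // prints: 32640
end.
```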

440bx:
As Martin_fr stated, most really significant speed gains come from the design of the code.

You mentioned using GetMem, FreeMem and other heap functions to manage memory.  Very significant gains can be had by doing your own memory management. 

This means allocating your own blocks based on how the data is used and carving them up yourself. With the proper design, it is often possible to remove the need for critical sections (or other synchronization methods) when allocating and deallocating memory blocks.

Depending on how often the program needs to allocate and free memory blocks, doing your own memory management can make a very noticeable difference, but it requires designing memory allocation/deallocation upfront, specifically to accommodate the application's needs throughout its execution.
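A minimal sketch of "carving the block yourself", assuming a single-threaded bump (arena) allocator: one block is requested up front, pieces are handed out by advancing an offset, and everything is released at once. `TArena`, `ArenaInit`, `ArenaAlloc`, `ArenaReset` and `ArenaDone` are hypothetical names; 440bx does not describe a specific design, and the block here comes from `GetMem`, although it could just as well be requested from the O/S.

```pascal
program ArenaDemo;

{$mode objfpc}

type
  // Hypothetical bump ("arena") allocator: one big block, carved up by
  // advancing an offset; everything is released at once.
  TArena = record
    Base: PByte;    // start of the block
    Size: PtrUInt;  // block size in bytes
    Used: PtrUInt;  // bytes handed out so far (including alignment padding)
  end;

procedure ArenaInit(out AArena: TArena; ASize: PtrUInt);
begin
  AArena.Base := GetMem(ASize);  // could also come straight from the O/S
  AArena.Size := ASize;
  AArena.Used := 0;
end;

// Returns nil when the arena is exhausted (error-code style, no exceptions).
// AAlign must be a power of two.
function ArenaAlloc(var AArena: TArena; ASize, AAlign: PtrUInt): Pointer;
var
  Addr, Start: PtrUInt;
begin
  // Round the next free address up to the requested alignment.
  Addr := (PtrUInt(AArena.Base) + AArena.Used + AAlign - 1) and not (AAlign - 1);
  Start := Addr - PtrUInt(AArena.Base);
  if (Start > AArena.Size) or (ASize > AArena.Size - Start) then
    Exit(nil);
  AArena.Used := Start + ASize;
  Result := Pointer(Addr);
end;

// Invalidates every allocation at once; the block itself is kept for reuse.
procedure ArenaReset(var AArena: TArena);
begin
  AArena.Used := 0;
end;

procedure ArenaDone(var AArena: TArena);
begin
  FreeMem(AArena.Base);
  AArena.Base := nil;
  AArena.Size := 0;
  AArena.Used := 0;
end;

var
  Arena: TArena;
  A, B: PLongInt;
begin
  ArenaInit(Arena, 64);
  A := ArenaAlloc(Arena, SizeOf(LongInt), 4);
  B := ArenaAlloc(Arena, SizeOf(LongInt), 4);
  A^ := 1;
  B^ := 2;
  WriteLn(A^ + B^, ' ', Arena.Used);         // prints: 3 8
  WriteLn(ArenaAlloc(Arena, 100, 4) = nil);  // prints: TRUE
  ArenaReset(Arena);
  ArenaDone(Arena);
end.
```

Each allocation is a few arithmetic operations with no locking; the trade-off in this particular design is that individual pieces cannot be freed, only the whole arena, which suits data with a shared lifetime such as per-frame or per-level data.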

Depending on how you implement it, there can be another very significant advantage. If every block to be carved is requested directly from the O/S (instead of from a heap), an external memory viewer can be used to inspect the blocks. When debugging, this makes memory inspection independent of the current instruction, i.e., the pointers to the blocks don't have to be in the current scope: once you know the address, the blocks can always be inspected with an external memory viewer.

HTH.

furious programming:
Thanks for the answers and advice.


--- Quote from: Martin_fr on June 22, 2022, 07:00:25 pm ---But, if you know there are no exceptions, and if you do use managed types (AnsiString / dyn array)
--- End quote ---

I do not need exceptions at all: everything will be handled by error codes (like in SDL or pure WinAPI), so it would be good to remove everything related to exceptions from the code. There are appropriate tools (such as the debug mode or the HeapTrc unit) to check that the code works correctly, and the release build should be as efficient as possible, without unnecessary instructions and additional checks.
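A sketch of that error-code style applied to illegal operations such as the division by zero and nil pointer access discussed in this thread: the routine checks its inputs and returns a code instead of letting the operation raise an exception. `TGameResult` and `SafeDivide` are hypothetical names.

```pascal
program ErrorCodeDemo;

{$mode objfpc}

type
  // Hypothetical result codes, in the spirit of SDL/WinAPI return values.
  TGameResult = (grOK, grDivisionByZero, grNilPointer);

// Validates the inputs up front instead of letting the CPU/RTL raise an
// exception; the caller inspects the returned code.
function SafeDivide(ADividend, ADivisor: Integer; AResult: PInteger): TGameResult;
begin
  if AResult = nil then
    Exit(grNilPointer);
  if ADivisor = 0 then
    Exit(grDivisionByZero);
  AResult^ := ADividend div ADivisor;
  Result := grOK;
end;

var
  Value: Integer;
  Code: TGameResult;
begin
  Code := SafeDivide(10, 2, @Value);
  WriteLn(Ord(Code), ' ', Value);  // prints: 0 5
  Code := SafeDivide(10, 0, @Value);
  WriteLn(Ord(Code));              // prints: 1
end.
```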


--- Quote ---But, if an exception occurs this will leak memory. And as memory gets eaten up, side effects will get noticeable.
--- End quote ---

This is interesting. Ideally, an exception should not be created at all. I wonder what happens if a program tries to perform an illegal operation, for example accessing memory via a nil pointer or dividing by 0. Can this be done so that such an operation does not cause any error and does not change the flow of control?


--- Quote ---Avoid managed types, if you don't need them.
--- End quote ---

In theory I could avoid them, but strings are a problem. Since SDL uses C-style strings (PAnsiChar only), working with them gets a bit awkward, especially when it comes to concatenating and converting them. There are few built-in functions to support them, and virtually none for conversion.
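One way to build C-style strings for SDL without managed types is a fixed `AnsiChar` buffer combined with FPC's `strings` unit (`StrLCopy`, `StrLCat`, `StrLen`) and `Str` for number-to-text conversion. A minimal sketch; the buffer size and the "Score" text are illustrative, and the exact meaning of the length parameter of `StrLCat` is worth checking in the `strings` unit documentation before relying on it for truncation.

```pascal
program CStringDemo;

{$mode objfpc}

uses
  strings;  // FPC's routines for null-terminated (C-style) strings

const
  BufLen = 63;  // maximum number of characters; the buffer has room for #0

var
  Buf: array[0..BufLen] of AnsiChar;
  Num: ShortString;
begin
  // Build "Score: 1234" without any managed (reference-counted) strings.
  StrLCopy(@Buf[0], 'Score: ', BufLen);
  Str(1234, Num);              // number -> ShortString (not managed)
  Num[Length(Num) + 1] := #0;  // null-terminate so it can be read as PAnsiChar
  StrLCat(@Buf[0], @Num[1], BufLen);
  WriteLn(PAnsiChar(@Buf[0]));  // prints: Score: 1234
  WriteLn(StrLen(@Buf[0]));     // prints: 11
end.
```

A `PAnsiChar(@Buf[0])` like the one in the first `WriteLn` is what would be handed to an SDL function expecting a C string.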


--- Quote ---Choose a modern cpu type (CoreAvx or CoreI) if you don't need to support older cpu. (project options / target)
--- End quote ---

I don't have these CPU types in the project settings window. Initially, I only care about modern, 64-bit processors. Older processors (including 32-bit) will certainly not be supported.


--- Quote ---For "SomeFoo[LoopCounter]" => use a pointer, and increase the pointer to the next item.
(though that one is in some cases redundant on modern cpu)
--- End quote ---

I will be running a lot of tests and choosing the best solutions. In my tests, iterating with a pointer was indeed faster than indexed access; I tested it some time ago. I learned a lot of interesting things from the video "How I program C" by Eskil Steenberg. I highly recommend watching it; it doesn't matter that it's about C, because the same applies to Pascal.



--- Quote from: 440bx on June 22, 2022, 09:20:14 pm ---You mentioned using GetMem, FreeMem and other heap functions to manage memory.  Very significant gains can be had by doing your own memory management.
--- End quote ---

I believe it, but I'd rather focus on writing the right code for the project and not go that low-level. I don't know whether something like this will be needed in my case.

Martin_fr:

--- Quote ---This is interesting. Ideally, an exception should not be created at all. I wonder what happens if a program tries to perform an illegal operation, for example accessing memory via a nil pointer or dividing by 0. Can this be done so that such an operation does not cause any error and does not change the flow of control?
--- End quote ---

A "div 0" can be caught as an exception, and I think even some access violations can be, but I'm not sure.
But in any case, I usually make sure not to have those in the first place, rather than worrying whether my code would still work if they happened.

If you don't use "raise" and don't have any "try except" blocks (including in any code you use from frameworks, etc.), then "{$ImplicitExceptions off}" should be OK.


--- Code: Pascal ---
procedure foo;
var
  s: ansistring;
begin
  s := getVal;  // getVal: any function returning an AnsiString
  // do some stuff
end;
--- End code ---
FPC will insert code at the end of that procedure to do "s := ''", i.e. decrease the reference count and free the string's memory if it is not held by other variables.

That will always happen.

But FPC also wraps the entire procedure in a "try finally" block, to make sure "s" is freed even if an exception occurs.
With "{$ImplicitExceptions off}", that "try finally" is not inserted.
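A sketch contrasting the two cases (`GetVal`, `WithManaged` and `WithShortString` are hypothetical names): a local `AnsiString` needs cleanup code at the end and, with implicit exceptions enabled, the hidden "try finally" as well, while a local `ShortString` is a fixed-size, non-managed value, so there is nothing for the compiler to clean up. The trade-off is the 255-character limit of `ShortString`.

```pascal
program ManagedVsShortString;

{$mode objfpc}{$H+}

// Hypothetical stand-in for getVal from the post above.
function GetVal: AnsiString;
begin
  Result := 'hello';
end;

// Managed local: FPC must release S at the end (reference-count decrement),
// and with implicit exceptions enabled it also wraps the body in a hidden
// try..finally so that S is released even if an exception occurs.
procedure WithManaged;
var
  S: AnsiString;
begin
  S := GetVal;
  WriteLn(Length(S));
end;

// Unmanaged alternative: a ShortString is a fixed-size value on the stack
// (max. 255 characters), so there is nothing to release at the end.
// It is assigned from a literal to avoid creating a temporary AnsiString.
procedure WithShortString;
var
  S: ShortString;
begin
  S := 'hello';
  WriteLn(Length(S));
end;

begin
  WithManaged;      // prints: 5
  WithShortString;  // prints: 5
end.
```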


