But, if you know there are no exceptions, and if you do use managed types (AnsiString / dyn array)
But if an exception occurs, this will leak memory. And as memory gets eaten up, the side effects will become noticeable.
Avoid managed types if you don't need them.
Choose a modern CPU type (CoreAvx or CoreI) if you don't need to support older CPUs (project options / target).
For "SomeFoo[LoopCounter]": use a pointer, and advance the pointer to the next item.
(Though that one is redundant in some cases on modern CPUs.)
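To illustrate the pointer suggestion, here is a minimal sketch. The array Items and the other names are made up for the example (they stand in for "SomeFoo"), and whether this beats plain indexing depends on the compiler version and CPU, so measure before relying on it.

```pascal
program PointerWalk;
{$mode objfpc}

const
  ItemCount = 8;

var
  Items: array[0..ItemCount - 1] of LongInt;  // hypothetical data, stands in for "SomeFoo"
  Current: PLongInt;
  i: Integer;
  Sum: Int64;
begin
  // Fill the array with 1..ItemCount using ordinary indexing.
  for i := 0 to ItemCount - 1 do
    Items[i] := i + 1;

  // Walk the same array with a typed pointer instead of Items[i].
  Sum := 0;
  Current := @Items[0];
  for i := 1 to ItemCount do
  begin
    Sum := Sum + Current^;
    Inc(Current);  // advances by SizeOf(LongInt) bytes to the next item
  end;

  WriteLn(Sum);
end.
```

Inc on a typed pointer steps by the size of the element type, so no manual address arithmetic is needed.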
You mentioned using GetMem, FreeMem and other heap functions to manage memory. Very significant gains can be had by doing your own memory management.
This is interesting. Ideally, an exception should not be created at all. I wonder what happens if a program tries to perform an illegal operation, for example accessing memory through an empty pointer or dividing by 0. Can you arrange things so that such an operation does not cause any error and does not change the flow of control?
I don't have these CPU types in the project settings window. Initially, I only care about modern, 64-bit processors. Older processors (including 32-bit) will certainly not be supported.
-Cp<x> Select instruction set; see fpc -i or fpc -ic for possible values
I learned a lot of interesting things after watching the video "How I program C" (https://www.youtube.com/watch?v=443UNeGrFoM) by Eskil Steenberg. I highly recommend listening. It doesn't matter that it's C, because in Pascal we have the same.
I watched a little over an hour of it and will watch the rest later (I don't have the time right now), but I agree with you. He really gives a lot of good advice (advice I've been giving for a very long time!... and nobody listens... :D)
{$OPTIMIZATION LEVEL3}
I'm working on a (ultimately) large video game project. ...
- the source code is procedural, using only simple records,
- data is passed using pointers (same as in SDL API),
- memory is allocated and deallocated manually (GetMem, AllocMem, ReallocMem and FreeMem are used everywhere),
- data is encapsulated in records, and access to them is only possible through getters and setters (as global functions), so I often use inline.
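The style described in the list above could look roughly like this. TPlayer, PlayerGetHealth and PlayerSetHealth are hypothetical names invented for the sketch, and the clamping in the setter is only there to show why a setter can be worth having.

```pascal
program InlineAccessors;
{$mode objfpc}
{$inline on}

type
  PPlayer = ^TPlayer;
  TPlayer = record  // hypothetical record, for illustration only
    Health: LongInt;
  end;

// Getter as a global function; "inline" asks the compiler to expand it at the call site.
function PlayerGetHealth(APlayer: PPlayer): LongInt; inline;
begin
  Result := APlayer^.Health;
end;

// Setter as a global procedure; negative values are clamped to 0 (an assumption of this sketch).
procedure PlayerSetHealth(APlayer: PPlayer; AValue: LongInt); inline;
begin
  if AValue < 0 then
    AValue := 0;
  APlayer^.Health := AValue;
end;

var
  Player: PPlayer;
begin
  Player := AllocMem(SizeOf(TPlayer));  // zero-filled manual allocation
  PlayerSetHealth(Player, 100);
  WriteLn(PlayerGetHealth(Player));
  FreeMem(Player);
end.
```

Note that inline is a request to the compiler, so it is worth checking the generated code for the accessors that matter.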
FPC 64-bit, 3.2.0 to 3.3.1, on Windows all show:
ATHLON64 COREI COREAVX COREAVX2
Do you follow the FPC mailing list? Just in case: there have been various optimizer improvements in FPC 3.3.1.
Using a couple of WPO cycles can speed up code too. And that is also true for procedural programming, although less so.
Please note that this is different from passing -O3 on the command line. It's essentially equivalent to -Oolevel3. It's currently not possible to enable all optimizations that are part of a specific level in code. You need to enable each optimization you want by hand.
(but note that there'll always be an exception triggered by the processor and handled by the OS, the only part you can influence is how it's handled inside your program).
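As a sketch of "handling it inside your program": with SysUtils in the uses clause, an integer division by zero normally surfaces as an EDivByZero exception that can be caught so execution continues with a fallback value. SafeDiv and the fallback of -1 are made up for this example, and the exact behaviour is platform-dependent.

```pascal
program DivGuard;
{$mode objfpc}

uses
  SysUtils;  // converts run-time errors into exception objects

// Hypothetical helper: returns Fallback instead of letting the error propagate.
function SafeDiv(A, B, Fallback: LongInt): LongInt;
begin
  try
    Result := A div B;
  except
    on E: EDivByZero do
      Result := Fallback;
  end;
end;

var
  Divisor: LongInt;
begin
  // ParamCount is 0 when the program is run without arguments;
  // using it keeps the divisor from being a compile-time constant.
  Divisor := ParamCount;
  WriteLn(SafeDiv(10, Divisor, -1));
  WriteLn('still running');
end.
```

The processor still raises the fault and the OS still delivers it; the try..except only decides what happens afterwards. Testing B for 0 before dividing avoids the exception entirely and is usually cheaper.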
I personally would also spend some/substantial time on looking at point [iii] from your list above. If possible try to isolate cases where you can get away with allocating one large block of mem initially and then organize this internally by passing pointers (plus the required pointer math) when calling sub-functions etc.
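A minimal sketch of the "one large block" idea, assuming a simple bump allocator is enough. TArena and its three routines are hypothetical helpers written for this example, not part of the FPC RTL.

```pascal
program ArenaSketch;
{$mode objfpc}

type
  TArena = record  // hypothetical helper type
    Base: PByte;   // start of the single large block
    Size: PtrUInt; // total bytes in the block
    Used: PtrUInt; // bytes handed out so far
  end;

procedure ArenaInit(out Arena: TArena; ASize: PtrUInt);
begin
  Arena.Base := GetMem(ASize);  // the only call into the heap manager
  Arena.Size := ASize;
  Arena.Used := 0;
end;

// Hands out the next ASize bytes of the block, or nil if it is exhausted.
function ArenaAlloc(var Arena: TArena; ASize: PtrUInt): Pointer;
begin
  ASize := (ASize + 15) and not PtrUInt(15);  // round the size up to a multiple of 16
  if Arena.Used + ASize > Arena.Size then
    Exit(nil);
  Result := Arena.Base + Arena.Used;  // pointer math inside the block
  Inc(Arena.Used, ASize);
end;

// Releases everything at once; individual items are never freed.
procedure ArenaDone(var Arena: TArena);
begin
  FreeMem(Arena.Base);
  Arena.Base := nil;
  Arena.Size := 0;
  Arena.Used := 0;
end;

var
  Arena: TArena;
  A, B: PLongInt;
begin
  ArenaInit(Arena, 1024);
  A := ArenaAlloc(Arena, SizeOf(LongInt));
  B := ArenaAlloc(Arena, SizeOf(LongInt));
  A^ := 1;
  B^ := 2;
  WriteLn(A^ + B^);
  ArenaDone(Arena);
end.
```

The trade-off is that memory is only returned as a whole, which suits data that is loaded once and lives for a level or for the whole run.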
Interesting. From what I can see in the $OPTIMIZATION documentation (https://www.freepascal.org/docs-html/prog/progsu58.html), there is no information on this. What exactly do I have to do? Which optimizations do I need to declare additionally to get the equivalent of -O3? And besides those, which ones are worth looking at?
I will test the source code thoroughly anyway, so that there are no unexpected exceptions or memory leaks caused by incorrectly written code. However, if an exception were to occur in some case I did not catch, it would be better for the release version to use incorrect data (e.g. causing glitches) than for control flow to become unpredictable or for the process to be killed.
If necessary, I will definitely try to limit the dynamic allocation and deallocation of memory as much as possible.
However, at the moment I don't think memory operations are going to be a bottleneck, especially since the game engine will ultimately preload as much data as possible into memory so that everything is available while the game is running. Mainly I mean map and object data, fully represented by an octree, which shouldn't require more than 1 GB of memory. Operations on the octree shouldn't be too problematic either.
A performance problem will definitely arise in rendering, because I want to use multi-threaded, purely software ray tracing (for a very low resolution frame), where any saving of cycles will have a significant impact on performance. But programming the renderer is still a long way off.
You simply list the desired optimizations as described in the documentation you linked to. A list of supported optimizations is available when you run fpc -i. Please note that this list is specific to each CPU architecture.
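For illustration, listing optimizations in the source could look like the sketch below. REGVAR, PEEPHOLE and LOOPUNROLL are switch names from the $OPTIMIZATION documentation; picking these three is only an assumption for the example, not a statement of what -O3 enables, so check the output of fpc -i for your target.

```pascal
program OptSwitches;
{$mode objfpc}
// Individual optimizations, comma-separated. The selection is illustrative only.
{$OPTIMIZATION REGVAR,PEEPHOLE,LOOPUNROLL}

// Small function so the directive has something to act on.
function SumTo(N: LongInt): Int64;
var
  i: LongInt;
begin
  Result := 0;
  for i := 1 to N do
    Result := Result + i;
end;

begin
  WriteLn(SumTo(100));
end.
```

Passing -O3 on the command line remains the simpler way to get the full set for a given level.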
I personally prefer to kill the program than have it continue with incorrect data (that's why, if no SysUtils unit is used, the default is to simply terminate the application if an error occurred).
Regarding ray-tracing - that really depends on what you define as "very low resolution".
I would assume that you'll need some tight kernel here, programmed in asm, which fully uses AVX2 / AVX512 (or comparable) capabilities to get somewhere.
Optimize your time budget and don't try to optimize the 90% of your code that is not time-critical.
I can always do the calculations on integers only (as in the good old days), because high precision will not be required; after all, the image will be highly pixelated. But there will be time for that.
Thank you very much for the example. I will definitely check this trick in the future.
I can't reproduce the results either.
But I just tested your test program on my Intel® Core™ i7-640LM (https://ark.intel.com/content/www/us/en/ark/products/43563/intel-core-i7640lm-processor-4m-cache-2-13-ghz.html) (which is quite old) and I can't reproduce your results.
Thus if you have multiple, equivalent integer operations that can be done in parallel (e.g. adding a vector) you can utilize SIMD.
What is strange is that my times are lower than Martin's, on a fairly low-end laptop (and also on my desktop).
It seems that while I did use -O3 (which afaik includes -Or), I also left other settings at their defaults. Mainly -Criot, and those checks take time.
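For comparing timings, the checks behind -Criot can also be switched off in the source. A sketch, assuming a hypothetical RELEASE symbol that you define yourself (for example with -dRELEASE):

```pascal
program ChecksOff;
{$mode objfpc}

{$IFDEF RELEASE}  // RELEASE is a hypothetical symbol, e.g. passed as -dRELEASE
  {$R-}  // range checking off     (-Cr)
  {$I-}  // I/O checking off       (-Ci)
  {$Q-}  // overflow checking off  (-Co)
  {$S-}  // stack checking off     (-Ct)
{$ENDIF}

const
  N = 1000;

var
  Data: array[0..N - 1] of LongInt;
  i: LongInt;
  Sum: Int64;
begin
  for i := 0 to N - 1 do
    Data[i] := i;

  // Each Data[i] access carries a bounds test when range checking is on.
  Sum := 0;
  for i := 0 to N - 1 do
    Sum := Sum + Data[i];

  WriteLn(Sum);
end.
```

Without RELEASE defined, whatever the command line or project options specify stays in effect, so a debug build keeps its checks.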
Just wondering what type of game it is?