Lazarus
Topic started by: shihonage on May 31, 2016, 10:38:43 pm
-
I've searched high and low on the Internet, and can't find a single coherent source of information on how FPC would behave, say, when rendering 1000 sprites in a loop using SDL, compared with C++ doing the same. A very simple, direct measure of FPC+SDL performance.
Does anyone have experience with something like this?
I have 2 project prototypes, one of which is far better suited for Pascal (because it processes a lot of text), and the other is very heavy on sprite drawing, which was written in C/C++ and DirectDraw a long time ago.
I wish there was a way to find out how the latter would behave in FPC+SDL, without actually porting it first.
-
Theoretically, FreePascal is a better fit for games because it compiles quickly (separate compilation).
But everyone uses C++ because it has libraries that solved its memory management deficiencies, in some cases a long time ago.
Many experts working at corporations have contributed their work to the C++ code base during the past decade.
If something is slow in a compiled language, the first guess is that memory is being managed inefficiently behind the scenes.
-
Well, first, C++ is a language. I think here you will need to pit various C++ implementations against FPC.
In general I expect FPC to be a bit slower, but not much (say 10-20% on average). In applications where most of the heavy lifting is done by an external library (like SDL) it might not even be noticeable.
I don't recognize AlexK's remarks at all. To the best of my knowledge, there is nothing fundamentally different about C(++) and FPC memory management from a performance viewpoint.
-
Modern games use a highly optimized engine/framework for most scene manipulations and drawing operations, but other game mechanics can be done even in scripting languages - Lua, Python, JavaScript..
-
SDL is not really going to give as much performance as OpenGL or DirectX (not crossplatform). It's a purely 2D graphics engine for pixel-perfect graphics. Both of the above can be programmed with Lazarus/FPC, arguably even more easily than in C++.
-
I don't recognize AlexK's remarks at all. To the best of my knowledge, there is nothing fundamentally different about C(++) and FPC memory management from a performance viewpoint.
Memory management strategies in libraries, optimized for years. RAII, more allocations on stack.
Topics in C++ discussions are often "What’s the Best Buffer Growth Strategy?", or something like that.
The difference in low-level assembler performance is not significant (but I don't know yet how the compilers differ on RISC CPUs like ARM).
-
I've searched high and low on the Internet, and can't find a single coherent source of information on how FPC would behave, say, when rendering 1000 sprites in a loop using SDL, compared with C++ doing the same. A very simple, direct measure of FPC+SDL performance.
SDL has nothing to do with performance and IMHO, unless you are absolutely clueless in programming, you should avoid it.
Speaking of performance, I have recompiled several of our projects in FPC (they previously used Delphi), including the games Aztlan Dreams and Wicked Defense 2 (unfortunately, our Ixchel Studios site is down now, but I'll look into republishing them later this year). The second one, being a 3D game, was rendering around 2 million triangles per frame in some scenes, and that ran at a pretty stable 60 FPS (vsync on) on 3D hardware from 2005-2006 using Direct3D 9 and shader version 3 (with heavy hardware instancing). We are talking about > 120 million triangles per second here. On the other hand, I have seen *many* games compiled with C++ that can't handle more than 200,000 triangles per frame while running at an excruciating 5 FPS or less.
My point is, performance-wise, there is nothing inherently wrong in FPC regarding code generation - as long as you keep your algorithms optimal, you should obtain whatever results you are aiming for. If for some reason you have to think about performance in code generation between FPC and other compilers, you are either thinking of premature optimization, or just have some personal issues of insecurity or whatever - in this case, changing career might be a good option. :)
There are other reasons why C++ can be chosen as the language of preference, but mostly these are not performance-related. For instance, in my case, I use it because my colleagues and other people in the project use it, and because it has many compiler options (specifically MSVC, GCC and Clang/LLVM) backed by large companies and big budgets, whereas in the case of FPC you are pretty much dependent on the good will and good health of the remaining FPC developers, with all the associated risks involved.
-
SDL is not really going to give as much performance as OpenGL or DirectX (not crossplatform). It's a purely 2D graphics engine for pixel-perfect graphics.
They are not mutually exclusive though. SDL provides sounds, events.
OpenGL works on data you pass to it, in graphical memory. Shaders are text snippets that are passed to OpenGL and compiled on the fly by OpenGL's builtin shader compiler.
I worked with OpenGL in Python, Kivy.
-
Thank you all for your feedback, it was very informative.
@ykot: What do you mean? SDL has poor performance in 2D compared to other things? What would be a faster alternative for a 2D game that wants to run on Windows and Linux?
-
I wish there was a way to find out how the latter would behave in FPC+SDL, without actually porting it first.
Well... there is no way even after porting it... :) You can't compare Linux vs Windows performance... They're too different. One does one task better, and another task is handled better by its rival.
In Castle Game Engine I get ~60 FPS with ~400-800 sprites (rendered in a non-optimized way) on a 5-10 year old computer, and ~30 FPS on an Android tablet and on a 15-year-old computer.
I think the main problem is not efficiency but convenience. C(++) has been used to make games for a very long time and many games are actually written in C. That means you can find lots of ready-made libraries, patterns, examples, answers, tutorials and support. Many C tools have been proven by time and extensive use.
Many game engines use C++, C#, Java, Python, etc. as a scripting language, but you can hardly find any quality, actively developed FPC game engines other than GLScene and Castle Game Engine. As far as I know, neither of those provides a drag-and-drop interface for creating games the way Unity (C# based) does; you'll have to do everything in code.
So... I like Pascal. I write my games in Pascal. But I understand that people usually don't write games in Pascal, and there are good reasons for it :)
I do it for fun, so I don't care about obtaining the highest quality and productivity - I want fun :) But if I had to do a commercial-level game I might have changed my approach.
-
Well, I never really use other people's engines, so that's not the issue. It's just about performance for me... because in other areas, Pascal wins over C++ by being better at diagnostics, more intuitive, easier to read, tons of convenience functions for pretty much everything...
-
Well, I never really use other people's engines, so that's not the issue.
I used to think this way too... until I found out that my life is too short and that if I want to finish anything, I'll need to speed up the process significantly :) Engines are usually more portable (e.g. in Castle Game Engine you don't have to rewrite the code to port it to Android - my result from today :D), better optimized (people spend a lot of time developing the structure and optimizations), and less buggy (as they have been tested and bug-reported on different platforms). So... in practice, not using someone else's engine means using your own engine. It can be better and more convenient, but it will definitely take much more time for coding and reading docs.
So... yes. If you need just 1000 sprites you can do it with OpenGL and that'll do. If you want to extend into 3D with animations, shadows, etc... it'll take too much time to do it from scratch.
-
I don't do 3D. For that I would indeed need someone else's engine. But for my unique purposes, I need extensive amounts of 2D blitting, and this is where another engine, which is an unneeded abstraction, would start clogging up the much-needed performance.
-
If you're not doing a commercial project, you may try to write your own graphics engine using the OpenGL API, which is fast and can handle thousands of polygons even on old PCs. OpenGL isn't too hard to learn, and information about it is easy to find on the web. And it's fun to write your own engine, which you can optimize for your needs.
I'm writing my own OpenGL visual control. It's cross-platform and can be used on Linux and Windows. Now I'm making it usable on Android with OpenGL ES. If I were doing a commercial project, I would pick a ready-made game/graphics engine. But because it's just a hobby, I enjoy the fun and the learning process.
-
In general I expect FPC to be a bit slower, but not much (say 10-20% on average). In applications where most of the heavy lifting is done by an external library (like SDL) it might not even be noticeable.
But a game is not just rendering. Game elements must have structure - arrays, classes, lists. In every frame a lot of things are calculated (physics, states, AI), the game loop must go through the lists, etc. etc.
If you lose 20% in every aspect..
FPC classes are a performance killer, as I tested earlier.
E.g. there are 2 demos in the ZenGL library:
- the first is based on classes: nice clear code and really low performance.
- the second uses records + an array of pointers and it's really fast (but whether it could compete with C++ I don't know).
Modern games use a highly optimized engine/framework for most scene manipulations and drawing operations, but other game mechanics can be done even in scripting languages - Lua, Python, JavaScript..
Not all; Unity is really slow. If you need speed or a lot of objects, you must go outside the Unity framework and create your own object structure, game loop etc. Maybe rendering is OK, but the rest is the same problem as described above.
In addition, those scripts are not the best solution, it's always simplicity vs performance.
-
In general I expect FPC to be a bit slower, but not much (say 10-20% on average). In applications where most of the heavy lifting is done by an external library (like SDL) it might not even be noticeable.
But a game is not just rendering. Game elements must have structure - arrays, classes, lists. In every frame a lot of things are calculated (physics, states, AI), the game loop must go through the lists, etc. etc.
If you lose 20% in every aspect..
FPC classes are a performance killer, as I tested earlier.
E.g. there are 2 demos in the ZenGL library:
- the first is based on classes: nice clear code and really low performance.
- the second uses records + an array of pointers and it's really fast (but whether it could compete with C++ I don't know).
If you start comparing dynamic memory management to statically allocated memory, then of course it is a problem.
I however use classes all the time (but in speed-sensitive code I don't constantly create and destroy them), and that works fine.
But you do mention something on the side that deserves to be stated unambiguously: the more advanced C++ compilers are more likely to reduce naive C++ code into something better than FPC can, and (as the fpc-devel list shows) can do great things with tight loops where all data is value types and local.
FPC misses various kinds of CSE and hoisting and hurts there. But it is all relative, and a matter of tuning the code a bit. And since the rate-determining code is usually fairly localized that is not that bad.
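
The "don't constantly create and destroy them" advice can be sketched as a small object pool. This is only an illustration under assumptions: TParticle, PoolSize, Acquire and Release are hypothetical names that do not come from any poster's code. All instances are created once, and the per-frame loop only borrows and returns them.

```pascal
program PoolSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical sprite-like class, used only for illustration.
  TParticle = class
    X, Y: Integer;
  end;

const
  PoolSize = 1000;

var
  All: array[0..PoolSize - 1] of TParticle;       // owns every instance
  FreeStack: array[0..PoolSize - 1] of TParticle; // instances not in use
  FreeCount: Integer = 0;

// Take an unused object from the pool; nil when the pool is exhausted.
function Acquire: TParticle;
begin
  if FreeCount = 0 then
    Exit(nil);
  Dec(FreeCount);
  Result := FreeStack[FreeCount];
end;

// Hand an object back to the pool instead of freeing it.
procedure Release(P: TParticle);
begin
  FreeStack[FreeCount] := P;
  Inc(FreeCount);
end;

var
  i, Frame: Integer;
  P: TParticle;
begin
  // Allocate once, up front.
  for i := 0 to PoolSize - 1 do
  begin
    All[i] := TParticle.Create;
    Release(All[i]);
  end;

  // Simulated frames: borrow and return objects without touching the heap.
  for Frame := 1 to 60 do
  begin
    P := Acquire;
    if P <> nil then
    begin
      P.X := Frame;
      P.Y := Frame * 2;
      Release(P);
    end;
  end;

  WriteLn('Free objects after 60 frames: ', FreeCount);

  // Free everything once, at shutdown.
  for i := 0 to PoolSize - 1 do
    All[i].Free;
end.
```

Whether this helps in a given game depends on how many objects are spawned per frame; it is worth measuring before restructuring anything.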
-
I'm working in my very own old-school 2D game engine, so I'm dealing with a lot of what you're talking about here.
Right now I don't care about performance a lot; I'm more concerned about engine structure, fighting against the early-optimization illness every time I code anything. So I'm using a lot of classes (except one OBJECT that I actually don't know why I keep) and performance sucks. It works correctly on my not-so-modern 4-core desktop but it is too slow on my older 2-core notebook (for example, the runtime sound generator kills it).
I think I can boost performance a lot by using plain Pascal instead of Object Pascal, but I won't even discuss it until version 1.0 is up and running with an actual commercial-quality game finished. That means the optimized version would be version 2.0, with a radical API redesign that forces me to rewrite that first game from scratch. Or maybe rewriting a few methods is enough, and version 1.1 or 1.2 with minor changes in the API will do.
What I want to say is that the bad performance of my engine isn't in the language but in the way I'm using it. Looking at Castle Engine, it uses a lot of CLASSes and has much better performance than mine, and this confirms it.
-
A month-old results. Classes vs records. Task: 3D maze generation (9x9x9)
Records: 22000 ms.
Classes: 250 ms. (thanks to many optimizations provided by Generic lists) + thread-safe (up to another 8x speed)
-
Naive value type code can be slow due to excessive copies. In ye old days this was the case with the stringtype; Shortstring
-
Naive value type code can be slow due to excessive copies. In ye old days this was the case with the stringtype; Shortstring
Uuhhhmmmm,Marco,
The Pascal shortstring type was by design more efficient than C type "strings". (for the average readers: early C did not know about strings as such, just an array of byte, early Pascal did..)
That's because its length is known to the compiler beforehand. And the length is stored. And we have strings that can contain zeros.
And that is simply both a faster and a better design.
Unless you forgot... O:-)
-
Naive value type code can be slow due to excessive copies. In ye old days this was the case with the stringtype; Shortstring
Uuhhhmmmm,Marco,
The Pascal shortstring type was by design more efficient than C type "strings". (for the average readers: early C did not know about strings as such, just an array of byte, early Pascal did..)
This depends on the workload and how much manual optimization was applied. I know this matter really well due to my Modula2 days. (which had an even worse concept, null terminated unless it fits exactly in the static allocation).
Pascal shortstrings were much easier to write, and performance not too shabby if you didn't forget to add const to all string parameters. The 255 limit was a bigger problem than the performance.
That's because its length is known to the compiler beforehand. And the length is stored. And we have strings that can contain zeros.
But default by reference, and my point was that shortstring was not. And people not realizing made it a vfaq even.
-
Hm. I didn't know UCSD Pascal had a const modifier for strings or any const modifier at all. I will investigate :D Oh, and Turbo Pascal did not.
Since we are both Pascal historians in some way, please enlighten me if I am wrong?
And I fully agree with the last sentence you wrote.
-
Hm. I didn't know UCSD Pascal had a const modifier for strings or any const modifier at all. I will investigate :D Oh, and Turbo Pascal did not.
If you had
procedure x(s:string);
instead of
procedure x(const s:string);
in TP, in the first case a copy of s would be created on the stack and passed to the procedure. In the second case it would be passed by reference.
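
A minimal sketch of the two declarations above, written for Free Pascal rather than TP. The names ByValue, ByConst and Name are made up for illustration, and {$H-} is used on the assumption that "string" should mean ShortString as it did in TP. It shows the visible difference (the by-value callee works on its own copy); the copy itself can only be confirmed by looking at the generated code.

```pascal
program ConstParamSketch;
{$mode objfpc}{$H-}   // H- makes "string" mean ShortString

// By value: the callee gets its own copy, so it may modify it freely.
procedure ByValue(s: string);
begin
  s[1] := 'X';               // changes the local copy only
  WriteLn('inside ByValue: ', s);
end;

// const: no private copy is needed; assigning to s is a compile error.
procedure ByConst(const s: string);
begin
  WriteLn('inside ByConst: ', s);
end;

var
  Name: string;
begin
  Name := 'sprite';
  ByValue(Name);
  ByConst(Name);
  WriteLn('caller still has: ', Name);
end.
```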
-
Marco, TP version plz.
TP 1.0 doesn't know it... const... (get it from the Embarcadero museum)?
If I am wrong I will send you a nice bottle of proper wine.
-
If you start comparing dynamic memory management to statically allocated then of course it is a problem.
I however use classes all the time (but in speed-sensitive code I don't constantly create and destroy them), and that works fine.
I can test it more, also FPC vs basic code in C++. I'm not a good programmer, so there may be inaccuracies; anyway, we will see.
What I want to say is that the bad performance of my engine isn't in the language but in the way I'm using it. Looking at Castle Engine, it uses a lot of CLASSes and has much better performance than mine, and this confirms it.
Of course, this is on the shoulders of every programmer :) I'm not sure about Castle Engine; there aren't any stress tests like "08 - Sprite Engine" and "09 - Sprite Engine (Classes)" in ZenGL. Links at the end of the post.
How have you structured this sound generator? Can you post the code?
A month-old results. Classes vs records. Task: 3D maze generation (9x9x9)
Records: 22000 ms.
Classes: 250 ms.
But you don't handle 100 000 dynamic objects + add/remove them "on the fly", or you don't sort them etc., right? That's the difference.
https://github.com/goldsmile/zengl/blob/master/src/zgl_sengine_2d.pas
https://github.com/goldsmile/zengl/blob/master/extra/zglSpriteEngine.pas (same functionality with Classes)
-
But you don't handle 100 000 dynamic objects + add/remove them "on the fly", or you don't sort them etc., right? That's the difference.
Of course I do. Both add/remove on the fly and sorting.
But I do it in an optimized way. And, of course, not 100 000, but as many as I need :)
I could do the same optimizations with dynamic arrays, or face dire memory consumption with static arrays. And yes, I am almost sure the code would run faster with just 1-dimensional static arrays. But the performance is perfectly fine for me.
I even doubt the usefulness of the thread safety I've spent so much time ensuring. If the map is generated in fractions of a second, why should I care about speeding it up with threads?
The efficiency bottleneck for me is HDD reading. I bet something could be optimized here too, but I don't have any staff, so I prefer to do something more fun :)
-
I have an application that loads 400000 class instances and sorts them into various indexes on startup in under a second. It would be 150-250 ms, though, if I didn't handle string encodings.
Operations after loading are sub ms.
-
Ok guys, let's see the numbers from very simple FPC tests :)
type
  PItem = ^TItem;
  TItem = record
    id : integer;
    x, y : integer;
  end;

  // Records
  TManager = record
    Count : integer;
    List : array of TItem;
  end;

  // Classes
  TCItem = class
    id : integer;
    x, y : integer;
  end;
  TCManager = class
    Count : integer;
    List : array of TCItem;
  end;

  // Pointers to record
  PPManager = ^TPManager;
  TPManager = record
    Count : integer;
    List : array of PItem;
  end;

FILL TEST: [1000 items]
Records: 0,0000256353031299197 s
Classes: 0,000318480942414061 s
Pointers: 0,000087160030641727 s
QUICKSORT TEST: [1000 items]
Records: 0,000342608286536339 s
Classes: 0,000292845639284142 s
Pointers: 0,000272940580383263 s
FREE MEM TEST: [1000 items]
Records: 0,0000229209769161635 s
Classes: 0,000117922394397631 s
Pointers: 0,0000799218274050438 s
------------------------------------------------------
FILL TEST: [100000 items]
Records: 0,000991690832870043 s
Classes: 0,0151803825522516 s
Pointers: 0,00582843748591348 s
QUICKSORT TEST: [100000 items]
Records: 0,010936546812315 s
Classes: 0,0120403287605367 s
Pointers: 0,00923714933962406 s
FREE MEM TEST: [100000 items]
Records: 0,000565564287109522 s
Classes: 0,00705573001968356 s
Pointers: 0,00362237615133803 s
------------------------------------------------------
FILL TEST: [10000000 items]
Records: 0,0609045929150976 s
Classes: 0,671325713148588 s
Pointers: 0,2505645791059 s
QUICKSORT TEST: [10000000 items]
Records: 1,44320108061496 s
Classes: 3,1377283567959 s
Pointers: 2,63237050861869 s
FREE MEM TEST: [10000000 items]
Records: 0,0520024844495768 s
Classes: 1,54121744059482 s
Pointers: 1,47699562507726 s
So, there are some differences :) Later I will add something that would simulate a game.
I expected better performance from "array of records" vs "array of pointers". Maybe there is a mistake somewhere.
Could someone with an older PC please test it? (these numbers are from an i7 at 3.40 GHz)
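
The test program itself was not posted, so the following is only a guess at what a FILL test over the record and class types above might look like. The item count N, the use of GetTickCount64 for timing and the loop bodies are assumptions, not the original benchmark.

```pascal
program FillSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;   // GetTickCount64

type
  TItem = record
    id: Integer;
    x, y: Integer;
  end;

  TCItem = class
    id: Integer;
    x, y: Integer;
  end;

const
  N = 1000000;  // assumed item count, not one of the posted sizes

var
  Recs: array of TItem;
  Objs: array of TCItem;
  i: Integer;
  T0: QWord;
begin
  // Records: one allocation for the whole array, then plain stores.
  T0 := GetTickCount64;
  SetLength(Recs, N);
  for i := 0 to N - 1 do
  begin
    Recs[i].id := i;
    Recs[i].x := i;
    Recs[i].y := i;
  end;
  WriteLn('Records fill: ', GetTickCount64 - T0, ' ms');

  // Classes: one heap allocation per item on top of the array itself.
  T0 := GetTickCount64;
  SetLength(Objs, N);
  for i := 0 to N - 1 do
  begin
    Objs[i] := TCItem.Create;
    Objs[i].id := i;
    Objs[i].x := i;
    Objs[i].y := i;
  end;
  WriteLn('Classes fill: ', GetTickCount64 - T0, ' ms');

  // Release the instances; the dynamic arrays are freed automatically.
  for i := 0 to N - 1 do
    Objs[i].Free;
end.
```

GetTickCount64 only has millisecond resolution, so small item counts would need a finer timer or repeated runs.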
-
Could someone with an older PC please test it? (these numbers are from an i7 at 3.40 GHz)
If you can send me a still-working 8" floppy disk I can. My hardware is still working.
-
Actually, I still have some, what is the address?
-
Test results from my ~13 year old AMD Athlon 3500+, compiled and run from the FPC IDE
■ Free Pascal IDE Version 1.0.12 [2015/11/16]
■ Compiler Version 3.0.0
■ GDB Version GDB 7.4
■ Using configuration files from: D:\speed\
Running "d:\speed\project1.exe "
FILL TEST: [10000000 items]
Records: 0,248648573253774 s
Classes: 2,0375213753895 s
Pointers: 0,9052851958167 s
QUICKSORT TEST: [10000000 items]
Records: 2,58236324632539 s
Classes: 12,3440482267842 s
Pointers: 9,90768453970034 s
FREE MEM TEST: [10000000 items]
Records: 0,155206947245678 s
Classes: 5,768513388735 s
Pointers: 4,44650447871946 s
The majority of time for the Class FILL test is obviously used up by creating the Classes, not by setting the variables. Of note is further that Class variables are automatically initialised with zero on Create, which is not the case for standard records.
Anyway, I am not sure whether anyone would use such atomic Classes in real life.
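
The initialisation difference mentioned above can be seen in a small sketch. The type names are reused from the test; the FillChar call is just one assumed way of clearing a record.

```pascal
program InitSketch;
{$mode objfpc}{$H+}

type
  TItem = record
    id, x, y: Integer;
  end;

  TCItem = class
    id, x, y: Integer;
  end;

var
  P: ^TItem;
  C: TCItem;
begin
  // A heap-allocated record starts with whatever bytes were in that memory,
  // so clear it explicitly before relying on its contents.
  New(P);
  FillChar(P^, SizeOf(TItem), 0);
  WriteLn('record after FillChar: ', P^.id, ' ', P^.x, ' ', P^.y);
  Dispose(P);

  // A class instance has all of its fields zeroed as part of Create.
  C := TCItem.Create;
  WriteLn('class after Create:    ', C.id, ' ', C.x, ' ', C.y);
  C.Free;
end.
```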
-
Thank you Nitorami, of course I will try to make them bigger with some functionality.
-
A month-old results. Classes vs records. Task: 3D maze generation (9x9x9)
Records: 22000 ms.
Classes: 250 ms. (thanks to many optimizations provided by Generic lists) + thread-safe (up to another 8x speed)
And that's why I'm against early optimization. I mean, early optimizations require early assumptions (for example: plain Pascal is faster than Object Pascal) and those may be wrong. (I would have been ;))
I must investigate about those "optimizations provided by Generic lists". :-X
How have you structured this sound generator? Can you post the code?
Of course I can. You can read it from my SourceForge SVN:
https://sourceforge.net/p/mingro/code/HEAD/tree/TRUNK/src/engine/mngsound.pas
The runtime generator is the TmngPSG CLASS. Sorry for mixing English and Spanish in both comments and naming. I was recycling some old code. Anyway public stuff is all in English.
-
I must investigate about those "optimizations provided by Generic lists". :-X
It was very algorithm-specific.
With the first algorithm I was randomly scanning the array.
The next generation used a temporary generic list of the applicable tiles, which turned out to be many orders of magnitude more efficient than just randomly trying to add tiles.
The core is here: https://github.com/eugeneloza/decoherence/blob/master/decodungeongenerator.pas#L478
-
The example with the loading of objects originally used sorted TStringlist as index. Between 100000 and 200000 that ground to a halt (D6, in 2003) (as in minute + loading times)
I then created a different container type, which I ported to generics (FPC 3.0/3.1 and D2009+) and simplified streaming and it became single digit seconds.
We actually had quite a laugh when an ISV came for official bill generation and after 4 hours the Java application crashed, and they had to restart and said there was not enough time left, and they would come back the next day.
Of course that was a generic application vs a tuned one, but my boss, ever the politician, pointed out to them that his own programmers could do it in under a minute (the actual I/O of the report generation being the bottleneck there; this was in pre-SSD times). We laughed, they didn't :-)
One of the reasons it was so bad was that our dataset was 10-20 times bigger than that of their next biggest customer. Those were typically single municipalities, while we were an administration office that little municipalities delegated the work to, which in total amounted to a quarter of the households in the country.
The lesson was that ordered insertion into a single-array structure is slow. The conventional solution is a hash, but I needed ordered lists (to compare dumps to another part of the system), so I created something myself.
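
Why ordered insertion into a single array gets slow can be sketched as follows. This illustrates the general problem only; it is not the sorted TStringList code or the custom container described above, and Items, FindSlot and InsertSorted are made-up names. Finding the slot is cheap, but every insert may have to shift all later elements.

```pascal
program SortedInsertSketch;
{$mode objfpc}{$H+}

var
  Items: array of Integer;
  Count: Integer = 0;

// Binary search for the first index whose value is >= Value.
function FindSlot(Value: Integer): Integer;
var
  L, R, M: Integer;
begin
  L := 0;
  R := Count;
  while L < R do
  begin
    M := (L + R) div 2;
    if Items[M] < Value then
      L := M + 1
    else
      R := M;
  end;
  Result := L;
end;

procedure InsertSorted(Value: Integer);
var
  Slot: Integer;
begin
  // Grow the backing array when it is full.
  if Count = Length(Items) then
    SetLength(Items, Count * 2 + 16);
  Slot := FindSlot(Value);
  // Everything from Slot onwards moves up by one: this is the O(n) part.
  if Slot < Count then
    Move(Items[Slot], Items[Slot + 1], (Count - Slot) * SizeOf(Integer));
  Items[Slot] := Value;
  Inc(Count);
end;

var
  i: Integer;
begin
  // Descending input is the worst case: every insert shifts the whole array.
  for i := 10 downto 1 do
    InsertSorted(i);
  for i := 0 to Count - 1 do
    Write(Items[i], ' ');
  WriteLn;
end.
```

With n items this costs on the order of n*n element moves in the worst case, which fits a load that grinds to a halt somewhere past 100000 entries.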
-
Between 100000 and 20000 that ground to a halt (D6, in 2003) (as in minute + loading times)
The final count down?
:'(
Note, regarding the subject, I have seen some really nice peephole optimizations by Florian the past few weeks.
At least two of these can have impact on gaming performance.