Lazarus

Free Pascal => General => Topic started by: shihonage on May 31, 2016, 10:38:43 pm

Title: Actual gaming performance of FPC?
Post by: shihonage on May 31, 2016, 10:38:43 pm
I've searched high and low on the Internet and can't find a single coherent source of information on how FPC would behave when, say, rendering 1000 sprites in a loop using SDL, compared with C++ doing the same. A very simple, direct measure of FPC+SDL performance.

Does anyone have experience with something like this?

I have two project prototypes. One is far better suited to Pascal (because it processes a lot of text); the other, written long ago in C/C++ with DirectDraw, is very heavy on sprite drawing.

I wish there was a way to find out how the latter would behave in FPC+SDL, without actually porting it first.
Title: Re: Actual gaming performance of FPC?
Post by: AlexK on June 01, 2016, 01:27:36 am
Theoretically, FreePascal is a better fit for games because it compiles quickly (separate compilation).

But everyone uses C++ because it has libraries that solved memory-management problems, in some cases a long time ago.
Many experts working at corporations have contributed their work to the C++ code base over the past decade.

If something is slow in a compiled language, the first guess is that memory is not being managed efficiently behind the scenes.
Title: Re: Actual gaming performance of FPC?
Post by: marcov on June 01, 2016, 10:36:47 am
Well, first, C++ is a language. I think you will need to pit various C++ implementations against FPC.

In general I expect FPC to be a bit slower, but not by much (say 10-20% on average). In applications where most of the heavy lifting is done by an external library (like SDL), the difference might not even be noticeable.

I don't recognize AlexK's remarks at all. To the best of my knowledge, there is nothing fundamentally different about C(++) and FPC memory management from a performance viewpoint.
Title: Re: Actual gaming performance of FPC?
Post by: serbod on June 01, 2016, 11:16:11 am
Modern games use a highly optimized engine/framework for most scene manipulation and drawing operations, but other game mechanics can be done even in scripting languages: Lua, Python, JavaScript.
Title: Re: Actual gaming performance of FPC?
Post by: User137 on June 01, 2016, 05:58:10 pm
SDL is not really going to give as much performance as OpenGL or DirectX (DirectX is not cross-platform). It's a purely 2D graphics engine for pixel-perfect graphics. Both of the above can be programmed with Lazarus/FPC, arguably even more easily than in C++.
Title: Re: Actual gaming performance of FPC?
Post by: AlexK on June 01, 2016, 10:23:08 pm
Quote from: marcov
I don't recognize AlexK's remarks at all. To the best of my knowledge, there is nothing fundamentally different about C(++) and FPC memory management from a performance viewpoint.

I mean memory-management strategies in libraries that have been optimized for years: RAII, more allocations on the stack.
Topics in C++ discussions are often "What’s the Best Buffer Growth Strategy?" or something like that.

The difference in low-level assembler performance is not significant (but I don't know yet how compilers differ on RISC CPUs like ARM).
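The buffer-growth question is a general technique rather than anything C++-specific. A minimal sketch in Free Pascal, assuming a hypothetical TIntBuffer record (not from any library): the dynamic array is grown geometrically with SetLength instead of by one element per append, which keeps appending at amortized O(1). The doubling factor and the minimum capacity of 4 are arbitrary illustrative choices.

```pascal
program BufferGrowth;

{$mode objfpc}{$H+}

type
  // Hypothetical growable buffer: Count slots are in use,
  // Length(Items) is the allocated capacity.
  TIntBuffer = record
    Items: array of Integer;
    Count: Integer;
  end;

procedure Push(var Buf: TIntBuffer; Value: Integer);
var
  NewCap: Integer;
begin
  if Buf.Count = Length(Buf.Items) then
  begin
    // Double the capacity (minimum 4) so n pushes trigger only about
    // log2(n) reallocations: amortized O(1) per push.
    NewCap := Length(Buf.Items) * 2;
    if NewCap < 4 then
      NewCap := 4;
    SetLength(Buf.Items, NewCap);
  end;
  Buf.Items[Buf.Count] := Value;
  Inc(Buf.Count);
end;

var
  Buf: TIntBuffer;
  I: Integer;
begin
  Buf.Items := nil;
  Buf.Count := 0;
  for I := 1 to 1000 do
    Push(Buf, I);
  WriteLn('Count = ', Buf.Count, ', capacity = ', Length(Buf.Items));
end.
```

Run as-is, it should print `Count = 1000, capacity = 1024`, after nine reallocations (capacities 4, 8, ..., 1024).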
Title: Re: Actual gaming performance of FPC?
Post by: ykot on June 02, 2016, 12:40:09 am
Quote from: shihonage
I've searched high and low on the Internet and can't find a single coherent source of information on how FPC would behave when, say, rendering 1000 sprites in a loop using SDL, compared with C++ doing the same. A very simple, direct measure of FPC+SDL performance.

SDL has nothing to do with performance, and IMHO, unless you are absolutely clueless about programming, you should avoid it.

Speaking of performance, I have recompiled several of our projects with FPC (they were previously built with Delphi), including the games Aztlan Dreams and Wicked Defense 2 (unfortunately, our Ixchel Studios site is down now, but I'll look into republishing them later this year). The second one, a 3D game, rendered around 2 million triangles per frame in some scenes, and that ran at a pretty stable 60 FPS (vsync on) on 2005-2006-era 3D hardware using Direct3D 9 and shader model 3 (with heavy hardware instancing). We are talking about > 120 million triangles per second here. On the other hand, I have seen *many* games compiled with C++ that can't handle more than 200,000 triangles per frame while running at an excruciating 5 FPS or less.

My point is that, performance-wise, there is nothing inherently wrong with FPC's code generation: as long as you keep your algorithms optimal, you should obtain whatever results you are aiming for. If for some reason you find yourself worrying about differences in code generation between FPC and other compilers, you are either thinking of premature optimization or just have some personal issues of insecurity or whatever; in that case, changing career might be a good option. :)

There are other reasons why C++ can be chosen as the language of preference, but mostly these are not performance-related. For instance, in my case, I use it because my colleagues and other people on the project use it, and because it has many compilers to choose from (specifically MSVC, GCC and Clang/LLVM) backed by large companies and big budgets, whereas with FPC you pretty much depend on the goodwill and good health of the remaining FPC developers, with all the associated risks.
Title: Re: Actual gaming performance of FPC?
Post by: AlexK on June 02, 2016, 01:51:52 am
Quote from: User137
SDL is not really going to give as much performance as OpenGL or DirectX (DirectX is not cross-platform). It's a purely 2D graphics engine for pixel-perfect graphics.

They are not mutually exclusive, though. SDL provides sound and events.
OpenGL works on data you pass to it, in graphics memory. Shaders are text snippets that are passed to OpenGL and compiled on the fly by OpenGL's built-in shader compiler.
I have worked with OpenGL in Python, with Kivy.
Title: Re: Actual gaming performance of FPC?
Post by: shihonage on June 02, 2016, 10:02:10 am
Thank you all for your feedback, it was very informative.

@ykot: What do you mean? Does SDL have poor 2D performance compared to other things? What would be a faster alternative for a 2D game that needs to run on Windows and Linux?

Title: Re: Actual gaming performance of FPC?
Post by: Eugene Loza on June 02, 2016, 11:13:09 am
Quote from: shihonage
I wish there was a way to find out how the latter would behave in FPC+SDL, without actually porting it first.

Well... there is no way even after porting it... :) You can't compare Linux vs Windows performance... They're too different. One does one task better, and the other does another task better.

In Castle Game Engine I get ~60 FPS with ~400-800 sprites (rendered in a non-optimized way) on a 5-10-year-old computer, and ~30 FPS on an Android tablet and a 15-year-old computer.
I think the main problem is not efficiency but convenience. C(++) has been used to make games for a very long time, and many games are actually written in C. That means you can find lots of ready-made libraries, patterns, examples, answers, tutorials and support. Many C tools have been proven by time and extensive use.
Many game engines use C++, C#, Java, Python, etc. as the scripting language, but you can hardly find any quality, actively developed FPC game engines besides GLScene and Castle Game Engine, and AFAIK neither of those provides a drag-and-drop interface for creating games like Unity (C#-based); you'll have to do everything in code.

So... I like Pascal. I write my games in Pascal. But I understand that people usually don't write games in Pascal, and there are good reasons for it :)
I do it for fun, so I don't care about obtaining the highest quality and productivity; I want fun :) But if I had to make a commercial-level game, I might have changed my approach.
Title: Re: Actual gaming performance of FPC?
Post by: shihonage on June 02, 2016, 12:00:38 pm
Well, I never really use other people's engines, so that's not the issue. It's just about performance for me... because in other areas, Pascal wins over C++ by being better at diagnostics, more intuitive, easier to read, tons of convenience functions for pretty much everything...

Title: Re: Actual gaming performance of FPC?
Post by: Eugene Loza on June 02, 2016, 12:10:06 pm
Quote from: shihonage
Well, I never really use other people's engines, so that's not the issue.

I was also thinking this way... until I found out that my life is too short, and if I want to finish anything, I'll need to speed up the process significantly :) Engines are usually more portable (e.g. in Castle Game Engine you don't have to rewrite the code to port it to Android, which was my result today :D), more optimal (people spend a lot of time developing the structure and optimizations), and less buggy (as they have been tested and bug-reported on different platforms). So... practically, not using another engine means using your own engine. It can be better and more convenient, but it will definitely take much more time for coding and reading docs.
So... yes. If you need just 1000 sprites, you can do it with OpenGL and that'll do. If you want to extend into 3D with animations, shadows, etc., it'll take too much time to do it from scratch.
Title: Re: Actual gaming performance of FPC?
Post by: shihonage on June 02, 2016, 12:29:50 pm
I don't do 3D. For that I would indeed need someone else's engine. But for my unique purposes, I need extensive amounts of 2D blitting, and this is where another engine, which is an unneeded abstraction, would start clogging up the much-needed performance.
Title: Re: Actual gaming performance of FPC?
Post by: Handoko on June 02, 2016, 12:52:43 pm
If you're not doing a commercial project, you may try to write your own graphics engine using the OpenGL API, which is fast and can handle thousands of polygons even on old PCs. OpenGL isn't too hard to learn, and information about it is easy to find on the web. And it's fun to write your own engine, which you can optimize for your needs.

I'm writing my own OpenGL visual control. It's cross-platform and can be used on Linux and Windows. Now I'm making it usable with OpenGL ES on Android. If I were doing a commercial project, I would pick a ready-made game/graphics engine. But because it's just a hobby, I enjoy the fun and the learning process.
Title: Re: Actual gaming performance of FPC?
Post by: Paul_ on May 23, 2017, 02:36:05 pm
Quote from: marcov
In general I expect FPC to be a bit slower, but not by much (say 10-20% on average). In applications where most of the heavy lifting is done by an external library (like SDL), the difference might not even be noticeable.

But a game is not just rendering. Game elements must have structure: arrays, classes, lists. Every frame, a lot of things are calculated (physics, states, AI), the game loop must go through the lists, etc.
If you lose 20% in every aspect...

FPC classes are a performance killer, from what I tested earlier.

E.g. there are 2 demos in the ZenGL library:
- the first is based on classes: nice, clear code and really low performance.
- the second uses records + an array of pointers, and it's really fast (but whether it could compete with C++, I don't know).
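To illustrate why two such demos can differ so much, here is a hypothetical sketch (it is not ZenGL's code; TEntityRec, TEntityObj, UpdateRecs and UpdateObjs are illustrative names). An array of records stores every entity inline and contiguously, while an array of class references stores pointers to instances that Create placed separately on the heap, so the update loop follows one pointer per entity and touches scattered memory, which tends to be less cache-friendly.

```pascal
program EntityLayout;

{$mode objfpc}{$H+}

type
  // Value type: entities are stored inline, one after another, in the array.
  TEntityRec = record
    X, Y, VX, VY: Single;
  end;

  // Reference type: the array holds references to separately allocated objects.
  TEntityObj = class
    X, Y, VX, VY: Single;
  end;

procedure UpdateRecs(var A: array of TEntityRec; Dt: Single);
var
  I: Integer;
begin
  for I := 0 to High(A) do
  begin
    A[I].X := A[I].X + A[I].VX * Dt;
    A[I].Y := A[I].Y + A[I].VY * Dt;
  end;
end;

procedure UpdateObjs(const A: array of TEntityObj; Dt: Single);
var
  I: Integer;
  E: TEntityObj;
begin
  for I := 0 to High(A) do
  begin
    E := A[I];  // one extra pointer hop to wherever Create placed the object
    E.X := E.X + E.VX * Dt;
    E.Y := E.Y + E.VY * Dt;
  end;
end;

const
  N = 3;

var
  Recs: array of TEntityRec;
  Objs: array of TEntityObj;
  I: Integer;
begin
  SetLength(Recs, N);
  SetLength(Objs, N);
  for I := 0 to N - 1 do
  begin
    Recs[I].X := 0;
    Recs[I].Y := 0;
    Recs[I].VX := 2;
    Recs[I].VY := 4;
    Objs[I] := TEntityObj.Create;  // Create zero-fills X and Y
    Objs[I].VX := 2;
    Objs[I].VY := 4;
  end;
  UpdateRecs(Recs, 0.5);
  UpdateObjs(Objs, 0.5);
  WriteLn(Recs[0].X:0:1, ' ', Objs[0].X:0:1);
  for I := 0 to N - 1 do
    Objs[I].Free;
end.
```

Both loops compute the same values (it prints `1.0 1.0`); any speed difference comes from memory layout and per-object allocation, not from the arithmetic.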


Quote from: serbod
Modern games use a highly optimized engine/framework for most scene manipulation and drawing operations, but other game mechanics can be done even in scripting languages: Lua, Python, JavaScript.

Not all of them; Unity is really slow. If you need speed or a lot of objects, you must go outside the Unity framework and create your own object structure, game loop, etc. Maybe the rendering is OK, but the rest has the same problem as described above.
In addition, those scripts are not the best solution; it's always simplicity vs performance.
Title: Re: Actual gaming performance of FPC?
Post by: marcov on May 23, 2017, 03:26:38 pm
Quote from: Paul_
In general I expect FPC to be a bit slower, but not by much (say 10-20% on average). In applications where most of the heavy lifting is done by an external library (like SDL), the difference might not even be noticeable.

But a game is not just rendering. Game elements must have structure: arrays, classes, lists. Every frame, a lot of things are calculated (physics, states, AI), the game loop must go through the lists, etc.
If you lose 20% in every aspect...

FPC classes are a performance killer, from what I tested earlier.

E.g. there are 2 demos in the ZenGL library:
- the first is based on classes: nice, clear code and really low performance.
- the second uses records + an array of pointers, and it's really fast (but whether it could compete with C++, I don't know).

If you start comparing dynamic memory management to static allocation, then of course it is a problem.

I, however, use classes all the time (but in speed-sensitive code I don't constantly create and destroy them), and that works fine.
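One common way to avoid constantly creating and destroying class instances in speed-sensitive code is an object pool. This is a minimal sketch with hypothetical TBullet and TBulletPool classes: released instances go onto a free stack and are handed out again by Acquire, so a steady spawn/despawn cycle performs no heap allocation after warm-up. It is single-threaded and does no double-release checking.

```pascal
program ObjectPool;

{$mode objfpc}{$H+}

type
  // Hypothetical game object that is spawned and despawned often.
  TBullet = class
    X, Y: Single;
    Alive: Boolean;
  end;

  // Minimal pool: Acquire reuses a released instance when one is available.
  TBulletPool = class
  private
    FFree: array of TBullet;   // stack of released instances
    FFreeCount: Integer;
    FAll: array of TBullet;    // every instance ever created, for cleanup
    FCreated: Integer;
  public
    function Acquire: TBullet;
    procedure Release(B: TBullet);
    destructor Destroy; override;
    property Created: Integer read FCreated;
  end;

function TBulletPool.Acquire: TBullet;
begin
  if FFreeCount > 0 then
  begin
    Dec(FFreeCount);
    Result := FFree[FFreeCount];
  end
  else
  begin
    // Only reached while the pool is warming up, so growing by one is fine.
    Result := TBullet.Create;
    Inc(FCreated);
    SetLength(FAll, FCreated);
    FAll[FCreated - 1] := Result;
  end;
  Result.Alive := True;
end;

procedure TBulletPool.Release(B: TBullet);
begin
  B.Alive := False;
  if FFreeCount = Length(FFree) then
    SetLength(FFree, FFreeCount * 2 + 4);
  FFree[FFreeCount] := B;
  Inc(FFreeCount);
end;

destructor TBulletPool.Destroy;
var
  I: Integer;
begin
  for I := 0 to FCreated - 1 do
    FAll[I].Free;
  inherited Destroy;
end;

var
  Pool: TBulletPool;
  B: TBullet;
  Frame: Integer;
begin
  Pool := TBulletPool.Create;
  try
    // Spawn and despawn one bullet per frame for 1000 frames.
    for Frame := 1 to 1000 do
    begin
      B := Pool.Acquire;
      B.X := Frame;
      Pool.Release(B);
    end;
    WriteLn('Instances created: ', Pool.Created);
  finally
    Pool.Free;
  end;
end.
```

Running it should report `Instances created: 1`, because the same object is reused in every frame.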

But you do mention something on the side that deserves to be stated unambiguously: the more advanced C++ compilers are more likely than FPC to reduce naive C++ code into something better, and (as the fpc-devel list shows) they can do great things with tight loops where all data are value types and local.

FPC lacks various kinds of CSE (common subexpression elimination) and hoisting, and that hurts there. But it is all relative, and a matter of tuning the code a bit. And since the rate-determining code is usually fairly localized, that is not that bad.
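As an example of tuning code by hand where the compiler does not hoist, this sketch (hypothetical TSprite and TCamera types) computes the same camera transform twice: once with the loop-invariant loads and multiplications written inside the loop, and once with them moved into locals before the loop. Whether FPC actually emits the reloads in the naive version depends on the compiler version and optimization level; a compiler may also be unable to prove that writing to S[I] leaves the camera's fields unchanged, which by itself blocks hoisting.

```pascal
program HoistDemo;

{$mode objfpc}{$H+}

type
  TSprite = record
    X, Y: Single;
  end;
  TSpriteArray = array of TSprite;

  // Hypothetical camera object; only its fields matter here.
  TCamera = class
    Zoom, OffsetX, OffsetY: Single;
  end;

// Naive form: Cam.Zoom, Cam.OffsetX/OffsetY and the products Zoom * Offset
// are loop-invariant, but may be reloaded and recomputed every iteration.
procedure TransformNaive(var S: TSpriteArray; Cam: TCamera);
var
  I: Integer;
begin
  for I := 0 to High(S) do
  begin
    S[I].X := S[I].X * Cam.Zoom - Cam.Zoom * Cam.OffsetX;
    S[I].Y := S[I].Y * Cam.Zoom - Cam.Zoom * Cam.OffsetY;
  end;
end;

// Hand-hoisted form: invariants are computed once, into locals.
procedure TransformHoisted(var S: TSpriteArray; Cam: TCamera);
var
  I: Integer;
  Z, DX, DY: Single;
begin
  Z := Cam.Zoom;
  DX := Z * Cam.OffsetX;
  DY := Z * Cam.OffsetY;
  for I := 0 to High(S) do
  begin
    S[I].X := S[I].X * Z - DX;
    S[I].Y := S[I].Y * Z - DY;
  end;
end;

var
  A, B: TSpriteArray;
  Cam: TCamera;
  I: Integer;
begin
  Cam := TCamera.Create;
  try
    Cam.Zoom := 2;
    Cam.OffsetX := 1;
    Cam.OffsetY := 1;
    SetLength(A, 4);
    SetLength(B, 4);
    for I := 0 to 3 do
    begin
      A[I].X := 3;
      A[I].Y := I;
      B[I] := A[I];
    end;
    TransformNaive(A, Cam);
    TransformHoisted(B, Cam);
    WriteLn(A[0].X:0:1, ' ', B[0].X:0:1);
  finally
    Cam.Free;
  end;
end.
```

Both versions produce identical results (it prints `4.0 4.0`); only the work done per iteration differs.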

Title: Re: Actual gaming performance of FPC?
Post by: Ñuño_Martínez on May 24, 2017, 11:04:07 am
I'm working on my very own old-school 2D game engine, so I'm dealing with a lot of what you're talking about here.

Right now I don't care much about performance; I'm more concerned about engine structure, fighting against the early-optimization illness every time I code anything. So I'm using a lot of classes (except one OBJECT that I actually don't know why I keep), and performance sucks. It works correctly on my not-so-modern 4-core desktop, but it is too slow on my older 2-core notebook (for example, the runtime sound generator kills it).

I think I can boost performance a lot by using plain Pascal instead of Object Pascal, but I won't even discuss it until version 1.0 is up and running with an actual commercial-quality game finished. That means the optimized version would be version 2.0, with a radical API redesign that forces me to rewrite that first game from scratch. Or maybe rewriting a few methods is enough, and version 1.1 or 1.2 with minor API changes will do.

What I want to say is that the bad performance of my engine isn't due to the language but to the way I'm using it. Looking at Castle Engine, it uses a lot of CLASSes and has much better performance than mine, which confirms it.
Title: Re: Actual gaming performance of FPC?
Post by: Eugene Loza on May 24, 2017, 01:31:59 pm
A month-old result. Classes vs records. Task: 3D maze generation (9x9x9)
Records: 22000 ms.
Classes: 250 ms (thanks to many optimizations provided by generic lists), and thread-safe (for up to another 8x speedup).
Title: Re: Actual gaming performance of FPC?
Post by: marcov on May 24, 2017, 03:27:24 pm
Naive value-type code can be slow due to excessive copies. In the old days this was the case with the ShortString string type.
Title: Re: Actual gaming performance of FPC?
Post by: Thaddy on May 24, 2017, 04:13:12 pm
Quote from: marcov
Naive value-type code can be slow due to excessive copies. In the old days this was the case with the ShortString string type.

Uuhhhmmmm, Marco,
The Pascal shortstring type was by design more efficient than C-type "strings" (for the average reader: early C did not know about strings as such, just an array of bytes; early Pascal did).
That's because its length is known to the compiler beforehand. And the length is stored. And we have strings that can contain zeros.
And that is simply both a faster and a better design.
Unless you forgot... O:-)
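A small Free Pascal demonstration of the layout described above: a ShortString keeps its length in byte 0, so Length is a constant-time read and embedded #0 characters are allowed, while StrLen (from SysUtils) has to scan a C-style PChar for its terminator. The program name and test strings are illustrative only.

```pascal
program ShortStringLayout;

{$mode objfpc}

uses
  SysUtils; // StrLen

var
  S: ShortString;
  P: PChar;
begin
  S := 'Hello';
  // The length lives in S[0], so Length(S) is a single byte read.
  WriteLn(Ord(S[0]), ' ', Length(S));              // 5 5

  // A C-style string carries no length; StrLen scans for the #0 terminator.
  P := 'Hello';
  WriteLn(StrLen(P));                              // 5

  // A ShortString may contain #0, because its length is stored separately;
  // a C-style scan of the same bytes stops at the first #0.
  S := 'AB'#0'CD';
  WriteLn(Length(S), ' ', StrLen(PChar(@S[1])));   // 5 2
end.
```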
Title: Re: Actual gaming performance of FPC?
Post by: marcov on May 24, 2017, 04:30:22 pm
Quote from: Thaddy
Naive value-type code can be slow due to excessive copies. In the old days this was the case with the ShortString string type.
Uuhhhmmmm, Marco,
The Pascal shortstring type was by design more efficient than C-type "strings" (for the average reader: early C did not know about strings as such, just an array of bytes; early Pascal did).

This depends on the workload and how much manual optimization was applied. I know this matter really well from my Modula-2 days (which had an even worse concept: null-terminated unless the string fits exactly in the static allocation).

Pascal shortstrings were much easier to write, and performance was not too shabby if you didn't forget to add const to all string parameters. The 255-character limit was a bigger problem than the performance.

Quote from: Thaddy
That's because its length is known to the compiler beforehand. And the length is stored. And we have strings that can contain zeros.

But C strings are passed by reference by default, and my point was that shortstring was not. People not realizing this even made it a VFAQ (very frequently asked question).

Title: Re: Actual gaming performance of FPC?
Post by: Thaddy on May 24, 2017, 04:33:53 pm
Hm. I didn't know UCSD Pascal had a const modifier for strings, or any const modifier at all. I will investigate :D Oh, and Turbo Pascal didn't have one either.
Since we are both Pascal historians in some way, please enlighten me if I am wrong.

And I fully agree with the last sentence you wrote.
Title: Re: Actual gaming performance of FPC?
Post by: marcov on May 24, 2017, 04:45:26 pm
Quote from: Thaddy
Hm. I didn't know UCSD Pascal had a const modifier for strings, or any const modifier at all. I will investigate :D Oh, and Turbo Pascal didn't have one either.

If you had

  procedure x(s: string);

instead of

  procedure x(const s: string);

in TP, then in the first case a copy of the string would be created on the stack and passed as s. In the second case it would be passed by reference.
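The same distinction in current Free Pascal, as a minimal sketch with hypothetical procedure names: a value ShortString parameter gives the callee its own copy, while a const parameter is read-only and lets the compiler skip the copy. Passing large const parameters such as a ShortString by reference is what FPC typically does, but the language leaves that choice to the compiler.

```pascal
program ParamPassing;

{$mode objfpc}

// Value parameter: the callee works on its own copy of the ShortString,
// so the string data is copied on every call.
procedure ByValue(S: ShortString);
begin
  S := 'changed';   // changes the local copy only
end;

// Const parameter: read-only, so the compiler may pass a reference
// instead of making a copy.
function ByConst(const S: ShortString): Integer;
begin
  // S := 'changed';  // would not compile: const parameters are read-only
  Result := Length(S);
end;

var
  Name: ShortString;
begin
  Name := 'original';
  ByValue(Name);
  WriteLn(Name);                  // original: the caller's string is untouched
  WriteLn(ByConst(Name));         // 8
  WriteLn(SizeOf(ShortString));   // 256: the storage a by-value copy needs
end.
```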
Title: Re: Actual gaming performance of FPC?
Post by: Thaddy on May 24, 2017, 05:09:43 pm
Marco, TP version plz.
TP 1.0 doesn't know it... const... (get it from the Embarcadero museum)?

If I am wrong I will send you a nice bottle of proper wine.
Title: Re: Actual gaming performance of FPC?
Post by: Paul_ on May 24, 2017, 05:22:45 pm
Quote from: marcov
If you start comparing dynamic memory management to static allocation, then of course it is a problem.

I, however, use classes all the time (but in speed-sensitive code I don't constantly create and destroy them), and that works fine.

I can test it more, also FPC vs basic code in C++. I'm not a great programmer, so there may be inaccuracies; anyway, we will see.

Quote from: Ñuño_Martínez
What I want to say is that the bad performance of my engine isn't due to the language but to the way I'm using it. Looking at Castle Engine, it uses a lot of CLASSes and has much better performance than mine, which confirms it.

Of course, this rests on the shoulders of every programmer :) I'm not sure about Castle Engine; it has no stress tests like the "08 - Sprite Engine" and "09 - Sprite Engine (Classes)" demos in ZenGL. Links are at the end of the post.

How have you structured this sound generator? Can you post the code?

Quote from: Eugene Loza
A month-old result. Classes vs records. Task: 3D maze generation (9x9x9)
Records: 22000 ms.
Classes: 250 ms.

But you don't handle 100 000 dynamic objects, add/remove them "on the fly", or sort them, etc., right? That's the difference.


https://github.com/goldsmile/zengl/blob/master/src/zgl_sengine_2d.pas
https://github.com/goldsmile/zengl/blob/master/extra/zglSpriteEngine.pas (same functionality with Classes)
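For adding and removing many objects on the fly without per-object allocation, one standard approach is a dense array of records with swap-and-pop removal. This is a hypothetical sketch (TItemList, Add and RemoveAt are illustrative names, not from ZenGL): removal moves the last item into the vacated slot, which is O(1) but does not preserve order, so it suits lists that are sorted separately or where order does not matter.

```pascal
program SwapRemove;

{$mode objfpc}{$H+}

type
  TItem = record
    Id, X, Y: Integer;
  end;

  // Dense list: live items occupy Items[0..Count-1].
  TItemList = record
    Items: array of TItem;
    Count: Integer;
  end;

procedure Add(var L: TItemList; const Item: TItem);
begin
  if L.Count = Length(L.Items) then
    SetLength(L.Items, L.Count * 2 + 8);   // geometric growth
  L.Items[L.Count] := Item;
  Inc(L.Count);
end;

// O(1) removal: overwrite the hole with the last item instead of shifting
// everything after it. The order of items is not preserved.
procedure RemoveAt(var L: TItemList; Index: Integer);
begin
  Dec(L.Count);
  if Index <> L.Count then
    L.Items[Index] := L.Items[L.Count];
end;

var
  L: TItemList;
  It: TItem;
  I: Integer;
begin
  L.Items := nil;
  L.Count := 0;
  for I := 1 to 5 do
  begin
    It.Id := I;
    It.X := 0;
    It.Y := 0;
    Add(L, It);
  end;
  // Remove every item with an even Id while iterating backwards.
  for I := L.Count - 1 downto 0 do
    if L.Items[I].Id mod 2 = 0 then
      RemoveAt(L, I);
  for I := 0 to L.Count - 1 do
    Write(L.Items[I].Id, ' ');
  WriteLn;
end.
```

Iterating backwards is what makes removal during the loop safe: the item swapped into slot I comes from a higher index that has already been visited. It prints `1 5 3`.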
Title: Re: Actual gaming performance of FPC?
Post by: Eugene Loza on May 24, 2017, 07:44:51 pm
Quote from: Paul_
But you don't handle 100 000 dynamic objects, add/remove them "on the fly", or sort them, etc., right? That's the difference.

Of course I do. Both add/remove on the fly and sorting.
But I do it in an optimized way. And, of course, not 100 000, but as many as I need :)
I could do the same optimizations with dynamic arrays, or face dire memory consumption with static arrays. And yes, I am almost sure the code would run faster with just one-dimensional static arrays. But the performance is perfectly fine for me.
I even doubt the usefulness of the thread safety I spent so much time ensuring. If the map is generated in fractions of a second, why should I care about speeding it up with threads?
The efficiency bottleneck for me is HDD reading. I bet something could be optimized there too, but I don't have any staff, so I prefer to do something more fun :)
Title: Re: Actual gaming performance of FPC?
Post by: marcov on May 24, 2017, 07:47:34 pm
I have an application that loads 400000 class instances and sorts them into various indexes on startup in under a second. It would be 150-250 ms, though, if I didn't handle string encodings.

Operations after loading are sub ms.
Title: Re: Actual gaming performance of FPC?
Post by: Paul_ on May 24, 2017, 08:26:22 pm
Ok guys, let's see the numbers from very simple FPC tests :)

Code: Pascal
Type
  PItem     = ^TItem;
  TItem = record
    id         : integer;
    x, y       : integer;
  end;

  // Records
  TManager = record
    Count      : integer;
    List       : array of TItem;
  end;

  // Classes
  TCItem = Class
    id         : integer;
    x, y       : integer;
  end;

  TCManager = Class
    Count      : integer;
    List       : array of TCItem;
  end;

  // Pointers to record
  PPManager = ^TPManager;
  TPManager = record
    Count      : integer;
    List       : array of PItem;
  end;

FILL TEST: [1000 items]
Records:  0,0000256353031299197 s
Classes:  0,000318480942414061 s
Pointers: 0,000087160030641727 s

QUICKSORT TEST: [1000 items]
Records:  0,000342608286536339 s
Classes:  0,000292845639284142 s
Pointers: 0,000272940580383263 s

FREE MEM TEST: [1000 items]
Records:  0,0000229209769161635 s
Classes:  0,000117922394397631 s
Pointers: 0,0000799218274050438 s

------------------------------------------------------
FILL TEST: [100000 items]
Records:  0,000991690832870043 s
Classes:  0,0151803825522516 s
Pointers: 0,00582843748591348 s

QUICKSORT TEST: [100000 items]
Records:  0,010936546812315 s
Classes:  0,0120403287605367 s
Pointers: 0,00923714933962406 s

FREE MEM TEST: [100000 items]
Records:  0,000565564287109522 s
Classes:  0,00705573001968356 s
Pointers: 0,00362237615133803 s

------------------------------------------------------
FILL TEST: [10000000 items]
Records:  0,0609045929150976 s
Classes:  0,671325713148588 s
Pointers: 0,2505645791059 s

QUICKSORT TEST: [10000000 items]
Records:  1,44320108061496 s
Classes:  3,1377283567959 s
Pointers: 2,63237050861869 s

FREE MEM TEST: [10000000 items]
Records:  0,0520024844495768 s
Classes:  1,54121744059482 s
Pointers: 1,47699562507726 s

So, there are some differences :) Later I will add something that simulates a game.
I expected better performance from "array of records" vs "array of pointers". Maybe there is a mistake somewhere.

Could someone with an older PC please test it? (These numbers are from an i7 at 3.40 GHz.)
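The harness behind these numbers is not shown in the thread, so the following is only a hypothetical sketch of how the FILL part might be timed with the posted types, using GetTickCount64 from SysUtils (millisecond resolution; N is set to 1,000,000 here to keep memory use modest). It makes the likely source of the gap visible: the record array is a single allocation, while the class and pointer variants perform one heap allocation per item, and Create additionally zero-fills each instance.

```pascal
program FillBench;

{$mode objfpc}{$H+}

uses
  SysUtils; // GetTickCount64

type
  PItem = ^TItem;
  TItem = record
    id: Integer;
    x, y: Integer;
  end;

  TCItem = class
    id: Integer;
    x, y: Integer;
  end;

const
  N = 1000000;

var
  Recs: array of TItem;
  Objs: array of TCItem;
  Ptrs: array of PItem;
  I: Integer;
  T0: QWord;
begin
  // Records: one allocation for the whole array, items stored inline.
  T0 := GetTickCount64;
  SetLength(Recs, N);
  for I := 0 to N - 1 do
  begin
    Recs[I].id := I; Recs[I].x := I; Recs[I].y := I;
  end;
  WriteLn('Records:  ', GetTickCount64 - T0, ' ms');

  // Classes: one heap allocation plus zero-fill per item.
  T0 := GetTickCount64;
  SetLength(Objs, N);
  for I := 0 to N - 1 do
  begin
    Objs[I] := TCItem.Create;
    Objs[I].id := I; Objs[I].x := I; Objs[I].y := I;
  end;
  WriteLn('Classes:  ', GetTickCount64 - T0, ' ms');

  // Pointers to records: one New per item, no constructor call.
  T0 := GetTickCount64;
  SetLength(Ptrs, N);
  for I := 0 to N - 1 do
  begin
    New(Ptrs[I]);
    Ptrs[I]^.id := I; Ptrs[I]^.x := I; Ptrs[I]^.y := I;
  end;
  WriteLn('Pointers: ', GetTickCount64 - T0, ' ms');

  for I := 0 to N - 1 do
  begin
    Objs[I].Free;
    Dispose(Ptrs[I]);
  end;
end.
```

Timings will vary with the machine and the heap manager; no particular numbers are implied by this sketch.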
Title: Re: Actual gaming performance of FPC?
Post by: Thaddy on May 24, 2017, 08:39:05 pm
Quote from: Paul_
Could someone with an older PC please test it? (These numbers are from an i7 at 3.40 GHz.)

If you can send me a still-working 8" floppy disk, I can. My hardware is still working.
Title: Re: Actual gaming performance of FPC?
Post by: Paul_ on May 24, 2017, 08:48:30 pm
Actually, I still have some; what is the address?
Title: Re: Actual gaming performance of FPC?
Post by: Nitorami on May 24, 2017, 10:01:02 pm
Test results from my ~13-year-old AMD Athlon 3500+, compiled and run from the FPC IDE:

■ Free Pascal IDE Version 1.0.12 [2015/11/16]
■ Compiler Version 3.0.0
■ GDB Version GDB 7.4
■ Using configuration files from: D:\speed\
Running "d:\speed\project1.exe "
FILL TEST: [10000000 items]
Records:  0,248648573253774 s
Classes:  2,0375213753895 s
Pointers: 0,9052851958167 s

QUICKSORT TEST: [10000000 items]
Records:  2,58236324632539 s
Classes:  12,3440482267842 s
Pointers: 9,90768453970034 s

FREE MEM TEST: [10000000 items]
Records:  0,155206947245678 s
Classes:  5,768513388735 s
Pointers: 4,44650447871946 s

Most of the time in the Class FILL test is obviously spent creating the class instances, not setting the variables. Note also that class fields are automatically initialised to zero on Create, which is not the case for standard records.
Anyway, I am not sure whether anyone would use such atomic Classes in real life.
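A minimal sketch of that initialisation difference, with hypothetical TRec and TObj types: class instances are zero-filled during Create, before the constructor body runs, while a local record or function result has undefined contents until assigned. Here the record is cleared with the Default() intrinsic, which is available since FPC 3.0.

```pascal
program ZeroInit;

{$mode objfpc}{$H+}

type
  TRec = record
    A, B: Integer;
  end;

  TObj = class
    A, B: Integer;
  end;

function MakeRec: TRec;
begin
  // A function result (like any local record) starts with undefined
  // contents; Default() returns a zero-initialized value.
  Result := Default(TRec);
end;

var
  R: TRec;
  O: TObj;
begin
  R := MakeRec;
  // Create allocates the instance and zero-fills all fields before the
  // constructor body runs; that fill is part of the cost of Create.
  O := TObj.Create;
  try
    WriteLn(R.A, ' ', R.B, ' ', O.A, ' ', O.B);   // 0 0 0 0
  finally
    O.Free;
  end;
end.
```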
Title: Re: Actual gaming performance of FPC?
Post by: Paul_ on May 24, 2017, 10:23:01 pm
Thank you Nitorami, of course I will try to make them bigger with some functionality.
Title: Re: Actual gaming performance of FPC?
Post by: Ñuño_Martínez on May 25, 2017, 09:58:27 am
Quote from: Eugene Loza
A month-old result. Classes vs records. Task: 3D maze generation (9x9x9)
Records: 22000 ms.
Classes: 250 ms (thanks to many optimizations provided by generic lists), and thread-safe (for up to another 8x speedup).

And that's why I'm against early optimization. I mean, early optimizations require early assumptions (for example: plain Pascal is faster than Object Pascal), and that would be wrong. (I would be ;))

I must look into those "optimizations provided by Generic lists". :-X

Quote from: Paul_
How have you structured this sound generator? Can you post the code?

Of course I can. You can read it in my SourceForge SVN:
https://sourceforge.net/p/mingro/code/HEAD/tree/TRUNK/src/engine/mngsound.pas

The runtime generator is the TmngPSG CLASS. Sorry for mixing English and Spanish in both comments and naming; I was recycling some old code. Anyway, the public stuff is all in English.
Title: Re: Actual gaming performance of FPC?
Post by: Eugene Loza on May 25, 2017, 12:44:18 pm
Quote from: Ñuño_Martínez
I must look into those "optimizations provided by Generic lists". :-X

It was very algorithm-specific.
With the first algorithm I was randomly scanning the array.
The next version used a temporary generic list of the applicable tiles, which turned out to be orders of magnitude more efficient than just randomly trying to add tiles.
The core is here: https://github.com/eugeneloza/decoherence/blob/master/decodungeongenerator.pas#L478
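The change from random probing to a list of applicable candidates can be sketched with the fgl unit's TFPGList. Everything here is hypothetical (the Occupied map, IsApplicable, PickApplicable) and is not the decoherence code linked above. A single scan costs exactly MapSize applicability tests regardless of luck, while random probing needs about MapSize/k tries on average when k positions qualify and never terminates when none do.

```pascal
program CandidateList;

{$mode objfpc}{$H+}

uses
  fgl;

type
  TIntList = specialize TFPGList<Integer>;

const
  MapSize = 1000;

var
  // Hypothetical map state: True means the cell can no longer take a tile.
  Occupied: array[0..MapSize - 1] of Boolean;

// Hypothetical applicability test; a real dungeon generator would check
// whether a tile actually fits at position P.
function IsApplicable(P: Integer): Boolean;
begin
  Result := not Occupied[P];
end;

// Scan once, collect every applicable position, then choose uniformly
// among them; returns -1 when nothing is applicable.
function PickApplicable: Integer;
var
  Candidates: TIntList;
  P: Integer;
begin
  Candidates := TIntList.Create;
  try
    for P := 0 to MapSize - 1 do
      if IsApplicable(P) then
        Candidates.Add(P);
    if Candidates.Count = 0 then
      Result := -1
    else
      Result := Candidates[Random(Candidates.Count)];
  finally
    Candidates.Free;
  end;
end;

var
  I: Integer;
begin
  Randomize;
  for I := 0 to MapSize - 1 do
    Occupied[I] := True;
  Occupied[17] := False;
  Occupied[42] := False;
  WriteLn(PickApplicable);        // 17 or 42
  Occupied[17] := True;
  Occupied[42] := True;
  WriteLn(PickApplicable);        // -1
end.
```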
Title: Re: Actual gaming performance of FPC?
Post by: marcov on May 25, 2017, 01:24:30 pm
The example with the loading of objects originally used a sorted TStringList as the index. Between 100000 and 200000 items, that ground to a halt (D6, in 2003) (as in minute-plus loading times).

I then created a different container type, which I later ported to generics (FPC 3.0/3.1 and D2009+), and simplified the streaming, and loading became a matter of single-digit seconds.

We actually had quite a laugh when an ISV came for official bill generation, and after 4 hours their Java application crashed; they had to restart, said there was not enough time left, and said they would come back the next day.

Of course, that was a generic application vs a tuned one, but my boss, ever the politician, pointed out to them that his own programmers could do it in under a minute (the actual I/O of the report generation being the bottleneck there; this was pre-SSD times). We laughed, they didn't :-)

One of the reasons it was so bad was that our dataset was 10-20 times bigger than that of the next-biggest customer they had, which were typically single municipalities, while we were an administration office that small municipalities delegated the work to, which in total amounted to a quarter of the households in the country.

The lesson was that ordered insertion into a single-array structure is slow. The conventional solution is a hash, but I needed ordered lists (to compare dumps with another part of the system), so I created something myself.
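Why ordered insertion into one array gets slow can be seen in a hypothetical sketch (LowerBound and InsertSorted are illustrative names; this is not marcov's container, which is not shown): binary search finds the slot in O(log n), but opening the gap shifts every later element, so each insertion is O(n) and loading n items this way is O(n^2).

```pascal
program SortedInsert;

{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

// Binary search: first index in A[0..Count-1] whose value is >= Key.
function LowerBound(const A: TIntArray; Count, Key: Integer): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := Count;
  while Lo < Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if A[Mid] < Key then
      Lo := Mid + 1
    else
      Hi := Mid;
  end;
  Result := Lo;
end;

// Finding the slot is O(log n), but opening a gap shifts every later
// element one place to the right: O(n) per insert.
procedure InsertSorted(var A: TIntArray; var Count: Integer; Key: Integer);
var
  Pos, I: Integer;
begin
  if Count = Length(A) then
    SetLength(A, Count * 2 + 8);
  Pos := LowerBound(A, Count, Key);
  for I := Count downto Pos + 1 do
    A[I] := A[I - 1];
  A[Pos] := Key;
  Inc(Count);
end;

const
  Keys: array[0..4] of Integer = (5, 1, 4, 2, 3);

var
  A: TIntArray;
  Count, I: Integer;
begin
  A := nil;
  Count := 0;
  for I := 0 to High(Keys) do
    InsertSorted(A, Count, Keys[I]);
  for I := 0 to Count - 1 do
    Write(A[I], ' ');
  WriteLn;   // 1 2 3 4 5
end.
```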
Title: Re: Actual gaming performance of FPC?
Post by: Thaddy on May 25, 2017, 02:23:06 pm
Quote from: marcov
Between 100000 and 20000 that ground to a halt (D6, in 2003) (as in minute + loading times)

The final countdown?
 :'(
Note, regarding the subject: I have seen some really nice peephole optimizations by Florian in the past few weeks.
At least two of these can have an impact on gaming performance.