
Author Topic: How optimized is the FPC compiler  (Read 40127 times)

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #105 on: December 26, 2020, 11:06:45 am »
Quote
Believe what you want. I know the compiler's code, you don't.
Well, this is true, but as I have stated multiple times, there is nothing wrong with considering additions to the compiler that make sense; obviously you have to check for yourself how hard they would be to add, but they are far from being an "inappropriate/illogical" addition.
That aside for a moment: the more important feature is management operators. They are more of a must-have for completeness than type helpers, since they offer far more opportunities.
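
For readers who haven't seen them: management operators are class operators on advanced records that the compiler calls automatically at the record's lifetime boundaries. A minimal sketch, assuming FPC 3.2+ with {$modeswitch advancedrecords} (the TManagedBuffer type here is my own illustration, not from this thread):
Code: Pascal  [Select][+][-]
{$mode objfpc}{$modeswitch advancedrecords}
program ManagedDemo;

type
  TManagedBuffer = record
    Data: PByte;
    Size: SizeInt;
    class operator Initialize(var r: TManagedBuffer);
    class operator Finalize(var r: TManagedBuffer);
  end;

class operator TManagedBuffer.Initialize(var r: TManagedBuffer);
begin
  // runs automatically when a TManagedBuffer comes into scope
  r.Data := nil;
  r.Size := 0;
end;

class operator TManagedBuffer.Finalize(var r: TManagedBuffer);
begin
  // runs automatically when it goes out of scope
  if r.Data <> nil then
    FreeMem(r.Data);
end;

var
  buf: TManagedBuffer; // Initialize is inserted here by the compiler
begin
  buf.Size := 16;
  buf.Data := GetMem(buf.Size);
end. // Finalize is inserted here by the compiler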

beepee

  • Newbie
  • Posts: 6
Re: How optimized is the FPC compiler
« Reply #106 on: December 26, 2020, 04:40:06 pm »
Hi,
 I generally notice that the executables compiled by Free Pascal are over twice as big and run at about half the speed compared with the same code compiled by Delphi 7.
The exception is my graphics programs; there the Free Pascal executable is about the same speed and has fewer bugs than Delphi 7
(my Delphi 7 is from ~2005, so it is quite old, and 64-bit things (e.g. Seek()) are not well supported in this version).

Edit:
I apologize; using FPC 3.2.0 I see that Free Pascal is slightly faster, but the code size is still bigger. And to @Shpend below: yes, all optimizations I can find are used, no debug info.

To @Handoko below: I will take a look at Build Modes, but I am used to compiling from the command line with fpc.exe, which uses an fpc.cfg where debug info is off and optimize and strip are already set. And OK, somewhat bigger code is not a big problem nowadays.
« Last Edit: December 26, 2020, 06:24:37 pm by beepee »

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #107 on: December 26, 2020, 05:35:38 pm »
@beepee

Did you check whether you were compiling with debug info? And was the target platform the same?

Handoko

  • Hero Member
  • *****
  • Posts: 5122
  • My goal: build my own game engine using Lazarus
Re: How optimized is the FPC compiler
« Reply #108 on: December 26, 2020, 05:42:19 pm »
Lazarus / Free Pascal generates bigger binaries because it compiles for multiple platforms, so some extra code needs to be added. It is also not fair to compare with Delphi 7, because FPC supports newer features; I'm not very sure, maybe Unicode, etc.

And because the compiler supports several optimization techniques, the size difference won't be noticeable on large projects.

But maybe you didn't know: the default configuration adds debugger info to the generated binary. If you use Lazarus you can disable it via:
Lazarus main menu > Project > Project Options > on the left side > Debugging > disable Generate Debugging Info and enable Strip Symbols

There are some other things you can do to make sure you get the smallest binary; you can search the forum to learn more.
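
For command-line builds, the equivalent switches would be something like the sketch below (flag availability varies by FPC version and target, so check fpc -h; the program name is just a placeholder):
Code: Bash  [Select][+][-]
# -O3: full optimization, -CX/-XX: smart linking, -Xs: strip symbols
fpc -O3 -CX -XX -Xs myprogram.pas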

To make it easier to enable/disable the configuration settings, you should enable and use Build Modes. You can check the documentation if you're interested.
« Last Edit: December 26, 2020, 05:54:06 pm by Handoko »

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: How optimized is the FPC compiler
« Reply #109 on: December 26, 2020, 06:04:07 pm »
Quote
the thing is, it could really complete the niche of old-style objects if it had:
I would not say this, because C++ has a lot of differences that are not necessarily in the memory model itself, but that make C++ much more convenient for high-performance programming.

For example, C++ has references, which are semantically like a one-time pointer (i.e. a pointer that, once set, cannot be changed and is not allowed to be null/nil) but syntactically behave like direct access to the object:
Code: C++  [Select][+][-]
int i_local;
int &i_ref = i_local;
i_ref = 42;
std::cout << i_local << std::endl;
And there are a lot of shenanigans you can do with templates, for example variadic template arguments. This allows for some very neat stuff, like constructing objects in place inside other data structures.

Let me give you an example. Let's say you want to construct a large graph. In this situation you allocate a lot of memory in a short amount of time (during construction of the graph) and deallocate all of it at once (when tearing the whole graph down).
Using the heap directly might be too slow here, as heap allocation and freeing of every node takes time; you want to allocate the memory in bulk and free it all at once. Also, in a classical approach you need to traverse the tree to free the nodes, which is terrible for cache locality.
In this case you want to use a stack allocator. This is a data structure that reserves a large amount of memory and allocates objects on it like a stack. You can't free individual objects during the runtime, but once the stack is torn down, all objects are destroyed simultaneously.
This is how you would build that in C++:
Code: C++  [Select][+][-]
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <vector>

struct TreeNode {
  virtual int child_count() = 0; // abstract method
  virtual TreeNode &get_child(int idx) = 0; // abstract method
  virtual int get_value() = 0; // abstract method
};
struct TreeBranch : public TreeNode {
  TreeNode &left;
  TreeNode &right;
  int child_count() override { return 2; }
  TreeNode &get_child(int idx) override { return idx ? left : right; }
  int get_value() override { return left.get_value() + right.get_value(); }
  TreeBranch(TreeNode &_left, TreeNode &_right): left(_left), right(_right) { } // constructor
};
struct TreeLeaf : public TreeNode {
  int value;
  int child_count() override { return 0; }
  TreeNode &get_child(int idx) override { assert(false); std::abort(); }
  int get_value() override { return value; }
  TreeLeaf(int _value): value(_value) { }
};
...
std::vector<TreeBranch> branch_memory;
std::vector<TreeLeaf> leaf_memory;
branch_memory.reserve(1024*1024); // reserve the capacity up front so that
leaf_memory.reserve(1024*1024);   // emplace_back never reallocates

TreeNode &leaf1 = leaf_memory.emplace_back(1);
TreeNode &leaf2 = leaf_memory.emplace_back(2);
TreeNode &leaf3 = leaf_memory.emplace_back(3);
TreeNode &branch1 = branch_memory.emplace_back(leaf1, leaf2);
TreeNode &root = branch_memory.emplace_back(branch1, leaf3);
std::cout << root.get_value();

In this code not a single copy (or move) operation takes place; everything is passed by reference. std::vector<T>::emplace_back constructs the new element directly where it will be stored, by passing the arguments given to the function 1:1 to the type's constructor.

In Pascal the allocator would only need to allocate the memory and return a pointer, so that the calling code can call the constructor manually:
Code: Pascal  [Select][+][-]
// let's assume similar definitions of the types
var
  branch_memory: specialize TVector<TTreeBranch>; // let's assume that a type like this exists
  leaf_memory: specialize TVector<TTreeLeaf>;
var
  leaf1, leaf2, leaf3, branch1, root: PTreeNode;
begin
  leaf1 := leaf_memory.emplace_back;
  leaf1^.init(1);
  leaf2 := leaf_memory.emplace_back;
  leaf2^.init(2);
  leaf3 := leaf_memory.emplace_back;
  leaf3^.init(3);
  branch1 := branch_memory.emplace_back;
  branch1^.init(leaf1, leaf2);
  root := branch_memory.emplace_back;
  root^.init(branch1, leaf3); // leaf3, matching the C++ version
  WriteLn(root^.get_value);
end;
You can achieve the same behaviour, but it is more code and less readable. So again I am back at what I said in my very first post in this thread: you can do the same in Pascal as in C++, but in C++ you just write less (and cleaner) code to get the same efficiency. And this is purely due to the language design of C++.

That said, this is still a very niche thing. The example above is a reduced version of a problem I actually had to face, where the overhead of heap allocation was just too slow for my purposes (the graph that was created required multiple gigabytes of memory), so I needed to build a stack allocator.
C++ makes this very easy because, as you can see, std::vector already provides the required functionality. But a lot of programs, especially those for which Pascal is prevalently used (like most GUI programs), do not find themselves in such situations very often.

Pascal is not C++ and does not even try to be like C++. Different languages have different strengths. Honestly, C++ is much more complex than Pascal, and while it is great for things like the above, I would never use C++ just to build a small GUI application, due to its complexity. C++ is also much harder to learn than Pascal for exactly this reason.
Different languages are like different tools. I don't need Pascal to be like C++, the same way I don't need to add a hammer head to a screwdriver. If you try to do everything, you end up being bad, or at most mediocre, at everything.

To summarize, Pascal and C++ have different goals by design. I would argue that's a good thing. I like management operators not because they allow for efficient programming, but because they clean up the code. C++ has the intended goal of being as efficient as possible; all language features are designed with this in mind, even if that means making the language more complicated. If Pascal became a C++-lite, there would literally be no reason for me (and probably for most people) to use it instead of C++.
Even though some features would be nice, ultimately we are talking about a niche where Pascal is not the language of choice to begin with. Better to focus on the things Pascal is already good at, instead of trying to improve something Pascal isn't so useful for to begin with.
« Last Edit: December 26, 2020, 06:08:56 pm by Warfley »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: How optimized is the FPC compiler
« Reply #110 on: December 26, 2020, 09:33:54 pm »
Quote
I generally notice the executables compiled by Free Pascal are over twice as big and run about half the speed compared with the same code compiled by Delphi7.

Just like more modern Delphi. D7 is minimalist and has no deep support for Unicode, or for anything else added after 2003 or so. If you need to compare to a Delphi, use a recent one, not something ancient.

Also, binary size minimization is not a core target at the moment; nobody wants to do complex work on it, like improving smart linking (except PascalDragon, occasionally).
 

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #111 on: September 21, 2021, 11:34:54 pm »
Hey, I found this; can someone explain more about it?

https://gitlab.com/freepascal.org/fpc/source/-/issues/35825

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: How optimized is the FPC compiler
« Reply #112 on: September 23, 2021, 06:42:50 pm »
Well, I have read all the posts and I have some contradictory feelings.

First of all, the basis of my argumentation:

- FPC and C, for me, have similar speeds. We can argue whether move/copy, using pointers, etc., can be more convenient. But the tools are there in both Pascal and C. If I really go for speed, I would look (as others said) at how many memory allocations/copies/moves/pointer passes, and how many data or system calls, I am doing. And really needing speed requires smart thinking. I do not see many differences between Pascal and C in their *basic* mode.
As a very quick example: convert any N-tree algorithm into an array algorithm and you will get the same speed.

From this point of view, things like a += b vs a := a + b are, for me, syntactic sugar. They go to the same intermediate code, and then to the same assembly if properly done. Likewise, a := a + 1 should give us an inc(a).

But I think we have several issues related to the class/object model.

1.- We don't have an empty heap class. I haven't read about this in your posts, but Object is not TObject, even if both lived on the heap. Object is an empty one: no methods, no data, nothing. Object has no AfterDestruction method, for example, no interface data and things like that. Object is a lightweight version of TObject even on the heap. If we need a really empty heap class, there is nothing. And I have always said that one of this kind is really needed.
2.- I think we need to decide whether we want to be a 25-year compiler that mimics Delphi/Embarcadero with their good and bad ideas, or whether, besides that, we want a compiler that goes its own way while still supporting Embarcadero/Delphi things.
What is the problem? We cannot use Object, for backwards compatibility (and Embarcadero even deprecates it). Well: use (for example) FPCObject to do new and cooler things: the new and shiny Object implementation for FPC, for use on the stack with all the power of a Delphi class. Or explore that way. We cannot use anything that is not a TObject descendant. Well, then: we could make a TLazarusClass* (with NOTHING, the minimal one) and then derive TObject from it. Then we could derive from TLazarusClass* without all the bloated TObject and do newer, cooler and lighter things. And this could come with a nice syntax. Embarcadero won't do it for us. And I think it's clear that we have a mess here. Let's study it and make a good Object Pascal, learning from all our experiences: from the compiler view, how to optimize; from the syntax view, how to unify things. We are a community bigger than Embarcadero's and we know the tool.

* (Or other cool name).  :D

3.- I'm an FPC user. I love it. C++ could learn how to do things even with our issues. And I read some posts whose tone suggests that we cannot compete. Well, I think we can compete and do even better than others; that is the reason we are using FPC and Lazarus, and not the reverse. And I think it is perhaps time to think about what we want to be when we grow up, rather than being a child of the oldies repeating the same old mistakes.

Note: This is not blame for the FPC devs. It is blame for us as a community: perhaps we should show what the road ahead should be and check whether, as a community, we feel comfortable with the roadmap. I think many of us don't like the OOP implementation (that is my impression, but an open poll would give us a community view, for example), yet there is no study group of users to make a proposal, no poll, no easy way to make a proposal with a formal community review. That would enforce a well-weighted and balanced evolution with a clear objective, and not just our personal tastes (yes, I have my own tastes too :P). Perhaps the same goes for other issues: the standard, additional libs, etc.

Edit: As an example, suppose we were working on a renewed object/class model.
We should:
1.- Form a working group together with the dev group to do the work, with some basic agreements:
  a) The devs will go for it. If not, it isn't a working group; it would be a theoretical study group with no real consequence.
  b) Little bites, not big ones. We all have a life.
  c) No *mumble mumble* "do your own fork". We are a community, but we aren't numerous enough to split or fork, and it is a bad strategy. It is better to say "there aren't enough people to do it, but if there were, we would go for it", or "although there are enough people for this, we think it is a bad idea for this, this and this reason", and to keep a record/FAQ/forum thread to debate and document that.

...or something like the above, if all the people doing the work agree with their own terms.
« Last Edit: September 23, 2021, 07:15:45 pm by damieiro »

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #113 on: September 28, 2021, 10:16:19 pm »
I like your view personally, mate!

I hope the devs really take a look at that :)

Blade

  • Full Member
  • ***
  • Posts: 177
Re: How optimized is the FPC compiler
« Reply #114 on: September 29, 2021, 12:36:28 am »
Quote
But I think we have several issues related to the class/object model.

Could you give your opinion on advanced records?

Clearly this is the direction that Delphi/Embarcadero went, so I'm trying to understand your direction a bit better.

Also, I'm curious whether you are coming from a language that was class-based-OOP centric, and so feel compelled to continue similar usage in Object Pascal, versus the options it currently offers.

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: How optimized is the FPC compiler
« Reply #115 on: October 14, 2021, 03:39:26 pm »
Quote
Could you give your opinion on advanced records?
Clearly this is the direction that Delphi/Embarcadero went, so I'm trying to understand your direction a bit better.

Also, I'm curious whether you are coming from a language that was class-based-OOP centric, and so feel compelled to continue similar usage in Object Pascal, versus the options it currently offers.

Advanced records, IMHO, are a valid point. It is Rust's view also. They do the job if you do not need inheritance in your paradigm. From an old-timer's view, they are a kind of sugar-coated version of an old unit (taking a whole unit as an advanced record). They are handy, and their potential for making helpers, generics, etc. is very valuable. If you need encapsulation but not inheritance, I think they fit perfectly.

I have nothing against advanced records. OOP can benefit from them a lot and, I think, they make a fresh tool for doing things. They mean that not everything that needs encapsulation and reuse *must be* an object. It was always quirky that, for example, system facilities (like opening/closing files), system calls, or simple things like a random generator had to be either procedural calls with assignments for persistent data (AssignFile, seeds...), or a full object that rarely has descendants, or hybrids like procedures with hidden persistent state. An advanced record sounds far better for many of these jobs.
Advantages of an advanced record (IMHO):
- Encapsulation.
- No inheritance, which saves space and system and compiler overhead (no OOP tables, etc.) and is faster.
- Avoids quirks like procedures with hidden persistent values (like the Randomize/Random calls). You expect encapsulation of data in an advanced record, but not in a procedure.
- Avoids quirks of assignment (like assigning a file variable) and bad style like two calls for one service.
- Allows modern syntax like generics. Generics are a very powerful tool.
- Can be used and reused outside the scope of classes, which is handy.
- Better maintenance.
- Readable, and enforces good practices.
- A different tool that we didn't have. And a different tool is always welcome :)

So, i like advanced records.
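
As a small illustration of the random-generator point above, here is a sketch (the TRandomGen type and the xorshift32 algorithm are my own choices, just for illustration) of the seed living inside the record instead of in a hidden global, as with Randomize/Random:
Code: Pascal  [Select][+][-]
{$mode objfpc}{$modeswitch advancedrecords}
program PrngDemo;

type
  TRandomGen = record
  private
    FState: UInt32; // the seed is encapsulated here
  public
    procedure Seed(AValue: UInt32);
    function Next: UInt32;
  end;

procedure TRandomGen.Seed(AValue: UInt32);
begin
  if AValue = 0 then
    AValue := 1; // xorshift state must be non-zero
  FState := AValue;
end;

function TRandomGen.Next: UInt32;
begin
  // one xorshift32 step
  FState := FState xor (FState shl 13);
  FState := FState xor (FState shr 17);
  FState := FState xor (FState shl 5);
  Result := FState;
end;

var
  rng1, rng2: TRandomGen;
begin
  rng1.Seed(42);
  rng2.Seed(42);
  // two independent generators, no shared global state
  WriteLn(rng1.Next = rng2.Next); // TRUE
end.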

On the other hand, I dislike having two OOP implementations (one nearly deprecated, the other not fully empowered) for the same niche of solutions. I think there is room here for improvement and rethinking.

And for the sake of completeness: I do not come from an OOP-centric language. Like many users, I started with C (not C++), Pascal and assembly (and BASIC :D ), on platforms from CP/M and DOS through all the Windows versions, Unixes, Solaris, etc. Some Prolog, some Fortran, but that is my main base. OOP was too modern for me; I acquired the OOP base later, but I like it as a very powerful tool for large deployments.
I firmly believe in code readability, code safety (the language should prevent mistakes by programmers) and code efficiency, which together make a language all-purpose. And in a low-level grounded compiler.
Pascal does it: good to read, strongly typed, efficient, all-purpose. But I think we forget the low level when we think "low-level means a C-like language".
Well, Pascal is as low-level as C. Safer. Strongly typed. Readable. But capable of the same speed and low resource consumption as C. If we voluntarily build our TObject implementation on higher-level ground for nothing, we are giving that ground away to C++ for nothing. A better approach is a very basic TVeryPrimitiveObject (a pure empty class), from which we then construct the more advanced TObject we use today; then we have both advantages: the lower ground, which can give us a lot of happiness if smart people work on it, and the middle ground, which is what is used now.
The old, nearly deprecated TP-style Object was on that lower ground. Not because of stack or heap, but because of the base: the empty class as the base of the TP object model. We could build a TObject from a TPObject, but not the reverse. And this is the key here.

(PS: sorry for the late answer, I'm having some health issues :( )

And one personal opinion:

If you are from the one-file, one-object way of coding, the most beautiful and readable code you can achieve, IMHO, is FPC code. And really fast compilation. And a fast result.

« Last Edit: October 14, 2021, 04:15:48 pm by damieiro »

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: How optimized is the FPC compiler
« Reply #116 on: October 28, 2021, 08:34:55 pm »
If you spend half your time thinking about how to optimize your code, you could be twice as productive simply by not doing so. It is very rare for speed to be an issue, and in almost all cases you can speed up the bottleneck by changing the underlying approach, which has nothing whatsoever to do with the compiler.

Simple example: if you have a huge dataset that you filter in many places to extract the relevant information, you could speed that up a lot by using multiple different queries and/or datasets. And you could speed those up further by using stored procedures instead. That has a far greater impact on the speed of your application than any compiler optimization.

In short: always ignore speed unless it becomes a problem, at which point you should search for the bottleneck and fix only that.


Yes, I know that is the opposite philosophy of C++, where speed is always the most important consideration.

For me, the most important thing is always the readability.

Seenkao

  • Hero Member
  • *****
  • Posts: 545
    • New ZenGL.
Re: How optimized is the FPC compiler
« Reply #117 on: October 29, 2021, 10:06:14 am »
Quote
If you spend half your time thinking about how to optimize your code, you could be twice as productive simply by stopping to do so. It is very rare that speed is an issue, and in almost all cases you can speed up the bottleneck by changing the underlying way you do it. Which has nothing whatsoever to do with the compiler.

Simple example: if you have a huge dataset that you filter in many places to extract the relevant information, you could speed that up a lot by using multiple, different queries and/or datatsets. And you could speed those up by using stored procs instead. That has a far greater impact on the speed of your application than any compiler optimization.

In short: always ignore speed unless it becomes a problem, at which point you should search for the bottleneck and fix only that.


Yes, I know that is the opposite philosophy of C++, where speed is always the most important consideration.

For me, the most important thing is always the readability.
Yes and no. It depends on the applications being developed. If the end result is a program that will only be used, and not used for further development, then you can just make the application without much hesitation and deliver the result, taking care of the critical spots.

But if the application/library will be used for further development, then "everything should be spotless" (as far as possible). Every procedure it exposes should either be fully worked out, or carry a note saying what has not been worked out in it and why.

Otherwise, all the flaws the application/library carries with it can create problems for the programmer who uses the tool. That programmer will have to look for another way, another library, or rewrite similar functions/procedures himself. Which means you have simply forced a person to do extra work, even though he wanted to be developing his main application.

As for readability, that can be understood in different ways. If it refers to formatting the text and documenting it, then yes. If it refers to the computer having to understand whatever a person writes, then no. Not at all!
If we write a program that is easily digested by the computer, but takes some effort for a programmer to digest (and is not digestible at all by an ordinary person), that is closer to progress for both the person and the program.
If we write a program that an ordinary person understands quite easily, but that "breaks the leg" of the machine we wrote it for, that is degradation of both the programmer and the machine. You will not get a proper result, and you will keep relying on the power of your computer while it keeps "lagging".

-----------------------------------------------------------------------------------
I have tested programs (and will keep testing) and looked at certain parts of the compiler. Optimization of the FPC compiler itself stalled about 10 years ago. Apparently no one has been working on it and everyone relies on computer resources. Moreover, the FPC compiler is more heavily optimized for Windows than for Linux (and probably less optimized for other systems as well). At some points the compiler's behaviour is unpredictable: it may perform an optimization in one place and then not perform it in the next (for example, when working with static data in different procedures). I have not yet dug deeply enough into the compiler and have not fully examined what it does.

In some cases you can abandon Pascal's "String" and work with the text yourself. The compiler often adds code that incurs extra cost (which is the right thing for most users! Rarely, but some people do want to control the text handling themselves).
The StrToInt function is outdated (probably StrToFloat too; I didn't check). By means of Pascal alone (not even assembler) it can be made at least twice as fast, if not more. You can't speed up IntToStr with plain Pascal; string overhead gets added, and in that case the compiler copes with the text quite well.

The FPC compiler has needed an optimization review for quite a while now. Some parts it optimizes quite well, while others are simply ignored and treated as a plain sequence of code.
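
To illustrate the StrToInt point, here is a sketch of the kind of hand-rolled parser meant (ParseUInt is a hypothetical helper of mine, not an RTL function): a single pass, no exceptions, only plain decimal digits accepted, which is where the speedup over the general-purpose RTL routine would come from:
Code: Pascal  [Select][+][-]
{$mode objfpc}
program FastParse;

// single-pass decimal parser; returns False on any non-digit
function ParseUInt(const s: string; out v: UInt32): Boolean;
var
  i: SizeInt;
begin
  v := 0;
  Result := Length(s) > 0;
  for i := 1 to Length(s) do
  begin
    if (s[i] < '0') or (s[i] > '9') then
      Exit(False);
    v := v * 10 + UInt32(Ord(s[i]) - Ord('0'));
  end;
end;

var
  n: UInt32;
begin
  if ParseUInt('12345', n) then
    WriteLn(n); // 12345
end.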
I strive to create applications that are minimal and reasonably fast.
Working on ZenGL

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: How optimized is the FPC compiler
« Reply #118 on: October 29, 2021, 08:28:14 pm »
Ok, a long reply.


Stack and heap.

Ok, much of the things I know about how they are handled are old and probably out-of-date. Please correct me where needed.

Each process gets a stack allocated from the OS. You can specify the min and max sizes. The stack is used for parameters and local vars. Allocation is normally static and de-allocation is optional, because each time a function is called a block is reserved and when it exits, that block is discarded. Of course, the application can allocate a block for its own use on startup, by moving the stack pointer downwards.

All global vars that are known and allocated at compile time are put in the code segment. You can allocate them dynamically on startup if needed.

For dynamic allocations and data management you use the heap. There might be a minimum and/or maximum size to the blocks you can request from the OS, but they can be allocated, discarded, grow and shrink during execution.

On the stack, everything is either relative to the top of the stack, or the stack pointer. You cannot de-allocate memory, but it is possible to allocate it. If you allocate and de-allocate those blocks of memory dynamically, the stack keeps growing.

Now, that is not as much of a problem as it seems, because of virtual memory. Request a minimum stack size of 2 GB and you get an address range of that size, with one page of RAM mapped at the top. The rest of the range is not populated with actual RAM until you write something to it, at which point the OS plugs a new page of RAM in at that location. Then again, while you could mark pages disposable or remove them yourself, I don't think that is something you should do.

So: pre-allocated local vars: yes. dynamic memory management on the stack: no.

Memory management on the heap has always been inefficient for small blocks. So compilers tend to ship a runtime with its own memory manager built in: allocate large blocks from the OS, and allocate the small stuff yourself inside those large blocks. Both are fully dynamic.

Each OS has its own implementation of the heap manager. Often more than one as well. Like, Windows has 3 or 4 different ones, depending on how you count them. And they all behave differently.

Why is it all so complex? Because of memory models and segmentation.

On a small CPU, with less than 64 kB of memory that runs a single process at a time, you load the static data at the bottom, the code goes on top of that and the stack starts at the end of memory and grows down. Everything in between is the heap, which you have to manage yourself. If the top of the heap meets the stack pointer, you're out of memory.

So, in this case, it is vastly better to allocate everything on the heap if you want to do any dynamic memory management. Ok, you need pointers to pointers to be able to move stuff around.

But, in general, allocations on the stack are either static or with a predefined size. Allocate three records, free the first and allocate a new one and you have the space of four in use.

As registers and RAM grew in size, we got bank switching and/or segments. You can put the whole address space of an application in a single bank or segment, or give it several. A popular division is code, data and stack.

If the max size of the data (heap) and stack banks or segments are the same, it becomes interesting to put data on the stack. You might even be able to use the bottom of the stack segment/bank as an extension of the heap. And probably the top of the code segment as well.

But if you have a 64-bit application, does it still matter? Everything is mapped linearly in a huge address space, where pages of 4kB (or sometimes 1 MB) of ram are inserted wherever they are used. You can give each function their own stack of multiple GB. It doesn't matter. And the memory management is basically the same for heap and stack, with the only difference that you cannot discard stack memory.

The application requests a large block of memory from the OS and starts handing out blocks on request. And while keeping track of static data and discarding stuff on the stack is easier, every time you access a location, a page of RAM is mapped into it. The OS might free pages below the stack pointer, but it might not. Because many applications use that space to store data. And the OS has no way to know if it is used or discarded.

The heap is something else. You can simply grow the heap almost indefinitely. Holes larger than a page are discarded by the OS and the RAM released. The only problem is fragmentation, if you have a large amount of pages that contain just a few small blocks of data but are mostly empty. The application can move them around, if you use pointers to pointers.
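That "pointers to pointers" trick is worth a concrete sketch. A handle is a pointer to a pointer: the application keeps the handle, so a compacting allocator can move the block and only has to update the one pointer in its own table. This is purely illustrative; the names and the single-slot table are made up for the example:

```pascal
program HandleDemo;
{$mode objfpc}

type
  TBlockHandle = ^Pointer;  // handle = pointer to the allocator's pointer

var
  Table: array[0..15] of Pointer;        // indirection table owned by the allocator
  BlockA, BlockB: array[0..63] of Byte;  // two possible homes for the data

function NewHandle(Block: Pointer): TBlockHandle;
begin
  Table[0] := Block;                     // sketch: always use slot 0
  Result := @Table[0];
end;

var
  H: TBlockHandle;
begin
  H := NewHandle(@BlockA);
  PByte(H^)^ := 7;                       // always dereference via the handle
  // "Compaction": the allocator moves the data and updates its one pointer.
  Move(BlockA, BlockB, SizeOf(BlockA));
  Table[0] := @BlockB;
  WriteLn(PByte(H^)^);                   // still 7, even though the block moved
end.
```

The cost is one extra indirection on every access; the gain is that the allocator can defragment behind the application's back.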

So, what is faster? What is more memory efficient? It depends on your CPU architecture and memory model and if it uses bank switching, segmenting or paging. There is no general statement you can make about it for FPC executables, which run on just about anything.


Strings, records, objects and classes.

Let's start with the basics: accessing blocks of memory in C(++) sucks. It's really unsafe and/or very slow. I mean, it is very fast if you never ask for the length and don't care whether you accidentally read or write past the end of the buffer. The best example is, of course, strings. So you want to use containers that have their boundaries built in, like records or objects. And if you access them through methods, those methods can do the bounds checking for you.
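Pascal's side of this is that dynamic arrays and strings carry their length with them, and the compiler can insert the bounds checks for you with the `{$R+}` directive. A small sketch of both:

```pascal
program Bounds;
{$mode objfpc}
{$R+}   // range checking on: out-of-bounds access raises an error at runtime

uses
  SysUtils;

var
  A: array of Integer;
  S: string;
  i: Integer;
begin
  SetLength(A, 4);
  A[3] := 10;
  WriteLn(Length(A));       // the length travels with the array: 4

  S := 'hello';
  WriteLn(Length(S));       // O(1), no strlen-style scan: 5

  i := 4;                   // one past the end (variable index, so checked at runtime)
  try
    A[i] := 1;
  except
    on E: ERangeError do
      WriteLn('range error caught');
  end;
end.
```

With `{$R-}` the same access would silently scribble past the buffer, C-style; the point is that the choice is a compiler switch, not a rewrite.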

Now, generics in C++ are implemented as templates. A template is essentially a macro: it gets expanded and the compiler tries to compile the result. The good: all datatype-specific optimizations apply. The bad: it's hard to say whether the resulting code does what you intended it to do.

For example, does the destructor actually run? And when? Probably not the way you expect when an exception occurs (there is no try .. finally). And constructors and destructors work best without parameters. So it makes a lot of sense to put as much as possible on the stack, where objects are wiped out automatically when needed.

Do we really want to copy that behavior?
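Pascal's answer to the "does cleanup actually run" question is to make it explicit with try .. finally, instead of tying destruction to stack unwinding. A minimal sketch:

```pascal
program Cleanup;
{$mode objfpc}

uses
  SysUtils, Classes;

procedure Work;
var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add('data');
    raise Exception.Create('boom');  // simulate a failure halfway through
  finally
    L.Free;                          // runs even when the exception escapes
    WriteLn('cleanup ran');
  end;
end;

begin
  try
    Work;
  except
    on E: Exception do
      WriteLn('caught: ', E.Message);
  end;
end.
```

More typing than C++ RAII, but there is never any doubt about where and when the cleanup happens.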


Ok, my last point: you can implement almost anything you want in almost every programming language. But it might make things (a lot) easier or harder, depending.

When I have a project, first I think about how I want it to work. Like, what should the high-level stuff look like? What interfaces do I want for the medium-level stuff? And by an interface, I mean anything you program against from two or more sides. So a function with parameters and/or a result is an interface. You can specify it as part of your design.

And for the low-level stuff, I make a plan as well. Most of the time, I totally don't care where and how data is allocated. But sometimes I do. And in those cases, I make sure it is programmed like that. And I don't really care if that requires the use of malloc and pointers, or records, or objects (classes).

But I do care very much about the ease of use and readability. If you choose a complex way to do it, make sure it can be easily tested and is encapsulated in an easy-to-use package.
« Last Edit: October 29, 2021, 08:42:43 pm by SymbolicFrank »

Seenkao

  • Hero Member
  • *****
  • Posts: 545
    • New ZenGL.
Re: How optimized is the FPC compiler
« Reply #119 on: October 29, 2021, 09:14:38 pm »
SymbolicFrank, I read that mostly as food for thought.  :)
For the most part I have nothing to add; some of it you know better than I do! And I will probably read it again.

But here I am talking more about plain, everyday optimization, as you mentioned. Small things:
- declaring global variables: handled well, this increases application speed and shrinks the executable code.
- certain functions and procedures should have been reworked long ago, since they were written ages ago.
- working with static data: this also speeds up the code and shrinks it. FPC gives no guarantee that when it encounters a constant expression it will fold and reduce it. It's like playing roulette: maybe yes, maybe no.

as an example:
Code: Pascal  [Select][+][-]
const
  One   = 1;
  Two   = 2;
  Three = 3;
  Four  = 4;
  OneAndThreeOrFour = One and Three or Four;
...
var
  z: longword;
begin
  z := One and Three or Four;  // folding into a single constant is not guaranteed
  // so I have to do it manually ->
  z := OneAndThreeOrFour;
end;
Quite a few shortcomings like this have piled up. I don't think I can find them all myself; it would take not one person but a group to identify and fix them.

On C/C++: I have no desire to move towards C/C++. I'm not saying they are bad languages, but we should understand that Pascal and C are different languages, and there is no need to mold Pascal into yet another C.
I strive to create applications that are minimal and reasonably fast.
Working on ZenGL
