
Author Topic: How optimized is the FPC compiler  (Read 40525 times)

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: How optimized is the FPC compiler
« Reply #15 on: December 20, 2020, 05:17:42 pm »
IOW, it is not the language itself.
I strongly disagree; the language can encourage more or less efficient programming. For example, in C++ you can distinguish between move and copy semantics on assignment. Say you have a struct containing dynamically allocated memory. When you copy the struct, you need a deep copy to get a unique copy, i.e. you copy the dynamic memory. If you move the struct, you know the struct the data comes from will not be used afterwards, so you can just grab the pointer and don't have to copy the data.

FPC only allows copy assignments. While you can work around that, doing so gives up a lot of convenience when programming. Example:
Code: Pascal
  program Project1;

  {$mode objfpc}{$H+}
  {$ModeSwitch advancedrecords}

  type

    { TMStream }

    TMStream = record
      len: SizeInt;
      Data: PByte;

      constructor FromString(const str: String);
      class operator Copy(constref aSrc: TMStream; var aDst: TMStream); inline;
      class operator Initialize(var a: TMStream); inline;
      class operator Finalize(var a: TMStream); inline;
      class operator :=(const str: String): TMStream; inline;
    end;

  { TMStream }

  constructor TMStream.FromString(const str: String);
  begin
    len := Length(str);
    Data := ReAllocMem(Data, len);
    // Guard against the empty string: str[1] does not exist then.
    if len > 0 then
      Move(str[1], Data^, len);
  end;

  class operator TMStream.Copy(constref aSrc: TMStream; var aDst: TMStream);
  begin
    // Deep copy: allocate a buffer for aDst and copy every byte.
    aDst.len := aSrc.len;
    aDst.Data := ReAllocMem(aDst.Data, aDst.len);
    if aDst.len > 0 then
      Move(aSrc.Data^, aDst.Data^, aDst.len);
    WriteLn('Copy');
  end;

  class operator TMStream.Initialize(var a: TMStream);
  begin
    a.len := 0;
    a.Data := nil;
  end;

  class operator TMStream.Finalize(var a: TMStream);
  begin
    if Assigned(a.Data) then Freemem(a.Data);
  end;

  class operator TMStream.:=(const str: String): TMStream;
  begin
    Result := TMStream.FromString(str);
  end;

  var
    t: TMStream;
  begin
    t := 'foo';
    ReadLn;
  end.
The whole contents of the string is copied twice just for the initialization. Even without the implicit operator (i.e. t := TMStream.FromString('foo');) this is still a complete copy of the whole string.

In C++ this would look like this:
Code: C++
  #include <iostream>
  #include <cstdlib>
  #include <cstring>
  #include <utility>

  struct MStream {
      void *data = nullptr;
      int len = 0;

      MStream(char const *str) {
          len = std::strlen(str);
          data = std::malloc(len);
          std::memcpy(data, &str[0], len);
      }
      MStream(MStream const &copy): len(copy.len) {
          data = std::realloc(data, len);
          std::memcpy(data, copy.data, len);
          std::cout << "Copy\n";
      }
      MStream(MStream &&move): len(move.len) {
          data = move.data;
          move.len = 0;
          move.data = nullptr;
          std::cout << "Move\n";
      }
      ~MStream() {
          if (data) {
              std::free(data);
          }
      }
      // Build a temporary from the C string and take over its buffer.
      // (Returning a reference to a local temporary would dangle.)
      MStream &operator =(char const *value) {
          MStream tmp(value);
          std::swap(data, tmp.data);
          std::swap(len, tmp.len);
          return *this;  // tmp releases the old buffer on scope exit
      }
  };

  int main() {
      MStream m = "foo";
      return 0;
  }
This does not copy. (In fact, even at -O0 with gcc the operator is never even called and the object is constructed in place, which means not a single move or copy happens. But if the code were complex enough that it could not simply be optimized away, it would result in a move operation, not a copy operation.)
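A minimal sketch of that last point, assuming a hypothetical `Tracked` type that counts copy and move constructions (`Tracked`, `pick`, `deep_copies_for_pick` and the counters are illustrative names, not part of the example above). The function has two possible return objects, so the compiler typically cannot construct the result in place, and the return falls back to the move constructor rather than the copy constructor:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical counters, for illustration only.
static int copies = 0;
static int moves = 0;

struct Tracked {
    char *data = nullptr;
    std::size_t len = 0;

    explicit Tracked(char const *str) : len(std::strlen(str)) {
        data = static_cast<char *>(std::malloc(len + 1));
        std::memcpy(data, str, len + 1);
    }
    // Deep copy: allocates a new buffer and copies every byte.
    Tracked(Tracked const &other) : len(other.len) {
        data = static_cast<char *>(std::malloc(len + 1));
        std::memcpy(data, other.data, len + 1);
        ++copies;
    }
    // Move: takes over the pointer and leaves the source empty.
    Tracked(Tracked &&other) noexcept : data(other.data), len(other.len) {
        other.data = nullptr;
        other.len = 0;
        ++moves;
    }
    Tracked &operator=(Tracked const &) = delete;
    ~Tracked() { std::free(data); }
};

// Two candidate return objects: the result usually cannot be built in
// place, so the returned local is moved into the caller, not copied.
Tracked pick(bool first) {
    Tracked a("first");
    Tracked b("second");
    if (first) return a;
    return b;
}

// Runs the experiment once and reports how many deep copies happened.
int deep_copies_for_pick() {
    copies = 0;
    moves = 0;
    Tracked t = pick(true);
    assert(std::strcmp(t.data, "first") == 0);
    return copies;
}
```

Calling `deep_copies_for_pick()` should report zero deep copies; the `moves` counter shows that the pointer was handed over instead.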

Sure, you can also write equivalent code in Pascal by not using assignment operators and constructors for records and using in-place functions instead. But this is a lot more effort. In general, writing efficient code takes much more effort in Pascal than in C++, because the language design puts the emphasis elsewhere. So if you write good C++ code, it will also be efficient. If you write good Pascal code, you need to take extra steps to make it efficient.

Honestly, whenever I need to write very performance-critical code I don't even bother with FPC, because C++ code is easier to write efficiently, while optimizing Pascal code for things like move semantics often makes the code less readable and therefore worse. FPC and Lazarus are great for some things, but I think I would go insane if I tried to get the same level of (manual) optimization I get in C++ just by using the language as intended, by changing my Pascal code.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9867
  • Debugger - SynEdit - and more
    • wiki
Re: How optimized is the FPC compiler
« Reply #16 on: December 20, 2020, 06:20:53 pm »
@Warfley: Are you sure your example is correct?

The Pascal code copies the content of the string only where you use "move". And in those locations the c code also uses "memcpy", which I am pretty sure copies the content.

In all other cases it is pointer operations.

In Pascal, passing a string (long string, $H+) is a pointer operation. Always!
Even assignment is. (Strings support copy-on-write, so they are not copied until you modify the content. At that point there is no way of avoiding the copy.)

You do not even need "const s: string" for the parameter. A non-const string parameter is still passed by pointer (a copy of the pointer, not a pointer to the pointer).
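The copy-on-write behaviour described above can be illustrated with a rough C++ analogue. This is only a sketch: `CowString` and its members are hypothetical names, and it makes no claim about how FPC implements its strings internally. It only shows the idea that assignment shares the buffer and the first write pays for the copy:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string, for illustration only.
class CowString {
    std::shared_ptr<std::string> buf_;
public:
    explicit CowString(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    // Copying a CowString copies only the shared_ptr (a pointer copy plus
    // a reference-count increment); the characters stay shared.

    char const *raw() const { return buf_->data(); }
    std::string const &str() const { return *buf_; }

    // Writing detaches first if the buffer is shared with another owner.
    void set_char(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // the one real copy
        }
        (*buf_)[i] = c;
    }
};

// True if assignment shared the buffer and the later write detached it.
bool cow_demo() {
    CowString a("hello");
    CowString b = a;                        // pointer copy only
    bool shared_before = (a.raw() == b.raw());
    b.set_char(0, 'j');                     // b gets its own buffer here
    bool detached_after = (a.raw() != b.raw());
    return shared_before && detached_after
        && a.str() == "hello" && b.str() == "jello";
}
```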



A lot of other data also works via pointer in Pascal:
- Objects (instances of classes, not old style object).
- Dyn Array

On the other hand records are passed by value. But you can specify "var" or "constref" depending on what you need.



Also in your c code the "string" is actually a PChar (if you want to match the data type exactly in pascal)

The big difference is that in Pascal you have some hidden pointers. In C, strings and (dynamic) arrays are usually done as pointers. In Pascal there is a data type that takes that pointer out of the user's responsibility. Yet in Pascal you can use explicit pointers too.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: How optimized is the FPC compiler
« Reply #17 on: December 20, 2020, 07:15:03 pm »
@Warfley: Are you sure your example is correct?
Yes, it's about the copy operator that is overloaded. This is used when assigning variables of the same record type:
Code: Pascal
  var
    t1, t2: TMStream;
  begin
    t2 := t1; // internally this will be compiled to TMStream.Copy(t1, t2);
  end;
Quote
A lot of other data also works via pointer in Pascal:
- Objects (instances of classes, not old style object).
- Dyn Array

On the other hand records are passed by value. But you can specify "var" or "constref" depending on what you need.
But this is the thing: with the management operators you can define your own assignment semantics for records. This is useful for having managed datatypes inside your records. There are multiple reasons why these are very useful; most importantly, it lets you write local datatypes that do not have the overhead of classes and that support things like operator overloading.

The problem here is that as soon as you use operator overloading, the amount of copies is really annoying. For example I wrote a gmp wrapper recently, here are some parts of the code:
Code: Pascal
  class operator TAPInteger.Finalize(var a: TAPInteger);
  begin
    mpz_clear(a.FData);
  end;

  class operator TAPInteger.Copy(constref aSrc: TAPInteger;
    var aDst: TAPInteger);
  begin
    mpz_set(aDst.FData, PAPInteger(@aSrc)^.FData); // deep-copies the whole data
  end;

  class operator TAPInteger.+(constref lhs: TAPInteger; constref
    rhs: TAPInteger): TAPInteger;
  begin
    mpz_add(Result.FData, PAPInteger(@lhs)^.FData, PAPInteger(@rhs)^.FData);
  end;
The expression c := a + b; would create a temporary object that holds the result of the + operation, which is then copied into c using the copy operator.
I wanted to implement some crypto algorithms just out of interest, i.e. not to write production-ready code, so the copies are not a problem. But if you wanted to deploy this code on a server that has to be quick when establishing handshakes and so on, this would be a problem, as each copy would need to copy around 500 bytes.
The C++ OOP interface of GMP uses move semantics for this, i.e. a temporary object gets created, but the assignment to c only copies the pointer, not the whole data.
To do this in Pascal you would need to go completely without operator overloading. And personally I think this makes the code much worse. Just look at the RSA key generation:
Code: Pascal
  p := TAPInteger.RandomPrime;
  q := TAPInteger.RandomPrime;
  m := p * q;
  phi := (p-1) * (q-1);
  pub := 65537;
  priv := pub.inverse(phi);
this is much better than writing the following:
Code: Pascal
  mpz_init(p);
  mpz_init(q);
  mpz_init(m);
  mpz_init(phi);
  mpz_init_set_ui(pub, 65537);
  mpz_init(priv);
  generateRandomPrime(p);
  generateRandomPrime(q);
  mpz_mul(m, p, q);
  mpz_init(phi_p);
  mpz_init(phi_q);
  mpz_sub_ui(phi_p, p, 1);
  mpz_sub_ui(phi_q, q, 1);
  mpz_mul(phi, phi_p, phi_q);
  mpz_clear(phi_p);
  mpz_clear(phi_q);
  mpz_inverse(priv, pub, phi);
In C++ you can write code like the former with literally no drawbacks; in Pascal, if you need performance, you have to write the latter if you don't want to lose a lot of performance to copying.
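To make the C++ side of that claim concrete, here is a small sketch with a hypothetical `Big` type standing in for an arbitrary-precision integer (it is not the real GMP class interface; `Big`, `deep_copies` and `copies_for_sum` are made-up names, and the arithmetic is a toy). It counts deep copies to show that assigning the temporary result of an operator moves the buffer, while assigning from a named object still copies:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

static int deep_copies = 0;  // hypothetical counter, illustration only

// Hypothetical stand-in for a big integer: the "digits" live on the heap.
struct Big {
    std::vector<std::uint32_t> limbs;

    Big() = default;
    explicit Big(std::uint32_t v) : limbs{v} {}

    // Copies duplicate the whole heap buffer.
    Big(Big const &o) : limbs(o.limbs) { ++deep_copies; }
    Big &operator=(Big const &o) {
        limbs = o.limbs;
        ++deep_copies;
        return *this;
    }
    // Moves only hand over the heap buffer; no limb is copied.
    Big(Big &&) = default;
    Big &operator=(Big &&) = default;
};

// Toy addition on the lowest limb only; enough to produce a temporary.
Big operator+(Big const &a, Big const &b) {
    Big r;
    r.limbs.push_back(a.limbs.at(0) + b.limbs.at(0));
    return r;
}

// Returns {deep copies after c = a + b, deep copies after c = a}.
std::pair<int, int> copies_for_sum() {
    Big a(2), b(3), c;
    deep_copies = 0;
    c = a + b;                    // temporary is moved into c
    assert(c.limbs.at(0) == 5u);
    int after_sum = deep_copies;
    c = a;                        // named source: deep copy
    return std::make_pair(after_sum, deep_copies);
}
```

Under these assumptions the expression form costs no deep copy for the temporary; only the assignment from the named object `a` does.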

PS: I should note that due to this bug (#37164) (a double call to the Finalize operator in functions that return managed records), management operators are currently completely unusable (they simply cannot be used as return values, and therefore constructors and operators are unusable), and my project actually didn't go anywhere because of this. But this is about the conceptual design of the language, not bugs in the compiler. And even though I will continue my projects once this bug is fixed, the code I write will not be production-ready because of this massive overhead, and if I ever need to use GMP I will probably resort to C++ instead.
« Last Edit: December 20, 2020, 07:20:12 pm by Warfley »

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #18 on: December 20, 2020, 07:42:43 pm »
@Warfley

I think Delphi 10.3 supports those as in C++
https://blogs.embarcadero.com/custom-managed-records-coming-in-delphi-10-3/

Not sure for now whether FPC 3.2 can achieve this, I need to look.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: How optimized is the FPC compiler
« Reply #19 on: December 20, 2020, 07:56:20 pm »
@Warfley

I think Delphi 10.3 supports those as in C++
https://blogs.embarcadero.com/custom-managed-records-coming-in-delphi-10-3/

Not sure for now whether FPC 3.2 can achieve this, I need to look.

Theoretically the fpc already supports them (but as mentioned above they are currently unusable due to a bug). But my point is that even considering this, it is missing the move semantic. Meaning you often copy data from temporary objects, instead of just grabbing their pointers. As these objects are temporary, they don't need that pointer afterwards, so you can save a lot of performance by doing so.

Example from C++:
Code: C++
  std::vector<int> v1{1,2,3,4}, v2;
  v2 = v1; // copies all data from v1
  v2 = std::move(v1); // treats v1 as a temporary and moves its data into v2; v2 now holds the data and v1 is left empty (formally: valid but unspecified)
For example, the result of a function is always a temporary object, meaning there is no point in copying data if you can move it instead.

It should be noted that, besides this, C++ compilers generally use return value optimization: instead of creating a temporary object that is returned by a function and then moved or copied, the compiler simply writes into the target object if possible, omitting any move or copy. This is an optimization FPC could also greatly benefit from. But even without it, move semantics make handling complex datatypes via assignment much easier.
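Both effects can be observed by comparing buffer addresses. The following is a sketch under the assumption of a standard library with the default allocator; `make_squares`, `built_at` and the two check functions are hypothetical names introduced here:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Address of the buffer allocated inside make_squares(); hypothetical
// bookkeeping so the caller can check that no elements were copied.
static int const *built_at = nullptr;

std::vector<int> make_squares(int n) {
    assert(n >= 0);
    std::vector<int> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v.push_back(i * i);
    built_at = v.data();
    return v;  // built in place or moved; the elements are not copied
}

// True if the caller ends up owning the very buffer built in the function.
bool result_reuses_buffer() {
    std::vector<int> r = make_squares(4);
    return r.data() == built_at && r.size() == 4 && r[3] == 9;
}

// True if move assignment hands over v1's buffer instead of copying it.
bool move_assignment_steals_buffer() {
    std::vector<int> v1{1, 2, 3, 4}, v2;
    v2 = v1;                           // copy: v2 gets its own buffer
    bool separate = (v2.data() != v1.data());
    int const *buf = v1.data();
    v2 = std::move(v1);                // move: v2 takes over v1's buffer
    return separate && v2.data() == buf && v2.size() == 4;
}
```

Whether the return is elided entirely or performed as a move is up to the compiler; either way the element data stays where it was built.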
« Last Edit: December 20, 2020, 08:04:51 pm by Warfley »

Thaddy

  • Hero Member
  • *****
  • Posts: 14373
  • Sensorship about opinions does not belong here.
Re: How optimized is the FPC compiler
« Reply #20 on: December 20, 2020, 08:04:57 pm »
I strongly disagree, the language can emphasize more or less efficient programming. For example in C++ you can distinguish between move and copy semantic on assignments. For example if
Nonsense. FreePascal allows the same constructs, but with a slightly more complex syntax.
And C++ is a bad habit language anyway, so no wonder the Pascal solution is more complex.
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #21 on: December 20, 2020, 08:10:43 pm »
@Thaddy

Can you give an example of how it would be done in Pascal? I'm really interested :P

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11452
  • FPC developer.
Re: How optimized is the FPC compiler
« Reply #22 on: December 20, 2020, 08:14:45 pm »
IOW, it is not the language itself.
I strongly disagree, the language can emphasize more or less efficient programming. For example in C++ you can distinguish between move and copy semantic on assignments.

Sure, but IMHO that is stuff on the fringes and does not justify the over-broad statement that you make. You also don't really specify any numbers or scenarios where this matters.

I do however acknowledge that some things are (still) less polished, e.g. something like generics' TDictionary suffers from this, being awkward for value types and making them hard to mutate during iteration.

I would chalk that up as a "more flexible STL implementation" win for C++ though, not performance.
« Last Edit: December 20, 2020, 08:30:02 pm by marcov »

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: How optimized is the FPC compiler
« Reply #23 on: December 20, 2020, 08:52:57 pm »
Sure, but IMHO that is stuff on the fringes and does not justify the over-broad statement that you make. You also don't really specify any numbers or scenarios where this matters.
See a few posts above, the gmp example. In general, this matters whenever you overload operators that return complex types. Another example: I implemented a set type where union, complement, etc. can be expressed via arithmetic operators. Here the result of the operator functions always needs to be copied.
I also often use implicit casts, as seen in my APInteger example above. This often leads not just to one but to two unnecessary copies.

Note in C++ you can also overload the +=, -=, etc operators to do the operation inplace and don't need to create a temporary object at all, which also benefits things like a set implementation, as s += s2 would be the same as s.addall(s2)
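As a rough illustration of the difference, here is a sketch with a hypothetical `IntSet` wrapper around `std::set<int>` (the type, the `temporaries` counter and `in_place_union_demo` are made up for this example). `operator+=` inserts into the existing object, while `operator+` has to build a new one:

```cpp
#include <cassert>
#include <set>

static int temporaries = 0;  // hypothetical counter, illustration only

struct IntSet {
    std::set<int> items;

    // In-place union: inserts into the existing container, no temporary.
    IntSet &operator+=(IntSet const &other) {
        items.insert(other.items.begin(), other.items.end());
        return *this;
    }
};

// Out-of-place union: has to build and return a new object.
IntSet operator+(IntSet const &a, IntSet const &b) {
    ++temporaries;
    IntSet r = a;
    r += b;
    return r;
}

// True if s += s2 needed no new object while s + s2 needed exactly one.
bool in_place_union_demo() {
    IntSet s{{1, 2}}, s2{{2, 3}};
    temporaries = 0;
    s += s2;
    bool no_temp = (temporaries == 0);
    IntSet u = s + s2;
    assert(temporaries == 1);
    return no_temp
        && s.items == std::set<int>{1, 2, 3}
        && u.items == s.items;
}
```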

Quote
Nonsense. FreePascal allows the same constructs, but with a slightly more complex syntax.
But this is all I said. In C++ simple code is often very efficient; see the GMP example from above: using the GMP classes with arithmetic operators is no less efficient than using the low-level GMP API. But in Pascal, to get good performance you need to use the low-level API, because move semantics are simply not part of the Pascal language design.
I did not say that C++ is more efficient; I said it puts more emphasis on writing efficient code. And you seem to agree that writing efficient code is more complex in Pascal than in C++. So if you try to keep your code simple, in Pascal there is often a tradeoff between simplicity and performance that simply does not exist in C++.

And personally, I think that as long as performance is not an issue you don't need to optimize your code. That's why I still often use Pascal, because most of the time performance is not an issue. But this thread is about performance, and if you want to write high-performance programs, C++ gives you much cleaner code that is highly efficient, as opposed to Pascal, where such optimizations come at the cost of code complexity. And keeping code simple and readable is IMHO pretty much the single most important thing to consider when writing a program. So I won't use a language that requires me to write more complex code than necessary for a given problem.

But I would not use the qualifier slightly. To take my example from above:
Code: Pascal
  p := TAPInteger.RandomPrime;
  q := TAPInteger.RandomPrime;
  m := p * q;
  phi := (p-1) * (q-1);
  pub := 65537;
  priv := pub.inverse(phi);
This is not just slightly less complicated than:
Code: Pascal
  mpz_init(p);
  mpz_init(q);
  mpz_init(m);
  mpz_init(phi);
  mpz_init_set_ui(pub, 65537);
  mpz_init(priv);
  generateRandomPrime(p);
  generateRandomPrime(q);
  mpz_mul(m, p, q);
  mpz_init(phi_p);
  mpz_init(phi_q);
  mpz_sub_ui(phi_p, p, 1);
  mpz_sub_ui(phi_q, q, 1);
  mpz_mul(phi, phi_p, phi_q);
  mpz_clear(phi_p);
  mpz_clear(phi_q);
  mpz_inverse(priv, pub, phi);
The first one is clearly readable and easy to understand and write, while the second one is just worse in every regard. That holds even if you take away the initialization and clearing (which can still be done using the management operators) and are left with something like:
Code: Pascal
  generateRandomPrime(p);
  generateRandomPrime(q);
  mpz_mul(m, p, q);
  mpz_sub_ui(phi_p, p, 1);
  mpz_sub_ui(phi_q, q, 1);
  mpz_mul(phi, phi_p, phi_q);
  mpz_set_ui(pub, 65537);
  mpz_inverse(priv, pub, phi);
« Last Edit: December 20, 2020, 09:13:39 pm by Warfley »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9867
  • Debugger - SynEdit - and more
    • wiki
Re: How optimized is the FPC compiler
« Reply #24 on: December 20, 2020, 09:52:35 pm »
@Warfley: Are you sure your example is correct?
Yes, it's about the copy operator that is overloaded. This is used when assigning variables of the same record type:

Ah, sorry. The following from your original post
Quote
The whole contents of the string is copied twice
threw me off.

I applied that to the content, as long as it was in the string. But not when it was in the self-allocated mem.


Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #25 on: December 21, 2020, 06:07:13 am »
I'm actually a bit surprised that these "move" semantics are not part of Pascal, since the idea of Pascal was to replace the C/C++ family (even if it did not succeed) by providing exactly these kinds of native, low-level intrinsics and APIs, but exposing them in a cleaner and more maintainable way.

So it would be really nice to have FPC also support a move operation (like "std::move") and the possibility to overload "+=" and "-=".

Even Delphi 10.3 allows copy constructors, where I think a lot of other languages deny such constructs. That's why Delphi/FPC should compete eye to eye with C/C++ in this regard, as they do with other constructs, like:

* unions
* general pointer arithmetic
* embedded assembler code

etc.

It has those constructs for a reason, so also adding the copy constructor, move semantics and +=/-= overloads would mean a lot for more efficient coding. I agree with @Warfley on this.

Thaddy

  • Hero Member
  • *****
  • Posts: 14373
  • Sensorship about opinions does not belong here.
Re: How optimized is the FPC compiler
« Reply #26 on: December 21, 2020, 10:11:58 am »
Quote
* unions
* general pointer arithmetic
* embedded assembler code

* Variant record fields
* {$pointermath on}
* Inline assembler is fully supported on many targets.

Also note that Pascal is older than C (1970 vs. 1972).
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Awkward

  • Full Member
  • ***
  • Posts: 135
Re: How optimized is the FPC compiler
« Reply #27 on: December 21, 2020, 10:12:48 am »
Shpend, if you want C/C++ features so much, maybe it is better to use a C/C++ compiler and not try to transform Pascal into C++?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11452
  • FPC developer.
Re: How optimized is the FPC compiler
« Reply #28 on: December 21, 2020, 11:05:21 am »
Shpend, if you want C/C++ features so much, maybe it is better to use a C/C++ compiler and not try to transform Pascal into C++?

Well, before generics, operator overloading was used for a few special cases like the ubiquitous TComplex record, but with generics, the number of applications becomes larger. C++ is generally more apt in using value types, and Delphi generics have large gaps there.

FPC 3.2.0 got the record management stuff, so such changes are not completely out of the question.

I write speed-dependent (vision) applications, but don't use (or have a use case for) the examples that Warfley gives at all, so his general tenor (and subject) of "not fit for performance applications", based on a few details, ticked me off heavily. But I'm glad that the discussion is constructive again now.

I wonder how many of these are already solvable using management operators though.
« Last Edit: December 21, 2020, 11:07:05 am by marcov »

Shpend

  • Full Member
  • ***
  • Posts: 167
Re: How optimized is the FPC compiler
« Reply #29 on: December 21, 2020, 11:29:32 am »
@marcov

Do you actually think the things I mentioned, like "std::move" and "+=/-=", would make it into FPC? I think with those changes FPC could really compete with C++. And guys, am I wrong in thinking that Object Pascal was made to compete with C++ and plain Pascal to compete with C? I think not, so adding more record control would be highly appreciated, also for embedded systems or low-level engine programming. IMHO these are very desirable features. I mean, we are talking about a language that is beautifully designed in itself and has nearly everything C++ offers in terms of language constructs, so why not add those to complete it :P

 
