
Author Topic: How optimized is the FPC compiler  (Read 27699 times)

PascalDragon

  • Hero Member
  • *****
  • Posts: 3638
  • Compiler Developer
Re: How optimized is the FPC compiler
« Reply #120 on: October 30, 2021, 03:36:53 pm »
Quote
I tested the programs (and will continue to test), looked at certain places of the compiler. Optimization of the FPC compiler itself stalled about 10 years ago. Apparently, no one was engaged in it and everyone relies on computer resources.

I really wonder how you came to that conclusion, because if you look at the history of e.g. the x86-specific optimizations you can see that they're actively worked on, and the same goes for other platforms. Just because what you consider a good optimization is not done does not mean that no optimizations are done; it all depends on what the devs prioritize.

Quote
In some cases, you can abandon Pascal's "String" and work with the text yourself. The compiler often (if you want to perform some actions with the text yourself) adds code that incurs additional costs (this is correct for most! Rarely, but some people want to control the process of working with the text themselves).
The StrToInt function is outdated (probably StrToFloat too, I didn't check it). By means of Pascal (not even assembler), it can be accelerated at least twice, if not more. You can't speed up the IntToStr function with pascal, text overhead is added, and in this case the compiler copes with the text quite well.

The point of Pascal is ease of use, not squeezing the last bit of performance out of the code. If you need that, you either drop to a lower level and use e.g. PChar instead of AnsiString, or simply use a different language that generates code as you like it (e.g. C++). This ease of use also includes maintainability of both the compiler and the RTL. Sure, you can write StrToInt in assembly for every platform, but that means a higher maintenance burden.
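
As a rough illustration of what "dropping to a lower level" with PChar can look like, here is a minimal sketch. The helper name CountDigits and the sample string are made up for this example; it only shows the pointer-walking style and makes no claim about how much faster it is.

```pascal
program PCharScan;
{$mode objfpc}{$H+}

// Hypothetical helper: counts decimal digits by walking a PChar until the
// terminating #0, instead of indexing the AnsiString.
function CountDigits(P: PChar): SizeInt;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    if P^ in ['0'..'9'] then
      Inc(Result);
    Inc(P);  // advance to the next character
  end;
end;

var
  S: AnsiString;
begin
  S := 'abc123def45';
  // PChar(S) hands out a pointer to the string's zero-terminated data.
  WriteLn(CountDigits(PChar(S)));
end.
```

The trade-off is the one described above: the pointer version gives up the safety and convenience of the managed string type, so it is probably only worth it in a measured hot spot.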

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 792
Re: How optimized is the FPC compiler
« Reply #121 on: October 30, 2021, 09:28:38 pm »
Optimizing things to the max requires selecting a single implementation first. Like in the example of PascalDragon, the CPU. Or, in my example, the memory model. The more you focus the usage, the better it can be optimized.

But, as PascalDragon said, the more platforms you have, the more you need to support. Rolling out a change isn't simply changing one block of source code anymore, but requires a revision of each and every different implementation.

A stupid example: how are you going to use a TStream? Big-endian, little-endian or UTF-8 network communication? Let alone the C# WCF way, where you push the object through in IL code and run that on the target platform. Who is going to make all those translations?

And then I forget the endpoints. Is it a file, a socket, a purely serial communication, USB master or slave? They're all quite different. Some always accept input, but most tell you to wait. Some can request communication whenever the need arises, but others have to wait and buffer until the master requests an update.

So, the trick is in making something that is understandable and "good enough" in general. And if it turns out to be the bottleneck for your application, it is up to you to make that highly specialized and optimized case. And it would be nice if you tell us why and how you did it, so we can learn from it.

Seenkao

  • Sr. Member
  • ****
  • Posts: 310
    • New ZenGL.
Re: How optimized is the FPC compiler
« Reply #122 on: October 31, 2021, 12:55:42 am »
Quote
I tested the programs (and will continue to test), looked at certain places of the compiler. Optimization of the FPC compiler itself stalled about 10 years ago. Apparently, no one was engaged in it and everyone relies on computer resources.

I really wonder how you came to that conclusion, because if you look at the history of e.g. the x86-specific optimizations you can see that they're actively worked on, and the same goes for other platforms. Just because what you consider a good optimization is not done does not mean that no optimizations are done; it all depends on what the devs prioritize.
I didn't mean to offend anyone with my words! I'm just too blunt, which is a rather bad trait of mine.
Following the link, we can see that the specific optimizations are mostly the work of one person (the others aren't visible, but that doesn't mean they aren't doing anything). But one person is very few, considering how many platforms are supported.

Quote
In some cases, you can abandon Pascal's "String" and work with the text yourself. The compiler often (if you want to perform some actions with the text yourself) adds code that incurs additional costs (this is correct for most! Rarely, but some people want to control the process of working with the text themselves).
The StrToInt function is outdated (probably StrToFloat too, I didn't check it). By means of Pascal (not even assembler), it can be accelerated at least twice, if not more. You can't speed up the IntToStr function with pascal, text overhead is added, and in this case the compiler copes with the text quite well.

The point of Pascal is ease of use, not squeezing the last bit of performance out of the code. If you need that, you either drop to a lower level and use e.g. PChar instead of AnsiString, or simply use a different language that generates code as you like it (e.g. C++). This ease of use also includes maintainability of both the compiler and the RTL. Sure, you can write StrToInt in assembly for every platform, but that means a higher maintenance burden.
The main point of what I wrote was in the word "Pascal". :) And I made it quite clear that Pascal itself suits me! The optimization, in my opinion, is insufficient, and I mostly mean trivial optimization: cases where it is plain that nothing in the code will change while the program runs, yet the compiler does not transform them. Or it transforms them, but not always, or only partially. Or only for one platform, Windows.

It would be interesting to know what the difference is between 64-bit Windows and 64-bit Linux, which run on the same x86 platform. Why is the optimization done for Windows but not for Linux? The examples with Single/Double used in procedure calls and in certain calculations are, I think, telling enough. On Windows the optimization works; on Linux there is none. (Sorry for bringing this topic up again, but it serves well as an example.)

Now for something that is clearly not optimized. StrToInt example:
Code: Pascal
const
  isByte      = 0;                 // len = 3                0..255
  isShortInt  = 4;                 // len = 4                -128..127
  isWord      = 1;                 // len = 5                0..65535
  isSmallInt  = 5;                 // len = 6                -32768..32767
  isLongWord  = 2;                 // len = 10               0..4294967295
  isInteger   = 6;                 // len = 11               -2147483648..2147483647
  {$If defined(CPUX86_64) or defined(aarch64)}
  isQWord     = 3;                 // len = 20               0..18446744073709551615
  isQInt      = 7;                 // len = 20               -9223372036854775808..9223372036854775807
  {$IfEnd}

type
  geUseParametr = record
    maxLen: LongWord;
    {$If defined(CPUX86_64) or defined(aarch64)}
    maxNumDiv10: QWord;
    maxNumeric: QWord;
    {$Else}
    maxNumDiv10: LongWord;
    maxNumeric: LongWord;
    {$IfEnd}
  end;  // closing "end" was missing in the posted code

var
  resInt64: Int64;  // integer ???

procedure SetNumberParametr;      // call once at startup; precomputes the limits to speed up conversion
function geStrToInt(Str: String; Size: LongWord = isInteger): Boolean;

implementation

var
  allUseParametr: array[0..7] of geUseParametr;

procedure SetNumberParametr;
begin
  allUseParametr[isByte].maxLen := 3;
  allUseParametr[isByte].maxNumeric := 255;
  allUseParametr[isByte].maxNumDiv10 := 25;
  allUseParametr[isShortInt].maxLen := 4;
  allUseParametr[isShortInt].maxNumeric := 127;
  allUseParametr[isShortInt].maxNumDiv10 := 12;
  allUseParametr[isWord].maxLen := 5;
  allUseParametr[isWord].maxNumeric := 65535;
  allUseParametr[isWord].maxNumDiv10 := 6553;
  allUseParametr[isSmallInt].maxLen := 6;
  allUseParametr[isSmallInt].maxNumeric := 32767;
  allUseParametr[isSmallInt].maxNumDiv10 := 3276;
  allUseParametr[isLongWord].maxLen := 10;
  allUseParametr[isLongWord].maxNumeric := 4294967295;
  allUseParametr[isLongWord].maxNumDiv10 := 429496729;
  allUseParametr[isInteger].maxLen := 11;
  allUseParametr[isInteger].maxNumeric := 2147483647;
  allUseParametr[isInteger].maxNumDiv10 := 214748364;
  {$If defined(CPUX86_64) or defined(aarch64)}
  allUseParametr[isQWord].maxLen := 20;
  allUseParametr[isQWord].maxNumeric := 18446744073709551615;
  allUseParametr[isQWord].maxNumDiv10 := 1844674407370955161;
  allUseParametr[isQInt].maxLen := 20;
  allUseParametr[isQInt].maxNumeric := 9223372036854775807;
  allUseParametr[isQInt].maxNumDiv10 := 922337203685477580;
  {$IfEnd}
end;

// No check for non-digit characters is performed.
function geStrToInt(Str: String; Size: LongWord = isInteger): Boolean;
var
  lenStr, i: LongWord;
  m, n, z: QWord;
  IntMinus: Boolean;  // not declared in the posted code; assumed to be a Boolean flag
begin
  Result := False;
  if (Size < 4) or (Size > 7) then
    Exit;
  // Reset the result; the function reports failure until parsing succeeds.
  resInt64 := 0;
  IntMinus := False;
  lenStr := Length(Str);
  if lenStr = 0 then
    Exit;
  i := 1;
  if Byte(Str[1]) = 45 then  // '-'
  begin
    if lenStr = 1 then
      Exit;
    IntMinus := True;
    Inc(i);
//      dec(lenStr);
  end;
  // Check against the maximum length for the type, now that the sign is handled.
  if lenStr > allUseParametr[Size].maxLen then
    Exit;
  // Accumulate every digit except the last. Starting from m = 0 gives the
  // same result for longer input and keeps one-digit input from reading
  // past the end of the string, which the posted version did.
  m := 0;
  while i < lenStr do
  begin
    m := m * 10 + (Byte(Str[i]) - 48);
    Inc(i);
  end;
  // Already over the limit: bail out.
  if m > allUseParametr[Size].maxNumDiv10 then
    Exit;
  m := m * 10;
  z := Byte(Str[i]) - 48;

  // The unsigned (Word) and signed (Int) sizes should be handled separately.
  if IntMinus then
    n := allUseParametr[Size].maxNumeric + 1 - m
  else
    n := allUseParametr[Size].maxNumeric - m;
  if z > n then
    Exit;

  if IntMinus then
    resInt64 := - m - z
  else
    resInt64 := m + z;
  Result := True;
end;

The speedup is decent on x86 and, I think, surprisingly large on ARM (on Android I measured a 3-12x difference!).
Note that this code raises no exceptions; I simply got rid of them, whereas FPC's StrToInt can raise them. Here, if the number cannot be converted, the Boolean result tells us so, and if it was converted we can read the value from resInt64. It can be used on any platform.

Once again, I'm not saying that the FPC development team isn't working! I'm saying that too little attention goes to the small things. And I understand why: small things are thankless work that takes a lot of time, and the result is not always good.

P.S. Is it just me, or does StrToInt not work on Android? I have to use Val directly.
« Last Edit: October 31, 2021, 12:58:46 am by Seenkao »
I strive to create applications that are minimal and reasonably fast.
Working on ZenGL. :)
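
For readers who want the exception-free behaviour mentioned in the post without a custom parser, calling Val directly already reports failure through an error code. This is a minimal sketch; the wrapper name TryParseInt is hypothetical and not part of the RTL.

```pascal
program ValDemo;
{$mode objfpc}{$H+}

// Hypothetical wrapper around the RTL's Val: returns True on success and
// never raises an exception on malformed input.
function TryParseInt(const S: string; out Value: LongInt): Boolean;
var
  Code: Integer;
begin
  // Val sets Code to 0 on success, otherwise to the position of the
  // character at which the conversion failed.
  Val(S, Value, Code);
  Result := Code = 0;
  if not Result then
    Value := 0;
end;

var
  V: LongInt;
begin
  if TryParseInt('12345', V) then
    WriteLn('parsed: ', V);
  if not TryParseInt('12x4', V) then
    WriteLn('rejected: 12x4');
end.
```

Unlike the geStrToInt above, this keeps digit validation in the RTL, so it says nothing about the speed difference the post reports.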

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 792
Re: How optimized is the FPC compiler
« Reply #123 on: October 31, 2021, 10:44:17 am »
With only 7 or 8 significant digits, Singles should be processed as Doubles, to prevent unneeded precision loss. Extended is a bit of a strange format, because it's only used by floating point hardware that wants to comply with the IEEE 754 standard, which recommends using an internal, 80 bit format to prevent that precision loss. It's not meant to be used directly in code.

Seen from that perspective, it makes a lot of sense to only use Doubles for all floating-point arithmetic, with the exception of things like currency, which are actually fixed-point integers. And in all cases, floating point hardware probably expands it to those 80 bits before processing.

Even more so, only SIMD/vector units tend to use other formats: they use larger execution units, like 256 bits wide, but can also split those into multiple, smaller words. That should only be used for speed, not for precision. If you don't mind the lowest bits containing nonsense, depending on the (amount of) operations performed.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 9769
  • FPC developer.
Re: How optimized is the FPC compiler
« Reply #124 on: October 31, 2021, 11:49:14 am »
function geStrToInt(Str: String; Size: LongWord = isInteger): Boolean;

Const the string?
 

Seenkao

  • Sr. Member
  • ****
  • Posts: 310
    • New ZenGL.
Re: How optimized is the FPC compiler
« Reply #125 on: October 31, 2021, 12:02:14 pm »
I didn't quite understand the question, but I take it to be about "Str". I probably just named it poorly; it could be renamed to "numStr", for example.
If the question is why the function doesn't declare "Str" as a constant, I can't answer precisely enough. But it is undesirable to modify the string at all: that would make the compiler itself add overhead (unless this is disabled by default). And there is no need to.

ASerge

  • Hero Member
  • *****
  • Posts: 1900
Re: How optimized is the FPC compiler
« Reply #126 on: October 31, 2021, 12:22:19 pm »
I didn't quite understand the question, but as I understood about "Str".
It is more efficient to preface parameters of managed types with a const modifier.
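
A minimal sketch of the difference, with made-up function names. For a reference-counted string passed by value, the callee takes its own reference, which typically means reference-count updates plus an implicit try/finally frame; with const that bookkeeping is not needed.

```pascal
program ConstParam;
{$mode objfpc}{$H+}

// By value: the callee holds its own reference to the string, so the
// compiler has to manage the reference count around the body.
function LenByValue(S: AnsiString): SizeInt;
begin
  Result := Length(S);
end;

// const: the callee promises not to modify S, so no reference-count
// bookkeeping is required for the parameter.
function LenConst(const S: AnsiString): SizeInt;
begin
  Result := Length(S);
end;

var
  Text: AnsiString;
begin
  Text := 'Free Pascal';
  WriteLn(LenByValue(Text), ' ', LenConst(Text));
end.
```

Applied to the function under discussion, this would presumably just mean declaring the parameter as `const Str: String`, since the body only reads it.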

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 792
Re: How optimized is the FPC compiler
« Reply #127 on: October 31, 2021, 12:36:21 pm »
I just checked: on an Armv8, if you have a float unit, it is IEEE 754 compliant as well (which implies 80 bit calculations) and the vector unit seems to be 128 bits wide, but only supports up to 64 bit operations. Interestingly enough, both support half-floats (16 bits, 3-4 significant figures). Perhaps for crude graphics or neural networks? Memory is at a premium on a microcontroller.

Quote from: Armv8-A and Armv8-R Architectures
The Armv8 architecture supports single-precision (32-bit) and double-precision (64-bit) floating-point data types and arithmetic as defined by the IEEE 754 floating-point standard. It also supports the half-precision (16-bit) floating-point data type for data storage, by supporting conversions between single-precision and half-precision data types and double-precision and half-precision data types. When Armv8.2-FP16 is implemented, it also supports the half-precision floating-point data type for data-processing operations.

That leaves the question: do you want maximum speed on hardware without floating point support, even if that means that the results of your application will differ depending on the platform used? And if so, how easy should it be to upload libraries to the FPC repository that implement one such subset of calculations in that specific assembly language?

Seenkao

  • Sr. Member
  • ****
  • Posts: 310
    • New ZenGL.
Re: How optimized is the FPC compiler
« Reply #128 on: November 01, 2021, 02:29:01 am »
SymbolicFrank, I don't think converting a string constant to a floating-point number is difficult, even with all the nuances taken into account. But here we probably do have to consider how wide the numbers will be (80, 64, 32 or 16 bits).
I haven't worked on this yet, and I don't know yet whether I will. I've already sunk quite a lot of time into various small things. I also need to brush up on ARM/ARM64 assembler to better understand what can be improved and what can't.

munair

  • Hero Member
  • *****
  • Posts: 781
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: How optimized is the FPC compiler
« Reply #129 on: December 31, 2021, 12:39:34 pm »
I generally notice the executables compiled by Free Pascal are over twice as big and run about half the speed compared with the same code compiled by Delphi7.

Just like more modern Delphi.  D7 is minimalist and has no deep support for unicode, or anything else after 2003 or so.  If you need to compare to a Delphi, use a recent one, not something ancient.

Also, binary size minimization is no core target at the moment, nobody wants to do complex work on it, like improving smartlinking (except Pascaldragon, occasionally)
Not to mention that D7 targeted Windows only. Big difference.
keep it simple

munair

  • Hero Member
  • *****
  • Posts: 781
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: How optimized is the FPC compiler
« Reply #130 on: December 31, 2021, 12:45:49 pm »
Did anyone notice the post dates jumped back after reply #110?

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 7684
  • Debugger - SynEdit - and more
    • wiki
Re: How optimized is the FPC compiler
« Reply #131 on: December 31, 2021, 12:51:25 pm »
Did anyone notice the post dates jumped back after reply #110?
Does it?
110: 2020-Dec-26
111: 2021-Sep-21
112: 2021-Sep-23


munair

  • Hero Member
  • *****
  • Posts: 781
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: How optimized is the FPC compiler
« Reply #132 on: December 31, 2021, 12:55:20 pm »
In any case, I don't really understand the comparison between languages optimization-wise. It's like asking who has the fastest car. Maybe in the old days when resources were limited, optimization could make a big difference. But today, the more relevant question is what language does a programmer prefer for specific targets, and there are a lot more considerations than "which language is fastest". As far as C-like languages are concerned, despite the never ending comparison discussions, there is a lot that these languages have against them. I generally find these discussions completely pointless.

munair

  • Hero Member
  • *****
  • Posts: 781
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: How optimized is the FPC compiler
« Reply #133 on: December 31, 2021, 12:56:07 pm »
Did anyone notice the post dates jumped back after reply #110?
Does it?
110: 2020-Dec-26
111: 2021-Sep-21
112: 2021-Sep-23
LOL, you're right.

Akira1364

  • Hero Member
  • *****
  • Posts: 559
Re: How optimized is the FPC compiler
« Reply #134 on: January 02, 2022, 07:15:19 am »
This is an old thread, but, however: while the compiler's codegen is pretty good overall, the RTL / packages are (sadly) chock full of code written seemingly without any kind of understanding whatsoever of how certain keywords actually interact with optimization. Like, if you pass something such as a large-ish record or reference-counted string by value in FPC, the compiler will generate horrendous code 100% of the time, period, end of story. You have to use `const` or `constref` or `var` depending on the context. It's not optional, particularly if you're writing something intended for use by a large number of people.
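
To make the record case concrete, here is a small sketch. TMatrix4 and both function names are invented for illustration; the point is only the parameter modifier.

```pascal
program ConstRefDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical 128-byte record standing in for "a large-ish record".
  TMatrix4 = record
    M: array[0..3, 0..3] of Double;
  end;

// By value: the callee works on its own copy of the whole record.
function TraceByValue(A: TMatrix4): Double;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to 3 do
    Result := Result + A.M[I, I];
end;

// constref: the record is always passed by reference and is read-only
// inside the function, so no copy is made.
function TraceConstRef(constref A: TMatrix4): Double;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to 3 do
    Result := Result + A.M[I, I];
end;

var
  Mat: TMatrix4;
  I: Integer;
begin
  FillChar(Mat, SizeOf(Mat), 0);
  for I := 0 to 3 do
    Mat.M[I, I] := I + 1;
  WriteLn(TraceByValue(Mat):0:1, ' ', TraceConstRef(Mat):0:1);
end.
```

Both calls return the same value; comparing the assembler output of the two (for example with the compiler's -al switch) is one way to see what the modifier changes on a given target.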

Furthermore, I don't even want to discuss the amount of time I've spent going into my local copy of the RTL / package sources and adding the `inline` modifier to one-liners that very clearly should have been written with it to begin with, as it's just frustrating. Obviously it'd be nice if FPC had a more advanced form of the "AutoInline" switch turned on at all times so that you could just rely on it to do the right thing, but as it stands in reality it does the opposite (which is to say, no function without the `inline` modifier will ever be inlined, no matter what).
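
For reference, this is roughly what such a one-liner looks like with the modifier applied. TVec2, Dot and LengthSquared are made-up names, and `inline` remains a request that the compiler may decline.

```pascal
program InlineDemo;
{$mode objfpc}{$H+}
{$inline on}  // make sure the inline modifier is honoured in this unit

type
  TVec2 = record
    X, Y: Double;
  end;

// Hypothetical one-liner: `inline` asks the compiler to substitute the
// body at the call site instead of emitting a call.
function Dot(const A, B: TVec2): Double; inline;
begin
  Result := A.X * B.X + A.Y * B.Y;
end;

// A second one-liner that merely forwards to the first, the kind of call
// chain the post complains about when nothing is inlined.
function LengthSquared(const V: TVec2): Double; inline;
begin
  Result := Dot(V, V);
end;

var
  V: TVec2;
begin
  V.X := 3;
  V.Y := 4;
  WriteLn(LengthSquared(V):0:1);
end.
```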

TLDR: great compiler, unfortunately shipped with overall too many slow libraries that constantly amount to "four to five completely un-inlined one-line function calls where each one just calls the next one, and the last one probably conditionally raises some silly exception with a resourcestring borrowed nearly verbatim from Delphi".

Not to say that there aren't decent alternatives: e.g. while TFPList (in my opinion) basically should have the `deprecated` modifier applied to it as a whole (since it's not that well optimized at all and goes against all notions of type safety by way of requiring mandatory "void pointer to literally anything" casts while using it), you can at the very least find some good stuff to replace it with both in the `FGL` unit and `Generics.Collections` unit that ship with FPC too. Or if you're willing to use third-party stuff, I highly recommend this library, which I think is basically unparalleled in quality as far as all data structures ever written in FPC go: https://github.com/avk959/LGenerics
« Last Edit: January 02, 2022, 07:33:19 am by Akira1364 »
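
As a small illustration of the type-safety point, here is a sketch using the `Generics.Collections` unit mentioned above. The TIntList alias is just a name chosen for this example.

```pascal
program TypedList;
{$mode objfpc}{$H+}

uses
  Generics.Collections;

type
  // Typed specialization: elements are LongInt, so no pointer casts are needed.
  TIntList = specialize TList<LongInt>;

var
  L: TIntList;
  X: LongInt;
  Sum: Int64;
begin
  L := TIntList.Create;
  try
    L.Add(10);
    L.Add(20);
    L.Add(12);
    Sum := 0;
    for X in L do  // the enumerator yields LongInt values directly
      Inc(Sum, X);
    WriteLn(Sum);
  finally
    L.Free;
  end;
end.
```

With TFPList the same loop would store and retrieve untyped pointers, and each access would need a cast back to the intended type.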

 
