
Author Topic: Optimising as you programme  (Read 1740 times)

dbannon

  • Hero Member
  • *****
  • Posts: 2796
    • tomboy-ng, a rewrite of the classic Tomboy
Optimising as you programme
« on: September 14, 2022, 10:11:58 am »
In another thread, ASerge made a couple of observations. ASerge, I hope you don't mind me quoting you here?

1. aBox.Items.Count is called three times. Since it is a property, it is better to store its value in a local variable.
2. For indexes, I prefer the processor-optimized SizeInt type instead of Integer.

I'd really like to know more about these sorts of issues. For example:

* How many times would you need to call a property before it's better to cache it locally? That would, IMHO, depend on the type too: the cost of declaring an integer and copying data to it would be less than for, say, a string (which is generally a managed AnsiString for most of us).

* A "processor-optimized SizeInt"? But maybe there would be more to be saved by just using a 32-bit var in all cases? Would a 32-bit operation be faster than a 64-bit one even on a 64-bit machine, and save a touch of memory? Most of the examples I can think of could just get away with a 16-bit var, so 32-bit is outrageously 'safe'.

* How much slower is it to ask another unit (or class) for the value of one of its variables compared to reading a local var?

Although it's no longer fashionable to consider how such things might speed an app up fractionally or save a few bytes of memory footprint, I like to, but I can find little written on the topic specifically about FPC use.

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Optimising as you programme
« Reply #1 on: September 14, 2022, 10:50:28 am »
1. Make it work
2. Make it readable
3. Make it fast

That third step is often unnecessary, as you won't improve the speed of the program in 95% of the cases. Like, it is probably a waste of time to optimize accessing REST services, because it takes hundreds of milliseconds for the call to complete.

440bx

  • Hero Member
  • *****
  • Posts: 4029
Re: Optimising as you programme
« Reply #2 on: September 14, 2022, 11:06:48 am »
It is unwise to offer "rules of thumb" for the questions you're asking. The answer to those questions is obtained by reviewing the assembly code the compiler generated.

There is one question that can be answered unequivocally, which is "* How much slower is it to ask another unit (or class) the value of one of its variables compared to a local var ?". In the case of a unit, for the variable to be accessible in another unit, the variable must be global; because of that, it should be a tiny bit faster than accessing a local variable. In the case of a class, accessing a class field is a little slower than accessing a local variable, because the class field is accessed through the class pointer, which is a little more involved than accessing a local variable (presuming the local variable is accessed using BP/EBP).

For "processor optimized" questions, it is necessary to look at the instructions generated and take into account the number of clock cycles involved. Rarely is the gain in speed worth the additional work.

HTH.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 14373
  • Sensorship about opinions does not belong here.
Re: Optimising as you programme
« Reply #3 on: September 14, 2022, 11:22:30 am »
WPO can also help, but is a bit uncomfortable to set up.
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5481
  • Compiler Developer
Re: Optimising as you programme
« Reply #4 on: September 14, 2022, 01:20:12 pm »
Quote
* A "processor-optimized SizeInt" ?  but maybe there would be more to be saved by just using a 32bit var in all cases ?  A 32bit operation would be faster than a 64bit one even on a 64bit machine ?  And save a touch of memory ? Most of the examples I can think of could, just, get away  with a 16bit so 32bit is outrageously 'safe'.

The point here is also that if you iterate over a string or dynamic array, whose index is already a SizeInt, then you might get problems if you use a LongInt as the index variable on a 64-bit system (though, granted, a string or array that large results in other problems ;) ).

Quote
There is one question that can be answered unequivocally, which is "* How much slower is it to ask another unit (or class) the value of one of its variables compared to a local var ?". In the case of a unit, for the variable to be accessible in another unit, the variable must be global, because of that it should be a tiny bit faster than accessing a local variable.  In the case of a class, accessing a class field is a little slower than accessing a local variable because the class field is accessed using the class pointer which is a little bit more involved than accessing a local variable (presuming the local variable is accessed using BP/EBP.)

It depends: on *nix systems with PIC code the global variable would be more expensive due to an additional indirection. It is similar on e.g. AArch64, where loading an absolute address requires two operations to load the full address, compared to one for a register-based load.

440bx

  • Hero Member
  • *****
  • Posts: 4029
Re: Optimising as you programme
« Reply #5 on: September 14, 2022, 01:38:08 pm »
Quote
It depends: on *nix systems with PIC code the global variable would be more expensive due to an additional indirection. Similar on e.g. Aarch64 where loading an absolute address requires two operations to load the full address compared to one for a register based load.
Yes, I am definitely guilty of seeing it the way it happens on Intel running Windows.

ASerge

  • Hero Member
  • *****
  • Posts: 2242
Re: Optimising as you programme
« Reply #6 on: September 14, 2022, 10:03:18 pm »
Quote
* How many times would you need to call a property before its better to cache it locally ?  That would, IMHO, depend on the type too, the cost of declaring an integer and copying data to it would be less than, say a string (that is generally a managed ANSIString for most of us).
Almost always when the property is read through a method (which is very common). If you don't cache it yourself, FPC implicitly allocates a separate temporary variable for each call and will not guess that the value could be used again.
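
A minimal sketch of the point about method-backed properties. TItems, GetCount and GetterCalls are hypothetical names made up for this illustration (they stand in for something like aBox.Items); the getter counts its own calls so the difference between repeated property reads and a cached local becomes visible.

```pascal
program CacheProperty;
{$mode objfpc}{$H+}

type
  { Hypothetical class standing in for something like aBox.Items. }
  TItems = class
  private
    FCount: Integer;
    FGetterCalls: Integer;
    function GetCount: Integer;
  public
    constructor Create(ACount: Integer);
    property Count: Integer read GetCount;            // backed by a method
    property GetterCalls: Integer read FGetterCalls;  // reads the field directly
  end;

constructor TItems.Create(ACount: Integer);
begin
  inherited Create;
  FCount := ACount;
end;

function TItems.GetCount: Integer;
begin
  Inc(FGetterCalls);  // count every call so the cost is visible
  Result := FCount;
end;

var
  Items: TItems;
  N: Integer;
begin
  Items := TItems.Create(10);
  try
    // Uncached: each mention of Items.Count is a separate method call.
    if Items.Count > 0 then
      WriteLn('Last index ', Items.Count - 1, ' of ', Items.Count, ' items');
    WriteLn('Getter calls without caching: ', Items.GetterCalls);

    // Cached: one call, then the local variable is reused.
    N := Items.Count;
    if N > 0 then
      WriteLn('Last index ', N - 1, ' of ', N, ' items');
    WriteLn('Getter calls in total: ', Items.GetterCalls);
  finally
    Items.Free;
  end;
end.
```

The uncached block calls the getter three times and the cached block only once, so the two counters print 3 and 4. Caching is only safe when the value cannot change between the reads, which is something the programmer knows and the compiler does not.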

Arioch

  • Sr. Member
  • ****
  • Posts: 421
Re: Optimising as you programme
« Reply #7 on: September 14, 2022, 10:48:26 pm »
Quote
...and will not guess to use its value again.

And rightly so!!!

What makes you think a function would return the same value on 2nd, 3rd and other calls?

Some do, some do not. FPC does not know.

dbannon

  • Hero Member
  • *****
  • Posts: 2796
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Optimising as you programme
« Reply #8 on: September 15, 2022, 03:27:28 am »
Quote
The point here is also if you iterate a string or dynamic array of whom the index already is a SizeInt then you might get problems if you use a LongInt as index variable on a 64-bit system (though, granted, a string or array that large problems results in other problems ;) ).
Hmm, not sure I understand. Does an array (for example) have, internally, an index of its own? It has some size data, but I would have thought any index came from 'outside'. As long as that outside index can handle its High(), all should be good. Or am I oversimplifying here? My thinking is that the arrays, strings etc. I am handling in my app are probably OK with a 16-bit int and could not possibly be a problem with a 32-bit one. So, would a 32-bit index be faster on a 64-bit system?


Quote
It depends: on *nix systems with PIC code the global variable would be more expensive due to an additional indirection. Similar on e.g. Aarch64 where loading an absolute address requires two operations to load the full address compared to one for a register based load.
Ah, yes, I did not think of the deliberate obfuscation involved with PIC. Debian requires PIC but I suspect much code written for desktop systems does not turn it on. Lazarus does not produce PIC by default.

Thanks folks, some interesting answers. I accept that the sort of tweaking I am discussing is not cost-effective, but no one is paying for my time!
440bx says look at the assembler, but how? SymbolicFrank says don't worry about it. And ASerge, thanks, that was what I was guessing, but it's nice to get it with some authority!

Davo

440bx

  • Hero Member
  • *****
  • Posts: 4029
Re: Optimising as you programme
« Reply #9 on: September 15, 2022, 04:54:24 am »
Quote
440bx says look at the assembler, how ?
There are three ways I use to get the assembly code. The first and simplest is to run the program under the debugger (in Lazarus) and select "View->Debug Windows->Assembler"; that's very quick and handy for inspecting what the compiler produced for some instruction(s).

Another way is to ask the compiler to generate an assembly listing, using the "-al" option when compiling.

Lastly, the sledgehammer: use a disassembler like IDA Pro. This one is particularly useful when you also want to debug O/S code (outside the program). IDA Pro "understands" both PDB and DWARF debug symbols, which is very useful when using FPC on Windows. There are some things it doesn't like much in FPC's generated DWARF symbols, but the effect is minimal and the result is definitely very usable.

For most cases, "method 1" is all you need and basically imposes little to no additional time during the development process.
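
As a small illustration of the "-al" route, here is a sketch of a program that is easy to find in a listing. AddOne and the file name asmdemo.pas are hypothetical, chosen only for this example; the assumption is that the compiler leaves the assembler file next to the source as asmdemo.s.

```pascal
program AsmDemo;
{$mode objfpc}{$H+}

// Compile with:  fpc -al asmdemo.pas
// With -al the compiler keeps the generated assembler file and interleaves
// the Pascal source lines as comments, so AddOne is easy to locate in it.

// Hypothetical function, kept tiny so its generated code is short.
function AddOne(Value: LongInt): LongInt;
begin
  Result := Value + 1;
end;

begin
  WriteLn(AddOne(41));
end.
```

Comparing the listing with and without an optimisation switch such as -O2 shows how much the generated code for the same source can differ, which is why a rule of thumb is hard to give.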

HTH.

Seenkao

  • Hero Member
  • *****
  • Posts: 550
    • New ZenGL.
Re: Optimising as you programme
« Reply #10 on: September 15, 2022, 08:10:26 am »
Try not to get caught up in micro-optimizations.
They take a lot of time and are not very effective.
Such optimizations only pay off in finished code that is used constantly (code you will reuse often in your programs).

Optimize code that stays "static" while the program runs, for example in loops.
We often do calculations in loops (or in procedures that are called repeatedly), but the code used in such loops is fairly static: it does not change and depends only on the input data.
I calculated such static data in advance and stored it in tables. In the loop (or subroutine) I then took the data straight from the table by index, and the code ran faster.

If you use this approach, consider how much it is really needed. Sometimes we do extra work that is unnecessary. It is not worth doing in code that is called very rarely, or in code that is small.

This method will make your program larger because of the precomputed data, but used correctly it can give a speed gain of up to 50%. It depends on your code.
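
A minimal sketch of the precomputed-table idea, assuming a made-up task (counting the set bits in a run of bytes). BitCount, BuildTable and CountBits are hypothetical names for this illustration and are not taken from the post above.

```pascal
program LookupTable;
{$mode objfpc}{$H+}

var
  // Hypothetical table: number of set bits for every possible byte value.
  BitCount: array[Byte] of Byte;

// Fill the table once, before the hot loop runs.
procedure BuildTable;
var
  V, Bits, N: Integer;
begin
  for V := 0 to 255 do
  begin
    Bits := V;
    N := 0;
    while Bits <> 0 do
    begin
      Inc(N, Bits and 1);
      Bits := Bits shr 1;
    end;
    BitCount[V] := Byte(N);
  end;
end;

// The hot loop: one table lookup per byte replaces the inner bit loop.
function CountBits(const Data: array of Byte): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 0 to High(Data) do
    Inc(Result, BitCount[Data[I]]);
end;

begin
  BuildTable;
  WriteLn(CountBits([$FF, $0F, $01, $00]));  // 8 + 4 + 1 + 0 set bits
end.
```

The table costs 256 bytes of data, which is the size-for-speed trade described above. Whether it actually wins depends on the code and the processor, so it is worth measuring before keeping it.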
I strive to create applications that are minimal and reasonably fast.
Working on ZenGL

dbannon

  • Hero Member
  • *****
  • Posts: 2796
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Optimising as you programme
« Reply #11 on: September 15, 2022, 08:12:39 am »
OK, I just tried method 1 and it sure reminded me of how long it has been since I did any assembler. Possibly it was Z80 .....

Anyway, yep, I understand now and will need to spend some time understanding ...

Thanks 440bx, useful !

Davo

PascalDragon

  • Hero Member
  • *****
  • Posts: 5481
  • Compiler Developer
Re: Optimising as you programme
« Reply #12 on: September 15, 2022, 09:19:07 am »
Quote
The point here is also if you iterate a string or dynamic array of whom the index already is a SizeInt then you might get problems if you use a LongInt as index variable on a 64-bit system (though, granted, a string or array that large problems results in other problems ;) ).
Hmm, not sure I understand. Does an (eg) array have, internally, a index of its own ? It has some size data, but I would have thought any Index came from 'outside'. As long as that outside index can handle its high() all should be good. Or am I over simplifying here ?

A string or a dynamic array can have a maximum length of High(SizeInt), thus it's best to index them with a SizeInt variable instead of an Int16 or Int32 one.
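
A minimal sketch that prints the ranges involved and indexes a dynamic array with a SizeInt variable. It assumes a common 32-bit or 64-bit target, where SizeInt has the size of a pointer; the array contents are arbitrary.

```pascal
program IndexTypes;
{$mode objfpc}{$H+}

var
  Data: array of Byte;
  I: SizeInt;   // same range as the array's own length and High()
  Sum: QWord;
begin
  // LongInt is always 32 bits; SizeInt follows the target platform.
  WriteLn('SizeOf(LongInt) = ', SizeOf(LongInt));
  WriteLn('SizeOf(SizeInt) = ', SizeOf(SizeInt));
  WriteLn('High(LongInt)   = ', High(LongInt));
  WriteLn('High(SizeInt)   = ', High(SizeInt));

  SetLength(Data, 1000);
  for I := 0 to High(Data) do
    Data[I] := Byte(I mod 256);

  Sum := 0;
  for I := 0 to High(Data) do
    Sum := Sum + Data[I];
  WriteLn('Sum = ', Sum);
end.
```

On a 64-bit target the two High() values differ, and that gap is exactly the range of indexes a LongInt loop variable could not reach.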

Quote
My thoughts here are that the arrays, strings etc I am handing in my app, are probably ok with a 16bit int and could not possibly be a problem with a 32bit one. So, would a 32bit index be faster on a 64bit system ?

In the end it will depend upon the processor whether 32-bit operations on a 64-bit processor are faster or not, so I can't provide a general statement here.

Quote
It depends: on *nix systems with PIC code the global variable would be more expensive due to an additional indirection. Similar on e.g. Aarch64 where loading an absolute address requires two operations to load the full address compared to one for a register based load.
Ah, yes, I did not think of the deliberate obfuscation involved with PIC. Debian requires PIC but I suspect much code written for desktop systems does not turn it on. Lazarus does not produce PIC by default.

Considering that more and more distributions are moving towards hardening their binaries (which requires PIE, i.e. PIC for executables), it's likely no longer true that most desktop systems don't turn it on. Also, FPC doesn't produce PIC by default because it wants to give the user control over that. You can always put the corresponding option into fpc.cfg if that bothers you.

ASerge

  • Hero Member
  • *****
  • Posts: 2242
Re: Optimising as you programme
« Reply #13 on: September 15, 2022, 09:21:07 am »
Quote
Avoid micro-optimizations.
It takes a lot of time and is ineffective.
The effectiveness of such optimizations lies only in the final code that will be constantly used (you will often use this code in your programs).
I agree, but it is not always "a lot of time".
Sometimes a change such as the separate local variable mentioned above not only speeds the code up but also documents it, and it lets you avoid side effects (as mentioned above). It is done "automatically", without any difficulty; it is just a matter of habit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11452
  • FPC developer.
Re: Optimising as you programme
« Reply #14 on: September 15, 2022, 09:28:24 am »
Quote
Although its no longer fashionable to consider how such things might speed an app up fractionally or save a few bytes of memory footprint, I like to but can find little written on the topic specifically about FPC use.

In general this will rarely make a difference outside of loops that are iterated hundreds of thousands to millions of times per second.

The first thing to do is to get a good feel for where your program spends its time (e.g. with a profiler).
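
Short of a real profiler, a crude timing harness can already show whether a loop matters at all. The sketch below is only an assumption-laden illustration: the function names, array size and pass count are made up, GetTickCount64 from SysUtils has coarse resolution, and the result depends on the processor and on the optimisation settings used.

```pascal
program TimeLoops;
{$mode objfpc}{$H+}

uses
  SysUtils;  // GetTickCount64

const
  Passes = 200;  // arbitrary repeat count, only to make the run measurable

type
  TIntArray = array of LongInt;

// Same loop twice; only the type of the index variable differs.
function SumLongIntIndex(const Data: TIntArray): Int64;
var
  I: LongInt;
begin
  Result := 0;
  for I := 0 to High(Data) do
    Result := Result + Data[I];
end;

function SumSizeIntIndex(const Data: TIntArray): Int64;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 0 to High(Data) do
    Result := Result + Data[I];
end;

var
  Data: TIntArray;
  K: SizeInt;
  Pass: Integer;
  Total: Int64;
  Start: QWord;
begin
  SetLength(Data, 1000000);
  for K := 0 to High(Data) do
    Data[K] := LongInt(K and $FF);

  Total := 0;
  Start := GetTickCount64;
  for Pass := 1 to Passes do
    Total := Total + SumLongIntIndex(Data);
  WriteLn('LongInt index: ', GetTickCount64 - Start, ' ms, total ', Total);

  Total := 0;
  Start := GetTickCount64;
  for Pass := 1 to Passes do
    Total := Total + SumSizeIntIndex(Data);
  WriteLn('SizeInt index: ', GetTickCount64 - Start, ' ms, total ', Total);
end.
```

Run it several times and at different optimisation levels before drawing conclusions; a difference smaller than the run-to-run noise is not a difference worth coding around.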

 
