This came up earlier this week. The compiler supports it in two ways; see https://forum.lazarus.freepascal.org/index.php/topic,57009.0.html
and the link to Rosetta Code.
Both the + operator and Concat() are fast and safe. Concat() may be very slightly faster.
{$mode objfpc}
{$modeswitch arrayoperators+}
var
  arr0, arr1, arr2, arr3 : array of single;
begin
  arr0 := concat(arr1,arr2,arr3);
  // or
  arr0 := arr1+arr2+arr3;
end.
In Delphi mode the array + operator is on by default. Concat() can be used in almost all modes.
Also note that FPC can optimize your code using SIMD instructions on Intel and ARM, so experiment with the compiler settings. For maximum speed with Concat() and +, you may need to rebuild the compiler and the RTL with optimal settings. For audio, which I believe is your interest, this is certainly worth the trouble. FPC is built rather conservatively by default.
You could try building FPC and the RTL with -CfAVX2 and with -dFASTMATH for Intel. This will still cover most Intel/AMD machines in the wild. For ARM, something similar can be done.
Quote: For audio - I believe your interest -,
Indeed! ;)
Quote: I think the addition is element for element into an array of the same dimension as the other three.
No, you are not wrong. But the compiler can use SIMD instructions for simple math (vectors here), and that can lead to a considerable speed increase, which is what Fred wants.
In which case I think the OP's method looks OK.
But I could be wrong of course.
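To make the distinction in this exchange concrete, here is a minimal sketch, assuming FPC 3.2.0 or later. The program name and the helper AddElementwise are made up for illustration. Concat() and + join arrays end to end, so the result is longer than the inputs, whereas an element-for-element sum keeps their length.

```pascal
program concatvsadd;
{$mode objfpc}
{$modeswitch arrayoperators}

type
  TSingleArray = array of single;

// Hypothetical helper: element-for-element sum of two equally long arrays.
function AddElementwise(const a, b: TSingleArray): TSingleArray;
var
  i: SizeInt;
begin
  Result := nil;
  SetLength(Result, Length(a));
  for i := 0 to High(a) do
    Result[i] := a[i] + b[i];
end;

var
  a, b, joined, summed: TSingleArray;
  i: SizeInt;
begin
  SetLength(a, 3);
  SetLength(b, 3);
  for i := 0 to 2 do
  begin
    a[i] := i + 1;         // 1, 2, 3
    b[i] := 10 * (i + 1);  // 10, 20, 30
  end;

  joined := a + b;                 // concatenation, like Concat(a, b)
  summed := AddElementwise(a, b);  // element-for-element sum

  writeln('Length(a + b)          = ', Length(joined));  // 6
  writeln('Length(AddElementwise) = ', Length(summed));  // 3
  for i := 0 to High(summed) do
    write(summed[i]:0:1, ' ');
  writeln;
end.
```

So for mixing audio buffers sample by sample, a loop like the helper above (or the OP's method) is what is needed; Concat() and + are the tools for appending one buffer to another.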
About this: cfloat = 0.3456; declare it as a (global) const. This helps the compiler and does not take up stack space.
Those C-isms are strongly discouraged. They crept in a long time ago, but even the devs never use them.
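A minimal sketch of that suggestion. Only the name cfloat and its value come from the post; the program name and the Scale procedure are hypothetical.

```pascal
program constdemo;
{$mode objfpc}

const
  cfloat = 0.3456;  // untyped constant, resolved at compile time

// Hypothetical routine: scales a buffer in place by the constant.
procedure Scale(var buf: array of single);
var
  i: SizeInt;
begin
  for i := 0 to High(buf) do
    buf[i] := buf[i] * cfloat;
end;

var
  samples: array[0..3] of single = (1.0, 2.0, 3.0, 4.0);
  i: SizeInt;
begin
  Scale(samples);
  for i := 0 to High(samples) do
    write(samples[i]:0:4, ' ');
  writeln;
end.
```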
I shall leave the optimisations to the fp experts here, for I am a beginner in fp.
But I do note that things like
arr[i]+=arr2[i]; arr[i]*=arr2[i];
do not work with build 3.2.2 or 3.2.0.
(Win 10)
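For what it's worth, the C-style operators are an optional feature of the compiler: they are enabled with the -Sc command-line option or the {$COPERATORS ON} directive, so whether they work in a given setup depends on the configuration file or IDE options in use. A minimal sketch, with made-up array contents:

```pascal
program coperators;
{$mode objfpc}
{$COPERATORS ON}  // same effect as compiling with -Sc

var
  arr, arr2: array[0..2] of single;
  i: integer;
begin
  for i := 0 to 2 do
  begin
    arr[i] := i + 1;  // 1, 2, 3
    arr2[i] := 2;
  end;

  for i := 0 to 2 do
  begin
    arr[i] += arr2[i];  // arr[i] := arr[i] + arr2[i]
    arr[i] *= arr2[i];  // arr[i] := arr[i] * arr2[i]
  end;

  for i := 0 to 2 do
    write(arr[i]:0:1, ' ');
  writeln;
end.
```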
Quote: OK, but this was only to illustrate the point. Is there also an optimization for multiplication, like the one Concat() provides for addition?
Yes, the -Cf<XXX> flag will optimize that and more. You should be able to examine the assembler output.
I think you should look for an optimised way to add (or in your second case, multiply) several floating point values.
I notice a slight boost using parameters instead of globals.
uses
  sysutils;

type
  aos = array of single;

var
  arr0, arr1, arr2, arr3 : array of single;

// note: integer is 16 bits wide in the default FPC mode (no {$mode} directive),
// which is too small for a loop over 200000000 elements
procedure AddArrays();
var
  i : integer;
begin
  // all arrays have the same size
  for i := 0 to length(arr0) - 1 do
  begin
    arr0[i] := 0;
    arr0[i] := arr1[i] + arr2[i] + arr3[i];
  end;
end;

procedure AddArrays2(var arr0:aos; arr1:aos; arr2:aos; arr3:aos);
var
  i : integer;
begin
  // all arrays have the same size
  for i := 0 to length(arr0) - 1 do
  begin
    arr0[i] := 0;
    arr0[i] := arr1[i] + arr2[i] + arr3[i];
  end;
end;

var
  i, k : int64;
  lim : int64 = 200000000;
  t, t2, totg, totp : int64;

begin
  writeln('Please wait a second or so . . .');
  setlength(arr0, lim);
  setlength(arr1, lim);
  setlength(arr2, lim);
  setlength(arr3, lim);
  for i := 0 to lim - 1 do
  begin
    arr1[i] := i;
    arr2[i] := i;
    arr3[i] := i;
  end;
  totp := 0;
  totg := 0;
  for k := 1 to 10 do
  begin
    // time the version that works on the global arrays
    t := gettickcount64;
    AddArrays();
    t2 := gettickcount64 - t;
    totg := totg + t2;
    write(t2, ' global ');
    for i := 1 to 5 do write(arr0[i], ' ');
    writeln;

    // time the version that takes the arrays as parameters
    t := gettickcount64;
    AddArrays2(arr0, arr1, arr2, arr3);
    t2 := gettickcount64 - t;
    totp := totp + t2;
    write(t2, ' params ');
    for i := 1 to 5 do write(arr0[i], ' ');
    write(' ', k, ' of 10');
    writeln;
    writeln;
  end;
  writeln('Totals');
  writeln('Time with globals ', totg);
  writeln('Time with params ', totp);
  writeln('Press return to end');
  readln;
end.
Please wait a second or so . . .
.
.. // 10 x the code that follow from 1 to 10
0
0 global 0 0 0 0 0
0 params 0 0 10 of 10
Totals
Time with globals 0
Time with params 0
Press return to end
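One possible explanation for the all-zero timings above, offered as a guess rather than a diagnosis: the benchmark listing has no {$mode} directive, and in FPC's default mode integer is only 16 bits wide, so a loop counter declared as integer cannot hold a bound of 199999999. The sketch below (program and variable names are made up) prints the size of integer and what that bound becomes when truncated to 16 bits. A negative upper bound would mean the loop body never runs.

```pascal
program intsize;
// Deliberately no {$mode} directive, to match the benchmark listing.
var
  lim: int64 = 200000000;
  truncated: smallint;
begin
  // 2 in the default FPC mode; 4 when compiled with -Mobjfpc or -Mdelphi
  writeln('SizeOf(integer) = ', SizeOf(integer));

  // keep only the low 16 bits of the loop bound, as a 16-bit counter would
  truncated := smallint(lim - 1);
  writeln('199999999 as a 16-bit value = ', truncated);
end.
```

If this is indeed the cause, compiling with -Mobjfpc, or declaring the counter as int64 or SizeInt, should give non-zero timings.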
What is the point of arr0[i]:=0; ?
Hello.
With this code, lightly changed from BobDog's code:
[EDITED x 3] The code was changed: added bytebites' solution and a method with parameters but without var or const.
program testarray;

uses
  sysutils;

type
  aos = array of single;

var
  arr0, arr1, arr2, arr3 : array of single;
  ratio  : single = 0.123;
  ratio1 : single = 0.345;
  ratio2 : single = 0.567;
  ratio3 : single = 0.890;

procedure CalculArrays(); // add and multiply
var
  i : int64;
begin
  // all arrays have the same size
  for i := 0 to length(arr0) - 1 do
  begin
    arr0[i] := 0;
    arr0[i] := ratio * ( (ratio1 * arr1[i]) + (ratio2 * arr2[i]) + (ratio3 * arr3[i]) ); // here ratio multiply
  end;
end;

procedure CalculArrays2(var arr0:aos; arr1:aos; arr2:aos; arr3:aos); // add and multiply
var
  i : int64;
begin
  // all arrays have the same size
  for i := 0 to length(arr0) - 1 do
  begin
    arr0[i] := 0;
    arr0[i] := ratio * ( (ratio1 * arr1[i]) + (ratio2 * arr2[i]) + (ratio3 * arr3[i]) ); // here ratio multiply
  end;
end;

procedure CalculArrays3(const arr0, arr1, arr2, arr3 : aos);
var
  i : int64;
begin
  // all arrays have the same size
  for i := 0 to length(arr0) - 1 do
  begin
    arr0[i] := 0;
    arr0[i] := ratio * ( (ratio1 * arr1[i]) + (ratio2 * arr2[i]) + (ratio3 * arr3[i]) ); // here ratio multiply
  end;
end;

procedure CalculArrays4(arr0, arr1, arr2, arr3 : aos);
var
  i : int64;
begin
  // all arrays have the same size
  for i := 0 to length(arr0) - 1 do
  begin
    arr0[i] := 0;
    arr0[i] := ratio * ( (ratio1 * arr1[i]) + (ratio2 * arr2[i]) + (ratio3 * arr3[i]) ); // here ratio multiply
  end;
end;

var
  lim : int64 = 200000000;
  i, k, t, t2, totg, totp, totp2, totp3 : int64;

begin
  writeln('Filling arrays, please wait a second or so . . .');
  setlength(arr0, lim);
  setlength(arr1, lim);
  setlength(arr2, lim);
  setlength(arr3, lim);
  for i := 0 to lim - 1 do
  begin
    arr1[i] := i;
    arr2[i] := i;
    arr3[i] := i;
  end;
  totp := 0;
  totp2 := 0;
  totp3 := 0;
  totg := 0;
  writeln('OK arrays filled, START the race . . .');
  for k := 1 to 5 do
  begin
    writeln('Pass ', k);

    t := gettickcount64;
    CalculArrays();
    t2 := gettickcount64 - t;
    totg := totg + t2;

    t := gettickcount64;
    CalculArrays2(arr0, arr1, arr2, arr3);
    t2 := gettickcount64 - t;
    totp := totp + t2;

    t := gettickcount64;
    CalculArrays3(arr0, arr1, arr2, arr3);
    t2 := gettickcount64 - t;
    totp2 := totp2 + t2;

    t := gettickcount64;
    CalculArrays4(arr0, arr1, arr2, arr3);
    t2 := gettickcount64 - t;
    totp3 := totp3 + t2;
  end;
  writeln('Time with globals ', totg);
  writeln('Time with params var ', totp);
  writeln('Time with params const ', totp2);
  writeln('Time with params nil ', totp3);
  writeln('Press return to end');
  readln;
end.
I get this result on 64-bit Linux with fpc 3.2.2, i5, 16 GB RAM.
As you can see, the difference is very slight (here params const is fastest).
Quote:
f./testarray
Filling arrays, please wait a second or so . . .
OK arrays filled, START the race . . .
...
Time with globals 9449
Time with params var 9422
Time with params const 9379
Time with params nil 9408
[EDITED]
Same test, but compiled with -O3 optimization (here params nil is fastest):
Quote:
Time with globals 8835
Time with params var 8813
Time with params const 8936
Time with params nil 8801
Fre;D
Did you use the "classical" fpc or the one compiled for LLVM?
If it was the "fpc-LLVM" one, can you compare your result with the "classical" fpc?
Thanks.
Fre;D