I think the addition is element for element into an array of the same dimension as the other three.
In which case I think the OP's method looks OK.
But I could be wrong of course.
No, you are not wrong. But the compiler can use SIMD instructions for simple math (vectors here), which can give a considerable speed increase, and that is what Fred wants.
This is not about the algorithm per se, but about squeezing the most speed, within reason, out of the compiler. In effect, my suggestion is that Fred ends up with a compiler and RTL that are specifically suited to audio
(because + and concat() will be optimized).
Then you don't need assembler, because you can do everything in Pascal.
Not many people know this, but it can be done with FPC, and the generated code is often better than what you, I, or Fred could write by hand. So in this case it is not about the algorithm, but about the compiler options and the way the RTL is built.
The make script for FPC accepts an OPT="<whatever option you want>" setting for that.
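
To make the discussion concrete, here is a minimal sketch of the kind of element-for-element addition being talked about: three buffers summed into a fourth of the same size, written in plain Pascal with no assembler. The names (TSampleBuf, AddBuffers, BufLen) are made up for illustration and are not from Fred's code. The build line in the comment is only an assumption about flags that might suit an x86_64 CPU with AVX2. Whether FPC actually emits packed SIMD instructions for this loop depends on the compiler version and target, so check the generated assembler (for example with -al) before relying on it.

```pascal
program AddArrays;
{$mode objfpc}

// Hypothetical sketch: TSampleBuf, AddBuffers and BufLen are illustrative names.
// Possible build line (an assumption, x86_64 with AVX2 only):
//   fpc -O3 -CfAVX2 -OpCOREAVX2 addarrays.pas

const
  BufLen = 8;

type
  TSampleBuf = array[0..BufLen - 1] of Single;

// Element-for-element sum of three buffers into a fourth of the same size.
procedure AddBuffers(const A, B, C: TSampleBuf; out Sum: TSampleBuf);
var
  i: Integer;
begin
  for i := 0 to BufLen - 1 do
    Sum[i] := A[i] + B[i] + C[i];
end;

var
  A, B, C, Sum: TSampleBuf;
  i: Integer;
begin
  // Fill the inputs with simple test values.
  for i := 0 to BufLen - 1 do
  begin
    A[i] := i;
    B[i] := 2 * i;
    C[i] := 0.5;
  end;

  AddBuffers(A, B, C, Sum);

  // Print each summed sample with one decimal place.
  for i := 0 to BufLen - 1 do
    WriteLn(Sum[i]:0:1);
end.
```

The same flags could be passed through OPT when rebuilding the compiler and RTL, so that the library routines are built with the same settings as the application code.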