There are very few SetLength() calls, and none in the inner loop. I did try Warfley's PSSingle() trick, but did not see much advantage. Testing this, I found that virtually all of the difference is due to a memory copy that DOES happen inside a for loop:
For the dynamic arrays I use:
tmp := copy(img, imgp, imgp + nx - 1);
while for GetMemory I use:
Move(img^[imgp], tmp^[0], nx*4); // src, dst
I found that I could get similar performance for dynamic arrays by using:
Move(img[imgp], tmp[0], nx*4); // src, dst
So in my case, the culprit was Copy. Are there any comments on my hack of using the Move procedure? Is there any alternative to the `Copy` function? Here the length of the array never changes.
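For anyone comparing the two, here is a minimal sketch of both approaches side by side. It assumes the elements are Single (which would match the nx*4 above) and uses made-up values for nx and imgp, so it is an illustration rather than my actual code. Two things it tries to make explicit: the third argument of Copy is a count of elements, not an end index, and Copy returns a newly allocated array on each call, whereas Move writes a number of bytes into a buffer that can be allocated once outside the loop. As far as I understand, Move is only safe when the element type is not managed (no strings, interfaces or nested dynamic arrays).

```pascal
program CopyVsMove;
{$mode objfpc}

var
  img, tmp, viaCopy: array of Single;
  i, imgp, nx: Integer;
begin
  nx := 4;    // hypothetical row length
  imgp := 8;  // hypothetical offset of the row within img
  SetLength(img, 16);
  for i := 0 to High(img) do
    img[i] := i;

  // Copy allocates a new array on every call.
  // Its third argument is an element count, not an end index.
  viaCopy := Copy(img, imgp, nx);

  // Move reuses a buffer that is allocated once, outside any loop.
  // Its third argument is a byte count; SizeOf avoids hard-coding 4.
  SetLength(tmp, nx);
  Move(img[imgp], tmp[0], nx * SizeOf(Single));

  // Print both results next to each other for comparison.
  for i := 0 to nx - 1 do
    WriteLn(viaCopy[i]:0:0, ' ', tmp[i]:0:0);
end.
```

Writing the byte count as nx * SizeOf(Single) instead of nx*4 keeps the Move call correct if the element type is later changed, for example to Double.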