Interestingly (at least I think it is), my original code was as efficient as Howard's code until you start adding the -O2 through -O4 optimizations; there Howard's code is as fast as the assembler program (at least at -O4).
To be honest, I never really paid attention to the optimizations; now I need to read up on those.
Thanks for all the help.
Bas