You need to look at the generated code for a 32-bit target.
As I see it now, it looks like it saves a little by using registers instead of always going through the stack, but this is in 64-bit mode. You may want to see the results in 32-bit mode, which I still write a lot of code for.
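If it helps, here is a rough way to compare the two yourself. This is only a sketch: I'm assuming C and an x86 GCC toolchain, neither of which is stated above, and `sum6` is a made-up function for illustration. In the usual 64-bit calling conventions the first few integer arguments travel in registers (six on System V, four on Windows x64), while the default 32-bit cdecl convention passes them all on the stack, so the two listings should differ visibly.

```c
/* Hypothetical test function: six integer arguments, enough to show
 * the difference between register and stack argument passing.
 *
 * Assumed commands (GCC on x86, file saved as sum6.c):
 *   gcc -O2 -S -m64 sum6.c -o sum6_64.s    64-bit assembly listing
 *   gcc -O2 -S -m32 sum6.c -o sum6_32.s    32-bit assembly listing
 *
 * Then compare the two .s files side by side.
 */
int sum6(int a, int b, int c, int d, int e, int f)
{
    /* Trivial body so the listing is dominated by argument handling. */
    return a + b + c + d + e + f;
}
```

Whether the saving you see in 64-bit mode carries over depends on what the 32-bit listing shows, so it is worth checking with your actual compiler and options rather than trusting this sketch.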