There are lots of possibilities.
For starters, the speed depends not only on the code but also on the data. Different data leads to different memory allocations, and that can affect cache hits/misses.
Also, the same code may run faster or slower if it is moved by a few bytes. That happens when other code in other procedures grows or shrinks. That too is a cache issue, but not the same cache: it is some cache internal to the CPU (probably the decoded-instruction/micro-op cache and instruction fetch alignment). For that, try {$codealign loop=32} and/or {$codealign proc=32}
... Though I don't know how that affects 32-bit assembler.
And that can go either way: better, worse, or no change.
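
A minimal sketch of where those directives could go, in case it helps. The program and the SumTo function are made up for illustration; only the two {$codealign} directives come from the suggestion above. Whether the compiler actually honours the loop alignment depends on the target and optimization settings, so treat it as something to benchmark, not a guaranteed fix.

```pascal
program AlignDemo;
{$mode objfpc}

{ Directives under test: remove them or change the values, then compare timings. }
{$codealign proc=32}   // request 32-byte alignment for procedure entry points
{$codealign loop=32}   // request 32-byte alignment for loop starts

{ Hypothetical hot routine, stands in for the code being timed. }
function SumTo(N: Integer): Int64;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to N do
    Result := Result + I;
end;

begin
  WriteLn(SumTo(1000));
end.
```

Build it once with and once without the directives and time both several times; a single run tells you very little.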
Not sure if MS Defender may get in the way... It may see 32-bit code as unusual and interfere more.
Probably a ton of other stuff...