That highly depends on the chip and the chip maker.
E.g. repne scasd and family used to be slow on Intel, but fast on AMD and VIA.
With FPC it also depends on your optimization settings for the CPU family, -Cp<cpu> (instruction set) and -Op<cpu> (optimization target), and on cache behaviour, for that matter.
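As a rough sketch of how those two switches are passed on the command line: the file name myprog.pas is hypothetical, and COREAVX2 is only an example value, so check what your own compiler build accepts first.

```shell
# Print compiler info, including the CPU instruction sets this build
# accepts as values for -Cp and -Op
fpc -i

# myprog.pas is a hypothetical project file; COREAVX2 is an example value
# and must be one of the names listed by the command above
fpc -O3 -CpCOREAVX2 -OpCOREAVX2 myprog.pas
```

Keep in mind that a binary built with -Cp for a newer instruction set may refuse to run on older processors, while -Op only changes how the code is tuned.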
In general the compiler picks a good option. I would not waste time on assembler optimizations unless you have to, and unless you know exactly which processor make and model you are optimizing for.
It often makes me laugh...

"Hey, I can write optimized code in assembler!" (NOT!), which translates to: "Hey, I am wasting my time"... in most, but not all, cases.