@ALLIGATOR,
I find your results truly baffling.
Anyway, I'm using FPC v3.2.2 (the stable release version) on Win7 on an i860 (a fairly old CPU but that doesn't matter in this case.)
These are the results I get from the program you linked to (program untouched, run as-is):
16, Not packed 1: 56895 ms.
9, Packed 1: 13330 ms.
Results with -O4:
16, Not packed 1: 15464 ms.
9, Packed 1: 13147 ms.
It's quite telling that with -O4 the packed time is basically unchanged, whereas the unpacked one is almost 4 times faster (56895 ms down to 15464 ms, roughly 3.7x).
There's something fishy in there.
@BrunoK,
I understand what you're saying about packing and cache space, but this is the first time _ever_ that I've seen unaligned data processed faster than aligned data.
I cannot argue with the timing results; they are what they are. But I believe there is something "not quite right" someplace. I just don't know where.
I believe that in the worst possible case, aligned data would take as long as unaligned data.
IOW, FPC has got the world upside down.
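For anyone following along, here is a minimal sketch of the kind of record pair I have in mind when I say aligned vs. unaligned. The field layout (a Double plus a Byte) and the names TNotPacked and TPacked are my own assumption for illustration, not taken from the linked program, and I've forced {$PACKRECORDS 8} so the unpacked layout doesn't depend on the target's default.

```pascal
program RecordSizes;
{$mode objfpc}
{$PACKRECORDS 8}  // assumption: align fields on up to 8-byte boundaries

type
  // Hypothetical layouts; the records in the linked program may differ.
  TNotPacked = record
    Value: Double;  // 8 bytes, naturally aligned
    Flag: Byte;     // 1 byte, followed by padding to keep Value aligned in arrays
  end;

  TPacked = packed record
    Value: Double;  // 8 bytes
    Flag: Byte;     // 1 byte, no padding after it
  end;

begin
  // Print the size of one element of each kind.
  WriteLn('Not packed: ', SizeOf(TNotPacked));
  WriteLn('Packed:     ', SizeOf(TPacked));
end.
```

In an array of the packed kind, each element is 9 bytes, so most of the Doubles end up at addresses that are not multiples of 8 and some will straddle a cache line. That is the case I would expect to be slower, or at best equal.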
ETA:
I wonder what the results from Delphi are...
It would be nice if someone with a copy of Delphi would run the programs and post the results.