First, my delay in replying is because I live in Australia and the time difference causes problems.
Jonas wrote "OS X does not support more than 4GB of statically allocated data, even on 64 bit platforms (probably for efficiency reasons)."
But this contradicts the Apple developer document.
No, it does not.
Also, it seems pointless to have 64 bit addressing if it cannot be used.
It can be used perfectly well, just not for statically allocated data. You can dynamically allocate as much as you want (up to the limits mentioned in the Apple developer documentation). In FPC you can do this using, among others, getmem/freemem/new/dispose, dynamic arrays and classes.
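As a rough sketch of the dynamic route (the program name, ElementCount and the variable names are made up for illustration, and a 64-bit target with enough free memory is assumed), something along these lines should work:

```pascal
program DynBigArray;
{$mode objfpc}

const
  // Hypothetical element count: 600 million doubles is about 4.8 GB.
  // Lower it if the machine does not have that much memory available.
  ElementCount = 600000000;

var
  BigArray: array of Double;  // only the reference lives in static data
  P: PDouble;

begin
  // Dynamic array: the storage is taken from the heap at run time.
  SetLength(BigArray, ElementCount);
  BigArray[High(BigArray)] := 1.0;
  WriteLn('last element (dynamic array): ', BigArray[High(BigArray)]:0:1);
  SetLength(BigArray, 0);  // release the storage again

  // The same idea with getmem/freemem on a raw block.
  GetMem(P, SizeUInt(ElementCount) * SizeOf(Double));
  P[ElementCount - 1] := 2.0;
  WriteLn('last element (getmem block):  ', P[ElementCount - 1]:0:1);
  FreeMem(P);
end.
```

In both cases the static data only holds a pointer-sized variable, so the size of the block itself does not count towards the limit on statically allocated data.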
This test appears to show that 64-bit addressing is working. If so, why don't the declarations in the first post work?
It's because the limit is related to addresses that are encoded directly in an instruction. As long as such an address is within 4GB of the instruction that accesses it, things will work. The address that is encoded in your test program is the start address of BigArray, and that address is well within 4GB of your main program. If you tried to directly access BigArray[high(BigArray)], you would again get an error, because that address is located 9.6GB from your main program. The reason it works if you use an index is that, again, only the start address of BigArray is directly encoded in the instruction. Afterwards the index is loaded, multiplied by 8, and added to this start address (all using semantically separate operations). So the part of the address that is encoded in the instruction is still within the 4GB limit.
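To put numbers on this, here is a small sketch of the arithmetic. The element count is an assumption, derived from the 9.6GB figure and the 8-byte element size mentioned above. The offsets are measured from the start of BigArray rather than from the accessing instruction, which is close enough for the purpose of the illustration.

```pascal
program OffsetSketch;
{$mode objfpc}

const
  // 4GB limit on statically encoded offsets, as described above.
  Limit: Int64 = Int64(4) * 1024 * 1024 * 1024;
  ElemSize = 8;                     // bytes per element
  ElemCount: Int64 = 1200000000;    // assumed: 1.2e9 * 8 bytes = 9.6GB

var
  LastOffset: Int64;

begin
  // Offset of the last element relative to the start of the array.
  LastOffset := (ElemCount - 1) * ElemSize;
  WriteLn('offset of first element: 0 bytes');
  WriteLn('offset of last element:  ', LastOffset, ' bytes');
  // The start address can be encoded directly; the last element cannot.
  WriteLn('last element within limit: ', LastOffset < Limit);
end.
```

With an indexed access only the offset 0 ends up in the instruction, and the large product of index and element size is added at run time in a register.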
Technically: the relocation used on OS X to address symbols in a position-independent way is limited to a 4GB offset (32 bit RIP-relative offset). While it is possible to use instructions that allow a full 64 bit offset to be statically encoded, this is slower and increases the code size (you need an extra, and large, instruction per access). C compilers use the slow version by default and the fast version if you compile with -fno-common. FPC always uses the fast version. It would be possible to add a command line option to force the use of the slow version, but that would be a lot of work for very little gain. We would also have to compile all of our RTL code with that switch for it to be useful, because if your main program contains an 8GB array and data from the RTL is placed after it, the RTL can no longer use the fast instruction variant either.