@440bx,
Nice detailed explanation. I got a bit puzzled by seeing 32bit assembly and 64bit memory layout. The memory layout included 17 locations and did not match with the Pascal code that had only 12 strings (array[0..11] of string).
Mea culpa, being lazy. In the case of shortstrings, since each string occupies 256 bytes, I didn't want to show the entire layout of the DtFmt array. Once I made that choice, I could simply stay in Lazarus and show the code generated. I usually compile for 32bit when I want to track variable usage because, in 32bit, variable references are very often (though not always) fixed stack locations, which are easier to keep track of mentally than register contents that keep changing, as is the case in 64bit.
You obviously did put noticeable effort and probably some of the tools you used were for 64bit and had to recompile for 64bit?
In this particular case, it was sheer laziness. For LONGSTRINGS OFF, I could use the facilities offered by the Lazarus debugging environment to show everything that needed to be shown. For LONGSTRINGS ON, using the Lazarus environment gets a bit convoluted because there is an array of pointers to show as well as the null terminated strings, and its memory viewer is awkward to use. Because of that, I switched to the Visual Studio debugger, which makes it really easy to look at memory anywhere you want.
The original post - where everything is parallel - should have been like this:
LONGSTRINGS OFF
00000001000015A8 E8 43 9A 00 00 call main+9A70h (010000AFF0h)
00000001000015AD 48 89 C3 mov rbx,rax
; v move i into eax
00000001000015B0 8B 05 4A EA 01 00 mov eax,dword ptr [main+1EA80h (0100020000h)]
; v multiply by 256 - sizeof(shortstring)
00000001000015B6 48 C1 E0 08 shl rax,8
; v move address of DtFmt array into rdx
; note the address 01000150C0h
; that's where DtFmt is located
00000001000015BA 48 8D 15 FF 3A 01 00 lea rdx,[main+13B40h (01000150C0h)]
; v compute the address of DtFmt[i] into r8
00000001000015C1 4C 8D 04 02 lea r8,[rdx+rax]
; v the rest is writeln stuff
00000001000015C5 48 89 DA mov rdx,rbx
00000001000015C8 B9 00 00 00 00 mov ecx,0
00000001000015CD E8 5E 9C 00 00 call main+9CB0h (010000B230h)
00000001000015D2 E8 C9 5F 00 00 call main+6020h (01000075A0h)
00000001000015D7 48 89 D9 mov rcx,rbx
00000001000015DA E8 91 9B 00 00 call main+9BF0h (010000B170h)
00000001000015DF E8 BC 5F 00 00 call main+6020h (01000075A0h)
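The address arithmetic in the listing above can be checked by hand. The following is only a rough Python sketch of it, not part of the original test program; the names DTFMT_BASE, SHORTSTRING_SIZE and element_address are made up for illustration, and the base address is simply the one shown in the lea instruction.

```python
# Values copied from the disassembly in this post.
DTFMT_BASE = 0x1000150C0   # address loaded by "lea rdx,[main+13B40h]"
SHORTSTRING_SIZE = 256     # 1 length byte + 255 character bytes


def element_address(i: int) -> int:
    """Mirror "shl rax,8" followed by "lea r8,[rdx+rax]"."""
    return DTFMT_BASE + (i << 8)   # i << 8 is the same as i * 256


for i in (0, 1, 11, 12):
    print(i, hex(element_address(i)))
# 0 0x1000150c0
# 1 0x1000151c0
# 11 0x100015bc0
# 12 0x100015cc0   (one past the last element of array[0..11])
```

The value for index 12 is the first byte after the array, since 12 * 256 bytes are reserved starting at the base address.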
the memory layout of the DtFmt array is
; v first array element - note size 0
; 256 bytes reserved for it
0x00000001000150C0 00 20 20 20 20 20 20 20 .
0x00000001000150C8 20 20 20 20 20 20 20 20
0x00000001000150D0 20 20 20 20 20 20 20 20
0x00000001000150D8 20 20 20 20 20 20 20 20
; v 256 bytes later....
; v second array element - note size 16
; 256 bytes reserved for it
0x00000001000151C0 10 79 79 79 79 2d 6d 6d .yyyy-mm
0x00000001000151C8 2d 64 64 20 68 68 3a 6e -dd hh:n
0x00000001000151D0 6e 20 20 20 20 20 20 20 n
0x00000001000151D8 20 20 20 20 20 20 20 20
0x00000001000151E0 20 20 20 20 20 20 20 20
; v 2816 bytes (11 * 256) from the start of the array (last element)
; 256 bytes reserved for it
; note size 0
0x0000000100015BC0 00 20 20 20 20 20 20 20 .
0x0000000100015BC8 20 20 20 20 20 20 20 20
0x0000000100015BD0 20 20 20 20 20 20 20 20
0x0000000100015BD8 20 20 20 20 20 20 20 20
0x0000000100015BE0 20 20 20 20 20 20 20 20
0x0000000100015BE8 20 20 20 20 20 20 20 20
0x0000000100015BF0 20 20 20 20 20 20 20 20
0x0000000100015BF8 20 20 20 20 20 20 20 20
0x0000000100015C00 20 20 20 20 20 20 20 20
0x0000000100015C08 20 20 20 20 20 20 20 20
0x0000000100015C10 20 20 20 20 20 20 20 20
0x0000000100015C18 20 20 20 20 20 20 20 20
0x0000000100015C20 20 20 20 20 20 20 20 20
0x0000000100015C28 20 20 20 20 20 20 20 20
0x0000000100015C30 20 20 20 20 20 20 20 20
0x0000000100015C38 20 20 20 20 20 20 20 20
0x0000000100015C40 20 20 20 20 20 20 20 20
0x0000000100015C48 20 20 20 20 20 20 20 20
0x0000000100015C50 20 20 20 20 20 20 20 20
0x0000000100015C58 20 20 20 20 20 20 20 20
0x0000000100015C60 20 20 20 20 20 20 20 20
0x0000000100015C68 20 20 20 20 20 20 20 20
0x0000000100015C70 20 20 20 20 20 20 20 20
0x0000000100015C78 20 20 20 20 20 20 20 20
0x0000000100015C80 20 20 20 20 20 20 20 20
0x0000000100015C88 20 20 20 20 20 20 20 20
0x0000000100015C90 20 20 20 20 20 20 20 20
0x0000000100015C98 20 20 20 20 20 20 20 20
0x0000000100015CA0 20 20 20 20 20 20 20 20
0x0000000100015CA8 20 20 20 20 20 20 20 20
0x0000000100015CB0 20 20 20 20 20 20 20 20
0x0000000100015CB8 20 20 20 20 20 20 20 20
; ^ total size of the array above is
; 12 * 256
; v this string added just to mark
; the end of the above array
0x0000000100015CC0 1a 61 62 63 64 65 66 67 .abcdefg
0x0000000100015CC8 68 69 6a 6b 6c 6d 6e 6f hijklmno
0x0000000100015CD0 70 71 72 73 74 75 76 77 pqrstuvw
0x0000000100015CD8 78 79 7a 20 20 20 20 20 xyz
0x0000000100015CE0 20 20 20 20 20 20 20 20
In that case the following is true:
1. there is no array of pointers pointing to individual strings
2. strings are in a READ/WRITE section of the executable, therefore they can be modified.
3. a '' string has 256 bytes reserved in the READ/WRITE section it resides in. (therefore it can be changed)
note the address of the first element 0x00000001000150C0
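To make the shortstring layout concrete, here is a small Python sketch that decodes the bytes from the dump above: the first byte is the length and the characters follow it. The function name decode_shortstring is hypothetical, and latin-1 is assumed only so that every byte value maps to a character.

```python
def decode_shortstring(raw: bytes) -> str:
    """Decode a shortstring: byte 0 is the length, the text follows."""
    length = raw[0]
    return raw[1:1 + length].decode("latin-1")


# First bytes of the second array element (address 0x1000151C0 in the dump).
second = bytes.fromhex(
    "10 79 79 79 79 2d 6d 6d "
    "2d 64 64 20 68 68 3a 6e "
    "6e 20 20 20 20 20 20 20 "
)
# First bytes of the first array element: length 0, the rest is filler.
first = bytes.fromhex("00 20 20 20 20 20 20 20")

print(repr(decode_shortstring(second)))  # 'yyyy-mm-dd hh:nn'
print(repr(decode_shortstring(first)))   # ''
```

The bytes after the declared length are still part of the 256 bytes reserved for the element; they are simply not part of the string's current value.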
in-memory layout of the executable
; start of executable image
0x100000000, Image, 156 kB, WCX, PascalArrayOfStrings.exe
0x100000000, Image: Commit, 4 kB, R, PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX, PascalArrayOfStrings.exe
; v read/write attribute
; strings can be modified
0x100015000, Image: Commit, 12 kB, RW, PascalArrayOfStrings.exe
; remainder of executable image
0x100018000, Image: Commit, 32 kB, R, PascalArrayOfStrings.exe
0x100020000, Image: Commit, 16 kB, RW, PascalArrayOfStrings.exe
0x100024000, Image: Commit, 4 kB, WC, PascalArrayOfStrings.exe
0x100025000, Image: Commit, 4 kB, RW, PascalArrayOfStrings.exe
0x100026000, Image: Commit, 4 kB, WC, PascalArrayOfStrings.exe
The address 0x1000150C0 falls inside the read/write region that starts at 0x100015000,
which makes the strings modifiable.
The first post had the LONGSTRINGS OFF case severely "abbreviated".
Just to make this post complete, the LONGSTRINGS ON case is a straight copy of
the original post.
LONGSTRINGS ON
Here is what happens for the very same array when LONGSTRINGS is on:
v '' string
0x00000001000150D0 0000000000000000 0000000100017018 .........p......
0x00000001000150E0 0000000100017048 0000000100017070 Hp......pp......
0x00000001000150F0 0000000100017098 00000001000170c0 ˜p......Àp......
0x0000000100015100 00000001000170e0 0000000100017100 àp.......q......
0x0000000100015110 0000000100017120 0000000100017140 q......@q......
0x0000000100015120 0000000100017160 0000000000000000 `q..............
^ '' string
0x0000000100015130 000000010001f060 0001f07000000008 `ð..........pð..
0x0000000100015140 0000038000000001 000000010001f400 ....€....ô......
0x0000000100015150 0001f79000000380 0000038000000001 €....÷......€...
The following is true in that case:
1. there is an array of pointers (to null terminated strings)
2. a '' is simply a null pointer
For reference, note the address of the array of pointers: 0x1000150D0.
Following the pointers in the array shows:
0x0000000100017018 79 79 79 79 2d 6d 6d 2d 64 64 20 68 yyyy-mm-dd h
0x0000000100017024 68 3a 6e 6e 00 00 00 00 00 00 00 00 h:nn........
0x0000000100017030 00 00 01 00 00 00 00 00 ff ff ff ff ........ÿÿÿÿ
0x000000010001703C ff ff ff ff 0a 00 00 00 00 00 00 00 ÿÿÿÿ........
0x0000000100017048 79 79 79 79 2d 6d 6d 2d 64 64 00 00 yyyy-mm-dd..
0x0000000100017054 00 00 00 00 00 00 01 00 00 00 00 00 ............
0x0000000100017060 ff ff ff ff ff ff ff ff 0a 00 00 00 ÿÿÿÿÿÿÿÿ....
0x000000010001706C 00 00 00 00 79 79 79 79 2d 6d 6d 2d ....yyyy-mm-
0x0000000100017078 64 64 00 00 00 00 00 00 00 00 01 00 dd..........
0x0000000100017084 00 00 00 00 ff ff ff ff ff ff ff ff ....ÿÿÿÿÿÿÿÿ
0x0000000100017090 09 00 00 00 00 00 00 00 64 64 64 64 ........dddd
0x000000010001709C 64 64 20 74 74 00 00 00 00 00 00 00 dd tt.......
0x00000001000170A8 00 00 01 00 00 00 00 00 ff ff ff ff ........ÿÿÿÿ
0x00000001000170B4 ff ff ff ff 06 00 00 00 00 00 00 00 ÿÿÿÿ........
0x00000001000170C0 64 64 64 64 64 64 00 00 00 00 01 00 dddddd......
0x00000001000170CC 00 00 00 00 ff ff ff ff ff ff ff ff ....ÿÿÿÿÿÿÿÿ
0x00000001000170D8 02 00 00 00 00 00 00 00 74 74 00 00 ........tt..
0x00000001000170E4 00 00 00 00 00 00 01 00 00 00 00 00 ............
0x00000001000170F0 ff ff ff ff ff ff ff ff 07 00 00 00 ÿÿÿÿÿÿÿÿ....
0x00000001000170FC 00 00 00 00 64 64 64 64 64 20 74 00 ....ddddd t.
0x0000000100017108 00 00 01 00 00 00 00 00 ff ff ff ff ........ÿÿÿÿ
0x0000000100017114 ff ff ff ff 05 00 00 00 00 00 00 00 ÿÿÿÿ........
0x0000000100017120 64 64 64 64 64 00 00 00 00 00 01 00 ddddd.......
0x000000010001712C 00 00 00 00 ff ff ff ff ff ff ff ff ....ÿÿÿÿÿÿÿÿ
0x0000000100017138 01 00 00 00 00 00 00 00 74 00 00 00 ........t...
0x0000000100017144 00 00 00 00 00 00 01 00 00 00 00 00 ............
0x0000000100017150 ff ff ff ff ff ff ff ff 01 00 00 00 ÿÿÿÿÿÿÿÿ....
0x000000010001715C 00 00 00 00 66 00 00 00 00 00 00 00 ....f.......
0x0000000100017168 2a 70 72 65 73 73 20 3c 65 6e 74 65 *press <ente
0x0000000100017174 72 3e 2f 3c 72 65 74 75 72 6e 3e 20 r>/<return>
0x0000000100017180 74 6f 20 65 6e 64 20 74 68 69 73 20 to end this
0x000000010001718C 70 72 6f 67 72 61 6d 00 00 00 00 00 program.....
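Following a pointer by hand amounts to reading bytes from the target address up to the first zero byte. A minimal Python sketch of that, using only bytes copied from the dump above (the names DUMP_BASE, DUMP, POINTERS and read_cstring are invented for this illustration, and only the first three array entries are modelled):

```python
# Bytes copied from the dump, starting at address 0x100017018.
DUMP_BASE = 0x100017018
DUMP = bytes.fromhex(
    "79 79 79 79 2d 6d 6d 2d 64 64 20 68 "
    "68 3a 6e 6e 00 00 00 00 00 00 00 00 "
    "00 00 01 00 00 00 00 00 ff ff ff ff "
    "ff ff ff ff 0a 00 00 00 00 00 00 00 "
    "79 79 79 79 2d 6d 6d 2d 64 64 00 00 "
)

# First three entries of the pointer array at 0x1000150D0.
POINTERS = [0x0, 0x100017018, 0x100017048]


def read_cstring(address: int) -> str:
    """Return the null terminated string at address; a nil pointer is ''."""
    if address == 0:
        return ""
    start = address - DUMP_BASE
    end = DUMP.index(0, start)          # position of the terminating zero
    return DUMP[start:end].decode("latin-1")


for p in POINTERS:
    print(hex(p), repr(read_cstring(p)))
# 0x0 ''
# 0x100017018 'yyyy-mm-dd hh:nn'
# 0x100017048 'yyyy-mm-dd'
```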
A very important thing to notice is where the array of pointers is located and
where the strings are located. They are NOT in the same PE file section.
Because the two sections have different READ/WRITE attributes, the pointers can be
altered, but the strings the original pointers pointed to cannot, because they reside
in a READ ONLY section of the PE file.
That's why the strings are writeable when LONGSTRINGS is OFF, but attempting to change
the contents of a string when LONGSTRINGS is ON results in an access violation.
Here is the in-memory view of the mapped program:
0x100000000, Image, 152 kB, WCX, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100000000, Image: Commit, 4 kB, R, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100015000, Image: Commit, 8 kB, RW, lib\x86_64-win64\PascalArrayOfStrings.exe // the pointer array is in a read/write section
0x100017000, Image: Commit, 32 kB, R, lib\x86_64-win64\PascalArrayOfStrings.exe // the strings are in a READ ONLY section
// that's the reason for access violations when
// attempting to change the string contents
0x10001f000, Image: Commit, 16 kB, RW, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100023000, Image: Commit, 4 kB, WC, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100024000, Image: Commit, 4 kB, RW, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100025000, Image: Commit, 4 kB, WC, lib\x86_64-win64\PascalArrayOfStrings.exe
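The lookup "which region does this address fall in?" can also be done mechanically. A short Python sketch, assuming the region list is typed in by hand from the map above; REGIONS and protection_of are hypothetical names, not part of any tool:

```python
# (base address, size in kB, protection) copied from the LONGSTRINGS ON map.
REGIONS = [
    (0x100000000, 4, "R"),
    (0x100001000, 80, "RX"),
    (0x100015000, 8, "RW"),
    (0x100017000, 32, "R"),
    (0x10001F000, 16, "RW"),
    (0x100023000, 4, "WC"),
    (0x100024000, 4, "RW"),
    (0x100025000, 4, "WC"),
]


def protection_of(address: int):
    """Return the protection of the region containing address, or None."""
    for base, size_kb, protection in REGIONS:
        if base <= address < base + size_kb * 1024:
            return protection
    return None


print(protection_of(0x1000150D0))  # RW  (the array of pointers)
print(protection_of(0x100017018))  # R   (the first string)
```

The pointer array lands in a read/write region while the string it points to lands in a read only one, which matches the access violation described above.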
The greatest effort goes into editing the post LOL.