
Author Topic: Quick question about const array of string  (Read 2463 times)

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: Quick question about const array of string
« Reply #15 on: June 27, 2019, 01:13:55 am »
What are these three numbers before each string:
Code: ASM
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"

Maybe it has something to do with how strings are represented in memory :)

YES!!  :D

I somehow forgot about it. Thank you for pointing it out.


engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: Quick question about const array of string
« Reply #16 on: June 27, 2019, 02:22:07 am »
@Thaddy,
Thank you for trying to help. I missed your comment the first time.

@440bx,
Nice detailed explanation. I got a bit puzzled by seeing 32bit assembly and 64bit memory layout. The memory layout included 17 locations and did not match with the Pascal code that had only 12 strings (array[0..11] of string).

You obviously put in noticeable effort. Probably some of the tools you used were 64bit and you had to recompile for 64bit?

440bx

  • Hero Member
  • *****
  • Posts: 1189
Re: Quick question about const array of string
« Reply #17 on: June 27, 2019, 04:55:54 am »
@440bx,
Nice detailed explanation. I got a bit puzzled by seeing 32bit assembly and 64bit memory layout. The memory layout included 17 locations and did not match with the Pascal code that had only 12 strings (array[0..11] of string).
Mea culpa, being lazy.  In the case of shortstrings, since each string occupies 256 bytes, I didn't want to show the entire layout of the DtFmt array.  Once I made that choice, I could simply stay in Lazarus and show the code generated.  I usually compile for 32bit when I'm interested in tracking variable usage because in 32bit, very often but not always, variable references are fixed stack locations, which makes them easier to keep track of mentally than register contents that keep changing, as is the case in 64bit.

You obviously put in noticeable effort. Probably some of the tools you used were 64bit and you had to recompile for 64bit?
In this particular case, it was sheer laziness.  For LONGSTRINGS OFF, I could use the facilities offered by the Lazarus debugging environment to show everything that needed to be shown.  For LONGSTRINGS ON, using the Lazarus environment gets a bit convoluted because there is an array of pointers to show as well as the null terminated strings, and the memory viewing abilities of the Lazarus environment are awkward to use.  Because of that, I switched to the Visual Studio debugger, which makes it really easy to look at memory anywhere you want.

The original post - where everything is parallel - should have been like this:

LONGSTRINGS OFF

Code: Pascal
00000001000015A8 E8 43 9A 00 00       call        main+9A70h (010000AFF0h)
00000001000015AD 48 89 C3             mov         rbx,rax

                                      ; v move i into eax
00000001000015B0 8B 05 4A EA 01 00    mov         eax,dword ptr [main+1EA80h (0100020000h)]

                                      ; v multiply by 256 - sizeof(shortstring)
00000001000015B6 48 C1 E0 08          shl         rax,8

                                      ; v move address of DtFmt array into rdx
                                      ; note the address 01000150C0h
                                      ; that's where DtFmt is located
00000001000015BA 48 8D 15 FF 3A 01 00 lea         rdx,[main+13B40h (01000150C0h)]

                                      ; v compute the address of DtFmt[i] into r8
00000001000015C1 4C 8D 04 02          lea         r8,[rdx+rax]

                                      ; v the rest is writeln stuff
00000001000015C5 48 89 DA             mov         rdx,rbx
00000001000015C8 B9 00 00 00 00       mov         ecx,0
00000001000015CD E8 5E 9C 00 00       call        main+9CB0h (010000B230h)
00000001000015D2 E8 C9 5F 00 00       call        main+6020h (01000075A0h)
00000001000015D7 48 89 D9             mov         rcx,rbx
00000001000015DA E8 91 9B 00 00       call        main+9BF0h (010000B170h)
00000001000015DF E8 BC 5F 00 00       call        main+6020h (01000075A0h)
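
The index arithmetic in the listing above (shift the index left by 8, then add the array base) can be checked from Pascal. This is only a minimal sketch, assuming LONGSTRINGS OFF; the three-element Fmt array is a hypothetical stand-in for DtFmt, not the original declaration.

```pascal
{$MODE OBJFPC}
{$H-}  // LONGSTRINGS OFF: "string" means ShortString (256 bytes)

var
  // Hypothetical stand-in for the DtFmt array, shortened to three elements.
  Fmt: array[0..2] of string = ('', 'yyyy-mm-dd hh:nn', 'tt');
  i: Integer;
  Base, Addr: PtrUInt;
begin
  Base := PtrUInt(@Fmt[0]);
  for i := Low(Fmt) to High(Fmt) do
  begin
    // Same computation as the disassembly: base + (i shl 8)
    Addr := Base + (PtrUInt(i) shl 8);
    // Compare against the address the compiler computes for Fmt[i]
    Writeln(i, ': ', Addr = PtrUInt(@Fmt[i]));
  end;
end.
```

Each line should report TRUE, since SizeOf(ShortString) is 256 and the elements are stored back to back.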

the memory layout of the DtFmt array is

Code: Pascal
                 ;  v first array element - note size 0
                 ;  256 bytes reserved for it
0x00000001000150C0  00 20 20 20 20 20 20 20  .
0x00000001000150C8  20 20 20 20 20 20 20 20
0x00000001000150D0  20 20 20 20 20 20 20 20
0x00000001000150D8  20 20 20 20 20 20 20 20

                 ;  v 256 bytes later....

                 ;  v second array element - note size 16
                 ;  256 bytes reserved for it
0x00000001000151C0  10 79 79 79 79 2d 6d 6d  .yyyy-mm
0x00000001000151C8  2d 64 64 20 68 68 3a 6e  -dd hh:n
0x00000001000151D0  6e 20 20 20 20 20 20 20  n
0x00000001000151D8  20 20 20 20 20 20 20 20
0x00000001000151E0  20 20 20 20 20 20 20 20

                 ; v 2816 bytes later (last element)
                 ;   256 bytes reserved for it
                 ;   note size 0

0x0000000100015BC0  00 20 20 20 20 20 20 20  .
0x0000000100015BC8  20 20 20 20 20 20 20 20
0x0000000100015BD0  20 20 20 20 20 20 20 20
0x0000000100015BD8  20 20 20 20 20 20 20 20
0x0000000100015BE0  20 20 20 20 20 20 20 20
0x0000000100015BE8  20 20 20 20 20 20 20 20
0x0000000100015BF0  20 20 20 20 20 20 20 20
0x0000000100015BF8  20 20 20 20 20 20 20 20
0x0000000100015C00  20 20 20 20 20 20 20 20
0x0000000100015C08  20 20 20 20 20 20 20 20
0x0000000100015C10  20 20 20 20 20 20 20 20
0x0000000100015C18  20 20 20 20 20 20 20 20
0x0000000100015C20  20 20 20 20 20 20 20 20
0x0000000100015C28  20 20 20 20 20 20 20 20
0x0000000100015C30  20 20 20 20 20 20 20 20
0x0000000100015C38  20 20 20 20 20 20 20 20
0x0000000100015C40  20 20 20 20 20 20 20 20
0x0000000100015C48  20 20 20 20 20 20 20 20
0x0000000100015C50  20 20 20 20 20 20 20 20
0x0000000100015C58  20 20 20 20 20 20 20 20
0x0000000100015C60  20 20 20 20 20 20 20 20
0x0000000100015C68  20 20 20 20 20 20 20 20
0x0000000100015C70  20 20 20 20 20 20 20 20
0x0000000100015C78  20 20 20 20 20 20 20 20
0x0000000100015C80  20 20 20 20 20 20 20 20
0x0000000100015C88  20 20 20 20 20 20 20 20
0x0000000100015C90  20 20 20 20 20 20 20 20
0x0000000100015C98  20 20 20 20 20 20 20 20
0x0000000100015CA0  20 20 20 20 20 20 20 20
0x0000000100015CA8  20 20 20 20 20 20 20 20
0x0000000100015CB0  20 20 20 20 20 20 20 20
0x0000000100015CB8  20 20 20 20 20 20 20 20

                 ;  ^ total size of the array above is
                 ;  12 * 256

                 ;  v this string added just to mark
                 ;    the end of the above array

0x0000000100015CC0  1a 61 62 63 64 65 66 67  .abcdefg
0x0000000100015CC8  68 69 6a 6b 6c 6d 6e 6f  hijklmno
0x0000000100015CD0  70 71 72 73 74 75 76 77  pqrstuvw
0x0000000100015CD8  78 79 7a 20 20 20 20 20  xyz
0x0000000100015CE0  20 20 20 20 20 20 20 20

In that case the following is true:

1. there is no array of pointers pointing to individual strings
2. strings are in a READ/WRITE section of the executable, therefore they can be modified.
3. a '' string has 256 bytes reserved in the READ/WRITE section it resides in. (therefore it can be changed)
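
The three points above can be observed from Pascal itself. A minimal sketch, assuming LONGSTRINGS OFF and writable typed constants ({$J+}, the FPC default); Fmt is a hypothetical three-element stand-in for DtFmt:

```pascal
{$MODE OBJFPC}
{$H-}  // LONGSTRINGS OFF: "string" means ShortString
{$J+}  // writable typed constants (the FPC default, stated explicitly)

const
  // Hypothetical stand-in for DtFmt, shortened to three elements.
  Fmt: array[0..2] of string = ('', 'yyyy-mm-dd hh:nn', 'tt');
begin
  // The characters are stored inline: 3 elements * 256 bytes, no pointers.
  Writeln(SizeOf(Fmt));
  // Index 0 of a ShortString is its length byte (0x10 in the dump above).
  Writeln(Ord(Fmt[1][0]));
  // The '' element has its full 256 bytes reserved, so it can be assigned.
  Fmt[0] := 'changed';
  Writeln(Fmt[0], ' ', Ord(Fmt[0][0]));
end.
```

The assignment to Fmt[0] succeeds because the whole array lives in a read/write section.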

note the address of the first element 0x00000001000150C0

in-memory layout of the executable
Code: Pascal
             ; start of executable image
0x100000000, Image,        156 kB, WCX, PascalArrayOfStrings.exe
0x100000000, Image: Commit,  4 kB, R,   PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX,  PascalArrayOfStrings.exe

             ;                     v read/write attribute
             ;                       strings can be modified

0x100015000, Image: Commit, 12 kB, RW,  PascalArrayOfStrings.exe

             ; remainder of executable image

0x100018000, Image: Commit, 32 kB, R,   PascalArrayOfStrings.exe
0x100020000, Image: Commit, 16 kB, RW,  PascalArrayOfStrings.exe
0x100024000, Image: Commit,  4 kB, WC,  PascalArrayOfStrings.exe
0x100025000, Image: Commit,  4 kB, RW,  PascalArrayOfStrings.exe
0x100026000, Image: Commit,  4 kB, WC,  PascalArrayOfStrings.exe

the address is in a read/write PE section which makes the strings
modifiable.

The first post had the LONGSTRINGS OFF case severely "abbreviated".

Just to make this post complete, the LONGSTRINGS ON case is a straight copy of
the original post.

LONGSTRINGS ON


Here is what happens for the very same array when LONGSTRINGS is on:
Code: Pascal
                    v '' string
0x00000001000150D0  0000000000000000 0000000100017018  .........p......
0x00000001000150E0  0000000100017048 0000000100017070  Hp......pp......
0x00000001000150F0  0000000100017098 00000001000170c0  ˜p......Àp......
0x0000000100015100  00000001000170e0 0000000100017100  àp.......q......
0x0000000100015110  0000000100017120 0000000100017140   q......@q......
0x0000000100015120  0000000100017160 0000000000000000  `q..............
                                     ^ '' string

0x0000000100015130  000000010001f060 0001f07000000008  `ð..........pð..
0x0000000100015140  0000038000000001 000000010001f400  ........ô......
0x0000000100015150  0001f79000000380 0000038000000001  €....÷.........
  13.  

The following is true in that case:

1. there is an array of pointers (to null terminated strings)
2. a '' is simply a null pointer
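
Both observations can be checked without a debugger. A minimal sketch, assuming LONGSTRINGS ON; Fmt is a hypothetical three-element stand-in for DtFmt:

```pascal
{$MODE OBJFPC}
{$H+}  // LONGSTRINGS ON: "string" means AnsiString (a pointer)

const
  // Hypothetical stand-in for DtFmt, shortened to three elements.
  Fmt: array[0..2] of string = ('', 'yyyy-mm-dd', 'tt');
begin
  // The array holds one pointer per element, not the characters.
  Writeln(SizeOf(Fmt) = 3 * SizeOf(Pointer));
  // An empty AnsiString is represented by a nil pointer.
  Writeln(Pointer(Fmt[0]) = nil);
  // A non-empty one points at character data stored elsewhere.
  Writeln(Pointer(Fmt[1]) <> nil);
end.
```

All three lines should print TRUE, matching the zero entries at both ends of the pointer array in the dump.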

for reference, note the address of the array of pointers, address 0x1000150D0

Following the pointers in the array, shows
Code: Pascal
0x0000000100017018  79 79 79 79 2d 6d 6d 2d 64 64 20 68  yyyy-mm-dd h
0x0000000100017024  68 3a 6e 6e 00 00 00 00 00 00 00 00  h:nn........
0x0000000100017030  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
0x000000010001703C  ff ff ff ff 0a 00 00 00 00 00 00 00  ÿÿÿÿ........
0x0000000100017048  79 79 79 79 2d 6d 6d 2d 64 64 00 00  yyyy-mm-dd..
0x0000000100017054  00 00 00 00 00 00 01 00 00 00 00 00  ............
0x0000000100017060  ff ff ff ff ff ff ff ff 0a 00 00 00  ÿÿÿÿÿÿÿÿ....
0x000000010001706C  00 00 00 00 79 79 79 79 2d 6d 6d 2d  ....yyyy-mm-
0x0000000100017078  64 64 00 00 00 00 00 00 00 00 01 00  dd..........
0x0000000100017084  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
0x0000000100017090  09 00 00 00 00 00 00 00 64 64 64 64  ........dddd
0x000000010001709C  64 64 20 74 74 00 00 00 00 00 00 00  dd tt.......
0x00000001000170A8  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
0x00000001000170B4  ff ff ff ff 06 00 00 00 00 00 00 00  ÿÿÿÿ........
0x00000001000170C0  64 64 64 64 64 64 00 00 00 00 01 00  dddddd......
0x00000001000170CC  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
0x00000001000170D8  02 00 00 00 00 00 00 00 74 74 00 00  ........tt..
0x00000001000170E4  00 00 00 00 00 00 01 00 00 00 00 00  ............
0x00000001000170F0  ff ff ff ff ff ff ff ff 07 00 00 00  ÿÿÿÿÿÿÿÿ....
0x00000001000170FC  00 00 00 00 64 64 64 64 64 20 74 00  ....ddddd t.
0x0000000100017108  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
0x0000000100017114  ff ff ff ff 05 00 00 00 00 00 00 00  ÿÿÿÿ........
0x0000000100017120  64 64 64 64 64 00 00 00 00 00 01 00  ddddd.......
0x000000010001712C  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
0x0000000100017138  01 00 00 00 00 00 00 00 74 00 00 00  ........t...
0x0000000100017144  00 00 00 00 00 00 01 00 00 00 00 00  ............
0x0000000100017150  ff ff ff ff ff ff ff ff 01 00 00 00  ÿÿÿÿÿÿÿÿ....
0x000000010001715C  00 00 00 00 66 00 00 00 00 00 00 00  ....f.......
0x0000000100017168  2a 70 72 65 73 73 20 3c 65 6e 74 65  *press <ente
0x0000000100017174  72 3e 2f 3c 72 65 74 75 72 6e 3e 20  r>/<return>
0x0000000100017180  74 6f 20 65 6e 64 20 74 68 69 73 20  to end this
0x000000010001718C  70 72 6f 67 72 61 6d 00 00 00 00 00  program.....

A very important thing to notice is the location where the array of pointers is located and
the location where the strings are located.  They are NOT located in the same PE file section.

Because they are not in the same section of the PE file, and the sections have different
READ/WRITE attributes, the pointers can be altered but the strings the original pointers
pointed to cannot, because they reside in a READ ONLY section of the PE file.

That's why when LONGSTRINGS are OFF the strings are writeable but when LONGSTRINGS are
ON, attempting to change the contents of the string results in an access violation.
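
If the characters of such a constant do need to be changed through a raw pointer, one way to avoid the access violation is to force a private, heap-allocated copy first. A minimal sketch, assuming the failing write went through a PChar-style pointer (the thread does not show the exact statement); UniqueString is the System unit routine, and S is a hypothetical variable used only for illustration:

```pascal
{$MODE OBJFPC}
{$H+}  // LONGSTRINGS ON

var
  // Initially S points at constant data in the read-only section.
  S: AnsiString = 'yyyy-mm-dd';
  P: PChar;
begin
  // Writing through PChar(S) at this point would touch the read-only data.
  // UniqueString gives S its own writable copy on the heap.
  UniqueString(S);
  P := PChar(S);
  P[0] := 'Y';   // safe now: the write lands in the heap copy
  Writeln(S);
end.
```

The constant in the read-only section is left untouched; only the pointer held in S changes, which fits the layout shown above.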

Here is the in-memory view of the mapped program:
Code: Pascal
0x100000000, Image,        152 kB, WCX, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100000000, Image: Commit,  4 kB, R,   lib\x86_64-win64\PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX,  lib\x86_64-win64\PascalArrayOfStrings.exe

0x100015000, Image: Commit,  8 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe   // the pointer array is in a read/write section
0x100017000, Image: Commit, 32 kB, R,   lib\x86_64-win64\PascalArrayOfStrings.exe   // the strings are in a READ ONLY section
                                                                                    // that's the reason for access violations when
                                                                                    // attempting to change the string contents
0x10001f000, Image: Commit, 16 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100023000, Image: Commit,  4 kB, WC,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100024000, Image: Commit,  4 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100025000, Image: Commit,  4 kB, WC,  lib\x86_64-win64\PascalArrayOfStrings.exe

The greatest effort goes into editing the post LOL.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

BrunoK

  • Full Member
  • ***
  • Posts: 190
  • Retired programmer
Re: Quick question about const array of string
« Reply #18 on: June 27, 2019, 05:23:34 pm »
What are these three numbers before each string:
Quote
Any idea about the other numbers?
rtl\inc\astrings.inc, structure TAnsiRec:
CodePage     code page of the AnsiString
ElementSize  number of bytes per character
Ref          reference count, -1 for constants
Len          length of the string in bytes or elements, I don't know which
Lazarus trunk r. 62137/27.10.2019 (+/- patches regarding TScrollBar, IntitalSetupDialog, Options.Environment options, SearchResults).  Lazarus 3.0.6 raw from svn.
FPC 3.0.4 32 bits. (+heaptrc with leaked ClassName+Revisited TList) , Windows 10 Pro x64 (v. 1903 / 18362.418)

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: Quick question about const array of string
« Reply #19 on: June 27, 2019, 05:42:16 pm »
What are these three numbers before each string:
Quote
Any idea about the other numbers?
rtl\inc\astrings.inc, structure TAnsiRec:
CodePage     code page of the AnsiString
ElementSize  number of bytes per character
Ref          reference count, -1 for constants
Len          length of the string in bytes or elements, I don't know which

Thank you. Length is in Elements.
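
The header fields can also be read at run time. A minimal sketch, assuming an FPC version whose System unit provides StringElementSize and StringRefCount (FPC 3.0 and later, as far as I know); the Len field is read through a raw pointer because it sits directly in front of the character data, while the offsets of the other fields are left to the RTL since the record layout is an implementation detail:

```pascal
{$MODE OBJFPC}
{$H+}  // LONGSTRINGS ON

var
  // Same constant as in the assembly listing; S itself is hypothetical.
  S: AnsiString = 'yyyy-mm-dd';
  P: PSizeInt;
begin
  // ElementSize and Ref via RTL helpers (availability is an assumption).
  Writeln('ElementSize: ', StringElementSize(S));
  Writeln('Ref: ', StringRefCount(S));
  // Len is the SizeInt stored immediately before the first character.
  P := Pointer(S);
  Writeln('Len: ', P[-1]);
end.
```

For this constant the values should match the numbers in the listing: element size 1, reference count -1, length 10.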

lucamar

  • Hero Member
  • *****
  • Posts: 2075
Re: Quick question about const array of string
« Reply #20 on: June 27, 2019, 06:16:35 pm »
Len                Length of string in ?Bytes or ?Elements, I don't know

That's easily ascertained: create a UnicodeString with say 5 characters: 'abcde', get the pointer to it and read the "Len" field. If it's 10, then it's the length in bytes; if 5, it's in characters. :)
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 2.0.2/2.0.4  - FPC 3.0.4 on:
(K|L)Ubuntu 12..16, Windows XP SP3, various DOSes.

ASerge

  • Hero Member
  • *****
  • Posts: 1411
Re: Quick question about const array of string
« Reply #21 on: June 27, 2019, 06:39:41 pm »
That's easily ascertained: create a UnicodeString with say 5 characters: 'abcde', get the pointer to it and read the "Len" field. If it's 10, then it's the length in bytes; if 5, it's in characters. :)
Code: Pascal
{$APPTYPE CONSOLE}
{$MODE OBJFPC}

var
  A: AnsiString = '12345';
  U: UnicodeString = '12345';
  W: WideString = '12345';
  PQ: PPtrUInt;   // Len of AnsiString/UnicodeString is pointer-sized
  PL: PLongWord;  // a Windows WideString (BSTR) has a 32-bit length prefix
begin
  // The length field sits immediately before the character data.
  PQ := Pointer(A);
  Writeln('Ansi:', PQ[-1]);
  PQ := Pointer(U);
  Writeln('Unicode:', PQ[-1]);
  PL := Pointer(W);
  Writeln('Wide:', PL[-1]);
  Readln;
end.
Result:
Code: Text
Ansi:5
Unicode:5
Wide:10

Thaddy

  • Hero Member
  • *****
  • Posts: 9142
Re: Quick question about const array of string
« Reply #22 on: June 27, 2019, 07:00:09 pm »
@ASerge: insert codepoint dependent characters and the results look different. Also note widestring is not a reference counted type: it has been superseded by the assignment compatible UnicodeString, which IS a Pascal string type...(try a UTF8 char that needs 4 bytes, or 3 bytes...or a couple of those mixed......drama, tragedy  :'( :'( :'( :'()
« Last Edit: June 27, 2019, 07:10:47 pm by Thaddy »
also related to equus asinus.

ASerge

  • Hero Member
  • *****
  • Posts: 1411
Re: Quick question about const array of string
« Reply #23 on: June 27, 2019, 08:22:14 pm »
@ASerge: insert codepoint dependent characters and the results look different.
No. This is the result in characters, not code points.
And the result for widestring is platform-specific. The example is for Windows.
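
To make the distinction concrete, here is a minimal sketch, assuming "characters" above means string elements (UTF-16 code units for a UnicodeString). It stores one code point outside the BMP, U+1F600, as its surrogate pair and reads both Length and the Len field:

```pascal
{$MODE OBJFPC}

var
  U: UnicodeString;
  P: PSizeInt;
begin
  // U+1F600 as a UTF-16 surrogate pair: one code point, two WideChars.
  SetLength(U, 2);
  U[1] := WideChar($D83D);
  U[2] := WideChar($DE00);
  // Length counts elements (WideChars), not code points and not bytes.
  Writeln(Length(U));
  // The Len field in front of the data holds the same element count.
  P := Pointer(U);
  Writeln(P[-1]);
end.
```

Both lines print 2: the single code point occupies two elements and four bytes.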

lucamar

  • Hero Member
  • *****
  • Posts: 2075
Re: Quick question about const array of string
« Reply #24 on: June 28, 2019, 07:51:33 am »
No. This is the result in characters, not code points.
And the result for widestring is platform-specific. The example is for Windows.

On Linux it gives this:
Code:
Ansi:5
Unicode:5
Wide:5
as was expected. :)
