Lazarus

Title: Quick question about const array of string
Post by: lucamar on June 25, 2019, 12:13:21 am
Just to test my understanding, in this construction:

Code: Pascal
const
  DTFmt: array[DTFmtKind] of string =
    ('',
     'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
     'dddddd tt', 'dddddd', 'tt',  {Long formats}
     'ddddd t', 'ddddd', 't',  {Short formats}
     'f', {Mixed short date + long time}
     '');

The two empty strings (first and last) occupy no memory, right? Or are they just two Nil pointers?
Title: Re: Quick question about const array of string
Post by: howardpc on June 25, 2019, 12:21:54 am
With {$H+} (ansistring) empty strings are Nil pointers.
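
A quick way to check this yourself is a minimal sketch like the one below. The two-element table `Fmt` is a made-up stand-in for the real array, and the program only assumes a Free Pascal compiler with {$H+}:

```pascal
program EmptyStringIsNil;

{$mode objfpc}{$H+}

const
  // Hypothetical two-element table, used only for this illustration.
  Fmt: array[0..1] of string = ('', 'yyyy-mm-dd');

begin
  // An empty AnsiString is represented by a nil pointer: no character data exists.
  WriteLn('Fmt[0] is nil: ', Pointer(Fmt[0]) = nil);
  // A non-empty constant points at its character data.
  WriteLn('Fmt[1] is nil: ', Pointer(Fmt[1]) = nil);
  // The array itself still reserves one pointer-sized slot per element.
  WriteLn('SizeOf(Fmt)  : ', SizeOf(Fmt), ' bytes');
end.
```

So the empty entries cost one pointer slot each in the array, and nothing beyond that.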
Title: Re: Quick question about const array of string
Post by: lucamar on June 25, 2019, 12:31:53 am
So it is kind of like an array of pointers of which the first and last are Nil. Is that (kind of) it?


ETA: Got a WTH question from outside, so I'll explain a little of what I'm doing.

I have some menu/listbox/radiogroup items which select the format used to insert a date/time somewhere else. To select the format I'm using the component's Tag, and I've declared an enumeration:
Code: Pascal
type
  DTFmtKind = (dtfNone,
      dtfISOFull, dtfISODate, dtfISOTime,
      dtfLongFull, dtfLongDate, dtfLongTime,
      dtfShortFull, dtfShortDate, dtfShortTime,
      dtfMixedFull,
      dtfLast);
The first and last members of the enumeration serve to test against (Sender as TComponent).Tag to see whether it's in or out of bounds. It's basically an elaborate Murphy-guard to prevent trying to access the array with an out-of-bounds index.

And if the number of possible formats changes in the future (very probable!) I'll have to change nothing but the array and the enumeration.

That's it: cautious laziness. :)
Title: Re: Quick question about const array of string
Post by: 440bx on June 25, 2019, 08:09:15 am
Quote
Just to test my understanding, in this construction:

The first thing to understand is that the way that construction is implemented depends on the setting of $LONGSTRINGS.

Consider the following very simple test program (using your array):
Code: Pascal
  1. {$APPTYPE CONSOLE}
  2.  
  3. {$LONGSTRINGS ON}
  4. //{$LONGSTRINGS OFF}
  5.  
  6.  
  7. program _PascalArrayOfStrings;
  8.  
  9. uses
  10.   Windows,
  11.   sysutils
  12.   ;
  13.  
  14.  
  15. const
  16.   DTFmt: array[0..11] of string =
  17.     ('',
  18.      'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
  19.      'dddddd tt', 'dddddd', 'tt',  {Long formats}
  20.      'ddddd t', 'ddddd', 't',  {Short formats}
  21.      'f', {Mixed short date + long time}
  22.      '');
  23.  
  24.  
  25. var
  26.   i : integer;
  27.  
  28. begin
  29.  
  30.   for i := low(DtFmt) to high(DtFmt) do
  31.   begin
  32.     writeln(DtFmt[i]);
  33.   end;
  34.  
  35.   writeln('press <enter>/<return> to end this program');
  36.   readln;
  37. end.
  38.  
The implementation and behavior are radically different depending on the setting of LONGSTRINGS.

Here is what happens when LONGSTRINGS is OFF:
Code: ASM
PascalArrayOfStrings.lpr:32       writeln(DtFmt[i]);
00401563 e8b89c0000               call   0x40b220 <fpc_get_output>
00401568 89c3                     mov    %eax,%ebx
0040156A a100804100               mov    0x418000,%eax             ; load i into eax
0040156F c1e008                   shl    $0x8,%eax                 ; multiply by 256 (sizeof(shortstring))
00401572 8d8880304100             lea    0x413080(%eax),%ecx       ; load address of string into ecx
00401578 89da                     mov    %ebx,%edx                 ; the rest is writeln stuff
0040157A b800000000               mov    $0x0,%eax
0040157F e87c9e0000               call   0x40b400 <fpc_write_text_shortstr>
00401584 e8076f0000               call   0x408490 <fpc_iocheck>
00401589 89d8                     mov    %ebx,%eax
0040158B e8d09d0000               call   0x40b360 <fpc_writeln_end>
00401590 e8fb6e0000               call   0x408490 <fpc_iocheck>
00401595 833d008041000b           cmpl   $0xb,0x418000
0040159C 7cbe                     jl     0x40155c <main+28>

In that case the following is true:

1. there is no array of pointers pointing to individual strings
2. strings are in a READ/WRITE section of the executable, therefore they can be modified.
3. a '' string has 256 bytes reserved in the READ/WRITE section it resides in. (therefore it can be changed)


Here is what happens for the very same array when LONGSTRINGS is ON:
Code: Text
                    v '' string
0x00000001000150D0  0000000000000000 0000000100017018  .........p......
0x00000001000150E0  0000000100017048 0000000100017070  Hp......pp......
0x00000001000150F0  0000000100017098 00000001000170c0  ˜p......Àp......
0x0000000100015100  00000001000170e0 0000000100017100  àp.......q......
0x0000000100015110  0000000100017120 0000000100017140   q......@q......
0x0000000100015120  0000000100017160 0000000000000000  `q..............
                                     ^ '' string

0x0000000100015130  000000010001f060 0001f07000000008  `ð..........pð..
0x0000000100015140  0000038000000001 000000010001f400  ........ô......
0x0000000100015150  0001f79000000380 0000038000000001  €....÷.........

The following is true in that case:

1. there is an array of pointers (to null terminated strings)
2. a '' is simply a null pointer

For reference, note the address of the array of pointers: 0x1000150D0.

Following the pointers in the array shows:
Code: Text
0x0000000100017018  79 79 79 79 2d 6d 6d 2d 64 64 20 68  yyyy-mm-dd h
0x0000000100017024  68 3a 6e 6e 00 00 00 00 00 00 00 00  h:nn........
0x0000000100017030  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
0x000000010001703C  ff ff ff ff 0a 00 00 00 00 00 00 00  ÿÿÿÿ........
0x0000000100017048  79 79 79 79 2d 6d 6d 2d 64 64 00 00  yyyy-mm-dd..
0x0000000100017054  00 00 00 00 00 00 01 00 00 00 00 00  ............
0x0000000100017060  ff ff ff ff ff ff ff ff 0a 00 00 00  ÿÿÿÿÿÿÿÿ....
0x000000010001706C  00 00 00 00 79 79 79 79 2d 6d 6d 2d  ....yyyy-mm-
0x0000000100017078  64 64 00 00 00 00 00 00 00 00 01 00  dd..........
0x0000000100017084  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
0x0000000100017090  09 00 00 00 00 00 00 00 64 64 64 64  ........dddd
0x000000010001709C  64 64 20 74 74 00 00 00 00 00 00 00  dd tt.......
0x00000001000170A8  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
0x00000001000170B4  ff ff ff ff 06 00 00 00 00 00 00 00  ÿÿÿÿ........
0x00000001000170C0  64 64 64 64 64 64 00 00 00 00 01 00  dddddd......
0x00000001000170CC  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
0x00000001000170D8  02 00 00 00 00 00 00 00 74 74 00 00  ........tt..
0x00000001000170E4  00 00 00 00 00 00 01 00 00 00 00 00  ............
0x00000001000170F0  ff ff ff ff ff ff ff ff 07 00 00 00  ÿÿÿÿÿÿÿÿ....
0x00000001000170FC  00 00 00 00 64 64 64 64 64 20 74 00  ....ddddd t.
0x0000000100017108  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
0x0000000100017114  ff ff ff ff 05 00 00 00 00 00 00 00  ÿÿÿÿ........
0x0000000100017120  64 64 64 64 64 00 00 00 00 00 01 00  ddddd.......
0x000000010001712C  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
0x0000000100017138  01 00 00 00 00 00 00 00 74 00 00 00  ........t...
0x0000000100017144  00 00 00 00 00 00 01 00 00 00 00 00  ............
0x0000000100017150  ff ff ff ff ff ff ff ff 01 00 00 00  ÿÿÿÿÿÿÿÿ....
0x000000010001715C  00 00 00 00 66 00 00 00 00 00 00 00  ....f.......
0x0000000100017168  2a 70 72 65 73 73 20 3c 65 6e 74 65  *press <ente
0x0000000100017174  72 3e 2f 3c 72 65 74 75 72 6e 3e 20  r>/<return>
0x0000000100017180  74 6f 20 65 6e 64 20 74 68 69 73 20  to end this
0x000000010001718C  70 72 6f 67 72 61 6d 00 00 00 00 00  program.....

A very important thing to notice is where the array of pointers is located and where the strings are located.  They are NOT in the same PE file section.

Because the two sections have different READ/WRITE attributes, the pointers can be altered, but the strings they originally pointed to cannot, because those reside in a READ ONLY section of the PE file.

That's why the strings are writeable when LONGSTRINGS is OFF, whereas when LONGSTRINGS is ON, attempting to change the contents of a string results in an access violation.

Here is the in-memory view of the mapped program:
Code: Text
0x100000000, Image,        152 kB, WCX, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100000000, Image: Commit,  4 kB, R,   lib\x86_64-win64\PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX,  lib\x86_64-win64\PascalArrayOfStrings.exe

0x100015000, Image: Commit,  8 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe   // the pointer array is in a read/write section
0x100017000, Image: Commit, 32 kB, R,   lib\x86_64-win64\PascalArrayOfStrings.exe   // the strings are in a READ ONLY section
                                                                                    // that's the reason for access violations when
                                                                                    // attempting to change the string contents
0x10001f000, Image: Commit, 16 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100023000, Image: Commit,  4 kB, WC,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100024000, Image: Commit,  4 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100025000, Image: Commit,  4 kB, WC,  lib\x86_64-win64\PascalArrayOfStrings.exe

HTH.

ETA:

Just in case, it is necessary to scroll horizontally to see some comments on the right hand side.
Title: Re: Quick question about const array of string
Post by: PascalDragon on June 25, 2019, 09:06:22 am
Quote
Just to test my understanding, in this construction:

Code: Pascal
const
  DTFmt: array[DTFmtKind] of string =
    ('',
     'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
     'dddddd tt', 'dddddd', 'tt',  {Long formats}
     'ddddd t', 'ddddd', 't',  {Short formats}
     'f', {Mixed short date + long time}
     '');

The two empty strings (first and last) occupy no memory, right? Or are they just two Nil pointers?

Just compile it with -al and you'll see how the array is laid out (in this case with {$H+}):
Code:
# Begin asmlist al_const

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld1$strlab:
.short 0,1
.long -1,16
.Ld1:
# [33] 'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
.ascii "yyyy-mm-dd hh:nn\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld3$strlab:
.short 0,1
.long -1,10
.Ld3:
.ascii "yyyy-mm-dd\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld4$strlab:
.short 0,1
.long -1,9
.Ld4:
# [34] 'dddddd tt', 'dddddd', 'tt',  {Long formats}
.ascii "dddddd tt\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld5$strlab:
.short 0,1
.long -1,6
.Ld5:
.ascii "dddddd\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld6$strlab:
.short 0,1
.long -1,2
.Ld6:
.ascii "tt\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld7$strlab:
.short 0,1
.long -1,7
.Ld7:
# [35] 'ddddd t', 'ddddd', 't',  {Short formats}
.ascii "ddddd t\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld8$strlab:
.short 0,1
.long -1,5
.Ld8:
.ascii "ddddd\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld9$strlab:
.short 0,1
.long -1,1
.Ld9:
.ascii "t\000"

.section .rodata.n_.Ld1,"d"
.balign 4
.Ld10$strlab:
.short 0,1
.long -1,1
.Ld10:
# [36] 'f', {Mixed short date + long time}
.ascii "f\000"
# End asmlist al_const
# Begin asmlist al_typedconsts

.section .data.n_TC_$P$THELLOWORLD_$$_DTFMT,"d"
.balign 4
TC_$P$THELLOWORLD_$$_DTFMT:
.long 0
.long .Ld1
.long .Ld2
.long .Ld3
.long .Ld4
.long .Ld5
.long .Ld6
.long .Ld7
.long .Ld8
.long .Ld9
.long .Ld10
.long 0
# End asmlist al_typedconsts
As you can see, the first and last elements are Nil.

But let's be honest: even if they weren't Nil, one element more or less doesn't really matter that much (unless you're playing around on AVR or some other memory-constrained hardware).

Sidenote: With {$H-} or explicit use of ShortString you'd always have 256 bytes used for each string.
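
To put numbers on that sidenote, here is a small sketch comparing the two layouts. The names `ShortFmt` and `AnsiFmt` are invented for the illustration, and the element types are spelled out so that the result does not depend on the {$H} setting:

```pascal
program StringArraySizes;

{$mode objfpc}

const
  // Every ShortString element reserves a length byte plus 255 characters.
  ShortFmt: array[0..11] of ShortString =
    ('',
     'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd',
     'dddddd tt', 'dddddd', 'tt',
     'ddddd t', 'ddddd', 't',
     'f',
     '');
  // Every AnsiString element is a single pointer (nil for '').
  AnsiFmt: array[0..11] of AnsiString =
    ('',
     'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd',
     'dddddd tt', 'dddddd', 'tt',
     'ddddd t', 'ddddd', 't',
     'f',
     '');

begin
  WriteLn('ShortString array: ', SizeOf(ShortFmt), ' bytes');  // 12 * 256
  WriteLn('AnsiString array : ', SizeOf(AnsiFmt), ' bytes');   // 12 * SizeOf(Pointer)
end.
```

Note that SizeOf only counts the pointer table in the AnsiString case. The character data and its header are stored separately, as the listing above shows.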
Title: Re: Quick question about const array of string
Post by: lucamar on June 26, 2019, 05:37:08 pm
Thanks everyone! Now I have a real understanding of how things work deep down :)

Somehow I knew all that but in a tenuous, nebulous way, as in "it's probably something like this".

Much better to be able to map your declarations to what the compiler does with them and say: "it is like this".

Again, thank you.
Title: Re: Quick question about const array of string
Post by: Thaddy on June 26, 2019, 06:15:18 pm
Just remember typed consts have pointers to something, even nil. They have a location in memory.
Untyped consts are determined at compile time, do not necessarily have a memory location, and are read-only nowadays.
Sven forgot to mention that. Untyped consts can be replaced by a literal.
Title: Re: Quick question about const array of string
Post by: 440bx on June 26, 2019, 07:08:10 pm
Quote
Just remember typed consts have pointers to something, even nil.

That's a dangerous generalization.  For instance, in the case of a shortstring typed constant, there is no _separate_ pointer pointing to the contents of the string, e.g.,
Code: Pascal
const
  s : shortstring = '';
Unlike for a longstring, that declaration does _not_ cause the compiler to save a pointer to s somewhere in the data segment and, in the above case, definitely not one that is nil or points to nil, since the compiler will reserve a full 256 bytes for the string even though it has been declared as empty.

What you said is the case for longstrings but not for all typed constants.

HTH.
Title: Re: Quick question about const array of string
Post by: Thaddy on June 26, 2019, 07:28:58 pm
Well, the compiler has to allow for {$J+} and {$J-} - these are local directives! - that does NOT make any difference for the storage. Shortstring or longstring.
What I meant is: untyped is read only and for small values replaced with literal values. Typed consts are not only in essence different beasts....
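
For anyone following along, the difference can be sketched as below. This is only an illustration under the usual Free Pascal rules. `MaxItems`, `Limit`, `TItems` and `NextId` are made-up names, and the printed address will differ from build to build:

```pascal
program TypedVsUntypedConst;

{$mode objfpc}{$H+}
{$J-}  // typed constants declared from here on are read-only

const
  MaxItems = 10;        // untyped: a compile-time value, usable in constant expressions
  Limit: Integer = 10;  // typed: an initialised location in the data segment

type
  // Fine: MaxItems is a true constant.
  // array[1..Limit] would be rejected, because Limit is not a constant expression.
  TItems = array[1..MaxItems] of Integer;

{$J+}  // typed constants declared from here on are writeable
function NextId: Integer;
const
  Counter: Integer = 0;  // keeps its value between calls, like a static variable
begin
  Inc(Counter);
  Result := Counter;
end;
{$J-}

begin
  WriteLn('High(TItems)            : ', High(TItems));
  WriteLn('Typed constant lives at : ', HexStr(@Limit));
  WriteLn('First call to NextId    : ', NextId);
  WriteLn('Second call to NextId   : ', NextId);
end.
```

The {$J} state in effect at the point of declaration decides whether a typed constant may be assigned to. Both `Limit` and `Counter` get storage either way.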
Title: Re: Quick question about const array of string
Post by: lucamar on June 26, 2019, 07:51:38 pm
Thanks.

That's something I already had rather clear: the differences between "normal" and "typed" constants. :)
Title: Re: Quick question about const array of string
Post by: ASerge on June 26, 2019, 09:53:37 pm
Quote
Code: Pascal
type
  DTFmtKind = (dtfNone,
      dtfISOFull, dtfISODate, dtfISOTime,
      dtfLongFull, dtfLongDate, dtfLongTime,
      dtfShortFull, dtfShortDate, dtfShortTime,
      dtfMixedFull,
      dtfLast);

It seems to me that the introduction of border elements is not necessary.
Code: Pascal
{$APPTYPE CONSOLE}
{$MODE OBJFPC}
{$LONGSTRINGS ON}

type
  TDateTimeFormatKind = (
    dtfISOFull, dtfISODate, dtfISOTime, {ISO formats}
    dtfLongFull, dtfLongDate, dtfLongTime, {Long formats}
    dtfShortFull, dtfShortDate, dtfShortTime, {Short formats}
    dtfMixedFull {Mixed short date + long time}
  );

function IsValidFormatKind(Kind: Integer): Boolean; inline;
begin
  Result := Kind in [Ord(Low(TDateTimeFormatKind))..Ord(High(TDateTimeFormatKind))];
end;

function GetFormatString(Kind: TDateTimeFormatKind): string; inline;
const
  CDateTimeFormatString: array[TDateTimeFormatKind] of string = (
    'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd',
    'dddddd tt', 'dddddd', 'tt',
    'ddddd t', 'ddddd', 't',
    'f'
  );
begin
  Result := CDateTimeFormatString[Kind];
end;

function GetFormatString(Kind: Integer): string;
begin
  if IsValidFormatKind(Kind) then
    Result := GetFormatString(TDateTimeFormatKind(Kind))
  else
    Result := '';
end;

begin
  Writeln(IsValidFormatKind(30));
  Writeln(IsValidFormatKind(-3));
  Writeln(IsValidFormatKind(3));
  Writeln(GetFormatString(30));
  Writeln(GetFormatString(-3));
  Writeln(GetFormatString(3));
  Readln;
end.
When you later want to add another format, you only need to add a value to TDateTimeFormatKind and the matching string to the "CDateTimeFormatString" array.
Title: Re: Quick question about const array of string
Post by: lucamar on June 26, 2019, 10:44:29 pm
Quote
It seems to me that the introduction of border elements is not necessary.
[.. etc ...]

Yes, that's what I ended up doing (using Low() and High(), which is what I usually do) but that first q&d iteration(*) got me thinking, so I headed over here and asked. As I said, just to test my understanding ... which seems to have been quite correct but rather shallow :)


(*) It came out that way because that project uses a ton of constructions like that, so by the time I started adding my code my head was screwed to that "style". It happens <shrugs>
Title: Re: Quick question about const array of string
Post by: engkin on June 26, 2019, 11:14:50 pm
I am confused  :(

What are these three numbers before each string:
Code: ASM
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"

The fourth number is the length of the string, and each string ends with a terminating zero.

Any idea about the other numbers?
Title: Re: Quick question about const array of string
Post by: Thaddy on June 26, 2019, 11:52:51 pm
Educated guess:
Code: Text
.section .rodata.n_.Ld1,"d"    ;read only data
.balign 4
.Ld2$strlab:
.short 0,1                     ; Shortstring declared, size is one to store length of zero.
.long -1,10                    ; Treat as PWideChar, length (payload) is 10.
.Ld2:
.ascii "yyyy-mm-dd\000"        ; terminator is WideChar.

I may be partially wrong here...
Title: Re: Quick question about const array of string
Post by: lucamar on June 27, 2019, 12:21:18 am
Quote
What are these three numbers before each string:
Code: ASM
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"

Maybe it has something to do with how strings are represented in memory (https://www.freepascal.org/docs-html/current/prog/progsu161.html) :)
Title: Re: Quick question about const array of string
Post by: engkin on June 27, 2019, 01:13:55 am
Quote
What are these three numbers before each string:
Code: ASM
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"

Quote
Maybe it has something to do with how strings are represented in memory (https://www.freepascal.org/docs-html/current/prog/progsu161.html) :)

YES!!  :D

I somehow forgot about it. Thank you for pointing it out.

Title: Re: Quick question about const array of string
Post by: engkin on June 27, 2019, 02:22:07 am
@Thaddy,
Thank you for trying to help. I missed your comment the first time.

@440bx,
Nice detailed explanation. I got a bit puzzled by seeing 32bit assembly and 64bit memory layout. The memory layout included 17 locations and did not match with the Pascal code that had only 12 strings (array[0..11] of string).

You obviously put in noticeable effort. Were some of the tools you used 64bit only, so that you had to recompile for 64bit?
Title: Re: Quick question about const array of string
Post by: 440bx on June 27, 2019, 04:55:54 am
Quote
@440bx,
Nice detailed explanation. I got a bit puzzled by seeing 32bit assembly and 64bit memory layout. The memory layout included 17 locations and did not match with the Pascal code that had only 12 strings (array[0..11] of string).

Mea culpa, being lazy.  In the case of shortstrings, since each string occupies 256 bytes, I didn't want to show the entire layout of the DtFmt array.  Once I made that choice, I could simply stay in Lazarus and show the code generated.  I usually compile for 32bit when I'm interested in mentally tracking variable usage because in 32bit, very often but not always, variable references are fixed stack locations, which makes them easier to keep track of mentally than register contents that keep changing, as is the case in 64bit.

Quote
You obviously did put noticeable effort and probably some of the tools you used were for 64bit and had to recompile for 64bit?

In this particular case, it was sheer laziness.   For the LONGSTRINGS OFF case, I could use the facilities offered by the Lazarus debugging environment to show everything that needed to be shown.  For the LONGSTRINGS ON case, using the Lazarus environment gets a bit convoluted because there is an array of pointers to show as well as the null terminated strings.  The memory viewing abilities of the Lazarus environment are a bit convoluted to use.  Because of that, I switched to the Visual Studio debugger, which makes it really easy to look at memory anywhere you want.

The original post - where everything is parallel - should have been like this:

LONGSTRINGS OFF

Code: ASM
00000001000015A8 E8 43 9A 00 00       call        main+9A70h (010000AFF0h)
00000001000015AD 48 89 C3             mov         rbx,rax

                                      ; v move i into eax
00000001000015B0 8B 05 4A EA 01 00    mov         eax,dword ptr [main+1EA80h (0100020000h)]

                                      ; v multiply by 256 - sizeof(shortstring)
00000001000015B6 48 C1 E0 08          shl         rax,8

                                      ; v move address of DtFmt array into rdx
                                      ; note the address 01000150C0h
                                      ; that's where DtFmt is located
00000001000015BA 48 8D 15 FF 3A 01 00 lea         rdx,[main+13B40h (01000150C0h)]

                                      ; v compute the address of DtFmt[i] into r8
00000001000015C1 4C 8D 04 02          lea         r8,[rdx+rax]

                                      ; v the rest is writeln stuff
00000001000015C5 48 89 DA             mov         rdx,rbx
00000001000015C8 B9 00 00 00 00       mov         ecx,0
00000001000015CD E8 5E 9C 00 00       call        main+9CB0h (010000B230h)
00000001000015D2 E8 C9 5F 00 00       call        main+6020h (01000075A0h)
00000001000015D7 48 89 D9             mov         rcx,rbx
00000001000015DA E8 91 9B 00 00       call        main+9BF0h (010000B170h)
00000001000015DF E8 BC 5F 00 00       call        main+6020h (01000075A0h)

the memory layout of the DtFmt array is

Code: Text
                 ;  v first array element - note size 0
                 ;  256 bytes reserved for it
0x00000001000150C0  00 20 20 20 20 20 20 20  .
0x00000001000150C8  20 20 20 20 20 20 20 20
0x00000001000150D0  20 20 20 20 20 20 20 20
0x00000001000150D8  20 20 20 20 20 20 20 20

                 ;  v 256 bytes later....

                 ;  v second array element - note size 16
                 ;  256 bytes reserved for it
0x00000001000151C0  10 79 79 79 79 2d 6d 6d  .yyyy-mm
0x00000001000151C8  2d 64 64 20 68 68 3a 6e  -dd hh:n
0x00000001000151D0  6e 20 20 20 20 20 20 20  n
0x00000001000151D8  20 20 20 20 20 20 20 20
0x00000001000151E0  20 20 20 20 20 20 20 20

                 ; v 2816 bytes later (last element)
                 ;   256 bytes reserved for it
                 ;   note size 0

0x0000000100015BC0  00 20 20 20 20 20 20 20  .
0x0000000100015BC8  20 20 20 20 20 20 20 20
0x0000000100015BD0  20 20 20 20 20 20 20 20
0x0000000100015BD8  20 20 20 20 20 20 20 20
0x0000000100015BE0  20 20 20 20 20 20 20 20
0x0000000100015BE8  20 20 20 20 20 20 20 20
0x0000000100015BF0  20 20 20 20 20 20 20 20
0x0000000100015BF8  20 20 20 20 20 20 20 20
0x0000000100015C00  20 20 20 20 20 20 20 20
0x0000000100015C08  20 20 20 20 20 20 20 20
0x0000000100015C10  20 20 20 20 20 20 20 20
0x0000000100015C18  20 20 20 20 20 20 20 20
0x0000000100015C20  20 20 20 20 20 20 20 20
0x0000000100015C28  20 20 20 20 20 20 20 20
0x0000000100015C30  20 20 20 20 20 20 20 20
0x0000000100015C38  20 20 20 20 20 20 20 20
0x0000000100015C40  20 20 20 20 20 20 20 20
0x0000000100015C48  20 20 20 20 20 20 20 20
0x0000000100015C50  20 20 20 20 20 20 20 20
0x0000000100015C58  20 20 20 20 20 20 20 20
0x0000000100015C60  20 20 20 20 20 20 20 20
0x0000000100015C68  20 20 20 20 20 20 20 20
0x0000000100015C70  20 20 20 20 20 20 20 20
0x0000000100015C78  20 20 20 20 20 20 20 20
0x0000000100015C80  20 20 20 20 20 20 20 20
0x0000000100015C88  20 20 20 20 20 20 20 20
0x0000000100015C90  20 20 20 20 20 20 20 20
0x0000000100015C98  20 20 20 20 20 20 20 20
0x0000000100015CA0  20 20 20 20 20 20 20 20
0x0000000100015CA8  20 20 20 20 20 20 20 20
0x0000000100015CB0  20 20 20 20 20 20 20 20
0x0000000100015CB8  20 20 20 20 20 20 20 20

                 ;  ^ total size of the array above is
                 ;  12 * 256

                 ;  v this string added just to mark
                 ;    the end of the above array

0x0000000100015CC0  1a 61 62 63 64 65 66 67  .abcdefg
0x0000000100015CC8  68 69 6a 6b 6c 6d 6e 6f  hijklmno
0x0000000100015CD0  70 71 72 73 74 75 76 77  pqrstuvw
0x0000000100015CD8  78 79 7a 20 20 20 20 20  xyz
0x0000000100015CE0  20 20 20 20 20 20 20 20

In that case the following is true:

1. there is no array of pointers pointing to individual strings
2. strings are in a READ/WRITE section of the executable, therefore they can be modified.
3. a '' string has 256 bytes reserved in the READ/WRITE section it resides in. (therefore it can be changed)

Note the address of the first element, 0x00000001000150C0.

In-memory layout of the executable:
Code: Text
             ; start of executable image
0x100000000, Image,        156 kB, WCX, PascalArrayOfStrings.exe
0x100000000, Image: Commit,  4 kB, R,   PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX,  PascalArrayOfStrings.exe

             ;                     v read/write attribute
             ;                       strings can be modified

0x100015000, Image: Commit, 12 kB, RW,  PascalArrayOfStrings.exe

             ; remainder of executable image

0x100018000, Image: Commit, 32 kB, R,   PascalArrayOfStrings.exe
0x100020000, Image: Commit, 16 kB, RW,  PascalArrayOfStrings.exe
0x100024000, Image: Commit,  4 kB, WC,  PascalArrayOfStrings.exe
0x100025000, Image: Commit,  4 kB, RW,  PascalArrayOfStrings.exe
0x100026000, Image: Commit,  4 kB, WC,  PascalArrayOfStrings.exe

That address is in a read/write PE section, which makes the strings modifiable.

The first post had the LONGSTRINGS OFF case severely "abbreviated".

Just to make this post complete, the LONGSTRINGS ON case is a straight copy of
the original post.

LONGSTRINGS ON


Here is what happens for the very same array when LONGSTRINGS is on:
Code: Pascal  [Select]
  1.                     v '' string
  2. 0x00000001000150D0  0000000000000000 0000000100017018  .........p......
  3. 0x00000001000150E0  0000000100017048 0000000100017070  Hp......pp......
  4. 0x00000001000150F0  0000000100017098 00000001000170c0  ˜p......Àp......
  5. 0x0000000100015100  00000001000170e0 0000000100017100  àp.......q......
  6. 0x0000000100015110  0000000100017120 0000000100017140   q......@q......
  7. 0x0000000100015120  0000000100017160 0000000000000000  `q..............
  8.                                      ^ '' string
  9.  
  10. 0x0000000100015130  000000010001f060 0001f07000000008  `ð..........pð..
  11. 0x0000000100015140  0000038000000001 000000010001f400  ........ô......
  12. 0x0000000100015150  0001f79000000380 0000038000000001  €....÷.........
  13.  

The following is true in that case:

1. there is an array of pointers (to null terminated strings)
2. a '' is simply a null pointer

for reference, note the address of the array of pointers, address 0x1000150D0

Following the pointers in the array, shows
Code: Pascal  [Select]
  1. 0x0000000100017018  79 79 79 79 2d 6d 6d 2d 64 64 20 68  yyyy-mm-dd h
  2. 0x0000000100017024  68 3a 6e 6e 00 00 00 00 00 00 00 00  h:nn........
  3. 0x0000000100017030  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
  4. 0x000000010001703C  ff ff ff ff 0a 00 00 00 00 00 00 00  ÿÿÿÿ........
  5. 0x0000000100017048  79 79 79 79 2d 6d 6d 2d 64 64 00 00  yyyy-mm-dd..
  6. 0x0000000100017054  00 00 00 00 00 00 01 00 00 00 00 00  ............
  7. 0x0000000100017060  ff ff ff ff ff ff ff ff 0a 00 00 00  ÿÿÿÿÿÿÿÿ....
  8. 0x000000010001706C  00 00 00 00 79 79 79 79 2d 6d 6d 2d  ....yyyy-mm-
  9. 0x0000000100017078  64 64 00 00 00 00 00 00 00 00 01 00  dd..........
  10. 0x0000000100017084  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
  11. 0x0000000100017090  09 00 00 00 00 00 00 00 64 64 64 64  ........dddd
  12. 0x000000010001709C  64 64 20 74 74 00 00 00 00 00 00 00  dd tt.......
  13. 0x00000001000170A8  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
  14. 0x00000001000170B4  ff ff ff ff 06 00 00 00 00 00 00 00  ÿÿÿÿ........
  15. 0x00000001000170C0  64 64 64 64 64 64 00 00 00 00 01 00  dddddd......
  16. 0x00000001000170CC  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
  17. 0x00000001000170D8  02 00 00 00 00 00 00 00 74 74 00 00  ........tt..
  18. 0x00000001000170E4  00 00 00 00 00 00 01 00 00 00 00 00  ............
  19. 0x00000001000170F0  ff ff ff ff ff ff ff ff 07 00 00 00  ÿÿÿÿÿÿÿÿ....
  20. 0x00000001000170FC  00 00 00 00 64 64 64 64 64 20 74 00  ....ddddd t.
  21. 0x0000000100017108  00 00 01 00 00 00 00 00 ff ff ff ff  ........ÿÿÿÿ
  22. 0x0000000100017114  ff ff ff ff 05 00 00 00 00 00 00 00  ÿÿÿÿ........
  23. 0x0000000100017120  64 64 64 64 64 00 00 00 00 00 01 00  ddddd.......
  24. 0x000000010001712C  00 00 00 00 ff ff ff ff ff ff ff ff  ....ÿÿÿÿÿÿÿÿ
  25. 0x0000000100017138  01 00 00 00 00 00 00 00 74 00 00 00  ........t...
  26. 0x0000000100017144  00 00 00 00 00 00 01 00 00 00 00 00  ............
  27. 0x0000000100017150  ff ff ff ff ff ff ff ff 01 00 00 00  ÿÿÿÿÿÿÿÿ....
  28. 0x000000010001715C  00 00 00 00 66 00 00 00 00 00 00 00  ....f.......
  29. 0x0000000100017168  2a 70 72 65 73 73 20 3c 65 6e 74 65  *press <ente
  30. 0x0000000100017174  72 3e 2f 3c 72 65 74 75 72 6e 3e 20  r>/<return>
  31. 0x0000000100017180  74 6f 20 65 6e 64 20 74 68 69 73 20  to end this
  32. 0x000000010001718C  70 72 6f 67 72 61 6d 00 00 00 00 00  program.....
  33.  

An important thing to notice is where the array of pointers lives and where the strings
live: they are NOT located in the same PE file section.

Because the two sections have different READ/WRITE attributes, the pointers can be altered,
but the strings the original pointers pointed to cannot, because they reside in a READ ONLY
section of the PE file.

That's why the strings are writeable when LONGSTRINGS is OFF, whereas with LONGSTRINGS ON,
attempting to change the contents of a string results in an access violation.
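
To make the distinction concrete, here is a minimal sketch. It is not the program that produced the dump above: the array name Fmt and its contents are made up for illustration, and it assumes {$H+} together with writable typed constants ({$J+}).

```pascal
program ConstStringsSketch;
{$MODE OBJFPC}{$H+}
{$J+}  // assumption: typed constants are writable, so the pointer slots can change

const
  // Hypothetical array, loosely modelled on DTFmt from this thread
  Fmt: array[0..2] of string = ('', 'dddddd', 'tt');

begin
  // With {$H+} an empty string is simply a nil pointer
  Writeln('Fmt[0] is nil: ', Pointer(Fmt[0]) = nil);

  // Replacing an element only rewrites the pointer stored in the array;
  // the bytes of the original 'tt' constant are not touched.
  Fmt[2] := 'hh:nn';
  Writeln('Fmt[2] = ', Fmt[2]);

  // Writing through a raw pointer to the constant's characters is the
  // kind of access that would land in the read-only section, so it is
  // left commented out here:
  // PChar(Pointer(Fmt[1]))^ := 'D';
end.
```

The commented-out line is only there to show which kind of write is meant; whether it actually faults depends on how the target platform maps the section that holds the string data.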

Here is the in-memory view of the mapped program:
Code: Pascal
0x100000000, Image,        152 kB, WCX, lib\x86_64-win64\PascalArrayOfStrings.exe
0x100000000, Image: Commit,  4 kB, R,   lib\x86_64-win64\PascalArrayOfStrings.exe
0x100001000, Image: Commit, 80 kB, RX,  lib\x86_64-win64\PascalArrayOfStrings.exe

0x100015000, Image: Commit,  8 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe   // the pointer array is in a read/write section
0x100017000, Image: Commit, 32 kB, R,   lib\x86_64-win64\PascalArrayOfStrings.exe   // the strings are in a READ ONLY section
                                                                                    // that's the reason for access violations when
                                                                                    // attempting to change the string contents
0x10001f000, Image: Commit, 16 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100023000, Image: Commit,  4 kB, WC,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100024000, Image: Commit,  4 kB, RW,  lib\x86_64-win64\PascalArrayOfStrings.exe
0x100025000, Image: Commit,  4 kB, WC,  lib\x86_64-win64\PascalArrayOfStrings.exe

The greatest effort goes into editing the post LOL.
Title: Re: Quick question about const array of string
Post by: BrunoK on June 27, 2019, 05:23:34 pm
What are these three numbers before each string:
Quote
Any idea about the other numbers?
See the TAnsiRec structure in rtl\inc\astrings.inc:
CodePage     Code page of the AnsiString
ElementSize  Number of bytes per character
Ref          Reference count, -1 for constants
Len          Length of the string in ?bytes or ?elements, I don't know
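
For anyone who wants to check these four fields without decoding the dump by hand, here is a small sketch that reads them through System unit helpers instead of hard-coded offsets. It assumes FPC 3.0 or later; the exact record layout (padding, field widths) can differ between compiler versions and targets, which is why no offsets are used. The program name and the constant S are only for illustration.

```pascal
program AnsiHeaderSketch;
{$MODE OBJFPC}{$H+}

const
  // One of the format strings visible in the dump
  S: string = 'dddddd tt';

begin
  // Each call reports one field of the hidden header that precedes
  // the character data of an AnsiString.
  Writeln('CodePage:    ', StringCodePage(S));
  Writeln('ElementSize: ', StringElementSize(S));
  Writeln('Ref:         ', StringRefCount(S));
  Writeln('Len:         ', Length(S));
end.
```

In the dump above, the corresponding bytes for this string read as code page 0, element size 1, reference count -1 and length 9.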
Title: Re: Quick question about const array of string
Post by: engkin on June 27, 2019, 05:42:16 pm
Quote
Len          Length of the string in ?bytes or ?elements, I don't know

Thank you. Length is in Elements.
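
A quick way to see what "elements" means is a character outside the Basic Multilingual Plane: U+1F600 is one code point, four bytes in UTF-8 and a surrogate pair in UTF-16. The sketch below is only an illustration under that assumption; the variable names are made up, and it relies on UTF8Decode from the System unit to build the UnicodeString.

```pascal
program ElementCountSketch;
{$MODE OBJFPC}{$H+}

var
  A: RawByteString;
  U: UnicodeString;
begin
  // U+1F600 encoded as UTF-8: one code point, four bytes
  A := #$F0#$9F#$98#$80;
  // Decoding to UTF-16 gives a surrogate pair: one code point, two WideChars
  U := UTF8Decode(A);

  // Length counts elements of the string type, not code points
  Writeln('AnsiString elements:    ', Length(A),
          ' x ', StringElementSize(A), ' byte(s)');
  Writeln('UnicodeString elements: ', Length(U),
          ' x ', SizeOf(WideChar), ' byte(s)');
end.
```

Multiplying the element count by the element size gives the size of the character data in bytes, which is the other quantity the Len field might have held.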
Title: Re: Quick question about const array of string
Post by: lucamar on June 27, 2019, 06:16:35 pm
Quote
Len          Length of the string in ?bytes or ?elements, I don't know

That's easily ascertained: create a UnicodeString with, say, 5 characters ('abcde'), get the pointer to it and read the "Len" field. If it's 10, the length is in bytes; if it's 5, it's in characters. :)
Title: Re: Quick question about const array of string
Post by: ASerge on June 27, 2019, 06:39:41 pm
Quote
That's easily ascertained: create a UnicodeString with say 5 characters: 'abcde', get the pointer to it and read the "Len" field. If it's 10, then it's the length in bytes; if 5, it's in characters. :)

Code: Pascal
{$APPTYPE CONSOLE}
{$MODE OBJFPC}

// A string variable points at its first character; the length field
// sits immediately before that, so index -1 reads it.
var
  A: AnsiString = '12345';
  U: UnicodeString = '12345';
  W: WideString = '12345';
  PQ: PPtrUInt;   // pointer-sized length field (AnsiString, UnicodeString)
  PL: PLongWord;  // 32-bit length field (WideString on Windows)
begin
  PQ := Pointer(A);
  Writeln('Ansi:', PQ[-1]);
  PQ := Pointer(U);
  Writeln('Unicode:', PQ[-1]);
  PL := Pointer(W);
  Writeln('Wide:', PL[-1]);
  Readln;
end.
Result:
Code: Text
Ansi:5
Unicode:5
Wide:10
Title: Re: Quick question about const array of string
Post by: Thaddy on June 27, 2019, 07:00:09 pm
@ASerge: insert codepoint dependent characters and the results look different. Also note that WideString is not a reference-counted type: it has been superseded by the assignment-compatible UnicodeString, which IS a Pascal string type... (Try a UTF8 char that needs 4 bytes, or 3 bytes... or a couple of those mixed... drama, tragedy :'( )
Title: Re: Quick question about const array of string
Post by: ASerge on June 27, 2019, 08:22:14 pm
Quote
@ASerge: insert codepoint dependent characters and the results look different.

No. This is the result in characters, not code points.
And the result for WideString is platform-specific. The example is for Windows.
Title: Re: Quick question about const array of string
Post by: lucamar on June 28, 2019, 07:51:33 am
Quote
No. This is the result in characters, not code points.
And the result for WideString is platform-specific. The example is for Windows.

On Linux it gives this:
Code:
Ansi:5
Unicode:5
Wide:5
as was expected. :)