Just to test my understanding, in this construction:
const DTFmt: array[DTFmtKind] of string = ('',
  'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
  'dddddd tt', 'dddddd', 'tt', {Long formats}
  'ddddd t', 'ddddd', 't', {Short formats}
  'f', {Mixed short date + long time}
  '');
the two empty strings (first and last) occupy no memory, right? Or are they just two Nil pointers?

The first thing to understand is that the way that construction is implemented depends on the setting of $LONGSTRINGS.

Just compile it with -al and you'll see how the array is laid out (in this case with {$H+}):
# Begin asmlist al_const
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld1$strlab:
.short 0,1
.long -1,16
.Ld1:
# [33] 'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
.ascii "yyyy-mm-dd hh:nn\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld3$strlab:
.short 0,1
.long -1,10
.Ld3:
.ascii "yyyy-mm-dd\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld4$strlab:
.short 0,1
.long -1,9
.Ld4:
# [34] 'dddddd tt', 'dddddd', 'tt', {Long formats}
.ascii "dddddd tt\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld5$strlab:
.short 0,1
.long -1,6
.Ld5:
.ascii "dddddd\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld6$strlab:
.short 0,1
.long -1,2
.Ld6:
.ascii "tt\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld7$strlab:
.short 0,1
.long -1,7
.Ld7:
# [35] 'ddddd t', 'ddddd', 't', {Short formats}
.ascii "ddddd t\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld8$strlab:
.short 0,1
.long -1,5
.Ld8:
.ascii "ddddd\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld9$strlab:
.short 0,1
.long -1,1
.Ld9:
.ascii "t\000"
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld10$strlab:
.short 0,1
.long -1,1
.Ld10:
# [36] 'f', {Mixed short date + long time}
.ascii "f\000"
# End asmlist al_const
# Begin asmlist al_typedconsts
.section .data.n_TC_$P$THELLOWORLD_$$_DTFMT,"d"
.balign 4
TC_$P$THELLOWORLD_$$_DTFMT:
.long 0
.long .Ld1
.long .Ld2
.long .Ld3
.long .Ld4
.long .Ld5
.long .Ld6
.long .Ld7
.long .Ld8
.long .Ld9
.long .Ld10
.long 0
# End asmlist al_typedconsts
As you can see, the first and last elements are Nil.

Just remember typed consts have pointers to something, even nil.

That's a dangerous generalization. For instance, in the case of a shortstring typed constant, there is no _separate_ pointer pointing to the contents of the string.
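A minimal sketch of that difference, assuming FPC in objfpc mode where plain `string` means ShortString under {$H-} and AnsiString under {$H+} (no unicodestrings modeswitch). The enum `TKind` and the array names are made up for illustration and are not from the thread:

```pascal
program TypedConstLayout;
{$mode objfpc}

type
  TKind = (kFirst, kMiddle, kLast);  // hypothetical enum for illustration

const
  // What "string" means under {$H-}: the characters are stored inline.
  ShortArr: array[TKind] of ShortString = ('', 'abc', '');
  // What "string" means under {$H+}: each element is a pointer.
  LongArr: array[TKind] of AnsiString = ('', 'abc', '');

begin
  // 3 elements of 256 bytes each (length byte + 255 characters).
  WriteLn('SizeOf(ShortArr) = ', SizeOf(ShortArr));
  // 3 pointers; the size depends on the target (4 or 8 bytes each).
  WriteLn('SizeOf(LongArr)  = ', SizeOf(LongArr));
  // An empty AnsiString is represented by a nil pointer.
  WriteLn('empty is nil: ', Pointer(LongArr[kFirst]) = nil);
end.
```

So with shortstrings the empty elements still take a full 256 bytes each, while with ansistrings they cost one nil pointer each and no string data.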
It seems to me that the introduction of border elements is not necessary.
type DTFmtKind = (dtfNone, dtfISOFull, dtfISODate, dtfISOTime, dtfLongFull, dtfLongDate, dtfLongTime, dtfShortFull, dtfShortDate, dtfShortTime, dtfMixedFull, dtfLast);
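A sketch of what dropping the border elements could look like, assuming they are only there to mark the ends of the range. Whether the original code also uses dtfNone as a "no format" value is not stated in the thread, so treat this as one possible reading:

```pascal
program NoBorders;
{$mode objfpc}{$H+}

type
  // Same kinds as in the thread, minus dtfNone and dtfLast.
  DTFmtKind = (dtfISOFull, dtfISODate, dtfISOTime,
               dtfLongFull, dtfLongDate, dtfLongTime,
               dtfShortFull, dtfShortDate, dtfShortTime,
               dtfMixedFull);

const
  DTFmt: array[DTFmtKind] of string = (
    'yyyy-mm-dd hh:nn', 'yyyy-mm-dd', 'yyyy-mm-dd', {ISO formats}
    'dddddd tt', 'dddddd', 'tt',                    {Long formats}
    'ddddd t', 'ddddd', 't',                        {Short formats}
    'f');                                           {Mixed short date + long time}

var
  K: DTFmtKind;
begin
  // Low/High give the bounds of the enum without sentinel members.
  for K := Low(DTFmtKind) to High(DTFmtKind) do
    WriteLn(Ord(K), ': ', DTFmt[K]);
end.
```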
[.. etc ...]
What are these three numbers before each string:
.section .rodata.n_.Ld1,"d"
.balign 4
.Ld2$strlab:
.short 0,1
.long -1,10
.Ld2:
.ascii "yyyy-mm-dd\000"
Maybe it has something to do with how strings are represented in memory (https://www.freepascal.org/docs-html/current/prog/progsu161.html) :)
@440bx, nice detailed explanation. I got a bit puzzled by seeing 32-bit assembly and a 64-bit memory layout. The memory layout included 17 locations and did not match the Pascal code, which had only 12 strings (array[0..11] of string). You obviously put noticeable effort into it; probably some of the tools you used were 64-bit and you had to recompile for 64-bit?

Mea culpa, being lazy. In the case of shortstrings, since each string occupies 256 bytes, I didn't want to show the entire layout of the DtFmt array. Once I made that choice, I could simply stay in Lazarus and show the code generated. I usually compile for 32-bit when I'm interested in mentally tracking variable usage because in 32-bit, very often but not always, variable references are fixed stack locations, which makes them easier to keep track of mentally than register contents that keep changing, as is the case in 64-bit.

In this particular case, it was sheer laziness. For LONGSTRINGS OFF, I could use the facilities offered by the Lazarus debugging environment to show everything that needed to be shown. For LONGSTRINGS ON, using the Lazarus environment gets a bit convoluted because there is an array of pointers to show as well as the null-terminated strings, and its memory viewing abilities are awkward to use. Because of that, I switched to the Visual Studio debugger, which makes it really easy to look at memory anywhere you want.
Any idea about the other numbers?

See the TAnsiRec structure in rtl\inc\astrings.inc:
CodePage     code page of the ansistring
ElementSize  number of bytes per character
Ref          reference count, -1 for constants
Len          length of the string, in bytes or in elements, I don't know
That's easily ascertained: create a UnicodeString with say 5 characters: 'abcde', get the pointer to it and read the "Len" field. If it's 10, then it's the length in bytes; if 5, it's in characters. :)
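A rough sketch of that experiment, assuming the layout suggested by the listing above, where Len is the SizeInt stored immediately before the character data. `HeaderLen` is a made-up helper name, and WideString is left out because its layout is platform-specific:

```pascal
program LenField;
{$mode objfpc}{$H+}
{$POINTERMATH ON}

// Hypothetical helper: reads the SizeInt stored immediately before the
// character data. P must come from a non-empty string, because an empty
// string is a nil pointer and has no header to read.
function HeaderLen(P: Pointer): SizeInt;
begin
  Result := PSizeInt(P)[-1];
end;

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'abcde';
  U := 'abcde';
  WriteLn('Ansi:', HeaderLen(Pointer(A)));
  WriteLn('Unicode:', HeaderLen(Pointer(U)));
end.
```

If Len were a byte count, the UnicodeString line would show 10; if it counts characters, it shows 5.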
@ASerge: insert codepoint dependent characters and the results look different.

No. This is the result in characters, not code points.
And the result for widestring is platform-specific. In the example it is for Windows.
Ansi:5
Unicode:5
Wide:5
as was expected. :)