
Author Topic: Is there some way to obtain the length of a character array at compile time ?  (Read 2903 times)

440bx

  • Sr. Member
  • ****
  • Posts: 390
hello,

Consider a simple definition like this one:
Code: Pascal
const
  somecharacters  : pchar = 'an array of characters';
The compiler knows at compile time how many characters are pointed to by the constant pointer "somecharacters" and that's what I'd like to get.

In the above example, I'd like the compiler to tell me there are 22 characters (or 23 if it counts the null).  Is there a way to have the compiler return that value?  (Not using strlen, since that is a run time function; I want the size available at compile time.)

Thanks.

engkin

  • Hero Member
  • *****
  • Posts: 2129
Looking at the generated assembly file (using -al), I don't think the compiler provides the length of somecharacters. It provides the characters:
Code: ASM
.section .data.n__$PROJECT1$_Ld1,"d"
        .balign 16
.globl  _$PROJECT1$_Ld1
_$PROJECT1$_Ld1:
# [13] somecharacters  : pchar = 'an array of characters';
        .ascii  "an array of characters\000"

and their address:
Code: ASM
.section .data.n_tc_$p$project1_$$_somecharacters,"d"
        .balign 4
TC_$P$PROJECT1_$$_SOMECHARACTERS:
        .long   _$PROJECT1$_Ld1

While for a string:
Code: Pascal
const
  somestring  : string = 'an array of characters in a string';

It has the length:
Code: ASM
.section .rodata.n__$PROJECT1$_Ld2,"d"
        .balign 4
        .short  0,1
        .long   -1,34
.globl  _$PROJECT1$_Ld2
_$PROJECT1$_Ld2:
# [16] somestring  : string = 'an array of characters in a string';
        .ascii  "an array of characters in a string\000"

But there does not seem to be a way to get it into another constant. However, I think I might have found a bug:
Code: Pascal
const
  somestring  : string = 'an array of characters in a string';
  addr: pointer = @somestring[1];

addr holds @somestring+1, instead of @somestring[1]. Not sure if it is worth reporting.

440bx

  • Sr. Member
  • ****
  • Posts: 390
Engkin, thank you for the thorough and detailed reply. 

I tried defining another constant immediately after that one and then taking the difference of the two pointers (a variation of $ - varname in assembly) but, unfortunately, that didn't work. Looking at the memory layout, it was evident that it failed because of alignment.

I tried {$packrecords 1}, which the manuals say is the same as {$align 1}, but in memory the fields / constants were still aligned on a 16 byte boundary (compiling for 64bit), which defeated the attempt.

I believe you are right.  There does not seem to be a way of coercing that value out of FPC.  Can't win them all ... :)

Thank you again for the thorough reply.





Thaddy

  • Hero Member
  • *****
  • Posts: 7183
Well, for a string literal the compiler simply precalculates the length: no length-computing code ends up in the binary, and the generated code just loads the value 22.
Code: Pascal
{$ifdef fpc}{$mode delphi}{$H+}{$I-}{$endif}
const
  somecharacters  = 'an array of characters';
begin
  writeln(Length(somecharacters));
end.

This is ARM assembly, but you get the point:
Code: ASM
# [6] writeln(Length(somecharacters));
        bl      fpc_get_output
        mov     r4,r0
        mov     r2,#22   ;<------ pre-calculated by the compiler.
        mov     r1,r4
        mov     r0,#0
        bl      fpc_write_text_sint
        mov     r0,r4
        bl      fpc_writeln_end
So if possible, do not use a typed const but an untyped const, i.e. a string literal. x86_64 or i386 code will look very similar, with the pre-calculated length.
In the case of typed consts, length() will actually execute code: for a string it examines the length field, and for a PChar it computes the length, which actually calls strlen anyway.
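
To make the untyped-constant route concrete, here is a minimal sketch. It assumes that FPC accepts Length() of an untyped string constant inside a constant expression (the generated code above suggests the value is folded, but treat this as an assumption to verify with your compiler version). The names SomeText, SomeTextLen, SomeChars and Buffer are made up for illustration.

```pascal
program ConstLength;
{$mode objfpc}{$H+}

const
  SomeText    = 'an array of characters';  // untyped string constant
  SomeTextLen = Length(SomeText);          // assumed to be folded at compile time
  SomeChars: PChar = SomeText;             // typed PChar built from the same literal

var
  // SomeTextLen is used here where a constant is required (array bounds).
  Buffer: array[0..SomeTextLen] of Char;   // 22 characters plus the terminating #0

begin
  WriteLn(SomeTextLen);        // length known to the compiler
  WriteLn(SizeOf(Buffer));     // one more than SomeTextLen
  WriteLn(Length(SomeChars));  // same value, but computed at run time
end.
```

The design point is that the literal lives in one untyped constant, and both the PChar and the length are derived from it, so they cannot drift apart.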
 
« Last Edit: August 10, 2018, 06:00:30 am by Thaddy »
inline variables like in D10.3 are a bit like Brexit: if you are given the wrong information it sounds like a good idea. Every kid loves candy, but it makes you fat and your teeth will disappear.

440bx

  • Sr. Member
  • ****
  • Posts: 390
Quote from: Thaddy
So if possible, do not use a typed const but an untyped const, i.e. a string literal. x86_64 or i386 code will look very similar, with the pre-calculated length.
In the case of typed consts, length() will actually execute code: for a string it examines the length field, and for a PChar it computes the length, which actually calls strlen anyway.

Thanks Thaddy.  If the constant is not typed, then simply sizeof(thecharacterconstant) returns the number of characters in the constant, which is nice but, unfortunately, doing it that way leads to some undesirable problems when using arrays.

Your mentioning non-typed constants gave me a few ideas to try.  Had some fun in the process, with the following code:
Code: Pascal
type
  Tarray = array[0..1] of pchar;

const
  // sizeof returns the size of these constants

  charconst1   = 'character constant 1';
  charconst2   = 'character constant 2 - made purposely larger';

// this variable declaration gives the compiler quite a headache.  :-)

var
  carray       : tarray absolute charconst1;

  ;            // for some reason the compiler wants an extra semicolon there.
               // it's happy to get the semicolon  but then it gets totally lost.

const
  // this works but, it becomes the programmer's responsibility to ensure that
  // the calculated character array sizes are in synch with the order of the
  // elements of the array.  It's a workaround that "works", but it just makes the
  // code unacceptably fragile.

  anarray   : array[0..1] of pchar = (charconst1, charconst2);

There doesn't seem to be a way to win.  I'll just use strlen. 

Thanks again.

Thaddy

  • Hero Member
  • *****
  • Posts: 7183
Better to simply use length(), which also works on PChars. The compiler generates about the same code as strlen.
I have another idea that seems to do what you want:
Code: Pascal
{$ifdef fpc}{$mode delphi}{$H+}{$I-}{$J-}{$endif}
const
  somecharacters:shortstring  = 'an array of characters';
begin
  writeln(Length(somecharacters));
end.

which does this:
Code: ASM
# [6] writeln(Length(somecharacters));
        bl      fpc_get_output
        mov     r4,r0
        ldr     r0,.Lj3
        ldrb    r2,[r0]
        mov     r1,r4
        mov     r0,#0
        bl      fpc_write_text_uint
        mov     r0,r4
        bl      fpc_writeln_end
Which references through .Lj3:
Code: ASM
TC_$P$UNTITLED_$$_SOMECHARACTERS:
        .byte   22 ; <--- shortstring length
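
The .byte 22 above is the shortstring's length byte, which Pascal code can also read directly as element 0. A small sketch of that, using only the typed constant from the example above (the program name is made up); note that both reads happen at run time:

```pascal
program ShortLen;
{$mode objfpc}{$H+}

const
  SomeCharacters: ShortString = 'an array of characters';

begin
  // Element 0 of a ShortString holds its length as a byte.
  WriteLn(Ord(SomeCharacters[0]));
  // Length() reads the same byte, as the ldrb instruction above shows.
  WriteLn(Length(SomeCharacters));
end.
```

Both lines print the same number; neither value is available as a compile-time constant, because SomeCharacters is a typed constant.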

« Last Edit: August 10, 2018, 07:39:14 am by Thaddy »

440bx

  • Sr. Member
  • ****
  • Posts: 390
Quote from: Thaddy
Better to simply use length(), which also works on PChars. The compiler generates about the same code as strlen.
Code: ASM
TC_$P$UNTITLED_$$_SOMECHARACTERS:
        .byte   22 ; <--- shortstring length

The compiler knows the length of the string but, it refuses to give it up at compile time.  It has no problem handing it over at run time.  I tried assigning s[0] to a constant.  It didn't like it.  Also tried length and a number of other "creative" ways of extracting the length at compile time, all to no avail.

You're right, I'll just resign myself to getting the length at run time using a "compiler legal" way of getting it.  There doesn't seem to be any alternative anyway.

Thanks, that idea was worth checking out.

PascalDragon

  • Full Member
  • ***
  • Posts: 206
  • Compiler Developer
Quote from: 440bx
Code: Pascal
type
  Tarray = array[0..1] of pchar;

const
  // sizeof returns the size of these constants

  charconst1   = 'character constant 1';
  charconst2   = 'character constant 2 - made purposely larger';

// this variable declaration gives the compiler quite a headache.  :-)

var
  carray       : tarray absolute charconst1;


No, just no. Such a declaration is an absolute accident waiting to happen, as the compiler is free to place the two constants where it wants, and if it thinks that a third, completely unrelated constant fits nicely between charconst1 and charconst2 then it will do so, thus breaking your array.

Maybe it's best for you to explain what you want to achieve in the end instead of posting strange abuses of array, absolute and PChar. This way we could try to find a solution that's efficient, but also maintainable.

440bx

  • Sr. Member
  • ****
  • Posts: 390
Quote from: PascalDragon
No, just no. Such a declaration is an absolute accident waiting to happen, as the compiler is free to place the two constants where it wants, and if it thinks that a third, completely unrelated constant fits nicely between charconst1 and charconst2 then it will do so, thus breaking your array.

Maybe it's best for you to explain what you want to achieve in the end instead of posting strange abuses of array, absolute and PChar. This way we could try to find a solution that's efficient, but also maintainable.

What I want couldn't be any simpler: I want the count of characters in a constant array of characters _at compile time_ (the compiler has that information but it won't give it up).  Nothing particularly extravagant, just a parallel to sizeof(sometype) but for constant arrays.  Having those constants would be helpful, just as having high/low of an array allows other arrays/data types to be "customized" to fit them.  Nothing "strange" in that.

As far as the use of absolute in that way, I agree with you but, I was willing to pull the compiler's teeth if necessary to, at least, find out if there was a way to get the values.

Also, I don't care where the compiler chooses to put the constants.  I simply wish it would provide a way for the programmer to obtain their size (length, in this case), just as it does for most everything else.

It would be nice (and useful) if those values could be obtained at compile time but, since that is apparently not possible, I'll get their size at runtime (without abusing the compiler  :D)  pretty length or pretty strlen with a cherry on the top.





Thaddy

  • Hero Member
  • *****
  • Posts: 7183
Well, there is nothing strange about it as a possible new optimization in {$J-} mode (typed consts should be immutable anyway if they are not assignable).
Anything else should rely on the "runtime" functions. E.g. C(++) doesn't even know any different. And length() already optimizes if possible, as I showed.
What I have seen is that:
- shortstring has an immediate length. (hardly any code)
- untyped consts as string literals have an immediate length. (even less code)
- Ansistrings and Unicode string consts have access to the length field and are not really inefficient.
- Pchar demands strlen (or the internal strlen from the compiler, just like C code)

I think that maybe it is possible for the compiler to optimize a typed string const in {$J-} mode, but certainly not in {$J+} mode.
And I wonder what happens since a) Pchar is a foreign type to support C and derivatives and b) the other problematic cases are managed types.

I think the gain is not worth the trouble, since both untyped literals and shortstring literals are assignment compatible with the other string types, even PChar.
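
The four cases listed above can be compared side by side with a sketch like the following. The constant names are invented for illustration, and the comments restate the observations from this thread rather than anything measured here; compile with -al to inspect what your own FPC version generates.

```pascal
program LengthKinds;
{$mode objfpc}{$H+}{$J-}

const
  UntypedText            = 'an array of characters';  // untyped literal
  ShortText: ShortString = 'an array of characters';  // length byte at index 0
  AnsiText:  AnsiString  = 'an array of characters';  // length field before the data
  PCharText: PChar       = 'an array of characters';  // zero-terminated, no length stored

begin
  WriteLn(Length(UntypedText));  // reported above as an immediate value
  WriteLn(Length(ShortText));    // reported above as a single byte load
  WriteLn(Length(AnsiText));     // reads the string's length field
  WriteLn(Length(PCharText));    // scans for the terminating #0, like strlen
end.
```

All four calls return the same number for this text; what differs is how much work is left for run time.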
« Last Edit: August 10, 2018, 10:16:33 am by Thaddy »

440bx

  • Sr. Member
  • ****
  • Posts: 390
Quote from: Thaddy
I think the gain is not worth the trouble, since both untyped literals and shortstring literals are assignment compatible with the other string types, even PChar.

You're definitely right about that.  I asked the question to make sure there wasn't some feature in FPC, that I was not aware of, that would yield those values.  I'll just determine the length at runtime.  As you pointed out, it really won't take much code to get the lengths and use them.

ASerge

  • Hero Member
  • *****
  • Posts: 998
Quote from: 440bx
Code: Pascal
const
  somecharacters  : pchar = 'an array of characters';
The compiler knows at compile time how many characters are pointed to by the constant pointer "somecharacters" and that's what I'd like to get.

Code: Pascal
  1. program Project1;
  2. {$APPTYPE CONSOLE}
  3. {$OPTIMIZATION OFF}
  4.  
  5. const
  6.   CSomeCharacters = 'an array of characters';
  7.   SomeCharacters: PChar = CSomeCharacters;
  8. begin
  9.   if Length(CSomeCharacters) <> 22 then
  10.     Writeln('You won''t see it');
  11.   Writeln(Length(CSomeCharacters));
  12.   Writeln(Length(SomeCharacters));
  13.   Readln;
  14. end.
Even so, the compiler reports: project1.lpr(10,5) Warning: unreachable code
There is only one copy of the string in the executable.
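
The same pairing of an untyped constant with a typed PChar can be extended to a table, which may address the fragility concern raised earlier about keeping lengths in sync with array elements. This is only a sketch: it assumes Length() of an untyped string constant is accepted in a typed-constant initializer, and TTextEntry, Chars, Len and Texts are hypothetical names.

```pascal
program TextTable;
{$mode objfpc}{$H+}

type
  // Hypothetical record that keeps each pointer next to its length.
  TTextEntry = record
    Chars: PChar;
    Len:   Integer;
  end;

const
  CharConst1 = 'character constant 1';
  CharConst2 = 'character constant 2 - made purposely larger';

  // Each entry names its constant twice on the same line, so the
  // pointer and the length stay together if entries are reordered.
  Texts: array[0..1] of TTextEntry = (
    (Chars: CharConst1; Len: Length(CharConst1)),
    (Chars: CharConst2; Len: Length(CharConst2))
  );

var
  i: Integer;

begin
  for i := Low(Texts) to High(Texts) do
    WriteLn(Texts[i].Len, ': ', Texts[i].Chars);
end.
```

The programmer still has to name the right constant in both fields of an entry, but a mismatch is visible on a single line instead of being spread across two parallel arrays.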

440bx

  • Sr. Member
  • ****
  • Posts: 390
Quote from: ASerge
Even so, the compiler reports: project1.lpr(10,5) Warning: unreachable code
There is only one copy of the string in the executable.

The compiler obviously knows the length of the character array constant.  There just doesn't seem to be any way of getting it at compile time.  That's unfortunate, there are times when that information is useful.

Nice example Serge, proves beyond any doubt that the compiler knows the length and it is using it. Thanks.
« Last Edit: August 10, 2018, 10:47:43 pm by 440bx »

avra

  • Hero Member
  • *****
  • Posts: 1377
How about letting the IDE deal with counting array elements instead of the compiler doing it?
Something like https://forum.lazarus.freepascal.org/index.php?topic=27186.15

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 4949
If you use a shortstring (max length 255) and sizeof(), maybe that will do. (Not tested, and if it works the result may be one too many.)
Code: Pascal
const foo = shortstring('abc');

With ansistring that will not work, since sizeof(ansistring) = sizeof(pointer).
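
For reference, this is what SizeOf reports for the string types themselves. The sketch does not test the typecast constant above; TText22 is a made-up type name used only to show a sized shortstring.

```pascal
program SizeOfStrings;
{$mode objfpc}{$H+}

type
  TText22 = string[22];  // shortstring with room for 22 characters

begin
  // 255 characters plus the length byte.
  WriteLn(SizeOf(ShortString));
  // 22 characters plus the length byte: the "one extra".
  WriteLn(SizeOf(TText22));
  // An AnsiString variable is just a pointer to the managed data.
  WriteLn(SizeOf(AnsiString) = SizeOf(Pointer));
end.
```

So SizeOf describes the storage of the type, not the current text, which is why it cannot stand in for the length of an ansistring.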

What would you want to do with the value, if you could get it?
« Last Edit: August 10, 2018, 11:55:28 pm by Martin_fr »