
Topic: Is there some way to obtain the length of a character array at compile time?

440bx

Quote from: avra
How about letting the IDE deal with counting array elements instead of the compiler doing it?
Something like https://forum.lazarus.freepascal.org/index.php?topic=27186.15

That is a nice idea, but it has one major downside. If a string is changed (made shorter or longer) and the programmer forgets to tell the IDE to "update" the dependent locations, then things are no longer in sync. Additionally, it's difficult to trust the IDE to do those things correctly since the IDE, unlike the compiler, usually has a limited view of the entire program.

Quote
If you use a shortstring (max. length 255) and sizeof(), maybe that will do. (Not tested, and if it works, the result may be one too many.)
Code: Pascal
  const foo = shortstring('abc');

With ansistring that will not work, since sizeof(ansistring) = sizeof(pointer).

What would you want to do with the value, if you could get it?

Unfortunately, even with a shortstring, the compiler does not allow using the length of the string at compile time. For instance, it will not accept "const Mylength = mystring[0];" or "const mylength = length(mystring);". Taking sizeof(shortstring) yields the size of a shortstring, not the character count of the string it holds (which is reasonable since, as you know, such a string is not a true constant, just an initialized variable).
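A minimal FPC sketch of the distinction described above, assuming default compiler settings (the name TypedFoo is ours, not from the thread): for a typed ShortString constant, SizeOf reports the size of the type, while the character count is only read at run time.

```pascal
program ShortStringSizes;
{$mode objfpc}{$H+}

const
  // A typed constant: effectively an initialized variable of type ShortString.
  TypedFoo: ShortString = 'abc123def';

begin
  // Size of the ShortString type: 255 characters plus the length byte.
  writeln(SizeOf(TypedFoo));                      // 256
  // Character count, obtained at run time.
  writeln(Length(TypedFoo));                      // 9
  // The length byte itself, also read at run time.
  writeln(Ord(TypedFoo[0]));                      // 9
  // An AnsiString variable is only a pointer to heap-allocated data.
  writeln(SizeOf(AnsiString) = SizeOf(Pointer));  // TRUE
end.
```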

As far as what I'd do with it: it would be a safe and convenient way of defining the size of dependent types, just as it is commonly done with regular data types. For example, "somecharbuffer = array[0..2 * sizeof(byte)] of char" defines a buffer that can hold one (single-byte) character for each nibble in a byte plus the null terminator. Change "byte" to any other ordinal type and you get a buffer that snugly holds that type converted to "array of char" (hex conversion). If the programmer needs a buffer to hold a converted qword, just change "byte" to "qword" and everything is updated automatically by the compiler. That's just one example; as you know, there are countless cases where knowing the size of an item at compile time is very useful for defining dependent types and for writing code that the compiler updates automatically if the target type (in this case byte) changes in the future.
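The hex-buffer idea can be sketched like this in FPC. TValue, THexBuffer and ValueToHex are hypothetical names introduced for illustration; the point is that every size is derived from SizeOf(TValue), so changing that one alias resizes the buffer and the conversion loop together.

```pascal
program HexBuffer;
{$mode objfpc}

type
  TValue = Byte;  // change to Word, DWord or QWord; everything below follows
  // Two hex digits per byte of TValue, plus one slot for the null terminator.
  THexBuffer = array[0..2 * SizeOf(TValue)] of Char;

const
  HexDigits: array[0..15] of Char = '0123456789ABCDEF';

// Hypothetical helper: writes Value as upper-case hex followed by #0.
procedure ValueToHex(Value: TValue; out Buf: THexBuffer);
var
  i: Integer;
begin
  // Fill from the least significant nibble, right to left.
  for i := 2 * SizeOf(TValue) - 1 downto 0 do
  begin
    Buf[i] := HexDigits[Value and $0F];
    Value := Value shr 4;
  end;
  Buf[High(Buf)] := #0;
end;

var
  Buf: THexBuffer;
begin
  ValueToHex($3C, Buf);
  writeln(PChar(@Buf[0]));  // 3C
  writeln(SizeOf(Buf));     // 3
end.
```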

Specifically with null-terminated strings: if you're going to build at run time a "composite" string made of various constant strings, and you know at compile time the character counts of each string that will make up the composite string, then it is possible to declare a buffer type along the lines of "TMyBuffer = array[0..sizeofastring + sizeofanotherstring + sizeofyetanotherstring] of char;". On one hand, it spares the programmer from having to define a maximum size, thereby ensuring that the buffer is always large enough to accommodate the resulting string. If a routine builds half a dozen of these strings and they reside on the stack before being output, it can make the difference between allocating a few hundred bytes and allocating "max_buffer_size" for each string while still running the risk that some combination causes a buffer overflow, thereby corrupting the stack.
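For comparison, here is a sketch of such a composite buffer with the character counts maintained by hand, which is the workaround available without compile-time lengths. All names (Greeting, GreetLen, TMyBuffer, AppendPart, ...) are hypothetical, and the counts must be updated manually whenever a string changes, which is exactly the weakness being discussed.

```pascal
program CompositeBuffer;
{$mode objfpc}

const
  // Hypothetical constituent strings with hand-maintained character counts.
  Greeting = 'Hello, ';
  GreetLen = 7;
  Subject  = 'world';
  SubjLen  = 5;
  Ending   = '!';
  EndLen   = 1;

type
  // One slot per character of every part, plus the null terminator.
  TMyBuffer = array[0..GreetLen + SubjLen + EndLen] of Char;

// Copies the characters of S into Buf starting at index Used.
// No bounds check: the buffer size is trusted to match the counts above.
procedure AppendPart(var Buf: TMyBuffer; var Used: Integer; const S: ShortString);
begin
  Move(S[1], Buf[Used], Length(S));
  Inc(Used, Length(S));
end;

var
  Buf: TMyBuffer;
  Used: Integer;
begin
  Used := 0;
  AppendPart(Buf, Used, Greeting);
  AppendPart(Buf, Used, Subject);
  AppendPart(Buf, Used, Ending);
  Buf[Used] := #0;
  writeln(PChar(@Buf[0]));  // Hello, world!
  writeln(SizeOf(Buf));     // 14
end.
```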

Basically, it allows one to write cleaner and safer code when strings (char arrays) are involved.

It would be nice if it were available, particularly considering that the compiler has the information, but Length and strlen it is.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Martin_fr

Code: Pascal
  program Project1;

  const
    foo = shortstring('abc123def');
    l = sizeof(foo);

  begin
    writeln(l);
    readln;
  end.

Compiles and prints 9 (that is bytes, not chars, in the case of UTF-8).

Which is interesting, because a shortstring also contains the length of the string, so I expected 10.

You can always add an Assert; that is, of course, a run-time check, but it would only be active during testing, to alert you if any assumption went wrong.
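A sketch of that test-time check, assuming hand-maintained length constants like the hypothetical GreetLen/SubjLen below. Assertions are off by default in FPC; with {$ASSERTIONS ON} (or the -Sa switch) removed, the compiler drops the Assert calls entirely.

```pascal
program CheckLengths;
{$mode objfpc}
{$ASSERTIONS ON}  // enable for test builds only

uses
  SysUtils;  // makes a failed Assert raise a readable EAssertionFailed

const
  // Hypothetical string constants and their hand-maintained character counts.
  Greeting = 'Hello, ';
  GreetLen = 7;
  Subject  = 'world';
  SubjLen  = 5;

begin
  // Run-time sanity checks that flag a count that has gone out of date.
  Assert(Length(Greeting) = GreetLen, 'GreetLen is out of date');
  Assert(Length(Subject) = SubjLen, 'SubjLen is out of date');
  writeln('length constants are consistent');
end.
```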


Also note: shortstrings are like records passed by value, not by pointer (except for constref).
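A small sketch of what by-value passing means in practice (the procedure names are hypothetical): a value parameter works on its own copy of the string, while a var parameter modifies the caller's string.

```pascal
program ShortStringByValue;
{$mode objfpc}

// A value parameter receives its own copy of the string.
procedure ChangeCopy(S: ShortString);
begin
  S[1] := 'X';
  writeln('inside ChangeCopy:    ', S);  // Xbc
end;

// A var parameter refers to the caller's string, so changes are visible.
procedure ChangeOriginal(var S: ShortString);
begin
  S[1] := 'X';
end;

var
  Str: ShortString;
begin
  Str := 'abc';
  ChangeCopy(Str);
  writeln('after ChangeCopy:     ', Str);  // abc
  ChangeOriginal(Str);
  writeln('after ChangeOriginal: ', Str);  // Xbc
end.
```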
« Last Edit: August 11, 2018, 02:40:23 am by Martin_fr »

440bx

Quote from: Martin_fr
Which is interesting, because a shortstring also contains the length of the string, so I expected 10.

As I am sure you know, the typecast doesn't convert the string into a shortstring. It is still a null-terminated array of characters. If it had done a conversion, its sizeof would have been 256.

As you suggested, I am making sure the code checks that things "fit" and if they don't, the function returns false without "breaking" anything.
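One way such a "does it fit?" check might look, sketched under the assumption that the buffers are plain zero-based Char arrays and the sources are PChars. AppendIfFits is a hypothetical helper, not an RTL routine.

```pascal
program BoundedAppend;
{$mode objfpc}

uses
  SysUtils;  // StrLen

// Hypothetical helper: appends the null-terminated Src to Dest if it fits,
// keeping Dest null-terminated. Returns False and leaves Dest untouched
// otherwise. Used counts the characters already in Dest (excluding #0).
function AppendIfFits(var Dest: array of Char; var Used: SizeInt;
  Src: PChar): Boolean;
var
  SrcLen: SizeInt;
begin
  SrcLen := StrLen(Src);
  // Room is needed for Src plus the terminating #0.
  Result := Used + SrcLen + 1 <= Length(Dest);
  if not Result then
    Exit;
  Move(Src^, Dest[Used], SrcLen);
  Inc(Used, SrcLen);
  Dest[Used] := #0;
end;

var
  Buf: array[0..7] of Char;
  Used: SizeInt;
begin
  Used := 0;
  Buf[0] := #0;
  writeln(AppendIfFits(Buf, Used, 'abc'));   // TRUE
  writeln(AppendIfFits(Buf, Used, 'defg'));  // TRUE  (7 chars + #0 = 8)
  writeln(AppendIfFits(Buf, Used, 'h'));     // FALSE (no room left)
  writeln(PChar(@Buf[0]));                   // abcdefg
end.
```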

Having the size/length at compile time would be very useful.  It wouldn't require code to ensure everything is safe and it would consequently be cleaner.

back to strlen...  :-\


Thaddy

Without a typed const, indeed... the compiler simply translates it into a string literal if you use an untyped const declaration, with or without type indication by cast.
But isn't strlen really your issue? That is really not necessary unless interfacing with C (and the like) code. It is also not correct for Pascal strings: Pascal strings can contain zeros somewhere in the middle.
Code: Pascal
  var s: string = 'testme'#0'testmesomemore'; // C can only write 'testme'
  begin
    writeln(s);
  end.
or:
Code: Pascal
  const s: string = 'testme'#0'testmesomemore'; // C can only write 'testme'
  begin
    writeln(s);
  end.
or:
Code: Pascal
  const s = 'testme'#0'testmesomemore'; // C can only write 'testme'
  begin
    writeln(s);
  end.

All three output the same, because they are native Pascal string constructs.
Forget about PChars, unless you interface with dumb string languages.  :D
It seems to me you decided on strlen for all the wrong reasons: Pascal has a distinctive way to handle strings.
You should use Length(), which is unaffected by the issue at hand. StrLen is just for PChars, and Length handles those too.
(Sorry if I offend you again, but I know something about programming languages...) That is not Pascal, and it is bad programming, as the above demonstrates: it can have all kinds of side effects:
Code: Pascal
  const s = 'testme'#0'testmesomemore';
  begin
    writeln(Length(s)); // writes 21: correct
    writeln(strlen(s)); // writes 6: dumbed down C style
  end.
This happens when you use strlen on Pascal code... <grumpy.. >:D >:D :D 8-) O:-)>

Let's see what happens here:
Code: Pascal
  const s: PChar { dumb, monkey style, I know } = 'testme'#0'testmesomemore';
  begin
    writeln(Length(s)); // hey presto! 6! Where's my content?
    writeln(strlen(s)); // Hey presto! 6!
  end.

DON'T use strlen. Simple. Unless for interfacing.

Well, this should read like a comic book for most Pascal programmers... :-X 8) ;D O:-)


« Last Edit: August 11, 2018, 12:04:35 pm by Thaddy »
Specialize a type, not a var.

440bx

Quote from: Thaddy
Without a typed const, indeed... the compiler simply translates it into a string literal if you use an untyped const declaration, with or without type indication by cast.
But isn't strlen really your issue? That is really not necessary unless interfacing with C (and the like) code.

This is a port of a C program to FPC. I could take all the C null-terminated character arrays and convert them into Pascal strings. That's always an option, but it makes the port more complicated because some of the strings are longer than 255 characters. I could use AnsiStrings, which can hold more than 255 characters, but if I go that route I open the door to potential problems, since they are a managed (reference-counted) type.

The idea is to keep the initial port as close to the original C code as possible to avoid surprises. Once the initial port runs successfully, I will closely examine the result to determine how it can be made cleaner and simpler using Pascal-specific constructs and types (particularly objects and properties). I've already seen plenty of ways to improve the program with Pascal constructs, but if I start indulging, the port is no longer a port; it's a complete rewrite, which I intend to do after I have a ported version that works as it should.

All that said, I am using some Pascal features that don't really change anything but do make the program cleaner: for instance, ranges, enumerated types and var parameters. Small things that have very little impact overall.

In the first step, the overriding goal is to eliminate all the dependencies on the C standard library.  That alone can occasionally be more delicate than it initially seems.

I'm also using the port to learn about FPC, its "personality" and quirks, and also about Lazarus as a development environment.

Thank you for the input. I do agree that Pascal strings are a much better (and more flexible) way of handling character arrays than C pointers to char, but for now I'm limiting changes to small improvements like using low/high, enumerated types, var and the like.




Thaddy

In mode $H+ pascal strings are practically unlimited in length.

Xor-el

Quote from: Thaddy
In mode $H+ pascal strings are practically unlimited in length.

@Thaddy, I guess you wanted to say limited to 2GB right?  :)

Thaddy

Quote from: Thaddy
In mode $H+ pascal strings are practically unlimited in length.

Quote from: Xor-el
@Thaddy, I guess you wanted to say limited to 2GB, right?  :)

On a 32 bit platform. Yes. And even that depends. :)
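A quick way to see which limit applies on a given target, assuming FPC, where Length() of an AnsiString returns a SizeInt. In practice, available memory runs out long before High(SizeInt) on 64-bit targets.

```pascal
program StringLimits;
{$mode objfpc}{$H+}

begin
  // SizeInt follows the platform: 4 bytes on 32-bit, 8 bytes on 64-bit targets.
  writeln('SizeOf(SizeInt)     = ', SizeOf(SizeInt));
  writeln('High(SizeInt)       = ', High(SizeInt));
  // A ShortString, by contrast, is 255 characters plus a length byte.
  writeln('SizeOf(ShortString) = ', SizeOf(ShortString));  // 256
end.
```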

Bart

Quote from: Martin_fr
Code: Pascal
  program Project1;

  const
    foo = shortstring('abc123def');
    l = sizeof(foo);

  begin
    writeln(l);
    readln;
  end.

Compiles and prints 9 (that is bytes, not chars, in the case of UTF-8).

Which is interesting, because a shortstring also contains the length of the string, so I expected 10.

Delphi (7) prints 256 (which is what I expected, since shortstring is string[255] and it occupies 256 bytes of memory).
No idea about more modern Delphi versions.

Bart

440bx

  • Hero Member
  • *****
  • Posts: 3944
Quote from: Bart
Delphi (7) prints 256 (which is what I expected, since shortstring is string[255] and it occupies 256 bytes of memory).
No idea about more modern Delphi versions.

Bart

I'm really surprised.  I just tried it with Delphi 2 and Delphi 10 Seattle and both agree on 256.

However, if one tries to assign a new value to "foo", both complain that the left side cannot be assigned to, which proves that the cast was not treated as a data type. Yet a hex dump of the executable reveals that space was allocated in the initialized data section (256 bytes' worth, so a conversion did take place; the extra space is filled with nulls). Also, if one tries either to take the address of foo or to go to the address of foo in memory, Delphi returns an error in both cases.

Basically, it creates a read-only variable. It probably does that just in case the programmer, somewhere in the code, reads bytes of the string that are beyond its length but still within its size (not exactly good programming). With Delphi, that would result in a null being read.

FPC does it quite differently. If the program is compiled with debug info, there is debugging information about the constant, but no memory or section space is allocated to it. If it is compiled without debugging information, there is no trace anywhere in the executable of the constant's existence.

What Delphi does is "safe", but what FPC does is what is actually correct.





Thaddy

Quote from: Bart
Quote from: Martin_fr
Code: Pascal
  program Project1;

  const
    foo = shortstring('abc123def');
    l = sizeof(foo);

  begin
    writeln(l);
    readln;
  end.

Compiles and prints 9 (that is bytes, not chars, in the case of UTF-8).

Which is interesting, because a shortstring also contains the length of the string, so I expected 10.

Delphi (7) prints 256 (which is what I expected, since shortstring is string[255] and it occupies 256 bytes of memory).
No idea about more modern Delphi versions.

Bart

Bart, Delphi and FPC reserve 256 bytes, but store the shortstring correctly, with its size. Hence Length.

440bx

Quote from: Thaddy
Bart, Delphi and FPC reserve 256 bytes, but store the shortstring correctly, with its size. Hence Length.

FPC doesn't reserve 256 bytes. It reserves 0 bytes. Unlike in Delphi, after compiling there is no trace of foo in the executable (except if debug info is requested; then its existence as a constant appears in the debugging information).

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Quote from: Thaddy
Bart, Delphi and FPC reserve 256 bytes, but store the shortstring correctly, with its size. Hence Length.
Quote from: 440bx
FPC doesn't reserve 256 bytes. It reserves 0 bytes. Unlike in Delphi, after compiling there is no trace of foo in the executable (except if debug info is requested; then its existence as a constant appears in the debugging information).

Uhhhhh. Wrong again.
I will show you the full asm example from one of the previous posts to illustrate that...
Wait a sec..
Code: asm
  TC_$P$SSTRING_$$_SOMECHARACTERS:
          .byte   22
  # [4] somecharacters:Shortstring  = 'an array of characters';
          .ascii  "an array of characters\000                         "
          .ascii  "                                                   "
          .ascii  "                                                   "
          .ascii  "                                                   "
          .ascii  "                                                   "
          .ascii  "   "
  .Le11:
Count them... that is 255 + the .byte 22.....
« Last Edit: August 11, 2018, 07:24:34 pm by Thaddy »

440bx

Quote from: Thaddy
Count them... that is 255 + the .byte 22.....

I have no doubt you see them, along with a few pink elephants and a few cases of wine, which is probably the one thing you are good at: drinking it.

When you're done... _compile this_ and examine it with a hex viewer or have dumpbin disassemble it for you.

Code: Pascal  [Select][+][-]
  1. program WithShortstrings;
  2.  
  3. const
  4.   foo   = shortstring('abc123def');
  5.  
  6.  
  7.   l  = sizeof(foo);
  8.  
  9.  
  10. begin
  11.  writeln(l);
  12.  
  13.  readln;
  14. end.        

If you have any questions about where the example comes from, refer to your own message #25. Hopefully, you are able to understand yourself (which does not guarantee in any way that your understanding is correct).
« Last Edit: August 11, 2018, 09:10:34 pm by 440bx »

avra

Quote from: avra
How about letting the IDE deal with counting array elements instead of the compiler doing it?
Something like https://forum.lazarus.freepascal.org/index.php?topic=27186.15
Quote from: 440bx
That is a nice idea, but it has one major downside. If a string is changed (made shorter or longer) and the programmer forgets to tell the IDE to "update" the dependent locations, then things are no longer in sync. Additionally, it's difficult to trust the IDE to do those things correctly since the IDE, unlike the compiler, usually has a limited view of the entire program.

I didn't say it's perfect. I just don't think you have a better solution at this moment. If it were up to me, I would make the IDE script even simpler: just count the commas in a selection, add 1 to the result and put it where it belongs in the code. Crude, but it should do the job.
ct2laz - Conversion between Lazarus and CodeTyphon
bithelpers - Bit manipulation for standard types
pasettimino - Siemens S7 PLC lib

 
