
Topic: PCHAR question  (Read 815 times)

ssawgift

  • New Member
  • *
  • Posts: 48
    • My Personal Website
PCHAR question
« on: November 26, 2022, 05:27:29 pm »
I suddenly had this question. If I understand correctly, Pascal strings are not null-terminated. When I do a typecast like this:
p := PCHAR(strvar)
what is going on behind the scenes? Does the compiler automatically allocate a block of memory, copy the string to it, and null-terminate it?

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Re: PCHAR question
« Reply #1 on: November 26, 2022, 05:36:03 pm »
Pascal strings are actually sometimes (mostly) zero terminated. Pascal ansistrings are both zero terminated and able to contain zeros anywhere in the string. Shortstrings are the exception.
All PChar types behave the same as in e.g. C: they are zero terminated and do not allow zeros anywhere in the middle.
PChar types are technically arrays with a zero terminator and not Pascal string types. Not Wirth's fault but K&R's.
There are many examples on this forum. I wrote at least 10 small examples over the years. But:
Code: Pascal
{$H+} // ansistrings on; shortstring does not need it
var
  a: string = 'test me'#0'some more';
begin
  writeln(PChar(a)); // goofy style (curly brackets, i.e. C): output stops at the first #0
  writeln(a);        // sensible style (Pascal): the whole string is written, embedded #0 included
end.
« Last Edit: November 26, 2022, 05:44:02 pm by Thaddy »
Specialize a type, not a var.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: PCHAR question
« Reply #2 on: November 26, 2022, 05:42:47 pm »
To be more exact:

Pascal strings are not null terminated. They can contain nulls, because they have a length field.

But Pascal strings (at least long/ansi strings, to which you can apply the mentioned "pchar(s)" cast) do have a terminating #0. Only it is not within the string: it is the next byte after the end of the string. The compiler automatically allocates the extra space for each string and always puts the #0 there.

I am not sure if that is
- documented behaviour
- implementation detail. (which means it could change with no notice)

In either case, the "pchar(s)" cast is documented. And it will continue to work.

Mind also that

Code: Pascal
var
  s: ansistring;
  p: pchar;
begin
  p := pchar(s);
  s := s + 'abc'; // at this point p may become a dangling pointer, i.e. p may point to any kind of trash in memory.
end.

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Re: PCHAR question
« Reply #3 on: November 26, 2022, 05:47:03 pm »
Quote from: Martin_fr
Pascal strings are not null terminated.  They can contain nulls, because they have a length field.

That is not exact. All supported string types derived from Ansi (AnsiString, UnicodeString, etc.) are zero terminated. The only exception is shortstrings (where Delphi, as documented, also terminates these with a zero).
See my example.
All of this is documented in the manuals, except the zero terminator for shortstrings, which is, as you rightly noted, not necessary. But that goes for all the supported Pascal string types.
« Last Edit: November 26, 2022, 05:50:18 pm by Thaddy »

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Re: PCHAR question
« Reply #4 on: November 26, 2022, 05:52:34 pm »
Quote from: Martin_fr
Mind also that

Code: Pascal
  p := pchar(s);
  s := s + 'abc'; // at this point p may become a dangling pointer. I.e. p may point to any kind of trash in memory.

No, because the compiler treats s as an ansistring and concatenation is OK.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: PCHAR question
« Reply #5 on: November 26, 2022, 05:52:38 pm »
Quote from: Thaddy
Is not exact. All supported string types derived from Ansi - AnsiString, UnicodeString etc - are zero terminated. Only exception are shortstrings (where Delphi also terminates these documented with a zero)

Hmm, interesting. What do they do when you stuff the exact number of bytes into the string? E.g. 80 bytes in a string[80] shortstring.

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: PCHAR question
« Reply #6 on: November 26, 2022, 05:57:11 pm »
Quote from: ssawgift
What is going on behind the scene? Does the compiler automatically allocate a block of memory and copy the string to it and null-terminate it?

What happens behind the scenes depends on the string type and the string compiler options in effect.

This page seems to have the answers you're looking for:  https://wiki.freepascal.org/Character_and_string_types

HTH.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: PCHAR question
« Reply #7 on: November 26, 2022, 06:45:34 pm »
Quote from: Thaddy
  Quote from: Martin_fr
  Pascal strings are not null terminated.  They can contain nulls, because they have a length field.
Is not exact. All supported string types derived from Ansi - AnsiString, UnicodeString etc - are zero terminated.

Sorry, but "zero terminated" means:
1) They end at the first #0 encountered.
2) They have (have to have) a #0 at the end (as an implication of point 1), for otherwise they would not be terminated.

The above is not true for Pascal strings. Short and Ansi strings (and "other supported" types) can contain a #0 in the text, and they do not end at such a #0. They are not terminated by a zero.

However, as I explained, they [1] have (which may be an implementation detail) a #0 at their "termination". But even that #0 does not terminate them. It is just located there for convenience.

[1] I know this for at least AnsiString. It may or may not hold true for any other string type.
« Last Edit: November 26, 2022, 08:41:44 pm by Martin_fr »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: PCHAR question
« Reply #8 on: November 26, 2022, 06:51:07 pm »
Quote from: Thaddy
  Quote from: Martin_fr
  Mind also that

  Code: Pascal
    p := pchar(s);
    s := s + 'abc'; // at this point p may become a dangling pointer. I.e. p may point to any kind of trash in memory.
NO. because the compiler treats s as ansistring and concatenation is OK.

You did notice that this is not about concatenation? Any assignment to s may cause p to become dangling.

- p points to the original memory location of s[1].
- p := pchar(s); does not affect the refcount.
- Changing s can (not must, but can) change the memory location of the text. And if the refcount was 1 (which is possible), the old memory is no longer reserved.

Therefore "p may become a dangling pointer".

Depending on the memory manager, the "point to trash" may only happen later, but it can happen right away. (Afaik heaptrc can trash the freed memory, but that I need to double check.)
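
A minimal sketch of one way to stay clear of the dangling pointer described above, assuming FPC with {$H+} and the SysUtils unit. The variable names borrowed and owned are made up for illustration. The idea is that a pointer obtained with PChar(s) only borrows the buffer of s, while StrNew makes an independent copy that survives later changes to s:

```pascal
program DanglingPCharDemo;
{$mode objfpc}{$H+}

uses
  SysUtils; // StrNew, StrDispose

var
  s: AnsiString;
  borrowed, owned: PChar; // hypothetical names, for illustration only
begin
  s := 'hello';
  UniqueString(s);            // make sure s has its own heap buffer (refcount 1)

  borrowed := PChar(s);       // points into the buffer of s, refcount unchanged
  owned := StrNew(PChar(s));  // independent heap copy of the text

  s := s + 'abc';             // may move the text; borrowed must not be used from here on

  WriteLn(owned);             // prints: hello
  WriteLn(s);                 // prints: helloabc

  StrDispose(owned);          // the copy has to be released manually
end.
```

Whether borrowed really ends up pointing at freed memory depends on the memory manager, so the sketch never reads through it after s has changed.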

ASerge

  • Hero Member
  • *****
  • Posts: 2223
Re: PCHAR question
« Reply #9 on: November 26, 2022, 10:55:21 pm »
Quote from: Martin_fr
I am not sure if that is
- documented behaviour

From the docs on AnsiStrings: They are reference counted and are guaranteed to be null terminated.

Result:
ShortString - not null terminated. Cannot be cast to PChar.
AnsiString - always null terminated. Can be cast to PChar.
UnicodeString - always null terminated. Can be cast to PUnicodeChar.
WideString (it differs from UnicodeString on Windows) - always null terminated. Can be cast to PWideChar.

When casting strings to PChar, you also need to be careful with empty strings. An empty string is equal to nil, and PChar(S) acts as a function, which in this case returns a pointer to an empty string that ends with #0, not nil. Some external functions expect nil for empty strings; in that case, instead of PChar(S), you need to use Pointer(S).
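
A small sketch of the empty-string case, assuming FPC with {$H+}. If the description above holds, the first and third lines should report TRUE and the second FALSE:

```pascal
program EmptyStringCast;
{$mode objfpc}{$H+}

var
  s: AnsiString;
begin
  s := '';
  // An empty AnsiString is stored as a nil pointer.
  WriteLn('Pointer(s) = nil: ', Pointer(s) = nil);
  // The PChar cast is expected to hand back a valid pointer instead of nil ...
  WriteLn('PChar(s) = nil:   ', PChar(s) = nil);
  // ... and that pointer refers to a lone #0.
  WriteLn('PChar(s)^ = #0:   ', PChar(s)^ = #0);
end.
```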

Any string always knows its length, and even if it contains #0 inside, it is processed correctly. But if you cast to PChar and use functions for null-terminated strings, the processing will be incorrect, because those functions stop at the first #0.
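
To make the difference concrete, here is a short sketch that reuses the string from Thaddy's example, assuming FPC with {$H+}. Length reads the length field, while StrLen scans for the first #0:

```pascal
program EmbeddedNull;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  s: AnsiString;
begin
  s := 'test me'#0'some more';
  WriteLn('Length(s)        = ', Length(s));        // 17: all characters, the #0 included
  WriteLn('StrLen(PChar(s)) = ', StrLen(PChar(s))); // 7: stops at the first #0
end.
```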

Use the converted PChar value only if the original string does not change and remains in scope.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: PCHAR question
« Reply #10 on: November 26, 2022, 11:14:14 pm »
Quote from: ASerge
  Quote from: Martin_fr
  I am not sure if that is
  - documented behaviour
From docs AnsiStrings: They are reference counted and are guaranteed to be null terminated.

Ok, so it is documented, rather than an implementation detail.

From the same doc
Quote
Remark: The null-termination does not mean that null characters (char(0) or #0) cannot be used: the null-termination is not used internally.

They are "null terminated", yet they are not "null terminated" (as they do not use the null that would null terminate them...).
In other words, as I wrote: they have a terminating null (which differs from being "null terminated").

And also, while the documentation makes every effort to point out that they have a terminating null char, it does not state that this null is not meant to be accessed via "s[length(s)+1]" => as that leads to a range check error.
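
A sketch of how the terminating null can be read without indexing past the end of the string, assuming FPC with {$H+}. It goes through the PChar cast, which is zero-based and not range checked:

```pascal
program TerminatorAccess;
{$mode objfpc}{$H+}

var
  s: AnsiString;
  p: PChar;
begin
  s := 'abc';
  p := PChar(s);
  // p[0] is 'a', so p[Length(s)] is the byte right after the last character.
  WriteLn('Terminator is #0: ', p[Length(s)] = #0);
  // s[Length(s) + 1] would address the same byte, but as noted above it
  // leads to a range check error when range checking ({$R+}) is enabled.
end.
```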

IMHO, this is documentation that wants to be cleaned up / reworded.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: PCHAR question
« Reply #11 on: November 26, 2022, 11:22:39 pm »
Quote from: ASerge
  Quote from: Martin_fr
  I am not sure if that is
  - documented behaviour
From docs AnsiStrings: They are reference counted and are guaranteed to be null terminated.

"Null terminated" is a very ambiguous term, as the REMARK in that link says. There is a null at the end, but the string doesn't stop at the first null, so the null doesn't actually terminate the string. It is really only there for support of legacy languages.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: PCHAR question
« Reply #12 on: November 27, 2022, 10:24:59 pm »
Quote from: Thaddy
  Quote from: Martin_fr
  Pascal strings are not null terminated.  They can contain nulls, because they have a length field.
Is not exact. All supported string types derived from Ansi - AnsiString, UnicodeString etc - are zero terminated. Only exception are shortstrings (where Delphi also terminates these documented with a zero)
See my example.
All of this is documented in the manuals except the zero terminator for shortstring. Which is, as you rightly noted, not necessary but that goes for all the supported Pascal string types.

Delphi does not NUL terminate ShortString variables. Take the following example:

Code: Pascal
uses
  SysUtils;

var
  s: String[5];
  b: Byte = $ff;
  bp: PByte;
  i: LongInt;
begin
  s := 'Test1';
  s := 'Test';         // shorter value assigned over the longer one
  bp := PByte(@s[0]);  // s[0] is the length byte of the ShortString
  // dump the length byte and all five character slots as hex
  for i := 0 to High(s) do
    Write(IntToHex(bp[i], 2), ' ');
  Writeln;
end.

This will print 04 54 65 73 74 31, meaning that when the value changed from 'Test1' to 'Test', the compiler did not zero out the character at index 5.

 
