Author Topic: How does the string handler Add a #0 at the end without adding length?  (Read 4136 times)

jamie

  • Hero Member
  • *****
  • Posts: 6996
Re: How does the string handler Add a #0 at the end without adding length?
« Reply #15 on: March 22, 2020, 01:10:22 pm »
If you are not willing to fix that, it should at least throw an error when it detects this situation.

And if you want to go that way, who ever heard of allowing #0 in a string and calling it valid, yet you can't use a valid Char operation and get what you would expect?

If you don't think this should be fixed, the code should FAULT at runtime whenever a case like this pops up!

This operation gives no indication that it isn't doing what's expected.

So please, either we fix this or modify the code so it won't be allowed. At least this way some other bugs in code can be found that are sitting there dormant and pop out at unexpected times with no clue of where to look!

The only true wisdom is knowing you know nothing

PascalDragon

  • Hero Member
  • *****
  • Posts: 6039
  • Compiler Developer
« Reply #16 on: March 22, 2020, 01:18:24 pm »
Quote
I am sorry you don't get it.
Pos has no overload for a PChar, so what it is doing is first recreating an AnsiString, using the NUL (#0) as the end-of-string marker instead.
It should not be doing that for PChar(S) etc.
It should be doing what StrLen does, and that is return the position of the NUL.

You are wrong. The purpose of StrLen is to determine the length of the passed PChar. That just happens to coincide with the index of the terminating NUL-character.
Your PChar is indeed converted to an AnsiString because there is no overload for PChar. This means that the resulting AnsiString will contain StrLen characters, that is, the PChar without the terminating NUL-character. Thus Pos will never find that terminating NUL, because that NUL is never inside the string.

If you're dealing with PChars then please use the functions for PChars (namely StrPos in this case).
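As a rough illustration of that advice, here is a minimal sketch that searches the NUL-terminated data directly with StrPos and StrLen from SysUtils. The program name and variable names are made up for the example, and it assumes {$mode objfpc}{$H+}, so that PChar is a PAnsiChar.

```pascal
program PCharSearchDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  S: AnsiString;
  P, Hit: PChar;
begin
  S := 'Test';
  P := PChar(S);

  // StrPos works on the NUL-terminated data itself, no AnsiString involved.
  Hit := StrPos(P, 'st');
  if Hit <> nil then
    WriteLn('offset: ', Hit - P);   // zero-based offset of the match

  // StrLen counts the characters in front of the terminator.
  WriteLn('StrLen: ', StrLen(P));
end.
```

Note that StrPos returns a pointer (or nil), not a 1-based index like Pos, so the offset has to be computed by pointer subtraction.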

Quote
Allow me to add a failing example that contradicts some other comments claiming that #0 not being in the string is the reason for this.
Code: Pascal
var
  S: String;
begin
  S := 'Test' + #0;
  Caption := IntToStr(Pos(#0, PChar(S)));  // fails: always yields 0
end;
However, this does work:
Caption := IntToStr(Pos(#0, S));

That doesn't contradict anything that was said. The layout of the string S is Test#0#0. The first NUL is part of the string (Length(S) will return 5), the second NUL is not part of the string as that is the terminator. Now you cast that string to PChar, that means that the PChar will essentially be Test#0 with StrLen(S) returning 4 and that remaining NUL-character being the terminator. Now if you pass that to Pos that will - as mentioned above - be converted to an AnsiString, but this time with the layout Test#0 where Length will be 4 and the NUL-character will not be part of the string.

If you're mixing Pascal and NUL-terminated you need to know what you're doing! This is all by design.
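The layout described above can be checked with a small console program. This is only a sketch: the variable Converted is introduced here to make the implicit PChar-to-AnsiString conversion visible as an explicit assignment, and the expected values in the comments are the ones stated in the explanation above.

```pascal
program NulLayoutDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  S, Converted: AnsiString;
begin
  S := 'Test' + #0;                 // in memory: Test#0 plus the hidden terminator
  WriteLn(Length(S));               // 5: the first NUL is part of the string
  WriteLn(Pos(#0, S));              // 5: found inside the string
  WriteLn(StrLen(PChar(S)));        // 4: the first NUL now acts as the terminator

  Converted := PChar(S);            // same conversion the compiler inserts for Pos
  WriteLn(Length(Converted));       // 4: only 'Test' survives the round trip
  WriteLn(Pos(#0, Converted));      // 0: no NUL left inside the string
end.
```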

Bart

  • Hero Member
  • *****
  • Posts: 5575
    • Bart en Mariska's Webstek
« Reply #17 on: March 22, 2020, 01:22:36 pm »
Why do you think that you are the only person who is right in this discussion, and that everybody else, including the FPC developers, Borland and Embarcadero, is wrong?

As explained before, a #0 is always the terminator for a PChar.
Pos() only searches up to the end of the PChar, not including the terminator, since that is not part of the actual data. So it can only ever return 0 as the result, as it can never find a #0 inside a PChar.

Pascal strings were designed to be able to hold any char, including #0.
This has been exploited in many programs to store extra data inside a string, knowing that the OS would never display it with any cooked (as opposed to raw) IO procedure (writeln etc.).

I.e. one of my old programs used it as a start of a signature in the first record of a datafile.

Bart

MarkMLl

  • Hero Member
  • *****
  • Posts: 8453
« Reply #18 on: March 22, 2020, 01:56:54 pm »
Quote
using POS(#0, Pchar(Astring)) always returns 0. It fails to report the position of the trailing #0

That's not a bug, it's a feature. Pos() works on Pascal-style strings and the result can't exceed the length. Length('A') is 1 irrespective of whether it's got a trailing \0 internally for convenience when treating it as a C-style character array.

I think what you've not allowed for is that there are two distinct lengths involved:

1) The number of valid characters stored in the string (or entries in a list etc.).

2) The amount of storage allocated by the memory manager, which is likely to be some multiple of the physical word/alignment size.

The requirement for \0 beyond the end of a string merely means that the difference between (1) and (2) is never zero, whatever you might like to believe >:-)

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

jamie

« Reply #19 on: March 22, 2020, 02:20:43 pm »
Say what you will. I am not trying to be adamant about it; it's just that this is how I've been brought up to code. In my mind this is ridiculous.

Please force the function to FAULT under these conditions! I would wager it would uncover lots of bugs.

In the meantime I have created an overload function which works as it should. I'll put that in my ExtHelper unit, which I now include in just about everything I do.

I tried to inline it because it's such a short function, but for whatever reason the compiler wants to call it as a non-inline.

So I resorted to an assembler routine using a string search, which works fine, except it won't be cross-platform. I'll work on that later. That in itself was a task, too.

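Code: Pascal
function Pos(aChar: Char; aPChar: PChar): Integer; overload; inline;
begin
  Result := 0;  // zero-based index; equals StrLen(aPChar) when aChar = #0
  if aPChar = nil then Exit;
  while aPChar[Result] <> aChar do
  begin
    // Stop at the terminator instead of reading past the end of the buffer.
    if aPChar[Result] = #0 then Exit(-1);
    Inc(Result);
  end;
end;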
That won't inline, even though I made sure {$INLINE ON} is in there, too.
But in any case the assembler one seems to work.

Thanks to all, and please don't let me upset the Pascal community. Apparently it would be worse than a nuclear explosion.
                                                                 

440bx

  • Hero Member
  • *****
  • Posts: 5583
« Reply #20 on: March 22, 2020, 02:34:12 pm »
Quote
using POS(#0, Pchar(Astring)) always returns 0. It fails to report the position of the trailing #0
Pos(#0, Pchar(S)) always returns 0.
That is a bug!

It isn't a bug. You'll find that the equivalent C function (strstr) behaves exactly the same (it returns a pointer to the beginning of the null-terminated array, not a pointer to the null).

The reason it isn't a bug is because the null (#0) is not considered a character.  It is a sentinel that marks the end of the character sequence.  Functions that are designed to return character positions are not going to return the position of the non-character sentinel.

You can try this to convince yourself:
Code: Pascal
const
  ntdll = 'ntdll.dll';

// Bind NTDLL's strstr under the name StrPos.
function StrPos  (SearchIn     : pchar;
                  SearchFor    : pchar)
       : pchar;   cdecl; external ntdll name 'strstr';     { !! renamed       }

var
  // Explicit terminator so the array is a valid null-terminated string.
  s : packed array[0..3] of char = ('a', 'b', 'c', #0);
  p : pchar;

  i : integer;
  n : integer;

BEGIN
  p := StrPos(s, #0);  // use NTDLL's implementation of StrPos (strstr)

  // Count the characters that precede the terminator.
  n := 0;
  for i := 0 to high(s) do
  begin
    if s[i] = #0 then break;
    inc(n);
  end;
END.

On exit StrPos will have p point to the first character of s, not the fourth one.  That is the same as 0 (zero) if instead of returning a pointer it returned an index as the Pascal function does.

If you want the position of the terminator/sentinel, you can code your own loop, like the "for" in the above code, or, simpler, call lstrlen. Its result doubles as the index because the first character is the zeroth one. For the array above, it would return 3, and since the array is zero-based, that is also the index of the null terminator, i.e. s[3].
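A portable way to see the same thing, sketched here with SysUtils.StrLen instead of the Windows-only lstrlen (that substitution is my assumption, as are the names Buf and N):

```pascal
program TerminatorIndexDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;
const
  // Zero-based buffer with an explicit terminator at index 3.
  Buf: array[0..3] of Char = ('a', 'b', 'c', #0);
var
  N: Integer;
begin
  N := StrLen(@Buf[0]);     // number of characters before the terminator
  WriteLn(N);               // 3
  WriteLn(Buf[N] = #0);     // TRUE: the length is also the terminator's index
end.
```

The length and the terminator's index only coincide because the buffer is zero-based. For a 1-based Pascal string the terminator sits at Length(S) + 1, which is outside the valid index range.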

There is no bug but, it obviously bugs you. ;)

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v4.0rc3) on Windows 7 SP1 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12318
  • FPC developer.
« Reply #21 on: March 22, 2020, 02:50:51 pm »
Quote
Allow me to add a failing example that contradicts some other comments claiming that #0 not being in the string is the reason for this.

In Pascal #0's are allowed to be in the string. So that #0 of yours is included in the string and after it comes the terminating #0, which is not part of the string.

Code: Pascal
S := 'Test' + #0;

.... will basically be Test#0#0 with length 5

Quote
Caption := IntToStr(Pos(#0, S));

Will search data at s for #0 with a maximum of 5 characters. (a strnpos() in C terms)

Yes. And now suppose I do a PosEx starting at the position after the found #0 to find the terminating one. That won't work, because that one is outside the string, so it will simply return 0 because there are no more chars in the string to search.
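A short sketch of that PosEx experiment, using PosEx from the StrUtils unit. The variable First is only there for the example.

```pascal
program PosExDemo;
{$mode objfpc}{$H+}
uses
  StrUtils;
var
  S: AnsiString;
  First: Integer;
begin
  S := 'Test' + #0;                    // Test#0 with Length 5, terminator not counted
  First := Pos(#0, S);
  WriteLn(First);                      // 5: the NUL that belongs to the string

  // Continue searching behind the first hit. The start offset is already
  // past Length(S), so there is nothing left to search.
  WriteLn(PosEx(#0, S, First + 1));    // 0
end.
```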
« Last Edit: March 22, 2020, 02:54:29 pm by marcov »

PascalDragon

« Reply #22 on: March 22, 2020, 03:01:47 pm »
Quote
As explained before, a #0 is always the terminator for a PChar.
Pos() only searches up to the end of the PChar, not including the terminator, since that is not part of the actual data. So it can only ever return 0 as the result, as it can never find a #0 inside a PChar.

Small clarification: Pos does not search inside the PChar, but inside the AnsiString converted from the PChar, which is a small, but important difference.

Quote
using POS(#0, Pchar(Astring)) always returns 0. It fails to report the position of the trailing #0

Quote
That's not a bug, it's a feature. Pos() works on Pascal-style strings and the result can't exceed the length. Length('A') is 1 irrespective of whether it's got a trailing \0 internally for convenience when treating it as a C-style character array.
I think what you've not allowed for is that there are two distinct lengths involved:
1) The number of valid characters stored in the string (or entries in a list etc.).
2) The amount of storage allocated by the memory manager, which is likely to be some multiple of the physical word/alignment size.
The requirement for \0 beyond the end of a string merely means that the difference between (1) and (2) is never zero, whatever you might like to believe >:-)
MarkMLl

In fact it's even three sizes: your 1) and 2) plus as a 3) the size of the string plus header (containing the length, the code page and the reference count) and terminating NUL-character.

Quote
Please force the function to FAULT under these conditions! I would wager it would uncover lots of bugs.

It can not and in fact must not fault, because it is behaving exactly as designed!

jamie

« Reply #23 on: March 22, 2020, 03:18:39 pm »
Without getting ROCKS thrown at me, why can't you FORCE a RUNTIME fault when a #0 is in the search criteria for PCHAR types? At least this would give the coder some idea of what not to do, instead of a silent failure.

marcov

« Reply #24 on: March 22, 2020, 03:24:01 pm »
Quote
Without getting ROCKS thrown at me, why can't you FORCE a RUNTIME fault when a #0 is in the search criteria for PCHAR types? At least this would give the coder some idea of what not to do, instead of a silent failure.

The need to typecast should be the warning. The typecast says you know what you are doing.

PascalDragon

« Reply #25 on: March 22, 2020, 03:47:28 pm »
Quote
Without getting ROCKS thrown at me, why can't you FORCE a RUNTIME fault when a #0 is in the search criteria for PCHAR types? At least this would give the coder some idea of what not to do, instead of a silent failure.

What you fail to grasp is that Pos here never sees a PChar. It only sees an AnsiString, because the PChar argument is implicitly cast to an AnsiString before being handed to Pos. Thus there cannot be any handling of that.
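To make that visible, here is a sketch with a hypothetical stand-in for Pos. SeenLength is not an RTL function; it is invented for this example and simply reports the length of whatever AnsiString actually arrives in the callee.

```pascal
program ImplicitConversionDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-in for Pos: like Pos, it has no PChar overload,
// only an AnsiString parameter.
function SeenLength(const A: AnsiString): Integer;
begin
  Result := Length(A);
end;

var
  S: AnsiString;
begin
  S := 'Test' + #0;
  WriteLn(SeenLength(S));          // 5: the string is passed as it is
  WriteLn(SeenLength(PChar(S)));   // 4: a new AnsiString is built up to the first NUL
end.
```

By the time the callee runs, the PChar is gone. All it can inspect is a four-character AnsiString, so it has nothing to fault on.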

Quote
Without getting ROCKS thrown at me, why can't you FORCE a RUNTIME fault when a #0 is in the search criteria for PCHAR types? At least this would give the coder some idea of what not to do, instead of a silent failure.
The need to typecast should be the warning. The typecast says you know what you are doing.

Well, to be fair, jamie casts to a PChar, probably assuming there's a PChar-compatible overload, and the compiler instead converts it to an AnsiString.

Then again all this is by design. The terminating NUL-character is not considered part of the string, thus even with a PChar-turned-AnsiString this behavior is correct.

MarkMLl

« Reply #26 on: March 22, 2020, 06:33:35 pm »
Quote
In fact it's even three sizes: your 1) and 2) plus as a 3) the size of the string plus header (containing the length, the code page and the reference count) and terminating NUL-character.

I was assuming that fixed overheads were included in the (2) - (1) difference.

MarkMLl

 
