Recent

Author Topic: Short strings small bug  (Read 2091 times)

Avinash

  • Full Member
  • ***
  • Posts: 117
Short strings small bug
« on: November 10, 2022, 05:25:54 am »
FPC 3.2.2 Win32

Code: Pascal  [Select][+][-]
  1. var
  2.   S: String;
  3.  
  4. begin
  5.  
  6.   S[0]:= #255;
  7.   S := S + #0;
  8.   WriteLn(Length(S));  { gives correct 255 }
  9.  
  10.   S[0]:= #255;
  11.   S[Length(S) + 1] := #0;
  12.   WriteLn(Length(S));  { gives wrong 0,  Turbo Pascal gives 255 }
  13.  
  14. end.


Code responsible for this:

Code: ASM  [Select][+][-]
  1. ; [11] S[Length(S) + 1]:= #0;
  2. %LINE 11+0
  3.                 movzx   eax,byte [U_$P$PROGRAM_$$_S]
  4.                 lea     eax,[eax+1]
  5.                 movzx   eax,al
  6.                 mov     byte [U_$P$PROGRAM_$$_S+eax*1],0
  7.  

And with {$O+} (what, by the way, does it turn on? I can't reproduce its effect by the -O1..4 switches):

Code: ASM  [Select][+][-]
  1. ; [12] S[Length(S) + 1]:= #0;
  2. %LINE 12+0
  3.                 movzx   eax,byte [U_$P$PROGRAM_$$_S]
  4.                 add     eax,1
  5.                 and     eax,255
  6.                 mov     byte [U_$P$PROGRAM_$$_S+eax],0
  7.  

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Short strings small bug
« Reply #1 on: November 10, 2022, 06:47:28 pm »
FPC 3.2.2 Win32

Code: Pascal  [Select][+][-]
  1. var
  2.   S: String;
  3.  
  4. begin
  5.  
  6.   S[0]:= #255;
  7.   S := S + #0;
  8.   WriteLn(Length(S));  { gives correct 255 }
  9.  
  10.   S[0]:= #255;
  11.   S[Length(S) + 1] := #0;
  12.   WriteLn(Length(S));  { gives wrong 0,  Turbo Pascal gives 255 }
  13.  
  14. end.

Absoultey NOT a bug.
TP gives range check error at compile time.
Only if compiled with range checking off it'll output 255.
However, you are accessing invalid memory in this case and therefore the behaviour is therefore undefined.
You're just lucky the app doesn't crash.

One can guess that fpc treats the index of a shortstring as a byte and this value will overflow: 255+1=0.
Some compiler devel can comment on that.

Bart

Thaddy

  • Hero Member
  • *****
  • Posts: 14204
  • Probably until I exterminate Putin.
Re: Short strings small bug
« Reply #2 on: November 10, 2022, 07:06:58 pm »
A case of a "programmer" not counting from zero?
The range is #0..#255 which is 256, so if you add/concat something to #255 you have a range error or an overflow.
If you want to avoid that, use AnsiString, not shortstring.

As Bart wrote, this is absolutely not a bug from the compiler side, it is frankly programmer error.
The most important thing is  that after an overflow the compiler's behavior is undefined, hence it can be either rubbish, 0, or 255. Since it is undefined you can not rely on such things. As Bart also wrote, you should use overflow checking, at least during debugging. Wrong code is wrong code....
« Last Edit: November 10, 2022, 07:13:32 pm by Thaddy »
Specialize a type, not a var.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Short strings small bug
« Reply #3 on: November 10, 2022, 08:39:55 pm »
Is assignment to element zero of a string even legal? I thought that other than the shortstring case it had been retired in favour of SetLength.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Thaddy

  • Hero Member
  • *****
  • Posts: 14204
  • Probably until I exterminate Putin.
Re: Short strings small bug
« Reply #4 on: November 10, 2022, 09:05:17 pm »
Good question. I oversaw it, but.. element zero stores the length if it is shortstring, and strings start from element 1.
Any other pascal string type also stores information below element 1, a.k.a. known as negative offset.
Specialize a type, not a var.

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Short strings small bug
« Reply #5 on: November 10, 2022, 09:13:04 pm »
Thaddy. you didn't oversee it, you overlooked it.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Short strings small bug
« Reply #6 on: November 10, 2022, 09:30:42 pm »
Good question. I oversaw it, but.. element zero stores the length if it is shortstring, and strings start from element 1.
Any other pascal string type also stores information below element 1, a.k.a. known as negative offset.

Yes, but the length was an integer, which shouldn't- according to later versions of Object Pascal- be assignment-compatible with a character (i.e. the # prefix).

I agree with Howard's comment about overwhatever, but (wearing my technical author's hat) the word which should be avoided at all costs is "oversight" which in current contexts is /massively/ ambiguous.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Short strings small bug
« Reply #7 on: November 10, 2022, 10:41:19 pm »
Is assignment to element zero of a string even legal? I thought that other than the shortstring case it had been retired in favour of SetLength.

Since this is about TP vs FPC and by default in FPC String = ShortString (unless you have it otherwise specified in fpc.cfg by setting mode or by any other way), setting S[0] is perfectly legal.

Bart
« Last Edit: November 11, 2022, 10:49:57 am by Bart »

Avinash

  • Full Member
  • ***
  • Posts: 117
Re: Short strings small bug
« Reply #8 on: November 11, 2022, 12:53:07 am »
It looks like a some service is being provided, but that service is inadequate. It must be either a sensible thing, like
Code: ASM  [Select][+][-]
  1.          movzx eax, byte ptr [S]
  2.          inc al
  3.          je @@1
  4.          mov byte ptr S[eax], 0
  5.   @@1:

or it should not exist at all, and the behavior should be as documented in Turbo Pascal Version 7.0 Language Guide, page 55:

The first character of a string variable (at index 0) contains the dynamic length of the string; that is, Length(S) is the same as Ord(S[0]). If a value is assigned to the length attribute, the compiler doesn't check whether this value is less than the declared size of the string. It's possible to index a string beyond its current dynamic length. The characters read are random and assignments beyond the current length don't affect the actual value of the string variable.

dje

  • Full Member
  • ***
  • Posts: 134
Re: Short strings small bug
« Reply #9 on: November 11, 2022, 03:14:01 am »
or it should not exist at all, and the behavior should be as documented in Turbo Pascal Version 7.0 Language Guide, page 55:

Yes, and no. Page 55 also states:
Quote
The index expressions select components in each corresponding dimensions of the array. The number of expressions can't exceed the number of index types in the array declaration.

The index type for a shortstring seems to be undefined. But, the guide also states:
Quote
...a value of PChar can be indexed with a singe index expression of type Word. The index expression specifies an offset to add to the character pointer before it's de-referenced....

So, a PChar index is of type word, and yes, FreePascal does this:
Code: Pascal  [Select][+][-]
  1.   PChar(@s)[0] := #255;
  2.   PChar(@s)[length(s) + 1] := #0;
  3.   writeln(length(s));  

Regardless, accessing outside array bounds has be undefined for decades, in all languages. It is processor and implementation dependent. eg: Many 68000 compilers only supported 16 bit indexes for speed reasons. It would be acceptable for a 6502 compiler to wrap 256 byte array accesses by using the 8 bit X/Y index registers, while the 6809 could support 16 bit indexes.

Either way, Turbo Pascal made it clear that if you wanted to access behond 256 chars, you should use a PChar, but....

Even then, its limited to 64kb indexes (word's). Another example is, HSPascal (Atari ST) used 16 bit indexes, while PurePascal used 32 bit. One is faster than the other.

 

Avinash

  • Full Member
  • ***
  • Posts: 117
Re: Short strings small bug
« Reply #10 on: November 11, 2022, 05:09:21 am »
Page 55 also states:
Quote
The index expressions select components in each corresponding dimensions of the array. The number of expressions can't exceed the number of index types in the array declaration.
It's about matching the number of indexes [I,J,K ..] to the array dimensions.

In any case, this mechanism is not friendly to common sense, looks like a bug and generates slower and bulky code (movzx instead of mov or even extra lines of code) for no reason.

Code: Pascal  [Select][+][-]
  1. {$mode objfpc}
  2. var
  3.   S: String;
  4.   X: array [Byte] of Char;
  5.   I: Integer;
  6. begin
  7.   X[I] := #0;
  8.   S[I] := #0;
  9.  
  10.   S[Length(S) + 1]:= #0;
  11. end.

No point in using movzx instead of mov:
Code: ASM  [Select][+][-]
  1. ; [36] X[I] := #0;
  2. %LINE 36+0
  3.                 mov     eax,dword [U_$P$PROGRAM_$$_I]
  4.                 mov     byte [U_$P$PROGRAM_$$_X+eax*1],0
  5. ; [37] S[I] := #0;
  6. %LINE 37+0
  7.                 movzx   eax,byte [U_$P$PROGRAM_$$_I]
  8.                 mov     byte [U_$P$PROGRAM_$$_S+eax*1],0

There is no reason to add the line movzx eax,al here:
Code: ASM  [Select][+][-]
  1. ; [39] S[Length(S) + 1]:= #0;
  2. %LINE 39+0
  3.                 movzx   eax,byte [U_$P$PROGRAM_$$_S]
  4.                 lea     eax,[eax+1]
  5.                 movzx   eax,al
  6.                 mov     byte [U_$P$PROGRAM_$$_S+eax*1],0

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Short strings small bug
« Reply #11 on: November 11, 2022, 07:34:18 am »
Is assignment to element zero of a string even legal? I thought that other than the shortstring case it had been retired in favour of SetLength.

For ShortString it is legal (Length and SetLength are simply wrappers around an access to element 0 in case of ShortString).

In any case, this mechanism is not friendly to common sense, looks like a bug and generates slower and bulky code (movzx instead of mov or even extra lines of code) for no reason.

The length value of a ShortString is a Byte and FPC treats it as such when returned from Length(SomeShortString). Thus adding something will overflow if 255 is reached. TP and Delphi on the other hand seem to return the native integer size from Length(SomeShortString) nevertheless and thus adding a value will not overflow. Instead they mask it to 8-bit when assiging it to SomeShortString[0]. The Delphi documentation at least states that Length returns Integer (for FPC it's normally SizeInt except for ShortString). I'll have to try whether that upsets anything if I change the returned length from Byte to SizeInt.

Code: Pascal  [Select][+][-]
  1. {$mode objfpc}
  2. var
  3.   S: String;
  4.   X: array [Byte] of Char;
  5.   I: Integer;
  6. begin
  7.   X[I] := #0;
  8.   S[I] := #0;
  9.  
  10.   S[Length(S) + 1]:= #0;
  11. end.

No point in using movzx instead of mov:
Code: ASM  [Select][+][-]
  1. ; [36] X[I] := #0;
  2. %LINE 36+0
  3.                 mov     eax,dword [U_$P$PROGRAM_$$_I]
  4.                 mov     byte [U_$P$PROGRAM_$$_X+eax*1],0
  5. ; [37] S[I] := #0;
  6. %LINE 37+0
  7.                 movzx   eax,byte [U_$P$PROGRAM_$$_I]
  8.                 mov     byte [U_$P$PROGRAM_$$_S+eax*1],0

There is no reason to add the line movzx eax,al here:
Code: ASM  [Select][+][-]
  1. ; [39] S[Length(S) + 1]:= #0;
  2. %LINE 39+0
  3.                 movzx   eax,byte [U_$P$PROGRAM_$$_S]
  4.                 lea     eax,[eax+1]
  5.                 movzx   eax,al
  6.                 mov     byte [U_$P$PROGRAM_$$_S+eax*1],0

The movzx are necessary, because FPC later on uses it to address memory by passing eax as index and FPC uses the native address width for that.

And with {$O+} (what, by the way, does it turn on? I can't reproduce its effect by the -O1..4 switches):

The {$O} switch controls level 2 optimizations.

Avinash

  • Full Member
  • ***
  • Posts: 117
Re: Short strings small bug
« Reply #12 on: November 11, 2022, 08:20:28 am »
The length value of a ShortString is a Byte and FPC treats it as such when returned from Length(SomeShortString). Thus adding something will overflow if 255 is reached.

I can't get overflow in either FPC or TP:

Code: Pascal  [Select][+][-]
  1. {$Q+}  { Overflow checking }
  2. var
  3.   B: Byte;
  4.   I: Integer;
  5.   S: String;
  6. begin
  7.   S[0]:= #255;
  8.   I := Length(S) + Byte(10);
  9.   WriteLn(I);  {gives 265 in both FPC and TP}
  10.  
  11.   B := 255;
  12.   I := B + Byte(10);
  13.   WriteLn(I);  {gives 265 in both FPC and TP}
  14. end.

Also, this does not change anything in the assembler output (and does not apply to the case of a simple assignment to S[J], where movzx also occurs):
Code: Pascal  [Select][+][-]
  1. S[Integer(Length(S)) + 1]:= #0;

The movzx are necessary
But not the line movzx eax, al

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Short strings small bug
« Reply #13 on: November 11, 2022, 09:10:47 am »
The range check is in converting the integer in s[ length(s)+1] back to a byte to fit in S[0]. Not in the calculation of the integer.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Short strings small bug
« Reply #14 on: November 11, 2022, 04:27:11 pm »
The length value of a ShortString is a Byte and FPC treats it as such when returned from Length(SomeShortString). Thus adding something will overflow if 255 is reached.

I can't get overflow in either FPC or TP:

Code: Pascal  [Select][+][-]
  1. {$Q+}  { Overflow checking }
  2. var
  3.   B: Byte;
  4.   I: Integer;
  5.   S: String;
  6. begin
  7.   S[0]:= #255;
  8.   I := Length(S) + Byte(10);
  9.   WriteLn(I);  {gives 265 in both FPC and TP}
  10.  
  11.   B := 255;
  12.   I := B + Byte(10);
  13.   WriteLn(I);  {gives 265 in both FPC and TP}
  14. end.

The calculation is done in native platform size. As marcov said, the overflow check will be done at a different location.

The movzx are necessary
But not the line movzx eax, al

Yes, it is:

Code: ASM  [Select][+][-]
  1.     ; [39] S[Length(S) + 1]:= #0;
  2.     %LINE 39+0
  3.                     ; load length Byte and zero-extend into EAX
  4.                     movzx   eax,byte [U_$P$PROGRAM_$$_S]
  5.                     ; increment by 1
  6.                     lea     eax,[eax+1]
  7.                     ; zero extend lowest Byte into EAX
  8.                     movzx   eax,al
  9.                     ; store 0 at offset EAX of the ShortString
  10.                     mov     byte [U_$P$PROGRAM_$$_S+eax*1],0

 

TinyPortal © 2005-2018