
Topic: Strings in Free Pascal: Internal Differences, Reference Counting, and Pointers

sc10tech

  • Newbie
  • Posts: 4
What are the internal differences between ShortString, AnsiString, and UnicodeString in Free Pascal? Specifically, how does reference counting work for AnsiString, and in what situations should one be concerned with manual memory management when manipulating memory through pointers?

andersonscinfo

  • Full Member
  • Posts: 156
Strings in Free Pascal have distinct internal implementations:

**ShortString**
- Fixed-size storage with a limit of 255 characters
- The first byte holds the length
- No heap allocation and no reference counting

```
type
  ShortString = string[255];
```
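To make the layout concrete, here is a minimal sketch (assuming FPC 3.x; the program name and values are illustrative) that reads the length byte directly and shows that a ShortString always occupies 256 bytes, independent of its contents:

```pascal
program ShortStrDemo;
{$mode objfpc}{$H-}
var
  S: ShortString;
  T: string[10];
begin
  S := 'Hello';
  WriteLn(Ord(S[0]));   // 5: byte 0 holds the current length
  WriteLn(SizeOf(S));   // 256: 1 length byte + 255 characters, stored inline
  T := S + ', World!';  // 13 characters do not fit: silently truncated to 10
  WriteLn(T);           // Hello, Wor
end.
```

Note the silent truncation: a ShortString never grows beyond its declared maximum.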

**AnsiString**
- Dynamically allocated, with a reference count
- Copies are made only on write (copy-on-write)
- Data is shared until it is modified

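```
program CowExample;
{$mode objfpc}{$H+}
var
  S1, S2: AnsiString;
begin
  S1 := 'Test';
  UniqueString(S1); // a literal is a constant (not ref-counted): make a heap copy first
  S2 := S1;         // same pointer, reference count = 2
  S2[1] := 'X';     // the actual copy happens here (copy-on-write)
  WriteLn(S1, ' ', S2); // Test Xest
end.
```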
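The reference count itself can be observed with the RTL function StringRefCount. A minimal sketch, assuming FPC 3.x: a literal assigned directly is a read-only constant whose count is -1, which is why the example concatenates first to obtain a real heap string:

```pascal
program RefCountDemo;
{$mode objfpc}{$H+}
var
  S1, S2: AnsiString;
begin
  S1 := 'Test';
  WriteLn(StringRefCount(S1)); // -1: literal data is a constant, not heap-allocated
  S1 := S1 + '!';              // concatenation creates a new heap string
  WriteLn(StringRefCount(S1)); // 1
  S2 := S1;                    // no copy: both variables point to the same data
  WriteLn(StringRefCount(S1)); // 2
  S2[1] := 'X';                // write access triggers copy-on-write
  WriteLn(StringRefCount(S1), ' ', StringRefCount(S2)); // 1 1
  WriteLn(S1, ' ', S2);        // Test! Xest!
end.
```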

**UnicodeString**
- Similar to AnsiString, but stores UTF-16
- Reference counting and copy-on-write apply as well
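A minimal sketch of the UnicodeString points, assuming FPC 3.x and the UnicodeString overload of StringRefCount: each element is a 16-bit UTF-16 code unit, so a character outside the Basic Multilingual Plane takes two elements, and copy-on-write behaves as with AnsiString:

```pascal
program UnicodeDemo;
{$mode objfpc}{$H+}
var
  U, V: UnicodeString;
begin
  WriteLn(SizeOf(WideChar));   // 2: each element is a 16-bit code unit
  SetLength(U, 2);             // heap-allocated, reference count 1
  U[1] := WideChar($D83D);     // high surrogate
  U[2] := WideChar($DE00);     // low surrogate: together they encode U+1F600
  WriteLn(Length(U));          // 2: Length counts code units, not characters
  V := U;                      // shared, no copy yet
  WriteLn(StringRefCount(U));  // 2
  V[1] := 'x';                 // copy-on-write gives V its own buffer
  WriteLn(StringRefCount(U));  // 1
end.
```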

**References and pointers**
- Avoid manipulating managed strings directly through pointers
- Use `PChar` or `PWideChar` only for passing strings to external APIs
- Manual allocations require matching `GetMem`/`FreeMem` calls

```
procedure ExemploPonteiro;
var
  S: AnsiString;
  P: PChar;
begin
  S := 'Test';
  P := PChar(S);  // Read-only
  // Do not write through P: S may point to constant or shared data
end;
```
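If an external API needs a writable buffer, one option is to allocate it yourself and copy the string into it. A minimal sketch under that assumption (the program name is hypothetical), with the GetMem paired with a FreeMem in a try/finally block:

```pascal
program PCharBufferDemo;
{$mode objfpc}{$H+}
var
  S: AnsiString;
  Buf: PChar;
begin
  S := 'Test';
  GetMem(Buf, Length(S) + 1);             // our own buffer: +1 for the terminating #0
  try
    Move(PChar(S)^, Buf^, Length(S) + 1); // copy the characters and the #0 terminator
    Buf[0] := 'B';                        // safe: this memory belongs to us, not to S
    WriteLn(Buf);                         // Best
    WriteLn(S);                           // Test: the managed string is untouched
  finally
    FreeMem(Buf);                         // every GetMem needs a matching FreeMem
  end;
end.
```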

Regards.

sc10tech

  • Newbie
  • Posts: 4
OK, thanks.

Bart

  • Hero Member
  • Posts: 5674
@andersonscinfo: please write in English (use the appropriate subfora to communicate in other languages).

Bart

PascalDragon

  • Hero Member
  • Posts: 6311
  • Compiler Developer
@andersonscinfo: please write in English (use the appropriate subfora to communicate in other languages).

Also, the forum does not support Markdown, so please use the forum's BBCodes, such as [code][/code], for code fragments.

Khrys

  • Sr. Member
  • Posts: 390
Each AnsiString instance (that isn't Nil, of course) points to a null-terminated string immediately preceded by the structure defined in rtl/inc/astrings.inc:

```pascal
Type
  PAnsiRec = ^TAnsiRec;
  TAnsiRec = Record
    CodePage    : TSystemCodePage;
    ElementSize : Word;
{$ifdef CPU64}
    { align fields  }
    Dummy       : DWord;
{$endif CPU64}
    Ref         : SizeInt;
    Len         : SizeInt;
  end;
```
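Because Len is the last field of TAnsiRec, it sits directly in front of the character data. A minimal sketch that peeks at it, assuming the layout shown above; this relies on RTL internals that may change between compiler versions, so it is for illustration only:

```pascal
program AnsiRecDemo;
{$mode objfpc}{$H+}
var
  S: AnsiString;
  Data: PByte;
begin
  S := 'Hello';
  S := S + ' World';          // heap-allocated string with a TAnsiRec header
  Data := PByte(Pointer(S));  // address of the first character
  // Assumes Len (a SizeInt) is the last header field, directly before the data
  WriteLn(PSizeInt(Data - SizeOf(SizeInt))^);  // 11
  // The character data is followed by a terminating null byte
  WriteLn(Ord(PAnsiChar(Data)[Length(S)]));    // 0
end.
```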

Empty (zero-length) strings are represented by Nil. This could cause problems when interacting with C code, because an empty string in C is not represented as a null pointer, but as a pointer to a null byte. This is why PChar(SomeAnsiString) doesn't just extract the pointer value, but actually inserts code just for this case (essentially like this):

```pascal
function AnsiStringToPChar(const Str: AnsiString): PChar;
begin
  Result := Pointer(Str);
  if Result = Nil then Result := #0; // FPC_EMPTYCHAR
end;
```
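The difference between the raw pointer and the PChar conversion is easy to observe. A minimal sketch, assuming FPC 3.x:

```pascal
program EmptyPCharDemo;
{$mode objfpc}{$H+}
var
  S: AnsiString;
  P: PChar;
begin
  S := '';
  P := PChar(S);
  WriteLn(Pointer(S) = nil);  // TRUE: the empty string itself is nil
  WriteLn(P = nil);           // FALSE: the conversion substitutes a pointer to a #0 byte
  WriteLn(Ord(P^));           // 0
end.
```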

Since AnsiString is a reference-counted type and as such needs to be managed by the compiler, there is some overhead whenever the reference count has to be manipulated. More specifically, in routines that declare a local AnsiString variable or take an AnsiString as a plain value parameter (i.e. not const, constref, var or out), an implicit try/finally block is generated to decrement the reference count at the end (unless {$implicitexceptions} is disabled, that is).
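The effect of the parameter kind can be observed with StringRefCount. A minimal sketch, assuming FPC 3.x (the routine names are hypothetical): the value parameter sees a count incremented on entry, the const parameter does not:

```pascal
program ParamRefDemo;
{$mode objfpc}{$H+}

procedure ByValue(S: AnsiString);
begin
  WriteLn('by value: ', StringRefCount(S));  // incremented on entry
end;  // the implicit finally decrements it again

procedure ByConst(const S: AnsiString);
begin
  WriteLn('by const: ', StringRefCount(S));  // no reference-count traffic
end;

var
  Str: AnsiString;
begin
  Str := 'Hello';
  Str := Str + ' World';  // heap-allocated, reference count 1
  ByValue(Str);
  ByConst(Str);
  WriteLn('after: ', StringRefCount(Str));
end.
```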

Quote from: sc10tech
[...] in what situations should one be concerned with manual memory management when manipulating memory through pointers?

In situations that involve manually overwriting the pointer value:

```pascal
procedure Leak(Str: AnsiString);
begin
  Pointer(Str) := Nil;
end;
```

The above procedure first increments Str's reference count for the reasons outlined above, then loses track of it when the pointer is overwritten, and finally attempts to decrement the reference count of Nil (which has no effect). The original string is left with an inflated reference count and is therefore leaked.
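A caller-side view of the same leak: after the call, the caller's string still carries the extra reference. A minimal sketch, assuming FPC 3.x; NoLeak is a hypothetical counterpart showing that an ordinary assignment releases the reference correctly:

```pascal
program LeakRefDemo;
{$mode objfpc}{$H+}

procedure Leak(Str: AnsiString);
begin
  Pointer(Str) := nil;   // the reference taken on entry is lost
end;

procedure NoLeak(Str: AnsiString);
begin
  Str := '';             // a normal assignment releases the reference properly
end;

var
  S: AnsiString;
begin
  S := 'Hello';
  S := S + ' World';     // heap-allocated, reference count 1
  NoLeak(S);
  WriteLn(StringRefCount(S));  // 1
  Leak(S);
  WriteLn(StringRefCount(S));  // 2: the increment was never undone
end.
```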



Edit: Fixed two errors in a whopping four lines of example code  :-[
« Last Edit: January 21, 2026, 06:52:48 am by Khrys »

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
The latter example is a very good illustration of why you should not do that in principle.
If you need to manipulate a string like that, first call UniqueString(), which ensures the string has a reference count of 1 by copying it if it is shared. Any change made to that string will then not propagate to the strings that shared the original data.
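A minimal sketch of that advice, assuming FPC 3.x: two variables share one buffer, UniqueString gives S2 a private copy, and a raw write through a PChar then affects only S2:

```pascal
program UniqueStringDemo;
{$mode objfpc}{$H+}
var
  S1, S2: AnsiString;
  P: PChar;
begin
  S1 := 'Hello';
  S1 := S1 + '!';     // heap-allocated, reference count 1
  S2 := S1;           // S1 and S2 share one buffer (reference count 2)
  UniqueString(S2);   // S2 now owns a private copy (reference count 1)
  P := PChar(S2);
  P[0] := 'J';        // raw write through a pointer: affects only S2
  WriteLn(S1);        // Hello!
  WriteLn(S2);        // Jello!
end.
```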
If Europe sells their USA bonds the USD will collapse. Europe can afford that given average state debts. The USA can't afford that. Just some advice...

ALLIGATOR

  • Sr. Member
  • Posts: 379
  • I use FPC [main] 💪🐯💪
```pascal
program app;

var
  ssv: shortstring;
  asv: ansistring;
  usv: unicodestring;
  wsv: widestring;
  pchv: PAnsiChar;
  pwchv: PWideChar;
  arrch: array [0..1] of AnsiChar;
  arrwch: array [0..1] of WideChar;

begin
  WriteLn(ssv + asv + usv + wsv + pchv + pwchv + arrch + arrwch);
end.
```

I think I forgot something else ::)
I may seem rude - please don't take it personally

Thaddy

  • Hero Member
  • Posts: 18729
  • To Europe: simply sell USA bonds: dollar collapses
Nah, you would not forget RawByteString? No, that is an alias anyway  :D

You forgot string! Right?  and PChar? O:-)

But that is also an alias..... :P

The only string types that are reference counted are:
- AnsiString
- UnicodeString
and
- RawByteString (because it is itself an AnsiString type, declared as type AnsiString(CP_NONE))

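string itself depends on {$H-/+}: it is ShortString under {$H-} and AnsiString under {$H+} (UnicodeString in mode delphiunicode or with {$modeswitch unicodestrings}).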
```pascal
{$ifdef fpc}{$mode objfpc}{$endif}
program app;
var
  sme: string = 'test';  // depends on mode $H-/+ {$H+} is refcounted, as is unicodestring
  smp: PChar = 'test'; // not refcounted, depends on mode, though. delphiunicode is PWideChar.
  ssv: shortstring = 'test'; // not refcounted
  asv: ansistring = 'test';  // refcounted
  usv: unicodestring= 'test';// refcounted
  wsv: widestring= 'test';   // not refcounted
  pchv: PAnsiChar= 'test';   // not refcounted
  pwchv: PWideChar= 'test';  // not refcounted
  arrch: array [0..3] of AnsiChar = 'test'; // not refcounted
  arrwch: array [0..3] of WideChar = 'test';// not refcounted
  rbs: RawByteString = 'test'; // refcounted = type ansi string.
begin
  WriteLn(ssv + asv + usv + wsv + pchv + pwchv + arrch + arrwch+rbs);
end.
```
The compiler issues appropriate conversion warnings...
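Since {$H} is a local switch, its effect on string can be shown within a single file. A minimal sketch, assuming FPC 3.x (the type names are hypothetical):

```pascal
program LongStringsSwitchDemo;
{$mode objfpc}
{$H-}
type
  TShortStr = string;  // resolves to ShortString under {$H-}
{$H+}
type
  TLongStr = string;   // resolves to AnsiString under {$H+}
begin
  WriteLn(SizeOf(TShortStr));                   // 256: length byte + 255 characters inline
  WriteLn(SizeOf(TLongStr) = SizeOf(Pointer));  // TRUE: only a pointer to the heap data
end.
```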

I also forgot these (in Delphi mode, use square brackets):
```pascal
  arrch2: array of AnsiChar = ('t','e','s','t'); // refcounted
  arrwch2: array of WideChar =('t','e','s','t'); // refcounted
```

But that is a bit over the top..
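One caveat about dynamic arrays: they are reference counted, but unlike AnsiString and UnicodeString they are not copy-on-write, so a write through one variable is visible through every variable sharing the data. A minimal sketch, assuming FPC 3.x:

```pascal
program DynArrayDemo;
{$mode objfpc}{$H+}
var
  A, B: array of AnsiChar;
begin
  SetLength(A, 4);
  A[0] := 't'; A[1] := 'e'; A[2] := 's'; A[3] := 't';
  B := A;               // reference counted: B shares A's data
  B[0] := 'b';          // no copy-on-write for dynamic arrays
  WriteLn(A[0]);        // b
  B := Copy(A);         // an explicit copy breaks the sharing
  B[0] := 'r';
  WriteLn(A[0], ' ', B[0]);  // b r
end.
```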
« Last Edit: January 20, 2026, 10:53:21 am by Thaddy »

BeniBela

  • Hero Member
  • Posts: 955

Quote from: Khrys
```pascal
procedure Leak(Str: AnsiString);
begin
  Pointer(S) = Nil;
end;
```

The above function first increments Str's reference count due to the reasons outlined above, then forgets about it when the pointer is overwritten, only to finally attempt to decrement the reference count of Nil (which has no effect), leaving the original string with an inflated reference count and thus leaking it.

if that could leak, it is false

PascalDragon

  • Hero Member
  • Posts: 6311
  • Compiler Developer

Quote from: BeniBela
if that could leak, it is false

Assuming it's fixed so that it compiles (namely either renaming the parameter to S or the variable inside the cast to Str) and uses := instead of = for the assignment, this will indeed leak: the decrement of the reference count at the end of the routine encounters a Nil variable and thus does nothing, so the increment that was done upon entry to the function is never undone.

Just take this example:

```pascal
program tstrleak;

{$mode objfpc}{$H+}

procedure Test(aStr: String);
begin
  Pointer(aStr) := Nil;
end;

var
  s: String;
begin
  { force a heap allocation: a literal alone is a constant, non-refcounted string }
  s := 'Hello';
  s := s + ' World';
  Test(s);
end.
```

Compile this with -gh (which links in the heaptrc unit) and you'll see an unreleased block reported at program exit, but not if the assignment to aStr is removed.

 
