Lazarus

Programming => General => Topic started by: LemonParty on April 15, 2025, 06:23:14 pm

Title: Unneeded copy of string
Post by: LemonParty on April 15, 2025, 06:23:14 pm
Hello.

I have this simple code:
Code: Pascal
{$mode objfpc}{$H+}

procedure P1(constref Ch: AnsiChar);
begin
  Writeln(Ch);
end;

var
  S: AnsiString = '12345';

begin
  P1(S[1]);
end.
When I disassembled this code I found this:
Code: ASM
# [32] P1(S[1]);
  leaq  TC_$P$PROGRAM_$$_S(%rip),%rcx
  call  fpc_ansistr_unique
  movq  %rax,%rcx
  call  P$PROGRAM_$$_P1$CHAR
The question is: why does the compiler make a copy of the string?
Title: Re: Unneeded copy of string
Post by: PascalDragon on April 15, 2025, 11:06:13 pm
Quote
The question is: why does the compiler make a copy of the string?

constref enforces the passing of Ch as a reference aka a Pointer. To ensure that another potential reference to the string doesn't interfere with that passed in reference the compiler ensures a unique copy of the string. For a primitive argument type const is better. In general you should prefer to use const because the compiler will pick the most efficient way to pass the parameter - unless you really need a reference to the passed in value.
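
A minimal sketch of that suggestion: the program from the first post with only the parameter modifier changed from constref to const. The expectation (an assumption, not checked against the assembler output of every compiler version) is that the compiler then only has to read S[1] and pass the character by value, so no reference into the string is needed.

```pascal
{$mode objfpc}{$H+}

// Same procedure as in the question, but declared with "const" instead of
// "constref". For a primitive type such as AnsiChar the compiler is free to
// pass the character by value.
procedure P1(const Ch: AnsiChar);
begin
  Writeln(Ch);
end;

var
  S: AnsiString = '12345';

begin
  P1(S[1]);  // reads the first character of S and prints it
end.
```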
Title: Re: Unneeded copy of string
Post by: LemonParty on April 16, 2025, 10:56:16 am
Code: Pascal
  1. {$mode objfpc}{$H+}
  2.  
  3. procedure P1(constref Ch: AnsiChar);
  4. begin
  5.   Writeln('P1 Char');
  6. end;
  7.  
  8. procedure P1(constref Ch: WideChar);
  9. begin
  10.   Writeln('P1 Wide');
  11. end;
  12.  
  13. procedure P2(P: PAnsiChar);
  14. begin
  15.   Writeln('P2 Char');
  16. end;
  17.  
  18. procedure P2(P: PWideChar);
  19. begin
  20.   Writeln('P2 Wide');
  21. end;
  22.  
  23. var
  24.   S: AnsiString = '12345';
  25.   W: WideString = '12345';
  26.  
  27. begin
  28.   P1(S[1]);
  29.   P1(W[1]);
  30.   P2(@S[1]);
  31.   P2(@W[1]);
  32.   Writeln('1');
  33. end.
  34.  
This code demonstrates a problem that appears when something other than constref is used. It would be nice if the compiler had an option to turn off such hidden copying.
Title: Re: Unneeded copy of string
Post by: Thaddy on April 16, 2025, 11:14:29 am
Well, then make it const, as PascalDragon suggested.
Title: Re: Unneeded copy of string
Post by: LemonParty on April 16, 2025, 11:22:45 am
I need a reference.
Title: Re: Unneeded copy of string
Post by: Thaddy on April 16, 2025, 11:45:56 am
PascalDragon explained that in that case you can't avoid a copy.
That is related to the type being a managed type.
Title: Re: Unneeded copy of string
Post by: Khrys on April 16, 2025, 11:46:28 am
Quote
This code demonstrates a problem that appears when something other than constref is used. It would be nice if the compiler had an option to turn off such hidden copying.

This can be solved by using  {$TYPEDADDRESS ON}  aka  {$T+}
Title: Re: Unneeded copy of string
Post by: Thaddy on April 16, 2025, 11:53:21 am
Did you check the assembler output?
Title: Re: Unneeded copy of string
Post by: Khrys on April 16, 2025, 01:15:25 pm
Quote
Did you check the assembler output?

I assume the problem @LemonParty was referring to is the compiler selecting the "wrong" overload in line 31, P2(@W[1]): a PWideChar is supplied, but under {$T-} the compiler chooses the PAnsiChar version. It's not about string copying.

The output should be  P1 Char, P1 Wide, P2 Char, P2 Wide  but without  {$T+}  it's actually  P1 Char, P1 Wide, P2 Char, P2 Char.
With constref the @ operator could be avoided, preventing loss of type information in that case and allowing the compiler to select the correct overload.
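
A minimal sketch of the {$T+} variant, reduced to the two P2 overloads from the program above. With typed addresses enabled, @S[1] and @W[1] keep their character type, so each call should resolve to the matching overload.

```pascal
{$mode objfpc}{$H+}
{$T+}  // typed address: the @ operator yields a typed pointer, not Pointer

procedure P2(P: PAnsiChar);
begin
  Writeln('P2 Char');
end;

procedure P2(P: PWideChar);
begin
  Writeln('P2 Wide');
end;

var
  S: AnsiString = '12345';
  W: WideString = '12345';

begin
  P2(@S[1]);  // argument is a pointer to AnsiChar
  P2(@W[1]);  // argument is a pointer to WideChar
end.
```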

@LemonParty is this the actual problem you were trying to solve? I've answered a similar question about overloads (https://forum.lazarus.freepascal.org/index.php/topic,70363.msg548352.html#msg548352) before (that you posted).
Title: Re: Unneeded copy of string
Post by: Thaddy on April 16, 2025, 01:48:28 pm
In this case we need the assembler output.
Will check when I am home.
Title: Re: Unneeded copy of string
Post by: LemonParty on April 16, 2025, 03:19:50 pm
Quote
This can be solved by using  {$TYPEDADDRESS ON}  aka  {$T+}
Will this work if I place the procedures in a separate unit, or does the whole project have to be compiled with this switch?
Title: Re: Unneeded copy of string
Post by: beria2 on April 16, 2025, 03:33:11 pm
Quote
I need a reference.
If a reference is what you need, just use the Pointer type. I always do that, and then you always know exactly what you have.
Title: Re: Unneeded copy of string
Post by: ASerge on April 16, 2025, 04:03:38 pm
Quote
This can be solved by using  {$TYPEDADDRESS ON}  aka  {$T+}
Will this work if I place the procedures in a separate unit, or does the whole project have to be compiled with this switch?
A separate unit is enough.
Interestingly, the discussion turned to another issue, not the one stated in the topic.
Title: Re: Unneeded copy of string
Post by: Khrys on April 16, 2025, 04:24:52 pm
Quote
This can be solved by using  {$TYPEDADDRESS ON}  aka  {$T+}
Will this work if I place the procedures in a separate unit, or does the whole project have to be compiled with this switch?

It's a local directive (https://www.freepascal.org/docs-html/current/prog/progse2.html#x7-60001.2), meaning that it can be toggled at the token level.
You could even enable it just for a single part of an expression if you really wanted to:

Code: Pascal
P2({$push}{$T+} @W[1] {$pop});

Quote
Interestingly, the discussion turned to another issue, not the one stated in the topic.

Turns out it was an XY problem (https://en.wikipedia.org/wiki/XY_problem) after all!   :)
Title: Re: Unneeded copy of string
Post by: LemonParty on April 16, 2025, 05:25:49 pm
I have built a test program
u1.pas:
Code: Pascal
{$mode objfpc}{$H+}

{$T+}

unit u1;

interface

procedure P2(P: PAnsiChar);
procedure P2(P: PWideChar);

implementation

procedure P2(P: PAnsiChar);
begin
  Writeln('P2 Char');
end;

procedure P2(P: PWideChar);
begin
  Writeln('P2 Wide');
end;

end.

prog.pas:
Code: Pascal
{$mode objfpc}{$H+}

uses u1;

var
  S: AnsiString = '12345';
  W: WideString = '12345';

begin
  {$Push}{$T+}
  P2(@S[1]);
  P2(@W[1]);
  {$Pop}
  {$Push}{$T-}
  P2(@S[1]);
  P2(@W[1]);
  {$Pop}
end.
Output is:
Quote
P2 Char
P2 Wide
P2 Char
P2 Char
This means the whole project has to be built with the {$T+} option. It also means we can't use overloaded functions in units that rely on strings.
I think the compiler's behavior around constref should be changed. As I understand it, the idea of constref is that we reference something but do not change it, so the compiler should not copy the string for a constref parameter.
Title: Re: Unneeded copy of string
Post by: ASerge on April 17, 2025, 03:55:32 am
Quote
This means the whole project has to be built with the {$T+} option. It also means we can't use overloaded functions in units that rely on strings.
I think the compiler's behavior around constref should be changed. As I understand it, the idea of constref is that we reference something but do not change it, so the compiler should not copy the string for a constref parameter.
You've mixed two different things.
1. The issue from your example, when the compiler does not warn that there is a conflict when choosing from two overloaded functions, but chooses the first one that comes to hand. According to the documentation, the @ operator for a variable has the Pointer type in the {$T-} state, which means both functions are suitable.
2. Using the UniqueString procedure before calling a function that uses part of a string. This is not an issue. If you want to avoid unnecessary calls, cast it to the PChar type:
Code: Pascal
P1(PChar(Pointer(S))[0]);
Code: ASM
# [31] P1(PChar(Pointer(S))[0]);
        movq    TC_$P$PROGRAM_$$_S(%rip),%rcx
        call    P$PROGRAM_$$_P1$CHAR
In both cases, constref has nothing to do with it.
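
For the first point, one way to remove the ambiguity without {$T+} is an explicit typecast at the call site, so that the argument has a typed pointer type that matches exactly one overload. A minimal sketch, assuming the two P2 overloads from the earlier example:

```pascal
{$mode objfpc}{$H+}

procedure P2(P: PAnsiChar);
begin
  Writeln('P2 Char');
end;

procedure P2(P: PWideChar);
begin
  Writeln('P2 Wide');
end;

var
  S: AnsiString = '12345';
  W: WideString = '12345';

begin
  // The casts give each argument an explicit pointer type, so overload
  // selection no longer depends on the {$T} state.
  P2(PAnsiChar(@S[1]));  // selects the PAnsiChar overload
  P2(PWideChar(@W[1]));  // selects the PWideChar overload
end.
```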
Title: Re: Unneeded copy of string
Post by: LemonParty on April 17, 2025, 08:13:11 am
Code: Pascal
P1(PChar(Pointer(S))[0]);
This code is ugly.
Can anyone show an example where copying a string on constref is a necessary move?
Title: Re: Unneeded copy of string
Post by: Khrys on April 17, 2025, 09:17:50 am
Putting aside the discussion whether  constref  should copy the underlying string, what exactly are you trying to do?

Are you writing functions that utilize character pointers as C strings? What are these functions supposed to do?

From what I've gathered, you want overloads for both ansi and wide strings, but without either  {$T+}  or  constref  the compiler can't choose the correct overload, so you insisted on abusing  constref  just to help the compiler discern types, while the actual point of  constref  is to guarantee immutable pass-by-reference.

Coming back to the  constref  discussion, I think it's reasonable for the compiler to ensure that references are valid. For this purpose FPC has  varout  and  constref  while C++ has its own kind of non-nullable references (e.g.  const char&). If you don't need such guarantees, just use a plain pointer instead.
Title: Re: Unneeded copy of string
Post by: Martin_fr on April 17, 2025, 09:58:57 am
Quote
Can anyone show an example where copying a string on constref is a necessary move?

It all depends on the level of optimization, and other settings. Maybe even compiler version.
To demonstrate the effect of the missing "uniquestring", I used the PAnsiChar version.

The below passes a pointer to 'a', but prints '5' (which is something random that went in place of the released string).

If you comment the first line, and uncomment the 2nd, then it prints 's', the first char of the 2nd string.

Code: Pascal
Program foo; {$mode objfpc}{$H+}
uses SysUtils;

var
  S, S2: AnsiString;

//procedure P1(constref Ch: AnsiChar);
procedure P1( Ch: PAnsiChar);
begin
  s := '';  s2 := 'something else' + IntToStr(random(9)); // avoid compiletime const eval
  //s := 'something else' + IntToStr(random(9)); // avoid compiletime const eval
  Writeln(Ch^);
end;

begin
  s := 'abcdef' + inttostr(random(9)); // avoid compiletime const eval
  P1(@S[1]);

  readln;
end.



In real life, I have seen stuff like this when using "const s: ansistring"
- because that is a much more common construct
- because that does not do the UniqueString call (in fact, it is used because it does not)

And last time I saw it, the string was a field in an object. It wasn't directly modified by the called procedure, but that procedure had callbacks (events that it called), and about 10 calls deep into the stack on the callback was some code that modified the field.



@PascalDragon: I only partly agree with your statement.

1) UniqueString is overkill. (for constref)

It is not only a reference. It is also const.

It only needs a refcount increment. If anyone else writes to it and the refcount is greater than 1, then the other will make a copy of their own.
Sure you can bypass that with pchar magic. But then you intentionally break protection.

The general idea is, that having any code that holds a ref to a string, can trust it. If the caller in this case holds that ref, then the callee is fine by those means.

( That differs for a "var c: ansichar" param (or "out"), because then "c" itself can be changed, and at that point it must act on a uniquely referenced copy of the string)
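
To illustrate the copy-on-write behavior that point 1) relies on, here is a small sketch. The variable names A and B are made up for illustration, and it only shows ordinary assignment and write access, not constref itself.

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

var
  A, B: AnsiString;

begin
  // Build the string at run time so that it is a reference-counted heap
  // string rather than a constant.
  A := 'abcdef' + IntToStr(Random(9));

  B := A;                            // no copy: A and B share one buffer
  Writeln(Pointer(A) = Pointer(B));  // TRUE

  B[1] := 'X';                       // write access: B gets its own copy first
  Writeln(Pointer(A) = Pointer(B));  // FALSE
  Writeln(A[1]);                     // a (A is unaffected by the write to B)
end.
```

As long as someone holds a counted reference, a writer elsewhere works on its own copy, which is why a refcount increment would already protect the callee.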

2) It is inconsistent....
While it is nice that there is safety first...

constref is a form of const.

const does not add that form of protection.

In fact it has been stated countless times, that const is a contract where the user (programmer) tells the compiler, that the variable will not change.
And "will not change" includes, that it will not be changed by any code while the called code is running (has not returned to its caller).

With const too, as I stated other code may change (breaking the contract) a string that was passed without protection. And that is by design.

Why is that design not applied for constref?
I may have missed something, but I thought constref is the same optimization as const, except for adding a ref?



EDIT /APPEND

And then, if the string is protected because "constref" takes a pointer (reference) to a char in the string, then why is it not protected if
Code: Pascal
@s[1]
takes the same pointer? (only without read-only protection...)
Title: Re: Unneeded copy of string
Post by: LemonParty on April 17, 2025, 10:24:05 am
Quote
Putting aside the discussion whether  constref  should copy the underlying string, what exactly are you trying to do?

I am writing a library, and I want to have overloads of a function that can handle both AnsiChar and WideChar buffers. They look like this:
Code: Pascal
function Pos(constref Buf: AnsiChar; C: AnsiChar; Range: SizeUInt): SizeUInt;
function Pos(constref Buf: WideChar; C: WideChar; Range: SizeUInt): SizeUInt;
Buf in this case is not a single character, but a pointer to the buffer. This works fine, until we get to strings.
Title: Re: Unneeded copy of string
Post by: LemonParty on April 17, 2025, 10:35:53 am
Quote
And then, if the string is protected because "constref" takes a pointer (reference) to a char in the string, then why is it not protected if
This expresses my point very well. It is illogical that constref protects the string when it takes a pointer, while taking a pointer explicitly does not.
Title: Re: Unneeded copy of string
Post by: Thaddy on April 17, 2025, 12:12:08 pm
Quote
It all depends on the level of optimization, and other settings. Maybe even compiler version.
Optimization should never affect result.
Title: Re: Unneeded copy of string
Post by: Martin_fr on April 17, 2025, 12:45:48 pm
Quote
It all depends on the level of optimization, and other settings. Maybe even compiler version.
Optimization should never affect result.

Yes, and actually in the example that is obeyed. Optimization does not affect the result. The result of the example I gave is undefined. That is according to the documentation.

Optimization does not affect the undefined-ness. It only affects how it can be observed / how it manifests.

But that is ok. Undefined is undefined. In fact, it would not be undefined, if it would always manifest in one and the same way.
Title: Re: Unneeded copy of string
Post by: LemonParty on April 17, 2025, 06:11:21 pm
Does trunk have this problem?
Should I report this to the bug tracker?
Title: Re: Unneeded copy of string
Post by: Martin_fr on April 17, 2025, 06:50:40 pm
Quote
Does trunk have this problem?
Should I report this to the bug tracker?

For your original snippet of code, today's FPC 3.3.1 at eba0624535cc504fcaf367055cd3adeab56097a4 generates the uniquestring call.

As for "problem", it may or may not be. PascalDragon is part of the FPC team. They have to decide what the desired behaviour is.  (I am part of the Lazarus team, I just communicated my personal thoughts on the topic).
Title: Re: Unneeded copy of string
Post by: LemonParty on April 17, 2025, 09:23:09 pm
I found a temporary workaround. If the procedure is called this way
Code: Pascal
P(PChar( @S[1] )^)
the compiler does not insert the copying.
Title: Re: Unneeded copy of string
Post by: Martin_fr on April 17, 2025, 09:28:06 pm
You can have that a bit shorter
Code: Pascal
P1(pchar(S)[0]);

Mind the index changes to 0-based.

And you can shorten it with "type p=pchar;" / though I wouldn't.
Title: Re: Unneeded copy of string
Post by: nanobit on April 17, 2025, 10:20:34 pm
Quote
This means the whole project has to be built with the {$T+} option.

{$T+} is an old invention, and has some caveats in modern Pascal (since pointermath):
Most programmers are accustomed to think that "@" means untyped ({$T-} default).
{$T+} allows type-checking (matching) in pointer-assignments,
but also changes "@" behavior (to typed pointer) in pointermath (generated code).
Therefore I use {$T-} only.
Title: Re: Unneeded copy of string
Post by: ASerge on April 17, 2025, 10:47:33 pm
Code: Pascal
P1(pchar(S)[0]);
The same as I indicated above, only my version is more optimal, but @LemonParty rejected it.