Recent

Author Topic: ShortStrings vs long strings  (Read 3145 times)

LemonParty

  • Sr. Member
  • ****
  • Posts: 438
ShortStrings vs long strings
« on: March 10, 2026, 04:03:56 pm »
Have you experienced a significant perfomance improvement when used Pascal strings over long strings? What cases it was?
Lazarus v. 4.99. FPC v. 3.3.1. Windows 11

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: ShortStrings vs long strings
« Reply #1 on: March 10, 2026, 04:13:11 pm »
Have you experienced a significant perfomance improvement when used Pascal strings over long strings? What cases it was?

Keep in mind that with a value type like shortstrings, it possibly can cause performance degradations due to repeated copying because parameters are not declared with const.

I assume that for certain short strings in really heavily processed trees, shortstrings can be better, as you only getmem() once for a record or record instead of additional ones per strings. But shortstrings can also be longer than strict necessary which can impact locality.

By default I (would) keep long strings (since it has no encoding or length problems), and only replace with shortstrings after extensive testing.

Thaddy

  • Hero Member
  • *****
  • Posts: 18963
  • Glad to be alive.
Re: ShortStrings vs long strings
« Reply #2 on: March 10, 2026, 08:04:55 pm »
Shortstrings with a fixed length of the maximum field value you expect (e.g. string[10], string[100], string[255]) in properly aligned records can be dramatically faster than other string types with file io since the records containing them can be "flat" which means there is a contiguous memory layout and no further streaming mechanism is needed.
But the speed gain - which can be easily in the range of 300-400% compared to streaming records with Ansi strings, - is only achieved with fixed length declared shortstrings.
« Last Edit: March 10, 2026, 09:43:24 pm by Thaddy »
Recovered from removal of tumor in tongue following tongue reconstruction with a part from my leg.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: ShortStrings vs long strings
« Reply #3 on: March 10, 2026, 08:43:21 pm »
Have you experienced a significant perfomance improvement when used Pascal strings over long strings? What cases it was?

Not yet, but I haven't been looking for it.

As MarcoV said, you would need to avoid any copying, i.e. any passing at as a param (except constref, var, or explicit by PShortString).

One thing you would save is the ref-counting. But "normally" that would likely not make to much of a difference. And that is even though it uses an interlocked inc/dec. So it has some small impact.



There is one case though were it could be making a difference. (And I have used PChar to avoid it).

But the problem here is, that a shortstring has only 255 chars, so to get a real impact, you would have to perform the problematic operation on thousands of shortstring, compared to thousands of short longstrings.
Code: Pascal  [Select][+][-]
  1.  for a := 1 to length(s) do s[a] := uppercase(s[a]);

Because (long/ansi)strings are copy on write this code is doing a check in each iteration of the loop if s has a refcount > 1 (and would need copy).

At least I am not aware that the compiler optimizes this.

Do a uniquesting at the start, then use pchar to access s.


LemonParty

  • Sr. Member
  • ****
  • Posts: 438
Re: ShortStrings vs long strings
« Reply #4 on: March 10, 2026, 10:00:05 pm »
I wrote a simple benchmark to check how it affect performance:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}{$H+}
  2. {$Calling Register}
  3.  
  4. uses stopwatch;
  5.  
  6. const
  7.   CSize = 1024 * 1024;
  8.  
  9. type
  10.   TRecShortString = object
  11.   public
  12.     A, B: String[31];
  13.   end;
  14.  
  15.   TRecAnsiString = object
  16.   public
  17.     A, B: String;
  18.   end;
  19.  
  20. function SimpleHash(const S: String): Byte;
  21. var
  22.   i: SizeInt;
  23.   hash: Cardinal;
  24. begin
  25.   hash:= 0;
  26.  
  27.   for i:= 1 to Length(S) do
  28.     hash:= hash + ord(S[i]);
  29.  
  30.   Result:= hash mod 256;
  31. end;
  32.  
  33. function SimpleHash(constref S: ShortString): Byte;
  34. var
  35.   i: SizeInt;
  36.   hash: Cardinal;
  37. begin
  38.   hash:= 0;
  39.  
  40.   for i:= 1 to Length(S) do
  41.     hash:= hash + ord(S[i]);
  42.  
  43.   Result:= hash mod 256;
  44. end;
  45.  
  46. function GenString: String;
  47. begin
  48.   SetLength(Result, 15 + Random(15));
  49. end;
  50.  
  51. var
  52.   i, j: SizeInt;
  53.   S: String;
  54.   SSA: array of TRecShortString;
  55.   ASA: array of TRecAnsiString;
  56.   sw: TStopWatch;
  57.  
  58. begin
  59.   sw:= TStopWatch.Create;
  60.   SetLength(SSA, CSize);
  61.   SetLength(ASA, CSize);
  62.  
  63.   for i:= 0 to Pred(CSize) do begin
  64.     S:= GenString;
  65.     SSA[i].A:= S;
  66.     ASA[i].A:= S;
  67.     S:= GenString;
  68.     SSA[i].B:= S;
  69.     ASA[i].B:= S;
  70.   end;
  71.  
  72.   j:= 0;
  73.   sw.Reset; sw.Start;
  74.   for i:= 0 to Pred(CSize) do
  75.     Inc(j, SimpleHash(SSA[i].A) + SimpleHash(SSA[i].B));
  76.   sw.Stop;
  77.   Writeln('ShortString: ', sw.ElapsedTicks);
  78.  
  79.   j:= 0;
  80.   sw.Reset; sw.Start;
  81.   for i:= 0 to Pred(CSize) do
  82.     Inc(j, SimpleHash(ASA[i].A) + SimpleHash(ASA[i].B));
  83.   sw.Stop;
  84.   Writeln('Ansi string: ', sw.ElapsedTicks);
  85. end.
  86.  

Here the results of execution (Intel Core Ultra 7 258V):
Quote
ShortString: 575855
Ansi string: 587829
(2 % difference)
ShortString: 586032
Ansi string: 600617
(2.5 % difference)
ShortString: 587901
Ansi string: 595240
(1.2 % difference)
ShortString: 585499
Ansi string: 592956
(1.2 % difference)
ShortString: 591619
Ansi string: 604926
(2.14 % difference)

Performance of AnsiString is suprisingly good in this case compare to ShortString.
Lazarus v. 4.99. FPC v. 3.3.1. Windows 11

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: ShortStrings vs long strings
« Reply #5 on: March 10, 2026, 10:27:04 pm »
And those 2% don't exist. They are error margin.

I tested, if I change the order
- swap the order of the procedure SimpleHash
- swap the order of the 2 bench loops in main

then AnsiString has the advantage.

It may be that it depends on code alignment of the loop and procs.... That happens (and that can in some case cause way more than 2% diff).

Or maybe some date is still in the CPU cache....


But its not unexpected you only read from the strings. And the const<>constref => both have one level of pointer.

In fact the ansi string would technically have the advantage to store the length in a native size, where as the byte stored len of the shortstrig must be converted. But for todays CPU that gets optimized away inside the CPU.
« Last Edit: March 10, 2026, 10:32:07 pm by Martin_fr »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: ShortStrings vs long strings
« Reply #6 on: March 10, 2026, 10:31:12 pm »
Your generation of the strings might take ever so slightly longer for the ansistrings.

The shortstring variant allocs mem for the arrays, and that mem includes the space for the text in the string. So one call to getmem.
Setting up the ansistring must get an extra block of memory for each string. That might be measurable.

But if you later need to resize the array, the ansistring may be faster, as less memory needs to be copied.

Khrys

  • Sr. Member
  • ****
  • Posts: 418
Re: ShortStrings vs long strings
« Reply #7 on: March 11, 2026, 07:39:21 am »
One thing you would save is the ref-counting. But "normally" that would likely not make to much of a difference. And that is even though it uses an interlocked inc/dec. So it has some small impact.

I'd wager a guess that most of the slowdowns stem from the implicit  try..finally  blocks that generate wherever ansistrings are "owned" (i.e. excluding any qualified parameters).
Just take a look at the disassembly of the following function and you'll see what I mean  (x86_64-linux,  FPC 3.2.2,  -O3):

Code: Pascal  [Select][+][-]
  1. procedure Foo();
  2. var
  3.   S: AnsiString;
  4. begin
  5.   S := 'bar';
  6. end;

Code: ASM  [Select][+][-]
  1. ; Allocate space for `setjmp` buffer
  2. lea     rsp, [rsp - 104]
  3.  
  4. ; Initialize `S` to ''
  5. mov     QWORD PTR [rsp], 0
  6.  
  7. ; Register exception handler
  8. lea     rdx, [rsp + 8]
  9. lea     rsi, [rsp + 32]
  10. mov     edi, 1
  11. call    fpc_pushexceptaddr
  12.  
  13. ; Store context
  14. mov     rdi, rax
  15. call    fpc_setjmp
  16.  
  17. ; `setjmp`/`longjmp` target; store status code in memory
  18. movsxd  rdx, eax
  19. mov     QWORD PTR [rsp + 96], rdx
  20.  
  21. ; Execute `try` body only the first time (when status code is 0)
  22. test    eax, eax
  23. jne     .FINALLY
  24.  
  25.   ; Assign 'bar' to `S` (this is the actual function body)
  26.   mov     rdi, rsp
  27.   mov     rsi, .CONST_BAR
  28.   call    fpc_ansistr_assign
  29.  
  30. .FINALLY:
  31.  
  32. ; Unregister exception handler
  33. call    fpc_popaddrstack
  34.  
  35. ; Finalize `S` (implicit `finally` block)
  36. mov     rdi, rsp
  37. call    fpc_ansistr_decr_ref
  38.  
  39. ; Propagate any exceptions that might've occurred up the call stack
  40. mov     rax, QWORD PTR [rsp + 96]
  41. test    rax, rax
  42. je      .NOEXCEPT
  43.  
  44.   call    fpc_reraise
  45.  
  46. .NOEXCEPT:
  47.  
  48. ; Deallocate locals & return
  49. lea     rsp, [rsp + 104]
  50. ret

Do note however that exceptions are implemented differently depending on the platform; e.g. on 64-bit Windows  try..finally  handlers are essentially zero-cost provided that no exception actually occurs.

Another factor is that this "managed-ness" (implicit finalization unless  {$implicitexceptions}  is turned off) is contagious. Records containing ansistrings become managed themselves, and optimization becomes even poorer:  fpc_initialize  and  fpc_finalize  each consist of a single case statement over all kinds of managed types, dispatched at runtime by passing a reference to RTTI data - even though all relevant types are known at compile-time.

LemonParty

  • Sr. Member
  • ****
  • Posts: 438
Re: ShortStrings vs long strings
« Reply #8 on: March 11, 2026, 03:01:54 pm »
Martin_fr, OK those 2% is not exist, but this is still suprising that ShortString code have the same speed as code with AnsiString. I suspected that code with ShortStrings should be faster because it has better cache locality.

Khrys, that's a lot of code just for setting string. Thank you for explanation.
Lazarus v. 4.99. FPC v. 3.3.1. Windows 11

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12296
  • Debugger - SynEdit - and more
    • wiki
Re: ShortStrings vs long strings
« Reply #9 on: March 11, 2026, 03:20:32 pm »
Martin_fr, OK those 2% is not exist, but this is still suprising that ShortString code have the same speed as code with AnsiString. I suspected that code with ShortStrings should be faster because it has better cache locality.
Well, in your example you are allocating the arrays, and then you only allocate strings. And your app has at that point not done anything to fragment the memory space.

So all the text data of all the strings is in one big block of continuous memory too. (whatever amount of that fit per cache line).
So all it needs is one extra cache line for the current fragment of the "array of internal pointers". That is just way below measurable.

Depending on the size and align requirements either type of string may have "bigger gaps". The shortstring will always carry the unused chars, as it allocates according to declaration, even if the data is shorter. but the longstring will (with most mem managers) allocate to a multiple of 16, and the allocation includes extra for the header.
Yet again, you will need extreme cases, and maybe even then not have enough to measure it.

Modern CPU do a lot of parallel stuff in their internal processing. That also leads to memory often being loaded to cache way ahead. After all once you loaded a cache line your code processes each byte (each char of the string) in that cache line (well not each, if the string has alignment padding at the end). But that gives likely a lot of time to load the next.

You could find out if you downclock your ram in your bios. You would have to process much larger amount of data though to see a difference. If you do, be sure to know how to get back. Wrong settings can make your system unstable.
(I did upgrade my RAM last year as I needed more. But the new one is also faster, and some (few) tasks improved by about 5% in speed. But only some few tasks).


Quote
Khrys, that's a lot of code just for setting string. Thank you for explanation.
As I pointed out in one of my earlier posts, when you do a lot of changes to a string, you may want to use pchar. But be sure to call uniquestring first.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12771
  • FPC developer.
Re: ShortStrings vs long strings
« Reply #10 on: March 11, 2026, 03:30:18 pm »
Martin_fr, OK those 2% is not exist, but this is still suprising that ShortString code have the same speed as code with AnsiString. I suspected that code with ShortStrings should be faster because it has better cache locality.

That is also not a hard guarantee, the shortstrings are always maximally sized, which spaces data further apart for non trivial amounts with low occupancy.

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1225
Re: ShortStrings vs long strings
« Reply #11 on: March 12, 2026, 08:55:02 pm »
One thing you would save is the ref-counting. But "normally" that would likely not make to much of a difference. And that is even though it uses an interlocked inc/dec. So it has some small impact.
I'd wager a guess that most of the slowdowns stem from the implicit  try..finally  blocks that generate wherever ansistrings are "owned" (i.e. excluding any qualified parameters).
Just take a look at the disassembly of the following function and you'll see what I mean  (x86_64-linux,  FPC 3.2.2,  -O3):

Code: Pascal  [Select][+][-]
  1. procedure Foo();
  2. var
  3.   S: AnsiString;
  4. begin
  5.   S := 'bar';
  6. end;

Code: ASM  [Select][+][-]
  1. ; Allocate space for `setjmp` buffer
  2. lea     rsp, [rsp - 104]
  3.  
  4. ; Initialize `S` to ''
  5. mov     QWORD PTR [rsp], 0
  6.  
  7. ; Register exception handler
  8. lea     rdx, [rsp + 8]
  9. lea     rsi, [rsp + 32]
  10. mov     edi, 1
  11. call    fpc_pushexceptaddr
  12.  
  13. ; Store context
  14. mov     rdi, rax
  15. call    fpc_setjmp
  16.  
  17. ; `setjmp`/`longjmp` target; store status code in memory
  18. movsxd  rdx, eax
  19. mov     QWORD PTR [rsp + 96], rdx
  20.  
  21. ; Execute `try` body only the first time (when status code is 0)
  22. test    eax, eax
  23. jne     .FINALLY
  24.  
  25.   ; Assign 'bar' to `S` (this is the actual function body)
  26.   mov     rdi, rsp
  27.   mov     rsi, .CONST_BAR
  28.   call    fpc_ansistr_assign
  29.  
  30. .FINALLY:
  31.  
  32. ; Unregister exception handler
  33. call    fpc_popaddrstack
  34.  
  35. ; Finalize `S` (implicit `finally` block)
  36. mov     rdi, rsp
  37. call    fpc_ansistr_decr_ref
  38.  
  39. ; Propagate any exceptions that might've occurred up the call stack
  40. mov     rax, QWORD PTR [rsp + 96]
  41. test    rax, rax
  42. je      .NOEXCEPT
  43.  
  44.   call    fpc_reraise
  45.  
  46. .NOEXCEPT:
  47.  
  48. ; Deallocate locals & return
  49. lea     rsp, [rsp + 104]
  50. ret

Do note however that exceptions are implemented differently depending on the platform; e.g. on 64-bit Windows  try..finally  handlers are essentially zero-cost provided that no exception actually occurs.

Another factor is that this "managed-ness" (implicit finalization unless  {$implicitexceptions}  is turned off) is contagious. Records containing ansistrings become managed themselves, and optimization becomes even poorer:  fpc_initialize  and  fpc_finalize  each consist of a single case statement over all kinds of managed types, dispatched at runtime by passing a reference to RTTI data - even though all relevant types are known at compile-time.

Interesting.

 

TinyPortal © 2005-2018