Author Topic: Small speedup for FExpand function  (Read 1417 times)

Bart

  • Hero Member
  • *****
  • Posts: 4030
    • Bart en Mariska's Webstek
Re: Small speedup for FExpand function
« Reply #15 on: October 02, 2020, 02:22:06 pm »
How relevant is this discussion about speeding up FExpand?
This code is primarily used in preparation for disk I/O, which is orders of magnitude slower than string handling.
So it is very unlikely that FExpand will ever be the bottleneck in your program.

IMO it is better to have readable code than prematurely optimized code that is harder to follow (and therefore potentially more prone to new bugs).

Bart

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 6708
  • Debugger - SynEdit - and more
    • wiki
Re: Small speedup for FExpand function
« Reply #16 on: October 02, 2020, 02:31:57 pm »
Quote from: Bart
How relevant is this discussion about speeding up FExpand?
This code is primarily used in preparation for disk I/O, which is orders of magnitude slower than string handling.
So it is very unlikely that FExpand will ever be the bottleneck in your program.

IMO it is better to have readable code than prematurely optimized code that is harder to follow (and therefore potentially more prone to new bugs).

I don't really know. But I don't think code like my example is really needed; I meant it only as a proof of concept.

I can see it getting used heavily if file locations are compared (e.g. if you have a list of paths).
But paths are usually short; I do not even know how long a path most OSes would accept.

On a string of 1000 chars the speed difference will probably be negligible.



PascalDragon

  • Hero Member
  • *****
  • Posts: 2405
  • Compiler Developer
Re: Small speedup for FExpand function
« Reply #17 on: October 02, 2020, 02:35:24 pm »
Quote from: Bart
How relevant is this discussion about speeding up FExpand?
This code is primarily used in preparation for disk I/O, which is orders of magnitude slower than string handling.
So it is very unlikely that FExpand will ever be the bottleneck in your program.

IMO it is better to have readable code than prematurely optimized code that is harder to follow (and therefore potentially more prone to new bugs).

No one can say in which situations the code of fexpand.inc is used, because e.g. ExpandFileName does not do disk I/O by itself (aside from what the code in fexpand.inc itself might do). So while optimizing fexpand.inc might not help everyone, because disk I/O that potentially follows is more expensive, there are definitely cases where it does help. If it should, for example, result in smaller/faster code for embedded platforms, that would be a plus, and memory allocations (the Delete in the current code) are definitely more expensive than simply adjusting pointers.
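To illustrate the allocation point, here is a minimal sketch (my own example, not the actual fexpand.inc code): stripping a leading './' with Delete forces a unique copy of the string and moves the remaining characters, while tracking an offset defers all copying to a single final step.

Code: Pascal
```pascal
program DeleteCost;
{$mode objfpc}

var
  S: string;
  Start: SizeInt;
begin
  // Variant 1: Delete triggers UniqueString plus a Move of the tail.
  S := './some/path';
  Delete(S, 1, 2);
  WriteLn(S); // some/path

  // Variant 2: remember where the interesting part starts and copy
  // once at the end -- one allocation, no intermediate moves.
  S := './some/path';
  Start := 1;
  if (Length(S) >= 2) and (S[1] = '.') and (S[2] = '/') then
    Inc(Start, 2);
  S := Copy(S, Start, Length(S) - Start + 1);
  WriteLn(S); // some/path
end.
```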

ASerge

  • Hero Member
  • *****
  • Posts: 1693
Re: Small speedup for FExpand function
« Reply #18 on: October 02, 2020, 09:28:36 pm »
Sometimes there is no need to look for a C-style solution; we can use regular Pascal, without any #0 terminator:
Code: Pascal
  1. procedure CompressDuplicatesToOne(var InStr: string; Sample: Char);
  2. var
  3.   SLength: SizeInt;
  4.   WritePos: SizeInt;
  5.   ReadPos: SizeInt;
  6. begin
  7.   SLength := Length(InStr);
  8.   ReadPos := 1;
  9.   WritePos := 1;
  10.   while ReadPos <= SLength do
  11.   begin
  12.     InStr[WritePos] := InStr[ReadPos];
  13.     Inc(WritePos);
  14.     if InStr[ReadPos] <> Sample then
  15.       Inc(ReadPos)
  16.     else
  17.       repeat
  18.         Inc(ReadPos);
  19.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  20.   end;
  21.   if WritePos < ReadPos then
  22.     SetLength(InStr, WritePos - 1);
  23. end;
In line 12, the compiler implicitly calls the UniqueString function.
We can optimize this away at the cost of some readability. We also rename the WritePos variable, since we now use it as a zero-based offset for PChar:
Code: Pascal
  1. var
  2.   SLength: SizeInt;
  3.   WritePosZeroBase: SizeInt; // Zero index
  4.   ReadPos: SizeInt;
  5. begin
  6.   UniqueString(InStr); // Call only once
  7.   SLength := Length(InStr);
  8.   ReadPos := 1;
  9.   WritePosZeroBase := 0;
  10.   while ReadPos <= SLength do
  11.   begin
  12.     // Cast to PChar to eliminate implicit UniqueString call for InStr[i]:=
  13.     (PChar(Pointer(InStr)) + WritePosZeroBase)^ := InStr[ReadPos];
  14.     Inc(WritePosZeroBase);
  15.     if InStr[ReadPos] <> Sample then
  16.       Inc(ReadPos)
  17.     else
  18.       repeat
  19.         Inc(ReadPos);
  20.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  21.   end;
  22.   if (WritePosZeroBase + 1) < ReadPos then
  23.     SetLength(InStr, WritePosZeroBase);
  24. end;
Given that modern processors handle indexed addressing as fast as plain addressing, I think this code will be no slower than a version that converts everything to PChar. It also works for strings that contain the #0 character.
At least the assembler output on x86-64 looks optimal:
Code: ASM
  1. # [57] UniqueString(InStr);
  2.         movq    %rbx,%rcx
  3.         call    FPC_ANSISTR_UNIQUE
  4. .Ll19:
  5. # [58] SLength := Length(InStr);
  6.         movq    (%rbx),%rax
  7.         testq   %rax,%rax
  8.         je      .Lj22
  9.         movq    -8(%rax),%rax
  10. .Lj22:
  11. # Var SLength located in register rax
  12. # Var ReadPos located in register r9
  13. .Ll20:
  14. # [59] ReadPos := 1;
  15.         movl    $1,%r9d
  16. # Var WritePosZeroBase located in register rdx
  17. .Ll21:
  18. # [60] WritePosZeroBase := 0;
  19.         xorl    %edx,%edx
  20. .Ll22:
  21. # [61] while ReadPos <= SLength do
  22.         jmp     .Lj24
  23.         .balign 8,0x90
  24. .Lj23:
  25. .Ll23:
  26.         movq    (%rbx),%rcx
  27. .Ll24:
  28. # [64] (PChar(Pointer(InStr)) + WritePosZeroBase)^ := InStr[ReadPos];
  29.         leaq    (%rcx,%rdx),%r8
  30.         movb    -1(%rcx,%r9,1),%cl
  31.         movb    %cl,(%r8)
  32. .Ll25:
  33. # [65] Inc(WritePosZeroBase);
  34.         addq    $1,%rdx
  35. .Ll26:
  36. # [66] if InStr[ReadPos] <> Sample then
  37.         movq    (%rbx),%rcx
  38.         cmpb    -1(%rcx,%r9,1),%sil
  39.         je      .Lj27
  40. .Ll27:
  41. # [67] Inc(ReadPos)
  42.         addq    $1,%r9
  43.         jmp     .Lj28
  44. .Lj27:
  45.         .balign 8,0x90
  46. .Lj29:
  47. .Ll28:
  48. # [70] Inc(ReadPos);
  49.         addq    $1,%r9
  50. .Ll29:
  51. # [71] until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  52.         cmpq    %r9,%rax
  53.         jl      .Lj31
  54.         movq    (%rbx),%rcx
  55.         cmpb    -1(%rcx,%r9,1),%sil
  56.         je      .Lj29
  57. .Lj31:
  58. .Lj28:
  59. .Lj24:
  60. .Ll30:
  61.         cmpq    %r9,%rax
  62.         jge     .Lj23
  63. .Ll31:
  64. # [73] if (WritePosZeroBase + 1) < ReadPos then
  65.         leaq    1(%rdx),%rax
  66.         cmpq    %r9,%rax
  67.         jnl     .Lj36
  68. .Ll32:
  69. # [74] SetLength(InStr, WritePosZeroBase);
  70.         movq    %rbx,%rcx
  71.         xorl    %r8d,%r8d
  72.         call    fpc_ansistr_setlength
  73.         .balign 4,0x90
  74. .Lj36:
  75. .Ll33:
  76. # [75] end;
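For reference, a minimal driver around ASerge's first variant (the procedure is repeated here so the example compiles stand-alone; the test string is my own):

Code: Pascal
```pascal
program TestCompress;
{$mode objfpc}

procedure CompressDuplicatesToOne(var InStr: string; Sample: Char);
var
  SLength, WritePos, ReadPos: SizeInt;
begin
  SLength := Length(InStr);
  ReadPos := 1;
  WritePos := 1;
  while ReadPos <= SLength do
  begin
    InStr[WritePos] := InStr[ReadPos];
    Inc(WritePos);
    if InStr[ReadPos] <> Sample then
      Inc(ReadPos)
    else
      repeat // skip the rest of a run of Sample characters
        Inc(ReadPos);
      until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  end;
  if WritePos < ReadPos then
    SetLength(InStr, WritePos - 1);
end;

var
  S: string;
begin
  S := '/usr//local///bin';
  CompressDuplicatesToOne(S, '/');
  WriteLn(S); // /usr/local/bin
end.
```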

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8894
  • FPC developer.
Re: Small speedup for FExpand function
« Reply #19 on: October 02, 2020, 09:37:46 pm »
Quote from: ASerge
Given that modern processors handle indexed addressing as fast as plain addressing, I think this code will be no slower than a version that converts everything to PChar. It also works for strings containing the #0 character.

Recent processors make it worth re-evaluating string functions. Modern processors can process up to 100 bytes per tick (!) in such routines, while even with the uop cache, 6 uops per tick is the maximum for a scalar loop.

Maybe we need an IndexNByte for this code:

Code: Pascal
  1.  repeat
  2.         Inc(ReadPos);
  3.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
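A hypothetical IndexNByte (return the offset of the first byte that differs from a given value, or -1) could look like the plain-Pascal sketch below; the name is marcov's suggestion, the body is my own illustration, and an RTL version would presumably use SIMD like IndexByte does.

Code: Pascal
```pascal
program TestIndexNByte;
{$mode objfpc}

// Hypothetical IndexNByte: offset of the first byte in Buf that
// differs from B, or -1 if all Len bytes equal B.
function IndexNByte(const Buf; Len: SizeInt; B: Byte): SizeInt;
var
  P: PByte;
  I: SizeInt;
begin
  P := @Buf;
  for I := 0 to Len - 1 do
    if P[I] <> B then
      Exit(I);
  Result := -1;
end;

var
  S: string;
begin
  S := 'aaab';
  WriteLn(IndexNByte(S[1], Length(S), Ord('a'))); // 3
end.
```

The repeat loop above could then advance ReadPos in one call instead of byte-by-byte.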

« Last Edit: October 03, 2020, 02:56:36 pm by marcov »

ASerge

  • Hero Member
  • *****
  • Posts: 1693
Re: Small speedup for FExpand function
« Reply #20 on: October 03, 2020, 10:17:47 am »
Quote from: marcov
Maybe we need an IndexNByte for this code:

Code: Pascal
  1.  repeat
  2.         Inc(ReadPos);
  3.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
I don't think that's necessary: in practice there are rarely many identical consecutive characters, and the IndexByte implementation is very long compared to this code:
Code: ASM
  1. .Lj29:
  2. .Ll28:
  3. # [70] Inc(ReadPos);
  4.         addq    $1,%r9
  5. .Ll29:
  6. # [71] until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  7.         cmpq    %r9,%rax
  8.         jl      .Lj31
  9.         movq    (%rbx),%rcx
  10.         cmpb    -1(%rcx,%r9,1),%sil
  11.         je      .Lj29
  12. .Lj31:

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8894
  • FPC developer.
Re: Small speedup for FExpand function
« Reply #21 on: October 03, 2020, 03:32:47 pm »
Yeah, and then you get to the question of which CPU such routines should be optimized for. Which is not easy.

 
