
Author Topic: Small speedup for FExpand function  (Read 3591 times)

Bart

  • Hero Member
  • *****
  • Posts: 5288
    • Bart en Mariska's Webstek
Re: Small speedup for FExpand function
« Reply #15 on: October 02, 2020, 02:22:06 pm »
How relevant is this discussion about speeding up FExpand?
This code is primarily used in preparation for some disk I/O, which is orders of magnitude slower than string handling, so it is very unlikely that FExpand will ever be the bottleneck in your program.

IMO it is better to have good, readable code than prematurely optimized code that is harder to follow (and therefore potentially more prone to new bugs).

Bart

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9855
  • Debugger - SynEdit - and more
    • wiki
Re: Small speedup for FExpand function
« Reply #16 on: October 02, 2020, 02:31:57 pm »
How relevant is this discussion about speeding up FExpand?
This code is primarily used in preparation for some disk I/O, which is orders of magnitude slower than string handling, so it is very unlikely that FExpand will ever be the bottleneck in your program.

IMO it is better to have good, readable code than prematurely optimized code that is harder to follow (and therefore potentially more prone to new bugs).

I don't really know. But I don't think code like my example is really needed; I only meant it as a proof of concept.

I can see it getting used heavily if file locations are compared (if you have a list of paths).
But paths are usually short, and I do not even know how long a path most OSes will accept.

On a string of 1000 chars the speed difference will probably be negligible.
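
A hypothetical illustration of that usage pattern (the procedure name is made up): normalising every entry of a path list once, before the comparisons, is where ExpandFileName could end up being called in a loop.

Code: Pascal
uses
  Classes, SysUtils;

// Normalise every path in the list once, so later comparisons are plain string compares.
procedure NormalisePaths(Paths: TStrings);
var
  I: Integer;
begin
  for I := 0 to Paths.Count - 1 do
    Paths[I] := ExpandFileName(Paths[I]);
end;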



PascalDragon

  • Hero Member
  • *****
  • Posts: 5462
  • Compiler Developer
Re: Small speedup for FExpand function
« Reply #17 on: October 02, 2020, 02:35:24 pm »
How relevant is this discussion about speeding up FExpand?
This code is primarily used in preparation for some disk I/O, which is orders of magnitude slower than string handling, so it is very unlikely that FExpand will ever be the bottleneck in your program.

IMO it is better to have good, readable code than prematurely optimized code that is harder to follow (and therefore potentially more prone to new bugs).

No one can say in which situations the code of fexpand.inc is used, because e.g. ExpandFileName does not do any disk I/O by itself (aside from what the code in fexpand.inc itself might do). So while optimizing fexpand.inc might not help everyone, since any disk I/O that follows is more expensive, there are definitely cases where it does help. If it should, for example, result in smaller/faster code for embedded platforms, that would be a plus, and memory allocations (the Delete in the current code) are definitely more expensive than simply adjusting pointers.
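
As a minimal sketch (not the actual fexpand.inc code, and the name is made up): a Delete-based cleanup reallocates or shifts the tail of the string on every hit, which is exactly the cost an in-place rewrite avoids.

Code: Pascal
// Collapse doubled directory separators using Delete; each Delete call
// moves all remaining characters one slot to the left.
procedure StripDoubleSepsWithDelete(var S: string);
var
  I: SizeInt;
begin
  I := 2;
  while I <= Length(S) do
    if (S[I] = DirectorySeparator) and (S[I - 1] = DirectorySeparator) then
      Delete(S, I, 1)
    else
      Inc(I);
end;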

ASerge

  • Hero Member
  • *****
  • Posts: 2240
Re: Small speedup for FExpand function
« Reply #18 on: October 02, 2020, 09:28:36 pm »
Sometimes there is no need to look for a C-style solution; we can use regular Pascal, without any #0:
Code: Pascal
  1. procedure CompressDuplicatesToOne(var InStr: string; Sample: Char);
  2. var
  3.   SLength: SizeInt;
  4.   WritePos: SizeInt;
  5.   ReadPos: SizeInt;
  6. begin
  7.   SLength := Length(InStr);
  8.   ReadPos := 1;
  9.   WritePos := 1;
  10.   while ReadPos <= SLength do
  11.   begin
  12.     InStr[WritePos] := InStr[ReadPos];
  13.     Inc(WritePos);
  14.     if InStr[ReadPos] <> Sample then
  15.       Inc(ReadPos)
  16.     else
  17.       repeat
  18.         Inc(ReadPos);
  19.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  20.   end;
  21.   if WritePos < ReadPos then
  22.     SetLength(InStr, WritePos - 1);
  23. end;
In line 12, the compiler implicitly calls the UniqueString function.
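
A quick (hypothetical) usage example for the procedure above, collapsing runs of directory separators:

Code: Pascal
var
  S: string;
begin
  S := '/usr//local///bin';
  CompressDuplicatesToOne(S, '/');
  WriteLn(S);  // prints /usr/local/bin
end.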
We can optimize this, at the cost of readability, by replacing the body of the procedure. The WritePos variable also becomes zero-based, since we now use it as a PChar offset:
Code: Pascal
  1. var
  2.   SLength: SizeInt;
  3.   WritePosZeroBase: SizeInt; // Zero index
  4.   ReadPos: SizeInt;
  5. begin
  6.   UniqueString(InStr); // Call only once
  7.   SLength := Length(InStr);
  8.   ReadPos := 1;
  9.   WritePosZeroBase := 0;
  10.   while ReadPos <= SLength do
  11.   begin
  12.     // Cast to PChar to eliminate implicit UniqueString call for InStr[i]:=
  13.     (PChar(Pointer(InStr)) + WritePosZeroBase)^ := InStr[ReadPos];
  14.     Inc(WritePosZeroBase);
  15.     if InStr[ReadPos] <> Sample then
  16.       Inc(ReadPos)
  17.     else
  18.       repeat
  19.         Inc(ReadPos);
  20.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  21.   end;
  22.   if (WritePosZeroBase + 1) < ReadPos then
  23.     SetLength(InStr, WritePosZeroBase);
  24. end;
Given that modern processors handle indexed addressing as fast as plain addressing, I think this code will be no slower than a version that casts everything to PChar. It also works for strings containing the #0 character.
At least the assembler output on x64 looks optimal:
Code: ASM
  1. # [57] UniqueString(InStr);
  2.         movq    %rbx,%rcx
  3.         call    FPC_ANSISTR_UNIQUE
  4. .Ll19:
  5. # [58] SLength := Length(InStr);
  6.         movq    (%rbx),%rax
  7.         testq   %rax,%rax
  8.         je      .Lj22
  9.         movq    -8(%rax),%rax
  10. .Lj22:
  11. # Var SLength located in register rax
  12. # Var ReadPos located in register r9
  13. .Ll20:
  14. # [59] ReadPos := 1;
  15.         movl    $1,%r9d
  16. # Var WritePosZeroBase located in register rdx
  17. .Ll21:
  18. # [60] WritePosZeroBase := 0;
  19.         xorl    %edx,%edx
  20. .Ll22:
  21. # [61] while ReadPos <= SLength do
  22.         jmp     .Lj24
  23.         .balign 8,0x90
  24. .Lj23:
  25. .Ll23:
  26.         movq    (%rbx),%rcx
  27. .Ll24:
  28. # [64] (PChar(Pointer(InStr)) + WritePosZeroBase)^ := InStr[ReadPos];
  29.         leaq    (%rcx,%rdx),%r8
  30.         movb    -1(%rcx,%r9,1),%cl
  31.         movb    %cl,(%r8)
  32. .Ll25:
  33. # [65] Inc(WritePosZeroBase);
  34.         addq    $1,%rdx
  35. .Ll26:
  36. # [66] if InStr[ReadPos] <> Sample then
  37.         movq    (%rbx),%rcx
  38.         cmpb    -1(%rcx,%r9,1),%sil
  39.         je      .Lj27
  40. .Ll27:
  41. # [67] Inc(ReadPos)
  42.         addq    $1,%r9
  43.         jmp     .Lj28
  44. .Lj27:
  45.         .balign 8,0x90
  46. .Lj29:
  47. .Ll28:
  48. # [70] Inc(ReadPos);
  49.         addq    $1,%r9
  50. .Ll29:
  51. # [71] until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  52.         cmpq    %r9,%rax
  53.         jl      .Lj31
  54.         movq    (%rbx),%rcx
  55.         cmpb    -1(%rcx,%r9,1),%sil
  56.         je      .Lj29
  57. .Lj31:
  58. .Lj28:
  59. .Lj24:
  60. .Ll30:
  61.         cmpq    %r9,%rax
  62.         jge     .Lj23
  63. .Ll31:
  64. # [73] if (WritePosZeroBase + 1) < ReadPos then
  65.         leaq    1(%rdx),%rax
  66.         cmpq    %r9,%rax
  67.         jnl     .Lj36
  68. .Ll32:
  69. # [74] SetLength(InStr, WritePosZeroBase);
  70.         movq    %rbx,%rcx
  71.         xorl    %r8d,%r8d
  72.         call    fpc_ansistr_setlength
  73.         .balign 4,0x90
  74. .Lj36:
  75. .Ll33:
  76. # [75] end;
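
For what it's worth, a rough timing harness along these lines (a sketch, not from the original post; the test string and iteration count are arbitrary) could be used to check the "no worse in speed" claim. GetTickCount64 is coarse, so treat the result as indicative only.

Code: Pascal
uses
  SysUtils;

var
  S: string;
  I: Integer;
  T0: QWord;
begin
  T0 := GetTickCount64;
  for I := 1 to 1000000 do
  begin
    S := '/usr//local///bin//';      // reset the input each iteration
    CompressDuplicatesToOne(S, '/');
  end;
  WriteLn('1e6 calls took ', GetTickCount64 - T0, ' ms');
end.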

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11445
  • FPC developer.
Re: Small speedup for FExpand function
« Reply #19 on: October 02, 2020, 09:37:46 pm »
Given that modern processors handle indexed addressing as fast as plain addressing, I think this code will be no slower than a version that casts everything to PChar. It also works for strings containing the #0

Recent processors make it worth re-evaluating string functions: modern ones can process up to 100 bytes per tick (!), while even with the µop cache, 6 µops per tick is the maximum.

Maybe we need an IndexNByte for this code:

Code: Pascal
  1.  repeat
  2.         Inc(ReadPos);
  3.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
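
For reference, a plain-Pascal sketch of what such an IndexNByte might look like (a hypothetical routine, not in the RTL): return the offset of the first byte that is not equal to B, or -1 if all Len bytes match. A real RTL version would presumably be vectorised like the existing IndexByte.

Code: Pascal
function IndexNByte(const Buf; Len: SizeInt; B: Byte): SizeInt;
var
  P: PByte;
begin
  P := PByte(@Buf);
  Result := 0;
  // Scan forward while the bytes still equal B.
  while (Result < Len) and (P^ = B) do
  begin
    Inc(P);
    Inc(Result);
  end;
  if Result >= Len then
    Result := -1;  // the whole buffer consists of B
end;

The inner skip loop could then advance ReadPos in one call, roughly: Ofs := IndexNByte(InStr[ReadPos], SLength - ReadPos + 1, Byte(Sample)); if Ofs < 0 then ReadPos := SLength + 1 else Inc(ReadPos, Ofs);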

« Last Edit: October 03, 2020, 02:56:36 pm by marcov »

ASerge

  • Hero Member
  • *****
  • Posts: 2240
Re: Small speedup for FExpand function
« Reply #20 on: October 03, 2020, 10:17:47 am »
Maybe we need an IndexNByte for this code:

Code: Pascal
  1.  repeat
  2.         Inc(ReadPos);
  3.       until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
No need, I think: in practice there are rarely many identical consecutive characters, and the IndexByte implementation is very long compared to this code:
Code: ASM
  1. .Lj29:
  2. .Ll28:
  3. # [70] Inc(ReadPos);
  4.         addq    $1,%r9
  5. .Ll29:
  6. # [71] until (ReadPos > SLength) or (InStr[ReadPos] <> Sample);
  7.         cmpq    %r9,%rax
  8.         jl      .Lj31
  9.         movq    (%rbx),%rcx
  10.         cmpb    -1(%rcx,%r9,1),%sil
  11.         je      .Lj29
  12. .Lj31:

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11445
  • FPC developer.
Re: Small speedup for FExpand function
« Reply #21 on: October 03, 2020, 03:32:47 pm »
Yeah, and then you get to the question of which CPUs such routines should be optimized for. Which is not easy.

 
