I noticed this programming with Delphi 2007, but FPC does suffer from the same problem: StringReplace is awful with long strings. I ran into this in real world while wondering how Indy's headers could be so slow with big POST buffers. It converted '+' to ' ' with StringReplace.
When I looked at the code, the reason was obvious: StringReplace does a lot of string manipulation, copy and concatenation. It gets really ugly fast, when there's a lot to replace and the string is big.
I wrote a new version, which is a lot more optimized. It can certainly be optimized more, but the general idea is to calculate the length of the result string, set the length once and only manipulate the contents of it.
Here are some test results:
58058 bytes Single char, replace all 203 x faster
58058 bytes Same length, replace all 61 x faster
58058 bytes Single char, replace all, ignore case 68 x faster
58058 bytes Same length, replace all, ignore case 34 x faster
58058 bytes New longer, replace all 32 x faster
58058 bytes New shorter, replace all 32 x faster
58058 bytes New longer, replace all, ignore case 22 x faster
58058 bytes New shorter, replace all, ignore case 22 x faster
58058 bytes New empty, replace all 35 x faster
58058 bytes New empty, replace all, ignore case 23 x faster
116058 bytes Single char, replace all 260 x faster
116058 bytes Same length, replace all 148 x faster
116058 bytes Single char, replace all, ignore case 153 x faster
116058 bytes Same length, replace all, ignore case 75 x faster
116058 bytes New longer, replace all 62 x faster
116058 bytes New shorter, replace all 61 x faster
116058 bytes New longer, replace all, ignore case 45 x faster
116058 bytes New shorter, replace all, ignore case 44 x faster
116058 bytes New empty, replace all 68 x faster
116058 bytes New empty, replace all, ignore case 50 x faster
174058 bytes Single char, replace all 518 x faster
174058 bytes Same length, replace all 228 x faster
174058 bytes Single char, replace all, ignore case 219 x faster
174058 bytes Same length, replace all, ignore case 113 x faster
174058 bytes New longer, replace all 100 x faster
174058 bytes New shorter, replace all 104 x faster
174058 bytes New longer, replace all, ignore case 70 x faster
174058 bytes New shorter, replace all, ignore case 72 x faster
174058 bytes New empty, replace all 107 x faster
174058 bytes New empty, replace all, ignore case 80 x faster
232058 bytes Single char, replace all 634 x faster
232058 bytes Same length, replace all 291 x faster
232058 bytes Single char, replace all, ignore case 305 x faster
232058 bytes Same length, replace all, ignore case 152 x faster
232058 bytes New longer, replace all 141 x faster
232058 bytes New shorter, replace all 140 x faster
232058 bytes New longer, replace all, ignore case 98 x faster
232058 bytes New shorter, replace all, ignore case 98 x faster
232058 bytes New empty, replace all 153 x faster
232058 bytes New empty, replace all, ignore case 107 x faster
290058 bytes Single char, replace all 1043 x faster
290058 bytes Same length, replace all 487 x faster
290058 bytes Single char, replace all, ignore case 542 x faster
290058 bytes Same length, replace all, ignore case 311 x faster
290058 bytes New longer, replace all 254 x faster
290058 bytes New shorter, replace all 249 x faster
290058 bytes New longer, replace all, ignore case 183 x faster
290058 bytes New shorter, replace all, ignore case 179 x faster
290058 bytes New empty, replace all 277 x faster
290058 bytes New empty, replace all, ignore case 197 x faster
And the new code is: (Update: This original code has a bug in it, look below for a fixed and improved version)
Function NewStringReplace(const S, OldPattern, NewPattern: string; Flags: TReplaceFlags): string;
var
OldPat,Srch: string; // Srch and Oldp can contain uppercase versions of S,OldPattern
PatLength,NewPatLength,P,i,PatCount,PrevP: Integer;
c,d: pchar;
begin
PatLength:=Length(OldPattern);
if PatLength=0 then begin
Result:=S;
exit;
end;
if rfIgnoreCase in Flags then begin
Srch:=AnsiUpperCase(S);
OldPat:=AnsiUpperCase(OldPattern);
end else begin
Srch:=S;
OldPat:=OldPattern;
end;
PatLength:=Length(OldPat);
if Length(NewPattern)=PatLength then begin
//Result length will not change
Result:=S;
P:=1;
repeat
P:=PosEx(OldPat,Srch,P);
if P>0 then begin
for i:=1 to PatLength do
Result[P+i-1]:=NewPattern[i];
if not (rfReplaceAll in Flags) then exit;
inc(P,PatLength);
end;
until p=0;
end else begin
//Different pattern length -> Result length will change
//To avoid creating a lot of temporary strings, we count how many
//replacements we're going to make.
P:=1; PatCount:=0;
repeat
P:=PosEx(OldPat,Srch,P);
if P>0 then begin
inc(P,PatLength);
inc(PatCount);
if not (rfReplaceAll in Flags) then break;
end;
until p=0;
if PatCount=0 then begin
Result:=S;
exit;
end;
NewPatLength:=Length(NewPattern);
SetLength(Result,Length(S)+PatCount*(NewPatLength-PatLength));
P:=1; PrevP:=0;
c:=pchar(Result); d:=pchar(S);
repeat
P:=PosEx(OldPat,Srch,P);
if P>0 then begin
for i:=PrevP+1 to P-1 do begin
c^:=d^;
inc(c); inc(d);
end;
for i:=1 to NewPatLength do begin
c^:=NewPattern[i];
inc(c);
end;
if not (rfReplaceAll in Flags) then exit;
inc(P,PatLength);
inc(d,PatLength);
PrevP:=P-1;
end else begin
for i:=PrevP+1 to Length(S) do begin
c^:=d^;
inc(c); inc(d);
end;
end;
until p=0;
end;
end;
Comments are welcome. It'd be nice to get this one or an even more improved one to the FPC source tree.