Forum > FPC development
Slow TStringList.Sort in Applications.
(1/1)
Borneq:
This forum is proper for this post? Post is about error/bad behavior of FPC.
I have tested sort time in FPC console, FPC applications and GCC release console. I have Linux Fedora and Laptop with core I3 20 GB RAM.
variable d means:
big value like 100 = most strings are different each with other
small value like 0.01 = a lot of the same string valiues
First FPC console compiled with aggressive optimalization
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program timeSortCon; uses Linux, unixtype, Classes, SysUtils, Math; procedure fill(list: TStringList; size: integer; d: double);var i,n: integer;begin list.Clear; for i:=0 to size-1 do begin n:=Random(round(size*d)); list.Add(IntToStr(n)); end;end; const c1e9 = 1000*1000*1000;var tp1,tp2: ttimespec; t1,t2: int64; list:TStringList; size: integer; d: double;begin list:=TStringList.Create; size:=1024; d:=100; repeat Fill(list, size, d); clock_gettime(CLOCK_MONOTONIC, @tp1); list.sort(); clock_gettime(CLOCK_MONOTONIC, @tp2); t1:=tp1.tv_nsec+tp1.tv_sec*c1e9; t2:=tp2.tv_nsec+tp2.tv_sec*c1e9; writeln(FloatToStr((t2-t1)/Math.Log2(size)/size)); size:=size*2; until size>4*1024*1024; list.Free; writeln('end');end. output is
33.86015625
33.3103249289773
33.4970092773438
35.7384127103365
37.3490033830915
38.675048828125
38.206618309021
42.8031939338235
45.9571946461995
51.8758956507633
53.9631856918335
62.9143598420279
69.6882724003358
end
id d = 0.01:
112.74697265625
104.522283380682
88.3057861328125
66.2308349609375
65.185782296317
66.95908203125
66.3638725280762
70.4515856574563
77.0949376424154
77.421782844945
80.1974610805511
87.4744812647502
91.7128844369542
end
A bit slower
Compare to C++:
--- Code: C [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---#include <iostream>#include <algorithm>#include <vector>#include <chrono>#include <cmath>using namespace std; class stopwatch{public: chrono::time_point<chrono::high_resolution_clock> a, b; void start() { a = chrono::high_resolution_clock::now(); } void stop() { b = chrono::high_resolution_clock::now(); } double duration() { chrono::duration<double> elapsed_seconds = b - a; return elapsed_seconds.count(); }}; void fill(vector<string> &list, int size, double d) { list.clear(); list.resize(size); for (int i=0; i<size; i++) list[i] = to_string(rand()%((int)(size*d)));} int main() { int size = 1024; double d = 100; vector<string> list; do{ fill(list, size, d); stopwatch st; st.start(); sort(list.begin(), list.end()); st.stop(); cout << st.duration()*1e9/ log2(size)/size << endl; size *= 2; }while(size<=4*1024*1024); cout << "end" <<endl; return 0;} d=100
16.7077
15.6237
16.1659
14.5625
13.7777
15.116
13.3096
13.6631
13.0391
12.9087
12.9574
12.8193
12.6872
end
d=0.01, even faster!
8.36992
10.4783
10.5804
10.4236
9.90457
9.99056
9.80442
9.87245
9.76277
9.91909
9.97854
10.568
10.2577
end
FPC is several times slower, but fatal is in form application.
I make thread , but it is not important, the same is in main thread, except that thread allows convenient view results, when process not yet finished
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program timeSort; {$mode objfpc}{$H+}{$define UseCThreads} uses {$IFDEF UNIX}{$IFDEF UseCThreads} cthreads,cmem, {$ENDIF}{$ENDIF} Interfaces, // this includes the LCL widgetset Forms, Unit1 { you can add units after this }; {$R *.res} begin RequireDerivedFormResource:=True; Application.Scaled:=True; Application.Initialize; Application.CreateForm(TForm1, Form1); Application.Run;end. and
--- Code: Text [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---object Form1: TForm1 Left = 248 Height = 240 Top = 131 Width = 320 Caption = 'Form1' ClientHeight = 240 ClientWidth = 320 OnCreate = FormCreate LCLVersion = '2.0.12.0' object Button1: TButton Left = 76 Height = 25 Top = 22 Width = 75 Caption = 'Button1' OnClick = Button1Click TabOrder = 0 end object Memo1: TMemo Left = 16 Height = 130 Top = 96 Width = 296 Lines.Strings = ( 'Memo1' ) ScrollBars = ssAutoBoth TabOrder = 1 endend and
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---unit Unit1; {$mode objfpc}{$H+} interface uses Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls,Math; type { TForm1 } TForm1 = class(TForm) Button1: TButton; Memo1: TMemo; procedure Button1Click(Sender: TObject); private public end; var Form1: TForm1; implementation uses Linux, unixtype; {$R *.lfm} type { TTestThread } TTestThread = class(TThread) protected fToMemo: string; procedure AddToMemo; procedure Execute; override; public constructor Create(); end; { TTestThread } procedure fill(list: TStringList; size: integer; d: double);var i,n: integer;begin list.Clear; for i:=0 to size-1 do begin n:=Random(round(size*d)); list.Add(IntToStr(n)); end;end; procedure TTestThread.Execute;const c1e9 = 1000*1000*1000;var tp1,tp2: ttimespec; t1,t2: int64; list:TStringList; size: integer; d: double;begin list:=TStringList.Create; size:=1024; d:=100; repeat Fill(list, size, d); clock_gettime(CLOCK_MONOTONIC, @tp1); list.sort(); clock_gettime(CLOCK_MONOTONIC, @tp2); t1:=tp1.tv_nsec+tp1.tv_sec*c1e9; t2:=tp2.tv_nsec+tp2.tv_sec*c1e9; fToMemo := FloatToStr((t2-t1)/Math.Log2(size)/size); synchronize(@AddTOMemo); size:=size*2; until size>4*1024*1024; list.Free; fToMemo := 'end'; synchronize(@AddTOMemo);end; constructor TTestThread.Create();begin inherited Create(false); FreeOnTerminate:=true;end; procedure TTestThread.AddToMemo;begin Form1.Memo1.Lines.Add(fToMemo);end; { TForm1 } procedure TForm1.Button1Click(Sender: TObject);var thread: TTestThread;begin thread:=TTestThread.Create(); thread.Start;end; end. Results for d=100 (aggressive optimization)
1955.9984375
1255.27476917614
505.58701578776
459.529766376202
486.121451241629
512.559214274089
517.353302955627
561.423980712891
604.507530212402
642.564016342163
644.596619796753
697.545514220283
716.581226251342
end
and for d=0.01
3564.07783203125
1221.64275568182
1036.1786702474
995.541672926683
1013.31421334403
1049.10948689779
1012.10831642151
1034.92465344597
1054.99673144023
1066.52670609324
1088.22270264626
1120.13203434717
1127.15126429905
end
Horrible slow!
So worst, BeginUpdate/EndUpdate
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} --- list.BeginUpdate; list.sort(); list.EndUpdate; not helps....
avk:
Perhaps one reason could be that TStringList uses AnsiCompareText() as its default comparator, which, if I'm not mistaken, calls Utf8CompareText() under the hood in LCL applications.
What happens if you add
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---begin list:=TStringList.Create; list.UseLocale := False; // <--- ...
Borneq:
Nothing happen in console program, but very speedup for Application - now is time as console. Thanks.
paweld:
use generics, eg. Generics.Collections or LGenerics ( https://github.com/avk959/LGenerics )
Warfley:
The difference between FPC and C++ is to be expected. The major difference here is that the FPC version uses function pointers for both comparison, but also swapping. C++ on the other hand takes the comparator as template parameter and does the swapping using std::move. Both of these calls can be inlined.
For example:
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program Project1; {$mode objfpc}{$H+} uses SysUtils; type TIntArray = Array of Integer; TLessFunc = function(A, B: Integer): Boolean; TSwapFunc = procedure(var A, B: Integer); function Less(A, B: Integer): Boolean; inline;begin Result := A < B;end; procedure Swap(var A, B: Integer); inline;begin A := A Xor B; B := B Xor A; A := A Xor B;end; procedure BubbleSortFP(data: TIntArray; LessFunc: TLessFunc; SwapFunc: TSwapFunc);var i, j: Integer;begin for i := 1 to Length(data) - 1 do for j := 0 to Length(data) - 1 - i do if LessFunc(data[j + 1], data[j]) then SwapFunc(data[j], data[j + 1]);end; procedure BubbleSortInline(data: TIntArray);var i, j: Integer;begin for i := 0 to Length(data) - 1 do for j := 0 to Length(data) - 2 - i do if Less(data[j + 1], data[j]) then Swap(data[j], data[j + 1]);end; var data: Array of Integer; i: Integer; Start: QWord;begin SetLength(data, 10000); for i:=0 to length(data) -1 do data[i] := Random(Integer.MaxValue); Start := GetTickCount64; BubbleSortFP(data, @Less, @Swap); WriteLn('FuncPtr: ', GetTickCount64 - Start); // Refill array with new numbers to "fool" predictive caching of CPU for i:=0 to length(data) -1 do data[i] := Random(Integer.MaxValue); Start := GetTickCount64; BubbleSortInline(data); WriteLn('Inline: ', GetTickCount64 - Start); ReadLn;end. On my machine the BubbleSortFP takes around double the time the inlined version BubbleSortInline does, just by passing function pointers.
Generally speaking, the classes in the RTL (or generally most parts of the RTL) are not optimised for performance. Plus generally are most C++ compilers much better at optimisation than the FPC.
PS: what this example also wonderfully demonstrates is the power of predictive caching, remove the second filling of the array with new random values and you will see a major speedup on many modern CPU, on mine its like 4x
Navigation
[0] Message Index