Forum > FPC development

Slow TStringList.Sort in Applications.

(1/1)

Borneq:
This forum is proper for this post? Post is about error/bad behavior of FPC.
I have tested sort time in FPC console, FPC applications and GCC release console. I have Linux Fedora and Laptop with core I3 20 GB RAM.
variable d means:
big value like 100 = most strings are different each with other
small value like 0.01 = a lot of the same string valiues
First FPC console compiled with aggressive optimalization

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program timeSortCon; uses Linux, unixtype, Classes, SysUtils, Math;  procedure fill(list: TStringList; size: integer; d: double);var  i,n: integer;begin  list.Clear;  for i:=0 to size-1 do  begin    n:=Random(round(size*d));    list.Add(IntToStr(n));  end;end; const c1e9 = 1000*1000*1000;var  tp1,tp2: ttimespec;  t1,t2: int64;  list:TStringList;  size: integer;  d: double;begin  list:=TStringList.Create;  size:=1024;  d:=100;  repeat     Fill(list, size, d);     clock_gettime(CLOCK_MONOTONIC, @tp1);     list.sort();     clock_gettime(CLOCK_MONOTONIC, @tp2);     t1:=tp1.tv_nsec+tp1.tv_sec*c1e9;     t2:=tp2.tv_nsec+tp2.tv_sec*c1e9;     writeln(FloatToStr((t2-t1)/Math.Log2(size)/size));     size:=size*2;  until size>4*1024*1024;  list.Free;  writeln('end');end. output is

33.86015625
33.3103249289773
33.4970092773438
35.7384127103365
37.3490033830915
38.675048828125
38.206618309021
42.8031939338235
45.9571946461995
51.8758956507633
53.9631856918335
62.9143598420279
69.6882724003358
end

id d = 0.01:

112.74697265625
104.522283380682
88.3057861328125
66.2308349609375
65.185782296317
66.95908203125
66.3638725280762
70.4515856574563
77.0949376424154
77.421782844945
80.1974610805511
87.4744812647502
91.7128844369542
end

A bit slower
Compare to C++:

--- Code: C  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---#include <iostream>#include <algorithm>#include <vector>#include <chrono>#include <cmath>using namespace std; class stopwatch{public:    chrono::time_point<chrono::high_resolution_clock> a, b;    void start() { a = chrono::high_resolution_clock::now(); }    void stop() { b = chrono::high_resolution_clock::now(); }    double duration()    {        chrono::duration<double> elapsed_seconds = b - a;        return elapsed_seconds.count();    }}; void fill(vector<string> &list, int size, double d) {    list.clear();    list.resize(size);    for (int i=0; i<size; i++)        list[i] = to_string(rand()%((int)(size*d)));} int main() {    int size = 1024;    double d = 100;    vector<string> list;    do{        fill(list, size, d);        stopwatch st;        st.start();        sort(list.begin(), list.end());        st.stop();        cout << st.duration()*1e9/ log2(size)/size << endl;        size *= 2;    }while(size<=4*1024*1024);    cout << "end" <<endl;    return 0;} d=100

16.7077
15.6237
16.1659
14.5625
13.7777
15.116
13.3096
13.6631
13.0391
12.9087
12.9574
12.8193
12.6872
end

d=0.01, even faster!

8.36992
10.4783
10.5804
10.4236
9.90457
9.99056
9.80442
9.87245
9.76277
9.91909
9.97854
10.568
10.2577
end


FPC is several times slower, but fatal is in form application.
I make thread , but it is not important, the same is in main thread, except that thread allows convenient view results, when process not yet finished

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program timeSort; {$mode objfpc}{$H+}{$define UseCThreads} uses  {$IFDEF UNIX}{$IFDEF UseCThreads}  cthreads,cmem,  {$ENDIF}{$ENDIF}  Interfaces, // this includes the LCL widgetset  Forms, Unit1  { you can add units after this }; {$R *.res} begin  RequireDerivedFormResource:=True;  Application.Scaled:=True;  Application.Initialize;  Application.CreateForm(TForm1, Form1);  Application.Run;end. and

--- Code: Text  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---object Form1: TForm1  Left = 248  Height = 240  Top = 131  Width = 320  Caption = 'Form1'  ClientHeight = 240  ClientWidth = 320  OnCreate = FormCreate  LCLVersion = '2.0.12.0'  object Button1: TButton    Left = 76    Height = 25    Top = 22    Width = 75    Caption = 'Button1'    OnClick = Button1Click    TabOrder = 0  end  object Memo1: TMemo    Left = 16    Height = 130    Top = 96    Width = 296    Lines.Strings = (      'Memo1'    )    ScrollBars = ssAutoBoth    TabOrder = 1  endend and

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---unit Unit1; {$mode objfpc}{$H+} interface uses  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls,Math; type   { TForm1 }   TForm1 = class(TForm)    Button1: TButton;    Memo1: TMemo;    procedure Button1Click(Sender: TObject);  private  public  end; var  Form1: TForm1; implementation uses Linux, unixtype; {$R *.lfm} type   { TTestThread }   TTestThread = class(TThread)  protected    fToMemo: string;    procedure AddToMemo;    procedure Execute; override;  public    constructor Create();  end; { TTestThread } procedure fill(list: TStringList; size: integer; d: double);var  i,n: integer;begin  list.Clear;  for i:=0 to size-1 do  begin    n:=Random(round(size*d));    list.Add(IntToStr(n));  end;end; procedure TTestThread.Execute;const c1e9 = 1000*1000*1000;var  tp1,tp2: ttimespec;  t1,t2: int64;  list:TStringList;  size: integer;  d: double;begin  list:=TStringList.Create;  size:=1024;  d:=100;  repeat     Fill(list, size, d);     clock_gettime(CLOCK_MONOTONIC, @tp1);     list.sort();     clock_gettime(CLOCK_MONOTONIC, @tp2);     t1:=tp1.tv_nsec+tp1.tv_sec*c1e9;     t2:=tp2.tv_nsec+tp2.tv_sec*c1e9;     fToMemo := FloatToStr((t2-t1)/Math.Log2(size)/size);     synchronize(@AddTOMemo);     size:=size*2;  until size>4*1024*1024;  list.Free;  fToMemo := 'end';  synchronize(@AddTOMemo);end; constructor TTestThread.Create();begin  inherited Create(false);  FreeOnTerminate:=true;end; procedure TTestThread.AddToMemo;begin  Form1.Memo1.Lines.Add(fToMemo);end; { TForm1 } procedure TForm1.Button1Click(Sender: TObject);var  thread: TTestThread;begin  thread:=TTestThread.Create();  thread.Start;end; end. Results for d=100 (aggressive optimization)

1955.9984375
1255.27476917614
505.58701578776
459.529766376202
486.121451241629
512.559214274089
517.353302955627
561.423980712891
604.507530212402
642.564016342163
644.596619796753
697.545514220283
716.581226251342
end

and for d=0.01

3564.07783203125
1221.64275568182
1036.1786702474
995.541672926683
1013.31421334403
1049.10948689779
1012.10831642151
1034.92465344597
1054.99673144023
1066.52670609324
1088.22270264626
1120.13203434717
1127.15126429905
end

Horrible slow!
So worst, BeginUpdate/EndUpdate

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---     list.BeginUpdate;     list.sort();     list.EndUpdate;  not helps....

avk:
Perhaps one reason could be that TStringList uses AnsiCompareText() as its default comparator, which, if I'm not mistaken, calls Utf8CompareText() under the hood in LCL applications.
What happens if you add

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---begin  list:=TStringList.Create;  list.UseLocale := False; // <---  ... 

Borneq:
Nothing happen in console program, but very speedup for Application  - now is time as console. Thanks.

paweld:
use generics, eg. Generics.Collections or LGenerics ( https://github.com/avk959/LGenerics )

Warfley:
The difference between FPC and C++ is to be expected. The major difference here is that the FPC version uses function pointers for both comparison, but also swapping. C++ on the other hand takes the comparator as template parameter and does the swapping using std::move. Both of these calls can be inlined.

For example:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program Project1; {$mode objfpc}{$H+} uses  SysUtils; type  TIntArray = Array of Integer;  TLessFunc = function(A, B: Integer): Boolean;  TSwapFunc = procedure(var A, B: Integer); function Less(A, B: Integer): Boolean; inline;begin  Result := A < B;end; procedure Swap(var A, B: Integer); inline;begin  A := A Xor B;  B := B Xor A;  A := A Xor B;end; procedure BubbleSortFP(data: TIntArray; LessFunc: TLessFunc; SwapFunc: TSwapFunc);var  i, j: Integer;begin  for i := 1 to Length(data) - 1 do    for j := 0 to Length(data) - 1 - i do      if LessFunc(data[j + 1], data[j]) then SwapFunc(data[j], data[j + 1]);end; procedure BubbleSortInline(data: TIntArray);var  i, j: Integer;begin  for i := 0 to Length(data) - 1 do    for j := 0 to Length(data) - 2 - i do      if Less(data[j + 1], data[j]) then Swap(data[j], data[j + 1]);end; var  data: Array of Integer;  i: Integer;  Start: QWord;begin  SetLength(data, 10000);   for i:=0 to length(data) -1 do    data[i] := Random(Integer.MaxValue);  Start := GetTickCount64;  BubbleSortFP(data, @Less, @Swap);  WriteLn('FuncPtr: ', GetTickCount64 - Start);   // Refill array with new numbers to "fool" predictive caching of CPU  for i:=0 to length(data) -1 do    data[i] := Random(Integer.MaxValue);  Start := GetTickCount64;  BubbleSortInline(data);  WriteLn('Inline: ', GetTickCount64 - Start);  ReadLn;end. On my machine the BubbleSortFP takes around double the time the inlined version BubbleSortInline does, just by passing function pointers.

Generally speaking, the classes in the RTL (or generally most parts of the RTL) are not optimised for performance. Plus generally are most C++ compilers much better at optimisation than the FPC.

PS: what this example also wonderfully demonstrates is the power of predictive caching, remove the second filling of the array with new random values and you will see a major speedup on many modern CPU, on mine its like 4x

Navigation

[0] Message Index

Go to full version