Author Topic: Threadvar performance  (Read 4375 times)

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Threadvar performance
« on: May 24, 2017, 06:25:17 pm »
hi!
Several posts around the net mention that the threadvar implementation in FPC is very OS-specific and carries a huge performance penalty... Is this still true for the 3.0 branch?
I want to base some server code on threadvars to insulate the threads processing client requests, while still keeping these vars global across the different units... Any suggestions?
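To make the idea concrete, here is a minimal sketch of what I mean by insulating threads with a threadvar. The names (RequestCount, TWorker, Seen) are made up for illustration and are not from any real server code; it only assumes the standard Classes unit and cthreads on Unix.

```pascal
program ThreadVarDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread manager needed on Unix-like systems
  Classes;

threadvar
  RequestCount: Integer;  // hypothetical per-thread counter: every thread has its own copy

type
  // Hypothetical worker standing in for a thread that serves one client.
  TWorker = class(TThread)
  private
    FIterations: Integer;
  protected
    procedure Execute; override;
  public
    Seen: Integer;  // copy of this thread's RequestCount, read by the main thread
    constructor Create(AIterations: Integer);
  end;

constructor TWorker.Create(AIterations: Integer);
begin
  FIterations := AIterations;
  inherited Create(False);  // not suspended: the thread runs once construction is done
end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  RequestCount := 0;          // explicit reset of this thread's private copy
  for i := 1 to FIterations do
    Inc(RequestCount);        // touches only this thread's copy, no locking needed
  Seen := RequestCount;
end;

var
  A, B: TWorker;
begin
  A := TWorker.Create(1000);
  B := TWorker.Create(2000);
  A.WaitFor;
  B.WaitFor;
  // Each worker reports only its own iterations; neither sees the other's count.
  WriteLn('A saw ', A.Seen, ', B saw ', B.Seen);
  A.Free;
  B.Free;
end.
```

The point is that any unit can refer to RequestCount as if it were a plain global, yet the workers never interfere with each other.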
thanks!
Speak postscript or die!
Translate to pdf and live!

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Re: Threadvar performance
« Reply #1 on: May 24, 2017, 08:26:10 pm »
Uhhmm.

A threadvar - in any language - just means it is allocated on a thread-local heap. Which means the information you got is from a rubbish >:D >:D source.

Basic computer science.

It is not the reason FPC performs sometimes faster or slower than any other compiled language.

Oh, and better provide a link to that source. If it is within travelling distance I may hit him (and share a bottle of wine after). I am not violent. >:( >:D
« Last Edit: May 24, 2017, 08:34:07 pm by Thaddy »
Specialize a type, not a var.

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: Threadvar performance
« Reply #2 on: May 24, 2017, 08:37:38 pm »
@Thaddy: no, several posts mention that TLS involves a syscall and a kernel table lookup... search Google for "threadvar penalty"... and they point to a workaround in some compilers that loads the gs/fs registers with the TLS address and uses these registers in place of ds... Some posts, including the FPC wiki, state there is a performance penalty on several OSes including Linux (I do not care about Windows because I use it only for dev/test and deploy on Linux)... Please check and give me your opinion on the implementation in FPC 3.0+. Thanks, Thaddy!

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Re: Threadvar performance
« Reply #3 on: May 24, 2017, 08:46:49 pm »
The implementation in FPC is known and can be found in the compiler source:
It is one single locked operand following two instructions (if possible on the given platform, but in practice always).
This is taken care of in the high level code generator. Read the code. Then decide what's wrong (not!)
Do not trust the wiki, trust the compiler source code. The wiki is polluted.
« Last Edit: May 24, 2017, 08:54:20 pm by Thaddy »

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: Threadvar performance
« Reply #4 on: May 24, 2017, 09:27:45 pm »
Please be more specific on the implementation ;)

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Re: Threadvar performance
« Reply #6 on: May 25, 2017, 10:30:12 am »
<sigh> LOOK AT THE COMPILER SOURCES, silly. < I am now getting very angry and not in any way grumpy. >:D >:D >:D >:D>
And you deserve that, because you are NOT a beginner. 8-)

And the wiki is not official information. Anyone can write nonsense there. The official information is in the extremely good manuals.
« Last Edit: May 25, 2017, 10:33:03 am by Thaddy »

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: Threadvar performance
« Reply #7 on: May 25, 2017, 05:03:54 pm »
hahahah ok :)))  O:-)

argb32

  • Jr. Member
  • **
  • Posts: 89
    • Pascal IDE based on IntelliJ platform
Re: Threadvar performance
« Reply #8 on: May 25, 2017, 07:09:51 pm »
Why not write a simple benchmark?

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: Threadvar performance
« Reply #9 on: May 27, 2017, 04:52:30 pm »
Simple benchmark done!
Threadvars are approx. 8 times slower
for the following code:
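Code:

// Declarations are not shown in the post; presumably LocalInt is a local
// variable, GlobalInt a unit-level var and ThreadInt a threadvar.

// 1) Assign to a local variable in a tight loop.
T:=GetTickCount64;
LocalInt:=0;
Temp:=0;
for i:=0 to 10000000 do
 begin
   Inc(Temp);
   LocalInt:=Temp;
 end;
T2:=GetTickCount64;
Memo1.Lines.Add(Format('Local Var: %d',[T2-T]));

// 2) Same loop, assigning to a global variable.
T:=GetTickCount64;
GlobalInt:=0;
Temp:=0;
for i:=0 to 10000000 do
 begin
   Inc(Temp);
   GlobalInt:=Temp;
 end;
T2:=GetTickCount64;
Memo1.Lines.Add(Format('Global Var: %d',[T2-T]));

// 3) Same loop, assigning to a threadvar.
T:=GetTickCount64;
ThreadInt:=0;
Temp:=0;
for i:=0 to 10000000 do
 begin
   Inc(Temp);
   ThreadInt:=Temp;
 end;
T2:=GetTickCount64;
Memo1.Lines.Add(Format('Thread Var: %d',[T2-T]));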



Local Var: 16
Global Var: 16
Thread Var: 110

P.S. Using a pointer to the threadvar works:

Local Var: 16
Global Var: 15
ThreadVar Var: 125
Pointer to ThreadVar Var: 16
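
The pointer variant is roughly the following. This is a cut-down sketch, not the exact benchmark code: the procedure name and the console output are made up for illustration, and only ThreadInt matches the name used above.

```pascal
program ThreadVarPointer;
{$mode objfpc}{$H+}

threadvar
  ThreadInt: Integer;

// Hypothetical helper: take the threadvar's address once, then work through
// the pointer so the loop body is an ordinary memory access.
procedure CountWithCachedPointer(Iterations: Integer);
var
  P: PInteger;
  i: Integer;
begin
  P := @ThreadInt;   // the thread-local address is resolved here, outside the loop
  P^ := 0;
  for i := 1 to Iterations do
    Inc(P^);         // plain pointer dereference, no threadvar lookup per iteration
end;

begin
  CountWithCachedPointer(10000000);
  WriteLn(ThreadInt);  // prints 10000000
end.
```

Note that such a pointer is only meaningful in the thread that took it: it refers to that thread's copy of the variable, so it must not be handed to another thread or kept after the thread ends.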
« Last Edit: May 27, 2017, 04:56:50 pm by Blestan »

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Threadvar performance - complete nightmare!
« Reply #10 on: May 28, 2017, 01:09:55 pm »
With long loops the situation is a nightmare -> threadvar access is about 40 times slower:

Local Var: 640
Global Var: 641
ThreadVar: 24469
With ThreadVar do: 24265
Pointer to ThreadVar Var: 625