Topic: Thread performance with global objects

Nitorami

Thread performance with global objects
« on: January 12, 2016, 07:46:58 pm »
I made a simulation program whose core is encapsulated in class TSim.

For optimum performance on multi-core processors, I run two instances of TSim in separate threads and combine their results after they have finished. Compared to single-thread operation, I get a 50% to 90% performance gain on a dual-core CPU.

The simulation needs a lot of random numbers, therefore I replaced FPC's own Mersenne Twister by a simple but very fast random generator, encapsulated in object TRandGen.

I started with a single global instance of TRandGen, which is probably not the best idea. Having two concurrent threads manipulate RandGen's internal state might be detrimental to the generator's properties... not sure. I expect that FPC's random generator uses critical sections to avoid that.
But that is not the issue. What puzzled me is that the global RandGen seems to cause a bottleneck, and performance breaks down to almost single core operation.

I can easily get round it by making the instances of TRandGen local to TSim. But why does the standard FPC generator, which is global in unit System, not seem to cause this bottleneck?

Jonas Maebe

Re: Thread performance with global objects
« Reply #1 on: January 12, 2016, 08:32:30 pm »
I don't know where the performance difference could come from, unless you added a critical section. FPC's standard random number generator is not protected by a critical section, so you cannot safely use it from multiple threads at the same time.
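For illustration, a minimal sketch of what such protection could look like, assuming the critical-section routines of FPC's System unit. RandLock and LockedRandom are hypothetical names made up for this sketch, not part of the RTL:

```pascal
program LockedRandomDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // installs a thread manager on Unix-like targets
  SysUtils;

var
  RandLock: TRTLCriticalSection;  // hypothetical name: guards the global generator

// Serializes access to the global state behind Random.
// Only one thread at a time can be inside the Enter/Leave pair.
function LockedRandom(range: LongInt): LongInt;
begin
  EnterCriticalSection(RandLock);
  try
    Result := Random(range);
  finally
    LeaveCriticalSection(RandLock);
  end;
end;

begin
  InitCriticalSection(RandLock);
  RandSeed := 42;
  WriteLn(LockedRandom(100));     // some value in 0..99
  DoneCriticalSection(RandLock);
end.
```

Every call now takes the lock, so threads that draw many numbers queue up behind each other. That is the kind of slowdown a critical section would introduce.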

Nitorami

Re: Thread performance with global objects
« Reply #2 on: January 14, 2016, 09:56:11 am »
Thank you. Is that generally true for system functions? For instance, when copying memory sections using "move", would I have to protect that with a critical section?

As regards performance, I found that it drops severely if several threads try to write to the same global variables concurrently. I had not intended that, but it seems that the constants and initialized constants within my Rand function are global, not local as I thought, although the function is encapsulated within an object. That is a pitfall to remember, and in my case not only a performance issue but a fault, because instances of the generator should be independent.

Code: Pascal
function TRandGen.MWC256: dword; // not thread safe
// MWC256 from Usenet posting by G. Marsaglia - Period 2^8222
var   t : qword;
const c : dword = 362436; // global!
      i : byte  = $FF;    // global!
begin
  inc (i);
  t := qword (809430660) * Q[i] + c;
  c      := hi (t);
  Q[i]   := lo (t);
  result := lo (t);
end;

 

Leledumbo

Re: Thread performance with global objects
« Reply #3 on: January 14, 2016, 10:18:32 am »
Quote from: Nitorami
I had not intended that, but it seems that the constants and initialized constants within my Rand function are global, not local as I thought, although the function is encapsulated within an object.

Indeed, otherwise typed constants couldn't remember their values between function calls ;) Use an initialized variable instead if you want to keep them local.
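A minimal sketch of the difference, assuming FPC in objfpc mode with writable typed constants ({$J+}, the default). The two function names are made up for illustration:

```pascal
program TypedConstDemo;
{$mode objfpc}{$H+}
{$J+} // writable typed constants (the FPC default), stated explicitly

// A typed constant lives in static storage: one copy for the whole program,
// shared by all calls, all object instances and all threads.
function CountWithConst: Integer;
const
  n: Integer = 0;
begin
  Inc(n);
  Result := n;
end;

// An initialized local variable lives on the stack and is set to its
// initial value again on every call.
function CountWithVar: Integer;
var
  n: Integer = 0;
begin
  Inc(n);
  Result := n;
end;

begin
  CountWithConst;  // first call, result ignored
  CountWithVar;    // first call, result ignored
  WriteLn('typed constant:       ', CountWithConst);  // second call returns 2
  WriteLn('initialized variable: ', CountWithVar);    // second call returns 1
end.
```

Note that an initialized local variable forgets its value between calls, so generator state that has to persist per instance belongs in the fields of the object rather than in either of these.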

Nitorami

Re: Thread performance with global objects
« Reply #4 on: January 14, 2016, 11:56:40 am »
Yes, of course. And my question on system.move was stupid; obviously it would cause problems to use move on the same memory location in several threads... think I had blinkers on.

Still, is there a way to know which library functions are thread-safe and which are not? The documentation does not seem to tell.

User137

Re: Thread performance with global objects
« Reply #5 on: January 14, 2016, 12:19:22 pm »
Why would use of global constants make it not thread safe?

Edit: I re-read the code; it seems he is modifying the values in code.
« Last Edit: January 14, 2016, 12:21:37 pm by User137 »

taazz

Re: Thread performance with global objects
« Reply #6 on: January 14, 2016, 12:37:16 pm »
Two generic rules on thread-safe functions:
1) If the documentation does not state that it is thread-safe, then it is not.
2) If it accesses global variables, it is not.

Keep in mind that when it comes to multithreading, access to the code is mandatory. Your question is a good example of what not to do, i.e. "why is my code not as fast as yours" without code to look at is not going to get much attention.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Thaddy

Re: Thread performance with global objects
« Reply #7 on: January 14, 2016, 01:09:54 pm »
The above goes for any compiler, not just FPC... Thread safe programming is not easy.
also related to equus asinus.

Nitorami

Re: Thread performance with global objects
« Reply #8 on: January 14, 2016, 02:20:04 pm »
@taazz: Thank you. To me, the question was a matter of principle rather than "where is the bug in my code". I feel I put it correctly, because I got responses that helped me understand the issue.

@User137: "use" of global variables is not the problem; of course threads may read global variables. Writing to global variables is the problem.

argb32

Re: Thread performance with global objects
« Reply #9 on: January 14, 2016, 02:51:38 pm »
Quote from: Nitorami
But that is not the issue. What puzzled me is that the global RandGen seems to cause a bottleneck, and performance breaks down to almost single core operation.

This may be caused by the fact that both threads write to the same memory block each time. To keep the caches coherent, each CPU core has to reload the values instead of serving them from its own L1 cache. The compiler may hide the issue by keeping the variables in registers; that does not seem to be the case here, but it may be the case with the standard random().
Using your own random generator with encapsulated state for each thread is the best solution.
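As a rough sketch of what "encapsulated state" could mean for the MWC256 function posted earlier: c and i become fields next to Q, so each instance (and therefore each thread that owns one) works on its own memory. The Init procedure and its LCG-based seeding are assumptions for this example only; the thread does not show how Q is filled.

```pascal
program PerInstanceRand;
{$mode objfpc}{$H+}

type
  // All generator state is held in fields, so every instance is independent.
  TRandGen = object
  private
    Q: array[0..255] of dword;
    c: dword;
    i: byte;
  public
    procedure Init(seed: dword);  // hypothetical seeding, not from the thread
    function MWC256: dword;
  end;

procedure TRandGen.Init(seed: dword);
var
  k: Integer;
  s: qword;
begin
  s := seed;
  for k := 0 to 255 do
  begin
    // simple LCG, used only to fill Q for this sketch (assumption)
    s := (s * qword(1664525) + 1013904223) and $FFFFFFFF;
    Q[k] := dword(s);
  end;
  c := 362436;
  i := $FF;
end;

function TRandGen.MWC256: dword;
var
  t: qword;
begin
  i := (i + 1) and $FF;              // wraps 255 -> 0 without relying on overflow
  t := qword(809430660) * Q[i] + c;
  c := dword(t shr 32);              // carry: high 32 bits
  Q[i] := dword(t and $FFFFFFFF);    // low 32 bits
  Result := Q[i];
end;

var
  a, b: TRandGen;  // e.g. one per thread, or one field per TSim instance
  k: Integer;
  same: Boolean;
begin
  a.Init(12345);
  b.Init(12345);
  same := True;
  for k := 1 to 1000 do
    // Interleaved calls: with shared state the two streams would disturb each other.
    if a.MWC256 <> b.MWC256 then
      same := False;
  if same then
    WriteLn('instances are independent')
  else
    WriteLn('instances share state');
end.
```

With identical seeds the two instances produce identical streams even when their calls are interleaved, which is a simple check that no state is shared between them.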

taazz

Re: Thread performance with global objects
« Reply #10 on: January 14, 2016, 03:10:54 pm »
Quote from: Nitorami
@taazz: Thank you. To me, the question was of a principle nature rather than "where is the bug in my code". I feel I put it correctly, because I got responses that helped me understand the issue.

Your feeling is wrong. The only thing that question is good for is thinking out loud.