Author Topic: InterlockedExchange v. CriticalSection  (Read 4702 times)

AlanTheBeast

  • Sr. Member
  • ****
  • Posts: 407
  • My software never cras....
InterlockedExchange v. CriticalSection
« on: July 15, 2023, 07:36:02 pm »
While testing and debugging some "fool around" code, I tried both InterlockedExchange and a CriticalSection. (I was seeing how many threads it takes to bring my Mac to a grinding halt*.)

During the tests I got the impression that the CriticalSection (CS) took longer than InterlockedExchange (IE).

The question really is: what criteria would compel me to choose CS over IE (or vice versa, for that matter)?

* I haven't adjusted the memory (stack) allocation per thread; it is probably excessive in my test case.
Everyone talks about the weather but nobody does anything about it.
..Samuel Clemens.

Red_prig

  • Full Member
  • ***
  • Posts: 153
Re: InterlockedExchange v. CriticalSection
« Reply #1 on: July 15, 2023, 08:03:22 pm »
It depends on what you need. A CriticalSection is an OS-level object (which is why it is slower), but it has the advantage of being recursive within a single thread. InterlockedExchange is an instric, a lower-level operation that you need to use correctly and carefully. You could describe the difference in approach as lock-based vs. lock-free.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12430
  • Debugger - SynEdit - and more
    • wiki
Re: InterlockedExchange v. CriticalSection
« Reply #2 on: July 15, 2023, 08:22:36 pm »
Not every CPU may have an instruction for InterlockedExchange. In that case, FPC likely falls back to a CriticalSection....
Anyway...


There are things that InterlockedExchange alone can't do, like protecting an entire system call (for example, creating or updating a file).

You can build your own CS by using InterlockedExchange in a spin lock. But if the wait is long, a CS lets your thread sleep instead of burning CPU time.
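
To illustrate the first point, here is a minimal sketch (not from the thread) of a CS guarding a multi-step operation. The names LogLock, LogLines, AppendLine and TWriter are made up for the example, and a TStringList stands in for the file or other shared resource. It assumes FPC with the cthreads unit on Unix-like systems.

```pascal
program CSDemo;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

var
  LogLock: TRTLCriticalSection;  // hypothetical name
  LogLines: TStringList;         // stands in for a file or other shared resource

procedure AppendLine(const S: string);
begin
  EnterCriticalSection(LogLock);   // a waiting thread blocks here instead of spinning
  try
    LogLines.Add(S);               // could be several statements or system calls
  finally
    LeaveCriticalSection(LogLock); // released even if an exception is raised
  end;
end;

type
  TWriter = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWriter.Execute;
var
  i: Integer;
begin
  for i := 1 to 1000 do
    AppendLine('line ' + IntToStr(i));
end;

var
  Writers: array[0..3] of TWriter;
  i: Integer;
begin
  InitCriticalSection(LogLock);
  LogLines := TStringList.Create;
  try
    for i := Low(Writers) to High(Writers) do
      Writers[i] := TWriter.Create(False);
    for i := Low(Writers) to High(Writers) do
    begin
      Writers[i].WaitFor;
      Writers[i].Free;
    end;
    WriteLn(LogLines.Count);       // 4 threads x 1000 lines
  finally
    LogLines.Free;
    DoneCriticalSection(LogLock);
  end;
end.
```

With the lock in place this should print 4000. The try/finally is the part that a bare InterlockedExchange flag does not give you for free.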


Sometimes you may need even less than an InterlockedExchange.
You may just need a read or write barrier.

And sometimes you can just do a normal memory write, and another thread will read it eventually.
« Last Edit: July 15, 2023, 08:24:15 pm by Martin_fr »

AlanTheBeast

Re: InterlockedExchange v. CriticalSection
« Reply #3 on: July 15, 2023, 08:46:45 pm »
Quote from: Red_prig
A CriticalSection is an OS-level object (which is why it is slower), but it has the advantage of being recursive within a single thread.

As I usually write real-time-ish code, recursion is something I avoid. Indeed, I can't fathom how or why I'd re-call a function defined to run as a thread from within the scope of that thread (or its subroutines).

I guess the key advantage of a CS is that I've "given up control" to the OS until the lock is obtained, whereas with IE I have to handle the function's return value myself, and do so correctly.

Quote from: Red_prig
... in the case of InterlockedExchange it is an instric and is a lower level operation and you need to use it correctly and carefully ...

instric? Intrinsic?

I typically enter/exit with something like:
Code: Pascal
  while InterlockedExchange(f1LCL, 1) > 0 do Sleep(0);  // wait until we swap in 1 and get 0 back
  <stuff>
  InterlockedExchange(f1LCL, 0);                        // release the lock
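
A self-contained sketch of the same pattern, for anyone who wants to try it: four threads increment a shared counter under the flag. LockFlag plays the role of f1LCL; TWorker, Counter and the iteration counts are made up for the example. It assumes FPC with cthreads on Unix-like systems.

```pascal
program SpinDemo;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

var
  LockFlag: LongInt = 0;  // 0 = free, 1 = held (same role as f1LCL)
  Counter: Int64 = 0;     // shared data protected by LockFlag

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  for i := 1 to 100000 do
  begin
    // Swap in 1; if the old value was already 1, someone else holds the lock.
    while InterlockedExchange(LockFlag, 1) > 0 do
      Sleep(0);
    Inc(Counter);                      // the protected <stuff>
    InterlockedExchange(LockFlag, 0);  // release
  end;
end;

var
  Workers: array[0..3] of TWorker;
  i: Integer;
begin
  for i := Low(Workers) to High(Workers) do
    Workers[i] := TWorker.Create(False);
  for i := Low(Workers) to High(Workers) do
  begin
    Workers[i].WaitFor;
    Workers[i].Free;
  end;
  WriteLn(Counter);
end.
```

If the lock does its job this should print 400000. Note that, unlike a CS, this flag is not recursive: a thread that tries to take it twice will wait on itself forever.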

Martin_fr

Re: InterlockedExchange v. CriticalSection
« Reply #4 on: July 15, 2023, 08:53:36 pm »
On Intel/AMD, InterlockedExchange is not a function call. It is a single assembler instruction.

Hence it is on the faster side...
The CPU still needs to lock the bus (or, on modern CPUs, the cache line), and other threads on other cores may have to wait. But it will be for a single instruction only.


AlanTheBeast

Re: InterlockedExchange v. CriticalSection
« Reply #5 on: July 15, 2023, 08:56:58 pm »
Quote from: Martin_fr
Not every CPU may have an instruction for InterlockedExchange. In that case, FPC likely falls back to a CriticalSection....
On Intel/AMD, InterlockedExchange is not a function call. It is a single assembler instruction.

FPC for Mac (x86) gives:
Code: Pascal
# [42] While InterlockedExchange (LCL, 1)>0 do sleep(0);
        jmp     Lj14
        .align 2
Lj13:
        xorl    %edi,%edi
        call    _SYSUTILS_$$_SLEEP$LONGWORD
Lj14:
        leaq    _U_$P$THR_$$_LCL(%rip),%rdi
        movl    $1,%esi
        call    FPC_INTERLOCKEDEXCHANGE
        testl   %eax,%eax
        ja      Lj13

This suggests it really is InterlockedExchange, because of the Lj13 loop. (It doesn't stay down inside FPC_INTERLOCKEDEXCHANGE, as it would for a CS.)
In any case, since it's waiting for a state change, it has to come back up in order to enter the Sleep(0).

Whereas for CS:

Code: Pascal
# [41] EnterCriticalSection(_F1CritLock);
        leaq    _U_$P$THR_$$_F1CritLock(%rip),%rdi
        call    _SYSTEM_$$_ENTERCRITICALSECTION$TRTLCRITICALSECTION

It stays down there until released.

Quote from: Martin_fr
There are things that InterlockedExchange alone can't do, like protecting an entire system call (for example, creating or updating a file).

You can build your own CS by using InterlockedExchange in a spin lock. But if the wait is long, a CS lets your thread sleep instead of burning CPU time.

I code it as:
      while InterlockedExchange(LCL, 1) > 0 do Sleep(0);

where Sleep(0) could also be Sleep(10), Sleep(100) or whatever. I typically use Sleep(0), as the thread manager can then give the CPU to whatever needs it, rather than not sleeping at all.




Quote from: Martin_fr
Sometimes you may need even less than an InterlockedExchange.
You may just need a read or write barrier.

And sometimes you can just do a normal memory write, and another thread will read it eventually.

If I'm using it, it's because some variable risks being written by different executing threads. If that variable is, for example, a counter or a queue, then it must be protected.
« Last Edit: July 15, 2023, 09:09:32 pm by AlanTheBeast »

Red_prig

Re: InterlockedExchange v. CriticalSection
« Reply #6 on: July 15, 2023, 08:57:16 pm »
Recursive locking is very convenient when you write a ton of different code with a bunch of interconnected components. In small programs it makes no sense, that's right.
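
A small sketch of what "recursive" means here, assuming FPC's TRTLCriticalSection (with cthreads on Unix-like systems). Guard, Outer and Inner are names made up for the example: Outer holds the lock and calls Inner, which takes the same lock again on the same thread.

```pascal
program RecursiveLockDemo;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SysUtils;

var
  Guard: TRTLCriticalSection;  // hypothetical name

procedure Inner;
begin
  EnterCriticalSection(Guard);   // already held by this thread: the call nests
  try
    WriteLn('inner');
  finally
    LeaveCriticalSection(Guard);
  end;
end;

procedure Outer;
begin
  EnterCriticalSection(Guard);
  try
    WriteLn('outer');
    Inner;                       // re-enters the lock on the same thread
  finally
    LeaveCriticalSection(Guard);
  end;
end;

begin
  InitCriticalSection(Guard);
  try
    Outer;
  finally
    DoneCriticalSection(Guard);
  end;
end.
```

This should print "outer" and then "inner". The same call chain guarded by a simple InterlockedExchange flag would hang in Inner, because the flag has no notion of which thread owns it.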

Red_prig

Re: InterlockedExchange v. CriticalSection
« Reply #7 on: July 15, 2023, 08:59:12 pm »
There are queuing algorithms for passing data between two threads that only need memory barriers, which is probably what Martin is talking about.

Martin_fr

Re: InterlockedExchange v. CriticalSection
« Reply #8 on: July 15, 2023, 08:59:45 pm »
Quote
Code: Pascal
  while InterlockedExchange(f1LCL, 1) > 0 do Sleep(0);

Not tested, but since this does not guarantee any order when many threads are waiting, you could probably do:
Code: Pascal
  repeat
    while f1LCL > 0 do Sleep(0);   // plain read, no locked instruction
  until InterlockedExchange(f1LCL, 1) = 0;

That way, you don't need to execute locked instructions until you have a chance. But someone else may snatch the chance.

Also, you must ensure that the loop does not lose the memory read to optimization....
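
Wrapped into procedures, the idea could look like the sketch below. AcquireLock and ReleaseLock are names made up for the example. The assumption (not a guarantee) is that the Sleep(0) call inside the inner loop makes the compiler re-read the global f1LCL on every pass.

```pascal
program TTASDemo;
{$mode objfpc}
uses
  SysUtils;  // Sleep

var
  f1LCL: LongInt = 0;  // 0 = free, 1 = held

procedure AcquireLock;
begin
  repeat
    // Plain reads only: no locked instruction while the lock looks taken.
    while f1LCL > 0 do
      Sleep(0);
    // The lock looked free; try to take it atomically.
    // Another thread may win the race, in which case we go around again.
  until InterlockedExchange(f1LCL, 1) = 0;
end;

procedure ReleaseLock;
begin
  InterlockedExchange(f1LCL, 0);
end;

begin
  // Single-threaded smoke test of the two procedures.
  AcquireLock;
  WriteLn('holding lock, flag = ', f1LCL);
  ReleaseLock;
  WriteLn('released, flag = ', f1LCL);
end.
```

The main block only checks that the flag goes to 1 and back to 0; fairness between waiting threads is still not guaranteed with this scheme.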

Martin_fr

Re: InterlockedExchange v. CriticalSection
« Reply #9 on: July 15, 2023, 09:02:49 pm »
Quote from: Red_prig
There are queuing algorithms for passing data between two threads that only need memory barriers, which is probably what Martin is talking about.

Without InterlockedExchange, I can only think of solutions where you have
- Exactly 1 dedicated thread that reads ("the receiver")
- Exactly 1 dedicated thread that writes ("the sender")

Then I think it is possible to do this lock-free, but you need the read/write barriers. (But that has no impact on other threads, so it has the least possible slowdown.)
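
As a rough, untuned sketch of such a one-sender/one-receiver queue: a fixed-size ring buffer where only the sender writes Tail and only the receiver writes Head. All names here (Buf, Head, Tail, TryPush, TryPop, TSender) are made up for the example, and the barrier placement is a conservative guess rather than a verified minimum. It assumes FPC with cthreads on Unix-like systems.

```pascal
program SpscDemo;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

const
  QSize = 16;  // usable capacity is QSize - 1

var
  Buf: array[0..QSize - 1] of LongInt;
  Head: LongInt = 0;  // next slot to read; written only by the receiver
  Tail: LongInt = 0;  // next slot to write; written only by the sender

function TryPush(V: LongInt): Boolean;
var
  NextTail: LongInt;
begin
  NextTail := (Tail + 1) mod QSize;
  if NextTail = Head then
    Exit(False);      // queue is full
  ReadWriteBarrier;   // keep the Head read ahead of the slot write
  Buf[Tail] := V;
  WriteBarrier;       // publish the data before publishing the new Tail
  Tail := NextTail;
  Result := True;
end;

function TryPop(out V: LongInt): Boolean;
begin
  if Head = Tail then
    Exit(False);      // queue is empty
  ReadBarrier;        // read Tail before reading the slot it guards
  V := Buf[Head];
  ReadWriteBarrier;   // finish reading the slot before handing it back
  Head := (Head + 1) mod QSize;
  Result := True;
end;

type
  TSender = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TSender.Execute;
var
  i: LongInt;
begin
  for i := 1 to 1000 do
    while not TryPush(i) do
      Sleep(0);       // queue full: let the receiver run
end;

var
  Sender: TSender;
  V, Received: LongInt;
  Sum: Int64;
begin
  Sum := 0;
  Received := 0;
  Sender := TSender.Create(False);
  while Received < 1000 do   // the main thread is the single receiver
    if TryPop(V) then
    begin
      Inc(Sum, V);
      Inc(Received);
    end
    else
      Sleep(0);
  Sender.WaitFor;
  Sender.Free;
  WriteLn(Sum);
end.
```

If nothing is lost or duplicated, this should print 500500 (the sum of 1..1000). The scheme breaks as soon as a second sender or a second receiver is added; at that point an interlocked operation or a lock is needed again.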

Martin_fr

Re: InterlockedExchange v. CriticalSection
« Reply #10 on: July 15, 2023, 09:09:45 pm »
Quote
Code: Pascal
  while InterlockedExchange(f1LCL, 1) > 0 do Sleep(0);

Not tested, but since this does not guarantee any order when many threads are waiting, you could probably do:
Code: Pascal
  repeat
    while f1LCL > 0 do Sleep(0);   // plain read, no locked instruction
  until InterlockedExchange(f1LCL, 1) = 0;

That way, you don't need to execute locked instructions until you have a chance. But someone else may snatch the chance.

Also, you must ensure that the loop does not lose the memory read to optimization....

Also (and I may have this wrong, it has been some time since I last used any of this):

Within the thread that currently holds the lock (the one that has set f1LCL := 1 and that is expected to set it back to 0), you can do
Code: Pascal
  WriteBarrier;
  f1LCL := 0;

You don't need Interlocked => no other thread will write.

You need the WriteBarrier => otherwise the CPU may change the order of things and do the write way earlier than you intended.

I don't know if there could be issues with the value sitting in cache for a certain time...
Especially if you have multiple CPUs (not cores, but actual CPUs).
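
Put together with the acquire side, a sketch could look like this. AcquireLock, ReleaseLock and SharedValue are names made up for the example. Note one deliberate swap: the sketch uses ReadWriteBarrier rather than WriteBarrier, as a more conservative choice that also keeps reads inside the locked region from drifting past the release. Whether the lighter WriteBarrier is enough depends on the CPU.

```pascal
program ReleaseDemo;
{$mode objfpc}
uses
  SysUtils;  // Sleep

var
  f1LCL: LongInt = 0;        // 0 = free, 1 = held
  SharedValue: LongInt = 0;  // hypothetical data protected by f1LCL

procedure AcquireLock;
begin
  while InterlockedExchange(f1LCL, 1) > 0 do
    Sleep(0);
end;

procedure ReleaseLock;
begin
  // The barrier is a single point in program order, not a scoped block:
  // memory accesses issued before it are ordered before those issued after it.
  ReadWriteBarrier;
  f1LCL := 0;  // plain store: only the current holder ever writes 0
end;

begin
  // Single-threaded smoke test of the two procedures.
  AcquireLock;
  SharedValue := 42;  // work done while holding the lock
  ReleaseLock;
  WriteLn('flag after release = ', f1LCL);
end.
```

The main block only confirms that the flag returns to 0; it does not prove anything about ordering across threads, which is exactly the part that is hard to test.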

Warfley

  • Hero Member
  • *****
  • Posts: 2067
Re: InterlockedExchange v. CriticalSection
« Reply #11 on: July 15, 2023, 09:10:36 pm »
Quote from: AlanTheBeast
I typically enter/exit with something like:
Code: Pascal
  while InterlockedExchange(f1LCL, 1) > 0 do Sleep(0);
  <stuff>
  InterlockedExchange(f1LCL, 0);

The reason why this is very fast is simple: when you call Sleep(0) and there is no load on the system, the scheduler will not take the process off the CPU, so in effect you've just implemented a spin lock.

If your system is under load, on the other hand, your process may get rescheduled during that sleep, in which case you get a context switch, and this takes some time. AFAIK, critical sections always put your process to sleep while it waits, meaning you will always pay the scheduling and context-switch times.

What you could do is test the code above against a spinlock (using pthreads or a WinAPI critical section), or use a sleep > 0. Alternatively, you could pin your process to a CPU and have another process pinned to the same CPU putting load on it, so that Sleep(0) also causes a rescheduling. The performance advantage may vanish then.
« Last Edit: July 15, 2023, 09:13:00 pm by Warfley »

AlanTheBeast

Re: InterlockedExchange v. CriticalSection
« Reply #12 on: July 15, 2023, 09:19:31 pm »
Quote from: Martin_fr
Within the thread that currently holds the lock (the one that has set f1LCL := 1 and that is expected to set it back to 0), you can do
Code: Pascal
  WriteBarrier;
  f1LCL := 0;

You don't need Interlocked => no other thread will write.

You need the WriteBarrier => otherwise the CPU may change the order of things and do the write way earlier than you intended.

I don't know if there could be issues with the value sitting in cache for a certain time...
Especially if you have multiple CPUs (not cores, but actual CPUs).


What!   :o

I've never seen this Write/ReadBarrier before.

I just looked at the description ( https://www.freepascal.org/docs-html/rtl/system/readdependencybarrier.html ) and it's not exactly clear what is happening.

What is the scope of it? The one following line of code?

It's really ambiguous to me.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8572
Re: InterlockedExchange v. CriticalSection
« Reply #13 on: July 15, 2023, 09:24:19 pm »
Quote from: AlanTheBeast
The question really is: what criteria would compel me to choose CS over IE (or vice versa, for that matter)?

A CS prevents multiple threads from executing the same piece of code at the same time.

IE can be used to prevent the same thread from re-entering a piece of code.

If you call, for example, an event handler which then calls a chain of functions, and one of those does an APM (Application.ProcessMessages), then your event handler might be re-entered.

And finding that bug will be a fossicking nasty job if you're not primed for it. Ask me how I know.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

AlanTheBeast

Re: InterlockedExchange v. CriticalSection
« Reply #14 on: July 15, 2023, 09:25:29 pm »
Quote from: AlanTheBeast
I typically enter/exit with something like:
Code: Pascal
  while InterlockedExchange(f1LCL, 1) > 0 do Sleep(0);
  <stuff>
  InterlockedExchange(f1LCL, 0);

Quote from: Warfley
The reason why this is very fast is simple: when you call Sleep(0) and there is no load on the system, the scheduler will not take the process off the CPU, so in effect you've just implemented a spin lock.


Sleep(0), as I understand it, will yield to the next executable thread.

If I wanted a spin lock, I would have simply done        while .... do ;

As the stuff I write usually has other things to do, I usually call Sleep(0) to give those other things the opportunity to do what they need to do.
« Last Edit: July 15, 2023, 09:28:59 pm by AlanTheBeast »