
Author Topic: InterlockedExchange v. CriticalSection  (Read 4682 times)

AlanTheBeast

  • Sr. Member
  • ****
  • Posts: 407
  • My software never cras....
Re: InterlockedExchange v. CriticalSection
« Reply #30 on: July 16, 2023, 03:13:11 pm »
Stretching it, huh? CPU affinity is not normally a concern. If a core is available for work and there is a thread needing a core, the marriage is made. (I do use affinity on some board projects where I reserve 1 or 2 cores for certain tasks and relegate all others to the remaining cores, but these are very narrow cases where input data senescence is a concern.)
By CPU affinity I do not mean CPU pinning; I mean that the OS will try to keep threads running on the same CPU to avoid cache invalidation. When a task is rescheduled, it is not simply assigned the first CPU that's available. If it has run before on one CPU, the OS will prefer to continue that task on the same CPU, sometimes at the cost of other tasks that are already waiting.

If you look at long-running tasks that take a lot of CPU time, you will observe that they are almost always rescheduled on the same CPU. Unless there is heavy load on the system, all long-running tasks will basically get their own CPU, while short-running or frequently yielding tasks share the remaining ones.

For example, I have a Threadripper with 24 CPUs. When I run any loop with Sleep(0), it pretty much always gets one CPU and spins it up to 100% on its own, because I never have enough other high-CPU-usage tasks for the OS to decide to break that process up, even though with Sleep(0) it should if scheduling were simply first come, first served. It takes around 20-22 other high-CPU-usage, long-running tasks before the OS starts interrupting the Sleep(0) loop for another task.

High-CPU-usage tasks, including Sleep(0) loops, will only be interrupted if there is a high load. So in effect they get a higher priority from the OS scheduler than tasks that are regularly interrupted. This is something you always need to keep in mind when writing a Sleep(0) loop. It will never starve other processes but, as I already said, it will spin the CPU up to 100%, and this has some potentially negative side effects (like energy usage, or simply noise from the fan spinning up).

A lovely quibble that I don't care about for my needs.  Thanks.
« Last Edit: July 16, 2023, 03:36:31 pm by AlanTheBeast »
Everyone talks about the weather but nobody does anything about it.
..Samuel Clemens.

440bx

  • Hero Member
  • *****
  • Posts: 6524
Re: InterlockedExchange v. CriticalSection
« Reply #31 on: July 16, 2023, 03:22:45 pm »
Well, I'm doing these experiments on a Mac and the actual application is OS-less RTL on a Raspberry Pi ... so not sure how Windows examples will play out.  I'll go grab them and see what's there.

Thanks.
You're welcome.  However, I'm afraid that since you're on a Mac and an R-Pi, the examples' usefulness will likely be very limited at best.

Also, to compile them you'll need an installation of Visual Studio which takes some space.

I'd say it's probably more work than the examples are worth for your platforms of interest but, I'll let you be the ultimate judge of that.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

jamie

  • Hero Member
  • *****
  • Posts: 7755
Re: InterlockedExchange v. CriticalSection
« Reply #32 on: July 16, 2023, 03:32:59 pm »
I noticed that InterlockedExchange isn't an intrinsic; it should simply be 1 or 2 lines of code inserted in the ASM stream.

At least on an Intel-type processor.

I should look at Delphi; it seems that I've done this before and remember seeing it inlined!

The only true wisdom is knowing you know nothing

AlanTheBeast

Re: InterlockedExchange v. CriticalSection
« Reply #33 on: July 16, 2023, 03:39:19 pm »
Quote from: jamie
I noticed that InterlockedExchange isn't an intrinsic; it should simply be 1 or 2 lines of code inserted in the ASM stream. At least on an Intel-type processor. I should look at Delphi; it seems that I've done this before and remember seeing it inlined!

No telling what happens in FPC_INTERLOCKEDEXCHANGE; the compiler didn't inline the call.
(Compiled on a Mac with FPC 3.2.0.)

Code: ASM
# [46] While InterlockedExchange (LCL, 1)>0 do sleep(0);
        jmp     Lj14                            # enter the loop at the exchange
        .align 2
Lj13:
        xorl    %edi,%edi                       # argument 0
        call    _SYSUTILS_$$_SLEEP$LONGWORD     # Sleep(0)
Lj14:
        leaq    _U_$P$THR_$$_LCL(%rip),%rdi     # address of LCL
        movl    $1,%esi                         # new value 1
        call    FPC_INTERLOCKEDEXCHANGE         # a real call, not inlined
        testl   %eax,%eax                       # old value returned in %eax
        ja      Lj13                            # nonzero: sleep and retry
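For context, here is a minimal sketch of the kind of Pascal loop that listing comes from, wrapped into a tiny spin lock. The names Lck, Counter and TWorker are made up for illustration (Lck plays the role of LCL in the listing), and the iteration count is arbitrary; this is an assumption about the shape of the test program, not the actual source.

```pascal
program spinlock_sketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread manager, needed on Unix-like systems
  Classes, SysUtils;

var
  Lck: LongInt = 0;      // hypothetical lock word: 0 = free, 1 = held
  Counter: LongInt = 0;  // shared data protected by Lck

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  for i := 1 to 100000 do
  begin
    // Acquire: atomically store 1 and look at the previous value.
    // A nonzero previous value means another thread holds the lock.
    while InterlockedExchange(Lck, 1) <> 0 do
      Sleep(0);
    Inc(Counter);                 // critical section
    InterlockedExchange(Lck, 0);  // release
  end;
end;

var
  A, B: TWorker;
begin
  A := TWorker.Create(False);  // False = start running immediately
  B := TWorker.Create(False);
  A.WaitFor;
  B.WaitFor;
  WriteLn('Counter = ', Counter);
  A.Free;
  B.Free;
end.
```

If the lock does its job, the final count should equal the sum of both workers' iterations. The Sleep(0) in the acquire loop only yields the time slice, so a waiting thread can still keep a core busy while it spins.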
« Last Edit: July 16, 2023, 03:43:02 pm by AlanTheBeast »

jamie

Re: InterlockedExchange v. CriticalSection
« Reply #34 on: July 16, 2023, 04:28:14 pm »
If you insert an ASM XCHG .... End block, the instruction will lock the bus during the exchange.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8836
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: InterlockedExchange v. CriticalSection
« Reply #35 on: July 16, 2023, 05:02:06 pm »
Quote from: jamie
I noticed that InterlockedExchange isn't an intrinsic; it should simply be 1 or 2 lines of code inserted in the ASM stream. At least on an Intel-type processor. I should look at Delphi; it seems that I've done this before and remember seeing it inlined!
There's only one definition where it's marked as inline:
Code: Pascal
$ grep -i 'function InterLockedExchange ' -rn * | grep inline
inc/systemh.inc:1566:function InterlockedExchange (var Target: Pointer;Source : Pointer) : Pointer; {$ifndef FPC_SYSTEM_HAS_EXPLICIT_INTERLOCKED_POINTER}inline;{$endif}
that is, when both parameters are pointers. Even so, the symbol FPC_SYSTEM_HAS_EXPLICIT_INTERLOCKED_POINTER must not be defined (if it is, then the respective system has its own implementation instead of a generic one).
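
A small sketch of the two overloads in question, for anyone who wants to compare the generated code (e.g. by compiling with -al to keep the assembler file). The variable names are invented for the example. Which overload ends up inlined on a given target is exactly the open question here, so this only shows how each one is selected, not what the compiler emits.

```pascal
program xchg_overloads;
{$mode objfpc}{$H+}

var
  Flag: LongInt = 0;    // selects the LongInt overload
  Slot: Pointer = nil;  // selects the Pointer overload (the one grep found marked inline)
  Value: LongInt = 42;  // just something for Slot to point at
  OldFlag: LongInt;
  OldSlot: Pointer;
begin
  // LongInt overload: stores 1 and returns the previous contents of Flag.
  OldFlag := InterlockedExchange(Flag, 1);

  // Pointer overload: stores @Value and returns the previous contents of Slot.
  OldSlot := InterlockedExchange(Slot, @Value);

  WriteLn('OldFlag = ', OldFlag, ', Flag = ', Flag);
  WriteLn('OldSlot was nil: ', OldSlot = nil);
  WriteLn('Slot now points at Value: ', Slot = @Value);
end.
```

Overload selection follows the type of the var parameter, so a lock word declared as LongInt (as in the spin loop earlier in the thread) never reaches the pointer version.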

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12888
  • FPC developer.
Re: InterlockedExchange v. CriticalSection
« Reply #36 on: July 16, 2023, 05:32:51 pm »
Quote from: jamie
I noticed that InterlockedExchange isn't an intrinsic; it should simply be 1 or 2 lines of code inserted in the ASM stream. At least on an Intel-type processor. I should look at Delphi; it seems that I've done this before and remember seeing it inlined!

Only in very old versions maybe, before NUMA x86. But since, AFAIK, neither Delphi nor FPC inlines assembler functions, that is somewhat doubtful anyway.

Newer versions also support server processors with multiple memory systems, where the exchange needs to go through an API call so that a NUMA-aware kernel can correct it. If you have a non-NUMA kernel, the call simply points to the x86 primitive.

However, I believe the function also corrects some cases where the lock crosses a page border, by making sure that both pages are paged in when the lock executes. So there might additionally be a simple AND test around it.

« Last Edit: July 16, 2023, 06:06:47 pm by marcov »

AlanTheBeast

Re: InterlockedExchange v. CriticalSection
« Reply #37 on: July 16, 2023, 05:34:06 pm »
Quote from: jamie
If you insert an ASM XCHG .... End block, the instruction will lock the bus during the exchange.

... sorta hoping the compiler takes care of these details ... In any case, XCHG itself doesn't solve the whole problem (I think).


Quote from: Leledumbo
There's only one definition where it's marked as inline:
Code: Pascal
$ grep -i 'function InterLockedExchange ' -rn * | grep inline
inc/systemh.inc:1566:function InterlockedExchange (var Target: Pointer;Source : Pointer) : Pointer; {$ifndef FPC_SYSTEM_HAS_EXPLICIT_INTERLOCKED_POINTER}inline;{$endif}
that is, when both parameters are pointers. Even so, the symbol FPC_SYSTEM_HAS_EXPLICIT_INTERLOCKED_POINTER must not be defined (if it is, then the respective system has its own implementation instead of a generic one).

Is there somewhere that lists which systems/CPUs have FPC_SYSTEM_HAS_EXPLICIT_INTERLOCKED_POINTER?

marcov

Re: InterlockedExchange v. CriticalSection
« Reply #38 on: July 16, 2023, 06:06:53 pm »
Quote from: AlanTheBeast
Is there somewhere that lists which systems/CPUs have FPC_SYSTEM_HAS_EXPLICIT_INTERLOCKED_POINTER?

Old OSes that don't support NUMA, probably.

 
