
Topic: How to debug critical sections? (equal use of Enter and Leave)

piola

  • Full Member
  • Posts: 108
  • Lazarus 2.2, 64bit on Windows 8.1 x64
« on: January 13, 2022, 11:24:39 pm »
Hello,

In my program, I have hundreds of getters and setters like this:

Code: Pascal
  function TGameSession.GetX;
  begin
    FLock.Enter; // FLock is a SyncObjs.TCriticalSection
    try
      // ... do something ...
    finally
      FLock.Leave; // runs even if an exception is raised above
    end;
  end;

Occasionally I get deadlocks. I'm 99% sure that they are due to a mistake in Enter/Leave. I have already found some occurrences where I accidentally used FLock.Enter instead of FLock.Leave. I have also created a descendant class of TCriticalSection which informs me if I use more Leaves than Enters.

However, I have no idea how to handle the other way round, where more Enters than Leaves are used.

Someone sketched an idea of how to check for proper usage of critical sections, but I can't quite figure out how to implement it.

If anyone could point me in the right direction, that would be a great help.

PS: Checking for "lock count = 0" at the beginning of Enter won't work because the critical section is sometimes part of a chain of multiple nested calls.

I can store the current lock count within Enter. What I need is some kind of automatic mechanism that is called whenever the function or procedure is exited, because that is the only point where I can reliably check whether the lock count is correct.
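A minimal sketch of such an automatism, assuming Free Pascal's reference-counted (COM) interfaces: a local interface variable is released when the routine exits, also when it is left through an exception, so the object behind it can compare the lock depth at exit with the depth at entry. The threadvar LockDepth, the wrappers LockEnter/LockLeave, TLockBalanceGuard, CheckLockBalance and UnbalancedCount are all hypothetical names for illustration, not part of SyncObjs:

```pascal
program LockBalanceGuard;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SyncObjs;

threadvar
  LockDepth: Integer; // hypothetical: how often the current thread holds FLock

var
  FLock: TCriticalSection;
  UnbalancedCount: Integer = 0; // hypothetical: imbalances detected so far

{ Hypothetical wrappers that keep LockDepth in sync with the critical section }
procedure LockEnter;
begin
  FLock.Enter;
  Inc(LockDepth);
end;

procedure LockLeave;
begin
  Dec(LockDepth);
  FLock.Leave;
end;

type
  { Remembers LockDepth when created and compares it when released }
  TLockBalanceGuard = class(TInterfacedObject)
  private
    FName: string;
    FDepthAtEntry: Integer;
  public
    constructor Create(const AName: string);
    destructor Destroy; override;
  end;

constructor TLockBalanceGuard.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
  FDepthAtEntry := LockDepth;
end;

destructor TLockBalanceGuard.Destroy;
begin
  if LockDepth <> FDepthAtEntry then
  begin
    InterLockedIncrement(UnbalancedCount);
    WriteLn('Unbalanced lock in ', FName, ': depth ', FDepthAtEntry,
      ' on entry, ', LockDepth, ' on exit');
  end;
  inherited Destroy;
end;

function CheckLockBalance(const AName: string): IInterface;
begin
  Result := TLockBalanceGuard.Create(AName);
end;

procedure Balanced;
var
  Guard: IInterface;
begin
  Guard := CheckLockBalance('Balanced');
  LockEnter;
  try
    // ... do something ...
  finally
    LockLeave;
  end;
end; // Guard is released here: depth unchanged, nothing reported

procedure MissingLeave;
var
  Guard: IInterface;
begin
  Guard := CheckLockBalance('MissingLeave');
  LockEnter;
  try
    // ... do something ...
  finally
    // LockLeave forgotten here
  end;
end; // Guard is released here: depth grew by 1, imbalance reported

begin
  FLock := TCriticalSection.Create;
  Balanced;
  MissingLeave;
  LockLeave; // release what MissingLeave left behind
  FLock.Free;
end.
```

The guard should be assigned to a local variable as shown; that is what pins its release to the end of the routine.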

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 7885
  • Debugger - SynEdit - and more
Re: How to debug critical sections? (equal use of Enter and Leave)
« Reply #1 on: January 14, 2022, 12:30:55 am »
If it is "only" a doubled Enter or Leave...

You could use some logging, with indentation.

Then, if you log like this:
Code: Text
  ->Enter called by "foo"
  -->Enter called by "bar"
  --<Leave called by

you would see where it becomes unbalanced. With too many Enters, the indentation keeps growing.
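A minimal sketch of the indented log, assuming each caller passes its own name as a string (LogEnter, LogLeave, LogDepth and LogLines are hypothetical names). Because the depth is a threadvar, each thread gets its own indentation; this demo is single-threaded and collects the lines in a TStringList, which would need its own lock if several threads wrote to it:

```pascal
program IndentedLockLog;
{$mode objfpc}{$H+}

uses
  Classes;

threadvar
  LogDepth: Integer; // hypothetical per-thread nesting depth

var
  LogLines: TStringList; // collected log lines (not thread-safe as written)

{ Call right before FLock.Enter }
procedure LogEnter(const Caller: string);
begin
  Inc(LogDepth);
  LogLines.Add(StringOfChar('-', LogDepth) + '>Enter called by "' + Caller + '"');
end;

{ Call right after FLock.Leave }
procedure LogLeave(const Caller: string);
begin
  LogLines.Add(StringOfChar('-', LogDepth) + '<Leave called by "' + Caller + '"');
  Dec(LogDepth);
end;

begin
  LogLines := TStringList.Create;
  try
    LogEnter('foo');
    LogEnter('bar');
    LogLeave('bar');
    LogLeave('foo');  // balanced: back at depth 0
    LogEnter('qux');  // matching LogLeave forgotten
    LogEnter('foo');  // same call as before, but now one level deeper
    LogLeave('foo');
    Write(LogLines.Text);
  finally
    LogLines.Free;
  end;
end.
```

The second "foo" is logged as -->Enter instead of ->Enter, which points at "qux" as the routine that did not leave.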

You do need to log where each call came from. Then you can count
- how often each caller entered,
- and how often each caller left.
You can do that without the indentation, too.
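Counting per caller can be sketched with a TStringList used as a name=value map, again assuming each call site passes a name (AddToBalance, CountEnter, CountLeave and ReportUnbalanced are hypothetical helpers). The demo is single-threaded; with several threads, the list would need its own lock:

```pascal
program CountPerCaller;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  Balance: TStringList; // "caller=n" pairs, n = Enters minus Leaves

procedure AddToBalance(const Caller: string; Delta: Integer);
begin
  Balance.Values[Caller] := IntToStr(StrToIntDef(Balance.Values[Caller], 0) + Delta);
end;

{ Call next to FLock.Enter / FLock.Leave, passing the name of the calling routine }
procedure CountEnter(const Caller: string);
begin
  AddToBalance(Caller, 1);
end;

procedure CountLeave(const Caller: string);
begin
  AddToBalance(Caller, -1);
end;

{ Prints every caller whose Enters and Leaves do not match; returns how many }
function ReportUnbalanced: Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Balance.Count - 1 do
    if Balance.ValueFromIndex[I] <> '0' then
    begin
      WriteLn(Balance.Names[I], ': Enter - Leave = ', Balance.ValueFromIndex[I]);
      Inc(Result);
    end;
end;

var
  UnbalancedCallers: Integer;

begin
  Balance := TStringList.Create;
  try
    CountEnter('TGameSession.GetX');
    CountLeave('TGameSession.GetX');
    CountEnter('TGameSession.SetY'); // matching CountLeave forgotten
    UnbalancedCallers := ReportUnbalanced;
  finally
    Balance.Free;
  end;
end.
```

Only TGameSession.SetY is reported here, with Enter - Leave = 1.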



But too many Enters will not deadlock.
They will just lock out all other threads. The thread that did (over-)enter will keep going, because a critical section can be re-entered by the thread that already owns it.
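A small sketch that demonstrates this with SyncObjs.TCriticalSection: the main thread enters twice and leaves once and keeps running, while a second thread that probes the lock with TryEnter cannot get it until the missing Leave is made up. TProbeThread and ProbeFromOtherThread are hypothetical names:

```pascal
program ReentrancyDemo;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SyncObjs;

var
  Lock: TCriticalSection;

type
  { Checks from a second thread whether Lock is currently free }
  TProbeThread = class(TThread)
  protected
    procedure Execute; override;
  public
    Acquired: Boolean;
  end;

procedure TProbeThread.Execute;
begin
  Acquired := Lock.TryEnter; // does not block, just reports
  if Acquired then
    Lock.Leave;
end;

function ProbeFromOtherThread: Boolean;
var
  Probe: TProbeThread;
begin
  Probe := TProbeThread.Create(False); // starts running immediately
  try
    Probe.WaitFor;
    Result := Probe.Acquired;
  finally
    Probe.Free;
  end;
end;

var
  WhileOverEntered, AfterBalancing: Boolean;

begin
  Lock := TCriticalSection.Create;
  Lock.Enter;
  Lock.Enter; // same thread: re-entrant, does not block
  Lock.Leave; // one Leave "forgotten": this thread keeps going...
  WhileOverEntered := ProbeFromOtherThread; // ...but other threads are locked out
  Lock.Leave; // now balanced
  AfterBalancing := ProbeFromOtherThread;
  WriteLn('Other thread got the lock while over-entered: ', WhileOverEntered);
  WriteLn('Other thread got the lock after balancing:    ', AfterBalancing);
  Lock.Free;
end.
```

With the extra Enter outstanding the probe reports FALSE; after the second Leave it reports TRUE.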

Deadlocks often happen due to wrong lock order.

That is, if you have more than one lock (more than one critical section object),

and sometimes you have:
  Lock1.Enter;
  Lock2.Enter;
and at other times:
  Lock2.Enter;
  Lock1.Enter;

then you can get:

Code: Text
  Thread-1:  Lock1.Enter;
  Thread-2:  Lock2.Enter;
  Thread-2:  Lock1.Enter;  // waits...
  Thread-1:  Lock2.Enter;  // waits..., and since Thread-2 waits for Lock1, Lock2 won't ever become available

All locks must always be entered in the same order.

In the above example, you could add a check to Lock1 (when it is entered non-nested) that Lock2 has not already been entered by the same thread.
Using "threadvar", you can have one variable per lock that counts whether (and how often) that lock has been entered by the current thread.
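A sketch of that check for the two locks from the example, assuming all code goes through hypothetical wrappers EnterLock1/LeaveLock1/EnterLock2/LeaveLock2 and the rule "Lock1 before Lock2"; OrderViolations is a made-up counter:

```pascal
program LockOrderCheck;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SyncObjs;

var
  Lock1, Lock2: TCriticalSection;
  OrderViolations: Integer = 0; // hypothetical counter of detected violations

threadvar
  Lock1Depth, Lock2Depth: Integer; // how often the current thread holds each lock

procedure EnterLock1;
begin
  // Rule: Lock1 must be taken before Lock2. Only the outermost Enter is checked.
  if (Lock1Depth = 0) and (Lock2Depth > 0) then
  begin
    InterLockedIncrement(OrderViolations);
    WriteLn('Lock order violation: Lock1 entered while holding Lock2');
  end;
  Lock1.Enter;
  Inc(Lock1Depth);
end;

procedure LeaveLock1;
begin
  Dec(Lock1Depth);
  Lock1.Leave;
end;

procedure EnterLock2;
begin
  Lock2.Enter;
  Inc(Lock2Depth);
end;

procedure LeaveLock2;
begin
  Dec(Lock2Depth);
  Lock2.Leave;
end;

begin
  Lock1 := TCriticalSection.Create;
  Lock2 := TCriticalSection.Create;

  EnterLock1; EnterLock2; // correct order: not reported
  LeaveLock2; LeaveLock1;

  EnterLock2; EnterLock1; // wrong order: reported, although no deadlock occurs here
  LeaveLock1; LeaveLock2;

  Lock2.Free;
  Lock1.Free;
end.
```

The wrong order is reported even in a single-threaded run where the deadlock never actually happens, which is what makes this kind of check useful.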



If you are on Linux, there is a great tool called valgrind.
It has two very similar tools for analysing threaded code (Helgrind and DRD):
https://valgrind.org/docs/manual/hg-manual.html
https://valgrind.org/docs/manual/drd-manual.html

However, reading and understanding the output of those two tools takes time (and googling).
You also have to filter out a lot of false positives.

Also, it does NOT tell you when you Enter too often. It can tell you when you Leave too often,
and it can tell you when you get the lock order mixed up.

piola

« Reply #2 on: January 16, 2022, 05:51:46 pm »
Thanks for your suggestion. I tried that (with indentation), but because I needed the name of the calling method, I had to call BackTraceStrFunc each time, and that slowed program execution down enormously.

I ended up writing a small tool that reads the source code files and performs some checks on the correct usage of .Enter and .Leave. It found 2 occurrences! Hopefully that was all of them :)
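The tool itself is not shown in the thread. A rough sketch of the same idea, assuming a deliberately naive line-based scan: every line starting with function/procedure/constructor/destructor begins a new routine, and occurrences of ".Enter;" and ".Leave;" are counted per routine. Comments, strings and nested routines are not handled; CountOf, IsRoutineHeader and CheckSource are hypothetical names:

```pascal
program CheckEnterLeave;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

{ Number of non-overlapping occurrences of Sub in S }
function CountOf(const Sub, S: string): Integer;
var
  P: Integer;
begin
  Result := 0;
  P := Pos(Sub, S);
  while P > 0 do
  begin
    Inc(Result);
    P := PosEx(Sub, S, P + Length(Sub));
  end;
end;

{ Naive: any line starting with one of these keywords starts a new routine }
function IsRoutineHeader(const Line: string): Boolean;
var
  L: string;
begin
  L := LowerCase(TrimLeft(Line));
  Result := (Pos('function ', L) = 1) or (Pos('procedure ', L) = 1) or
    (Pos('constructor ', L) = 1) or (Pos('destructor ', L) = 1);
end;

{ Reports routines whose .Enter and .Leave counts differ; returns how many }
function CheckSource(Source: TStrings): Integer;
var
  I, Enters, Leaves, Problems: Integer;
  Routine: string;

  procedure FinishRoutine;
  begin
    if (Routine <> '') and (Enters <> Leaves) then
    begin
      WriteLn(Routine, '  -> ', Enters, ' x .Enter, ', Leaves, ' x .Leave');
      Inc(Problems);
    end;
  end;

begin
  Problems := 0;
  Routine := '';
  Enters := 0;
  Leaves := 0;
  for I := 0 to Source.Count - 1 do
  begin
    if IsRoutineHeader(Source[I]) then
    begin
      FinishRoutine;
      Routine := Trim(Source[I]);
      Enters := 0;
      Leaves := 0;
    end;
    Inc(Enters, CountOf('.Enter;', Source[I]));
    Inc(Leaves, CountOf('.Leave;', Source[I]));
  end;
  FinishRoutine;
  Result := Problems;
end;

var
  Source: TStringList;
  Problems: Integer;

begin
  Source := TStringList.Create;
  try
    if ParamCount > 0 then
      Source.LoadFromFile(ParamStr(1))
    else // built-in example with a forgotten FLock.Leave
      Source.Text := 'function TGameSession.GetX;' + LineEnding +
        'begin' + LineEnding +
        '  FLock.Enter;' + LineEnding +
        '  try' + LineEnding +
        '  finally' + LineEnding +
        '  end;' + LineEnding +
        'end;';
    Problems := CheckSource(Source);
    WriteLn(Problems, ' suspicious routine(s) found');
  finally
    Source.Free;
  end;
end.
```

Run it as CheckEnterLeave yourunit.pas; without a parameter it checks the built-in example and reports TGameSession.GetX.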

MarkMLl

  • Hero Member
  • Posts: 4187
« Reply #3 on: January 18, 2022, 10:54:39 pm »
I echo everything that Martin says, and in addition suggest that (a) you reimplement the critical section object (or whatever) with a counter so that you can trap multiple-entry with an assertion, and (b) that in all cases you have a "backdoor" so that where you /know/ that it's safe you can bypass the lock/release.
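A minimal sketch of suggestion (a), assuming a hypothetical wrapper class TCountedLock around SyncObjs.TCriticalSection. The "backdoor" from (b) is interpreted here as EnterNested, which skips the nesting check where nesting is known to be safe; that reading is an assumption. The checks only fire with assertions enabled ({$ASSERTIONS ON} or -Sa):

```pascal
program CountedLockDemo;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SysUtils, SyncObjs;

type
  { Hypothetical wrapper that counts how deep the owning thread has entered }
  TCountedLock = class
  private
    FCS: TCriticalSection;
    FDepth: Integer; // only changed by the thread that holds FCS
  public
    constructor Create;
    destructor Destroy; override;
    procedure Enter;       // asserts that this is not a nested Enter
    procedure EnterNested; // backdoor: nesting is known to be safe here
    procedure Leave;
  end;

constructor TCountedLock.Create;
begin
  inherited Create;
  FCS := TCriticalSection.Create;
end;

destructor TCountedLock.Destroy;
begin
  FCS.Free;
  inherited Destroy;
end;

procedure TCountedLock.Enter;
begin
  FCS.Enter;
  Inc(FDepth);
  Assert(FDepth = 1, 'TCountedLock: unexpected nested Enter');
end;

procedure TCountedLock.EnterNested;
begin
  FCS.Enter;
  Inc(FDepth);
end;

procedure TCountedLock.Leave;
begin
  Assert(FDepth > 0, 'TCountedLock: Leave without Enter'); // debug check only
  Dec(FDepth);
  FCS.Leave;
end;

var
  Lock: TCountedLock;
  Trapped: Boolean = False;

begin
  Lock := TCountedLock.Create;
  try
    Lock.Enter;
    Lock.EnterNested; // allowed through the backdoor
    Lock.Leave;
    try
      Lock.Enter; // nested Enter through the checked path
    except
      on E: EAssertionFailed do
      begin
        Trapped := True;
        WriteLn('Trapped: ', E.Message);
        Lock.Leave; // the trapped Enter still holds the lock; undo it
      end;
    end;
    Lock.Leave;
  finally
    Lock.Free;
  end;
end.
```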

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

piola

« Reply #4 on: January 25, 2022, 11:10:05 pm »
The problem is not entering (I can track that quite well) but leaving, or rather, forgetting to leave. This happens from time to time because I always type "TRY FINALLY END" en bloc and fill in the parts between TRY and FINALLY and between FINALLY and END later. Sometimes I forget the latter, which means a missing FLock.Leave.
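A different technique, not suggested in the thread, removes the FINALLY part that can be forgotten: a guard object whose constructor calls Enter and whose destructor calls Leave, held in a local interface variable so that Free Pascal releases it when the routine exits (normally or through an exception). TLockGuard, LockScope and the GuardsAlive debug counter are hypothetical names:

```pascal
program AutoLeaveGuard;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SyncObjs;

var
  GuardsAlive: Integer = 0; // debug counter: guards created but not yet released

type
  { Enters the critical section when created, leaves it when released }
  TLockGuard = class(TInterfacedObject)
  private
    FLock: TCriticalSection;
  public
    constructor Create(ALock: TCriticalSection);
    destructor Destroy; override;
  end;

constructor TLockGuard.Create(ALock: TCriticalSection);
begin
  inherited Create;
  ALock.Enter;
  FLock := ALock; // assigned only after Enter succeeded
  InterLockedIncrement(GuardsAlive);
end;

destructor TLockGuard.Destroy;
begin
  if FLock <> nil then
  begin
    FLock.Leave;
    InterLockedDecrement(GuardsAlive);
  end;
  inherited Destroy;
end;

function LockScope(ALock: TCriticalSection): IInterface;
begin
  Result := TLockGuard.Create(ALock);
end;

var
  SessionLock: TCriticalSection;
  SessionX: Integer = 42;

function GetX: Integer;
var
  Guard: IInterface;
begin
  Guard := LockScope(SessionLock); // Leave runs when Guard is released at exit
  Result := SessionX;
end;

var
  ValueRead: Integer;

begin
  SessionLock := TCriticalSection.Create;
  ValueRead := GetX;
  WriteLn('X = ', ValueRead, ', guards still alive: ', GuardsAlive);
  SessionLock.Free;
end.
```

The trade-off is that the lock is held until the end of the routine, which can be longer than with a hand-written try/finally.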

But anyway, thanks for all your help.

Martin_fr

« Reply #5 on: January 25, 2022, 11:43:50 pm »
Well, here is how I solved a similar issue. It's quite a bit of work to set up, though.

Look at the FpDebug package (unit FpPascalParser).

Code: Pascal
  FSourceTypeSymbol.AddReference{$IFDEF WITH_REFCOUNT_DEBUG}(@FSourceTypeSymbol, 'TPasParserSymbolValueMakeReftype'){$ENDIF};

AddReference and ReleaseReference have to be called in pairs (the same problem you have). Each of the calls has a unique name.

If an outer caller is responsible, I opted for a way to rename the entry: "Result.DbgRenameReference".
But it would also be possible to pass the name down through several calls...

The code that deals with it is in LazUtils/LazClasses.
It tracks all the calls. In your case, you might want to track them in a global list. Then, at some point, you can dump that list to a file and see which "enter" calls have not been matched by a "leave".
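A sketch of such a global list, assuming each Enter/Leave site passes a name unique to that call site (TrackEnter, TrackLeave and DumpOutstanding are hypothetical helpers, not the LazClasses API). Every Enter adds its name, every Leave removes one matching entry, and whatever remains is dumped to a file:

```pascal
program OutstandingLocks;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils, SyncObjs;

var
  Outstanding: TStringList;   // one entry per Enter not yet matched by a Leave
  ListLock: TCriticalSection; // protects Outstanding; not one of the debugged locks

{ Call next to FLock.Enter with a name that is unique to the call site }
procedure TrackEnter(const Name: string);
begin
  ListLock.Enter;
  try
    Outstanding.Add(Name);
  finally
    ListLock.Leave;
  end;
end;

{ Call next to FLock.Leave with the same name; removes one matching entry }
procedure TrackLeave(const Name: string);
var
  I: Integer;
begin
  ListLock.Enter;
  try
    I := Outstanding.IndexOf(Name);
    if I >= 0 then
      Outstanding.Delete(I)
    else
      Outstanding.Add('Leave without Enter: ' + Name);
  finally
    ListLock.Leave;
  end;
end;

{ Writes all unmatched entries to a file, e.g. at shutdown or on a suspected hang }
procedure DumpOutstanding(const FileName: string);
begin
  ListLock.Enter;
  try
    Outstanding.SaveToFile(FileName);
  finally
    ListLock.Leave;
  end;
end;

var
  DumpFile: string;
  RemainingCount: Integer;

begin
  Outstanding := TStringList.Create;
  ListLock := TCriticalSection.Create;
  try
    TrackEnter('TGameSession.GetX');
    TrackLeave('TGameSession.GetX');
    TrackEnter('TGameSession.SetY'); // matching TrackLeave forgotten
    DumpFile := IncludeTrailingPathDelimiter(GetTempDir) + 'outstanding-locks.txt';
    DumpOutstanding(DumpFile);
    RemainingCount := Outstanding.Count;
    WriteLn(RemainingCount, ' unmatched entry(ies) written to ', DumpFile);
  finally
    ListLock.Free;
    Outstanding.Free;
  end;
end.
```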

Instead of keeping track in a list, you can immediately log the enter/leave names to a file. Then just count the number of occurrences of each name in that file.

It all relies on passing in the required unique names, so that you can identify the caller.

----------------------
You can try to log stack traces (addresses only; names take way too long). But even addresses only can be slow...
And you then need tools (like gdb and self-written scripts) to resolve the addresses and find unmatched entries. So in the end, that is a lot of work too.



440bx

  • Hero Member
  • Posts: 2767
« Reply #6 on: January 26, 2022, 12:13:26 am »
Quote from: piola on January 25, 2022, 11:10:05 pm
The problem is not entering (I can track this quite well) but leaving. Or better say: forgotten to leave. This happens from time to time because I always type the "TRY FINALLY END" en bloc and add the parts between TRY and FINALLY and between FINALLY and END later. And sometimes I forget the latter -> missing FLock.Leave.

But anyway, thanks for all your help.

I see you are using Windows. There is a solution that, after some "setup", works nicely for that kind of problem and for others that need balanced enter/leave, acquire/release and so on. The method is to have a piece of code that traps the APIs you're interested in monitoring and either logs them or displays the enter/leave calls on a console. As you pointed out, the hard part is usually figuring out that a leave/release is missing. If you trap APIs, you also select another API that you know should only be called when the resource (lock or what have you) has been released. That way you have a "bracketed" set of enter/leave calls and you can see where a leave is missing.

Basically, this is what Martin_fr said above (logging), but trapping the API(s) directly.

HTH.
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

 
