
Author Topic: Finishing thread does not release Virtual memory  (Read 4932 times)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 7865
  • Debugger - SynEdit - and more
    • wiki
Re: Finishing thread does not release Virtual memory
« Reply #15 on: January 11, 2022, 10:25:13 pm »
Ah, I forgot something, for valgrind you need to use "CMem". Afaik if you did specify the "... for valgrind (-gv)" option, then that is done automatically.
Otherwise you have to do that yourself (and then no heaptrc)



==27332==    still reachable: 15,497 bytes in 149 blocks
==27332==         suppressed: 0 bytes in 0 blocks
Afaik, "still reachable" is memory that was released, but only when the program terminated.
Some of this would be expected. Question is, does the number grow, if the program runs for longer? I.e. does your app accumulate more and more memory that it only frees at exit?

Mind that leaks can be in libraries too. 10 years ago, I had a webserver with mem issues, turned out openssl was leaking (mind you 10 years, so probably not what you have now).

Quote
I need more time to understand it, but most of them are in libcrypto (openssl) and libc. I don't know what I can do with them, and whether it can keep the reserved large virtual memory locked for few Kbyte of still
Well, the first question is whether they grow (in count and accumulated byte size) if you run the app for longer.

It is very well possible that some code "leaks" something when it is initialized. If that memory is indeed needed for the entire run time, and if it is only taken just once, then that is not an issue.
But if over time, it gets more...

It can be an issue in the library, but it can also be how it is called...


engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Finishing thread does not release Virtual memory
« Reply #16 on: January 11, 2022, 11:12:02 pm »
Quote
I need more time to understand it, but most of them are in libcrypto (openssl) and libc

You might have a leak in your usage of libCrypto. For any function that gives you a big number, for instance, you are supposed to free the memory when you are done with that number. Examples are BN_bin2bn, SRP_calc_u, etc.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 963
Re: Finishing thread does not release Virtual memory
« Reply #17 on: January 12, 2022, 08:18:22 am »
Quote
@Jonas
Thanks, it might be leading to the solution. Two problems:
I do not use TThread, but the one level lower BeginThread, so I would need to call whatever has to be called manually. It would not be a problem if I knew what to call. I also have my own Wait implementation, so that is not a problem either.
I checked the TThread implementation and found FFreeOnTerminate only once, in ThreadProc, where it calls Free (i.e. Destroy), but in Destroy I did not find what to do. Can you help?

We indeed don't have the functional equivalent for detaching a thread with the procedural interface. However, just to make sure, are you calling waitforthreadterminate on all of your (non-detached) threads? Otherwise they will indeed linger on forever.

jollytall

  • Full Member
  • ***
  • Posts: 205
Re: Finishing thread does not release Virtual memory
« Reply #18 on: January 12, 2022, 10:06:22 am »
Thanks for the many useful comments and replies. I will try to answer them and report my current progress:

Quote
Ah, I forgot something, for valgrind you need to use "CMem". Afaik if you did specify the "... for valgrind (-gv)" option, then that is done automatically.
Otherwise you have to do that yourself (and then no heaptrc)
Yes, you are right. Specifying -gv adds cmem. Trying to add cmem manually results in a Duplicate Identifier error.

While we are at this point, a slightly off-topic question that bothers me. It is always said (and it is indeed necessary) to put cmem and cthreads at the top of the uses list. I guess this is because their initialization sections do some setup that is needed before other units start to initialize themselves. However, I also thought that some of these special units redeclare some methods to provide extra functionality. The issue is that when two units declare the same method, the last one in the uses list is used, not the first. How is this managed for these special units, or do they not override any existing method?



Quote
Afaik, "still reachable" is memory that was released, but only when the program terminated.
Some of this would be expected. Question is, does the number grow, if the program runs for longer? I.e. does your app accumulate more and more memory that it only frees at exit?
No, this is not the case. Actually, this is only possible if one makes some special effort, e.g. a dynamic array of pointers or linked objects, but I use none of them. Under "normal" circumstances it cannot happen, because by definition "still reachable" means that the pointer (i.e. the variable holding that pointer) still points to the reserved memory area. Having a fixed number of variables means that the amount of "still reachable" memory cannot grow without limit.

To make things even stranger, I analyzed my program a lot and found that when I only start and stop it, it has 13 blocks (5243 bytes) still reachable, but if I use it and exit after that, then this number goes down to 9 (3597). I am trying to check where the difference comes from.



Quote
Mind that leaks can be in libraries too. 10 years ago, I had a webserver with mem issues, turned out openssl was leaking (mind you 10 years, so probably not what you have now).
The openssl problem I found. What happened was, I had a class level variable to store the SSL Context. When a new thread (object running in the thread) was created, I checked if the class var was already set or not. If not, it was initialized. The problem was that threads were started too close to each other without control to make it thread safe and so the same initialization was called multiple times. Adding a CriticalSection solved it.
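For readers who want to see the shape of that fix, here is a minimal sketch of guarding a one-time initialization with a critical section. It is not the actual code from the application: `CreateContext`, `SharedContext` and `InitCalls` are hypothetical stand-ins for the SSL context setup, and the worker threads are joined with WaitForThreadTerminate only so the program can print a result.

```pascal
program init_once_sketch;

{$mode objfpc}

uses
  {$ifdef unix}cthreads,{$endif}
  sysutils;

const
  ThreadCount = 8;

var
  InitLock : TRTLCriticalSection;
  SharedContext : pointer = nil;   // hypothetical stand-in for the SSL context
  InitCalls : longint = 0;         // counts how often the setup really ran

// Hypothetical stand-in for the expensive one-time setup.
function CreateContext : pointer;
  begin
  Inc(InitCalls);
  Sleep(20);                       // widen the window in which threads could race
  result:=GetMem(16);
  end;

function Worker(p : pointer) : PtrInt;
  begin
  // The check and the assignment happen under one lock, so only the
  // first thread to get here performs the initialization.
  EnterCriticalSection(InitLock);
  try
    if SharedContext=nil then
      SharedContext:=CreateContext;
  finally
    LeaveCriticalSection(InitLock);
  end;
  result:=0;
  end;

var
  i : longint;
  ids : array[1..ThreadCount] of TThreadID;
begin
InitCriticalSection(InitLock);
for i:=1 to ThreadCount do
  ids[i]:=BeginThread(@Worker,nil);
for i:=1 to ThreadCount do
  WaitForThreadTerminate(ids[i],0);
DoneCriticalSection(InitLock);
FreeMem(SharedContext);
writeln('Context created ',InitCalls,' time(s)');
end.
```

Without the EnterCriticalSection/LeaveCriticalSection pair, several workers could see `SharedContext=nil` at the same time and each run the setup, which is the kind of repeated initialization described above.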



Quote
Well, the first question is whether they grow (in count and accumulated byte size) if you run the app for longer.

It is very well possible that some code "leaks" something when it is initialized. If that memory is indeed needed for the entire run time, and if it is only taken just once, then that is not an issue.
But if over time, it gets more...

It can be an issue in the library, but it can also be how it is called...
I could make a minimal example (see later) to show what grows and how.



Quote
You might have a leak in your usage of libCrypto. For any function that gives you a big number, for instance, you are supposed to free the memory when you are done with that number. Examples are BN_bin2bn, SRP_calc_u, etc.
Thank God, that was not the case. It was only that the initialization was started multiple times in parallel without thread control (see above).



Quote
Wait..wait..waitfor! Use waitfor as already mentioned.
Quote
We indeed don't have the functional equivalent for detaching a thread with the procedural interface. However, just to make sure, are you calling waitforthreadterminate on all of your (non-detached) threads? Otherwise they will indeed linger on forever.
Where shall I put it? See the example below that (a) is copied (and further simplified) from the manual (https://www.freepascal.org/docs-html/prog/progse44.html) (b) demonstrates the problem and (c) does not have waitfor in it (maybe this is why it has the same problem).



So, the minimal program with which I can show the problem is:
Code: Pascal
  1. program project1;
  2.  
  3. {$mode objfpc}
  4.  
  5. uses
  6.   cthreads,
  7.   sysutils;
  8.  
  9. function InThread(p : pointer) : Int64;
  10.   begin
  11.   Sleep(Random(50));
  12.   result:=0;
  13.   end;
  14.  
  15. var i : longint;
  16. begin
  17. for i:=1 to 100 do
  18.   begin
  19.   BeginThread(@InThread,nil);
  20.   Sleep(Random(50));
  21.   end;
  22. Sleep(600);
  23. writeln('Finished');
  24. readln;
  25. end.
After starting 100 threads it uses 659.4MB VIRT, 2.6MB RES and 1.7MB SHR memory. If I start 200 threads, the memory usage goes up to 1059.8MB/3.4MB/1.7MB. In other words, one extra thread blocks about 4MB VIRT and 8KB RES.
Valgrind shows 100 blocks (i.e. the number of threads) and 27200 bytes (i.e. 272 bytes/thread) as possibly lost.
These numbers bother me.

Where shall I put the waitfor? Will it stop this leak?

Thanks again, and sorry for the long post.

jollytall

  • Full Member
  • ***
  • Posts: 205
Re: Finishing thread does not release Virtual memory
« Reply #19 on: January 12, 2022, 12:46:13 pm »
It seems I have the solution. Thanks to all who helped.

The solution has two parts.

It is true that there was a memory leak. Although it was reported as "possibly lost", I think it was really "still reachable". What happens is that when the thread is created, it gets a memory area. In the C library they reserve a bit more space than the array itself needs, so that there is room for the array length and reference count (indices -1 and -2), and then move the array pointer to the third slot, i.e. element #0. This is why Valgrind reports it as a block that has a pointer pointing inside it (offset 16) and not at its first byte, so it thinks the pointer might have been lost.
But it was not lost. It was worse. The pointer was really there and "in use". As many of you pointed out, although the thread was gone, it left behind its pointer and the allocated memory. If the pointer had pointed to the first byte of the block, Valgrind would have reported it as "still reachable", clearly indicating that something was not released.
To release it, EndThread was needed, not Waitfor. In the above example, placing EndThread between lines 12 and 13 does the trick, and the memory leak is gone.
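As a hedged sketch, the minimal example with that one change applied could look like the following. The `PtrInt` result type, the `{$ifdef unix}` guard around cthreads and the dropped `readln` are my own adjustments for illustration, not part of the original program.

```pascal
program endthread_sketch;

{$mode objfpc}

uses
  {$ifdef unix}cthreads,{$endif}   // installs the thread manager on Unix
  sysutils;

function InThread(p : pointer) : PtrInt;
  begin
  Sleep(Random(50));
  result:=0;
  // Called from inside the thread as its very last step, so the thread
  // releases its own thread-management resources. EndThread does not
  // return; nothing placed after it in this function would run.
  EndThread(0);
  end;

var i : longint;
begin
for i:=1 to 100 do
  begin
  BeginThread(@InThread,nil);
  Sleep(Random(50));
  end;
Sleep(600);   // crude wait for the last threads, as in the original example
writeln('Finished');
end.
```

Running this under Valgrind and comparing the "possibly lost" line with the original program is the quickest way to check the effect on your own system.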

The other part of the story is the Virtual memory usage. It is clear that while the thread did not end correctly (as above), the virtual memory allocated was still reserved.
It is also true that an area, once reserved, is not necessarily released. The good news, though, is that it is reused if available. So the total VIRT memory used is capped by the maximum number of threads that were running concurrently at any point in the past. Apparently the OS manages this smartly: when I tried to overload it with many threads, the VIRT usage temporarily went up high, but then decreased back to a reasonable level, i.e. the OS freed up some of the reserved memory addresses.

PascalDragon

  • Hero Member
  • *****
  • Posts: 4000
  • Compiler Developer
Re: Finishing thread does not release Virtual memory
« Reply #20 on: January 12, 2022, 01:43:04 pm »
Quote
While we are at this point, a slightly off-topic question that bothers me. It is always said (and it is indeed necessary) to put cmem and cthreads at the top of the uses list. I guess this is because their initialization sections do some setup that is needed before other units start to initialize themselves. However, I also thought that some of these special units redeclare some methods to provide extra functionality. The issue is that when two units declare the same method, the last one in the uses list is used, not the first. How is this managed for these special units, or do they not override any existing method?

Those don't override any methods, but instead set the managers (TMemoryManager in case of cmem and TThreadManager in case of cthreads) which are in turn used by functions like GetMem or BeginThread.
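To make the "manager" idea concrete, here is a small sketch that swaps one entry of the current memory manager for a counting wrapper. It only illustrates the mechanism and is not how cmem itself is written; `CountingGetMem` and `Hits` are names invented for this example.

```pascal
program managers_sketch;

{$mode objfpc}

var
  OldMM, NewMM : TMemoryManager;
  Hits : longint = 0;   // number of GetMem calls seen by the wrapper

// Hypothetical wrapper: count the call, then forward to the previous manager.
function CountingGetMem(Size : PtrUInt) : pointer;
  begin
  Inc(Hits);
  result:=OldMM.GetMem(Size);
  end;

var
  p : pointer;
begin
// The memory manager is a record of procedure variables.
GetMemoryManager(OldMM);
NewMM:=OldMM;
NewMM.GetMem:=@CountingGetMem;
SetMemoryManager(NewMM);

// The ordinary GetMem call is now routed through the wrapper,
// although no function named GetMem was redeclared anywhere.
p:=GetMem(32);
FreeMem(p);

SetMemoryManager(OldMM);   // restore the previous manager before exiting
writeln('GetMem went through the hook: ',Hits>0);
end.
```

The thread manager works along the same lines with TThreadManager, which can be read with GetThreadManager and replaced with SetThreadManager.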

Quote
But it was not lost. It was worse. The pointer was really there and "in use". As many of you pointed out, although the thread was gone, it left behind its pointer and the allocated memory. If the pointer had pointed to the first byte of the block, Valgrind would have reported it as "still reachable", clearly indicating that something was not released.
To release it, EndThread was needed, not Waitfor. In the above example, placing EndThread between lines 12 and 13 does the trick, and the memory leak is gone.

Yes, one is supposed to use EndThread from the thread function. We should probably mention that in the documentation for BeginThread.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 963
Re: Finishing thread does not release Virtual memory
« Reply #21 on: January 12, 2022, 09:25:00 pm »
Quote
We indeed don't have the functional equivalent for detaching a thread with the procedural interface. However, just to make sure, are you calling waitforthreadterminate on all of your (non-detached) threads? Otherwise they will indeed linger on forever.
Where shall I put it?

You can put it wherever you want. You have to store the result from BeginThread in a TThreadID variable (e.g. an array of TThreadID), and then use WaitForThreadTerminate with those TThreadID values as parameter to reap the threads after they are finished. WaitForThreadTerminate is a blocking call (i.e., if the thread hasn't ended yet, the call will keep waiting until it does), but you can pass a timeout if you want.
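A minimal sketch of that pattern, based on the example program from earlier in the thread, might look like this. The array name `ids`, the constant `ThreadCount` and the `PtrInt` result type are choices made for the illustration. The thread function deliberately does not call EndThread here, so the sketch shows joining on its own rather than a combination of both approaches.

```pascal
program join_sketch;

{$mode objfpc}

uses
  {$ifdef unix}cthreads,{$endif}
  sysutils;

const
  ThreadCount = 100;

function InThread(p : pointer) : PtrInt;
  begin
  Sleep(Random(50));
  result:=0;
  end;

var
  i : longint;
  ids : array[1..ThreadCount] of TThreadID;
begin
// Keep every thread ID returned by BeginThread.
for i:=1 to ThreadCount do
  ids[i]:=BeginThread(@InThread,nil);

// Reap the threads one by one. Each call blocks until that thread has
// ended; the second argument is the timeout in milliseconds.
for i:=1 to ThreadCount do
  WaitForThreadTerminate(ids[i],0);

writeln('All threads joined');
end.
```

In a long-running program the joining loop would not have to sit at the very end; any point where blocking on the finished threads is acceptable would do.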

Quote
Will it stop this leak?
It will definitely fix a leak you have when not doing this.

jollytall

  • Full Member
  • ***
  • Posts: 205
Re: Finishing thread does not release Virtual memory
« Reply #22 on: January 13, 2022, 08:29:05 am »
@Jonas,
I still do not understand this logic.

The one written above is clear. A Thread is started and when it finishes its activity, as a last step after it released all user allocated resources, it also releases the thread management allocated resources with an EndThread call.

What would be the purpose of WaitForThreadTerminate in this setup?
Surely, it would assure the caller (typically the main thread) that all threads have finished their activity (e.g. closed all files they opened) and that it is safe to stop the whole process. This is important once in the lifetime of the program, at the very end.
Does it also make sure, that when the thread is terminated (joined) its resources are also freed? Reading the source code, it seems (without lengthy analysis) that it only calls the pthread_join system call and in the manual of pthread_join I do not see any resource de-allocation. This would probably not even be possible (I am not sure!), since a terminated thread might still have resources allocated and managed later by the thread it joined to.
Also, it would not be practical to have either the main thread blocked during the normal operation of the program (i.e. not at the very end) to see when a thread terminates (hundreds of threads with various running lengths), or to make a separate "thread managing thread" :).
As written above, the EndThread does the trick perfectly, without any blocking operation in the main thread. (And yes, I have a separate mechanism to wait for all the threads at the very end of the run of the program, not to kill them in the middle of their operation, but that is another topic).

Last to note, it is always a blocking operation on *nix operating systems; pthread_join has no timeout parameter.

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 963
Re: Finishing thread does not release Virtual memory
« Reply #23 on: January 13, 2022, 11:32:35 am »

Quote
@Jonas,
I still do not understand this logic.

The one written above is clear. A Thread is started and when it finishes its activity, as a last step after it released all user allocated resources, it also releases the thread management allocated resources with an EndThread call.

What would be the purpose of WaitForThreadTerminate in this setup?
Mainly synchronisation between the main thread and the child thread. Additionally, pthread_exit can return a value which can be captured by pthread_join (which is used by the FPC RTL to return the value passed to EndThread to WaitForThreadTerminate). This means the thread must keep state until WaitForThreadTerminate/pthread_join is called.

Quote
Does it also make sure, that when the thread is terminated (joined) its resources are also freed? Reading the source code, it seems (without lengthy analysis) that it only calls the pthread_join system call and in the manual of pthread_join I do not see any resource de-allocation. This would probably not even be possible (I am not sure!), since a terminated thread might still have resources allocated and managed later by the thread it joined to.
Also, it would not be practical to have either the main thread blocked during the normal operation of the program (i.e. not at the very end) to see when a thread terminates (hundreds of threads with various running lengths), or to make a separate "thread managing thread" :).
As written above, the EndThread does the trick perfectly, without any blocking operation in the main thread. (And yes, I have a separate mechanism to wait for all the threads at the very end of the run of the program, not to kill them in the middle of their operation, but that is another topic).
It's simply how OS threading interfaces work on both Windows and Unix. This is not an FPC design. And yes, the OS/threading library keeps state around until you join a thread, unless it's a detached one. If you don't care about synchronisation with your threads for whatever reason, you need detached threads. However, this functionality is indeed not exposed by the procedural FPC threading interface, because it's the exception rather than the rule in most programs.

Quote
Last to note, it is always a blocking operation on *nix operating systems; pthread_join has no timeout parameter.
Indeed, I forgot about that.

 
