I don't know how useful this will be. I use Windows on a desktop computer with 6 cores and 12 threads. The programs can potentially run 100 or more threads, but in real calculations no more than 12 are used. Increasing the thread count beyond that reduces efficiency to the level of the Trevithick steam locomotive.
My tasks are fairly niche: parallel processing of large amounts of data. The threading scheme is simple. The program starts the required number of threads, which are controlled by instances of the TEvent class, and the threads "live" until the program exits.
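For readers who want to reproduce this kind of setup, here is a minimal sketch of such a scheme. It is not my actual code: the names TWorker, RunJob, WaitJob and Shutdown are made up for illustration, the "work" is just a counter, and it is a console program rather than a GUI app. It only uses TThread and TEvent from the Classes and SyncObjs units.

```pascal
program EventWorkers;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils, SyncObjs;

const
  WorkerCount = 6;  // assumption: same count as the 6-thread test run

type
  // Hypothetical worker: created once, reused for every job, ends with the program.
  TWorker = class(TThread)
  private
    FWake: TEvent;   // auto-reset: signalled when a job is ready or on shutdown
    FDone: TEvent;   // auto-reset: signalled when the current job has finished
    FJobs: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create;
    destructor Destroy; override;
    procedure RunJob;    // wake the thread for one unit of work
    procedure WaitJob;   // block until that unit is finished
    procedure Shutdown;  // ask the thread to exit and wait for it
    property Jobs: Integer read FJobs;
  end;

constructor TWorker.Create;
begin
  // Create the events first so they exist before the thread starts running.
  FWake := TEvent.Create(nil, False, False, '');
  FDone := TEvent.Create(nil, False, False, '');
  inherited Create(False);
end;

destructor TWorker.Destroy;
begin
  // Call Shutdown before Free, otherwise the thread is still waiting on FWake.
  inherited Destroy;
  FWake.Free;
  FDone.Free;
end;

procedure TWorker.Execute;
begin
  while not Terminated do
  begin
    FWake.WaitFor(INFINITE);   // sleep until the main thread signals
    if Terminated then
      Break;
    Inc(FJobs);                // placeholder for the real data processing
    FDone.SetEvent;
  end;
end;

procedure TWorker.RunJob;
begin
  FWake.SetEvent;
end;

procedure TWorker.WaitJob;
begin
  FDone.WaitFor(INFINITE);
end;

procedure TWorker.Shutdown;
begin
  Terminate;
  FWake.SetEvent;   // wake the thread so it can see Terminated
  WaitFor;
end;

var
  Workers: array[0..WorkerCount - 1] of TWorker;
  i, Pass: Integer;
begin
  for i := 0 to High(Workers) do
    Workers[i] := TWorker.Create;

  // The same threads are reused for every pass; none are created or destroyed.
  for Pass := 1 to 3 do
  begin
    for i := 0 to High(Workers) do Workers[i].RunJob;
    for i := 0 to High(Workers) do Workers[i].WaitJob;
  end;

  for i := 0 to High(Workers) do
  begin
    Workers[i].Shutdown;
    WriteLn('worker ', i, ' processed ', Workers[i].Jobs, ' jobs');
    Workers[i].Free;
  end;
end.
```

The point of this design is that threads are created once and then only signalled, so thread creation and destruction never happen inside the processing loop.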
I have not noticed anything suspicious before, for example when the program processed more than 4 GB of data for over 100 hours. I have now run it in test mode with 6 threads for an hour. Memory usage was 94.5 MB at startup and 93.6 MB after an hour.
GUI App, Lazarus 3.4, FPC 3.2.2, Windows 11.