I was looking for a statistical (i.e. non-instrumenting) profiler that works with Free Pascal and couldn't find any, since pretty much everything assumes you are using PDB (Visual C++) symbols. Very Sleepy claims some MinGW support but it still didn't work with Free Pascal: it has two parsers for DWARF data, one that seemed to enter an endless loop while spewing out random garbage and another that simply didn't load anything regardless of settings.
So i decided to write my own. I wanted one that works on retro systems anyway since i wanted to profile my retro game there (though the API i used to grab thread state requires at least Win2K).
You can find it here:
http://runtimeterror.com/tools/fpwprof/
This can use either STABS or DWARFv2 debug info. Initially i wrote STABS support because it seemed simpler (it took me around 30 minutes to write the code for loading the data i needed)... but then i realized that it only works with 32-bit applications, so i had to bite the bullet and write a DWARF parser too. That took me the next 1.5 days :-P. But at least it now works with 64-bit applications too, and as a bonus it also works with MinGW programs (as long as they're compiled with -gdwarf-2 so that only DWARFv2 debug data is generated, since i haven't implemented any later version). STABS is still much faster to parse though.
Also, some older versions of Free Pascal may generate invalid DWARF data but proper STABS data for some programs. One example is 2.2.4, which i used to compile my game's Windows version because it works in Windows 95, which i wanted to use for running on retro PCs with 3dfx Voodoo 1 GPUs. Initially i thought it was my parser that had the problem, but i also tried LLVM's llvm-dwarfdump.exe on my game and it couldn't parse the data either. Recent versions of FPC work fine though.
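To give a taste of what a DWARF parser has to deal with: many integers in DWARF (abbreviation codes, operands in the line-number program and so on) are stored as variable-length LEB128 values instead of fixed-size fields. The sketch below is a minimal decoder for the unsigned and signed variants. It is only an illustration in Python, not code taken from the profiler, and the function names are made up for this example.

```python
def decode_uleb128(data, pos):
    """Decode an unsigned LEB128 value from data starting at pos.

    Returns (value, next_pos). Each byte carries 7 payload bits,
    least significant group first; bit 7 set means another byte follows.
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos


def decode_sleb128(data, pos):
    """Decode a signed LEB128 value from data starting at pos.

    Same layout as the unsigned form, but bit 6 of the final byte
    is the sign bit and must be extended over the remaining bits.
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:
                result -= 1 << shift  # sign-extend negative values
            return result, pos
```

Both functions return the position of the next unread byte, so a caller can walk through a section one value at a time.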
Beyond that, the profiler can be configured to start automatically when a specified executable is launched, to collect up to some maximum number of samples and to be started/stopped when a key is pressed. These are useful when you cannot interact with the profiler directly, e.g. when the program being profiled is a game that runs in fullscreen (i guess you can spot a pattern by now :-P).
Note that at the moment this only samples EIP/RIP, it doesn't capture the call stack or anything else. That should be enough to get a quick and clear idea of where the time is spent, though perhaps not enough to see why :-P. I'll see about adding call stack capture (and better filtering) in the future. Similarly i threw together the UI quickly, so no advanced stuff. The entire tool was made in a couple of days, most of which was spent reading about DWARF, writing the parser for it and cursing the modern industry that decided a debug format which includes stuff like an entire VM for decoding line numbers and locations is preferable to the much simpler format that simply gives you address ranges. You can at least copy the profile results to the clipboard and paste them in an editor for comparing multiple runs.
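For anyone wondering what "only samples EIP/RIP" leaves you with: once the debug info has been boiled down to address ranges, each sampled instruction pointer can be looked up in a sorted table and counted per function. Below is a minimal sketch of that idea in Python. It is not the profiler's actual code, and the function names, addresses and symbol names are invented for the example.

```python
from bisect import bisect_right
from collections import Counter


def build_index(ranges):
    """Sort (start, end, name) ranges and keep the start addresses
    in a separate list so they can be binary searched."""
    ordered = sorted(ranges)
    starts = [r[0] for r in ordered]
    return starts, ordered


def resolve(index, address):
    """Return the name of the range containing address, or None.
    Ranges are half-open: start <= address < end."""
    starts, ordered = index
    i = bisect_right(starts, address) - 1
    if i >= 0:
        start, end, name = ordered[i]
        if start <= address < end:
            return name
    return None


def histogram(index, samples):
    """Count how many sampled instruction pointers fall in each range,
    most frequently hit first."""
    counts = Counter()
    for ip in samples:
        counts[resolve(index, ip) or "<unknown>"] += 1
    return counts.most_common()


# Hypothetical symbol table and samples, for illustration only.
index = build_index([
    (0x1000, 0x1040, "UpdateWorld"),
    (0x1040, 0x10A0, "DrawFrame"),
])
print(histogram(index, [0x1004, 0x1050, 0x1060, 0x2000, 0x1041]))
```

A function that shows up in a large share of the samples is where the program spends its time, which is all a flat profile like this can tell you.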