Author Topic: Debugger regression in Lazarus 4.0  (Read 7453 times)

TheMouseAUS

  • Full Member
  • ***
  • Posts: 101
Debugger regression in Lazarus 4.0
« on: July 11, 2025, 01:14:05 am »
I believe there is a regression in the debug side of Lazarus 4.0.
I have written a command-line app for Linux. I know debugging a command-line app is not easy, but using DWARF 3 in Lazarus 3.6 it worked pretty well.

My app uses an external library, and a recent update of that library had an issue: it would produce a general protection fault (GPF). In Lazarus 4.0 the GPF caused Lazarus itself to crash and close.
In Lazarus 3.6 it does not crash the IDE, but clearly shows where the issue was (the call in the external lib).

I thought this was worth reporting, but I am not sure what info you would need to help with this.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12123
  • Debugger - SynEdit - and more
    • wiki
Re: Debugger regression in Lazarus 4.0
« Reply #1 on: July 11, 2025, 09:46:44 am »
Thanks for reporting and sorry to hear...

If possible could you provide further info?

Please go to menu: Tools > Configure Build Lazarus
In "custom options" add -gw3 -O-1
Then build.

Exit the IDE and start it with:
Code: Text
gdb -ex r lazarus
If it asks about "Enable debuginfod", answer no.
Once the IDE has crashed, enter:  bt  (and press Return)

Copy the output here.
Thanks
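
For a crash that is easy to reproduce, the same gdb session can also be run without any prompts. The following is only a sketch under a few assumptions: gdb 12 or newer is installed (older versions do not know the debuginfod setting), and the function name capture_backtrace and the default log file name are made up for this example.

```shell
# Hypothetical helper, not from the thread: run a program under gdb without
# prompts and save everything gdb prints (including the backtrace) to a log.
capture_backtrace() {
    exe="$1"
    log="${2:-gdb-backtrace.log}"   # assumed default name, pick any file
    # -batch                      run the -ex commands, then exit; also turns
    #                             off paging and confirmation prompts
    # set debuginfod enabled off  same as answering "no" to the debuginfod question
    # run                         start the program; gdb regains control on a crash
    # bt                          print the backtrace of the thread that crashed
    gdb -batch \
        -ex 'set debuginfod enabled off' \
        -ex 'run' \
        -ex 'bt' \
        "$exe" > "$log" 2>&1
    echo "backtrace written to $log"
}
```

Usage, from the Lazarus directory:  capture_backtrace ./lazarus ide-crash.log
Once the IDE has crashed, gdb prints the backtrace into ide-crash.log and exits, so the file can be posted as is. If the IDE is closed normally instead, the log simply ends with gdb's "No stack." message.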

TheMouseAUS

« Reply #2 on: July 15, 2025, 02:53:52 am »
Ok, I can easily reproduce this. I will post once I have done it.

TheMouseAUS

« Reply #3 on: July 16, 2025, 09:21:36 pm »
Is this the output you expected to see?

#0  0x0000000000562704 in AVL_TREE$_$TAVLTREE_$__$$_ROTATERIGHT$TAVLTREENODE ()
#1  0x0000000000561b04 in AVL_TREE$_$TAVLTREE_$__$$_BALANCEAFTERINSERT$TAVLTREENODE ()
#2  0x00007fffa844e000 in ?? ()
#3  0x00007fffa884ecdc in ?? ()
#4  0x00007fff9f787c80 in ?? ()
#5  0x00007fff9f257640 in ?? ()
#6  0x00007fff9f787cc0 in ?? ()
#7  0x0000000000561325 in AVL_TREE$_$TAVLTREE_$__$$_ADD$TAVLTREENODE ()
#8  0x00007fff9f787c80 in ?? ()
#9  0x00007fff9f787300 in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
#10 0x00007fff9f257640 in ?? ()
#11 0x0000000000561139 in AVL_TREE$_$TAVLTREE_$__$$_ADD$POINTER$$TAVLTREENODE ()
#12 0x00007fffffffcd00 in ?? ()
#13 0x000000000000000b in ?? ()
#14 0x00007fffa884d470 in ?? ()
#15 0x0000000000640821 in InternalAdd (this=0x7fffbb4ef760, AId=<error reading variable: Attempt to dereference a generic pointer.>, AData=<error reading variable: Attempt to dereference a generic pointer.>) at maps.pp:415
#16 0x00000000006419e2 in Add (this=0x7fffbb4ef760, AId=<error reading variable: Attempt to dereference a generic pointer.>, AData=<error reading variable: Attempt to dereference a generic pointer.>) at maps.pp:763
#17 0x0000000001134c7f in Add (this=0x7fffbb4ef760, AId=<error reading variable: Attempt to dereference a generic pointer.>, AData=<error reading variable: Attempt to dereference a generic pointer.>) at fpdbgclasses.pp:1274
#18 0x000000000113a744 in AddThread (this=0x7fffbb74a120, AThreadIdentifier=29452) at fpdbgclasses.pp:3011
#19 0x00000000011aeffe in ProcessLoop (this=0x7ffff2d4d640) at fpdbgcontroller.pas:1917
#20 0x00000000013c2fce in DoExecute (this=0x7fff9d5a2dc0) at fpdebugdebuggerworkthreads.pas:490
#21 0x000000000115f428 in ExecuteInThread (this=0x7fff9d5a2dc0, MyWorkerThread=0x7fffbb4cfcc0) at fpdbgutil.pp:807
#22 0x00000000011609b5 in Execute (this=0x7fffbb4cfcc0) at fpdbgutil.pp:1090
#23 0x000000000050d52f in CLASSES_$$_THREADFUNC$POINTER$$INT64 ()
#24 0x00007fffbb4cfcc0 in ?? ()
#25 0x00007fff9f787000 in ?? ()
#26 0x00007fffbb4cfcc0 in ?? ()
#27 0x000000000043daec in SYSTEM_$$_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD ()
#28 0x00007fffa884da78 in ?? ()
#29 0x0000000000000000 in ?? ()

TheMouseAUS

« Reply #4 on: July 16, 2025, 09:23:05 pm »
FYI, it didn't ask about "Enable debuginfod".

Martin_fr

« Reply #5 on: July 16, 2025, 10:05:07 pm »
Quote
Is this the output you expected to see?

#17 0x0000000001134c7f in Add (this=0x7fffbb4ef760, AId=<error reading variable: Attempt to dereference a generic pointer.>, AData=<error reading variable: Attempt to dereference a generic pointer.>) at fpdbgclasses.pp:1274
#18 0x000000000113a744 in AddThread (this=0x7fffbb74a120, AThreadIdentifier=29452) at fpdbgclasses.pp:3011
#19 0x00000000011aeffe in ProcessLoop (this=0x7ffff2d4d640) at fpdbgcontroller.pas:1917
#20 0x00000000013c2fce in DoExecute (this=0x7fff9d5a2dc0) at fpdebugdebuggerworkthreads.pas:490
#21 0x000000000115f428 in ExecuteInThread (this=0x7fff9d5a2dc0, MyWorkerThread=0x7fffbb4cfcc0) at fpdbgutil.pp:807
#22 0x00000000011609b5 in Execute (this=0x7fffbb4cfcc0) at fpdbgutil.pp:1090

Yes, the output is good. Well, as good as it gets. Unfortunately the problem is a bit harder to detect.

The AVL code itself is extremely unlikely to be at fault. It has been around for a long time, and it is in the RTL, so it hasn't changed between the Lazarus versions.

So the problem is further down the backtrace. (I am hoping that the problem isn't random: if any other code wrote through a dangling pointer, then this code would crash as a result of an unseen error in that other code.)

This code is executed when the debugged process starts a new thread.
So the next question is: does the app that you are debugging start (and stop) a lot of threads? (That includes any threads started/stopped inside any library/.so file.)

Anything else thread-related that your app does?





I will set up a test run here, having the debugger deal with threads coming and going. And I will see if valgrind turns up something.

If not, or if you have time, maybe you can "valgrind" it on your side.
But be warned, it's a painfully slow process.

Basically valgrind watches an app running, and notes every mem-alloc/free, as well as initialization of mem. If anything accesses wrong memory (such as leading to the access violation) then valgrind will know if and when that memory was last used, and by what code....

To do the test, you would need to install valgrind.

Then run Lazarus within Valgrind:
Code: Bash
valgrind --num-callers=35 --tool=memcheck --log-file=laz.trc ./lazarus

Before you do, make sure the project is open, has been compiled, and is ready to produce the error in the debugger.

Starting the IDE like this will take between half a minute and several minutes until the IDE is fully loaded. And starting the debugger in the IDE will take time again, and a debug session that normally is one minute can become 5 to 10 minutes.

So then when the IDE runs like this, get the error to happen.
Then get the logfile, and upload it (zip it, it will be big).
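
Since the log will be big, a quick count of the relevant memcheck messages can show whether it contains anything of interest before it is uploaded. This is a sketch only: the function name summarize_memcheck_log is made up, the log is assumed to have been written with --log-file=laz.trc as above, and the patterns are the headings memcheck prints for invalid accesses and for use of uninitialised values.

```shell
# Hypothetical helper, not from the thread: count the memcheck findings that
# matter most when hunting a dangling or uninitialised pointer.
summarize_memcheck_log() {
    log="$1"
    for pattern in 'Invalid write' 'Invalid read' 'Invalid free' \
                   'uninitialised value'; do
        # grep -c prints the number of matching lines; "|| true" keeps the
        # function going when that number is 0 (grep then exits non-zero).
        count=$(grep -c -- "$pattern" "$log" || true)
        printf '%s: %s\n' "$pattern" "$count"
    done
}
```

Usage:  summarize_memcheck_log laz.trc
A non-zero count for "Invalid write" or "Invalid read" is the kind of finding described above; the stack traces that follow those headings in the log name the code that touched the memory.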



But as I said, I will try on my end, on the off chance anything goes wrong by just debugging an app doing TThread.Create over and over...

I am not too hopeful my attempts will get me the error. I have a test case that creates a fair number of threads, and I haven't seen issues. But I haven't valgrinded it in some time.


Martin_fr

« Reply #6 on: July 16, 2025, 10:17:31 pm »
Quote
My app uses an external library and a recent update of that library had an issue,

Is that library written in FPC?
Or is it 3rd party?

If 3rd party, does it have debug info? (see below to check how)

There have been issues in FpDebug with more recent DWARF versions. Those were fixed IIRC (of course only those known).
So maybe there are more...

If there are, it is possible that, while reading the DWARF, FpDebug writes through incorrect internal pointers and overwrites other memory. That would set the stage for later errors in (semi-)random places.

If the library is freely available, I wouldn't mind a copy, then I could run checks if the DWARF is correctly read.



You can use objdump (it should be on your system; IIRC it comes with FPC, so on Linux it may already exist, and if not, every distro has a package to install it).

Replace lazbuild with your library (the .so file).

Code: Bash
objdump --dwarf ./lazbuild > /dev/null
objdump --dwarf=info ./lazbuild | grep -A12 -i 'compilation unit' | grep -i 'name\|version'

The first one is just to see if objdump prints any error or warning. Since stdout is discarded, only messages on stderr remain visible. If there are any, that is important.

The 2nd one will give the version of DWARF. It may print the version many times, but the values should all be the same.

If it's 4 or 5 then there is a chance that FpDebug goes wrong while reading it.  (also something that valgrind would likely pick up)
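
With many compilation units the second command prints a long list. A small filter can condense it to one line per DWARF version. This is a sketch, not part of objdump: the function name dwarf_versions is made up, and libplctag.so in the usage line stands for wherever the library is installed.

```shell
# Hypothetical helper, not from the thread: read "objdump --dwarf=info" output
# on stdin and print how many compilation units use each DWARF version.
dwarf_versions() {
    # Each compilation unit header contains a line such as "   Version:   4".
    # Count those lines per version number, then sort for a stable order.
    awk '/^[ \t]*Version:/ { n[$2]++ }
         END { for (v in n) printf "DWARF %s: %d compilation unit(s)\n", v, n[v] }' |
    sort
}
```

Usage:  objdump --dwarf=info ./libplctag.so | dwarf_versions
More than one output line means the file mixes DWARF versions. No output at all means objdump printed no compilation unit headers for that file.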

TheMouseAUS

« Reply #7 on: July 16, 2025, 10:28:46 pm »
The library is an external third-party one, written in C. The GPF error as shown in Lazarus 3.6.

TheMouseAUS

« Reply #8 on: July 16, 2025, 10:31:52 pm »
Oh yes, to answer your other question: both the library and my app are creating/terminating threads.

This is the library https://github.com/libplctag/libplctag

version 2.6.6 on my machine has a race condition that causes the GPF.

TheMouseAUS

« Reply #9 on: July 16, 2025, 10:35:06 pm »
this is the wrapper I use

Martin_fr

« Reply #10 on: July 16, 2025, 11:19:53 pm »
Ok, I downloaded the 2.2.6. (the x86 Ubuntu build)

I am not likely to run it myself. Just wanted to test it being loaded into the debugger (I can do that without actually needing to exec it).

My tools and valgrind didn't show any issues loading it. (It has some minimal info that the debugger will read to decode stack frames when the app pauses and it needs to show the call stack.)


Technically it could still go wrong when it uses the info to calculate the stack. But the gdb backtrace does not point anywhere near that.



TheMouseAUS

« Reply #11 on: July 17, 2025, 01:26:53 am »
So it gets stranger: the first time I ran that today, the GPF in my app fully crashed Lazarus itself. Each crash since has shown the fault exactly as per Lazarus 3.6 and has not crashed the IDE. The other day, when I initially reported this, it was happening every time.

Martin_fr

« Reply #12 on: July 17, 2025, 11:17:39 am »
Quote
So it gets stranger: the first time I ran that today, the GPF in my app fully crashed Lazarus itself. Each crash since has shown the fault exactly as per Lazarus 3.6 and has not crashed the IDE. The other day, when I initially reported this, it was happening every time.

That could indicate that the error is either
- a race condition (sensitive to very, very small changes in timing)
  However, given that it used to happen 100% of the time, that doesn't fit completely
- an uninitialized/dangling pointer access (including bugs in the compiler)


A race condition would mean that the order in which the IDE receives the events from the debugged process and other events from the outside (such as paint, mouse, keyboard) matters, and that it fails if, and only if, two specific events occur in exactly the right order, with exactly the right time between them (usually microseconds).
But race conditions rarely happen 100% of the time over a long period, and then stop or become less frequent.
(Except if some other part of the fault changes: a race condition could trigger a dangling pointer... and that may change.)

That said, there is some thread-shared data in the exact code where you recorded the crash. And data shared between threads is often the basis for race conditions.
I am sure I reviewed that when I wrote it, and again later (there was a dangling pointer issue some years back that was nearly impossible to find, and at that time I reviewed all thread-shared data). Still, I have put it on my list to review it again (but for that I need to make some more time).


A dangling pointer is possible.

You did rebuild the IDE (and with different settings, e.g. -O-1). Any settings change would change the exact memory layout (some asm code may be optimized differently and change in size, some data may be aligned differently, and other optimizations can touch the memory layout of the data).
In other words, the IDE still works with the same data, but the order in which variables are placed into the RAM of your PC may have changed.
An uninitialized/dangling pointer means that some random data gets damaged at some point (and later triggers the crash). If that is the case, then the "random data" may now either not be causing a crash, or it may be within free/unused memory.

That is something that valgrind can pick up (even if no crash happens).



Are you using FPC 3.2.2?

I remember there was at least one report (I think I even saw 2 different ones) where that FPC would wrongly translate valid Pascal code, so that the result would have a dangling pointer (even though that was not in the original Pascal code).

One of those was only found on Windows (it is possible it could happen on other OSes, but it was never observed despite a really in-depth search).

IIRC both of those are in the optimizer.
So you could try to build your IDE with -O-

Mind you, both of them are about as likely as winning a significant amount in the lottery.
So it's probably not them. But ....

You can also build your IDE alternating with
-O2
-O-1
-O-

and see if the crash appears/disappears.
If that makes a diff, it could still be a bug in the debugger code, rather than the compiler. Both would be possible.

You could then next try to get the 3.2.4RC and test with that. (or use fpcupdeluxe to install 3.2.3 / fixes_3_2 => fpcupdeluxe makes it easier to have 2 installs)


TheMouseAUS

« Reply #13 on: July 17, 2025, 12:52:03 pm »
Yes, I am using FPC 3.2.2. I did check, but valgrind is not available in the repo of the distro I use.

I haven't checked yet what it would take to compile/install it. I do want to do it at some point, as I wanted to run my app through it anyway to make sure I was handling everything correctly (threads, memory etc).  Heaptrc is showing it's pretty good, so I haven't got round to it yet.

I will look into the FPC upgrade. I do find it unusual that I don't see these bugs on Lazarus 3.6. I have found 4.0 to be a lot more 'crashy'; even today, just right-clicking on a variable I wanted to create a 'watch' for crashed the whole IDE. Even closing Lazarus, I managed to catch this:

Thread 1 "lazarus" received signal SIGSEGV, Segmentation fault.
0x00007ffff6142f73 in QObject::disconnect(QObject const*, char const*, QObject const*, char const*) () from /usr/lib64/libQt6Core.so.6
(gdb) bt
#0  0x00007ffff6142f73 in QObject::disconnect(QObject const*, char const*, QObject const*, char const*) () from /usr/lib64/libQt6Core.so.6
#1  0x00007ffff7e3658a in QObject_hook::~QObject_hook() () from /usr/lib/libQt6Pas.so.6
#2  0x00000000007831ca in DETACHEVENTS (this=0x7fffc471cc60) at qt6/qtwidgets.pas:16530
#3  0x0000000000763a93 in DEINITIALIZEWIDGET (this=0x7fffc471cc60) at qt6/qtwidgets.pas:2279
#4  0x0000000000763cbb in DESTROY (this=0x7fffc471cc60, vmt=0x0) at qt6/qtwidgets.pas:2342
#5  0x000000000078304f in DESTROY (this=0x7fffc471cc60, vmt=0x1) at qt6/qtwidgets.pas:16504
#6  0x00000000004371eb in SYSTEM$_$TOBJECT_$__$$_FREE ()
#7  0x00007fffffffd7c0 in ?? ()
#8  0x0000000000752831 in RELEASE (this=0x7fffc471cc60) at qt6/qtobjects.pas:1126
#9  0x0000000000763deb in RELEASE (this=0x7fffc471cc60) at qt6/qtwidgets.pas:2382
#10 0x00000000007f68c0 in DESTROYHANDLE (self=0x7fffc5029180, AMENUITEM=0x7fffa8e9b020) at qt6/qtwsmenus.pp:257
#11 0x00000000006d2a26 in DESTROYHANDLE (this=0x7fffa8e9b020) at include/menuitem.inc:905
#12 0x00000000006d29a6 in DESTROYHANDLE (this=0x7fffa8e9ac00) at include/menuitem.inc:900
#13 0x00000000006d29a6 in DESTROYHANDLE (this=0x7fffb9341b80) at include/menuitem.inc:900
#14 0x00000000006d29a6 in DESTROYHANDLE (this=0x7fffc42f3040) at include/menuitem.inc:900
#15 0x00000000006d015b in DESTROYHANDLE (this=0x7fffc50341c0) at include/menu.inc:173
#16 0x00000000006d505b in DESTROY (this=0x7fffc50341c0, vmt=0x1) at include/popupmenu.inc:56
#17 0x00000000005157a6 in CLASSES$_$TCOMPONENT_$__$$_DESTROYCOMPONENTS ()
#18 0x0000000000000001 in ?? ()
#19 0x00007fffe3f87410 in ?? ()
#20 0x00000000ffffffff in ?? ()
#21 0x000000000051570e in CLASSES$_$TCOMPONENT_$__$$_DESTROY ()
#22 0x0000000000000001 in ?? ()
#23 0x00007ffff7ffd000 in ?? () from /lib64/ld-linux-x86-64.so.2
#24 0x00007fffffffe220 in ?? ()
#25 0x0000000000000001 in ?? ()
#26 0x00007fffffffdca8 in ?? ()
#27 0x00000000006cc3fd in DESTROY (this=0x7fffe3f87410, vmt=0x0) at lclclasses.pp:157
#28 0x000000000068a9b1 in DESTROY (this=0x7fffe3f87410, vmt=0x0) at include/control.inc:5227
#29 0x0000000000678b0f in DESTROY (this=0x7fffe3f87410, vmt=0x0) at include/wincontrol.inc:6687
#30 0x000000000068d225 in DESTROY (this=0x7fffe3f87410, vmt=0x0) at include/customcontrol.inc:40
#31 0x0000000000484a25 in DESTROY (this=0x7fffe3f87410, vmt=0x0) at include/scrollingwincontrol.inc:360
#32 0x0000000000486139 in DESTROY (this=0x7fffe3f87410, vmt=0x0) at include/customform.inc:138
#33 0x0000000000c5bc25 in DESTROY (this=0x7fffe3f87410, vmt=0x1) at sourceeditor.pp:7032
#34 0x00000000004371eb in SYSTEM$_$TOBJECT_$__$$_FREE ()
#35 0x00007fffe029d390 in ?? ()
#36 0x0000000000c68631 in FREESOURCEWINDOWS (this=0x7fffd405dc10) at sourceeditor.pp:10063
#37 0x0000000000c69eff in DESTROY (this=0x7fffd405dc10, vmt=0x0) at sourceeditor.pp:10572
#38 0x0000000000c6f682 in DESTROY (this=0x7fffd405dc10, vmt=0x1) at sourceeditor.pp:12007
#39 0x00000000004371eb in SYSTEM$_$TOBJECT_$__$$_FREE ()
#40 0x00007fffffffe000 in ?? ()
#41 0x000000000049e462 in FREETHENNIL (OBJ=0) at lazutilities.pas:76
#42 0x00000000004ae694 in DESTROY (this=0x7fffe029d390, vmt=0x1) at main.pp:1764
#43 0x00000000004371eb in SYSTEM$_$TOBJECT_$__$$_FREE ()
#44 0x00007fffffffe0f0 in ?? ()
#45 0x0000000000425891 in main () at lazarus.pp:169

Martin_fr

« Reply #14 on: July 17, 2025, 01:24:26 pm »
The "closing crash" looks like a Qt6 issue. I suggest reporting it (with the stack trace). Maybe our Qt maintainer can find something there.

Also that would need info about the exact Linux distro, probably X server/Wayland, and Qt details.

Quote
I do find it unusual I don't see these bugs on Lazarus 3.6. I have found 4.0 to be a lot more 'crashy', even today just right clicking on a variable I wanted to create a 'watch'

I am not saying it has to be in FPC. It's just one possibility, and not necessarily the likeliest one. Just covering all bases.

Most issues are close to where they occur. I.e. if you got a crash while debugging, there is a 95% chance the debugger code is faulty. But I have yet to come up with more checks on how to trace this particular one (your original report).



As for adding the watch: if you got a stack trace...

That can be anything:
- frontend / Qt / menu (given that the "exit crash" potentially points to a Qt menu issue)
- source edit / codetools
- the watch window (and there again, could be Qt)



As for the frontend (e.g. menu, but NOT the original crash): from what I hear, there are issues with some Linux distros that use unstable Wayland libs (I don't know anything, except hearsay...). Usually I hear that in a gtk2 context; not sure if this can affect Qt.




As for the original crash report.

If it turns out to be a dangling pointer, or if it is a race condition, then it may have been the same in 3.6.
The only difference is that, because other code changed and other data structures changed, memory layout and timings changed. So the same bug suddenly has a different effect.

Unfortunately that doesn't give me much of a clue where to look for it.

 
