Topic: [SOLVED] Access violation when ending program / Forms with Parent 'Application'

Hartmut

Yesterday I stopped my program normally, but then it threw this AV:
Code: Text
[FORMS.PP] ExceptionOccurred
  Sender=EAccessViolation
  Exception=Access violation
  Stack trace:
  $00000000005D1B42
  $00000000005BEF09
...
  $00000000004355A9
Exception at 00000000005D1B42: EAccessViolation:
Access violation.

I have found no way to reproduce this AV, but I want to narrow it down by adding debug output at sensible code positions.

It's a GUI program of about 22000 lines, plus a couple of common units totalling about 100000 lines. It is compiled with Lazarus 2.0.10 and FPC 3.2.0 (because it's an older project) and runs on Linux Ubuntu 22.04 64-bit with GTK2.

Question 1:
Is it possible to find the code position where the AV occurs from the info "Exception at 00000000005D1B42"? In good old Turbo Pascal you could enter this address in the IDE and it would find the code position. Is this possible in Lazarus too? If so, how?

My program always writes some debug information to a terminal. The last 2 lines before the AV were:
Code: Text
TForm1.FormClose() finished
TForm1.IniPropStorage1SavingProperties() started
[FORMS.PP] ExceptionOccurred
...

That means: the FormClose() event of the main form had run completely and the event that writes the INI file had started correctly. There is currently no debug message for when this event has finished, but the INI file was written correctly and had a current timestamp. So the AV must have occurred "very late".

I guessed that "[FORMS.PP]" could be a source filename and found this line in file <installdir>/lazarus/lcl/forms.pp, but this did not help me to progress:
Code: Pascal
procedure ExceptionOccurred(Sender: TObject; Addr:Pointer; FrameCount: Longint;
  Frames: PPointer);
Begin
  DebugLn('[FORMS.PP] ExceptionOccurred ');
  if HaltingProgram or HandlingException then Halt;
  HandlingException:=true;
  if Sender<>nil then
  begin
    DebugLn('  Sender=',Sender.ClassName);
    if Sender is Exception then
    begin
      DebugLn('  Exception=',Exception(Sender).Message);
      DumpExceptionBackTrace();
    end;
  end else
    DebugLn('  Sender=nil');
  if Application<>nil then
    Application.HandleException(Sender);
  HandlingException:=false;
end;

Question 2:
Can anybody read something helpful from the information that is shown together with the AV?

I searched the sources of Lazarus for all references to procedure 'ExceptionOccurred'. I found only this reference in <installdir>/lazarus/lcl/include/application.inc:
Code: Pascal
procedure TApplication.SetCaptureExceptions(const AValue: boolean);
begin
  if FCaptureExceptions=AValue then exit;
  FCaptureExceptions:=AValue;
  if FCaptureExceptions then begin
    // capture exceptions
    // store old exceptproc
    if FOldExceptProc=nil then
      FOldExceptProc:=ExceptProc;
    ExceptProc:=@ExceptionOccurred;
  end else begin
    // do not capture exceptions
    if ExceptProc=@ExceptionOccurred then begin
      // restore old exceptproc
      ExceptProc:=FOldExceptProc;
      FOldExceptProc:=nil;
    end;
  end;
end;

I assume that this only installs procedure forms.ExceptionOccurred() as a common Exception Handler for var 'Application'.

This gave me an idea of what might be related to this AV: my program uses a couple of non-modal forms. Usually I create them like
Code: Pascal
FormH:=TFormH.CreateNew(Application);

because then, if the program terminates while a non-modal form is still open, the form is closed automatically. To close such a non-modal form via a button or from code I call self.Close().

Question 3:
Is something wrong with this practice? Is there anything else I need to take into account?
And where should I add more debug output to check, if this AV occurs again, whether this suspicion is correct?

Thanks in advance.
« Last Edit: September 16, 2025, 11:22:36 am by Hartmut »

Khrys

Quote from: Hartmut
Is it possible to find the code position, where the AV occurs, by the info "Exception at 00000000005D1B42"?

Here's how I usually do that:
  • Recompile the program with debug information enabled (don't touch any other setting - the generated assembly must remain identical)
  • Run the program under  gdb - e.g.  gdb -q --args Program.exe
  • Use  x/i <ADDRESS>  or  disas <ADDRESS>  or  info sym <ADDRESS>  on the relevant addresses to show the corresponding symbol name from the debug information

Example:

Code: Pascal
program Test;

procedure IntentionalSegfault();
begin
  PChar(Nil)^ := #0;
end;

begin
  IntentionalSegfault();
end.

Compile & run normally:  fpc -Pi386 test.pas && test.exe


Quote from: test.exe
Runtime error 216 at $00401593
  $00401593
  $004015AD
  $00407497

Recompile with debug information:  fpc -g -Pi386 test.pas

Run with gdb:  gdb -q --args test.exe

Check instruction & symbol at 0x00401593:  x/i 0x00401593

Quote from: gdb
0x401593 <INTENTIONALSEGFAULT+3>:    mov    BYTE PTR ds:0x0,0x0
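
As a complement to the gdb lookup, here is a minimal sketch of letting the program resolve the addresses itself. It assumes the program is compiled with line information (`fpc -gl`, which pulls in a line-info unit); without that, the same code just prints bare addresses. The program name `TraceDemo` and the helper `PrintExceptionTrace` are made up for this example, while `BackTraceStrFunc`, `ExceptAddr`, `ExceptFrames` and `ExceptFrameCount` come from the RTL.

```pascal
program TraceDemo;
{$mode objfpc}{$H+}

uses
  SysUtils; // turns the segfault into an EAccessViolation exception

procedure IntentionalSegfault;
begin
  PChar(nil)^ := #0;
end;

// Hypothetical helper: prints the address where the exception was raised
// and every recorded caller frame, resolved by BackTraceStrFunc.
procedure PrintExceptionTrace(E: Exception);
var
  I: Integer;
  Frames: PPointer;
begin
  WriteLn(StdErr, E.ClassName, ': ', E.Message);
  WriteLn(StdErr, BackTraceStrFunc(ExceptAddr));
  Frames := ExceptFrames;
  for I := 0 to ExceptFrameCount - 1 do
    WriteLn(StdErr, BackTraceStrFunc(Frames[I]));
end;

begin
  try
    IntentionalSegfault;
  except
    on E: Exception do
      PrintExceptionTrace(E);
  end;
end.
```

This only helps for exceptions that pass through a `try..except` or an exception handler, so for a crash during finalization the gdb lookup above is probably still needed.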

Hartmut

Thanks a lot Khrys for your reply. I'm not very familiar with the Lazarus debugger and I'm using 'gdb' today for the very first time. As far as I can see, my program has always been compiled with debug information enabled (see screenshot), so I ran 'gdb' on a copy of the executable that caused the AV, without recompiling it.

Code: Text
hg6@i3300:/hg/utis$ gdb -q azul487
Reading symbols from azul487...
(gdb) x/i 0x05D1B42
   0x5d1b42 <DESTROY+434>:      mov    0x10(%rax),%eax

(gdb) info sym 0x05D1B42
CONTROLS$_$TCONTROL_$__$$_DESTROY + 434 in section .text

(gdb) disas 0x05D1B42
Dump of assembler code for function DESTROY:
   0x00000000005d1990 <+0>:     push   %rbp
   0x00000000005d1991 <+1>:     mov    %rsp,%rbp
   0x00000000005d1994 <+4>:     lea    -0x50(%rsp),%rsp
   0x00000000005d1999 <+9>:     mov    %rbx,-0x48(%rbp)
   0x00000000005d199d <+13>:    mov    %r12,-0x40(%rbp)
   0x00000000005d19a1 <+17>:    mov    %rdi,-0x10(%rbp)
   0x00000000005d19a5 <+21>:    mov    %rsi,-0x8(%rbp)
   0x00000000005d19a9 <+25>:    cmpq   $0x0,-0x8(%rbp)
   0x00000000005d19ae <+30>:    jg     0x5d19b2 <DESTROY+34>
   0x00000000005d19b0 <+32>:    jmp    0x5d19c6 <DESTROY+54>
   0x00000000005d19b2 <+34>:    mov    -0x10(%rbp),%rax
   0x00000000005d19b6 <+38>:    mov    -0x10(%rbp),%rdx
   0x00000000005d19ba <+42>:    mov    (%rdx),%rdx
   0x00000000005d19bd <+45>:    mov    %rax,%rdi
   0x00000000005d19c0 <+48>:    call   *0x90(%rdx)
   0x00000000005d19c6 <+54>:    mov    -0x10(%rbp),%rdi
   0x00000000005d19ca <+58>:    mov    $0x0,%sil
   0x00000000005d19cd <+61>:    call   0x5cca30 <SETMOUSECAPTURE>
   0x00000000005d19d2 <+66>:    mov    -0x10(%rbp),%rsi
   0x00000000005d19d6 <+70>:    lea    0x4bdfc3(%rip),%rax        # 0xa8f9a0 <TC_$CONTROLS_$$_DRAGMANAGER>
   0x00000000005d19dd <+77>:    mov    (%rax),%rdi
   0x00000000005d19e0 <+80>:    mov    $0x1,%edx
   0x00000000005d19e5 <+85>:    lea    0x4bdfb4(%rip),%rax        # 0xa8f9a0 <TC_$CONTROLS_$$_DRAGMANAGER>
   0x00000000005d19ec <+92>:    mov    (%rax),%rax
   0x00000000005d19ef <+95>:    mov    (%rax),%rax
   0x00000000005d19f2 <+98>:    call   *0x118(%rax)
   0x00000000005d19f8 <+104>:   mov    -0x10(%rbp),%rsi
   0x00000000005d19fc <+108>:   lea    0x42654d(%rip),%rax        # 0x9f7f50 <TC_$FORMS_$$_APPLICATION>
   0x00000000005d1a03 <+115>:   mov    (%rax),%rdi
   0x00000000005d1a06 <+118>:   call   0x457a90 <CONTROLDESTROYED>
   0x00000000005d1a0b <+123>:   mov    -0x10(%rbp),%rax
   0x00000000005d1a0f <+127>:   cmpq   $0x0,0x1a0(%rax)
   0x00000000005d1a17 <+135>:   jne    0x5d1a1e <DESTROY+142>
   0x00000000005d1a19 <+137>:   jmp    0x5d1abb <DESTROY+299>
   0x00000000005d1a1e <+142>:   mov    -0x10(%rbp),%rax
   0x00000000005d1a22 <+146>:   mov    0x1a0(%rax),%rax
   0x00000000005d1a29 <+153>:   testl  $0x8,0x50(%rax)
   0x00000000005d1a30 <+160>:   je     0x5d1a37 <DESTROY+167>
   0x00000000005d1a32 <+162>:   jmp    0x5d1abb <DESTROY+299>
   0x00000000005d1a37 <+167>:   mov    -0x10(%rbp),%rax
   0x00000000005d1a3b <+171>:   mov    0x1a0(%rax),%rdi
   0x00000000005d1a42 <+178>:   mov    -0x10(%rbp),%rdx
   0x00000000005d1a46 <+182>:   mov    $0x0,%rsi
   0x00000000005d1a4d <+189>:   mov    -0x10(%rbp),%rax
   0x00000000005d1a51 <+193>:   mov    0x1a0(%rax),%rax
   0x00000000005d1a58 <+200>:   mov    (%rax),%rax
   0x00000000005d1a5b <+203>:   call   *0x738(%rax)
   0x00000000005d1a61 <+209>:   mov    -0x10(%rbp),%rdi
   0x00000000005d1a65 <+213>:   mov    $0x0,%rsi
   0x00000000005d1a6c <+220>:   mov    -0x10(%rbp),%rax
   0x00000000005d1a70 <+224>:   mov    (%rax),%rax
   0x00000000005d1a73 <+227>:   call   *0x450(%rax)
   0x00000000005d1a79 <+233>:   mov    -0x10(%rbp),%rdi
   0x00000000005d1a7d <+237>:   call   0x5c9180 <GETBOUNDSRECT>
   0x00000000005d1a82 <+242>:   mov    %rax,-0x38(%rbp)
   0x00000000005d1a86 <+246>:   mov    %rdx,-0x30(%rbp)
   0x00000000005d1a8a <+250>:   mov    -0x38(%rbp),%rdx
   0x00000000005d1a8e <+254>:   mov    -0x30(%rbp),%rcx
   0x00000000005d1a92 <+258>:   mov    -0x10(%rbp),%rdi
   0x00000000005d1a96 <+262>:   mov    $0x0,%rsi
   0x00000000005d1a9d <+269>:   mov    -0x10(%rbp),%rax
   0x00000000005d1aa1 <+273>:   mov    (%rax),%rax
   0x00000000005d1aa4 <+276>:   call   *0x530(%rax)
   0x00000000005d1aaa <+282>:   mov    -0x10(%rbp),%rax
   0x00000000005d1aae <+286>:   movq   $0x0,0x1a0(%rax)
   0x00000000005d1ab9 <+297>:   jmp    0x5d1b24 <DESTROY+404>
   0x00000000005d1abb <+299>:   mov    -0x10(%rbp),%rax
   0x00000000005d1abf <+303>:   cmpq   $0x0,0x1a0(%rax)
   0x00000000005d1ac7 <+311>:   jne    0x5d1acb <DESTROY+315>
   0x00000000005d1ac9 <+313>:   jmp    0x5d1b0c <DESTROY+380>
   0x00000000005d1acb <+315>:   mov    -0x10(%rbp),%rax
   0x00000000005d1acf <+319>:   mov    0x1a0(%rax),%rax
   0x00000000005d1ad6 <+326>:   cmpq   $0x0,0x490(%rax)
   0x00000000005d1ade <+334>:   jne    0x5d1ae2 <DESTROY+338>
   0x00000000005d1ae0 <+336>:   jmp    0x5d1b0c <DESTROY+380>
   0x00000000005d1ae2 <+338>:   mov    -0x10(%rbp),%rax
   0x00000000005d1ae6 <+342>:   mov    0x1a0(%rax),%rax
   0x00000000005d1aed <+349>:   mov    0x490(%rax),%rdi
   0x00000000005d1af4 <+356>:   mov    -0x10(%rbp),%rsi
   0x00000000005d1af8 <+360>:   call   0x50d210 <CLASSES$_$TFPLIST_$__$$_REMOVE$POINTER$$LONGINT>
   0x00000000005d1afd <+365>:   mov    -0x10(%rbp),%rax
   0x00000000005d1b01 <+369>:   movq   $0x0,0x1a0(%rax)
   0x00000000005d1b0c <+380>:   mov    -0x10(%rbp),%rdi
   0x00000000005d1b10 <+384>:   mov    $0x0,%rsi
   0x00000000005d1b17 <+391>:   mov    -0x10(%rbp),%rax
   0x00000000005d1b1b <+395>:   mov    (%rax),%rax
   0x00000000005d1b1e <+398>:   call   *0x450(%rax)
   0x00000000005d1b24 <+404>:   mov    -0x10(%rbp),%rax
   0x00000000005d1b28 <+408>:   cmpq   $0x0,0xa0(%rax)
   0x00000000005d1b30 <+416>:   jne    0x5d1b37 <DESTROY+423>
   0x00000000005d1b32 <+418>:   jmp    0x5d1be4 <DESTROY+596>
   0x00000000005d1b37 <+423>:   mov    -0x10(%rbp),%rax
   0x00000000005d1b3b <+427>:   mov    0xa0(%rax),%rax
   0x00000000005d1b42 <+434>:   mov    0x10(%rax),%eax          // address of Exception
   0x00000000005d1b45 <+437>:   lea    -0x1(%eax),%ebx
   0x00000000005d1b49 <+441>:   cmp    $0x0,%ebx
   0x00000000005d1b4c <+444>:   jge    0x5d1b53 <DESTROY+451>
   0x00000000005d1b4e <+446>:   jmp    0x5d1bd4 <DESTROY+580>
...
(gdb)

From
Code: Text
(gdb) x/i 0x05D1B42
   0x5d1b42 <DESTROY+434>:      mov    0x10(%rax),%eax

I assumed that we are in a DESTROY procedure.

From
Code: Text
(gdb) info sym 0x05D1B42
CONTROLS$_$TCONTROL_$__$$_DESTROY + 434 in section .text

I guessed that it might be in unit Controls in procedure TControl.Destroy(). I found this in <installdir>/lazarus/lcl/include/control.inc:
Code: Pascal
destructor TControl.Destroy;
var
  HandlerType: TControlHandlerType;
  Side: TAnchorKind;
  i: Integer;
  CurAnchorSide: TAnchorSide;
begin
  //DebugLn('[TControl.Destroy] A ',Name,':',ClassName);
  // make sure the capture is released
  MouseCapture := False;
  // explicit notification about component destruction. this can be a drag target
  DragManager.Notification(Self, opRemove);
  Application.ControlDestroyed(Self);
...

I understand almost nothing of the code in this destructor; it is far beyond my horizon. I have never used 'TControl' directly, but I found out that it is an ancestor of TForm and many other classes. But without knowing the name of this TForm (or the name and type of some other descendant class) I don't see how this can help me...

Does anybody have an idea how to proceed? As I said, I found no way to reproduce this AV.

440bx

Quote from: Hartmut
Does anybody have an idea how to proceed? As I said, I found no way to reproduce this AV.

This is how I would go about it; be forewarned that it requires being at least somewhat comfortable with assembler.

1. start debugging your program using the IDE.
2. when you are at the program's "begin" statement, open the breakpoints window and select "+" (add breakpoint).
3. you'll get the option of "address breakpoint..."  select that one
4. place a breakpoint at the address $5d1b42 (which is where the AV is occurring)

when the debugger breaks there, _before_ executing that instruction, look at the value in rax.  That value is the reason an AV is happening.  IF by inspecting the value in rax you can tell what it is (it might be a pointer to a class or some memory that has been allocated) that will give you some idea of what is causing the AV and why it may be occasionally happening.

To be thorough, there is one additional thing I'd do, which is place a breakpoint at the entry to the function that contains that address.  Placing a breakpoint there is very helpful because that may allow the debugger to locate the source line, allowing you to step in source instead of assembly.   It would also give you much better context of what is going on.

based on the gdb info you posted, it looks like the function's entry point is at $5d1990, therefore I'd place a breakpoint there too (at the start of the program, in addition to the first breakpoint I suggested.)

With just a little luck, the debugger will break at the function's entry point and show you source code, which you can then step into.  Once you're there, the "trick" is to figure out where the value in "rax" comes from.  It may take you a few times of single stepping through the code to figure it out but, with a little determination you'll probably manage to figure it out.

Once you know what the value in rax means, you will know what is causing the AV, since that value is responsible for it.

If you have additional questions, I'll do my best to help but, expect having to get your hands dirty with assembly listings and memory addresses.

HTH.

FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

alpine

An AV in TControl.Destroy is a pretty nasty thing.

Since you have a stack dump you can use also the IDE > View > Leaks and Traces to find the source line addresses, see: https://forum.lazarus.freepascal.org/index.php/topic,72034.msg563178.html#msg563178

According to your disassembly my guess is that the line is:
Code: Pascal
...
  if FAnchoredControls <> nil then
  begin
    for i := 0 to FAnchoredControls.Count - 1 do
      for Side := Low(TAnchorKind) to High(TAnchorKind) do
      begin
...
 
Perhaps FAnchoredControls is dangling/corrupted.

The thing is that TControls are usually freed automatically by their owner, and even if they are freed explicitly, that is foreseen and nothing bad will happen (I'm assuming that the destructor Destroy is never called directly, as it should be).

So, it looks a bit like memory corruption, especially since it can't be easily reproduced. Did you try with range checking turned on?
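
The ownership point above can be illustrated with plain `TComponent` objects, no LCL required. This is only a sketch with invented names (`OwnerDemo`, `Owner`, `A`, `B`): a component that is freed explicitly removes itself from its owner's `Components` list, so the owner does not free it a second time. What the owner cannot do is clear other variables or fields that still point at the freed object, and a stale reference of that kind is one way a field could end up dangling.

```pascal
program OwnerDemo;
{$mode objfpc}{$H+}

uses
  Classes;

var
  Owner, A, B: TComponent;
begin
  Owner := TComponent.Create(nil);
  A := TComponent.Create(Owner);
  B := TComponent.Create(Owner); // B is only ever freed through Owner
  WriteLn('owned before A.Free: ', Owner.ComponentCount);

  A.Free;   // A unregisters itself from Owner inside its destructor
  A := nil; // otherwise the variable would keep pointing at freed memory
  WriteLn('owned after A.Free:  ', Owner.ComponentCount);

  Owner.Free; // frees the remaining owned component (B)
end.
```

The two counts printed are 2 and then 1, which shows that the explicit `Free` was accounted for by the owner.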
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

Hartmut

Sorry for the late response, got a lot of homework from you ;-)

Thank you very much 440bx for your detailed instructions.

Test 1)
I started the program via the IDE debugger with 1 breakpoint at $5d1b42 (where the AV occurred). I played a little with the program and then ended it. The BP (breakpoint) at $5d1b42 was reached:
 - the value in rax was 140737304095136 = $00007FFFF50469A0 (but I have no idea what to do with this value)
 - the value of FAnchoredControls.Count was 1
 - because of source line "//DebugLn('[TControl.Destroy] A ',Name,':',ClassName);" I checked these 2 values:
Name = ""
ClassName = <Error: Type TCONTROL has no component named CLASSNAME.>

 - the assembler window showed:
Code: Text
...
include/control.inc:5111                  for i := 0 to FAnchoredControls.Count - 1 do
00000000005D1B37 488b45f0                 mov    rax,QWORD PTR [rbp-0x10]
00000000005D1B3B 488b80a0000000           mov    rax,QWORD PTR [rax+0xa0]
00000000005D1B42 8b4010                   mov    eax,DWORD PTR [rax+0x10]
00000000005D1B45 678d58ff                 lea    ebx,[eax-0x1]
00000000005D1B49 83fb00                   cmp    ebx,0x0
00000000005D1B4C 7d05                     jge    0x5d1b53 <DESTROY+451>
00000000005D1B4E e981000000               jmp    0x5d1bd4 <DESTROY+580>
00000000005D1B53 c745e4ffffffff           mov    DWORD PTR [rbp-0x1c],0xffffffff
00000000005D1B5A 660f1f440000             nop    WORD PTR [rax+rax*1+0x0]
00000000005D1B60 8b45e4                   mov    eax,DWORD PTR [rbp-0x1c]
00000000005D1B63 678d4001                 lea    eax,[eax+0x1]
00000000005D1B67 8945e4                   mov    DWORD PTR [rbp-0x1c],eax
include/control.inc:5112                  for Side := Low(TAnchorKind) to High(TAnchorKind) do
...

 - then I pressed F8 repeatedly until the end of destructor TControl.Destroy() was reached, and shortly afterwards I got lost in a pure assembler window (without source code)
 - when the program had finished, BP $5d1b42 was not reached again and the AV did not occur.

Test 2)
I added a 2nd BP at $5d1990 (the beginning of destructor TControl.Destroy). Then:
 - I started the program and opened a simple dialog (with 1 label and 1 button), which caused BP $5d1990 to occur 5x
 - I closed this simple dialog, which caused BP $5d1990 to occur again 5x
 - I ended the program, which caused BP $5d1990 to occur some hundred times
 - BP $5d1b42 was not reached and the AV did not occur.

Test 3)
Both BP ($5d1b42 + $5d1990) were active. I started the program and ended it immediately (without doing anything with the program):
 - BP $5d1990 was reached about 830x (!)
 - BP $5d1b42 was not reached and the AV did not occur.

From this experience I conclude that a BP at $5d1990 is not manageable, because it is hit too often (830x if the program is merely started and immediately ended).

The BP at $5d1b42 was hit only once, after playing with the program a little. But when the AV occurred (2 days ago), I had been using the program intensively for about 90 minutes. Because 'TControl' is the ancestor of many classes, I assume that after such intensive use the BP at $5d1b42 will be hit (much) more often. But if the AV then does not occur (I tried about 40..50 times, with no success), I don't see how this can help me.

As said before, I understand almost nothing of the source code in this destructor; it is far beyond my horizon. The same goes for trying to figure out where the value in "rax" comes from.

This debugger adventure was "interesting", but very time consuming for me. Because I cannot assume that the AV will occur again within a reasonable time (it was the first time in more than a year), I fear that this debugging approach has too poor a ratio of effort to success, because TControl.Destroy() is called so often even when the AV does not occur.



Thanks a lot alpine too.

Quote
Since you have a stack dump you can use also the IDE > View > Leaks and Traces to find the source line addresses, see: https://forum.lazarus.freepascal.org/index.php/topic,72034.msg563178.html#msg563178
I did not manage to get 'Leaks and Traces' to work. I attached below a screenshot of this dialog. When I press the 'Update' or 'Resolve' buttons, nothing happens. 'Paste Clipboard' displays 3 lines in the lower panel. There is no Help button. https://wiki.freepascal.org/leakview says it requires a heaptrc output file, but I don't have one. And your link https://forum.lazarus.freepascal.org/index.php/topic,72034.msg563178.html#msg563178 says:
Quote
When ... you can in the IDE use Menu View > Leaks and traces => button resolve: select the copy [of the executable] that still has debug info.
But the 'Resolve' button does nothing.

But this gave me the idea of using gdb (as described by Khrys in reply #1) to resolve the addresses shown with the AV, and I got:
Code: Text
[FORMS.PP] ExceptionOccurred
  Sender=EAccessViolation
  Exception=Access violation
  Stack trace:
  $00000000005D1B42 = CONTROLS$_$TCONTROL_$__$$_DESTROY + 434
  $00000000005BEF09 = CONTROLS$_$TWINCONTROL_$__$$_DESTROY + 457
  $000000000062C62E = STDCTRLS$_$TCUSTOMEDIT_$__$$_DESTROY + 78
  $000000000062B12E = STDCTRLS$_$TCUSTOMMEMO_$__$$_DESTROY + 110
  $000000000051ABB6 = CLASSES$_$TCOMPONENT_$__$$_DESTROYCOMPONENTS + 54
  $00000000005D1C6E = CONTROLS$_$TCONTROL_$__$$_DESTROY + 734
  $00000000005BEF09 = CONTROLS$_$TWINCONTROL_$__$$_DESTROY + 457
  $00000000005D476E = CONTROLS$_$TCUSTOMCONTROL_$__$$_DESTROY + 78
  $0000000000448ADE = FORMS$_$TSCROLLINGWINCONTROL_$__$$_DESTROY + 94
  $000000000044A4F6 = FORMS$_$TCUSTOMFORM_$__$$_DESTROY + 278
  $0000000000430ACB = SYSTEM$_$TOBJECT_$__$$_FREE + 27
  $0000000000445886 = FORMS_$$_BEFOREFINALIZATION + 22
  $00000000004355A9 = SYSTEM_$$_INTERNALEXIT + 89

For the last address gdb shows:

Code: Text
(gdb) disas 0x4355A9
Dump of assembler code for function SYSTEM_$$_INTERNALEXIT:
   0x0000000000435550 <+0>:     push   %rbx
   0x0000000000435551 <+1>:     push   %r12
   0x0000000000435553 <+3>:     lea    -0x108(%rsp),%rsp
   0x000000000043555b <+11>:    jmp    0x4355a9 <SYSTEM_$$_INTERNALEXIT+89>
   0x000000000043555d <+13>:    nopl   (%rax)
   0x0000000000435560 <+16>:    lea    0x7f6049(%rip),%rax        # 0xc2b5b0 <FPC_THREADVAR_RELOCATE>
   0x0000000000435567 <+23>:    mov    (%rax),%rax
   0x000000000043556a <+26>:    test   %rax,%rax
   0x000000000043556d <+29>:    je     0x43557c <SYSTEM_$$_INTERNALEXIT+44>
   0x000000000043556f <+31>:    lea    0x7f4f6a(%rip),%rdx        # 0xc2a4e0 <U_$SYSTEM_$$_INOUTRES>
   0x0000000000435576 <+38>:    mov    (%rdx),%edi
   0x0000000000435578 <+40>:    call   *%rax
   0x000000000043557a <+42>:    jmp    0x435587 <SYSTEM_$$_INTERNALEXIT+55>
   0x000000000043557c <+44>:    lea    0x7f4f5d(%rip),%rax        # 0xc2a4e0 <U_$SYSTEM_$$_INOUTRES>
   0x0000000000435583 <+51>:    add    $0x8,%rax
   0x0000000000435587 <+55>:    movw   $0x0,(%rax)
   0x000000000043558c <+60>:    lea    0x5b20cd(%rip),%rax        # 0x9e7660 <TC_$SYSTEM_$$_EXITPROC>
   0x0000000000435593 <+67>:    mov    (%rax),%rbx
   0x0000000000435596 <+70>:    lea    0x5b20c3(%rip),%rax        # 0x9e7660 <TC_$SYSTEM_$$_EXITPROC>
   0x000000000043559d <+77>:    movq   $0x0,(%rax)
   0x00000000004355a4 <+84>:    mov    %rbx,%rax
   0x00000000004355a7 <+87>:    call   *%rax
   0x00000000004355a9 <+89>:    lea    0x5b20b0(%rip),%rax        # 0x9e7660 <TC_$SYSTEM_$$_EXITPROC>  <== here AV occurs!
   0x00000000004355b0 <+96>:    cmpq   $0x0,(%rax)
   0x00000000004355b4 <+100>:   jne    0x435560 <SYSTEM_$$_INTERNALEXIT+16>
   0x00000000004355b6 <+102>:   lea    0x5b2193(%rip),%rax        # 0x9e7750 <TC_$SYSTEM_$$_WRITEERRORSTOSTDERR>
...

I found the source for this in file <installdir>/fpcsrc/rtl/inc/system.inc:
Code: Pascal
Procedure InternalExit;
var
  current_exit : Procedure;
{$ifdef FPC_HAS_FEATURE_CONSOLEIO}
  pstdout : ^Text;
{$endif}
{$if defined(MSWINDOWS) or defined(OS2)}
  i : longint;
{$endif}
Begin
{$ifdef SYSTEMDEBUG}
  writeln('InternalExit');
{$endif SYSTEMDEBUG}
{$ifndef CPUAVR}
  while exitProc<>nil Do
   Begin
     InOutRes:=0;
     current_exit:=tProcedure(exitProc);
     exitProc:=nil;
     current_exit();
   End;
{$endif CPUAVR}
...

I remember var system.exitProc from good old Turbo Pascal times. I found some uses of this var in some of my common units, but I double-checked that none of them is ever used in this project. Then I extended my program to display the value of system.exitProc at the beginning and at the end, and got 'system.ExitProc=4479168' both times.

For this address gdb finds procedure BeforeFinalization() in unit Forms, which only calls Application.DoBeforeFinalization():
Code: Pascal
procedure TApplication.DoBeforeFinalization;
var
  i: Integer;
begin
  if Self=nil then exit;
  for i := ComponentCount - 1 downto 0 do
  begin
    // DebugLn('TApplication.DoBeforeFinalization ',DbgSName(Components[i]));
    if i < ComponentCount then
      Components[i].Free;
  end;
end;

This inspired me to create this procedure:
Code: Pascal
procedure show_Application_Components;
   {displays the names of all Application.Components[]}

   function get_Component_Name(p: TObject): ansistring;
      begin
      if p=nil then exit('nil');
      if p is TComponent then exit(TComponent(p).Name + ':' + p.ClassName);
      exit(p.ClassName);
      end;

   var i: integer;
   begin
   writeln('show_Application_Components => ');
   for i:=Application.ComponentCount-1 downto 0 do
      writeln(i:2, ') ', get_Component_Name(Application.Components[i]));
   end;

and to call it in my main source file, directly after the line Application.Run(), which is the very last line of this file.

The result after playing a little with my program is this terminal output:
Code: Text
Event INI_save / Form1: Left=0 Top=0
show_Application_Components =>
 6) :TFormZ
 5) :TFormMM
 4) :TFormH
 3) :TFormR
 2) :THintWindow
 1) :TCustomTimer
 0) Form1:TForm1

So if the AV occurs again, I can see which classes were involved.

Question:
Does anybody have an idea how to improve this code so that it can detect which of those 'Components[].Free' calls fails?
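
One possible direction, given only as an untested sketch: free the owned components in the same order as TApplication.DoBeforeFinalization, but log before and after each `Free` and wrap the call in `try..except`. The helper name `FreeOwnedComponentsVerbose` and the demo program are hypothetical. To stay runnable without the LCL, the demo uses a plain `TComponent` as a stand-in for `Application`; in the real program the assumption is that the helper would be called as `FreeOwnedComponentsVerbose(Application)` right after Application.Run().

```pascal
program VerboseFreeDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: frees the components owned by AOwner one by one,
// last to first, and reports which Free call (if any) raises an exception.
procedure FreeOwnedComponentsVerbose(AOwner: TComponent);
var
  i: Integer;
  C: TComponent;
  Desc: string;
begin
  for i := AOwner.ComponentCount - 1 downto 0 do
  begin
    if i >= AOwner.ComponentCount then
      Continue; // an earlier Free may have removed more than one component
    C := AOwner.Components[i];
    // Build the description before Free; C is invalid afterwards.
    Desc := C.Name + ':' + C.ClassName;
    WriteLn('freeing ', i:2, ') ', Desc);
    Flush(Output); // keep the line visible even if the next call crashes
    try
      C.Free;
      WriteLn('   done   ', Desc);
    except
      on E: Exception do
        WriteLn('   FAILED ', Desc, ' => ', E.ClassName, ': ', E.Message);
    end;
  end;
end;

var
  Root, Child: TComponent;
begin
  // Stand-in for Application so the sketch runs without the LCL.
  Root := TComponent.Create(nil);
  Child := TComponent.Create(Root);
  Child.Name := 'First';
  Child := TComponent.Create(Root);
  Child.Name := 'Second';
  FreeOwnedComponentsVerbose(Root);
  Root.Free;
end.
```

Whether freeing the forms this early has side effects in the real program is not something this sketch can answer, so it is meant as a diagnostic aid rather than a fix.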

Quote
(I'm assuming that the destructor Destroy is never called explicitly, as it should be).
Yes, a Destroy() is called nowhere in my program and all used units.

Quote
Did you try with a range checking turned on?
Yes, my program and all used units have always been compiled with $R+ (for as long as I have worked on this program).

440bx

Something isn't quite right somewhere...

You mentioned address $0000 7FFF F504 69A0 and that's not possible.  The highest address in user mode is $0000 07ff fffe 0000


$0000 7FFF F504 69A0  <- your address
$0000 07ff fffe 0000  <- highest possible address


Presuming there is simply one "F" too many in there, the address is still too high to be valid in a normal application program as it would fall somewhere in the PEB or a TEB (presuming it falls somewhere valid), the interesting thing is, that address is not valid in that context (some loop or a destroy method.)

It is actually very surprising you didn't get an AV with that address.  It might be just a coincidence that it happened to point to a valid spot in memory.

Looking at the code you posted, the value in rax is the qword at [rbp - 0x10], there is no way that value is valid.  As I mentioned previously, it's very surprising an AV didn't occur.

The next thing I'd do is get a stack frame at that breakpoint to determine which code made the call.  The idea is to determine how rax got this invalid value.  Unfortunately, from what you mentioned,  that routine is called a lot.  One thought that comes to mind is to add a condition to the breakpoint.  Something along the lines of "rax > $07FF 0000 0000" (without the spaces.)

Disclaimer: I don't know if the condition I suggested above is acceptable to FpDebug.  Hopefully, @Martin will read this and add pertinent details to how to set the breakpoint properly.

Again, the one thing I don't understand is how you did not get an AV given that invalid address.  It is really unlikely (but, I admit, possible) for that address to be accessible.

Anyway, in that situation, I'd just try a few "off the cuff" things to get an idea of what's happening.  This is obviously not very practical as a "plan of action" for someone else to follow but, I don't have anything better.

Hartmut

Hello 440bx, I verified the value of rax on reaching BP $5d1b42 in a new debug run. See attached screenshot: its value is now not exactly the same as before, but very close to it. Now it is $0000 7FFF F504 6900. After reaching this BP, I continued the program to its very end, and no AV occurred. My PC is old and has 8 GB RAM = 8.589.934.592 = $2.0000.0000 bytes. I agree that this rax value is very strange.

Quote
Looking at the code you posted, the value in rax is the qword at [rbp - 0x10], there is no way that value is valid.
But from my understanding, when reaching the BP at $5d1b42, rax contains the qword at [rax+0xa0]. Am I wrong?
Code: Text
include/control.inc:5111                  for i := 0 to FAnchoredControls.Count - 1 do
00000000005D1B37 488b45f0                 mov    rax,QWORD PTR [rbp-0x10]
00000000005D1B3B 488b80a0000000           mov    rax,QWORD PTR [rax+0xa0]
00000000005D1B42 8b4010                   mov    eax,DWORD PTR [rax+0x10]

Quote
The next thing I'd do is get a stack frame at that breakpoint to determine which code made the call.  The idea is to determine how rax got this invalid value.
That would be far beyond my horizon, and I doubt that it would be enough to find the root cause (see the paragraph below).

Quote
Unfortunately, from what you mentioned,  that routine is called a lot.
That routine is called a lot, but the BP at $5d1b42 was hit only once, after having played only a little with the program. As I said, I assume that this BP will be hit more often after working with the program much longer and more intensively. But from the source file positions which I added to the "Stack trace:" table in reply #5, I assume that the caller(s) of that BP in TControl.Destroy() will be a longer chain of other Destroy() procedures. And we know from that table that the AV occurred when the topmost executed procedure was system.InternalExit(). I do not see much sense in struggling further at this BP as long as the AV does not occur; the ratio of effort to success seems too poor.

I hope to gain more by adding some more debug output at meaningful source positions, so that I get more detailed information if the AV occurs again some day.
E.g. by improving procedure show_Application_Components() in reply #5, or by trying to detect which of the 'Components[].Free' calls in procedure TApplication.DoBeforeFinalization() - also shown in reply #5 - fails.
Are there any ideas for this?
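
One possible shape for such debug output, as a minimal sketch only: log every owned component immediately before it is freed, so that the last line in the terminal names the component whose destructor failed. The helper name FreeComponentsLogged and the demo components are hypothetical; the sketch uses a plain TComponent from the Classes unit instead of the real Application object, and it is not the LCL's own DoBeforeFinalization code.

```pascal
program LogOwnedComponents;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: frees all components owned by AOwner, last one first,
// and logs each one before and after its destructor runs.
procedure FreeComponentsLogged(AOwner: TComponent);
var
  C: TComponent;
begin
  // Re-read ComponentCount on every pass: a destructor may free further
  // components of the same owner, so a fixed index range could go stale.
  while AOwner.ComponentCount > 0 do
  begin
    C := AOwner.Components[AOwner.ComponentCount - 1];
    WriteLn('freeing ', C.ClassName, ' "', C.Name, '"');
    Flush(Output);  // make the line visible even if Free crashes
    C.Free;         // the destructor removes C from AOwner's list
    WriteLn('  freed');
  end;
end;

var
  Owner, Child: TComponent;
  i: Integer;
begin
  // Stand-in for Application: an owner with three named children.
  Owner := TComponent.Create(nil);
  for i := 1 to 3 do
  begin
    Child := TComponent.Create(Owner);
    Child.Name := 'Child' + Chr(Ord('0') + i);
  end;
  FreeComponentsLogged(Owner);
  Owner.Free;
end.
```

In the real program the same loop would be run against Application shortly before exit. Whether that changes the timing enough to hide the AV is an open question.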
« Last Edit: August 31, 2025, 12:54:40 pm by Hartmut »

440bx

  • Hero Member
  • *****
  • Posts: 5805
I have to admit that I am at a loss... if the value in rax is what the debugger says it is then the instruction at $5D1B42 should always cause an AV because the address in rax is not even in the range that is accessible in user mode.

Based on the code you posted, [rax + $a0] is where FAnchoredControls resides.  IOW, at an offset $a0 from the beginning of the class that contains it.

Succinctly, I cannot explain why the instruction at $5D1B3B does not cause an AV since, if rax is the value shown by the debugger then, it definitely should AV on that.

Just in case, I'd like you to verify that the highest accessible address in that program's process is $7ff fffe 0000 (without spaces.)  To verify that, use the attached VirtualQueryEx (Windows only of course) example program.  Find your program on the left hand side, click on it, scroll the right hand side to the bottom, the last address should be $7ff fffe 0000.  If it isn't, then I have some re-thinking to do.

The attachment includes an executable, but the entire source code is there.  You can create your own executable if you wish.




Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11793
  • Debugger - SynEdit - and more
    • wiki
Just scanned through the posts. Hopefully not missed anything important...

About the resolved address: The line is only correct, if nothing, absolutely nothing else changed for the recompilation, except for adding debug info. If any other compiler setting changed, there is a chance that the change affects where addresses end up.

In FpDebug conditions, rax is written as %RAX.

But given that the exception occurred only once, I would assume either a race condition (if threads are involved) or otherwise a dangling pointer at some other point.

I.e. the error may manifest at the address found, but it may not originate there. A dangling pointer anywhere else could have corrupted the data that is accessed by the code found.

If that is the case, then running on Linux under valgrind (memcheck) would be a good start. On Windows you can try heaptrc with "keepreleased" in env (check docs for heaptrc). But heaptrc is at best 10% of what valgrind offers.
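
As a hedged sketch of the heaptrc side: besides the environment variable, the heaptrc unit exposes its settings as variables, so keepreleased can presumably also be switched on from code. The example assumes the program is built with -gh (which pulls in heaptrc implicitly); the declared(UseHeapTrace) guard follows the pattern shown on the Free Pascal wiki, and the program name and the dummy allocation are purely illustrative.

```pascal
program KeepReleasedSketch;

{$mode objfpc}{$H+}

var
  P: Pointer;
begin
  // UseHeapTrace and KeepReleased are declared by the heaptrc unit, which is
  // only present when the program is compiled with -gh; without -gh this
  // block is skipped at compile time.
  {$if declared(UseHeapTrace)}
  KeepReleased := True;  // ask heaptrc to keep track of released blocks
  {$endif}

  // Dummy allocation, only so that heaptrc has something to trace.
  GetMem(P, 64);
  FreeMem(P);
  WriteLn('done');
end.
```

The assignment should come as early as possible in the main program, since it can only affect blocks released after it. Keeping released blocks around presumably raises memory consumption, so this is a setting for debug builds only.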

Of course, if you are on FPC 3.2.2, there is a bug in the compiler. That bug can (very, very rarely) create bad code on 64-bit Windows, causing a dangling pointer where there is none. Well, it may be on other targets too, but afaik that has not been reproduced by anyone yet.
If you could reproduce the issue, I would say try with all optimization OFF, including for packages, including LCL.

But since you can't reproduce... On the other hand, if the error is in the Pascal code, then valgrind can detect it, even if it does not crash.
« Last Edit: August 30, 2025, 08:41:25 am by Martin_fr »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11793
  • Debugger - SynEdit - and more
    • wiki
7fff.... could be code in the kernel.

It wouldn't be data. It wouldn't be the expected data that the code wants to access. But it could be readable.

But then, either you are operating on freed mem, then heaptrc with keepreleased may help.
Or there is a dangling pointer elsewhere. Then you need valgrind.

Hartmut

  • Hero Member
  • *****
  • Posts: 1000
Thanks to 440bx and Martin_fr for your new posts.

Quote
Just in case, I'd like you to verify that the highest accessible address in that program's process is $7ff fffe 0000 (without spaces.)  To verify that, use the attached VirtualQueryEx (Windows only of course) example program.
The AV occurred only once, on Linux. The only Windows I have on this computer is a very old 32-bit Win7, which can only address 4 GB of the 8 GB of installed RAM. Are you sure that it makes sense to run VirtualQueryEx on this Win7 32-bit?



Quote
About the resolved address: The line is only correct, if nothing, absolutely nothing else changed for the recompilation, except for adding debug info.
For the recompilation nothing had to be changed, because debug info had already been active all the time (see screenshot in reply #2). With gdb I used exactly the same executable which caused the AV.

Quote
But given that the exception occurred only once, I would assume either a race condition (if threads are involved) or otherwise a dangling pointer at some other point.
This program uses no threads.

Quote
If that is the case, then running on Linux, and under valgrind (memcheck) would be a good start.
I know absolutely nothing about valgrind and it's not installed on my system. I found https://wiki.freepascal.org/Debugging_with_Valgrind. I fear that it would cost plenty of time to learn everything until it really works. Because the AV is not reproducible and occurred only once in more than 1 year, I want to postpone this effort until the AV occurs more often. But I checked with heaptrc (see below).

Quote
Of course, if you are on FPC 3.2.2, there is a bug in the compiler.
For this older project I use Lazarus 2.0.10 and FPC 3.2.0.

Quote
But then, either you are operating on freed mem, then heaptrc with keepreleased may help.
I enabled 'heaptrc' in Compiler Options / Debugging (there is no option for 'keepreleased', see screenshot in reply #2). Then I started my program several times and played with a lot of modal and non-modal dialogs, but 'heaptrc' never reported any complaints.

jamie

  • Hero Member
  • *****
  • Posts: 7302
My 2 Cents worth.

You have a control linked to another, like Actions for example that are not getting notified that the control no longer exists.

 I suppose even the anchor list of controls could be attempting to address some controls when it should not be?

Just a thought.

Jamie
The only true wisdom is knowing you know nothing

440bx

  • Hero Member
  • *****
  • Posts: 5805
Quote
The AV occurred only once, on Linux. The only Windows I have on this computer is a very old 32-bit Win7, which can only address 4 GB of the 8 GB of installed RAM. Are you sure that it makes sense to run VirtualQueryEx on this Win7 32-bit?
I thought this occurred in Windows, likely my mistake.  Anyway, if it had been under Windows then that address is not accessible in user mode. Since this is happening in Linux, I simply have no idea of what is ok or not ok because my knowledge of Linux is extremely deficient.

As far as the VirtualQueryEx program I posted, it will run in 32 bit Windows 7 and quite likely even XP (it was originally written on Win 95.)

One thought, which may or may not help: running your program under that old 32-bit Win 7 might shed some light on what went wrong in the Linux environment.  It's a bit of a long shot, but it could prove useful.

Given that the problem you're experiencing is on Linux, I don't know that I can be of much help.

Thaddy

  • Hero Member
  • *****
  • Posts: 18306
  • Here stood a man who saw the Elbe and jumped it.
About your remark about exitproc:

You should manage your own exitproc correctly: not by setting it directly, but by using AddExitProc instead.
That does not break the exitproc chain and ensures that no exitproc is skipped.
This can very well be the cause of your problem (if you use your own exitproc, that is).
If you want to use exitproc, use it properly: AddExitProc. That was already the case in TP, by the way.
Never ever assign to the exitproc procedure pointer directly.

InternalExit walks all exitprocs and executes them in LIFO order. If you assign your own exitproc to exitproc directly, all other exitprocs disappear and will not be executed.
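
A minimal sketch of what that could look like, assuming two illustrative cleanup routines (FirstCleanup and SecondCleanup are hypothetical names, not taken from the program discussed here). AddExitProc is the System unit routine; everything else is only for demonstration.

```pascal
program ExitProcChain;

{$mode objfpc}{$H+}

// Hypothetical cleanup routines; in a real program these would release
// resources or write final log lines.
procedure FirstCleanup;
begin
  WriteLn('FirstCleanup');
end;

procedure SecondCleanup;
begin
  WriteLn('SecondCleanup');
end;

begin
  // AddExitProc adds to the chain instead of overwriting the ExitProc
  // pointer, so handlers registered earlier (by this program or by other
  // units) still run at exit.
  AddExitProc(@FirstCleanup);
  AddExitProc(@SecondCleanup);
  WriteLn('main program finished');
end.
```

With the LIFO order described above, SecondCleanup would be expected to run before FirstCleanup.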
« Last Edit: August 30, 2025, 02:30:33 pm by Thaddy »
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

 
