
Topic: really sad sigsegv in freemem. perhaps even tragic.

yogo1212

  • New Member
  • *
  • Posts: 22
really sad sigsegv in freemem. perhaps even tragic.
« on: November 16, 2014, 04:04:03 pm »
Hi :-)

I'm experiencing weird problems with my game engine.

There is a SIGSEGV in FreeMem. Very simple code:
Code: [Select]
GetMem(tmpbytes, len);
Move(Data[i * esize], tmpbytes[0], len);
Move(tmpbytes[0], Data[(i + 1) * esize], len);
Freemem(tmpbytes, len);

It works about twenty times. Then, when the method is called from one particular resource-loader, it crashes.

The only change I made before compiling (before that it worked) was to wrap the data being stored in another type (TVec3 -> TCol3, both records with three floats, one with x, y, z, the other with r, g, b), and I can't see why this caused the error to appear.

If the error occurred in one of the moves, I would probably start looking at my indices. But...
You know...

I already checked that tmpbytes and len don't change, and really I am quite puzzled  %)

This guy appears to have had the same error:
http://forum.lazarus.freepascal.org/index.php?topic=20403.0

Could one of you help find the cause?
« Last Edit: November 16, 2014, 04:15:20 pm by yogo1212 »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #1 on: November 16, 2014, 04:26:44 pm »
Quote
Code: [Select]
Move(tmpbytes[0], Data[(i + 1) * esize], len);

Are you sure that there is enough space in data?

If you write behind the end of data, then that will cause a crash.

If you write behind the end of data and tmpbytes was allocated in memory right after data, then you get an error when freeing tmpbytes
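
One way to test the "enough space in data?" question directly is to route every copy through a helper that knows the buffer size. The following is only a sketch: `CheckedMove` is a hypothetical helper (not part of the RTL), and it assumes source and destination live in the same buffer, as they do in the snippet above.

```pascal
program checkedmove;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper, not an RTL routine: copies Count bytes inside one
// buffer of BufSize bytes and refuses any range that leaves the buffer.
procedure CheckedMove(Buf: PByte; BufSize, SrcOfs, DstOfs, Count: PtrUInt);
begin
  // Written as subtractions so the checks themselves cannot overflow.
  if (SrcOfs > BufSize) or (Count > BufSize - SrcOfs) then
    raise ERangeError.Create('source range leaves the buffer');
  if (DstOfs > BufSize) or (Count > BufSize - DstOfs) then
    raise ERangeError.Create('destination range leaves the buffer');
  Move(Buf[SrcOfs], Buf[DstOfs], Count);
end;

var
  Data: PByte;
begin
  GetMem(Data, 64);
  try
    FillChar(Data^, 64, 0);
    CheckedMove(Data, 64, 0, 32, 32);    // bytes 32..63: fits exactly
    try
      CheckedMove(Data, 64, 0, 40, 32);  // bytes 40..71: 8 bytes too far
    except
      on E: ERangeError do
        WriteLn('caught: ', E.Message);
    end;
  finally
    FreeMem(Data);
  end;
end.
```

With a wrapper like this the bad call fails at the point of the copy instead of at some later FreeMem.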

yogo1212

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #2 on: November 16, 2014, 04:35:52 pm »
Quote
Are you sure that there is enough space in data?
Hmm, maybe I should have posted the complete function:
Code: [Select]
procedure TContinuousMemoryManager.Insert(i: cardinal);
var
  tmpbytes: PByte;
  len: cardinal;
begin
  len := (used - i) * esize;
  if used = capacity then
  begin
    Inc(capacity, bsize);
    tmpbytes := Data;
    GetMem(Data, capacity * esize);
    Move(tmpbytes[0], Data[0], i * esize);
    Move(tmpbytes[i * esize], Data[(i + 1) * esize], len);
    Freemem(tmpbytes);
  end
  else if len <> 0 then
  begin
    GetMem(tmpbytes, len);
    Move(Data[i * esize], tmpbytes[0], len);
    Move(tmpbytes[0], Data[(i + 1) * esize], len);
    Freemem(tmpbytes);
  end;
  Inc(used);
end;

Quote
If you write behind the end of data, then that will cause a crash.
'can cause'. but i can step past both moves.

Quote
If you write behind the end of data and tmpbytes was allocated in memory right after data, then you get an error when freeing tmpbytes
that's a really nice idea! just give me a moment to check (though i'm sure data is big enough - because i wrote the code  :P ).


UPDATE:
tmpbytes: pbyte($00007FFFF7FDF260)  ' co'
Data: pbyte($00007FFFF7F812B0)  #16'ddk]'#182#168#11#16'ddk'#1
used: 9
capacity: 50
esize: 32
i: 0
len: 288

hmm, doesn't look like it :-(
is there a possibility that i accidentally destroyed fpc's internal allocation table?
is this worth a bug-report?
how do i debug fpc-internals?
« Last Edit: November 16, 2014, 04:44:57 pm by yogo1212 »

Martin_fr

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #3 on: November 16, 2014, 06:17:49 pm »
Quote
but i can step past both moves.
Quote
is there a possibility that i accidentally destroyed fpc's internal allocation table?

If any of the moves writes outside boundaries of the allocated memory, then it may destroy other data.

The move will not crash, unless you write to memory not owned by your app (then the OS will trap it). Most times crossing the boundaries of one mem block, will keep you in mem owned by your app.

However, at some later time the memory you overwrote will be accessed, and then just anything can happen.

If at any move you happen to write into fpc internal structures, then the next, or second next, or maybe 10 or 20 get/freemem later is going to crash.

Since very likely after the allocated block there will be other mem managed by fpc, there will be an internal fpc structure. So if you write too much, then it is only a question of time until fpc accesses the node that was overwritten.
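
The delayed failure described here can be made visible with guard bytes: allocate a few extra bytes behind the usable area, fill them with a known pattern, and check the pattern later. This is a rough sketch, not how the FPC heap manager works internally; `GuardedGetMem`, `GuardIntact`, `GuardSize` and `GuardByte` are made-up names for illustration.

```pascal
program guardbytes;
{$mode objfpc}{$H+}

const
  GuardSize = 8;     // hypothetical: number of guard bytes behind the block
  GuardByte = $A5;   // hypothetical: pattern written into the guard area

// Allocates Size usable bytes followed by GuardSize pattern bytes.
function GuardedGetMem(Size: PtrUInt): PByte;
begin
  GetMem(Result, Size + GuardSize);
  FillChar(Result[Size], GuardSize, GuardByte);
end;

// Returns True while the pattern behind the usable area is untouched.
function GuardIntact(P: PByte; Size: PtrUInt): Boolean;
var
  k: PtrUInt;
begin
  Result := True;
  for k := 0 to GuardSize - 1 do
    if P[Size + k] <> GuardByte then
      Exit(False);
end;

var
  Buf: PByte;
begin
  Buf := GuardedGetMem(32);
  FillChar(Buf[0], 32, 0);   // stays inside the usable area
  WriteLn('after legal write:  ', GuardIntact(Buf, 32));
  FillChar(Buf[0], 33, 0);   // one byte too many, lands in the guard area
  WriteLn('after overrun by 1: ', GuardIntact(Buf, 32));
  FreeMem(Buf);
end.
```

The overrun itself does not crash, because the extra byte still belongs to the same allocation. Only the later check notices it, which mirrors the "some get/freemem later" behaviour described above.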


Quote
is this worth a bug-report?
I highly doubt the bug is in free mem. But if it is, you will probably need better proof than your current code. (The error could be in some other procedure, if "move" is used elsewhere.)

Couple of things:

Use heaptrc -gh
It adds a few checks, like a checksum on freed mem. It does not, however, detect the line where the error happens. It will (if it detects your case) warn you at some later time.


Add asserts. plenty of them.

Code: [Select]
len := (used - i) * esize;
what if negative?

Code: [Select]
records with three floats
Just 3 floats, nothing else?

You are aware that if you use "move" on data, that contains ansistring or array, then you will be in for trouble too?
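
As an illustration of the "add asserts" advice, here is a minimal sketch of a slot-shifting routine with its preconditions asserted. It is a simplified stand-in for the Insert method in this thread, not the original code: the globals, the fixed `esize` and the name `ShiftUp` are assumptions, and it relies on Move coping with overlapping ranges.

```pascal
program assertdemo;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}   // without this (or -Sa) Assert calls are compiled out
uses
  SysUtils;

const
  esize = 32;      // element size in bytes, as in the debugger dump above

var
  Data: PByte;
  used, capacity: Cardinal;

// Shifts elements [i, used) up by one slot; assumes a free slot exists.
procedure ShiftUp(i: Cardinal);
var
  len: Cardinal;
begin
  Assert(Data <> nil, 'Data not allocated');
  Assert(i <= used, 'insert index beyond used');
  Assert(used < capacity, 'no free slot left');
  len := (used - i) * esize;
  // The destination range must end inside the allocation.
  Assert((i + 1) * esize + len <= capacity * esize, 'shift overruns Data');
  Move(Data[i * esize], Data[(i + 1) * esize], len);
  Inc(used);
end;

begin
  capacity := 4;
  used := 2;
  GetMem(Data, capacity * esize);
  FillChar(Data^, capacity * esize, 0);
  ShiftUp(0);      // fine: used becomes 3
  try
    ShiftUp(7);    // i > used, stopped by the second assertion
  except
    on E: EAssertionFailed do
      WriteLn('assertion: ', E.Message);
  end;
  FreeMem(Data);
end.
```

The assertion fires before any byte is moved, so a bad index is reported at the call that caused it.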

yogo1212

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #4 on: November 16, 2014, 07:01:41 pm »

Quote
The move will not crash, unless you write to memory not owned by your app (then the OS will trap it). Most times crossing the boundaries of one mem block, will keep you in mem owned by your app.

Quote
Since very likely after the allocated block there will be other mem managed by fpc, there will be an internal fpc structure. So if you write too much, then it is only a question of time until fpc accesses the node that was overwritten.

and i thought i was aware of all this stuff.. does internal and application-memory get mixed when not using c-mem also?

ok, i wrapped getmem, move and freemem in order to be sure i wasn't doing anything bad with illegal access:
Code: [Select]
data getmem: 00007FFFF7F812B0 1600
// ....
tmpbytes getmem: 00007FFFF7EBD180 288
move 00007FFFF7F812B0 to 00007FFFF7EBD180: 288
move 00007FFFF7EBD180 to 00007FFFF7F812D0: 288
freemem 00007FFFF7EBD180
EDIT:
maybe i should explain what the code does..
it adds a slot between elements at index i by moving all data at and behind i to i+1.
in this case, it shifts the whole data by 32 bytes. (B0->D0)
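
A side note on the shift itself: as far as I know, Move in the FPC RTL handles overlapping source and destination ranges, so the temporary buffer should not be strictly necessary. A small sketch on a plain byte array (the array and its contents are made up for the example):

```pascal
program shiftinplace;
{$mode objfpc}{$H+}

var
  // Five used bytes followed by three free ones.
  Buf: array[0..7] of Byte = (1, 2, 3, 4, 5, 0, 0, 0);
  k: Integer;
begin
  // Shift the 5 used bytes up by one position. Source [0..4] and
  // destination [1..5] overlap; Move is expected to cope with that.
  Move(Buf[0], Buf[1], 5);
  for k := 0 to 5 do
    Write(Buf[k], ' ');   // prints: 1 1 2 3 4 5
  WriteLn;
end.
```

Slot 0 keeps its old value until the caller stores the new element there, just like the freed-up slot at index i in the Insert method.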

Quote
I highly doubt the bug is in free mem. But if it is, you will probably need better proof than your current code. (The error could be in some other procedure, if "move" is used elsewhere.)
i know that usually it's your own fault when something is broken but i have seen errors in libraries, the windows-api or gnu-libc.
that's why i provided a link to a thread with some guy who also had segfaults in freemem (i couldn't spot the error in his code either)

Quote
Use heaptrc -gh
i have no idea what that is but i've set up valgrind. the output is attached. (wtf ? 0x161ab4c701047125 !!)
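
For reference, heaptrc is a unit shipped with FPC that replaces the heap manager with a checking one, and `-gh` is the compiler switch that pulls it in automatically. A minimal sketch of using it explicitly, with a deliberate leak so that there is something to report (the program itself is made up for illustration):

```pascal
program heaptrcdemo;
{$mode objfpc}{$H+}
// Either list heaptrc as the FIRST unit (as done here), or leave it out
// of the uses clause and compile with -gh instead. Do not do both.
uses
  heaptrc;

var
  P: PByte;
begin
  GetMem(P, 16);
  P[0] := 1;
  FreeMem(P);      // correctly released

  GetMem(P, 32);   // deliberately never freed
  P[0] := 2;
end.
```

When the program exits, heaptrc prints a summary of the heap and lists blocks that were never freed. Compiling with `-gl` as well adds line information to the reported call traces.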

Quote
Add asserts. plenty of them.
naah, you're right there is hardly any debug-output in my code... i will have to think about doing that by default.
but so far, i haven't been able to reproduce the error reliably in smaller test-applications.

Quote
Code: [Select]
len := (used - i) * esize;
what if negative?
all types are cardinal. and i must always be <= used. i'm adding checks for big and absurd numbers anyway :-)
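
Since everything is cardinal, a violated `i <= used` does not give a negative length; it gives a huge positive one. A small sketch of that effect, with range and overflow checks switched off on purpose and with made-up values:

```pascal
program cardinalwrap;
{$mode objfpc}{$H+}
{$R-}{$Q-}   // range and overflow checks off, as in many release builds

var
  used, i, esize, len: Cardinal;
begin
  used := 3;
  i := 5;        // violates the i <= used assumption
  esize := 32;
  len := (used - i) * esize;
  // len cannot hold -64; it ends up as a value close to 2^32 instead.
  WriteLn('len = ', len);
end.
```

A Move with such a length would run far past the end of the buffer, so a check like `i <= used` before computing len is cheap insurance.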

Quote
Just 3 floats, nothing else?
nothing else. packed, even.

Quote
You are aware that if you use "move" on data, that contains ansistring or array, then you will be in for trouble too?
that is the downside of having highly-integrated strings; but i can assure you, i always copy strings into arrays of char before getting nasty and make sure i pass the pointer to the first element to c-libraries, for instance.
« Last Edit: November 16, 2014, 07:10:49 pm by yogo1212 »

Martin_fr

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #5 on: November 16, 2014, 07:55:29 pm »
I am not sure about cmem.

valgrind:
Code: [Select]
Invalid read of size 16
==25055==  Address 0xd4a75d0 is 5,088 bytes inside a block of size 5,100 alloc'd

This is normal. When there are 12 bytes needed, it may be more efficient to read 16.

Code: [Select]
Address 0x161ab4c701047125 is not stack'd, malloc'd or (recently) free'd
And that is not good.

So at
enginememory.pas:384
you probably use a pointer that is dangling or uninitialized.


Find out what variable it is.

Then (if it is NOT a local var) you can set a watchpoint to break whenever it changes.

if it is a member of an object, ensure that the object has not been destroyed

-- edit /Correction

Code: [Select]
procedure waitfree_fixed(pmc: pmemchunk_fixed; poc: poschunk);
begin
{$ifdef FPC_HAS_FEATURE_THREADING}
  entercriticalsection(heap_lock);
{$endif}
  pmc^.next_fixed := poc^.freelists^.waitfixed;
  poc^.freelists^.waitfixed := pmc;
{$ifdef FPC_HAS_FEATURE_THREADING}
  leavecriticalsection(heap_lock);
{$endif}
end;

Probably while following the linked list of mem chunks. So something (your code) has overwritten a part of the fpc mem management structure.
« Last Edit: November 16, 2014, 08:14:30 pm by Martin_fr »

yogo1212

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #6 on: November 17, 2014, 01:21:14 am »
nice clues!

sadly, i have already spent too much time with this problem for today :-p

i'll resume the witch-hunt tomorrow

EDIT:
one more thing before i go:
the attached file is a screenshot of my debug-view.
i'm way too tired..
« Last Edit: November 17, 2014, 01:57:47 am by yogo1212 »

Martin_fr

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #7 on: November 17, 2014, 01:39:49 am »
Use watchpoints (data break points)
http://wiki.lazarus.freepascal.org/IDE_Window:Breakpoints

Always define them as global scope. write access only

Use one to determine when the pointer in Data changes.

Use one to determine if and when the first byte after data changes: data[capacity*esize]

When data changes, you need to delete, and recreate the 2nd watchpoint

yogo1212

Re: really sad sigsegv in freemem. perhaps even tragic.
« Reply #8 on: December 01, 2014, 11:21:09 am »
OK

I found it!

in some earlier code i had used three arrays and i wrote over the boundaries of the second.
that left the control-structure of the third array in a broken state and 10 or so allocations later everything breaks.

the only reason i was able to find that was the git log :-D
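
For out-of-bounds writes through an array index, like the one that turned out to be the cause here, the compiler's range checking can catch the bad access where it happens. A minimal sketch, assuming a dynamic array and a runtime index (both made up for the example); note that this does not cover raw pointer access or Move.

```pascal
program rangecheck;
{$mode objfpc}{$H+}
{$R+}   // range checking on, same as -Cr on the command line
uses
  SysUtils;

var
  A: array of Integer;
  idx: Integer;
begin
  SetLength(A, 4);
  idx := 4;        // one past the last valid index (0..3)
  try
    A[idx] := 42;
    WriteLn('write went through unchecked');
  except
    on E: ERangeError do
      WriteLn('range check caught the bad index: ', E.Message);
  end;
end.
```

With `{$R-}` the same write would silently land behind the array, which is exactly the kind of damage that only shows up several allocations later.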

 
