Forum > General

Circular referenced interfaces automatic collector


Hi folks. Here is a unit I've written today. Can anybody look at it and say if it worth any note. I am not a pro and am not offended by reviews containing words like "shit" if there is something more specific in them ;).

Many people enjoy using interfaces since they are reference-counted and destroy themselves automatically  8-) . But there is a problem when one interface refers to another which refers (maybe indirectly) to the former: they will never expel :'( .

People tried to write a garbage collector for Pascal, but it requires replacing standard memory manager which kills heaptrc, let alone the total bulk of the task and overheads of automatic GC over manual one; some programmers just think that if you chose Pascal, you chose old good manual memory management.

But in some applications you can't avoid structures with impredictable cross-referring. There are reference matrices and some other ways of keeping all the reference map in sight, but in many means nothing can replace for us mere objects connected by pointers to each other. For such structures, we instead of tracing garbage collector inside memory manager can have a reference-analyzing collector: it can be easily, firstly, dedicted to manage only some groups of objects we want  it to without slowing down others; and, secondary, combined with such an easy but powerful procedure as reference counting.

If we want to keep an eye on references, the first Free Pascal tool we are thinking about are interfaces.

The presented module, cref5cgc (which means "Cyclical-REFerence algorithm of 5 Colors Garbage Collecting tool"), provides basical instrumentary for creating interfaces which will destroy themselves despite of being circulary cross-referred. An interface I5cCollected is exported for them, and an abstract class T5cCollected realizes functionality of this interface.

The core of the algorithm is taken from this research. It was modified to support accurate calling of destructors. Managed interfaces, additionally to reference counter, have an attribute of a color (exported ic... constants) and buffered flag (which requires adding just 1 byte field per object). Normally, all of them are coloured black. If an interface dereferred but not to 0 to be destroyed, it's _Release method paints it purple and puts it into inner cache (currently, it's an inner threadvar of type TSetOfPointers — red-black tree, implemented by FCL-STL unit gset). It can be repainted black again if an _AddRef shows it that it is not yet forgotten. When T5cCollected.collectCycles() class method is invoked (it happens automatically when cache achieves a specific number of objects, and on thread end), all purple objects are tested for if they are in a set of objects referring only each other. If they are, method predestroy is invoked for each of the interfaces except ones already unreferred from others; for the latter, destructor is called. While predestroys and destroys derefer objects of the dead loop, reference counts of others of them reachs 0, which as usual forces their destruction (which mostly happens after pre-destruction, which must unlink all what possible but leave the object useable for its parents' destructors needs, at least for performing on them _Release).

Writing a descendant for T5cCollected, one must override virtual methods nextChild(out current: pointer;var buf: pointer) (iterates by all outcoming references to I5cCollected interfaces) and predestroy, as well as destructor destroy. Remember that you don't know if predestroy will be called prior to destroy; to inform, use e.g. standard FreeAndNil() function in pre-destructor on the resources on which you use method Free() in destructor (see example code).

For reducing the tracing load, you can paint objects which are guaranteed not to have cyclical references green. Moreover, if some method is guaranteed not to produce any new cycles (while might keep existing ones), wrap it into pair of special class methods as following:

--- Code: ---procedure TMyClass.MyFastMethod(a,b,c:I5cCollected{furtherly we see, we should type them pointer});
var v:boolean=true;

--- End code ---

One has to know that destructors and pre-destructors are called from such "green-time" blocks, so if one needs a cycle-unsafe block in them (it is not desired but fully possible), it must be wrapped in the same manner but with initially false buffer variable. That is also notable that the example above won't protect a,b and c from becoming purple: Pascal derefers interface parameters on exiting any function; more lurking trap is that it also sometimes derefer temporary variables created inside a function (e.g. when you use Obj as intf expression) only after it exits. To contend with it, one should pass interfaces as either objects or pointers, and wrap all object modifying stuff into special included procedure (often inline, as can be see in the realization of collectCycles inside the unit).

Details and limitations

* You can safely have references from and to unmanaged interfaces, but it's on your own responsibility to avoid cycles in them.
* It seems that we can do significantly less traversing in many applications if we introduce generations of managed interfaces. If all links to interfaces of specific generation which existed in some prior collect phase are alive, then all these objects are also still alive. The mechanism requires not only generation field in each object, but also some way to notify an unlinked object that it is unlinked from a newer generation object and thus is not yet possible garbage.
* The automatical operations on interfaces involves them into multiple unnecessary _AddRef..._Release steps, which fastly overpopulate the collector cache. They are performed also in "green time" mode, and even in collecting time (the methods are overloaded to be switched off by an inner switch for this time, but they are still called). Maybe we should consider replacing the interfaces with classes and do counting manually, which of course is laborious and bug-attracting.
* The collector is stop-the-world. It traverses indetermined number of variables each collecting phase and then releases indetermined volume of dynamic stuff which can slow down your application significantly.
* The collector is not multithread. You can use managed objects in different threads if you have created the threads with T5cCollected.beginThread method, but use each object in exactly one thread. Look into the research paper mentioned above to see 7-color concurrent algorithm.

First, you get a WOW!
Second you get a WTG!
Good understandable Comments, a nice example, that made you afford clear.
I am impressed.
If I could I'd grant you a (BEER) and give a (Y) and a big +1

Thanks  :)


[0] Message Index

Go to full version