I would appreciate a pointer or two on how to traverse the domhtmldoc manually.
There is
http://wiki.lazarus.freepascal.org/GeckoPort, the examples, and a long thread on this forum,
http://forum.lazarus.freepascal.org/index.php?topic=15352.0, that went into the dark "bowels" of TGeckoBrowser.
A piece of advice that will save you quite some time figuring out how things work:
- install Firebug in Firefox
- load a page you want to analyze
- open Firebug, go to the console and browse the DOM from the command line, starting with 'document'. Code completion works nicely. For example, to see what GetElementsByTagName('*') gives, type document.getElementsByTagName("*") in the console (JavaScript is case-sensitive, so the method starts with a lowercase 'g' there).
- when you find what you need, translate it to Pascal. nsIDOMElement has a lot of descendants, and some of the methods you used in the Firebug console will be declared in those descendants rather than in nsIDOMElement itself. Firebug uses late binding and discovers all methods of the interface (it talks to the descendant), while TGeckoBrowser uses early binding, so you need to tell the compiler which descendant you are talking to. This is done with an explicit "cast". Example: setting the value of an 'input' element becomes '(nsIDOMElement as nsIDOMHTMLInputElement).SetValue(s.AString);'. The file nsXPCOM.pas has all the interface declarations, and a quick search for the method will give you the name of the descendant(s). The thread mentioned earlier has many examples.
- warning: TGeckoBrowser uses an old version of XULRunner. You'll encounter methods in Firebug that are not yet implemented in that old XULRunner.
- the above method works for more than browsing. You can also try out modifying attributes, adding/removing elements, etc.
Traverse the domhtmldoc manually to form the source code back (formatting would be gone of course, and the attributes are adjusted to their current state).
Everything is adjusted to its current state. If a piece of JavaScript adds or removes a few items, you see only the result and have no way to find out what came from the HTML and what was done by the script. If a new page is loaded in the OnLoad of the body (script redirection), you'll get only the new page and won't find any trace of the initial page.
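To get a feel for the manual traversal before writing any Pascal, something like the following can be pasted into the Firebug console. It is only a rough sketch: serializeNode is a name made up for this example, it relies only on the standard DOM properties nodeType, nodeName, nodeValue, attributes and childNodes, it skips comments and the doctype, and it does not escape special characters or treat empty elements such as 'br' specially.

```javascript
// Node type constants as defined by the DOM specification.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical helper: rebuild markup from the current state of the tree.
// 'node' is any DOM node (or an object with the same properties).
function serializeNode(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ""; // comments, doctype, etc. are ignored in this sketch
  }
  var tag = node.nodeName.toLowerCase();
  var out = "<" + tag;
  var i;
  // Attributes reflect what the DOM holds now, not what the source said.
  for (i = 0; i < node.attributes.length; i++) {
    var attr = node.attributes[i];
    out += " " + attr.name + '="' + attr.value + '"';
  }
  out += ">";
  // Recurse into the children in document order.
  for (i = 0; i < node.childNodes.length; i++) {
    out += serializeNode(node.childNodes[i]);
  }
  return out + "</" + tag + ">";
}
```

In the console, serializeNode(document.documentElement) should return the rebuilt markup for the whole page. The Pascal version would follow the same recursion, with the corresponding members looked up in nsXPCOM.pas and the explicit casts described above wherever a descendant interface is needed.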