Author Topic: TGeckoBrowser's GetElementsByTagName func.  (Read 41909 times)

anna

  • Sr. Member
  • ****
  • Posts: 426
TGeckoBrowser's GetElementsByTagName func.
« on: November 28, 2011, 06:21:27 am »
Hello. Can anyone explain what the first argument of GetElementsByTagName must be? It is some nsAString type. What is that?
http://img507.imageshack.us/img507/7918/31619505.png
http://img31.imageshack.us/img31/4683/39765490.png

I'm trying this code:
var
  s, s2: string;
begin

  {xulrunner unpacking into project folder}

  GeckoBrowser1 := TGeckoBrowser.Create(Form1);
  GeckoBrowser1.Parent := Panel1;
  GeckoBrowser1.Left := 1;
  GeckoBrowser1.Top := 1;
  GeckoBrowser1.Width := 903;
  GeckoBrowser1.Height := 632;
  {GeckoBrowser1.u}
  GeckoBrowser1.LoadURI('http://google.com');
  s := 'img';
  GeckoBrowser1.ContentDocument.GetElementsByTagName(@s).Item(1).FirstChild.GetNodeValue(@s2);

WinXP SP3 Pro Russian 32-bit (5.1.2600)

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #1 on: November 28, 2011, 06:27:14 am »
Is it necessary to use TIStringImpl? I don't completely understand it, but I think TIStringImpl converts a normal string to nsAString.

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #2 on: November 28, 2011, 12:46:41 pm »
Use IInterfacedString when passing strings to Gecko. TIStringImpl is a container for IInterfacedString that adds reference counting, so you don't have to destroy the object explicitly. You can't access TIStringImpl directly. Use NewString, which creates a TIStringImpl and returns the IInterfacedString.
A simple example that gets all tags in the document:

Code:
uses
  GeckoBrowser,nsGeckoStrings,nsXPCOM;
....

var
  s,s2:IInterfacedString;
  NL:nsIDOMNodeList;
  i:integer;
  sName:string;
begin
  GeckoBrowser1.LoadURI('http://google.com');
  s:=newstring(widestring('*'));
  s2:=newstring;
  NL:=GeckoBrowser1.ContentDocument.GetElementsByTagName(s.AString);
  i:=NL.GetLength();
  while i>0 do
    begin
    NL.Item(i-1).GetNodeName(s2.AString);
    sName:=s2.ToString;
    i:=i-1;
    end;
end;

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #3 on: November 28, 2011, 07:11:32 pm »
Your code finds only 3 tags:
BODY
HEAD
HTML

Why? What about img or frame?

Please tell me how to extract a tag's value (in other words, innerHTML). GetNodeValue returns an empty string...
« Last Edit: November 29, 2011, 04:36:44 am by anna »

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #4 on: November 29, 2011, 04:36:53 am »
Also, NL.Item(i-1) has no GetElementsByTagName function, so it's impossible to get all tags recursively.

I tried the following code:

  GeckoBrowser1.LoadURI('http://lazarus.freepascal.org');
  s:=newstring(widestring('BODY'));
  s2:=newstring;
  NL:=GeckoBrowser1.ContentDocument.GetElementsByTagName(s.AString);
  NL2:=NL.Item(0).GetChildNodes();
  NL2.Item(0).GetNodename(s2.AString);
  sName:=s2.ToString;
  memo1.text:=memo1.text+sName+#$0D#$0A;

But I get a SIGSEGV error on the line NL2.Item(0).GetNodename(s2.AString);
« Last Edit: November 29, 2011, 06:31:31 am by anna »

ludob

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #5 on: November 29, 2011, 10:05:32 am »
What started as a question on nsAString type became a "solve my problem" ;D

The problem is that GeckoBrowser1.LoadURI loads the page in the background, so when you call GeckoBrowser1.ContentDocument right away you get a default empty page, as explained here: https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIWebNavigation.
Split the 'loading page/getting tags' code into two routines behind two buttons and you'll see what I mean.

Anticipating your next question: GeckoBrowser1.OnDocumentComplete can be used to detect when the page has loaded completely.
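
To illustrate the point about background loading, here is a minimal sketch that reads the tags only once the document-complete event fires, instead of right after LoadURI. It assumes a form with GeckoBrowser1 and a Memo1, the units GeckoBrowser, nsGeckoStrings and nsXPCOM in the uses clause, and that the method is assigned to the browser's OnDocumentComplete event; the handler name is just the one the IDE would typically generate.

```pascal
// Sketch only: walk the DOM after the page has finished loading.
// Assumption: this method is hooked up to GeckoBrowser1.OnDocumentComplete.
procedure TForm1.GeckoBrowser1DocumentComplete(Sender: TObject);
var
  s, s2: IInterfacedString;
  NL: nsIDOMNodeList;
  i: Integer;
begin
  s := NewString(WideString('*'));   // '*' matches every tag
  s2 := NewString;                   // empty buffer that receives node names
  NL := GeckoBrowser1.ContentDocument.GetElementsByTagName(s.AString);
  if not Assigned(NL) then
    Exit;
  // The cast keeps the loop bounds signed, so an empty list runs zero times.
  for i := 0 to Integer(NL.GetLength()) - 1 do
  begin
    NL.Item(i).GetNodeName(s2.AString);
    Memo1.Lines.Add(s2.ToString);
  end;
end;
```

With this in place, the button handler only needs to call GeckoBrowser1.LoadURI; the listing happens when the event arrives.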


ludob

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #6 on: November 29, 2011, 07:27:52 pm »
Before I forget it,
Quote
  NL2.Item(0).GetNodename(s2.AString);
 
But I get a SIGSEGV error on this line

This is all late binding. All methods are pointers loaded at run-time. You should check that the pointer is valid before using it:

Code:
if Assigned(NL2) and Assigned(NL2.Item(0)) then
  NL2.Item(0).GetNodeName(s2.AString);

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #7 on: December 02, 2011, 06:04:24 am »
Thank you very much. But I still have difficulty getting innerHTML. I've written the code below, but Memo2 is empty after I click Button1.

unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls,
  GeckoBrowser,

  nsGeckoStrings,nsXPCOM;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Button2: TButton;
    GeckoBrowser1: TGeckoBrowser;
    Memo1: TMemo;
    Memo2: TMemo;
    procedure Button1Click(Sender: TObject);
    procedure GeckoBrowser1DocumentComplete(Sender: TObject);
  private
    { private declarations }
  public
    { public declarations }
  end;

var
  Form1: TForm1;
  IsComplete:boolean;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  s,s2,s3:IInterfacedString;
  NL:nsIDOMNodeList;
  i:integer;
  sName,sValue:string;
begin
  IsComplete:=false;
  GeckoBrowser1.LoadURI('http://google.com');
  while true do
    begin
      application.ProcessMessages;
      if IsComplete=true then break;
    end;
  s:=newstring(widestring('*'));
  s2:=newstring;
  s3:=newstring;
  NL:=GeckoBrowser1.ContentDocument.GetElementsByTagName(s.AString);
  i:=NL.GetLength();
  while i>0 do
    begin
    NL.Item(i-1).GetNodeName(s2.AString);
    sName:=s2.ToString;
    NL.Item(i-1).GetNodeValue(s3.AString);
    sValue:=s3.ToString;
    memo1.text:=memo1.text+sName+#$0D#$0A;
    memo2.text:=memo2.text+sValue+#$0D#$0A;
    i:=i-1;
    end;
end;

procedure TForm1.GeckoBrowser1DocumentComplete(Sender: TObject);
begin
  IsComplete := true;
end;

end.

ludob

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #8 on: December 02, 2011, 10:18:38 am »
That is normal. Look at the doc http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-1841493061. For an element node, nodeName returns the tag name (memo1) and nodeValue returns null (memo2).

If you are looking for innerHTML then you are out of luck. The innerHTML property is defined in nsIDOMNSHTMLElement (see http://doxygen.db48x.net/mozilla-full/html/d5/d47/interfacensIDOMNSHTMLElement.html), and the Gecko port only defines nsIDOMHTMLElement.
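
Since nodeValue is null for element nodes, the visible text of a tag sits in its child text nodes. Below is a rough sketch of reading it with only the calls already used in this thread (GetChildNodes, GetLength, Item, GetNodeValue). The helper name ChildText is made up for illustration and is not part of the Gecko port. It looks at direct children only and will also pick up other children that carry a node value (comments, for instance), so treat it as a partial workaround, not a replacement for innerHTML.

```pascal
// Hypothetical helper (not part of the Gecko port): concatenates the
// nodeValue of every direct child of Node. Element children contribute
// nothing because their nodeValue is null; text nodes contribute their text.
function ChildText(Node: nsIDOMNode): string;
var
  Children: nsIDOMNodeList;
  Child: nsIDOMNode;
  v: IInterfacedString;
  i: Integer;
begin
  Result := '';
  if not Assigned(Node) then
    Exit;
  Children := Node.GetChildNodes();
  if not Assigned(Children) then
    Exit;
  for i := 0 to Integer(Children.GetLength()) - 1 do
  begin
    Child := Children.Item(i);
    if Assigned(Child) then
    begin
      v := NewString;                 // fresh buffer for each child
      Child.GetNodeValue(v.AString);
      Result := Result + v.ToString;
    end;
  end;
end;
```

In the loop from the previous post it could be used as memo2.Lines.Add(ChildText(NL.Item(i-1))) in place of the GetNodeValue call on the element itself.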

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #9 on: December 02, 2011, 01:19:56 pm »
OK, is there some way to hook into receiving the HTML page? If so, maybe it's possible to modify the raw HTML as text?

ludob

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #10 on: December 02, 2011, 01:52:33 pm »
Quote
OK, is there some way to hook into receiving the HTML page? If so, maybe it's possible to modify the raw HTML as text?
I haven't found anything in this gecko port that would give you access to the raw data. XPCOM is mainly intended to work with the DOM and (almost) everything can be reached by inspecting the DOM tree and looking at the attributes. Is there something you want to do in particular that can't be done with the current implementation?

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #11 on: December 02, 2011, 02:12:30 pm »
Quote
... Is there something you want to do in particular that can't be done with the current implementation?
To work properly with a multi-frame (<frameset>) site, I need to clear the content inside the second <frame> tag.

anna

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #12 on: December 02, 2011, 02:28:05 pm »
I would also like to auto-fill some <input> text fields.

ludob

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #13 on: December 02, 2011, 04:36:00 pm »
Quote
I would also like to auto-fill some <input> text fields.
Here is code that fills the Donate $ input field with 1000$ on this page ;D (load http://lazarus.freepascal.org)
Code:
procedure TForm1.donate;
var s,s2:IInterfacedString;
  NL:nsIDOMNodeList;
  NNM:nsIDOMNamedNodeMap;
  i,j:integer;
  sName:string;
begin
  s:=newstring('input');
  s2:=newstring;
  NL:=GeckoBrowser1.ContentWindow.GetDocument().GetElementsByTagName(s.AString);
  i:=NL.GetLength();
  while i>0 do
    begin
    NNM:=NL.Item(i-1).GetAttributes();
    j:=NNM.GetLength();
    while j>0 do
      begin
      NNM.Item(j-1).GetNodeName(s2.AString);
      sName:=s2.ToString;
      if sName='value' then
        begin
        NNM.Item(j-1).GetNodeValue(s2.AString);
        sName:=s2.ToString;
        if sName='Donate $' then
          begin
          s2:=newstring('1000');
          NNM.Item(j-1).SetNodeValue(s2.AString);
          exit;
          end;
        end;
      j:=j-1;
      end;
    i:=i-1;
    end;
end;

The search criteria I used are:
-tag 'input'
-attribute name 'value'
-attribute value 'Donate $'

If the input field has an ID, which is common, you can do this much faster with the following code. It fills in the username in the login field on the Free Pascal wiki: http://wiki.freepascal.org/index.php?title=Special:Userlogin&returnto=Main_Page

Code:
procedure TForm1.login;
var s,s2:IInterfacedString;
  NE:nsIDOMElement;
begin
  s:=newstring('wpName1');
  NE:=GeckoBrowser1.ContentDocument.GetElementById(s.AString);
  if assigned(NE) then
    begin
    s:=newstring('value');
    s2:=newstring('XPCOM');
    NE.SetAttribute(s.AString,s2.AString);
    end;
end;


ludob

Re: TGeckoBrowser's GetElementsByTagName func.
« Reply #14 on: December 02, 2011, 05:27:37 pm »
Quote
... Is there something you want to do in particular that can't be done with the current implementation?
To work properly with a multi-frame (<frameset>) site, I need to clear the content inside the second <frame> tag.
The following example clears the RECIPES frame on the page http://www.htmlcodetutorial.com/frames/nestedfs.html
Code:
procedure TForm1.cleanframe;
var s,s2:IInterfacedString;
  NL:nsIDOMNodeList;
  NNM:nsIDOMNamedNodeMap;
  N:nsIDOMNode;
  i,j:integer;
  sName:string;
begin
  s:=newstring('frame');
  s2:=newstring;
  NL:=GeckoBrowser1.ContentWindow.GetDocument().GetElementsByTagName(s.AString);
  i:=NL.GetLength();
  while i>0 do
    begin
    NNM:=NL.Item(i-1).GetAttributes();
    j:=NNM.GetLength();
    while j>0 do
      begin
      NNM.Item(j-1).GetNodeName(s2.AString);
      sName:=s2.ToString;
      if sName='name' then
        begin
        NNM.Item(j-1).GetNodeValue(s2.AString);
        sName:=s2.ToString;
        if sName='RECIPES' then
          begin
          N:=NL.Item(i-1);
          NL.Item(i-1).ParentNode.RemoveChild(N);
          exit;
          end;
        end;
      j:=j-1;
      end;
    i:=i-1;
    end;
end;
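
The donate and cleanframe routines both run the same "tag, attribute name, attribute value" search, so the loop could be pulled out into one helper. This is only a sketch: FindByAttribute is a made-up name, it would have to be declared as a method of TForm1, and it uses nothing beyond the calls shown above. Unlike the loops above, which count down from the end of the list, it returns the first match in document order.

```pascal
// Hypothetical helper: returns the first element with the given tag name
// that has an attribute AttrName equal to AttrValue, or nil if none matches.
function TForm1.FindByAttribute(const Tag, AttrName, AttrValue: string): nsIDOMNode;
var
  s, s2: IInterfacedString;
  NL: nsIDOMNodeList;
  NNM: nsIDOMNamedNodeMap;
  i, j: Integer;
begin
  Result := nil;
  s := NewString(WideString(Tag));
  s2 := NewString;
  NL := GeckoBrowser1.ContentWindow.GetDocument().GetElementsByTagName(s.AString);
  for i := 0 to Integer(NL.GetLength()) - 1 do
  begin
    NNM := NL.Item(i).GetAttributes();
    for j := 0 to Integer(NNM.GetLength()) - 1 do
    begin
      NNM.Item(j).GetNodeName(s2.AString);
      if s2.ToString <> AttrName then
        Continue;                     // not the attribute we are looking for
      NNM.Item(j).GetNodeValue(s2.AString);
      if s2.ToString = AttrValue then
      begin
        Result := NL.Item(i);
        Exit;
      end;
    end;
  end;
end;
```

Removing the frame would then come down to N := FindByAttribute('frame', 'name', 'RECIPES'); followed by if Assigned(N) then N.ParentNode.RemoveChild(N);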

 
