Topic: Parse HTML document

russdirks

« on: December 14, 2016, 04:38:39 am »
I would like to read an HTML document from disk, find an element by its "id" attribute, modify it, and write the document back out to disk.  Here's what I have so far:

Code: Pascal
program project1;

{$mode objfpc}{$H+}

uses
  Classes, DOM, DOM_HTML, SAX_HTML;

var
  doc: THTMLDocument;
  reader: THTMLReader;
  converter: THTMLToDOMConverter;
  stream: TFileStream;
  element: TDOMElement;

begin
  doc := THTMLDocument.Create;
  reader := THTMLReader.Create;
  // The converter listens to the reader's SAX events and builds the DOM tree in doc
  converter := THTMLToDOMConverter.Create(reader, doc);
  stream := TFileStream.Create('network.html', fmOpenRead);
  try
    reader.ParseStream(stream);
    writeln(doc.Title);    // this works
    element := doc.GetElementById('BlockNumber');   // this returns nil
  finally
    // Release everything created above
    stream.Free;
    converter.Free;
    reader.Free;
    doc.Free;
  end;
end.

Some sample HTML:

Code: HTML
<html>
<head>
<title>This is a test</title>
</head>

<body>

    <p id="BlockNumber">464564</p>
    <p id="BlockTime">00:21</p>

</body>
</html>

As noted in the code comment, I tried GetElementById, but it returned nil.  I need to know how to access an element, modify it, and then write the document back out.  I've also considered an XPath search, but I don't know how to do that either.  I couldn't find much documentation on this, so I'm hoping someone can help.

 
