Author Topic: Is there a HTML Dom parser library for lazarus?  (Read 611 times)

wytwyt02

  • New Member
  • *
  • Posts: 44
Is there a HTML Dom parser library for lazarus?
« on: November 14, 2019, 09:46:07 pm »
I want to parse some HTML files, and I don't want to use regex.

wp

  • Hero Member
  • *****
  • Posts: 6471
Re: Is there a HTML Dom parser library for lazarus?
« Reply #1 on: November 14, 2019, 09:58:48 pm »
Maybe FastHTMLParser? I posted some examples here in the forum.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7613
Re: Is there a HTML Dom parser library for lazarus?
« Reply #2 on: November 14, 2019, 10:18:59 pm »
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

dsiders

  • Full Member
  • ***
  • Posts: 238
Re: Is there a HTML Dom parser library for lazarus?
« Reply #3 on: November 14, 2019, 11:20:26 pm »
Quote from: wytwyt02
I want to parse some HTML files, and I don't want to use regex.

FPC includes fcl-xml/src/sax_html.pp. It has THTMLReader and convenience routines like ReadHTMLFile (should've been called ReadHTMLDocument) and ReadHTMLFragment.
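A minimal sketch of the ReadHTMLFile route, assuming the DOM, DOM_HTML and SAX_HTML units from fcl-xml are on the unit path. The file name page.html and the helper DumpElements are made up for illustration; the program just walks the resulting tree and prints the element names.

```pascal
program DomWalk;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

// Hypothetical helper: print the name of Node (if it is an element)
// and of every element among its descendants, indented by depth.
procedure DumpElements(Node: TDOMNode; Depth: Integer);
var
  Child: TDOMNode;
begin
  if Node.NodeType = ELEMENT_NODE then
    WriteLn(StringOfChar(' ', Depth * 2), Node.NodeName);
  Child := Node.FirstChild;
  while Assigned(Child) do
  begin
    DumpElements(Child, Depth + 1);
    Child := Child.NextSibling;
  end;
end;

var
  Doc: THTMLDocument;
begin
  Doc := nil;
  // 'page.html' is an assumed input file; ReadHTMLFile creates the document.
  ReadHTMLFile(Doc, 'page.html');
  try
    DumpElements(Doc, 0);
  finally
    Doc.Free;  // the caller owns the document
  end;
end.
```

ReadHTMLFragment works along the same lines, except that it attaches the parsed nodes to a parent node you pass in instead of creating a new document.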
Lazarus 2.0.4 / FPC 3.0.4 / Windows 8.1 64-bit

wytwyt02

  • New Member
  • *
  • Posts: 44
Re: Is there a HTML Dom parser library for lazarus?
« Reply #4 on: November 15, 2019, 03:25:24 am »
Quote from: marcov
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

How do I use fasthtml? I cannot find it in the package list or in the Online Package Manager.

dsiders

  • Full Member
  • ***
  • Posts: 238
Re: Is there a HTML Dom parser library for lazarus?
« Reply #5 on: November 15, 2019, 05:40:43 am »
Quote from: marcov
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

Quote from: wytwyt02
How do I use fasthtml? I cannot find it in the package list or in the Online Package Manager.

That's because it's not a package on its own... it's a unit. See fpc/packages/chm/src/fasthtmlparser.pas. Example usage is in fpc/packages/chm/src/htmlindexer.pas.

It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.
Lazarus 2.0.4 / FPC 3.0.4 / Windows 8.1 64-bit

Thaddy

  • Hero Member
  • *****
  • Posts: 9288
Re: Is there a HTML Dom parser library for lazarus?
« Reply #6 on: November 15, 2019, 07:38:20 am »
also related to equus asinus.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7613
Re: Is there a HTML Dom parser library for lazarus?
« Reply #7 on: November 15, 2019, 12:33:02 pm »
Quote from: dsiders
It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.

That's the sax_html; despite the name, AFAIK the SAX parser feeds the DOM by default.

Zoran

  • Hero Member
  • *****
  • Posts: 1468
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Is there a HTML Dom parser library for lazarus?
« Reply #8 on: November 15, 2019, 05:00:19 pm »

dsiders

  • Full Member
  • ***
  • Posts: 238
Re: Is there a HTML Dom parser library for lazarus?
« Reply #9 on: November 15, 2019, 05:47:01 pm »
Quote from: dsiders
It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.

Quote from: marcov
That's the sax_html; despite the name, AFAIK the SAX parser feeds the DOM by default.

Yes, sax_html has THTMLToDOMConverter.
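For anyone who wants to see the wiring, here is a rough sketch of hooking THTMLReader up to THTMLToDOMConverter by hand, parsing from a string instead of a file. It assumes the converter's constructor takes the reader and the target document, which is how I remember ReadHTMLFile doing it internally; check sax_html.pp in your FPC version before relying on it. The sample markup is invented.

```pascal
program SaxToDom;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX, SAX_HTML;

var
  Source: TStringStream;
  Reader: THTMLReader;
  Converter: THTMLToDOMConverter;
  Doc: THTMLDocument;
begin
  // Invented sample input, just to have something to parse.
  Source := TStringStream.Create('<html><body><p>Hello</p></body></html>');
  Doc := THTMLDocument.Create;
  Reader := THTMLReader.Create;
  try
    // The converter subscribes to the reader's SAX events and builds
    // DOM nodes inside Doc while the reader parses.
    Converter := THTMLToDOMConverter.Create(Reader, Doc);
    try
      Reader.ParseStream(Source);
    finally
      Converter.Free;
    end;
    if Assigned(Doc.DocumentElement) then
      WriteLn('Root element: ', Doc.DocumentElement.NodeName);
  finally
    Reader.Free;
    Doc.Free;
    Source.Free;
  end;
end.
```

So the reader itself is event based, and the DOM tree only appears because the converter listens to those events.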
Lazarus 2.0.4 / FPC 3.0.4 / Windows 8.1 64-bit