Author Topic: Is there a HTML Dom parser library for lazarus?  (Read 3279 times)

wytwyt02

  • Jr. Member
  • **
  • Posts: 83
Is there a HTML Dom parser library for lazarus?
« on: November 14, 2019, 09:46:07 pm »
I want to parse some HTML files, and I do not want to use regex.

wp

  • Hero Member
  • *****
  • Posts: 11855
Re: Is there a HTML Dom parser library for lazarus?
« Reply #1 on: November 14, 2019, 09:58:48 pm »
Maybe FastHTMLParser? I posted some examples here in the forum.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Is there a HTML Dom parser library for lazarus?
« Reply #2 on: November 14, 2019, 10:18:59 pm »
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

dsiders

  • Hero Member
  • *****
  • Posts: 1052
Re: Is there a HTML Dom parser library for lazarus?
« Reply #3 on: November 14, 2019, 11:20:26 pm »
I want to parse some HTML files, and I do not want to use regex.

FPC includes fcl-xml/src/sax_html.pp. It has THTMLReader and convenience routines like ReadHTMLFile (should've been called ReadHTMLDocument) and ReadHTMLFragment.
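
A minimal sketch of the ReadHTMLFile route, assuming the fcl-xml units DOM, DOM_HTML and SAX_HTML are on the unit path. The inline markup and the PrintLinks helper are made up for illustration; ReadHTMLFile also accepts a file name instead of a stream.

```pascal
program ParseHtmlToDom;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

// Hypothetical helper: walk the tree and print the href of every <a> element.
procedure PrintLinks(Node: TDOMNode);
var
  Child: TDOMNode;
begin
  // Compare case-insensitively, since tag-name casing may depend on the input.
  if (Node is TDOMElement) and SameText(String(Node.NodeName), 'a') then
    WriteLn(TDOMElement(Node).GetAttribute('href'));
  Child := Node.FirstChild;
  while Assigned(Child) do
  begin
    PrintLinks(Child);
    Child := Child.NextSibling;
  end;
end;

var
  Doc: THTMLDocument;
  Source: TStringStream;
begin
  Doc := nil;
  // Illustrative markup only; replace with a real file or stream.
  Source := TStringStream.Create(
    '<html><body><a href="https://example.org/">link</a></body></html>');
  try
    ReadHTMLFile(Doc, Source);   // builds the DOM tree in Doc
    try
      PrintLinks(Doc);
    finally
      Doc.Free;                  // the caller owns the document
    end;
  finally
    Source.Free;
  end;
end.
```

ReadHTMLFragment works along the same lines, but parses into an existing parent node rather than creating a new document.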
Preview Lazarus 3.99 documentation at: https://dsiders.gitlab.io/lazdocsnext

wytwyt02

  • Jr. Member
  • **
  • Posts: 83
Re: Is there a HTML Dom parser library for lazarus?
« Reply #4 on: November 15, 2019, 03:25:24 am »
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

How do I use fasthtml? I cannot see it in the package list or in the Online Package Manager.

dsiders

  • Hero Member
  • *****
  • Posts: 1052
Re: Is there a HTML Dom parser library for lazarus?
« Reply #5 on: November 15, 2019, 05:40:43 am »
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

How do I use fasthtml? I cannot see it in the package list or in the Online Package Manager.

That's because it's not a package on its own... it's a unit. See fpc/packages/chm/src/fasthtmlparser.pas. Example usage is in fpc/packages/chm/src/htmlindexer.pas.

It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.
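
A rough sketch of the event style, for comparison. It assumes the fasthtmlparser unit from the chm package exposes THTMLParser with OnFoundTag and OnFoundText callbacks and an Exec method, as used in htmlindexer.pas; check the unit source for the exact signatures in your FPC version. TTagPrinter and the sample markup are made up for illustration.

```pascal
program FastHtmlEvents;
{$mode objfpc}{$H+}

uses
  SysUtils, FastHTMLParser;

type
  // Hypothetical helper class: the callbacks are method pointers ("of object").
  TTagPrinter = class
    procedure FoundTag(NoCaseTag, ActualTag: string);
    procedure FoundText(Text: string);
  end;

procedure TTagPrinter.FoundTag(NoCaseTag, ActualTag: string);
begin
  // ActualTag is the tag as written in the input; NoCaseTag is a case-normalized copy.
  WriteLn('tag:  ', ActualTag);
end;

procedure TTagPrinter.FoundText(Text: string);
begin
  // Skip whitespace-only runs between tags.
  if Trim(Text) <> '' then
    WriteLn('text: ', Trim(Text));
end;

var
  Html: string;
  Printer: TTagPrinter;
  Parser: THTMLParser;
begin
  Html := '<p>Hello <b>world</b></p>';   // illustrative input only
  Printer := TTagPrinter.Create;
  Parser := THTMLParser.Create(Html);
  try
    Parser.OnFoundTag := @Printer.FoundTag;
    Parser.OnFoundText := @Printer.FoundText;
    Parser.Exec;                          // fires the callbacks in document order
  finally
    Parser.Free;
    Printer.Free;
  end;
end.
```

No tree is kept anywhere in this sketch; anything you want to remember about nesting has to be tracked in the callbacks.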

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: Is there a HTML Dom parser library for lazarus?
« Reply #6 on: November 15, 2019, 07:38:20 am »
Specialize a type, not a var.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Is there a HTML Dom parser library for lazarus?
« Reply #7 on: November 15, 2019, 12:33:02 pm »
It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.

That would be sax_html. Despite the name, afaik the SAX parser feeds the DOM by default.

Zoran

  • Hero Member
  • *****
  • Posts: 1829
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Is there a HTML Dom parser library for lazarus?
« Reply #8 on: November 15, 2019, 05:00:19 pm »

dsiders

  • Hero Member
  • *****
  • Posts: 1052
Re: Is there a HTML Dom parser library for lazarus?
« Reply #9 on: November 15, 2019, 05:47:01 pm »
It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.

That would be sax_html. Despite the name, afaik the SAX parser feeds the DOM by default.

Yes, sax_html has THTMLToDOMConverter.
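
A sketch of wiring the reader and the converter by hand, assuming THTMLToDOMConverter.Create takes the reader and the target document and that THTMLReader provides ParseStream. As far as I know this is roughly what ReadHTMLFile does internally. The sample markup is made up for illustration.

```pascal
program SaxToDom;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

var
  Source: TStringStream;
  Reader: THTMLReader;
  Converter: THTMLToDOMConverter;
  Doc: THTMLDocument;
begin
  // Illustrative markup only.
  Source := TStringStream.Create(
    '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>');
  Doc := THTMLDocument.Create;
  Reader := THTMLReader.Create;
  try
    // The converter listens to the reader's SAX events and appends nodes to Doc.
    Converter := THTMLToDOMConverter.Create(Reader, Doc);
    try
      Reader.ParseStream(Source);
    finally
      Converter.Free;
    end;
    if Assigned(Doc.DocumentElement) then
      WriteLn('root element: ', Doc.DocumentElement.NodeName);
  finally
    Reader.Free;
    Doc.Free;
    Source.Free;
  end;
end.
```

Doing it by hand is mainly useful if you want to hook your own SAX handlers onto the same reader; otherwise ReadHTMLFile is shorter.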
