Lazarus

Programming => General => Topic started by: wytwyt02 on November 14, 2019, 09:46:07 pm

Title: Is there a HTML Dom parser library for lazarus?
Post by: wytwyt02 on November 14, 2019, 09:46:07 pm
I want to parse some HTML files, and I do not want to use regex.
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: wp on November 14, 2019, 09:58:48 pm
Maybe FastHTMLParser? I posted some examples here in the forum.
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: marcov on November 14, 2019, 10:18:59 pm
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: dsiders on November 14, 2019, 11:20:26 pm
Quote from: wytwyt02 on November 14, 2019, 09:46:07 pm
I want to parse some HTML files, and I do not want to use regex.

FPC includes fcl-xml/src/sax_html.pp. It has THTMLReader and convenience routines like ReadHTMLFile (should've been called ReadHTMLDocument) and ReadHTMLFragment.
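A minimal sketch of the DOM route, assuming the ReadHTMLFile overload that takes a TStream is available in your FPC version. The sample HTML and the PrintLinks helper are made up for illustration; they are not part of fcl-xml.

```pascal
program ParseHtmlDom;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

const
  // Hypothetical sample input, used only for illustration.
  SampleHtml = '<html><body><a href="https://example.com/">Example</a>' +
               '<p>Hello</p><a href="/about">About</a></body></html>';

// Hypothetical helper: walks the tree depth-first and prints every <a> element.
procedure PrintLinks(Node: TDOMNode);
var
  Child: TDOMNode;
begin
  if (Node.NodeType = ELEMENT_NODE) and (LowerCase(Node.NodeName) = 'a') then
    WriteLn(TDOMElement(Node).GetAttribute('href'), ' -> ', Node.TextContent);
  Child := Node.FirstChild;
  while Assigned(Child) do
  begin
    PrintLinks(Child);
    Child := Child.NextSibling;
  end;
end;

var
  Stream: TStringStream;
  Doc: THTMLDocument;
begin
  Doc := nil;
  Stream := TStringStream.Create(SampleHtml);
  try
    // ReadHTMLFile creates the document and fills it from the stream.
    ReadHTMLFile(Doc, Stream);
    if Assigned(Doc) then
      PrintLinks(Doc);
  finally
    Doc.Free;
    Stream.Free;
  end;
end.
```

To read from disk instead, the overload that takes a file name can be used in place of the stream.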
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: wytwyt02 on November 15, 2019, 03:25:24 am
Quote from: marcov on November 14, 2019, 10:18:59 pm
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

How do I use fasthtml? I cannot find it in the package list or in the Online Package Manager.
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: dsiders on November 15, 2019, 05:40:43 am
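Quote from: wytwyt02 on November 15, 2019, 03:25:24 am
Quote from: marcov on November 14, 2019, 10:18:59 pm
The chm compiler and compilelatexchm.pp in the fpcdoc/ repo use fasthtml and fcl-xml's sax_html, respectively.

How do I use fasthtml? I cannot find it in the package list or in the Online Package Manager.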

That's because it's not a package on its own... it's a unit. See fpc/packages/chm/src/fasthtmlparser.pas. Example usage is in fpc/packages/chm/src/htmlindexer.pas.

It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.
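A rough sketch of the event-based style, assuming the fasthtmlparser unit from the chm package is on your unit search path and that its callbacks have the signatures shown here (check fasthtmlparser.pas for your FPC version). The TPrinter class and the sample HTML are hypothetical.

```pascal
program FastHtmlEvents;

{$mode objfpc}{$H+}

uses
  SysUtils, FastHTMLParser;

type
  // Hypothetical receiver for the parser callbacks.
  TPrinter = class
    procedure FoundTag(NoCaseTag, ActualTag: string);
    procedure FoundText(Text: string);
  end;

procedure TPrinter.FoundTag(NoCaseTag, ActualTag: string);
begin
  // ActualTag is the tag as written in the source, NoCaseTag a case-normalized copy.
  WriteLn('tag:  ', ActualTag);
end;

procedure TPrinter.FoundText(Text: string);
begin
  // Skip whitespace-only runs between tags.
  if Trim(Text) <> '' then
    WriteLn('text: ', Trim(Text));
end;

var
  Html: string;
  Parser: THTMLParser;
  Printer: TPrinter;
begin
  // Hypothetical sample input, used only for illustration.
  Html := '<html><body><h1>Title</h1><p>Some <b>bold</b> text</p></body></html>';
  Printer := TPrinter.Create;
  Parser := THTMLParser.Create(Html);
  try
    Parser.OnFoundTag := @Printer.FoundTag;
    Parser.OnFoundText := @Printer.FoundText;
    // Exec scans the input once and fires the callbacks; no tree is built.
    Parser.Exec;
  finally
    Parser.Free;
    Printer.Free;
  end;
end.
```

Any structure you need, such as which tag a piece of text belongs to, has to be tracked in your own callbacks.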
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: Thaddy on November 15, 2019, 07:38:20 am
This is also an option: https://benibela.de/sources_en.html#internettools
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: marcov on November 15, 2019, 12:33:02 pm
Quote from: dsiders on November 15, 2019, 05:40:43 am
It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.

For a DOM there is sax_html: despite the name, AFAIK the SAX parser feeds the DOM by default.
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: Zoran on November 15, 2019, 05:00:19 pm
Quote from: Thaddy on November 15, 2019, 07:38:20 am
This is also an option: https://benibela.de/sources_en.html#internettools

Only if your application is GPL.
Title: Re: Is there a HTML Dom parser library for lazarus?
Post by: dsiders on November 15, 2019, 05:47:01 pm
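Quote from: marcov on November 15, 2019, 12:33:02 pm
Quote from: dsiders on November 15, 2019, 05:40:43 am
It's not a DOM parser though, as you requested. It signals events for tags and text. It does not build a DOM tree.

For a DOM there is sax_html: despite the name, AFAIK the SAX parser feeds the DOM by default.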

Yes, sax_html has THTMLToDOMConverter.
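To illustrate how the SAX reader can feed a DOM, here is a sketch that wires THTMLReader to THTMLToDOMConverter by hand. It assumes the constructor takes the reader and a target document, which is how I remember sax_html.pp; verify against your FPC sources. The sample HTML is hypothetical.

```pascal
program SaxToDom;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX, SAX_HTML;

var
  Html: string;
  Stream: TStringStream;
  Doc: THTMLDocument;
  Reader: THTMLReader;
  Converter: THTMLToDOMConverter;
begin
  // Hypothetical sample input, used only for illustration.
  Html := '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';
  Stream := TStringStream.Create(Html);
  Doc := THTMLDocument.Create;
  Reader := THTMLReader.Create;
  // The converter subscribes to the reader's SAX events and builds nodes in Doc.
  Converter := THTMLToDOMConverter.Create(Reader, Doc);
  try
    Reader.ParseStream(Stream);
    if Assigned(Doc.DocumentElement) then
      WriteLn('Root element: ', Doc.DocumentElement.NodeName);
  finally
    Converter.Free;
    Reader.Free;
    Doc.Free;
    Stream.Free;
  end;
end.
```

If you do not need to hook into the SAX events yourself, ReadHTMLFile is the shorter way to get the same tree.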