Forum > Networking and Web Programming

Is there a library that would make building a website scraper easier?


Rave:
Basically, I want to write a web scraper for one of my favorite sites, which has an awful UI, so that I can interact with it more comfortably. Is there any Lazarus/Free Pascal library that would make this easier? I'd rather not parse HTML by hand.

Thaddy:
Our member benibela has a good library for that:
https://www.benibela.de/sources_en.html#internettools
Simple example that extracts all hrefs from a page:

--- Code: Pascal ---
{$mode objfpc}{$H+}
uses
  simpleinternet, xquery;   // Internet Tools units providing process() and IXQValue

var
  a: IXQValue;
begin
  // process() fetches the page and evaluates the XPath expression on it;
  // '//a/@href' selects the href attribute of every <a> element
  for a in process('https://freepascal.org', '//a/@href') do
    writeln(a.toString);   // print each link target as a string
end.
--- End code ---

You need to undefine USE_PASDBLSTRUTILS_FOR_JSON in internettoolsconfig.inc.
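If you have not switched off a define in an include file before, here is a minimal sketch of that change. It assumes internettoolsconfig.inc turns the symbol on with a plain {$define ...} line; the actual line in your copy of the file may look different, so check it first. Either of the two forms below should be enough on its own.

--- Code: Pascal ---
{ internettoolsconfig.inc (excerpt; assumed layout, check your own copy) }

{ Form 1: disable the existing define. A '.' right after the opening brace
  turns the directive into an ordinary comment that the compiler ignores. }
{.$define USE_PASDBLSTRUTILS_FOR_JSON}

{ Form 2 (alternative): keep the original line and cancel it afterwards
  by placing this below it. }
{$undef USE_PASDBLSTRUTILS_FOR_JSON}
--- End code ---

After editing the include file you will probably need a full rebuild (of your project or of the Internet Tools package), so that the already compiled units pick up the new setting.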
