Is there any library that would make the process of making a website scraper easier?


Basically I want to make a web scraper for one of the sites I love that has an awful UI, so that I can interact with it more easily. Is there any Lazarus/Free Pascal library that would make this process easier? I'd rather not parse HTML by hand.

Our member benibela has a good library for that:
Simple example that extracts all hrefs from a page:

--- Code: Pascal ---
uses
  simpleinternet, xquery;

var
  a: IXQValue;
begin
  // First argument: the page to query; second: an XPath expression
  // that selects the href attribute of every <a> element.
  for a in process('', '//a/@href') do
    writeln(a.toString);
end.
--- End code ---

You need to undefine USE_PASDBLSTRUTILS_FOR_JSON in

