
Author Topic: Component for stripping HTML?  (Read 233 times)

HatForCat

  • Sr. Member
  • ****
  • Posts: 250
Component for stripping HTML?
« on: April 21, 2017, 06:07:06 pm »
Subject says it all.

I just need to strip all the HTML from incoming data. Most of the incoming data is plain text, but sometimes I get an HTML page. I have no control over what comes in, so I'd just like to strip out all the HTML and see only the text parts.

Short of including the "HtmlViewer-11.7" from GitHub, I have not yet found anything. I installed that HtmlViewer but it is too complicated for my simple needs. I do not want to spend a few days figuring it out just so I can strip out the HTML. :)

Helpful thoughts?

Thanks

Acer-i5, 2.6GHz, 6GB, 256-SSD, Ubuntu 14.04-LTS, Lazarus 1.6.2, SQLite3 -- Retired: Programming for my use on Ubuntu.

Bart

  • Hero Member
  • *****
  • Posts: 2625
    • Bart en Mariska's Webstek
Re: Component for stripping HTML?
« Reply #1 on: April 21, 2017, 06:12:57 pm »
Simple approach: accept data until '<', drop data until '>'. Repeat until end of data.
You'll end up with plain text.
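
A minimal sketch of the accept/drop loop described above, written in Python only for illustration (the function name `strip_tags` is made up here; the same loop translates directly to a Pascal `for` over the string). It assumes every '<' starts a tag and the next '>' ends it.

```python
def strip_tags(data: str) -> str:
    """Keep characters outside <...>; drop everything from '<' through '>'."""
    out = []
    in_tag = False
    for ch in data:
        if ch == "<":
            in_tag = True          # start dropping
        elif ch == ">" and in_tag:
            in_tag = False         # tag closed, start accepting again
        elif not in_tag:
            out.append(ch)         # plain text: accept
    return "".join(out)


print(strip_tags("<p>Hello <b>world</b></p>"))
```

Note that this keeps whatever sits between tags, so the contents of script or style blocks come through as text, and entities such as `&amp;` are left undecoded.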

You could also use fasthtmlparser (it comes with fpc) to parse the text and use its OnFoundTag and OnFoundText events to filter at will.

Bart

wp

  • Hero Member
  • *****
  • Posts: 3335
Re: Component for stripping HTML?
« Reply #2 on: April 21, 2017, 06:13:44 pm »
The fasthtmlparser from fpc packages\chm\src will do the job. Write a handler for its OnFoundText event, which fires whenever a text node is found within the HTML tree.

I once posted sample code for this task here; use the forum search to find it.
Lazarus trunk / fpc 3.0.0 / Win32

HatForCat

Re: Component for stripping HTML?
« Reply #3 on: April 22, 2017, 12:23:58 am »
Thanks, but I am also getting some other stuff in some of the HTML data. It is showing C++-style remarks (comments) etc. Probably F# or worse.

I have had to mess with the HtmlViewer from GitHub.
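
The "C++ style remarks" mentioned above may be the contents of script or style blocks, which a plain tag stripper passes through as text. That is a guess, since no sample file was posted. Under that assumption, here is a sketch of an event-driven filter that ignores text inside those elements. It uses Python's standard `html.parser` as a stand-in, not fasthtmlparser; the class name `TextOnly` and the function `html_to_text` are made up for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text event: keep it only when not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


sample = "<p>Hi </p><script>// a C++ style remark\nvar x = 1;</script><b>there</b>"
print(html_to_text(sample))
```

HTML comments never reach `handle_data` in this parser, so they are dropped as well. The same idea should carry over to a tag handler plus a text handler in any event-based parser: track whether the last opened tag was a script or style element, and discard text while that flag is set.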


wp

Re: Component for stripping HTML?
« Reply #4 on: April 22, 2017, 12:26:15 am »
Can you zip one of these HTML files and upload it here so we can have a look?

 
