HTML to text
pcurtis:
Does anyone know how to remove all HTML tags from a file but keep the text, or have a snippet that does it?
Thanks.
speter:
Included below is a quick and dirty version of what you want...
```pascal
procedure TForm1.clean(fn : string);   // assumes the default TForm1 with a TMemo named Memo1
const
  magic = '!#%&';                       // marker standing in for each line ending
var
  f : textfile;
  s,t : string;
  a,b : integer;
begin
  assignfile(f,fn);
  reset(f);
  s := '';
  repeat                                // read the whole file into one string
    readln(f,t);
    s += t+' '+magic;
  until eof(f);
  closefile(f);
  repeat                                // delete each '<' ... '>' pair
    a := pos('<',s);
    if a > 0 then
    begin
      b := pos('>',s,a);
      if b = 0 then break;              // unclosed '<': stop instead of looping forever
      delete(s,a,b-a+1);
    end;
  until a = 0;
  repeat                                // add one memo line per original line
    a := pos(magic,s);
    if a > 0 then
    begin
      memo1.append(copy(s,1,a-1));
      delete(s,1,a+length(magic)-1);    // remove the text and the marker
    end;
  until a = 0;
end;
```
Note that this code preserves the line endings, as well as characters such as tabs, from the original HTML file.
If you don't care about the line endings, you can leave the marker out by changing line #14 (`s += t+' '+magic;`) to

```pascal
s += t+' ';
```

and the last loop to

```pascal
memo1.append(s);
```

or similar. :)
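The two tag-removal loops can also be folded into a single pass over the string, which avoids the repeated pos/delete calls and cannot hang on a '<' that is never closed. This is a minimal sketch rather than a drop-in replacement: the name StripTags is hypothetical, and, like the code above, it does not decode entities such as &amp; or drop the contents of <script> and <style> elements. Because it leaves line breaks alone, you could load the whole file with Memo1.Lines.LoadFromFile and then set Memo1.Text := StripTags(Memo1.Text), keeping the original line endings without any marker.

```pascal
program StripTagsDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: returns S with every <...> sequence removed.
// A '>' that is not inside a tag is kept as text; a '<' that is never
// closed causes the rest of the string to be dropped.
function StripTags(const S: string): string;
var
  i, n: Integer;
  InTag: Boolean;
begin
  Result := '';
  SetLength(Result, Length(S));   // the output is never longer than the input
  n := 0;
  InTag := False;
  for i := 1 to Length(S) do
  begin
    if S[i] = '<' then
      InTag := True               // start of a tag: stop copying
    else if InTag and (S[i] = '>') then
      InTag := False              // end of the tag: resume copying
    else if not InTag then
    begin
      Inc(n);                     // ordinary text: keep it
      Result[n] := S[i];
    end;
  end;
  SetLength(Result, n);           // trim to the characters actually copied
end;

begin
  WriteLn(StripTags('<p>Hello, <b>world</b>!</p>'));  // prints: Hello, world!
end.
```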
SymbolicFrank:
A good "magic" marker to use is #0 (the NUL character). It is a legal character in a Pascal string, and it will practically never appear in text read from an HTML file.
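With #0 the marker is a single character, so the splitting loop deletes a characters instead of a+3. A minimal sketch of that loop as a standalone routine, assuming the text itself contains no NUL characters; SplitAtMarker and its TStrings parameter are hypothetical (speter's version appends to memo1 directly).

```pascal
program NulMarkerDemo;

{$mode objfpc}{$H+}

uses
  Classes;

const
  Magic = #0;   // single NUL character used as the line marker

// Hypothetical helper: splits S at every Magic character and adds the
// pieces to Lines. Text after the last marker is added too, if any.
procedure SplitAtMarker(S: string; Lines: TStrings);
var
  a: Integer;
begin
  repeat
    a := Pos(Magic, S);
    if a > 0 then
    begin
      Lines.Add(Copy(S, 1, a - 1));
      Delete(S, 1, a);            // the text plus the one-character marker
    end;
  until a = 0;
  if S <> '' then
    Lines.Add(S);
end;

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SplitAtMarker('first line'#0'second line'#0, SL);
    WriteLn(SL.Count);            // prints: 2
    WriteLn(SL[1]);               // prints: second line
  finally
    SL.Free;
  end;
end.
```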
wp:
There's a ready-made function for this task in unit HTML2TextRender:
```pascal
function RenderHTML2Text(const AHTML: String): String;
```
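A minimal sketch of calling it from a console program: load the HTML file into a string and pass it to RenderHTML2Text. It relies only on the signature quoted above and assumes the HTML2TextRender unit is on the compiler's search path (the post does not say which package provides it); the program name and command-line handling are illustrative.

```pascal
program Html2TextDemo;

{$mode objfpc}{$H+}

uses
  Classes,
  HTML2TextRender;   // assumed to be reachable via the project's search path

var
  SL: TStringList;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: html2textdemo <file.html>');
    Halt(1);
  end;
  SL := TStringList.Create;
  try
    SL.LoadFromFile(ParamStr(1));        // whole HTML file as one text
    WriteLn(RenderHTML2Text(SL.Text));   // rendered plain text
  finally
    SL.Free;
  end;
end.
```

In a Lazarus form the same call can fill a memo directly, e.g. Memo1.Text := RenderHTML2Text(SL.Text).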
pcurtis:
Thanks. How to use it?