Topic: [solved] Get the Text of a HTML-Page (like it's shown on your screen) into a txt


  • New member
  • Posts: 13
Make a little app that opens the link in your preferred browser, and use the MouseAndKeyInput unit (search the forum) to send your keyboard combinations and sequences. This involves waiting long enough for the page to load, or knowing when it has finished loading.
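
A rough sketch of that idea, in case it helps. Everything specific here is an assumption, not something from this thread: the procedure name SavePageTextViaClipboard, the fixed 5 second wait, and the Ctrl+A / Ctrl+C shortcuts (on macOS the modifier would likely be ssMeta instead of ssCtrl). It also assumes the browser window gets keyboard focus after OpenURL and that the project depends on the package that provides MouseAndKeyInput.

```pascal
unit GrabPageText;

{$mode objfpc}{$H+}

interface

// Hypothetical helper: opens Url in the default browser, copies the
// visible page text through the clipboard and saves it to OutFile.
procedure SavePageTextViaClipboard(const Url, OutFile: String);

implementation

uses
  Classes, SysUtils, Forms, LCLIntf, LCLType, Clipbrd, MouseAndKeyInput;

procedure SavePageTextViaClipboard(const Url, OutFile: String);
var
  Lines: TStringList;
begin
  if not OpenURL(Url) then
    raise Exception.Create('Could not open ' + Url);

  // Assumption: 5 seconds is enough for the page to load.
  Sleep(5000);

  // Select all and copy in whichever window currently has focus.
  KeyInput.Apply([ssCtrl]);
  KeyInput.Press(VK_A);
  KeyInput.Press(VK_C);
  KeyInput.Unapply([ssCtrl]);

  // Give the browser a moment to fill the clipboard.
  Sleep(500);
  Application.ProcessMessages;

  Lines := TStringList.Create;
  try
    Lines.Text := Clipboard.AsText;
    Lines.SaveToFile(OutFile);
  finally
    Lines.Free;
  end;
end;

end.
```

The fixed Sleep is the weak point: if the page loads slowly, or another window grabs focus, the wrong text ends up in the file.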

Or use WP's example to retrieve the text from another website that offers a service for turning a web page into text (search Google for these sites).

Instead of the link you were using, try this in WP's code:

You can parse the result using RegEx or InternetTools ... etc.
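
For the RegEx route, here is a minimal sketch using the RegExpr unit that ships with Free Pascal. The helper FirstMatch, the pattern and the sample text are made up for illustration; they are not from WP's code.

```pascal
program ParseDemo;

{$mode objfpc}{$H+}

uses
  RegExpr;

// Hypothetical helper: returns the first capture group of Pattern
// found in Text, or an empty string if there is no match.
function FirstMatch(const Pattern, Text: String): String;
var
  Re: TRegExpr;
begin
  Result := '';
  Re := TRegExpr.Create;
  try
    Re.Expression := Pattern;
    if Re.Exec(Text) then
      Result := Re.Match[1];
  finally
    Re.Free;
  end;
end;

begin
  // Sample text standing in for the plain text of a downloaded page.
  WriteLn(FirstMatch('Temperature:\s*(-?\d+)',
    'City: Berlin' + LineEnding + 'Temperature: 21 C'));  // prints 21
end.
```

For several hits in one text, TRegExpr.ExecNext can be called in a loop after the first Exec.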

WOW, THANK YOU VERY MUCH!!! That's exactly what I was looking for :)
How did you know you had to cut some parts out of the link? This made it so easy for me :)
Now I can use this for my next program, where I have already written a procedure to get the information I need in the right order and formatting ;)

Thanks to everyone who helped me! My question is solved now :)


  • Hero Member
  • Posts: 635
  • ..... A day not Laughed is a day not Lived !!
    • Nursing With Humour
A small addition to get it working on Mac OS X.
Code: Pascal
  procedure THtmlTextExtractor.TagFoundHandler(NoCaseTag, ActualTag: String);
  var
    s: String;
  begin
    // Use the FIgnore flag to skip some tags not needed
    FIgnore := (Pos('<SCRIPT', NoCaseTag) = 1) or
               (Pos('<BUTTON', NoCaseTag) = 1) or
               (NoCaseTag = '</HTML>');
    if FIgnore then
      Exit;
    // LineEnding is a single Char (#10) on Unix and Mac OS X but a String
    // (#13#10) on Windows, so LineEnding[1] does not compile everywhere.
    // Assigning it to a String works on all platforms without an {$IfDef}.
    s := LineEnding;
    // Write a line-break after these tags
    if (NoCaseTag = '<BR>') or (NoCaseTag = '<BR />') or (NoCaseTag = '<BR/>') or
       (NoCaseTag = '</P>') or (NoCaseTag = '</DIV>') or (NoCaseTag = '</TR>')
    then
      // Write exactly as many bytes as the line ending has
      FTempStream.Write(s[1], Length(s));
  end;
Lazarus 2.0.0 / FPC 3.0.4
Lazarus Trunk / FPC 3.0.4
Mac OS X Mojave