Topic: Programmatically saving the text content of a displayed web page

MaxCuriosus

How can I save the text content of a web page using only, or mainly, Pascal code?

If that's not possible, what are the alternatives?

lucamar

Re: Programmatically saving the text content of a displayed web page
« Reply #1 on: March 26, 2020, 06:02:56 pm »
Very basically: download the page using, for example, fphttpclient, parse it with the XML/HTML DOM or SAX-like classes/functions, and retrieve the text nodes.

Look for "DOM parse" and/or "HTML parser" in the wiki for some more info.
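A minimal sketch of those three steps, assuming Free Pascal with the fcl-xml units SAX_HTML/DOM_HTML and fphttpclient. The program name, the ExtractText/CollectText helpers and the command-line usage are hypothetical. Note that it only sees the HTML the server returns, so text that JavaScript generates in the browser after loading will not show up in the output.

```pascal
program PageText;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  // widestring manager for DOMString <-> string conversions
  Classes, SysUtils,
  fphttpclient,                   // TFPHTTPClient; on FPC 3.2+ add opensslsockets for https
  DOM, DOM_HTML, SAX_HTML;

// Recursively append the non-empty text nodes below Node to Lines,
// skipping the contents of <script> and <style> elements.
procedure CollectText(Node: TDOMNode; Lines: TStrings);
var
  Child: TDOMNode;
  S: string;
begin
  if Node.NodeType = TEXT_NODE then
  begin
    S := Trim(UTF8Encode(Node.NodeValue));
    if S <> '' then
      Lines.Add(S);
    Exit;
  end;
  if Node.NodeType = ELEMENT_NODE then
  begin
    // Compare case-insensitively, whatever case the parser stores tag names in
    S := LowerCase(UTF8Encode(Node.NodeName));
    if (S = 'script') or (S = 'style') then
      Exit;
  end;
  Child := Node.FirstChild;
  while Child <> nil do
  begin
    CollectText(Child, Lines);
    Child := Child.NextSibling;
  end;
end;

// Parse an HTML string into a DOM and return its text nodes, one per line.
function ExtractText(const HTML: string): string;
var
  Input: TStream;  // declared as TStream so it matches ReadHTMLFile's stream overload
  Doc: THTMLDocument;
  Lines: TStringList;
begin
  Input := TStringStream.Create(HTML);
  try
    ReadHTMLFile(Doc, Input);
  finally
    Input.Free;
  end;
  Lines := TStringList.Create;
  try
    CollectText(Doc, Lines);
    Result := Lines.Text;
  finally
    Lines.Free;
    Doc.Free;
  end;
end;

var
  Lines: TStringList;
begin
  if ParamCount < 2 then
  begin
    WriteLn('Usage: pagetext <url> <output-file>');
    Halt(1);
  end;
  Lines := TStringList.Create;
  try
    // Download the raw HTML as the server sends it, then keep only the text
    Lines.Text := ExtractText(TFPHTTPClient.SimpleGet(ParamStr(1)));
    Lines.SaveToFile(ParamStr(2));
  finally
    Lines.Free;
  end;
end.
```

The contents of script and style elements are skipped because their text nodes hold code rather than visible text; the rest is written one text node per line.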
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 2.0.6/FPC 3.0.4 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

winni

Re: Programmatically saving the text content of a displayed web page
« Reply #2 on: March 26, 2020, 06:18:59 pm »
Hi!

First, search this forum (Search, top right in the menu).

The question has already been discussed:

https://forum.lazarus.freepascal.org/index.php?topic=43090.0

Winni

MaxCuriosus

Re: Programmatically saving the text content of a displayed web page
« Reply #3 on: March 27, 2020, 04:46:41 pm »
lucamar,
parsing the resulting HTML or plain-text file is not the issue. My question is about extracting (capturing, copying) the text present in a displayed page, preferably with Pascal code.

winni,
thank you for the reminder to search; it's always appropriate.

The thread you mentioned doesn't seem to solve my problem. Here is why:

The page I'm concerned with is generated after logging in to a site. It is short but contains some .js scripts that run when it is displayed. fphttpclient will download that page (like wget under Linux), but not the variable text that is actually displayed. I don't want or need to re-download the page, only to capture the text of what is already there.

What I'm hoping for is a (simple) tool to copy the actual displayed text items directly to a file.

If that's not possible, I can use a series of simulated keystrokes.
In Firefox: ... Ctrl+A, Ctrl+C. But then how do I paste into an empty text file programmatically?

lucamar

Re: Programmatically saving the text content of a displayed web page
« Reply #4 on: March 27, 2020, 05:05:21 pm »
Quote from: MaxCuriosus
  If not possible, I can use a series of simulated keys.
  In Firefox: ... Ctrl+A Ctrl+C, but then how do I paste into a empty text file programmatically?

Are those simulated keys sent from your program? Then all you'd need to do is get the clipboard's text content with Clipboard.AsText or Clipboard.GetTextBuf() (in unit Clipbrd) and save that string to a file using, say, normal stream or old-style file operations.
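A minimal sketch of that approach as a Lazarus (LCL) program, assuming the browser text has already been put on the clipboard by the simulated Ctrl+A/Ctrl+C (not shown here). The SaveClipboardToFile helper and the output file name page.txt are hypothetical; the file is written with a TFileStream, i.e. the "normal stream" option.

```pascal
program SaveClipboardText;

{$mode objfpc}{$H+}

uses
  Interfaces,  // LCL widgetset; the project must depend on the LCL package
  Forms, Classes, SysUtils, Clipbrd;

// Hypothetical helper: write the clipboard's current text content to FileName.
procedure SaveClipboardToFile(const FileName: string);
var
  S: string;
  FS: TFileStream;
begin
  S := Clipboard.AsText;  // the LCL returns clipboard text as UTF-8
  FS := TFileStream.Create(FileName, fmCreate);  // creates or truncates the file
  try
    if S <> '' then
      FS.WriteBuffer(S[1], Length(S));
  finally
    FS.Free;
  end;
end;

begin
  Application.Initialize;  // make sure the widgetset is ready before using the clipboard
  SaveClipboardToFile('page.txt');
end.
```

On Linux/X11 the clipboard contents are normally served by the application that copied them, so Firefox should still be running when this program reads the clipboard. Old-style AssignFile/Rewrite/Write/CloseFile would work just as well for the saving step.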

madref

Re: Programmatically saving the text content of a displayed web page
« Reply #5 on: March 27, 2020, 07:39:37 pm »
Take a look at this thread: https://forum.lazarus.freepascal.org/index.php/topic,44814.0.html

It has your solution in it somewhere; you just have to adapt it to your situation.

You treat a disease, you win, you lose.
You treat a person and I guarantee you, you win, no matter the outcome.

Lazarus 2.0.6 / FPC 3.0.4
Lazarus Trunk / FPC Trunk
Mac OS X Mojave