Topic: [solved] Get the Text of a HTML-Page (like it's shown on your screen) into a txt

Kevin95

Hello :)

First things first: I am from Germany, so my English isn't the best, sorry  O:-)
I learnt the basics of FreePascal at university (engineers have to learn the basics there), so I'm not very skilled at writing complex programs.

I would like to write a program that saves what is shown on the screen in your browser (that's the body, isn't it?). Just as if you visited the webpage, pressed Ctrl+A, then Ctrl+C, and finally pasted it with Ctrl+V into a text file and saved it.

I read about fphttpclient, but my limited knowledge is not enough to understand it. I tried the following and I always get RunError2 or unknown errors...

Code: Pascal
  1. program BodyAuslesen;
  2. {$mode objfpc}{$H+}
  3. uses
  4. sysutils,fphttpclient;
  5. var
  6.   t:textfile;
  7. begin
  8.     assignfile(t,'Weber.txt');
  9.     reset(t);
  10.     WriteLn(t,TFPHTTPClient.SimpleGet('https://trainingslager.onlineliga.de/#url=/player/overview?playerId=28056'));
  11.     readln;
  12.     closefile(t);
  13. end.  
  14.  

The error is in line 10.
I hope the problem isn't hard to solve or anything big O:-)
Thanks for your attention. I’m looking forward to your reply :)
« Last Edit: November 05, 2018, 11:05:48 am by Kevin95 »

jamie

Re: Get the Body of a HTML-Page in a textfile
« Reply #1 on: November 03, 2018, 03:12:24 pm »
Something tells me you missed a few steps....

Looking at your code there.

 TFP...
 ^
 tells me you need to create an instance of that object, and you are calling the class directly..

Var
  FPHttpClient :TFpHttpClient;
Begin
   FpHttpClient := TFpHttpClient.Create…….

 and in your code

 WriteLn(t, fpHttpClient……..

I can't say that will work but I can venture to guess it will work better! :D

P.S.
  I forgot, make sure you free it when done

FpHttpClient.Free;
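
For reference, the instance-based pattern outlined above could look roughly like the sketch below. It is only an illustration, not the poster's actual code: the variable names and the printed message are made up, and it assumes that `TFPHTTPClient.Create(nil)`, `Get` and `AllowRedirect` behave as in the fphttpclient unit shipped with FPC 3.0.4. With FPC 3.2 or later, https additionally needs the unit opensslsockets in the uses clause.

```pascal
program InstanceGet;
{$mode objfpc}{$H+}
uses
  sysutils, fphttpclient;  // assumption: add opensslsockets here for FPC 3.2+
var
  client: TFPHTTPClient;   // illustrative name
  html: String;
begin
  client := TFPHTTPClient.Create(nil);   // nil = no owner component
  try
    client.AllowRedirect := True;        // follow HTTP redirects
    html := client.Get('https://trainingslager.onlineliga.de/#url=/player/overview?playerId=28056');
    WriteLn(Length(html), ' bytes received');
  finally
    client.Free;                         // always release the instance
  end;
end.
```

The try/finally block makes sure the client is freed even if the download raises an exception.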

« Last Edit: November 03, 2018, 03:17:28 pm by jamie »

wp

Re: Get the Body of a HTML-Page in a textfile
« Reply #2 on: November 03, 2018, 03:17:41 pm »
Two problems

(1) You did not open the text file for writing. You must call "Rewrite(t)", not "Reset(t)"
(2) This is an https URL. Therefore you must put the DLLs "libeay32.dll" and "ssleay32.dll" into the folder that contains the exe of your program. ATM, I don't know where you can find a good download site, but just use the forum's search function; there were several posts about using fphttpclient with SSL. Be careful to use the correct bitness: if you create a 32-bit program use the 32-bit DLLs, and if you create a 64-bit program use the 64-bit DLLs (IIRC, even the 64-bit DLLs have the "32" in their name!)

@jamie: SimpleGet is a class function, therefore, you need not create an instance of the class.
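
Putting points (1) and (2) together, the program from the first post might be corrected roughly as follows. This is a sketch under assumptions: the error handling and the variable `html` are additions for illustration, and the OpenSSL DLLs are assumed to be next to the exe as described above (with FPC 3.2 or later the unit opensslsockets would also be needed in the uses clause).

```pascal
program BodyAuslesen;
{$mode objfpc}{$H+}
uses
  sysutils, fphttpclient;
var
  t: TextFile;
  html: String;   // illustrative helper variable
begin
  // SimpleGet is a class function, so no instance has to be created.
  try
    html := TFPHTTPClient.SimpleGet('https://trainingslager.onlineliga.de/#url=/player/overview?playerId=28056');
  except
    on E: Exception do
    begin
      WriteLn('Download failed: ', E.Message);
      Halt(1);
    end;
  end;

  AssignFile(t, 'Weber.txt');
  Rewrite(t);          // open for writing; Reset would open for reading
  try
    WriteLn(t, html);
  finally
    CloseFile(t);
  end;
end.
```

Downloading before opening the file avoids leaving an empty Weber.txt behind when the request fails.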
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

jamie

Re: Get the Body of a HTML-Page in a textfile
« Reply #3 on: November 03, 2018, 03:19:58 pm »
Well Slap my butt....
 :o

P.S.

 I am using the ICS set of files slated for my old D3 compiler. I didn't bother trying to compile them into an installable component, but I converted what I needed to get the files to work in Lazarus so I could port over an app..

 Currently I have HttpClient, FtpClient, FtpServer etc. that all work, but I don't have the SSL version...

 I wonder if the author would be interested in making a package set for Lazarus ?
« Last Edit: November 03, 2018, 03:26:44 pm by jamie »

lucamar

Re: Get the Body of a HTML-Page in a textfile
« Reply #4 on: November 03, 2018, 03:21:52 pm »
Quote from: Kevin95
I would like to write a program that saves what is shown on the screen in your browser (that's the body, isn't it?). Just as if you visited the webpage, pressed Ctrl+A, then Ctrl+C, and finally pasted it with Ctrl+V into a text file and saved it.

Do note that your code will save the full HTML page as it comes from the server, i.e. including HEAD, all the tags, etc. You'll have to parse it to get the text as it appears on your screen.
Been there, done that ... barely kept the timelines.
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 1.8.4/fpc 3.0.4 - Ubuntu 10, Kubuntu 14, Windows XP SP3 (Home & Prof.) and various DOS incarnations.

Kevin95

Re: Get the Body of a HTML-Page in a textfile
« Reply #5 on: November 03, 2018, 03:23:48 pm »
Thanks for your quick replies jamie and wp :)

@jamie: It's a bit embarrassing for me to admit, but I have only heard of classes and objects and never worked with them, so I don't understand what you are trying to explain :/

@wp: Oops, I corrected the Rewrite :D Ok, so now I'm gonna search for those libraries, thank you :)


Kevin95

Re: Get the Body of a HTML-Page in a textfile
« Reply #6 on: November 03, 2018, 03:25:02 pm »
Quote from: lucamar
Do note that your code will save the full HTML page as it comes from the server, i.e. including HEAD, all the tags, etc. You'll have to parse it to get the text as it appears on your screen.

Hello lucamar, thanks for your hint. Do you know a better/easier way for my problem?

Kevin95

Re: Get the Body of a HTML-Page in a textfile
« Reply #7 on: November 03, 2018, 03:35:44 pm »
I copied libeay32.dll and ssleay32.dll into the folder where my .exe program is.
The program works now :)

But now I realise that the text file doesn't show what I wanted to see (as if you visited the webpage, pressed Ctrl+A, then Ctrl+C, and finally pasted it with Ctrl+V into a text file and saved it); instead it contains what lucamar described (the full HTML page as it comes from the server, i.e. including HEAD, all the tags).

When I searched for the parts I was looking for, they weren't in there... Does anyone have an idea how to write the text that appears on your screen when you visit the page into my text file?

Edit: I changed the subject of this topic from "Body of a HTML-Page" to "Text of a HTML-Page (like it's shown on your screen)". Should have been clearer, sorry  :-[
« Last Edit: November 03, 2018, 04:27:18 pm by Kevin95 »

lucamar

Re: Get the Body of a HTML-Page in a textfile
« Reply #8 on: November 03, 2018, 05:39:36 pm »
Quote from: Kevin95
Hello lucamar, thanks for your hint. Do you know a better/easier way for my problem?

No, not really. There may be some renderer out there with a method like "SaveDocumentAsText", but I don't know of one. Parsing HTML yourself isn't that difficult either, since it has a rather strict syntax, at least from the point of view of extracting text. In fact, IIRC, there are two or three HTML parsers in Lazarus/FPC: look in fcl-web (I think) and in Lazarus' components folder.

The real drawback of this approach comes when the downloaded page relies on script(s) to generate the page's content. That's a different, and more difficult, proposition.
« Last Edit: November 03, 2018, 05:41:13 pm by lucamar »

wp

A relatively simple way to extract the text from an HTML string is to use the THtmlParser in unit fasthtmlparser, which comes with fpc. This parser scans the HTML string character by character and fires the event OnFoundTag whenever an HTML tag is detected and the event OnFoundText whenever text is found between tags. To extract the entire text you essentially concatenate all strings reported to the OnFoundText handler. However, there are some problems, e.g. script sections might be mixed with the "real" text. This can be avoided by introducing a flag which is set when a <script> tag is found and reset at the matching </script> tag: while the flag is set, the OnFoundText handler should be bypassed. HTML entities need special treatment too: the character 'ä' should be inserted into the extracted text if the string "&auml;" is found in the HTML text. Finally, line breaks must be inserted into the extracted text, for example when a <br> tag is found, or after a </p>.

Here is a simple demo with the web site of the first post; it downloads the HTML text as discussed above, stores it in a file "test.html", extracts all text elements and writes them to a file "test.txt". Please note that this site makes heavy use of CSS, which is completely ignored by the THtmlParser:
Code: Pascal
program project2;

{$mode objfpc}{$H+}

uses
  // With FPC 3.2 or later, https additionally needs the unit opensslsockets here.
  classes, sysutils, fphttpclient, fasthtmlparser;

type
  THtmlTextExtractor = class
  private
    FTempStream: TStream;
    FIgnore: Boolean;
    function CleanWhiteSpace(AText: String): String;
    function FixHtmlEntities(AText: String): String;
    procedure TagFoundHandler(NoCaseTag, ActualTag: string);
    procedure TextFoundHandler(AText: String);
  public
    function ExtractFromHtml(AHtml: String): String;
  end;

// Strips leading whitespace from text that starts with a line break.
function THtmlTextExtractor.CleanWhiteSpace(AText: String): String;
begin
  if (AText <> '') and (AText[1] in [#10, #13]) then
    while (AText <> '') and (AText[1] in [#10, #13, ' ', #9]) do Delete(AText, 1, 1);
  Result := AText;
end;

// Replaces a few named HTML entities by the characters they stand for.
function THtmlTextExtractor.FixHtmlEntities(AText: String): String;
var
  P, PEnd: PChar;
  s: String;
begin
  Result := '';
  if AText = '' then
    exit;
  P := @AText[1];
  PEnd := P + Length(AText);
  while P < PEnd do begin
    if P^ = '&' then
    begin
      // Collect the entity name up to (not including) the closing ';'
      s := '';
      inc(P);
      while (P < PEnd) and (P^ <> ';') do begin
        s := s + P^;
        inc(P);
      end;
      if P >= PEnd then
        // No closing ';' found: not an entity, keep the text unchanged
        Result := Result + '&' + s
      else
        case s of
          'auml' : Result := Result + 'ä';
          'Auml' : Result := Result + 'Ä';
          'uuml' : Result := Result + 'ü';
          'Uuml' : Result := Result + 'Ü';
          'ouml' : Result := Result + 'ö';
          'Ouml' : Result := Result + 'Ö';
          'szlig': Result := Result + 'ß';
          'nbsp' : Result := Result + ' ';
          'lt'   : Result := Result + '<';
          'gt'   : Result := Result + '>';
          'amp'  : Result := Result + '&';
          // ... add more...
          else
            // Unknown entity: keep it as it is
            Result := Result + '&' + s + ';';
        end;
    end else
      Result := Result + P^;
    inc(P);
  end;
end;

procedure THtmlTextExtractor.TagFoundHandler(NoCaseTag, ActualTag: string);
var
  le: String;
begin
  // Use the FIgnore flag to skip the text inside tags that are not needed
  if (Pos('<HTML', NoCaseTag) = 1) or
     (NoCaseTag = '</SCRIPT>') or
     (NoCaseTag = '</BUTTON>')
  then
    FIgnore := false
  else
  if (Pos('<SCRIPT', NoCaseTag) = 1) or
     (Pos('<BUTTON', NoCaseTag) = 1) or
     (NoCaseTag = '</HTML>')
  then
    FIgnore := true;

  if FIgnore then
    exit;

  // Write a line-break after these tags
  if (NoCaseTag = '<BR>') or (NoCaseTag = '<BR />') or (NoCaseTag = '<BR/>') or
     (NoCaseTag = '</P>') or (NoCaseTag = '</DIV>') or (NoCaseTag = '</TR>')
  then begin
    // Copy the constant into a string so that Length() gives the byte count
    // on every platform (#13#10 on Windows, #10 on Unix).
    le := LineEnding;
    FTempStream.Write(le[1], Length(le));
  end;
end;

procedure THtmlTextExtractor.TextFoundHandler(AText: String);
var
  s: String;
begin
  if FIgnore then
    exit;
  s := CleanWhiteSpace(AText);
  if s = '' then
    exit;
  s := FixHtmlEntities(s);
  if s <> '' then
    FTempStream.Write(s[1], Length(s));
end;

function THtmlTextExtractor.ExtractFromHtml(AHtml: String): String;
var
  parser: THtmlParser;
begin
  Result := '';
  if AHtml = '' then
    exit;

  FIgnore := false;
  parser := THtmlParser.Create(AHtml);
  FTempStream := TMemoryStream.Create;
  try
    parser.OnFoundTag := @TagFoundHandler;
    parser.OnFoundText := @TextFoundHandler;
    parser.Exec;
    // Copy the collected text from the stream into the result string
    if FTempStream.Size > 0 then begin
      FTempStream.Position := 0;
      SetLength(Result, FTempStream.Size);
      FTempStream.Read(Result[1], FTempStream.Size);
    end;
  finally
    FTempStream.Free;
    parser.Free;
  end;
end;

procedure SaveStringToFile(AText, AFileName: String);
var
  F: TextFile;
begin
  AssignFile(F, AFileName);
  Rewrite(F);
  WriteLn(F, AText);
  CloseFile(F);
end;

var
  s: String;
  extractor: THtmlTextExtractor;
begin
  s := TFPHTTPClient.SimpleGet('https://trainingslager.onlineliga.de/#url=/player/overview?playerId=28056');
  if s <> '' then begin
    SaveStringToFile(s, 'test.html');
    extractor := THtmlTextExtractor.Create;
    try
      s := extractor.ExtractFromHtml(s);
      SaveStringToFile(s, 'test.txt');
    finally
      extractor.Free;
    end;
  end;
end.
« Last Edit: November 03, 2018, 08:51:22 pm by wp »

Kevin95

WOW! Thank you very much for your effort! I tried your program and the result is amazing, it's very clear. I could never have written this; I never learnt it at university, as the main goal there was to teach us logical planning with the help of FreePascal...

I feel bad for asking, but could you have a look at the website I'm trying to get the text from? Here it is again: https://trainingslager.onlineliga.de/#url=/player/overview?playerId=28056
Funnily enough, your program missed the part I was looking for. If you look at the page, there is a kind of profile. This is the information I am looking for. It is strange that exactly this information is missing; for example, if I search for NORMAN WEBER, nothing is found in the text file. This is the part I am looking for:

Text copied from the page in the browser:
NAME               Norman Weber
NATIONALITÄT       Deutschland
AKTUELLES TEAM     Sieger aus Spiel 4
ALTER              20 Jahre
POSITION           Defensives Mittelfeld
GEWICHT            90 kg
GRÖSSE             1,85 Meter
FUSS               Rechts
MARKTWERT          400.892 €
VERTRAGSLAUFZEIT   Ende Saison 9 (4 Saisons, 36 Wochen)
IM VEREIN SEIT     Saison 2 (Saison 1, Saisonende, Woche 41)

                 GESAMT         6. ONLINELIGA  5. ONLINELIGA  4. ONLINELIGA  3. ONLINELIGA  2. ONLINELIGA  1. ONLINELIGA  FRIENDLIES
SPIELE           95             95             -              -              -              -              -              -
TORE             33             33             0              0              0              0              0              0
ASSISTS          37             37             0              0              0              0              0              0
KARTEN (G/GR/R)  12 / 1 / 1     12 / 1 / 1     0 / 0 / 0      0 / 0 / 0      0 / 0 / 0      0 / 0 / 0      0 / 0 / 0      0 / 0 / 0

FÄHIGKEITEN
Ø-Gesamt         39%
Fitness          54%
Kondition        22%
Schnelligkeit    7%
Technik          31%
Schusstechnik    38%
Schusskraft      58%
Kopfball         23%
Zweikampf        77%
Taktikverst.     77%
Athletik         39%
Talent           41% (wird noch ermittelt)

When you load the page, this part isn't there at first; it appears after a very short time. So I think the program couldn't write these things down because they weren't loaded yet. Is this possible? Sorry if I am completely wrong, I don't have (enough) experience with this...

Do you have any ideas how to fix the program so that it includes the part I am missing? Or do you think there is no way, and it is made intentionally so that this part cannot be read by a program?

Thanks to everyone for helping me so far :)

lucamar

All those parts you're looking for are most probably fetched by some script(s) after the browser loads the page; they are not in the page itself. To get them you'd have to do whatever the JS script(s) are doing. Not extremely difficult, but not an easy walk in the park either. Sorry.
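
One quick way to check this suspicion is to search the raw download for a word that is visible in the browser. The sketch below is only an illustration: the search word 'Weber' and the messages are arbitrary choices, and the same SSL setup as earlier in the thread is assumed (OpenSSL DLLs next to the exe; opensslsockets in the uses clause for FPC 3.2 or later).

```pascal
program CheckStaticHtml;
{$mode objfpc}{$H+}
uses
  sysutils, fphttpclient;
const
  PageUrl = 'https://trainingslager.onlineliga.de/#url=/player/overview?playerId=28056';
  Needle  = 'Weber';   // hypothetical search word taken from the visible profile
var
  html: String;
begin
  html := TFPHTTPClient.SimpleGet(PageUrl);
  // Case-insensitive search in the HTML exactly as the server delivered it
  if Pos(LowerCase(Needle), LowerCase(html)) > 0 then
    WriteLn('Found: the text is part of the downloaded page.')
  else
    WriteLn('Not found: the text is probably added later by a script.');
end.
```

If the word is not found, no amount of HTML parsing on this download will recover it; the data has to be requested the way the script requests it.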

Kevin95

Quote from: lucamar
All those parts you're looking for are most probably fetched by some script(s) after the browser loads the page; they are not in the page itself. To get them you'd have to do whatever the JS script(s) are doing. Not extremely difficult, but not an easy walk in the park either. Sorry.

Thanks for all your help so far. Can you tell me where I can look up how to do this? I would like to try it :)

lucamar

Quote from: Kevin95
Can you tell me where I can look up how to do this? I would like to try it :)

Not really, sorry. I've only done something similar once and it was in Delphi 5 using the TWebBrowser component (which encapsulated Internet Explorer). Do a search in the wiki: there are some pointers there for similar controls based on Gecko, Webkit, etc. engines.

Another solution is to find what the page is doing and mimic it if it's possible: it may be as simple as a secondary http call with some parameters.

Best of luck to you.

engkin

Make a little app to open the link in your preferred browser, and use MouseAndKeyInput unit (search the forum) to send your keyboard combination and sequence. This involves waiting enough for the page to load, or knowing when it finishes loading.

Or use wp's example to retrieve the text from another website that provides the service of turning a web page into text (search Google for these sites).

Edit:
Instead of the link you were using, try this in WP's code:
https://trainingslager.onlineliga.de/player/overview?playerId=28056

You can parse the result using RegEx or InternetTools ... etc.
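
As a rough illustration of the RegEx route, the sketch below strips tags with the regexpr unit that ships with FPC. The sample HTML string is made up for the demo and is not the real page markup; the patterns are a crude approximation and no substitute for a real HTML parser. In practice the input would be the string returned by SimpleGet for the URL above.

```pascal
program StripTagsDemo;
{$mode objfpc}{$H+}
uses
  sysutils, regexpr;

// Removes <script> blocks and all remaining tags, then collapses whitespace.
function StripTags(const AHtml: String): String;
begin
  // (?i) makes the match case-insensitive, .*? is a non-greedy match
  Result := ReplaceRegExpr('(?i)<script[^>]*>.*?</script>', AHtml, '', False);
  // Replace every remaining tag by a space so neighbouring words stay apart
  Result := ReplaceRegExpr('<[^>]*>', Result, ' ', False);
  // Collapse runs of whitespace into a single space
  Result := ReplaceRegExpr('\s+', Result, ' ', False);
  Result := Trim(Result);
end;

const
  // Hypothetical sample markup, for demonstration only
  Sample = '<div><b>NAME</b><span>Norman Weber</span>' +
           '<script>var x = 1;</script></div>';
begin
  WriteLn(StripTags(Sample));
end.
```

Replacing tags by a space rather than by nothing avoids fused output such as "NAMENorman Weber".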
« Last Edit: November 04, 2018, 04:37:15 pm by engkin »

 
