
bobonwhidbey

Saving web page to a file
« on: May 17, 2020, 01:02:08 am »
This works perfectly to view a web page in my default browser:

    URL :=  'http://www.bridgecaptain.com/downloadDD.html';
    OpenDocument(URL);   

How can I save the page to a file in the Downloads folder? I'd really like to use the OpenDocument approach if possible. I've tried
    URL :=  'http://www.bridgecaptain.com/downloadDD.html downloads=c:/User/Bob/Downloads/test.html';
and
    URL :=  'http://www.bridgecaptain.com/downloadDD.html  file:///c:/Users/Bob/test.html ';

but both give me 404 errors. What's the solution?
Lazarus 4.6 FPC 3.2.2 x86_64-win64-win32/win64

trev

« Reply #1 on: May 17, 2020, 01:40:44 am »
Have you looked at the declaration for OpenDocument()?

Code: Pascal
    function OpenDocument(APath: string): Boolean;

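You obviously cannot do what you want with that function! It only takes a path and opens it in the associated application; it has no way to save anything to a file.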

Consider using fphttpclient instead.
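A minimal sketch of the fphttpclient route, assuming FPC 3.2 or later. TFPHTTPClient.SimpleGet has an overload that takes a local file name and writes the response body straight to it. The Downloads location and the file name test.html are assumptions taken from the question; DownloadTarget is a hypothetical helper, not part of fphttpclient.

```pascal
program SavePageToFile;

{$MODE OBJFPC}{$H+}

uses
  SysUtils, fphttpclient;

const
  PageURL = 'http://www.bridgecaptain.com/downloadDD.html';

// Hypothetical helper. Assumption: the Downloads folder lives directly under
// the home directory returned by GetUserDir, as on a default Windows profile.
function DownloadTarget(const FileName: string): string;
begin
  Result := IncludeTrailingPathDelimiter(GetUserDir) + 'Downloads'
    + PathDelim + FileName;
end;

var
  Target: string;
begin
  Target := DownloadTarget('test.html');
  try
    // This SimpleGet overload creates a client, sends a GET request and
    // writes the response body to Target; it raises an exception if the
    // server does not answer 200 OK.
    TFPHTTPClient.SimpleGet(PageURL, Target);
    WriteLn('Saved ', PageURL, ' to ', Target);
  except
    on E: Exception do
      WriteLn('Download failed: ', E.Message);
  end;
end.
```

The Downloads folder has to exist already; the client only creates the file, not the directory. Note that SimpleGet does not follow redirects, which comes up later in this thread.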

bobonwhidbey

« Reply #2 on: May 17, 2020, 02:34:41 am »
I tried

Code: Pascal
    function GetRedirectedURL(const fileURL: string): string;
    begin
      Result := '';
      with TFPHTTPClient.Create(nil) do
        try
          HTTPMethod('HEAD', fileURL, nil, [301]);
          ResponseHeaders.NameValueSeparator := ':'; // HTTP header uses colon as separator
          Result := Trim(ResponseHeaders.Values['Location']);
          // there will be a space in front of the value, hence the Trim()
        finally
          Free;
        end;
    end;

        URL :=  'http://www.bridgecaptain.com/downloadDD.html';
        str := GetRedirectedURL(URL);
but got "Unexpected response status code : 200"

trev

« Reply #3 on: May 17, 2020, 03:00:05 am »
Why are you looking for a redirected page (code 301) when the page you are retrieving is not a redirected page? Code 200 = success :)

Code: Pascal
    program http;

    {$MODE OBJFPC}{$H+}

    Uses
      fpHttpClient;

    begin
      WriteLn(TFPHttpClient.SimpleGet('http://www.bridgecaptain.com/downloadDD.html'));
    end.

The above program prints the web page to the terminal.
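On the 301-vs-200 point: HTTPMethod raises "Unexpected response status code" for any status that is not in the allowed-codes array, so a HEAD request that lists only [301] fails on a normal 200 page. A sketch that accepts 200 as well as the common redirect codes and reads Location only when the reply is a redirect. IsRedirectStatus is a hypothetical helper; a case statement is used because Pascal sets cannot hold values above 255.

```pascal
program CheckRedirect;

{$MODE OBJFPC}{$H+}

uses
  SysUtils, fphttpclient;

// Hypothetical helper: status codes that carry a Location header.
function IsRedirectStatus(Code: Integer): Boolean;
begin
  case Code of
    301, 302, 303, 307, 308: Result := True;
  else
    Result := False;
  end;
end;

// Returns the redirect target, or '' when the server answers 200 OK.
function GetRedirectedURL(const AURL: string): string;
begin
  Result := '';
  with TFPHTTPClient.Create(nil) do
    try
      // Every code listed here is accepted; any other status still raises.
      HTTPMethod('HEAD', AURL, nil, [200, 301, 302, 303, 307, 308]);
      if IsRedirectStatus(ResponseStatusCode) then
      begin
        ResponseHeaders.NameValueSeparator := ':';
        Result := Trim(ResponseHeaders.Values['Location']);
      end;
    finally
      Free;
    end;
end;

var
  Target: string;
begin
  Target := GetRedirectedURL('http://www.bridgecaptain.com/downloadDD.html');
  if Target = '' then
    WriteLn('No redirect: the page is served directly')
  else
    WriteLn('Redirects to: ', Target);
end.
```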

jamie

« Reply #4 on: May 17, 2020, 03:08:16 am »
Don't you first need to create the component so you can set its AllowRedirect property to True?

I may have the property name wrong, but it's in there.

Then, of course, you need to free it.
The only true wisdom is knowing you know nothing

bobonwhidbey

« Reply #5 on: May 17, 2020, 03:34:46 am »
I get a compilation error with @CheckURI, which I have defined. This code works when I don't need a redirect (and I comment out that line), but I normally need redirects.

Code: Pascal
    function WebToFile(URL: string; out HTMLDoc: string): boolean;
    var
      Client: TFPHttpClient;
    begin
      Result := False;
      //  InitSSLInterface; // this is fixed in trunk
      Client := TFPHttpClient.Create(nil);
      try
        Client.AllowRedirect := True;
        Client.OnRedirect    := @CheckURI;
        HTMLDoc := Client.Get(URL);
        Result := True;
      finally
        Client.Free;
      end;
    end;

jamie

« Reply #6 on: May 17, 2020, 03:40:46 am »
Do you need to deal with the redirection?

Also, use SimpleGet, not Get, because otherwise you need to do some more setup.

As for the compiler error, either you have the wrong prototype defined for the event, or you don't need the @ in front.

You don't always need the @.

bobonwhidbey

« Reply #7 on: May 17, 2020, 04:24:01 am »
With or without the @ I get a compile error.

I do need redirects most of the time.

Code: Pascal
    function WebToFile(URL: string; Filename: string): boolean;
      procedure CheckURI(Sender: TObject; const ASrc: string; var ADest: string);
      var
        newURI: TURI;
        OriginalURI: TURI;
      begin
        newURI := ParseURI(ADest, False);
        if (newURI.Host = '') then
        begin                   // NewURI does not contain protocol or host
          OriginalURI := ParseURI(ASrc, False); // use the original URI...
          OriginalURI.Path := newURI.Path; // ... with the new subpage (path)...
          OriginalURI.Document := newURI.Document; // ... and the new document info...
          ADest := EncodeURI(OriginalURI); // ... and return the complete redirected URI
        end;
      end;
    var
      Client: TFPHttpClient;
    begin
      Result := False;
      //  InitSSLInterface; // this is fixed in trunk
      Client := TFPHttpClient.Create(nil);
      try
        Client.AllowRedirect := True;        //  Allow redirections
        Client.OnRedirect    := CheckURI;   // this tells the Client how to handle redirection;
        Filename := Client.Get(URL);
        Result := True;
      finally
        Client.Free;
      end;
    end;

TRon

« Reply #8 on: May 17, 2020, 05:38:48 am »
Quote from bobonwhidbey: "With or without the @ I get a compile error."

Take a closer look at the error message; it tells you the same thing I write below.

fphttpclient source: https://github.com/graemeg/freepascal/blob/master/packages/fcl-web/src/base/fphttpclient.pp

    Property OnRedirect : TRedirectEvent Read FOnRedirect Write FOnRedirect;

    TRedirectEvent = Procedure (Sender : TObject; Const ASrc : String; Var ADest: String) of object;
Today is tomorrow's yesterday.

bobonwhidbey

« Reply #9 on: May 17, 2020, 07:01:21 am »
This seemed like the obvious solution, but I got basically the same error message:

    Client.OnRedirect    := CheckURI(Client, URL, FileName);

TRon

« Reply #10 on: May 17, 2020, 07:25:45 am »
Quote from bobonwhidbey: "This seemed like the obvious solution, but I got basically the same error message"

    Client.OnRedirect    := CheckURI(Client, URL, FileName);

This describes functions and procedures: https://castle-engine.io/modern_pascal_introduction.html#_functions_procedures_primitive_types

This describes classes: https://castle-engine.io/modern_pascal_introduction.html#_classes
Note the list item that says: methods (which is a fancy name for "a procedure or function inside a class").

Or, in the terms of your error message: "procedure of object".

Your procedure CheckURI isn't part of an object/class. You can solve that by creating a class and adding a CheckURI method to it.
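To make the "of object" point concrete: OnRedirect needs a method bound to an instance, so neither a nested procedure nor the result of calling CheckURI(Client, URL, FileName) can be assigned to it. A minimal sketch, using a hypothetical TRedirectLogger class whose only job is to own the handler; it logs each redirect and leaves the target unchanged.

```pascal
program OfObjectDemo;

{$MODE OBJFPC}{$H+}

uses
  SysUtils, fphttpclient;

type
  // Hypothetical helper class: TRedirectEvent is declared "of object",
  // so the handler has to be a method of some class.
  TRedirectLogger = class
  public
    Count: Integer;
    procedure LogRedirect(Sender: TObject; const ASrc: string; var ADest: string);
  end;

procedure TRedirectLogger.LogRedirect(Sender: TObject; const ASrc: string;
  var ADest: string);
begin
  Inc(Count);
  WriteLn('Redirect ', Count, ': ', ASrc, ' -> ', ADest);
  // ADest is left unchanged, so the client follows the server's Location.
end;

var
  Logger: TRedirectLogger;
  Client: TFPHTTPClient;
begin
  Logger := TRedirectLogger.Create;
  Client := TFPHTTPClient.Create(nil);
  try
    Client.AllowRedirect := True;
    // Assign the method itself, with no argument list. {$MODE OBJFPC}
    // requires the @; in {$MODE DELPHI} the same line is written without it.
    Client.OnRedirect := @Logger.LogRedirect;
    WriteLn(Length(Client.Get('http://www.bridgecaptain.com/downloadDD.html')),
      ' characters received');
  finally
    Client.Free;
    Logger.Free;
  end;
end.
```

The @ rule in the comment is also why jamie said you don't always need it: it depends on the compiler mode.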

bobonwhidbey

« Reply #11 on: May 17, 2020, 06:45:42 pm »
This is what I've ended up with and it seems to work beautifully when a redirect is not necessary, although the code isn't all that beautiful. This seems like a cumbersome way to get the job done. How can I improve this code and get it to work with a redirect?

Code: Pascal
    type
      TMyHTTPClient = class(TFPHTTPClient)
      public
        procedure CheckURI(Sender: TObject; const ASrc: string; var ADest: string);
      end;

    procedure TMyHTTPClient.CheckURI(Sender: TObject; const ASrc: string; var ADest: string);
    var
      newURI: TURI;
      OriginalURI: TURI;
    begin
      newURI := ParseURI(ADest, False);
      if (newURI.Host = '') then
      begin                   // NewURI does not contain protocol or host
        OriginalURI := ParseURI(ASrc, False); // use the original URI...
        OriginalURI.Path := newURI.Path; // ... with the new subpage (path)...
        OriginalURI.Document := newURI.Document; // ... and the new document info...
        ADest := EncodeURI(OriginalURI); // ... and return the complete redirected URI
      end;
    end;

    function URLtoStr(URL: string; out WebString: string): boolean;
    var
      Client: TMyHttpClient;
    begin
      Result := False;
      //  InitSSLInterface; // this is fixed in trunk
      Client := TMyHttpClient.Create(nil);
      try
        Client.AllowRedirect := True;        //  Allow redirections
        Client.CheckURI(Client, URL, WebString);
        WebString := Client.Get(URL);
        Result := True;
      finally
        Client.Free;
      end;
    end;
« Last Edit: May 17, 2020, 07:00:43 pm by bobonwhidbey »
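In the code above, CheckURI is called once by hand (with WebString, an out parameter that has no value yet, as ADest) and is never assigned to OnRedirect, so the client never calls it during an actual redirect. A sketch of one way to wire it, keeping the TMyHTTPClient idea: the subclass hooks its own CheckURI up in an overridden constructor, and URLtoStr just calls Get. Copying Params (the query string) and the exception handling are additions for illustration, not part of the original post.

```pascal
program URLtoStrDemo;

{$MODE OBJFPC}{$H+}

uses
  Classes, SysUtils, fphttpclient, uriparser;

type
  TMyHTTPClient = class(TFPHTTPClient)
  public
    constructor Create(AOwner: TComponent); override;
    procedure CheckURI(Sender: TObject; const ASrc: string; var ADest: string);
  end;

constructor TMyHTTPClient.Create(AOwner: TComponent);
begin
  inherited Create(AOwner);
  AllowRedirect := True;
  // Wire the handler up once. The client calls it by itself, and only
  // when the server actually answers with a redirect.
  OnRedirect := @CheckURI;
end;

procedure TMyHTTPClient.CheckURI(Sender: TObject; const ASrc: string;
  var ADest: string);
var
  NewURI, OriginalURI: TURI;
begin
  NewURI := ParseURI(ADest, False);
  if NewURI.Host = '' then
  begin
    // Relative Location header: keep protocol and host from ASrc and take
    // path, document and query from the redirect target.
    OriginalURI := ParseURI(ASrc, False);
    OriginalURI.Path := NewURI.Path;
    OriginalURI.Document := NewURI.Document;
    OriginalURI.Params := NewURI.Params;
    ADest := EncodeURI(OriginalURI);
  end;
  // Absolute targets are returned unchanged so the client follows them as-is.
end;

function URLtoStr(const URL: string; out WebString: string): Boolean;
var
  Client: TMyHTTPClient;
begin
  Result := False;
  WebString := '';
  Client := TMyHTTPClient.Create(nil);
  try
    try
      WebString := Client.Get(URL);
      Result := True;
    except
      on E: Exception do
        WriteLn('Retrieval failed: ', E.Message);
    end;
  finally
    Client.Free;
  end;
end;

var
  Page: string;
begin
  if URLtoStr('http://www.bridgecaptain.com/downloadDD.html', Page) then
    WriteLn(Length(Page), ' characters received');
end.
```

Putting the wiring in the constructor means every TMyHTTPClient follows redirects the same way, and callers cannot forget the OnRedirect assignment.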

bobonwhidbey

« Reply #12 on: May 17, 2020, 11:44:03 pm »
I've cleaned up my code. Here's the problem I'm now facing: a web page that needs redirection is processed correctly with this code, but one that doesn't need it is not. All the pages I have seen have newURI.Host = '', and only pages with redirects need to have the ASrc URL encoded. Pages without redirects do not need any "encoding" of their URL.

How can I tell when the web page needs to go through this routine? It doesn't seem to be when newURI.Host = ''.

Code: Pascal
    type
      TMyHTTPClient = class(TFPHTTPClient)
      public
        procedure CheckURI(Sender: TObject; const ASrc: string; var ADest: string);
      end;

    procedure TMyHTTPClient.CheckURI(Sender: TObject; const ASrc: string; var ADest: string);
    var
      newURI: TURI;
      OriginalURI: TURI;
    begin
      newURI := ParseURI(ADest, False);
      if (newURI.Host = '') then
      begin                   // NewURI does not contain protocol or host
        OriginalURI := ParseURI(ASrc, False); // use the original URI...
        OriginalURI.Path := newURI.Path; // ... with the new subpage (path)...
        OriginalURI.Document := newURI.Document; // ... and the new document info...
        ADest := EncodeURI(OriginalURI); // ... and return the complete redirected URI
      end
      else
        ADest := ASrc;
    end;

    function URLtoStr(URL: string; out WebString: string): boolean;
    var
      Client: TMyHttpClient;
      NewURL : string;
    begin
      Result := False;
      Client := TMyHttpClient.Create(nil);
      try
        Client.AllowRedirect := True;        //  Allow redirections
        Client.CheckURI(Client, URL, NewURL);
        WebString := Client.Get(NewURL);
        Result := True;
      finally
        Client.Free;
      end;
    end;
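About the Host = '' question: in URLtoStr, NewURL is a local string that has not been assigned when it is passed to CheckURI, so it is still empty, and ParseURI of an empty string has no host. The test is therefore true for every page and the else branch (ADest := ASrc) never runs; if it did, it would send the client back to the page that just redirected it. The if branch then rebuilds ASrc with an empty path and document, which most likely explains why pages without a redirect come back wrong. The client calls OnRedirect itself only when the server actually answers with a redirect, so once CheckURI is assigned to OnRedirect instead of being called by hand, the Host test only has to tell a relative Location header from an absolute one. A small offline check of that test (uriparser only, no network; NeedsBase is a hypothetical name and example.com a placeholder):

```pascal
program WhyHostIsEmpty;

{$MODE OBJFPC}{$H+}

uses
  SysUtils, uriparser;

// True when a redirect target has no host part, i.e. it is relative and
// needs the original URL's protocol and host filled in.
function NeedsBase(const ADest: string): Boolean;
begin
  Result := ParseURI(ADest, False).Host = '';
end;

begin
  // What URLtoStr passes as ADest: NewURL has not been assigned yet.
  WriteLn('empty string                  : ', NeedsBase(''));
  // A relative Location header, as sent by some redirects.
  WriteLn('/other/page.html              : ', NeedsBase('/other/page.html'));
  // An absolute Location header.
  WriteLn('http://www.example.com/x.html : ', NeedsBase('http://www.example.com/x.html'));
end.
```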

trev

« Reply #13 on: May 18, 2020, 02:13:04 am »
This is what I use across FreeBSD, Linux, macOS and Windows - it handles pages with redirects and pages without redirects:

Code: Pascal
    // Units assumed (not shown in the post): SysUtils and Dialogs, plus
    // fphttpclient on Unix and WinInet on Windows.
    // ShowAlertSheet and Form1_Main belong to the surrounding application.
    {$IFDEF UNIX}
    function GetMicrochipPage(const URL: string): string;
    var
      Client: TFPHttpClient;
      {$IFDEF DARWIN}
      MsgStr: String;
      {$ENDIF}
    begin
      Result := '';
      Client := TFPHttpClient.Create(nil);
      try
        try
          Client.AllowRedirect := true;
          Client.AddHeader('User-Agent', 'Mozilla/5.0(compatible; fpweb)');
          Result := Client.Get(URL);
        except
          on E: Exception do
            {$IFDEF DARWIN}
            begin
              MsgStr := 'Retrieval of: ' + URL + LineEnding
                      + 'Failed with error: ' + E.Message + LineEnding
                      + 'HTTP code: ' + IntToStr(Client.ResponseStatusCode);

              ShowAlertSheet(Form1_Main.Handle, 'Alert', MsgStr);
            end;
            {$ENDIF}
            {$IFNDEF DARWIN}
            ShowMessage('Retrieval of: ' + URL + LineEnding
                        + 'Failed with error: ' + E.Message + LineEnding
                        + 'HTTP code: ' + IntToStr(Client.ResponseStatusCode));
            {$ENDIF}
        end;
      finally
        Client.Free;  // always release the client, success or failure
      end;
    end;
    {$ENDIF}

    {$IFDEF WINDOWS}
    // Need to use Windows WinInet to avoid issue with HTTPS
    // needing two OpenSSL DLLs to be provided with application
    // if using TFPHttpClient.
    // The WinINet API also gets any connection and proxy settings
    // set by Internet Explorer. Blessing or curse?

    function GetMicrochipPage(const Url: string): string;
    var
      NetHandle: HINTERNET;
      UrlHandle: HINTERNET;
      Buffer: array[0..1023] of Byte;
      BytesRead: DWord;
      StrBuffer: UTF8String;
    begin
      Result := '';
      NetHandle := InternetOpen('Mozilla/5.0(compatible; WinInet)', INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);

      // NetHandle valid?
      if Assigned(NetHandle) then
        try
          UrlHandle := InternetOpenUrl(NetHandle, PChar(Url), nil, 0, INTERNET_FLAG_RELOAD, 0);

          // UrlHandle valid?
          if Assigned(UrlHandle) then
            try
              repeat
                // Stop on a read error as well as at the end of the data
                if not InternetReadFile(UrlHandle, @Buffer, SizeOf(Buffer), BytesRead) then
                  Break;
                SetString(StrBuffer, PAnsiChar(@Buffer[0]), BytesRead);
                Result := Result + StrBuffer;
              until BytesRead = 0;
            finally
              InternetCloseHandle(UrlHandle);
            end
          // otherwise UrlHandle invalid
          else
            ShowMessage('Cannot open URL: ' + Url);
        finally
          InternetCloseHandle(NetHandle);
        end
      // NetHandle invalid
      else
        raise Exception.Create('Unable to initialize WinInet');
    end;
    {$ENDIF}

Note: the non-Windows code requires FPC trunk to handle HTTPS URLs.
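That note dates from May 2020. As far as I know, FPC 3.2.0 (released in June 2020) supports HTTPS in fphttpclient, but only when the opensslsockets unit is added to the uses clause; the OpenSSL libraries still have to be present at run time (on Windows, the DLLs shipped with the application, which is exactly what the WinInet branch above avoids). A minimal sketch under that assumption; the freepascal.org URL is only an example.

```pascal
program HttpsGet;

{$MODE OBJFPC}{$H+}

uses
  SysUtils, fphttpclient,
  opensslsockets; // FPC 3.2.0 and later: registers OpenSSL support for https://

begin
  try
    // Example URL only; any https:// address is handled the same way.
    WriteLn(TFPHTTPClient.SimpleGet('https://www.freepascal.org/'));
  except
    on E: Exception do
      WriteLn('Request failed: ', E.Message);
  end;
end.
```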

bobonwhidbey

« Reply #14 on: May 18, 2020, 04:42:06 pm »
The website I'm trying to retrieve relies on a password saved in the default browser's cookies. That cookie doesn't seem to be retrieved by your GetMicrochipPage code. Any ideas?
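TFPHTTPClient does not read any browser's cookie store, so a cookie saved by the default browser is never sent. WinInet, as far as I know, only shares the cookie cache used by Internet Explorer, not those of Chrome or Firefox. One workaround, sketched here as an assumption rather than a tested solution, is to pass the cookie explicitly through the client's Cookies list. GetWithCookie is a hypothetical helper, and the cookie name and value are placeholders; the real pair would have to be copied from the browser (for example from its developer tools).

```pascal
program SendCookie;

{$MODE OBJFPC}{$H+}

uses
  fphttpclient;

// Hypothetical helper. ACookie is a "name=value" pair.
function GetWithCookie(const AURL, ACookie: string): string;
var
  Client: TFPHTTPClient;
begin
  Client := TFPHTTPClient.Create(nil);
  try
    Client.AllowRedirect := True;
    Client.Cookies.Add(ACookie);  // sent with the request's Cookie header
    Result := Client.Get(AURL);
  finally
    Client.Free;
  end;
end;

begin
  // 'session=PLACEHOLDER' is not a real cookie for this site.
  WriteLn(GetWithCookie('http://www.bridgecaptain.com/downloadDD.html',
    'session=PLACEHOLDER'));
end.
```

Session cookies usually expire, so a copied value may stop working after a while.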

 
