Lazarus

Programming => General => Topic started by: bobonwhidbey on June 09, 2019, 11:38:14 pm

Title: [Solved] Workaround for HTTPSender.ResultCode=403
Post by: bobonwhidbey on June 09, 2019, 11:38:14 pm
When I call LCLIntf's
Code: Pascal
OpenDocument(Url);

The correct web page appears in my default browser, as expected. But I want the HTML contents of the web page sent to a file. When I use
Code: Pascal
// Requires Synapse: uses httpsend;
function DownloadHTTP(URL, TargetFile: string): boolean;
var
  HTTPSender: THTTPSend;
  k: integer;
begin
  Result := False;
  HTTPSender := THTTPSend.Create;
  try
    // HTTPMethod returns False only if the request could not be completed;
    // the HTTP status (e.g. 403) is then read from ResultCode
    if HTTPSender.HTTPMethod('GET', URL) then
    begin
      k := HTTPSender.ResultCode;
      // Treat only 2xx responses as success
      if (k >= 200) and (k <= 299) then
      begin
        HTTPSender.Document.SaveToFile(TargetFile);
        Result := True;
      end;
    end;
  finally
    HTTPSender.Free;
  end;
end;

I get a ResultCode of 403 for the same URL. Is there a way to use the default browser but have the result sent to a file that could then be accessed and parsed?
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: Leledumbo on June 10, 2019, 07:02:15 am
The possibilities are theoretically endless, but I would guess the server expects a well-known User-Agent header. Try passing Chrome's.
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: bobonwhidbey on June 10, 2019, 05:49:39 pm
The OpenDocument approach uses my default browser (Chrome) and works, but it sends the output to my screen. It would seem that I have two roads to a solution:
1. get the browser's output into a file, or
2. get the THTTPSend request accepted by the server.
As you say, there is an unlimited number of potential problems with the 2nd approach. Is there some way to redirect the browser's output to a file, perhaps via ShellExecute?
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: Thaddy on June 11, 2019, 08:01:55 am
I would first try Leledumbo's suggestion, because Synapse's default User-Agent is rather old. It was my first guess too.
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: bobonwhidbey on June 11, 2019, 08:33:06 am
I have no idea how to follow through on that suggestion. Can you point me to an article or give me an idea?
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: sstvmaster on June 11, 2019, 08:05:34 pm
Code: Pascal
// Requires Synapse: uses httpsend;
function DownloadHTTP(URL, TargetFile: string): boolean;
var
  HTTPSender: THTTPSend;
  k: integer;
begin
  Result := False;
  HTTPSender := THTTPSend.Create;
  try
    // Send a browser-like User-Agent instead of Synapse's default one.
    // This string is only an example.
    HTTPSender.UserAgent := 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0';
    if HTTPSender.HTTPMethod('GET', URL) then
    begin
      k := HTTPSender.ResultCode;
      // Treat only 2xx responses as success
      if (k >= 200) and (k <= 299) then
      begin
        HTTPSender.Document.SaveToFile(TargetFile);
        Result := True;
      end;
    end;
  finally
    HTTPSender.Free;
  end;
end;

More information:
- https://en.wikipedia.org/wiki/User_agent
- https://developers.whatismybrowser.com/useragents/explore/
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: sash on June 12, 2019, 09:09:39 am
I'm almost certain (though I don't have access to the server's code, after all) that this has nothing to do with the User-Agent.

If you're getting a 403, the most typical cause is that you're visiting a non-public page you previously logged into, so your browser has a session cookie while your HTTP client lacks one.
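If the session-cookie explanation applies, one way to test it is to copy the cookie from the browser's developer tools and send it with the Synapse request. A minimal sketch follows: `CookieHeaderLine` is a hypothetical helper that joins `name=value` pairs into a single `Cookie:` header line, and the cookie names and values are placeholders. It assumes that lines added to `THTTPSend.Headers` before `HTTPMethod` are sent as extra request headers; check that against your Synapse version.

```pascal
program CookieHeaderSketch;

{$mode objfpc}{$H+}

{ Hypothetical helper: joins 'name=value' cookie pairs into a single
  Cookie request header line. Returns '' when no pairs are given, so the
  caller can skip adding an empty header. }
function CookieHeaderLine(const Pairs: array of string): string;
var
  i: Integer;
begin
  Result := '';
  for i := Low(Pairs) to High(Pairs) do
  begin
    // Pairs in one Cookie header are separated by "; " (RFC 6265)
    if Result <> '' then
      Result := Result + '; ';
    Result := Result + Pairs[i];
  end;
  if Result <> '' then
    Result := 'Cookie: ' + Result;
end;

begin
  // Placeholder values; copy the real ones from the browser.
  // Assumption: with Synapse this line would be added to HTTPSender.Headers
  // before calling HTTPSender.HTTPMethod.
  WriteLn(CookieHeaderLine(['sessionid=PLACEHOLDER', 'lang=en']));
end.
```

In DownloadHTTP this would become something like `HTTPSender.Headers.Add(CookieHeaderLine([...]))` right after creating the sender. If the 403 then disappears, the page needs a logged-in session rather than a particular User-Agent.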
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: bobonwhidbey on June 16, 2019, 02:07:21 am
Thank you, sstvmaster. Your idea worked perfectly. I merely added this:

Code: Pascal
HTTPSender.UserAgent := 'Mozilla/5.0';

after the create, and all went smoothly.

Sorry I didn't get back to you earlier, but I was away from my PC.
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: Leledumbo on June 17, 2019, 07:48:22 am
Quote
I'm almost certain (though I don't have access to the server's code, after all) that this has nothing to do with the User-Agent.

If you're getting a 403, the most typical cause is that you're visiting a non-public page you previously logged into, so your browser has a session cookie while your HTTP client lacks one.
Seems like it does ;)
Some sites optimize their output depending on the browser requesting the page (or simply always want to know what is requesting it), and when the browser is unknown, they send "permission denied" rather than a possibly broken rendering. GitHub does this as well, so it's a fairly common (ab?)use of this response code.
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: Thaddy on June 17, 2019, 09:04:36 am
Quote
so it's a fairly common (ab?)use of this response code.
No, a page may not render correctly in older browsers, so the 403 is correct. It is not abuse. So simply upgrade (actually, spoof!) your User-Agent to something more recent.
On the server side, the User-Agent identifies the client's feature set. Even if you do not use a browser, the server expects support for a baseline feature set before it gives a valid response.
This is also called "content negotiation". See https://en.wikipedia.org/wiki/User_agent
Title: Re: [Solved] Workaround for HTTPSender.ResultCode=403
Post by: trev on June 17, 2019, 09:44:30 am
Quote
6.5.3.  403 Forbidden

   The 403 (Forbidden) status code indicates that the server understood
   the request but refuses to authorize it.  A server that wishes to
   make public why the request has been forbidden can describe that
   reason in the response payload (if any).
Source: RFC 7231

So, yes, it is not abuse, but it is not very friendly either when the server could explain the refusal in the payload and doesn't. microchip.com is one unfriendly site I encountered while using fphttpclient.
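Since RFC 7231 lets the server put its reason in the response body, it can be worth logging that body instead of discarding it. A minimal sketch, assuming that with Synapse the body of an error response is still available in `HTTPSender.Document` and the reason phrase in `HTTPSender.ResultString`; `ResponseSummary` and its `MaxLen` parameter are hypothetical names, and the sample body in the demo is made up.

```pascal
program ResponseSummarySketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical helper: formats the status code, reason phrase and the first
  MaxLen bytes of the response body, so a server's explanation for a 403
  (if it sent one) can be logged instead of thrown away. }
function ResponseSummary(Code: Integer; const Reason: string;
  Body: TStream; MaxLen: Integer): string;
var
  Snippet: string;
  Len: Integer;
begin
  // Read at most MaxLen bytes from the start of the body stream
  if Body.Size < MaxLen then
    Len := Integer(Body.Size)
  else
    Len := MaxLen;
  if Len < 0 then
    Len := 0;
  SetLength(Snippet, Len);
  if Len > 0 then
  begin
    Body.Position := 0;
    Body.ReadBuffer(Snippet[1], Len);
  end;
  Result := Format('HTTP %d %s: %s', [Code, Reason, Snippet]);
end;

var
  Sample: TStringStream;
begin
  // Made-up body standing in for HTTPSender.Document
  Sample := TStringStream.Create('Access denied.');
  try
    WriteLn(ResponseSummary(403, 'Forbidden', Sample, 500));
  finally
    Sample.Free;
  end;
end.
```

In DownloadHTTP, the branch taken when the status is not 2xx could call `WriteLn(ResponseSummary(HTTPSender.ResultCode, HTTPSender.ResultString, HTTPSender.Document, 500))`. Truncating with `MaxLen` keeps a large HTML error page from flooding the log.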
Title: Re: Workaround for HTTPSender.ResultCode=403
Post by: sash on June 17, 2019, 09:53:58 am
Quote
No, a page may not render correctly in older browsers, so the 403 is correct.

How do they know whether it is "older"? They simply don't recognize this User-Agent string and refuse to care about the actual feature set, which should really be "content-negotiated" with the Accept* headers.

The problem is that 403 is a very generic status. It would probably be OK if the body of the 403 response carried a meaningful description.
Title: Re: [Solved] Workaround for HTTPSender.ResultCode=403
Post by: Thaddy on June 17, 2019, 09:58:11 am
No, they know the User-Agent string, but their site cannot render for old browsers. Mozilla/4.0 is really old, so they throw an error page :o 8-) O:-).
I will submit a patch to Synapse to change the default User-Agent string to something 10 years old rather than 20. Usually these are still applied.
Strictly speaking, this is User-Agent spoofing, since no browser is involved. See the link to Wikipedia.
Anyway, this way you can obtain a fully rendered page.
Title: Re: [Solved] Workaround for HTTPSender.ResultCode=403
Post by: bobonwhidbey on June 17, 2019, 05:56:39 pm
This is a really old website. The page in question was last updated in 2001.
Title: Re: [Solved] Workaround for HTTPSender.ResultCode=403
Post by: Thaddy on June 17, 2019, 06:27:19 pm
It is not a question of the page but of when the server was last updated. It may be hosted, in which case the host keeps the server software up to date. In fact, everybody keeps their servers up to date for security reasons.