
Topic: Error 403 downloading with Synapse and SSL

tintinux

Error 403 downloading with Synapse and SSL
« on: January 20, 2025, 12:10:21 pm »
Hi

I'm trying to download pages from a particular website. For example, this page (the URL used in the code below).

I'm using Synapse, Lazarus 3.6 and FPC 3.2.2 on Ubuntu 24.04, with OpenSSL 3.x.

My code is:
Code: Pascal
  unit Unit1;
  {$mode objfpc}{$H+}
  interface
  uses
    Classes,
    SysUtils,
    Forms,
    Controls,
    Graphics,
    Dialogs,
    StdCtrls,
    httpsend,       // Synapse HTTP client (THTTPSend)
    ssl_openssl3;   // Synapse SSL plugin for OpenSSL 3.x, needed for https URLs

  [...]
  procedure TForm1.Button1Click(Sender: TObject);
  var
    HTTPSender: THTTPSend;
  begin
    HTTPSender := THTTPSend.Create;
    try
      // Present a (rather old) Firefox user agent string
      HTTPSender.UserAgent := 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0';
      HTTPSender.HTTPMethod('GET', 'https://gw.geneanet.org/sstrebler?n=letscher+lescher&p=marie+elisabeth&oc=0&lang=fr&type=fiche');
      // ResultCode holds the HTTP status; the response body is in HTTPSender.Document
      ShowMessage('HTTPSender.ResultCode=' + IntToStr(HTTPSender.ResultCode));
    finally
      HTTPSender.Free;
    end;
  end;

With this URL and this program, I always get a 403 error.
I understand that there is something specific about this website, and they might be trying to prevent such downloads.

However, Firefox gets the content without any error, even when I am not logged in to the site, or when it is the very first request and cookies are disabled. You can easily try it yourself.

I wonder whether I'm missing something or doing something wrong.
Is there anything I can do to download pages from this site?

Thanks for your help.


« Last Edit: January 20, 2025, 12:12:58 pm by tintinux »
Initiator of gestinux, open-source, multi-database and multilingual accounting and billing software made with LAZARUS.

You can help with development, create and improve translations, and provide examples of legal charts and reports from more countries.

rvk

« Reply #1 on: January 20, 2025, 12:35:01 pm »
Quote from: tintinux
I wonder whether I'm missing something or doing something wrong.
Is there anything I can do to download pages from this site?

If you fetch the page with curl, you also get a 403 error.
But you do get some content for the page.

This page contains JavaScript, so that JavaScript probably redirects you to the correct page.
When you request the page through a browser, which runs the JavaScript, you do not seem to get the 403.
So there might be something the browser sends that you are not sending.

Code: Bash
  $ curl -i "https://gw.geneanet.org/sstrebler?n=letscher+lescher&p=marie+elisabeth&oc=0&lang=fr&type=fiche"
  HTTP/2 403
  date: Mon, 20 Jan 2025 11:15:54 GMT
  content-type: text/html; charset=UTF-8
  accept-ch: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
  critical-ch: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: same-origin
  origin-agent-cluster: ?1
  permissions-policy: accelerometer=(),autoplay=(),browsing-topics=(),camera=(),clipboard-read=(),clipboard-write=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
  referrer-policy: same-origin
  x-content-options: nosniff
  x-frame-options: SAMEORIGIN
  cf-mitigated: challenge
  cf-chl-out: gj+A6m65NdLZYw7z5WOsQRENLN44DhQB5llm05YdHOv8kiH/pLrBTqC8zJ4DTDM74+n4OPZdMy0pSYYyUHxRPnEqSHiq3Z2+xYtU/l9KNJuEoVvfY5NcvSsJ1QS2VNsBjQrR/GfzotwEYL7haSpd/A==$AaeCfN6KgroiTuNV2tbfpQ==
  cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
  expires: Thu, 01 Jan 1970 00:00:01 GMT
  set-cookie: __cf_bm=OKVe3GbSbG.sphCl9rXallHGEg6q5cU8g_bmgK1Pa_U-1737371754-1.0.1.1-NPUuPFZji97MwgsEUXwsRPL9bQRSWaqeipty_VjjYba0AzKBX8FEfukMwt27lkJQ11gBrCIy7y5s21vFRU70WA; path=/; expires=Mon, 20-Jan-25 11:45:54 GMT; domain=.geneanet.org; HttpOnly; Secure; SameSite=None
  strict-transport-security: max-age=31536000; includeSubDomains; preload
  server: cloudflare
  cf-ray: 904ea63ace796643-AMS

  <!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta name="robots" content="noindex,nofollow">
  <meta name="viewport" content="width=device-width,initial-scale=1">
  <style>*{box-sizing:border-box;margin:0;padding:0}html .....
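The 403 above is not a plain "forbidden": it carries a `cf-mitigated: challenge` header and an HTML body titled "Just a moment...". A client can at least recognise that case and report it clearly instead of showing a bare 403. Below is a minimal Free Pascal sketch, assuming only the standard Classes, SysUtils and StrUtils units; the helper names `IsChallengeHeader`, `IsCloudflareChallenge` and `HtmlTitle` are hypothetical, and the sample input is copied from the curl output above. It only detects these particular markers; it does not get past the challenge.

```pascal
program DetectCfChallenge;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

// Hypothetical helper: True if one header line is "cf-mitigated: challenge"
// (name and value compared case-insensitively, surrounding spaces ignored).
function IsChallengeHeader(const Line: string): Boolean;
var
  p: SizeInt;
begin
  p := Pos(':', Line);
  if p = 0 then
    Exit(False); // e.g. the status line "HTTP/2 403" has no colon
  Result := SameText(Trim(Copy(Line, 1, p - 1)), 'cf-mitigated') and
            SameText(Trim(Copy(Line, p + 1, MaxInt)), 'challenge');
end;

// Hypothetical helper: scans a list of response header lines, such as
// Synapse's THTTPSend.Headers or TFPHTTPClient.ResponseHeaders.
function IsCloudflareChallenge(Headers: TStrings): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 0 to Headers.Count - 1 do
    if IsChallengeHeader(Headers[i]) then
      Exit(True);
end;

// Hypothetical helper: text of the first <title> element, or '' if none.
function HtmlTitle(const Html: string): string;
var
  Lower: string;
  StartPos, EndPos: SizeInt;
begin
  Result := '';
  Lower := LowerCase(Html); // ASCII lowercasing keeps byte positions aligned
  StartPos := Pos('<title>', Lower);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length('<title>'));
  EndPos := PosEx('</title>', Lower, StartPos);
  if EndPos = 0 then
    Exit;
  Result := Trim(Copy(Html, StartPos, EndPos - StartPos));
end;

var
  Headers: TStringList;
begin
  Headers := TStringList.Create;
  try
    // Sample lines taken from the curl output above.
    Headers.Add('HTTP/2 403');
    Headers.Add('content-type: text/html; charset=UTF-8');
    Headers.Add('cf-mitigated: challenge');
    Headers.Add('server: cloudflare');
    WriteLn('Cloudflare challenge: ', IsCloudflareChallenge(Headers));
    WriteLn('Page title: ',
      HtmlTitle('<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title>'));
  finally
    Headers.Free;
  end;
end.
```

With Synapse, one could call `IsCloudflareChallenge(HTTPSender.Headers)` after `HTTPMethod` (Synapse fills `Headers` with the response headers and `Document` with the body, even on a 403) and tell the user that the site requires a browser challenge rather than retrying.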

rvk

« Reply #2 on: January 20, 2025, 01:50:45 pm »
BTW, what you are encountering is the Cloudflare firewall.
You have probably sometimes seen Cloudflare's "checking if you are human" page.
For example, on this page: https://www.scrapingcourse.com/cloudflare-challenge

You can't bypass this just by sending some extra headers (they tried).
You can reuse a cookie obtained in a real browser (through human interaction), but that cookie is only valid for a limited time.

For examples, see:
https://www.zenrows.com/blog/curl-bypass-cloudflare#bypass-cloudflare-in-curl

So yeah, this is not simple to bypass.
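The curl output in reply #1 shows how short-lived Cloudflare's cookies are: the `__cf_bm` cookie set together with the 403 expires at 11:45:54 GMT, 30 minutes after the response's `date` header (11:15:54 GMT). A client that reuses a cookie sends back only the `name=value` part of a `Set-Cookie` header and has to replace it once it expires. Below is a minimal Free Pascal sketch that splits such a header into those two pieces; the helper names `CookiePair` and `CookieAttribute` are hypothetical, and the sketch does not obtain a valid clearance cookie or get past the challenge.

```pascal
program SetCookieParts;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

const
  // Set-Cookie header copied verbatim from the curl output in reply #1
  // (split into several literals only to keep the lines short).
  SetCookieLine =
    'set-cookie: __cf_bm=OKVe3GbSbG.sphCl9rXallHGEg6q5cU8g_bmgK1Pa_U-1737371754-' +
    '1.0.1.1-NPUuPFZji97MwgsEUXwsRPL9bQRSWaqeipty_VjjYba0AzKBX8FEfukMwt27lkJQ11gBrCIy7y5s21vFRU70WA; ' +
    'path=/; expires=Mon, 20-Jan-25 11:45:54 GMT; domain=.geneanet.org; HttpOnly; Secure; SameSite=None';

// Hypothetical helper: the "name=value" part of a Set-Cookie header,
// i.e. what a client sends back in its own Cookie header.
function CookiePair(const SetCookie: string): string;
var
  S: string;
  p: SizeInt;
begin
  S := Trim(SetCookie);
  // Drop a leading "set-cookie:" header name if present.
  if AnsiStartsText('set-cookie:', S) then
    S := Trim(Copy(S, Length('set-cookie:') + 1, MaxInt));
  p := Pos(';', S);
  if p > 0 then
    S := Copy(S, 1, p - 1);
  Result := Trim(S);
end;

// Hypothetical helper: value of one cookie attribute (e.g. 'expires'),
// or '' if the attribute is absent. Attribute names are case-insensitive.
function CookieAttribute(const SetCookie, Attr: string): string;
var
  Rest, Segment: string;
  p: SizeInt;
begin
  Result := '';
  Rest := SetCookie;
  p := Pos(';', Rest); // attributes start after the first ';'
  while p > 0 do
  begin
    Rest := Copy(Rest, p + 1, MaxInt);
    p := Pos(';', Rest);
    if p > 0 then
      Segment := Trim(Copy(Rest, 1, p - 1))
    else
      Segment := Trim(Rest); // last attribute
    if AnsiStartsText(Attr + '=', Segment) then
      Exit(Copy(Segment, Length(Attr) + 2, MaxInt));
  end;
end;

begin
  WriteLn('Cookie to send back: ', CookiePair(SetCookieLine));
  WriteLn('Expires: ', CookieAttribute(SetCookieLine, 'expires'));
end.
```

The resulting `name=value` string is the form that Synapse's `THTTPSend.Cookies` and fphttpclient's `TFPHTTPClient.Cookies` string lists are expected to hold between requests.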

Thaddy

« Reply #3 on: January 20, 2025, 03:09:31 pm »
Yes, I also tried with carefully crafted request headers and TFPHTTPClient:
Code: Pascal
  // does not work with cloudflare protected sites.
  {$mode objfpc}{$H+}
  uses
    sysutils,
    fphttpclient,
    opensslsockets;   // provides the OpenSSL-based TLS handler needed for https

  var
    client: TFPHTTPClient;
  begin
    client := TFPHTTPClient.Create(nil);
    try
      try
        client.AllowRedirect := true;
        // Browser-like request headers
        client.RequestHeaders.Add('User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3');
        client.RequestHeaders.Add('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8');
        client.RequestHeaders.Add('Accept-Language: en-US,en;q=0.5');
        // Note: TFPHTTPClient does not decompress gzip/deflate/br bodies itself
        client.RequestHeaders.Add('Accept-Encoding: gzip, deflate, br');
        client.RequestHeaders.Add('Connection: keep-alive');
        client.RequestHeaders.Add('Upgrade-Insecure-Requests: 1');
        client.RequestHeaders.Add('Sec-Fetch-Dest: document');
        client.RequestHeaders.Add('Sec-Fetch-Mode: navigate');
        client.RequestHeaders.Add('Sec-Fetch-Site: none');
        writeln(client.Get('https://gw.geneanet.org/sstrebler?n=letscher+lescher&p=marie+elisabeth&oc=0&lang=fr&type=fiche'));
      except
        // Get raises EHTTPClient for an unexpected status code such as the 403 here
        on E: EHTTPClient do
          writeln(E.Message)
        else
          raise;
      end;
    finally
      client.Free;
    end;
  end.
No luck.
I have read that Puppeteer would work, and if it does, CEF should also work in headless mode (Puppeteer is basically Chromium + JS). I have not tried it yet.
The attachment is wishful thinking for now. With CEF this should be doable.
« Last Edit: January 20, 2025, 04:03:14 pm by Thaddy »
But I am sure they don't want the Trumps back...

tintinux

« Reply #4 on: January 21, 2025, 03:01:48 pm »
Quote from: rvk
BTW, what you are encountering is the Cloudflare firewall.
[...]
For examples, see:
https://www.zenrows.com/blog/curl-bypass-cloudflare#bypass-cloudflare-in-curl
So yeah, this is not simple to bypass.

Thanks for this useful information.
I managed to bypass Cloudflare from Lazarus only by using the service offered on that site, but it is probably too expensive for my needs.
I'll try other ways, but yes, it doesn't look easy.
Regards 

Thaddy

« Reply #5 on: January 21, 2025, 03:29:23 pm »
The CEF way should work. I was not able to test that, though.
