Topic: [Again] [Synapse] Download file from Sourceforge

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
[Again] [Synapse] Download file from Sourceforge
« on: June 09, 2011, 12:32:26 pm »
Hi,

Probably a fairly stupid question, but better a stupid question than no chance of an answer  :)
I'm trying to extend the LazUpdater program (http://forge.lazarusforum.de/projects/lazupdater), which installs and updates Lazarus and FPC from SVN.

If no SVN client is installed on Windows, the program tries to download the Win32 SVN binaries from SourceForge, but this doesn't work, probably because SourceForge redirects to a mirror.
The code below always gives an empty document (Buffer.Size=0).

Does anybody know a fix for this?

Code:
function TForm_Main.DoSVNClientDownload(): Boolean;
const
  SourceUrl = 'http://downloads.sourceforge.net/project/win32svn/1.6.17/svn-win32-1.6.17.zip?r=&ts=1307454210&use_mirror=kent';
var
  Buffer: TMemoryStream;
  TargetFile: string;
begin
  Result := False;
  TargetFile := ExtractFilePath(ParamStr(0)) + 'svn-win32-1.6.17.zip';
  // Create the stream before try, so finally never frees an uninitialized variable
  Buffer := TMemoryStream.Create;
  try
    if not HttpGetBinary(SourceUrl, Buffer) then
      raise Exception.Create('Cannot load document from remote server');
    Application.ProcessMessages;
    if Buffer.Size = 0 then
      raise Exception.Create('Downloaded document is empty.');
    Buffer.Position := 0;
    Buffer.SaveToFile(TargetFile);
    Result := True;
  finally
    Buffer.Free;
  end;
end;
« Last Edit: February 19, 2012, 09:33:54 am by BigChimp »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

AmatCoder

  • Jr. Member
  • **
  • Posts: 57
    • My site
Re: [Synapse] Download file from Sourceforge
« Reply #1 on: June 09, 2011, 12:49:09 pm »
Have you tried a direct download? Something like:

Code:
SourceUrl = 'http://heanet.dl.sourceforge.net/project/win32svn/1.6.17/svn-win32-1.6.17.zip';
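A minimal sketch of deriving such a direct URL from the mirror SourceForge itself picked, instead of hard-coding one. It assumes the downloads.sourceforge.net/...?use_mirror=<name> and <name>.dl.sourceforge.net/... layouts seen in the URLs in this thread (SourceForge may have changed them since); DirectMirrorURL is a hypothetical helper, not part of Synapse.

```pascal
program DirectMirrorDemo;
{$mode objfpc}{$H+}

uses
  StrUtils;

{ Hypothetical helper: turns a downloads.sourceforge.net URL carrying a
  use_mirror=<name> parameter into a direct <name>.dl.sourceforge.net URL.
  Assumes the URL layouts seen in this thread; returns '' if the URL has
  no query string or no mirror parameter. }
function DirectMirrorURL(const URL: string): string;
const
  HostPart = '//downloads.sourceforge.net/';
  MirrorParam = 'use_mirror=';
var
  HostPos, QueryPos, MirrorPos, MirrorEnd: Integer;
  Path, Mirror: string;
begin
  Result := '';
  HostPos := Pos(HostPart, URL);
  QueryPos := Pos('?', URL);
  MirrorPos := Pos(MirrorParam, URL);
  if (HostPos = 0) or (QueryPos = 0) or (MirrorPos < QueryPos) then
    Exit;
  { Path between the host and the query string, e.g. project/win32svn/... }
  Path := Copy(URL, HostPos + Length(HostPart),
    QueryPos - HostPos - Length(HostPart));
  { The mirror name runs until the next '&' or the end of the URL }
  Inc(MirrorPos, Length(MirrorParam));
  MirrorEnd := PosEx('&', URL, MirrorPos);
  if MirrorEnd = 0 then
    MirrorEnd := Length(URL) + 1;
  Mirror := Copy(URL, MirrorPos, MirrorEnd - MirrorPos);
  if Mirror = '' then
    Exit;
  Result := 'http://' + Mirror + '.dl.sourceforge.net/' + Path;
end;

begin
  WriteLn(DirectMirrorURL('http://downloads.sourceforge.net/project/win32svn/' +
    '1.6.17/svn-win32-1.6.17.zip?r=&ts=1307454210&use_mirror=kent'));
end.
```

For the URL from the first post this yields the kent.dl.sourceforge.net form of the link suggested above, without fixing the mirror name in the source.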

BigChimp

[Solved] [Synapse] Download file from Sourceforge
« Reply #2 on: June 09, 2011, 01:08:51 pm »
 :-[

No, I haven't; I just fiddled with the parameters sent to the HTTP server.

Thanks, this seems to work brilliantly. I knew the solution was simple, just too tired and frustrated to see it...

Ocye

  • Hero Member
  • *****
  • Posts: 518
    • Scrabble3D
Re: [Solved] [Synapse] Download file from Sourceforge
« Reply #3 on: June 14, 2011, 02:59:15 pm »
SF changed its redirection mechanism this year. The first snippet worked well until that change; now it needs some ugly hard-coded constants, as in the second.

Code:
while not Result do
  begin
    HTTPMethod('GET', aURL);
    case Resultcode of
      301,302,307 : for i:=0 to Headers.Count-1 do
                    if FindPart('Location: ',Headers.Strings[i])>0 then
                    begin
                      aURL:=StringReplace(Headers.Strings[i],'Location: ','',[]);
                      self.Clear;//httpsend
                      break;
                    end;
      100..200 : Result:=true;
      else exit;
    end; //case
  end;
 
Code:
 while not Result do
  begin
    HTTPMethod('GET', aURL);
    case Resultcode of
      301,302,307 : for i:=0 to Headers.Count-1 do
                    if (FindPart('Location: ',Headers.Strings[i])>0) or
                       (FindPart('location: ',Headers.Strings[i])>0) then
                    begin
                      j:=Pos('use_mirror=',Headers.Strings[i]);
                      if j>0 then
                        aURL:='http://'+RightStr(Headers.Strings[i],length(Headers.Strings[i])-j-10)+'.dl.sourceforge.net/project/scrabble/'+aDir+aFileName else
                        aURL:=StringReplace(Headers.Strings[i],'Location: ','',[rfIgnoreCase]);
                      self.Clear;//httpsend
                      break;
                    end;
      100..200 : Result:=true;
      500:raise EDownloadError.Create('No internet connection available');//Internal Server Error ('+aURL+')');
      else raise EDownloadError.Create('Download failed with error code '+inttostr(ResultCode)+' ('+ResultString+')');
    end;//case
  end;//while
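
HTTP header names are case-insensitive, which is why the second snippet tests for both Location: and location:. A minimal sketch of a lookup that compares the name case-insensitively and returns only the value, assuming the headers arrive as "Name: value" lines in a TStrings (the way THTTPSend.Headers is used in the code above). GetHeaderValue is a hypothetical helper, not a Synapse function.

```pascal
program HeaderLookupDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical helper: returns the value of the first header called Name,
  comparing header names case-insensitively. Lines without a colon (such
  as a status line) are skipped. Returns '' if the header is absent. }
function GetHeaderValue(Headers: TStrings; const Name: string): string;
var
  i, ColonPos: Integer;
  Line: string;
begin
  Result := '';
  for i := 0 to Headers.Count - 1 do
  begin
    Line := Headers[i];
    ColonPos := Pos(':', Line);
    if (ColonPos > 0) and SameText(Trim(Copy(Line, 1, ColonPos - 1)), Name) then
    begin
      { Everything after the first colon is the value (URLs contain ':' too) }
      Result := Trim(Copy(Line, ColonPos + 1, MaxInt));
      Exit;
    end;
  end;
end;

var
  Headers: TStringList;
begin
  Headers := TStringList.Create;
  try
    Headers.Add('HTTP/1.1 302 Found');
    Headers.Add('location: http://kent.dl.sourceforge.net/project/win32svn/1.6.17/svn-win32-1.6.17.zip');
    WriteLn(GetHeaderValue(Headers, 'Location'));
  finally
    Headers.Free;
  end;
end.
```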

PS: IMHO, it's not a good idea to hard-code a specific mirror server such as heanet or kent.
« Last Edit: June 14, 2011, 03:02:37 pm by Ocye »
Lazarus 1.7 (SVN) FPC 3.0.0

BigChimp

[Again] [Synapse] Download file from Sourceforge
« Reply #4 on: February 19, 2012, 09:33:30 am »
Edit: updated with new code... that doesn't work...

Sorry to dig up an old thread, but I'm having trouble downloading again and thought it might be my mediocre understanding of your code, Ocye.

Demo project attached; I currently have this code (using Synapse, infoln is basically writeln):
Code:
const
  SFProjectPart = '//sourceforge.net/projects/';
  SFFilesPart = '/files/';
  SFDownloadPart ='/download';
  MaxRetries = 3;
var
  Buffer: TMemoryStream;
  HTTPGetResult: boolean;
  i, j: integer;
  HTTPSender: THTTPSend;
  IsSFDownload: boolean;
  SFProjectBegin: integer;
  RetryAttempt: integer;
  SFDirectory: string; //Sourceforge directory
  SFDirectoryBegin: integer;
  SFFileBegin: integer;
  SFFilename: string; //Sourceforge name of file
  SFProject: string;
begin
  Result := False;
  IsSFDownload:=false;

  // Detect SourceForge download; e.g. from URL
  //          1         2         3         4         5         6         7         8         9
  // 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
  // http://sourceforge.net/projects/base64decoder/files/base64decoder/version%202.0/b64util.zip/download
  //                                 ^^^project^^^       ^^^directory............^^^ ^^^file^^^
  i:=Pos(SFProjectPart, URL);
  if i>0 then
  begin
    // Possibly found project; now extract project, directory and filename parts.
    SFProjectBegin:=i+Length(SFProjectPart);
    j := PosEx(SFFilesPart, URL, SFProjectBegin);
    if (j>0) then
    begin
      SFProject:=Copy(URL, SFProjectBegin, j-SFProjectBegin);
      SFDirectoryBegin:=PosEx(SFFilesPart, URL, SFProjectBegin)+Length(SFFilesPart);
      if SFDirectoryBegin>0 then
      begin
        // Find file
        // URL might have trailing arguments... so: search for first
        // /download coming up from the right, but it should be after
        // /files/
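        i:=RPos(SFDownloadPart, URL);
        // Now look for previous / so we can make out the file
        // This might perhaps be the trailing / in /files/
        // Only do this if /download was actually found after /files/
        SFFileBegin:=0;
        if i>SFDirectoryBegin then
          SFFileBegin:=RPosEx('/',URL,i-1)+1;

        if SFFileBegin>0 then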
        begin
          SFFilename:=Copy(URL,SFFileBegin, i-SFFileBegin);
          //Include trailing /
          SFDirectory:=Copy(URL, SFDirectoryBegin, SFFileBegin-SFDirectoryBegin);
          IsSFDownload:=true;
        end;
      end;
    end;
  end;

  if IsSFDownload then
  begin
    // Rewrite URL if needed for Sourceforge download redirection
    // (create before try, so finally never frees an uninitialized variable)
    HTTPSender := THTTPSend.Create;
    try
      while not Result do
      begin
        HTTPSender.HTTPMethod('GET', URL);
        infoln('debug: headers:');
        infoln(HTTPSender.Headers.Text);
        case HTTPSender.Resultcode of
          301, 302, 307: for i := 0 to HTTPSender.Headers.Count - 1 do
              if (Pos('Location: ', HTTPSender.Headers.Strings[i]) > 0) or
                (Pos('location: ', HTTPSender.Headers.Strings[i]) > 0) then
              begin
                j := Pos('use_mirror=', HTTPSender.Headers.Strings[i]);
                if j > 0 then
                  URL :=
                    'http://' + RightStr(HTTPSender.Headers.Strings[i],
                    length(HTTPSender.Headers.Strings[i]) - j - 10) +
                    '.dl.sourceforge.net/project/' +
                    SFProject + '/' + SFDirectory + SFFilename
                else
                  URl :=
                    StringReplace(HTTPSender.Headers.Strings[i], 'Location: ', '', [rfIgnoreCase]);
                HTTPSender.Clear;//httpsend
                break;
              end;
          100..200: Result := True; //No changes necessary
          500: raise Exception.Create('No internet connection available');
            //Internal Server Error ('+aURL+')');
          else
            raise Exception.Create('Download failed with error code ' +
              IntToStr(HTTPSender.ResultCode) + ' (' + HTTPSender.ResultString + ')');
        end;//case
      end;//while
      infoln('debug: resulting url after sf redir: *' + URL + '*');
    finally
      HTTPSender.Free;
    end;
  end;

It tries to detect a SourceForge download and extracts the project name, the directory and the file to download. Then it follows redirects until it reaches the right page.
However, in my case the very first HTTP GET immediately returns 200 OK.
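
The URL split described above can also be factored into a standalone function that reports whether the URL had the expected shape. A minimal sketch, assuming the sourceforge.net/projects/<project>/files/<directory>/<file>/download layout shown in the comment ruler of the code; SplitSFURL is a hypothetical name.

```pascal
program SplitSFURLDemo;
{$mode objfpc}{$H+}

uses
  StrUtils;

{ Hypothetical helper: splits
    http://sourceforge.net/projects/<project>/files/<directory>/<file>/download
  into its parts. Directory keeps its trailing '/', as in the thread's code.
  Returns False if the URL does not have that shape. }
function SplitSFURL(const URL: string;
  out Project, Directory, FileName: string): Boolean;
const
  ProjectPart = '//sourceforge.net/projects/';
  FilesPart = '/files/';
  DownloadPart = '/download';
var
  ProjectBegin, FilesPos, DirBegin, DownloadPos, FileBegin: Integer;
begin
  Result := False;
  Project := '';
  Directory := '';
  FileName := '';
  ProjectBegin := Pos(ProjectPart, URL);
  if ProjectBegin = 0 then
    Exit;
  Inc(ProjectBegin, Length(ProjectPart));
  FilesPos := PosEx(FilesPart, URL, ProjectBegin);
  if FilesPos = 0 then
    Exit;
  DirBegin := FilesPos + Length(FilesPart);
  { Last '/download' in the URL; it must come after '/files/' }
  DownloadPos := RPos(DownloadPart, URL);
  if DownloadPos <= DirBegin then
    Exit;
  { The file name starts after the last '/' before '/download' }
  FileBegin := RPosEx('/', URL, DownloadPos - 1) + 1;
  if FileBegin >= DownloadPos then
    Exit; { empty file name }
  Project := Copy(URL, ProjectBegin, FilesPos - ProjectBegin);
  Directory := Copy(URL, DirBegin, FileBegin - DirBegin);
  FileName := Copy(URL, FileBegin, DownloadPos - FileBegin);
  Result := Project <> '';
end;

var
  P, D, F: string;
begin
  if SplitSFURL('http://sourceforge.net/projects/base64decoder/files/' +
    'base64decoder/version%202.0/b64util.zip/download', P, D, F) then
    WriteLn(P, ' | ', D, ' | ', F);
end.
```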

After this code, the function tries to download the revised URL using HTTPGetBinary.
What I get back is attached in the zip (DownloadedPage.html).

I suspect I need to actually download the resulting page after the 200 OK, then:
1. check whether it is HTML rather than binary (presumably by looking for Content-Type: text/html in the headers?)
2. take the body and look for either direct link</a> or class="direct-download", which will find:
Quote
            <a href="http://downloads.sourceforge.net/project/base64decoder/base64decoder/version%202.0/b64util.zip?r=&amp;ts=1329639480&amp;use_mirror=garr" class="direct-download">
3. take that line, look for <a href="http, get the URL from there up to the ?, and start over with the function?
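
Steps 1 to 3 could be sketched roughly as follows, assuming the headers are "Name: value" lines and that the page contains an anchor with class="direct-download" like the quoted one (markup SourceForge can change at any time). IsHTMLResponse and DirectDownloadHref are hypothetical helpers; following step 3, the href is cut at the ?.

```pascal
program DirectDownloadDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

{ Hypothetical helper for step 1: True if a Content-Type header says text/html }
function IsHTMLResponse(Headers: TStrings): Boolean;
var
  i: Integer;
  Line: string;
begin
  Result := False;
  for i := 0 to Headers.Count - 1 do
  begin
    Line := LowerCase(Headers[i]);
    if (Pos('content-type:', Line) = 1) and (Pos('text/html', Line) > 0) then
      Exit(True);
  end;
end;

{ Hypothetical helper for steps 2 and 3: finds the anchor marked
  class="direct-download", takes its href and cuts it at the '?'. }
function DirectDownloadHref(const HTML: string): string;
const
  Marker = 'class="direct-download"';
  HrefMarker = '<a href="';
var
  MarkerPos, HrefPos, QuoteEnd, QueryPos: Integer;
begin
  Result := '';
  MarkerPos := Pos(Marker, HTML);
  if MarkerPos = 0 then
    Exit;
  { In the quoted markup the href precedes the class attribute,
    so take the last '<a href="' before the marker }
  HrefPos := RPos(HrefMarker, Copy(HTML, 1, MarkerPos - 1));
  if HrefPos = 0 then
    Exit;
  Inc(HrefPos, Length(HrefMarker));
  QuoteEnd := PosEx('"', HTML, HrefPos);
  if QuoteEnd = 0 then
    Exit;
  Result := Copy(HTML, HrefPos, QuoteEnd - HrefPos);
  { Step 3: keep only the part before the query string }
  QueryPos := Pos('?', Result);
  if QueryPos > 0 then
    SetLength(Result, QueryPos - 1);
end;

var
  Headers: TStringList;
  Page: string;
begin
  Headers := TStringList.Create;
  try
    Headers.Add('Content-Type: text/html; charset=utf-8');
    Page := '<a href="http://downloads.sourceforge.net/project/base64decoder/' +
      'base64decoder/version%202.0/b64util.zip?r=&amp;ts=1329639480&amp;use_mirror=garr"' +
      ' class="direct-download">';
    if IsHTMLResponse(Headers) then
      WriteLn(DirectDownloadHref(Page));
  finally
    Headers.Free;
  end;
end.
```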

Suggestions welcome ;)

Thanks,
BigChimp
« Last Edit: February 20, 2012, 03:51:30 pm by BigChimp »

BigChimp

Re: [Again] [Synapse] Download file from Sourceforge
« Reply #5 on: February 19, 2012, 12:04:43 pm »
Request for help with Sourceforge downloads; see the previous post in this thread...

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: [Again] [Synapse] Download file from Sourceforge
« Reply #6 on: February 19, 2012, 06:24:46 pm »
You're getting the download page, which in a browser starts the download automatically after 5 seconds and provides a direct link to a mirror in case the script doesn't work. This is not a header redirection, so you'll need to scrape the page to get the "direct link". A quick and dirty way:
Quote
delete(html,1,pos('<meta http-equiv="refresh"',html));
delete(html,1,pos('url=',html)+3);
delete(html,pos('"',html),length(html));
html is the page you downloaded; after these three lines it contains the new URL.
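
The same three steps can be wrapped in a function that returns an empty string when the refresh tag is missing (instead of a mangled page) and that decodes the &amp; entities used inside the HTML attribute, since the request itself needs plain & separators. A sketch, assuming the page carries a tag of the form <meta http-equiv="refresh" content="5; url=...">; RefreshURL is a hypothetical name.

```pascal
program RefreshURLDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

{ Hypothetical helper: same idea as the three Delete calls above, but
  returns '' if a marker is missing and decodes &amp; back to &. }
function RefreshURL(const HTML: string): string;
const
  RefreshMarker = '<meta http-equiv="refresh"';
  URLMarker = 'url=';
var
  RefreshPos, URLPos, QuotePos: Integer;
begin
  Result := '';
  RefreshPos := Pos(RefreshMarker, HTML);
  if RefreshPos = 0 then
    Exit;
  { Search for url= only after the refresh tag starts }
  URLPos := PosEx(URLMarker, HTML, RefreshPos);
  if URLPos = 0 then
    Exit;
  Inc(URLPos, Length(URLMarker));
  QuotePos := PosEx('"', HTML, URLPos);
  if QuotePos = 0 then
    Exit;
  Result := Copy(HTML, URLPos, QuotePos - URLPos);
  { Attribute values are HTML-encoded; the real URL uses a plain '&' }
  Result := StringReplace(Result, '&amp;', '&', [rfReplaceAll]);
end;

begin
  WriteLn(RefreshURL('<noscript><meta http-equiv="refresh" content="5; ' +
    'url=http://downloads.sourceforge.net/project/base64decoder/base64decoder/' +
    'version%202.0/b64util.zip?r=&amp;ts=1329648745&amp;use_mirror=kent">' +
    '</noscript>'));
end.
```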

BigChimp

Re: [Again] [Synapse] Download file from Sourceforge
« Reply #7 on: February 20, 2012, 02:11:59 pm »
Thanks Ludo.
This seems to work - I'm sure it can be trimmed down, but seems to do the job for now.

Edit: attached new project...
Code:
function DownloadHTTPStream(URL: string; Buffer: TStream): boolean;
  // Download file; retry if necessary.
const
  MaxRetries = 3;
var
  RetryAttempt: integer;
  HTTPGetResult: boolean;
begin
  Result:=false;
  RetryAttempt := 1;
  HTTPGetResult := False;
  // Make up to MaxRetries attempts
  while ((HTTPGetResult = False) and (RetryAttempt <= MaxRetries)) do
  begin
    HTTPGetResult := HttpGetBinary(URL, Buffer);
    //Application.ProcessMessages;
    if not HTTPGetResult then
      Sleep(100 * RetryAttempt); // wait a little longer after each failure
    RetryAttempt := RetryAttempt + 1;
  end;
  if HTTPGetResult = False then
    raise Exception.Create('Cannot load document from remote server');
  Buffer.Position := 0;
  if Buffer.Size = 0 then
    raise Exception.Create('Downloaded document is empty.');
  Result := True;
end;

function SFDirectLinkURL(URL: string; Document: TMemoryStream): string;
{
Transform this part of the body:
<noscript>
<meta http-equiv="refresh" content="5; url=http://downloads.sourceforge.net/project/base64decoder/base64decoder/version%202.0/b64util.zip?r=&amp;ts=1329648745&amp;use_mirror=kent">
</noscript>
into a valid URL (with the HTML entity &amp; decoded to &):
http://downloads.sourceforge.net/project/base64decoder/base64decoder/version%202.0/b64util.zip?r=&ts=1329648745&use_mirror=kent
}
const
  Refresh='<meta http-equiv="refresh"';
  URLMarker='url=';
var
  Counter: integer;
  HTMLBody: TStringList;
  RefreshStart: integer;
  URLStart: integer;
begin
  HTMLBody:=TStringList.Create;
  try
    Document.Position:=0; // read the page from its start
    HTMLBody.LoadFromStream(Document);
    for Counter:=0 to HTMLBody.Count-1 do
    begin
      // This line should be between noscript tags and give the direct download locations:
      RefreshStart:=Ansipos(Refresh, HTMLBody[Counter]);
      if RefreshStart>0 then
      begin
        URLStart:=AnsiPos(URLMarker, HTMLBody[Counter])+Length(URLMarker);
        if URLStart>RefreshStart then
        begin
          // Look for closing "
          URL:=Copy(HTMLBody[Counter],
            URLStart,
            PosEx('"',HTMLBody[Counter],URLStart+1)-URLStart);
          // The attribute value is HTML-encoded; turn &amp; back into &
          URL:=StringReplace(URL, '&amp;', '&', [rfReplaceAll]);
          infoln('debug: new url after sf noscript:');
          infoln(URL);
          break;
        end;
      end;
    end;
  finally
    HTMLBody.Free;
  end;
  result:=URL;
end;

function SourceForgeURL(URL: string): string;
// Detects sourceforge download and tries to deal with
// redirection, and extracting direct download link.
// Thanks to
// Ocye: http://lazarus.freepascal.org/index.php/topic,13425.msg70575.html#msg70575
const
  SFProjectPart = '//sourceforge.net/projects/';
  SFFilesPart = '/files/';
  SFDownloadPart ='/download';
var
  HTTPSender: THTTPSend;
  i, j: integer;
  FoundCorrectURL: boolean;
  SFDirectory: string; //Sourceforge directory
  SFDirectoryBegin: integer;
  SFFileBegin: integer;
  SFFilename: string; //Sourceforge name of file
  SFProject: string;
  SFProjectBegin: integer;
begin
  // Detect SourceForge download; e.g. from URL
  //          1         2         3         4         5         6         7         8         9
  // 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
  // http://sourceforge.net/projects/base64decoder/files/base64decoder/version%202.0/b64util.zip/download
  //                                 ^^^project^^^       ^^^directory............^^^ ^^^file^^^
  FoundCorrectURL:=true; //Assume not a SF download
  i:=Pos(SFProjectPart, URL);
  if i>0 then
  begin
    // Possibly found project; now extract project, directory and filename parts.
    SFProjectBegin:=i+Length(SFProjectPart);
    j := PosEx(SFFilesPart, URL, SFProjectBegin);
    if (j>0) then
    begin
      SFProject:=Copy(URL, SFProjectBegin, j-SFProjectBegin);
      SFDirectoryBegin:=PosEx(SFFilesPart, URL, SFProjectBegin)+Length(SFFilesPart);
      if SFDirectoryBegin>0 then
      begin
        // Find file
        // URL might have trailing arguments... so: search for first
        // /download coming up from the right, but it should be after
        // /files/
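        i:=RPos(SFDownloadPart, URL);
        // Now look for previous / so we can make out the file
        // This might perhaps be the trailing / in /files/
        // Only do this if /download was actually found after /files/
        SFFileBegin:=0;
        if i>SFDirectoryBegin then
          SFFileBegin:=RPosEx('/',URL,i-1)+1;

        if SFFileBegin>0 then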
        begin
          SFFilename:=Copy(URL,SFFileBegin, i-SFFileBegin);
          //Include trailing /
          SFDirectory:=Copy(URL, SFDirectoryBegin, SFFileBegin-SFDirectoryBegin);
          FoundCorrectURL:=false;
        end;
      end;
    end;
  end;

  if not FoundCorrectURL then
  begin
    // Rewrite URL if needed for Sourceforge download redirection
    // Detect direct link in HTML body and get URL from that
    // (create before try, so finally never frees an uninitialized variable)
    HTTPSender := THTTPSend.Create;
    try
      //Who knows, this might help:
      HTTPSender.UserAgent:='curl/7.21.0 (i686-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18';
      while not FoundCorrectURL do
      begin
        HTTPSender.HTTPMethod('GET', URL);
        infoln('debug: headers:');
        infoln(HTTPSender.Headers.Text);
        case HTTPSender.Resultcode of
          301, 302, 307:
            begin
              for i := 0 to HTTPSender.Headers.Count - 1 do
                if (Pos('Location: ', HTTPSender.Headers.Strings[i]) > 0) or
                  (Pos('location: ', HTTPSender.Headers.Strings[i]) > 0) then
                begin
                  j := Pos('use_mirror=', HTTPSender.Headers.Strings[i]);
                  if j > 0 then
                    URL :=
                      'http://' + RightStr(HTTPSender.Headers.Strings[i],
                      length(HTTPSender.Headers.Strings[i]) - j - 10) +
                      '.dl.sourceforge.net/project/' +
                      SFProject + '/' + SFDirectory + SFFilename
                  else
                    URL:=StringReplace(
                      HTTPSender.Headers.Strings[i], 'Location: ', '', [rfIgnoreCase]);
                  HTTPSender.Clear;//httpsend
                  FoundCorrectURL:=true;
                  break; //out of rewriting loop
                end;
            end;
          100..200:
            begin
              //Assume a sourceforge timer/direct link page
              URL:=SFDirectLinkURL(URL, HTTPSender.Document); //Find out
              FoundCorrectURL:=true; //We're done by now
            end;
          500: raise Exception.Create('No internet connection available');
            //Internal Server Error ('+aURL+')');
          else
            raise Exception.Create('Download failed with error code ' +
              IntToStr(HTTPSender.ResultCode) + ' (' + HTTPSender.ResultString + ')');
        end;//case
      end;//while
      infoln('debug: resulting url after sf redir: *' + URL + '*');
    finally
      HTTPSender.Free;
    end;
  end;
  result:=URL;
end;

function DownloadHTTP(URL, TargetFile: string): boolean;
// Download file; retry if necessary.
// Deals with SourceForge download links
var
  Buffer: TMemoryStream;
begin
  result:=false;
  URL:=SourceForgeURL(URL); //Deal with sourceforge URLs
  // Create the stream before try, so finally never frees an uninitialized variable
  Buffer := TMemoryStream.Create;
  try
    DownloadHTTPStream(URL, Buffer);
    Buffer.SaveToFile(TargetFile);
    result:=true;
  finally
    Buffer.Free;
  end;
end;
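
The retry policy in DownloadHTTPStream can also be factored out of the download itself. A minimal sketch, assuming an attempt can be wrapped in a function that returns True on success; TAttempt, RetryWithBackoff and FlakyAttempt are hypothetical names, and FlakyAttempt merely stands in for a call such as HttpGetBinary(URL, Buffer). The linear back-off follows the Sleep(100 * RetryAttempt) used above, but no sleep follows a successful or final attempt.

```pascal
program RetryDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { One download attempt; returns True on success (hypothetical type) }
  TAttempt = function: Boolean;

{ Hypothetical helper: calls Attempt up to MaxRetries times and returns
  True as soon as one call succeeds. After each failed attempt except the
  last it sleeps 100 ms, 200 ms, ... (linear back-off). }
function RetryWithBackoff(Attempt: TAttempt; MaxRetries: Integer): Boolean;
var
  RetryAttempt: Integer;
begin
  Result := False;
  for RetryAttempt := 1 to MaxRetries do
  begin
    if Attempt() then
      Exit(True);
    if RetryAttempt < MaxRetries then
      Sleep(100 * RetryAttempt);
  end;
end;

var
  Calls: Integer = 0;

{ Stand-in for a real download call: fails twice, then succeeds }
function FlakyAttempt: Boolean;
begin
  Inc(Calls);
  Result := Calls >= 3;
end;

var
  OK: Boolean;
begin
  OK := RetryWithBackoff(@FlakyAttempt, 3);
  WriteLn('Success: ', OK, ' after ', Calls, ' attempts');
end.
```

In a real program the attempt would more likely be a method (function: Boolean of object) so that it can reach URL and Buffer through the object's fields.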
« Last Edit: February 20, 2012, 03:52:39 pm by BigChimp »

lainz

  • Hero Member
  • *****
  • Posts: 4468
    • https://lainz.github.io/
Re: [Again] [Synapse] Download file from Sourceforge
« Reply #8 on: August 22, 2014, 11:54:12 pm »
Thank you, this is what I was looking for.

Still working today, Aug 22, 2014.
« Last Edit: August 22, 2014, 11:56:56 pm by 007 »

minesadorada

  • Sr. Member
  • ****
  • Posts: 452
  • Retired
Re: [Again] [Synapse] Download file from Sourceforge
« Reply #9 on: August 23, 2014, 08:46:38 am »
This code is incorporated into LazAutoUpdater as a drop-in component for simplicity:

http://wiki.lazarus.freepascal.org/LazAutoUpdater
GPL Apps: Health Monitor, Retro Ski Run
OnlinePackageManager Components: LazAutoUpdate, LongTimer, PoweredBy, ScrollText, PlaySound, CryptINI

 
