Author Topic: GZip library  (Read 1522 times)

geraldholdsworth

GZip library
« on: May 28, 2021, 03:19:26 pm »
Hi all,

I have been using the GZip library (found in ZStream) to decompress GZip files. Most of the files I have come across appear to contain only a single block, so this works (I found the basis for it on this forum):
Code: Pascal
function Inflate(Source: String): TDIByteArray;
var
 GZ     : TGZFileStream;
 chunk  : TDIByteArray;
 cnt,
 i,
 buflen : Integer;
const
  ChunkSize=4096; //4K chunks
begin
 //Initialise the variables
 Result:=nil;
 chunk:=nil;
 GZ:=nil; //So that GZ.Free is safe if Create raises an exception
 //Open the stream
 try
  GZ:=TGZFileStream.Create(Source,gzOpenRead);
  //This is our length counter
  buflen:=0;
  //We'll be reading it in chunks
  SetLength(chunk,ChunkSize);
  repeat
   //Read in the next chunk
   cnt:=GZ.Read(chunk[0],ChunkSize);
   //Treat a negative count (read error) as no data
   if cnt<0 then cnt:=0;
   //Extend the buffer accordingly
   SetLength(Result,buflen+cnt);
   //Copy the chunk into the buffer
   for i:=0 to cnt-1 do Result[buflen+i]:=chunk[i];
   //Increase the buffer length counter
   inc(buflen,cnt);
   //Until we are done
  until cnt<ChunkSize;
 except
  //Errors are swallowed; whatever was read so far is returned
 end;
 //Free up the stream
 GZ.Free;
end;
(TDIByteArray is defined as 'array of Byte')

However, I have come across a file which contains multiple blocks (nearly 150). The above procedure falls over when decompressing this file (it only decompresses the first two blocks). At the time I did not know the format of GZip files, but after researching I found out that they can be stored in blocks (RFC 1952, the GZip specification, calls them "members"), each block having a header and footer. Therefore, each block can act as a GZip file in its own right.
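For anyone who wants to see what that header and footer look like, here is a minimal sketch of the fixed parts as laid out in RFC 1952: a 10-byte header and an 8-byte trailer. The record names and the IsGZipHeader helper are my own, for illustration only, and are not part of ZStream. Optional header fields (file name, comment and so on) can follow the fixed header when the matching FLG bits are set, and this sketch ignores them.

```pascal
program GZipLayout;
{$mode objfpc}{$H+}

type
  //Fixed 10-byte header that starts every block/member (RFC 1952)
  TGZipHeader = packed record
    ID1, ID2 : Byte;     //Magic number: $1F, $8B
    CM       : Byte;     //Compression method: $08 = deflate
    FLG      : Byte;     //Flags (optional fields follow the header if set)
    MTime    : LongWord; //Modification time, little-endian Unix time
    XFL      : Byte;     //Extra flags
    OS       : Byte;     //Operating system that wrote the member
  end;
  //Fixed 8-byte trailer that ends every block/member
  TGZipTrailer = packed record
    CRC32 : LongWord; //CRC-32 of the uncompressed data
    ISize : LongWord; //Uncompressed size modulo 2^32
  end;

//Hypothetical helper: True if the header carries the GZip/deflate signature
function IsGZipHeader(const H: TGZipHeader): Boolean;
begin
  Result:=(H.ID1=$1F) and (H.ID2=$8B) and (H.CM=$08);
end;

const
  //Made-up sample: a header with no optional fields and OS = 3 (Unix)
  Sample: array[0..9] of Byte = ($1F,$8B,$08,$00,$00,$00,$00,$00,$00,$03);
var
  H: TGZipHeader;
begin
  //Overlay the raw bytes onto the record
  Move(Sample[0],H,SizeOf(H));
  WriteLn('Header size : ',SizeOf(TGZipHeader));
  WriteLn('Trailer size: ',SizeOf(TGZipTrailer));
  WriteLn('Is GZip     : ',IsGZipHeader(H));
end.
```

The three signature bytes checked here ($1F, $8B, $08) are the same ones the splitting code in this post looks for.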

With this in mind I hatched a plan to get around this failing by splitting such a file into separate files, each holding a single block, decompressing each one in turn and patching the results back together. This appears to work on the file I found (and on single-block files too, as well as on files which are not GZipped):
Code: Pascal
function Inflate(filename: String): TDIByteArray;
 function L_Inflate(Source: String): TDIByteArray;
 var
  GZ     : TGZFileStream;
  chunk  : TDIByteArray;
  cnt,
  i,
  buflen : Integer;
 const
   ChunkSize=4096; //4K chunks
 begin
  //Initialise the variables
  Result:=nil;
  chunk:=nil;
  GZ:=nil; //So that GZ.Free is safe if Create raises an exception
  //Open the stream
  try
   GZ:=TGZFileStream.Create(Source,gzOpenRead);
   //This is our length counter
   buflen:=0;
   //We'll be reading it in chunks
   SetLength(chunk,ChunkSize);
   repeat
    //Read in the next chunk
    cnt:=GZ.Read(chunk[0],ChunkSize);
    //Treat a negative count (read error) as no data
    if cnt<0 then cnt:=0;
    //Extend the buffer accordingly
    SetLength(Result,buflen+cnt);
    //Copy the chunk into the buffer
    for i:=0 to cnt-1 do Result[buflen+i]:=chunk[i];
    //Increase the buffer length counter
    inc(buflen,cnt);
    //Until we are done
   until cnt<ChunkSize;
  except
   //Errors are swallowed; whatever was read so far is returned
  end;
  //Free up the stream
  GZ.Free;
 end;
var
 F        : TFileStream;
 buffer,
 inflated : TDIByteArray;
 ptr,i,old: Integer; //Signed, so that 'Length(x)-1' loops are safe when empty
 blockptrs: array of Integer;
 fn       : String;
begin
 buffer   :=nil;
 blockptrs:=nil;
 inflated :=nil;
 Result   :=nil;
 F        :=nil; //So that F.Free is safe if Create raises an exception
 //Read in the entire file
 try
  F:=TFileStream.Create(filename,fmOpenRead or fmShareDenyNone);
  SetLength(buffer,F.Size);
  if F.Size>0 then F.Read(buffer[0],F.Size);
 except
 end;
 F.Free;
 //First, is it actually a GZip file? (The fixed header alone is 10 bytes)
 if(Length(buffer)>=10)and(buffer[$00]=$1F)and(buffer[$01]=$8B)and(buffer[$02]=$08)then
 begin
  //Count how many blocks and make note of their positions
  //Note: these three bytes can also occur by chance inside compressed data
  for ptr:=0 to Length(buffer)-10 do
   if(buffer[ptr]=$1F)and(buffer[ptr+1]=$8B)and(buffer[ptr+2]=$08)then
   begin
    //Make a note of the position
    SetLength(blockptrs,Length(blockptrs)+1);
    blockptrs[Length(blockptrs)-1]:=ptr;
   end;
 end;
 //Separate each block, if more than one
 if Length(blockptrs)>1 then
 begin
  //Add the file end to the end of the block pointers
  SetLength(blockptrs,Length(blockptrs)+1);
  blockptrs[Length(blockptrs)-1]:=Length(buffer);
  //Set up the container for the inflated file
  SetLength(Result,0);
  //Get a temporary filename
  fn:=GetTempDir+ExtractFileName(filename);
  //Iterate through the pointers
  for i:=0 to Length(blockptrs)-2 do
  begin
   //Create the temporary file and write the block to it
   F:=nil; //The previous stream has already been freed
   try
    F:=TFileStream.Create(fn,fmCreate);
    F.Write(buffer[blockptrs[i]],blockptrs[i+1]-blockptrs[i]);
   except
   end;
   F.Free;
   //Inflate the block
   inflated:=L_Inflate(fn);
   old:=Length(Result); //Previous length of the inflated file
   //Increase the inflated file buffer to accommodate
   SetLength(Result,Length(Result)+Length(inflated));
   //Move the inflated data across
   for ptr:=0 to Length(inflated)-1 do Result[old+ptr]:=inflated[ptr];
  end;
  //Delete the temporary file
  if FileExists(fn) then DeleteFile(fn);
 end;
 //If just the one block, then don't bother splitting
 if Length(blockptrs)=1 then Result:=L_Inflate(filename);
 //If there are no blocks, then just return the entire file
 if Length(blockptrs)=0 then Result:=buffer;
end;

As you can see, the original code is now a sub-function of this one. Incidentally, don't forget:
Code: Pascal
uses ZStream,StrUtils,SysUtils,Classes;
(I think that's all you need)

Just thought I'd 'park' this here for anyone else experiencing the same issue, or if anyone wants to improve on it.
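One possible improvement, offered as a sketch rather than a tested fix: the three signature bytes can turn up by chance inside the compressed data, which would split a block in the wrong place. RFC 1952 requires the three reserved bits of the FLG byte (bits 5 to 7) to be zero, so checking them as well filters out some false matches. FindGZipHeaders, TOffsets and the sample bytes below are made up for illustration. This does not rule out false positives altogether; a robust solution would need to decode the deflate data to find where each block really ends.

```pascal
program FindMembers;
{$mode objfpc}{$H+}

type
  TDIByteArray = array of Byte;
  TOffsets     = array of Integer;

//Hypothetical helper: returns the offset of every plausible block header.
//Besides the three signature bytes, it requires the reserved bits of FLG
//(bits 5 to 7) to be zero.
function FindGZipHeaders(const buffer: TDIByteArray): TOffsets;
var
  ptr: Integer;
begin
  Result:=nil;
  //The fixed header is 10 bytes, so stop where one no longer fits
  for ptr:=0 to Length(buffer)-10 do
    if (buffer[ptr]=$1F) and (buffer[ptr+1]=$8B) and (buffer[ptr+2]=$08)
    and ((buffer[ptr+3] and $E0)=0) then
    begin
      SetLength(Result,Length(Result)+1);
      Result[High(Result)]:=ptr;
    end;
end;

const
  //Made-up test data: two headers with a false match in between
  Data: array[0..29] of Byte = (
    $1F,$8B,$08,$00,$00,$00,$00,$00,$00,$03, //Header at offset 0
    $1F,$8B,$08,$FF,$00,$00,                 //False match: reserved bits set
    $1F,$8B,$08,$08,$00,$00,$00,$00,$00,$03, //Header at offset 16
    $00,$00,$00,$00);
var
  buf    : TDIByteArray;
  offsets: TOffsets;
  i      : Integer;
begin
  //Copy the constant data into a dynamic array
  SetLength(buf,Length(Data));
  Move(Data[0],buf[0],Length(Data));
  offsets:=FindGZipHeaders(buf);
  //List the offsets that passed both checks
  for i:=0 to High(offsets) do WriteLn('Header at offset ',offsets[i]);
end.
```

With the sample data, the match at offset 10 is rejected because its FLG byte has the reserved bits set, leaving only offsets 0 and 16.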

Cheers,

Gerald.

ChrisR

Re: GZip library
« Reply #1 on: May 28, 2021, 04:04:42 pm »
I believe this issue is related to
  https://bugs.freepascal.org/view.php?id=36822
Your solution may help others who encounter this.

geraldholdsworth

Re: GZip library
« Reply #2 on: May 29, 2021, 01:47:09 pm »
It certainly does sound like it, although I never saw an exception after the second block was inflated. That could be because it was swallowed by the empty try...except block in the original code.
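To illustrate the point about the try...except block, here is a small sketch. FailingRead is a stand-in I made up to simulate a failure and has nothing to do with ZStream itself. It shows that an empty except block lets execution carry on with no trace of the error, whereas an "on E: Exception" handler can at least record the message.

```pascal
program SwallowDemo;
{$mode objfpc}{$H+}
uses SysUtils;

//Stand-in for a failing library call (hypothetical, not part of ZStream)
procedure FailingRead;
begin
  raise Exception.Create('simulated decompression error');
end;

//Returns the exception message, or an empty string if none was raised
function TryRead: String;
begin
  Result:='';
  try
    FailingRead;
  except
    on E: Exception do Result:=E.Message; //Record the error rather than drop it
  end;
end;

begin
  //An empty except block hides the failure completely
  try
    FailingRead;
  except
  end;
  WriteLn('Empty except: execution continues with no sign of the error');
  //Capturing the message makes the failure visible
  WriteLn('Handled: ',TryRead);
end.
```

Replacing the empty except in the Inflate function with a handler along these lines should show whether the library raises anything on the multi-block file.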

 
