Programming => LCL => Topic started by: geraldholdsworth on May 28, 2021, 03:19:26 pm

Title: GZip library
Post by: geraldholdsworth on May 28, 2021, 03:19:26 pm
Hi all,

I have been using the GZip library (found in ZStream) to decompress GZip files. Most of the files I have come across appear to contain only a single block, so this works (I found the basis for it on this forum):
Code: Pascal
function Inflate(Source: String): TDIByteArray;
const
 ChunkSize=4096; //4K chunks
var
 GZ     : TGZFileStream;
 chunk  : TDIByteArray;
 cnt,
 buflen : Integer;
begin
 //Initialise the variables
 Result:=nil;
 chunk:=nil;
 //This is our length counter
 buflen:=0;
 //We'll be reading it in chunks
 SetLength(chunk,ChunkSize);
 try
  //Open the stream (outside the inner try, so it is only freed if it was created)
  GZ:=TGZFileStream.Create(Source,gzOpenRead);
  try
   repeat
    //Read in the next chunk
    cnt:=GZ.Read(chunk[0],ChunkSize);
    //A count of zero or less means there is nothing to add
    if cnt>0 then
    begin
     //Extend the buffer accordingly
     SetLength(Result,buflen+cnt);
     //Copy the chunk into the buffer
     Move(chunk[0],Result[buflen],cnt);
     //Increase the buffer length counter
     inc(buflen,cnt);
    end;
    //Until we are done
   until cnt<ChunkSize;
  finally
   //Free up the stream
   GZ.Free;
  end;
 except
  //Swallow any error and return whatever has been inflated so far
 end;
end;
(TDIByteArray is defined as 'array of Byte')

However, I have come across a file which has multiple blocks in it (nearly 150). The above procedure falls over when decompressing this file (it actually only decompresses the first two blocks). At the time I did not know the format of GZip files, but after some research I found out that they can be stored in blocks (the GZip specification, RFC 1952, calls them 'members'), each block having its own header and footer. Therefore, each block can act as a GZip file in its own right.
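
To illustrate the layout: per RFC 1952, every block starts with a 10-byte fixed header. The first three bytes are $1F, $8B and the compression method ($08 for deflate), and the fourth is a flags byte whose top three bits are reserved and must be zero. Below is a minimal sketch of a header test based on that. The name LooksLikeGZipHeader is made up for illustration, and it is only a plausibility check, not a full header parser:

```pascal
program HeaderCheck;
{$mode objfpc}{$H+}
type
 TDIByteArray = array of Byte;

//Hypothetical helper: does a GZip block header appear to start at 'ptr'?
function LooksLikeGZipHeader(const buffer: TDIByteArray; ptr: Cardinal): Boolean;
begin
 Result:=False;
 //The fixed part of the header is 10 bytes long
 if Int64(ptr)+10>Length(buffer) then exit;
 Result:=(buffer[ptr]=$1F)           //ID1
      and(buffer[ptr+1]=$8B)         //ID2
      and(buffer[ptr+2]=$08)         //CM: 8 = deflate
      and((buffer[ptr+3]and $E0)=0); //FLG: bits 5 to 7 are reserved
end;

var
 sample: TDIByteArray;
begin
 //Smallest possible header: magic, deflate, then zeros for FLG, MTIME, XFL and OS
 SetLength(sample,10); //New elements are zero initialised
 sample[0]:=$1F;
 sample[1]:=$8B;
 sample[2]:=$08;
 WriteLn(LooksLikeGZipHeader(sample,0)); //TRUE
 //Set the reserved flag bits, which a valid header never does
 sample[3]:=$E0;
 WriteLn(LooksLikeGZipHeader(sample,0)); //FALSE
end.
```

Checking the flags byte as well as the three signature bytes makes a chance match inside compressed data a little less likely, but it cannot rule one out.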

With this in mind I hatched a plan to get around this failing: split such a file into separate temporary files, each holding a single block, decompress each one in turn and patch the results back together. This appears to work on the file I found (and on single-block ones too, as well as on files which are not GZipped, which are returned unchanged):
Code: Pascal
function Inflate(filename: String): TDIByteArray;
 //Inflates a file holding a single block (this is the original code)
 function L_Inflate(Source: String): TDIByteArray;
 const
  ChunkSize=4096; //4K chunks
 var
  GZ     : TGZFileStream;
  chunk  : TDIByteArray;
  cnt,
  buflen : Integer;
 begin
  //Initialise the variables
  Result:=nil;
  chunk:=nil;
  //This is our length counter
  buflen:=0;
  //We'll be reading it in chunks
  SetLength(chunk,ChunkSize);
  try
   //Open the stream (outside the inner try, so it is only freed if it was created)
   GZ:=TGZFileStream.Create(Source,gzOpenRead);
   try
    repeat
     //Read in the next chunk
     cnt:=GZ.Read(chunk[0],ChunkSize);
     //A count of zero or less means there is nothing to add
     if cnt>0 then
     begin
      //Extend the buffer accordingly
      SetLength(Result,buflen+cnt);
      //Copy the chunk into the buffer
      Move(chunk[0],Result[buflen],cnt);
      //Increase the buffer length counter
      inc(buflen,cnt);
     end;
     //Until we are done
    until cnt<ChunkSize;
   finally
    //Free up the stream
    GZ.Free;
   end;
  except
   //Swallow any error and return whatever has been inflated so far
  end;
 end;
var
 F        : TFileStream;
 buffer,
 inflated : TDIByteArray;
 ptr,i,old: Cardinal;
 blockptrs: array of Cardinal;
 fn       : String;
begin
 buffer   :=nil;
 blockptrs:=nil;
 inflated :=nil;
 Result   :=nil;
 //Read in the entire file
 try
  F:=TFileStream.Create(filename,fmOpenRead or fmShareDenyNone);
  try
   SetLength(buffer,F.Size);
   if Length(buffer)>0 then F.ReadBuffer(buffer[0],Length(buffer));
  finally
   F.Free;
  end;
 except
  //The file could not be read, so there is nothing to return
  buffer:=nil;
 end;
 //First, is it actually a GZip file? (A GZip header is at least 10 bytes)
 if(Length(buffer)>=10)and(buffer[$00]=$1F)and(buffer[$01]=$8B)and(buffer[$02]=$08)then
 begin
  //Count how many blocks and make note of their positions
  //Note: the same three bytes could also turn up inside compressed data
  for ptr:=0 to Length(buffer)-10 do
   if(buffer[ptr]=$1F)and(buffer[ptr+1]=$8B)and(buffer[ptr+2]=$08)then
   begin
    //Make a note of the position
    SetLength(blockptrs,Length(blockptrs)+1);
    blockptrs[Length(blockptrs)-1]:=ptr;
   end;
 end;
 //Separate each block, if more than one
 if Length(blockptrs)>1 then
 begin
  //Add the file end to the end of the block pointers
  SetLength(blockptrs,Length(blockptrs)+1);
  blockptrs[Length(blockptrs)-1]:=Length(buffer);
  //Set up the container for the inflated file
  SetLength(Result,0);
  //Get a temporary filename
  fn:=GetTempDir+ExtractFileName(filename);
  //Iterate through the pointers
  for i:=0 to Length(blockptrs)-2 do
  begin
   //Create the temporary file and write the block to it
   try
    F:=TFileStream.Create(fn,fmCreate);
    try
     F.WriteBuffer(buffer[blockptrs[i]],blockptrs[i+1]-blockptrs[i]);
    finally
     F.Free;
    end;
   except
    //Could not write the temporary file, so skip this block
    Continue;
   end;
   //Inflate the block
   inflated:=L_Inflate(fn);
   old:=Length(Result); //Previous length of the inflated file
   //Increase the inflated file buffer to accommodate
   SetLength(Result,Length(Result)+Length(inflated));
   //Move the inflated data across
   if Length(inflated)>0 then Move(inflated[0],Result[old],Length(inflated));
  end;
  //Delete the temporary file
  if FileExists(fn) then DeleteFile(fn);
 end;
 //If just the one block, then don't bother splitting
 if Length(blockptrs)=1 then Result:=L_Inflate(filename);
 //If there are no blocks, then just return the entire file
 if Length(blockptrs)=0 then Result:=buffer;
end;

As you can see, the original code is now a sub-function of this one. Incidentally, don't forget:
Code: Pascal
uses ZStream,StrUtils,SysUtils,Classes;
(I think that's all you need)
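
For anyone wanting to try this without a multi-block file to hand: since each block can act as a GZip file in its own right, joining several single-block files end to end should produce one. Here is a minimal sketch of that, assuming Free Pascal's ZStream unit. The file names and the WriteGZip helper are made up for illustration:

```pascal
program MakeMultiBlock;
{$mode objfpc}{$H+}
uses ZStream,SysUtils,Classes;

//Hypothetical helper: compresses 'data' into its own single block GZip file
procedure WriteGZip(const fn,data: String);
var
 GZ: TGZFileStream;
begin
 GZ:=TGZFileStream.Create(fn,gzOpenWrite);
 try
  GZ.Write(data[1],Length(data));
 finally
  GZ.Free;
 end;
end;

var
 dest,src: TFileStream;
 i       : Integer;
 part    : String;
begin
 //Everything is created in the temporary directory
 dest:=TFileStream.Create(GetTempDir+'multi.gz',fmCreate);
 try
  for i:=1 to 3 do
  begin
   //Write a small single block file
   part:=GetTempDir+'part'+IntToStr(i)+'.gz';
   WriteGZip(part,'Block '+IntToStr(i)+LineEnding);
   //Append it to the multi-block file
   src:=TFileStream.Create(part,fmOpenRead);
   try
    dest.CopyFrom(src,0); //A count of 0 copies the whole stream
   finally
    src.Free;
   end;
   //The single block file is no longer needed
   DeleteFile(part);
  end;
 finally
  dest.Free;
 end;
end.
```

Passing the resulting multi.gz to Inflate should then give back all three lines of text, one from each block.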

Just thought I'd 'park' this here for anyone else experiencing the same issue, or if anyone wants to improve on it.
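
One possible improvement, since scanning for $1F $8B $08 could in principle also match bytes inside the compressed data: RFC 1952 says each block ends with an 8-byte footer, the CRC32 followed by ISIZE (the uncompressed length modulo 2^32), both stored least significant byte first. Below is a rough sketch of reading ISIZE so that the length of each inflated block can be sanity-checked. BlockISize is a hypothetical name, and blockstart and blockend are assumed to be a pair of neighbouring entries from blockptrs:

```pascal
program ISizeCheck;
{$mode objfpc}{$H+}
type
 TDIByteArray = array of Byte;

//Hypothetical helper: reads ISIZE from the footer of the block occupying
//buffer[blockstart..blockend-1]. Returns False if the block is too short.
function BlockISize(const buffer: TDIByteArray; blockstart,blockend: Cardinal;
                    out isize: Cardinal): Boolean;
begin
 isize:=0;
 //A block needs at least a 10 byte header and an 8 byte footer
 Result:=(Int64(blockend)<=Length(buffer))
      and(Int64(blockend)-blockstart>=18);
 if not Result then exit;
 //ISIZE is the last four bytes, least significant byte first
 isize:=buffer[blockend-4]
     or(Cardinal(buffer[blockend-3])shl 8)
     or(Cardinal(buffer[blockend-2])shl 16)
     or(Cardinal(buffer[blockend-1])shl 24);
end;

var
 sample: TDIByteArray;
 size  : Cardinal;
begin
 //An 18 byte dummy block whose footer claims 4660 ($1234) bytes of data
 SetLength(sample,18); //New elements are zero initialised
 sample[14]:=$34;
 sample[15]:=$12;
 if BlockISize(sample,0,Length(sample),size) then
  WriteLn('ISIZE = ',size); //Prints: ISIZE = 4660
end.
```

In the splitting loop, comparing Length(inflated) with this value after each L_Inflate call would flag a block that was cut in the wrong place.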


Title: Re: GZip library
Post by: ChrisR on May 28, 2021, 04:04:42 pm
I believe this issue is related to
Your solution may help others who encounter this.
Title: Re: GZip library
Post by: geraldholdsworth on May 29, 2021, 01:47:09 pm
It certainly does sound like it, although I never got an exception after the second block was inflated. That said, one could have been swallowed by the try...except block in the original code.