Author Topic: Error in TdecompressionStream?  (Read 1228 times)

Grahame Grieve

Error in TdecompressionStream?
« on: September 23, 2023, 09:34:54 pm »
I'm having trouble reading the attached file (a gzipped json file). If I read it this way:

Code: Pascal
uses zStream;

function inflateCheck(gzipfile:string):string;
var
  gz: TGZFileStream;
  chunk:string;
  cnt:integer;
const
  CHUNKSIZE=4096;
begin
  gz:= TGZFileStream.create(gzipfile,gzopenread);
  try
    result:='';
    setlength(chunk,CHUNKSIZE);
    repeat
      // a short read marks the end of the data
      cnt:=gz.read(chunk[1],CHUNKSIZE);
      if cnt<CHUNKSIZE then
        setlength(chunk,cnt);
      result:=result+chunk;
    until cnt<CHUNKSIZE;
  finally
    gz.free;
  end;
end;

Then I can read the file without problems. But if I read it this way:

Code: Pascal
uses zStream;

function inflateCheck(gzipfile:string):string;
var
  f1 : TFileStream;
  b2 : TBytesStream;
  gz: Tdecompressionstream;
  cnt:integer;
const
  CHUNKSIZE=4096;
begin
  f1 := TFileStream.create(gzipfile, fmOpenRead);
  b2 := TBytesStream.create;
  gz:= Tdecompressionstream.create(f1, false);
  try
    b2.CopyFrom(gz, gz.size);
    result := TEncoding.UTF8.GetString(b2.Bytes);
  finally
    gz.free;
    f1.free;
    b2.free;
  end;
end;

Then I get "seek in deflate compressed stream failed". Why? I can't see that I'm doing anything wrong in the second approach, and I really don't want to read a file - this is a blob in a database.

Version: Lazarus 2.2.7 (rev lazarus_2_2_6-1-gada7a90f86) FPC 3.2.3 aarch64-darwin-cocoa

The gzipped bytes, btw, were produced using this java code:

Code: Java
public static byte[] gzip(byte[] bytes) throws IOException {
  // bOut was not declared in the posted snippet; a ByteArrayOutputStream is assumed
  ByteArrayOutputStream bOut = new ByteArrayOutputStream();
  GzipParameters gp = new GzipParameters();
  gp.setCompressionLevel(Deflater.BEST_COMPRESSION);
  GzipCompressorOutputStream gzip = new GzipCompressorOutputStream(bOut, gp);
  gzip.write(bytes);
  gzip.flush();
  gzip.close();
  return bOut.toByteArray();
}

Fibonacci

« Reply #1 on: September 23, 2023, 09:51:00 pm »
There are 3 gz formats:
- gzdeflate (pure data)
- gzcompress (header added, ZLIB format)
- gzencode (header and footer checksum added, GZIP, used in .gz files)

Please provide a sample compressed by your Java code.
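
The three formats listed above can usually be told apart from the first two bytes. Here is a minimal sketch, assuming the byte layouts from RFC 1952 (gzip) and RFC 1950 (zlib). `GuessContainer` is a hypothetical helper name, not part of the ZStream unit. Raw deflate data has no signature, so the zlib test is only a heuristic.

```pascal
program DetectContainer;
{$mode objfpc}{$H+}

// Guess the container format from the first two bytes of the data.
function GuessContainer(const Data: array of Byte): string;
begin
  Result := 'raw deflate or unknown';
  if Length(Data) < 2 then
    Exit;
  // gzip (RFC 1952): magic bytes 1F 8B
  if (Data[0] = $1F) and (Data[1] = $8B) then
    Exit('gzip');
  // zlib (RFC 1950): the low nibble of the first byte is 8 (deflate)
  // and the 16-bit value CMF*256+FLG is a multiple of 31
  if ((Data[0] and $0F) = 8) and
     ((Integer(Data[0]) * 256 + Data[1]) mod 31 = 0) then
    Exit('zlib');
end;

begin
  WriteLn(GuessContainer([$1F, $8B, $08, $00])); // prints "gzip"
  WriteLn(GuessContainer([$78, $9C]));           // prints "zlib"
  WriteLn(GuessContainer([$ED, $BD]));           // prints "raw deflate or unknown"
end.
```

A check like this could decide which decompression path to take when the data arrives as a blob rather than a file.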
« Last Edit: September 23, 2023, 09:54:08 pm by Fibonacci »

Grahame Grieve

« Reply #2 on: September 23, 2023, 09:57:25 pm »
I did - that's the attachment

Fibonacci

« Reply #3 on: September 23, 2023, 10:33:34 pm »
Didn't notice :D

decompressed 959 bytes to 3103 bytes

Code: Pascal
uses ZStream, Classes;

procedure DecompressStream(src, dst: TStream);
var
  ds: TDecompressionStream;
  d: dword;
  buff: array[0..1023] of byte;
begin
  ds := TDecompressionStream.Create(src, true);
  try
    repeat
      d := ds.Read(buff, 1024);
      dst.Write(buff, d);
    until
      d = 0;
  finally
    ds.Free;
  end;
end;

var
  ss1, ss2: TStringStream;
begin
  ss1 := TStringStream.Create;
  ss1.LoadFromFile('gzip.gz');
  ss1.Position := 10; //SKIP GZIP HEADER

  ss2 := TStringStream.Create;

  DecompressStream(ss1, ss2);

  writeln('decompressed ', ss1.Size, ' bytes to ', ss2.Size, ' bytes');

  ss2.Position := 0;
  ss2.SaveToFile('out.txt');

  ss1.Free;
  ss2.Free;

  readln;
end.
« Last Edit: September 23, 2023, 10:36:46 pm by Fibonacci »

Fibonacci

« Reply #4 on: September 23, 2023, 10:51:55 pm »
Code: Pascal
ss1.Position := 10; //SKIP GZIP HEADER

At the end of the file there are:
- a 4-byte CRC checksum
- a 4-byte decompressed data size

You should cut those too, and maybe check whether the checksum is valid.
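
Reading that footer could look roughly like the sketch below. It assumes the RFC 1952 layout (CRC32 first, then the uncompressed size modulo 2^32, both little-endian) and a file containing a single gzip member. `ReadGzipTrailer` is a hypothetical helper, and `gzip.gz` is the sample file name used in this thread. Verifying the CRC itself is left out.

```pascal
program GzipTrailer;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Reads the last 8 bytes of a gzip member: the stored CRC32 and the
// stored uncompressed size (ISIZE), both little-endian.
procedure ReadGzipTrailer(Src: TStream; out Crc, ISize: Cardinal);
begin
  // 10 header bytes + 8 trailer bytes is the minimum for a gzip member
  if Src.Size < 18 then
    raise Exception.Create('Too short to be a gzip member');
  Src.Position := Src.Size - 8;
  Crc := LEtoN(Src.ReadDWord);
  ISize := LEtoN(Src.ReadDWord);
end;

var
  ms: TMemoryStream;
  crc, isize: Cardinal;
begin
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile('gzip.gz');
    ReadGzipTrailer(ms, crc, isize);
    WriteLn('stored CRC32: ', IntToHex(Int64(crc), 8));
    WriteLn('stored size : ', isize);
  finally
    ms.Free;
  end;
end.
```

The stored size can be compared with the number of bytes actually produced by decompression as a cheap sanity check before computing the CRC.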

TRon

« Reply #5 on: September 23, 2023, 11:35:52 pm »
Quote
...and I really don't want to read a file - this is a blob in a database.

A blob field can be saved to a stream, for example a memory stream, with SaveToStream.

Fibonacci's code can simply be adjusted to accommodate that.

jamie

« Reply #6 on: September 24, 2023, 12:14:51 am »
Code: Pascal
b2.CopyFrom(gz, gz.size);
result := TEncoding.UTF8.GetString(b2.Bytes);

Not sure, but that may have worked if the b2 stream had been reset back to the start. I think most decoding starts from the beginning of the stream?


Grahame Grieve

« Reply #7 on: September 24, 2023, 01:47:12 am »
Thanks for this - I appreciate it. But I don't understand it. Isn't that what skipHeader is about? How would you know whether you need to manually skip some bytes at the start, and how do you know how many bytes to skip? Is it always 10?

TRon

« Reply #8 on: September 24, 2023, 01:53:17 am »
Quote
Thanks for this - I appreciate it. But I don't understand it. Isn't that what skipHeader is about? How would you know whether you need to manually skip some bytes at the start, and how do you know how many bytes to skip? Is it always 10?

By (manually) analyzing the file that you posted.

Some gzip-compressed data contains a header. HTTP gzip compression does something similar, for example (it took me ages to figure out that it included a header). So yes, if your files always come from the same source or are obtained in the same manner, then you would have to assume that the header is present.

see also: https://en.wikipedia.org/wiki/Gzip#File_format
« Last Edit: September 24, 2023, 02:04:59 am by TRon »

Fibonacci

« Reply #9 on: September 24, 2023, 02:07:48 am »
For GZIP it's always a 10-byte header and an 8-byte footer. Actually, in some cases the header may contain optional fields, e.g. a filename, but if you compress raw data then it will be 10 bytes.

In the first example you used "TGZFileStream"; it's for files, and it knows the file has a header and footer.

In your second example you used "Tdecompressionstream", which works at a lower level and doesn't expect a gzip header.

I am not 100% sure, but the ASkipHeader I set to true was to skip the ZLIB header (2 bytes, I think), because (not sure) Tdecompressionstream is for ZLIB with a 2-byte header, so you would need to cut the 10-byte GZIP header and add a 2-byte ZLIB header.
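
For data where the header might carry optional fields, the offset of the deflate data can be computed instead of hard-coding 10. This is a minimal sketch, assuming the RFC 1952 header layout. `SkipGzipHeader` and `SkipZeroTerminated` are hypothetical helpers, not part of ZStream, and the two demo headers are hand-built test data.

```pascal
program GzipHeaderLen;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // Bits of the FLG byte (fourth header byte) per RFC 1952
  FHCRC    = $02;
  FEXTRA   = $04;
  FNAME    = $08;
  FCOMMENT = $10;

// Skips a zero-terminated field (file name or comment).
// ReadByte raises an exception at end of stream, so this cannot loop forever.
procedure SkipZeroTerminated(Src: TStream);
begin
  while Src.ReadByte <> 0 do
    ;
end;

// Returns the header length and leaves Src at the first deflate byte.
function SkipGzipHeader(Src: TStream): Int64;
var
  hdr: array[0..9] of Byte;
  start: Int64;
  xlen: Word;
begin
  start := Src.Position;
  Src.ReadBuffer(hdr, SizeOf(hdr));
  if (hdr[0] <> $1F) or (hdr[1] <> $8B) or (hdr[2] <> 8) then
    raise Exception.Create('Not a gzip (deflate) member');
  if (hdr[3] and FEXTRA) <> 0 then
  begin
    xlen := LEtoN(Src.ReadWord);        // 2-byte little-endian length
    Src.Position := Src.Position + xlen;
  end;
  if (hdr[3] and FNAME) <> 0 then
    SkipZeroTerminated(Src);
  if (hdr[3] and FCOMMENT) <> 0 then
    SkipZeroTerminated(Src);
  if (hdr[3] and FHCRC) <> 0 then
    Src.Position := Src.Position + 2;   // 2-byte header CRC
  Result := Src.Position - start;
end;

const
  Plain: array[0..9] of Byte = ($1F, $8B, 8, 0, 0, 0, 0, 0, 0, $FF);
  Named: array[0..15] of Byte = ($1F, $8B, 8, FNAME, 0, 0, 0, 0, 0, $FF,
    Ord('a'), Ord('.'), Ord('t'), Ord('x'), Ord('t'), 0);
var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    ms.WriteBuffer(Plain, SizeOf(Plain));
    ms.Position := 0;
    WriteLn(SkipGzipHeader(ms)); // prints 10
    ms.Clear;
    ms.WriteBuffer(Named, SizeOf(Named));
    ms.Position := 0;
    WriteLn(SkipGzipHeader(ms)); // prints 16
  finally
    ms.Free;
  end;
end.
```

After the call, the stream position can be used in place of the fixed `Position := 10` before handing the stream to a raw-deflate decompressor.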
« Last Edit: September 24, 2023, 02:09:56 am by Fibonacci »

Grahame Grieve

« Reply #10 on: September 24, 2023, 11:16:14 pm »
So (a) would it be good if TDecompressionStream could handle this header? (b) Would it be good if this stuff was documented in the unit? Or somewhere?

Fibonacci

« Reply #11 on: September 25, 2023, 07:01:53 am »
Quote
would it be good if TDecompressionStream could handle this header?

No, because this is a different header. Besides the different header, ZLIB also has an Adler32 checksum at the end, while GZIP uses CRC32.

TDecompressionStream is for ZLIB streams, but you can skip the header and decompress pure deflated data.

Quote
Would it be good if this stuff was documented in the unit? Or somewhere?

I don't know, maybe it could be explained better somewhere. I am not entitled to write FPC documentation.

The documentation states:
"If ASkipHeader is true, then the gzip data header is skipped"

That is not true; what is skipped is the ZLIB header.

If it's feasible, I would suggest using pure deflated data without any headers. Then in FPC use TDecompressionStream with ASkipHeader set to true - it's the same as just doing "inflate".
« Last Edit: September 25, 2023, 07:12:10 am by Fibonacci »

Fibonacci

« Reply #12 on: September 25, 2023, 07:05:34 am »
Just to make sure about the ZLIB header, I compressed a string with TCompressionStream, then decompressed it with ASkipHeader set to true and

Code: Pascal
ss1.Position := 2; //SKIP ZLIB HEADER

So yes, it's 2 bytes + Adler32 at the end.

GZIP is for files. Use either ZLIB (with Adler32 checksum) or pure deflate.
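
The experiment described in this post could be reproduced along these lines. It is a sketch rather than the exact test that was run: the sample string and variable names are made up, and it assumes the ZStream constructors `TCompressionStream.Create(level, dest)` and `TDecompressionStream.Create(source, ASkipHeader)`.

```pascal
program ZlibRoundTrip;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, ZStream;

var
  src: AnsiString;
  zdata: TMemoryStream;
  plain: TStringStream;
  cs: TCompressionStream;
  ds: TDecompressionStream;
  buf: array[0..1023] of Byte;
  n: Integer;
  b0, b1: Byte;
begin
  src := '{"resourceType":"ConceptMap"}';
  zdata := TMemoryStream.Create;
  plain := TStringStream.Create('');
  try
    // Compress into memory; freeing the compressor flushes pending output.
    cs := TCompressionStream.Create(cldefault, zdata);
    try
      cs.WriteBuffer(src[1], Length(src));
    finally
      cs.Free;
    end;

    // Read the 2-byte ZLIB header; this also leaves Position at 2.
    zdata.Position := 0;
    b0 := zdata.ReadByte;
    b1 := zdata.ReadByte;
    WriteLn(Format('zlib header: %.2x %.2x', [b0, b1]));

    // ASkipHeader = true: treat what follows as pure deflate data.
    ds := TDecompressionStream.Create(zdata, true);
    try
      repeat
        n := ds.Read(buf, SizeOf(buf));
        if n > 0 then
          plain.WriteBuffer(buf, n);
      until n <= 0;
    finally
      ds.Free;
    end;

    WriteLn('round trip ok: ', plain.DataString = src);
  finally
    plain.Free;
    zdata.Free;
  end;
end.
```

Leaving the position at 0 and passing `false` for ASkipHeader should decode the same data as a regular ZLIB stream.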

Grahame Grieve

« Reply #13 on: September 25, 2023, 10:23:57 am »
It's the GZip file header per the documentation here. https://docs.fileformat.com/compression/gz/

I think that TdecompressionStream should be able to read it just like the file variant, but if it can't, it should at least document the difference. (IMHO)

Fibonacci

« Reply #14 on: September 25, 2023, 10:50:44 am »
I looked at the ZStream unit, and here you go: an improved version for your use case.

Quote
decompressed 959 bytes to 3103 bytes
part of decompressed data
{"resourceType":"ConceptMap","

Code: Pascal
uses ZStream, Classes;

var
  sinput, soutput: TStringStream;
  gz: TGZipDecompressionStream;
  buff: array[0..1024*32-1] of byte;
  d: dword;

begin
  //input
  sinput := TStringStream.Create;
  sinput.LoadFromFile('gzip.gz');

  //output
  soutput := TStringStream.Create;

  //decompress gzip blob
  gz := TGZipDecompressionStream.Create(sinput);
  repeat
    d := gz.Read(buff, sizeof(buff));
    soutput.Write(buff, d);
  until d = 0;
  gz.Free;

  writeln('decompressed ', sinput.Size, ' bytes to ', soutput.Size, ' bytes');
  writeln('part of decompressed data');
  writeln(copy(soutput.DataString, 1, 30));

  sinput.Free;
  soutput.Free;

  readln;
end.
« Last Edit: September 25, 2023, 10:53:52 am by Fibonacci »

 
