
Author Topic: PDF decompression / zlib  (Read 2049 times)

zxmwc24

  • New member
  • *
  • Posts: 7
PDF decompression / zlib
« on: February 13, 2024, 01:35:34 pm »
I need to read a compressed PDF object stream containing the cross-reference table. According to the PDF Reference, a standard compression method (zlib) shall be used. So I used the FPC "zstream" unit and successfully tested its compression and decompression functions (using a TMemo component). However, when I apply the decompression to an existing PDF file, I get an EDecompression data error. Here is the decompression function:
Code: Pascal
  function DecompressStream(inStream, outStream: TStream): Boolean;
  var
    ds: TDecompressionStream;
    buf: array[0..4095] of Byte;
    count: Integer;
  begin
    Result := True;
    inStream.Position := 0;
    outStream.Size := 0;
    if inStream.Size > 0 then
    begin
      // by default TDecompressionStream expects zlib-wrapped data
      ds := TDecompressionStream.Create(inStream);
      try
        try
          repeat
            count := ds.Read(buf[0], SizeOf(buf));
            if count > 0 then
              outStream.Write(buf[0], count);
          until count = 0;
        except
          // report the failure instead of silently breaking out of the loop
          Result := False;
        end;
      finally
        ds.Free;
      end;
    end;
  end;

Here is the call from the main program reading pdf files:
Code: Pascal
  instream := TMemoryStream.Create;
  outstream := TMemoryStream.Create;
  try
    // here I insert the compressed data of length k from the pdf file into the stream
    // (if p is a pointer, it has to be written as p^, otherwise the pointer itself is stored)
    instream.WriteBuffer(p, k);
    if DecompressStream(instream, outstream) then
    begin
      cnt := outstream.Size;
      GetMem(buffer, cnt);
      outstream.Position := 0;
      // buffer^ reads into the allocated block rather than over the pointer variable
      outstream.ReadBuffer(buffer^, cnt);
    end
    else
      exit;
  finally
    instream.Free;
    outstream.Free;
  end;

Can anybody help? I tested several PDF files, so that is not where the problem is. I wonder whether "zstream" is actually a precise implementation of the "deflate" algorithm required in PDF files.
« Last Edit: February 13, 2024, 01:45:07 pm by zxmwc24 »

domasz

  • Hero Member
  • *****
  • Posts: 554
Re: PDF decompression / zlib
« Reply #1 on: February 13, 2024, 01:38:13 pm »
People often confuse zlib and deflate. Try deflate instead.
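
For anyone who wants to stay with the stock zstream unit: as far as I know, TDecompressionStream.Create takes an optional second Boolean argument (Askipheader in the FPC sources) that switches it from zlib-wrapped input to a bare deflate stream. Below is a minimal sketch that tries both variants on a test stream it builds itself. TryInflate and the sample text are made up for illustration, and this is not how zflate does its detection.

```pascal
program InflateBothWays;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, zstream;

// Hypothetical helper: inflates src into dst.
// rawDeflate = False expects a zlib wrapper (RFC 1950),
// rawDeflate = True expects a bare deflate stream (RFC 1951).
function TryInflate(src, dst: TStream; rawDeflate: Boolean): Boolean;
var
  ds: TDecompressionStream;
  buf: array[0..4095] of Byte;
  n: Integer;
begin
  Result := True;
  src.Position := 0;
  dst.Size := 0;
  ds := TDecompressionStream.Create(src, rawDeflate);
  try
    try
      repeat
        n := ds.Read(buf[0], SizeOf(buf));
        if n > 0 then
          dst.WriteBuffer(buf[0], n);
      until n = 0;
    except
      on E: Exception do
        Result := False;   // wrong format or damaged data
    end;
  finally
    ds.Free;
  end;
end;

var
  comp, plain: TMemoryStream;
  cs: TCompressionStream;
  sample: AnsiString;
begin
  sample := 'xref xref xref xref xref xref';
  comp := TMemoryStream.Create;
  plain := TMemoryStream.Create;
  try
    // Build a bare deflate test stream (third argument True = no zlib header).
    cs := TCompressionStream.Create(clDefault, comp, True);
    try
      cs.WriteBuffer(sample[1], Length(sample));
    finally
      cs.Free;   // freeing the stream flushes the remaining compressed data
    end;

    // Try the zlib wrapper first, then fall back to bare deflate.
    if TryInflate(comp, plain, False) then
      WriteLn('zlib wrapper: ', plain.Size, ' bytes')
    else if TryInflate(comp, plain, True) then
      WriteLn('raw deflate: ', plain.Size, ' bytes')
    else
      WriteLn('neither zlib nor raw deflate');
  finally
    comp.Free;
    plain.Free;
  end;
end.
```

Trying zlib first and falling back to bare deflate keeps the caller independent of which variant a particular file uses.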

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: PDF decompression / zlib
« Reply #2 on: February 13, 2024, 01:45:45 pm »
You may try with my zflate unit:

https://github.com/fibodevy/zflate/blob/main/src/zflate.pas

Code: Pascal
  //try to detect buffer format and decompress it at once
  function zdecompress(data: pointer; size: dword; var output: pointer; var outputsize: dword): boolean;
  //try to detect string format and decompress it at once
  function zdecompress(str: string): string;
  //try to detect bytes format and decompress it at once
  function zdecompress(bytes: TBytes): TBytes;

zxmwc24

  • New member
  • *
  • Posts: 7
Re: PDF decompression / zlib
« Reply #3 on: February 13, 2024, 01:49:37 pm »
People often confuse zlib and deflate. Try deflate instead.
Looks like I am one of these people. I read somewhere that there is a difference in the header of the compressed data. But zstream should automatically detect whether it is flate/deflate or gzip. Or is there a different function I should use?

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: PDF decompression / zlib
« Reply #4 on: February 13, 2024, 01:53:28 pm »
zstream does not auto-detect the format, but zflate does. Put all your data in an array of bytes and try the TBytes version of zdecompress().

Code: Pascal
  yourbytes := zdecompress(yourbytes);

zxmwc24

  • New member
  • *
  • Posts: 7
Re: PDF decompression / zlib
« Reply #5 on: February 13, 2024, 01:55:10 pm »
Thanks a lot, will try and report back!

zxmwc24

  • New member
  • *
  • Posts: 7
Re: PDF decompression / zlib
« Reply #6 on: February 14, 2024, 07:25:07 am »
I tried zflate. Unfortunately, it returns an empty array of bytes. The PDF file I am currently looking at has compressed data of Length 128 in the cross reference stream:

43 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Encrypt 11 0 R/Filter/FlateDecode/ID[<E9DB3CAC0A6C2BE84FD276E7ECFAE2D4><A15FBE31C7DE944291C84AACB9645D60>]/Index[10 81]/Info 9 0 R/Length 128/Prev 47539/Root 12 0 R/Size 91/Type/XRef/W[1 2 1]>>stream
[128 bytes of binary stream data; the byte values are listed below]
endstream

The corresponding bytes between stream and endstream (without the #10 and #13 endline markers) are as follows:
104, 222, 98, 98, 100, 16, 96, 96, 98, 96, 182, 3, 18, 140, 215, 128, 4, 67, 20, 136, 21, 8, 36, 152, 172, 65, 172, 9, 32, 22, 136, 96, 204, 3, 177, 234, 64, 172, 90, 16, 97, 1, 226, 70, 130, 88, 86, 32, 3, 26, 65, 172, 247, 64, 130, 239, 19, 144, 224, 62, 14, 34, 230, 131, 204, 107, 5, 18, 130, 188, 32, 37, 105, 32, 110, 42, 144, 16, 97, 1, 41, 118, 3, 17, 165, 64, 130, 179, 4, 196, 50, 3, 153, 87, 6, 147, 96, 74, 131, 43, 1, 17, 221, 183, 24, 152, 24, 25, 150, 130, 12, 96, 96, 28, 214, 196, 127, 38, 191, 203, 0, 1, 6, 0, 84, 100, 19, 13

After decompressing the stream, I still need to apply the ReversePNG filter (Predictor 12). But first I need to get the decompression working. Thanks for any hints.
P.S. The encryption can be ignored, it does not apply to cross reference streams / tables.
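
For the Predictor 12 step after decompression, here is a rough sketch of undoing the PNG "Up" filter. It assumes /Columns 4 with the default one byte per column, so every row is one filter tag byte followed by 4 data bytes. UndoPngUp is a made-up name, and only tags 0 (None) and 2 (Up) are handled. The three sample rows are the first 15 bytes of the decompressed data posted later in this thread.

```pascal
program PngUpDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: undoes PNG row filtering for rows of `columns`
// bytes, each preceded by a one-byte filter tag.
// Only tag 0 (None) and tag 2 (Up) are handled.
function UndoPngUp(const data: TBytes; columns: Integer): TBytes;
var
  rowCount, row, col, srcPos, dstPos: Integer;
  above: Byte;
begin
  Result := nil;
  if (columns <= 0) or (Length(data) mod (columns + 1) <> 0) then
    raise Exception.Create('data length is not a whole number of rows');
  rowCount := Length(data) div (columns + 1);
  SetLength(Result, rowCount * columns);
  for row := 0 to rowCount - 1 do
  begin
    srcPos := row * (columns + 1);   // position of the filter tag byte
    dstPos := row * columns;
    if not (data[srcPos] in [0, 2]) then
      raise Exception.CreateFmt('unsupported PNG filter tag %d', [data[srcPos]]);
    for col := 0 to columns - 1 do
    begin
      // "Up" adds the byte directly above; the row above the first row is all zeros.
      if (data[srcPos] = 2) and (row > 0) then
        above := Result[dstPos - columns + col]
      else
        above := 0;
      Result[dstPos + col] := (data[srcPos + 1 + col] + above) and $FF;
    end;
  end;
end;

const
  Rows: array[0..14] of Byte = (
    $02, $01, $00, $10, $00,
    $02, $00, $03, $3E, $00,
    $02, $00, $01, $D6, $00);
var
  filtered, decoded: TBytes;
  i: Integer;
begin
  SetLength(filtered, Length(Rows));
  Move(Rows[0], filtered[0], Length(Rows));
  decoded := UndoPngUp(filtered, 4);
  // prints: 01 00 10 00 01 03 4E 00 01 04 24 00
  for i := 0 to High(decoded) do
    Write(IntToHex(decoded[i], 2), ' ');
  WriteLn;
end.
```

With /W[1 2 1], each decoded row then splits into a 1-byte type, a 2-byte offset and a 1-byte generation field. The first row comes out as type 1, offset $0010, generation 0.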
« Last Edit: February 14, 2024, 07:32:47 am by zxmwc24 »

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: PDF decompression / zlib
« Reply #7 on: February 14, 2024, 07:35:29 am »
Isn't it encrypted? Looks like it is.

It's not ZLIB, not GZIP, and I checked deflate (with no header): data error, so not deflate either.

EDIT: Is this correct data?

Quote
02010010000200033e00020001d6000200005a0002000151000200023b00020001900002000290000200016e000200027e000200017d00020001380002000259000200013a000200038100020001ef0002000ef20002000bc70002000b9f0002000085000200110d0002000366000200006500020014040002000146000200017500020009740002000136000200027600020001460002000266000200014600020001460002008bda00020100a50002000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102000000010200000001020000000102ff024ed3
« Last Edit: February 14, 2024, 07:44:13 am by Fibonacci »

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: PDF decompression / zlib
« Reply #8 on: February 14, 2024, 08:10:14 am »
zflate has been updated: https://github.com/fibodevy/zflate/commit/333cb73eff54cea84599be057958c7ccc61f5e94

It turned out there was a check only for a small (but most common) set of ZLIB headers. I updated the zlib header reader function to check other possible headers.

Your data is correct and has a valid Adler-32 checksum.

For ZLIB you can use gzuncompress(), or use zdecompress() to auto-detect.

PS. Thanks for helping to improve zflate ;) Now it should handle every ZLIB, GZIP and pure deflate.
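
For reference, a sketch of the general zlib header test as I understand RFC 1950. The compression method must be 8 (deflate), the window size field at most 7, and the two header bytes read as a big-endian 16-bit number must be a multiple of 31. IsZlibHeader is a made-up name, and this is not the actual zflate code.

```pascal
program ZlibHeaderCheck;
{$mode objfpc}{$H+}

// Hypothetical helper: True if b0, b1 form a valid zlib (RFC 1950) header.
function IsZlibHeader(b0, b1: Byte): Boolean;
begin
  Result := ((b0 and $0F) = 8)                       // CM = 8: deflate
        and ((b0 shr 4) <= 7)                        // CINFO: window up to 32 KiB
        and (((Integer(b0) * 256 + b1) mod 31) = 0); // FCHECK: multiple of 31
end;

begin
  // 104, 222 ($68 $DE): first two bytes of the cross-reference stream -> True
  WriteLn(IsZlibHeader(104, 222));
  // $78 $9C: a very common zlib header -> True
  WriteLn(IsZlibHeader($78, $9C));
  // $63 $46: low nibble is 3, not 8 -> False
  WriteLn(IsZlibHeader($63, $46));
end.
```

The $68 $DE pair passes because $68DE = 26846 = 31 * 866. It differs from the usual $78 headers only in the window size field (16 KiB instead of 32 KiB).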
« Last Edit: February 14, 2024, 08:13:35 am by Fibonacci »

zxmwc24

  • New member
  • *
  • Posts: 7
Re: PDF decompression / zlib
« Reply #9 on: February 14, 2024, 03:04:27 pm »
Thanks a million, works like a charm now!   ;)

zxmwc24

  • New member
  • *
  • Posts: 7
Re: PDF decompression / zlib
« Reply #10 on: February 24, 2024, 04:45:58 pm »
I came across another PDF compressed stream which appears to be incompatible with your algorithm. Could you check if the following are valid bytes:
99, 70, 207, 82, 195, 176, 243, 145, 171, 59, 17, 128, 168, 69, 142, 117, 232, 221, 67, 179, 208, 190, 76, 120, 88, 230, 191, 19, 31, 155, 187, 181, 244, 6, 100, 211, 44, 116, 197, 4, 242, 78, 172, 104, 243, 146, 184, 51, 60, 112, 210, 87, 64, 194, 215, 225, 124, 216, 46, 227, 160, 145, 32, 208, 148, 117, 66, 143, 62, 253, 177, 171, 212, 115, 209, 158, 246, 15, 156, 46, 66, 36, 45, 226, 234, 208, 129, 139, 34, 149, 121, 187, 223, 210, 100, 122, 18, 108, 12, 170, 81, 172, 153, 142, 44, 19, 75, 244, 99, 128, 122, 144, 223, 23, 198, 202, 82, 70, 96, 122, 179, 125, 143, 109, 100, 184, 122, 43, 221, 250, 165, 29, 175, 143, 252, 162, 187, 118, 173, 116, 27, 217, 85, 52, 185, 198, 194, 77, 39, 65, 144, 142, 99, 123, 59, 150, 173, 247, 133, 163, 246, 240, 70, 93, 176, 159, 200, 24, 237, 57, 97, 154, 209, 129, 220, 96, 210, 1, 97, 220, 178, 133, 246, 252, 78

Length of the stream is 185.

rvk

  • Hero Member
  • *****
  • Posts: 6640
Re: PDF decompression / zlib
« Reply #11 on: February 24, 2024, 05:04:36 pm »
I came across another PDF compressed stream which appears to be incompatible with your algorithm. Could you check if the following are valid bytes:
Are you sure the stream is unencrypted?

Some streams are encrypted, and without the entire file it is hard to say.

Also, what surrounds that stream is important for knowing how it is compressed.


Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: PDF decompression / zlib
« Reply #12 on: February 24, 2024, 05:08:51 pm »
That doesn't look valid, and inflate has problems with it. zflate does decompress it to 231 bytes, but the result seems invalid (mostly nulls). PHP's gzinflate() returns a data error. I also ran a loop that cut off one leading byte at a time to see where it would succeed, but it failed at every offset. I also tried cutting off the last 4 and 8 bytes, without success.

 
