
Author Topic: Pack  (Read 8797 times)

paweld

  • Hero Member
  • *****
  • Posts: 1028
Re: Pack
« Reply #30 on: February 19, 2024, 12:18:55 pm »
I ran another test on large files. The test machine is a VPS with an 8-core processor, 16 GB RAM, a 100 GB NVMe drive, and Windows 10 x64.
Resource consumption during compression:
Code: [Select]

- 7z:   CPU ~87% RAM ~1200MB
- zip:  CPU ~12% RAM ~14MB
- pack: CPU ~20% RAM ~95MB
Compression result:
Code: [Select]
================================================
folder name: test1
folder content: debian 12 BD iso
Files:        1
Directories:  1
Size:         8.095.482.046 bytes
Size on disk: 8.095.482.048 bytes
------------------------------------------------
    test1.7z -> time:       378328 ms; size:           7948464519 b
   test1.zip -> time:       310562 ms; size:           8035547628 b
  test1.pack -> time:        24203 ms; size:           8000180224 b

================================================
folder name: test2
folder content: ai model
Files:        1
Directories:  1
Size:         6.938.041.559 bytes
Size on disk: 6.938.046.464 bytes
------------------------------------------------
    test2.7z -> time:       355765 ms; size:           6253348943 b
   test2.zip -> time:       316250 ms; size:           6406476592 b
  test2.pack -> time:        19578 ms; size:           6382604288 b

================================================
folder name: test3
folder content: 4k mp4 video
Files:        1
Directories:  1
Size:         2.190.991.455 bytes
Size on disk: 2.190.991.536 bytes
------------------------------------------------
    test3.7z -> time:        69094 ms; size:           1570177331 b
   test3.zip -> time:        81672 ms; size:           1570864278 b
  test3.pack -> time:         4625 ms; size:           1570967552 b

================================================
folder name: test4
folder content: huge sample bin file
Files:        1
Directories:  1
Size:         10.737.418.314 bytes
Size on disk: 10.737.418.320 bytes
------------------------------------------------
    test4.7z -> time:       538609 ms; size:          10738083216 b
   test4.zip -> time:       478906 ms; size:          10737418642 b
  test4.pack -> time:        33156 ms; size:          10748678144 b
Best regards / Pozdrawiam
paweld

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #31 on: February 19, 2024, 12:27:08 pm »
Thank you, Paweld; your results are really appreciated.
I think Pack wants to use 50% of the CPU, but it cannot; it is limited by the disk I/O speed. With a better NVMe, it should use 4 cores and the times will decrease even more.
You can check the disk I/O and see whether it is capped, but I guess it is. For example, in the last test Pack wrote nearly 10 GB in 3 seconds, so more than 3 GB/s output speed. Pretty fast disk.
« Last Edit: February 19, 2024, 12:35:56 pm by O »

paweld

  • Hero Member
  • *****
  • Posts: 1028
Re: Pack
« Reply #32 on: February 19, 2024, 01:06:56 pm »
For example, in the last test Pack wrote nearly 10 GB in 3 seconds, so more than 3 GB/s output speed. Pretty fast disk.
33 seconds.

The drive is on a VPS, so it definitely has some limitations. Unfortunately, I don't have access to physical machines to test on; they all run production, and as I wrote, I won't run a program from an unknown source.
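With the corrected 33 seconds, the test4 numbers above work out to roughly 300 MiB/s rather than 3 GB/s; a quick check:

```python
# Write throughput of the test4 Pack run, using the size and time reported above.
size_bytes = 10_748_678_144   # test4.pack output size
time_s = 33.156               # 33156 ms (not 3 s)

mib_per_s = size_bytes / time_s / (1024 * 1024)
print(f"{mib_per_s:.0f} MiB/s")   # about 309 MiB/s
```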
« Last Edit: February 19, 2024, 01:08:54 pm by paweld »
Best regards / Pozdrawiam
paweld

domasz

  • Sr. Member
  • ****
  • Posts: 443
Re: Pack
« Reply #33 on: February 19, 2024, 01:38:58 pm »
For the curious, here's a demo of SynLZ. It should be faster than Zstandard (and Pack), but it gives bigger files.

lichess_db_standard_rated_2014-06.pgn is packed to 378 MB, but almost instantly.
« Last Edit: February 19, 2024, 01:42:45 pm by domasz »

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #34 on: February 19, 2024, 01:48:03 pm »
@paweld, sorry for the misread. So it definitely got capped.
@domasz, thank you for the test. SynLZ is fantastic; it has very fast compression and even faster decompression, at the cost of compression ratio. Like the similar LZ4, it is a much better fit for fast wire transfers. A great fit for mORMot, the great work of an even greater man, ab.

KodeZwerg

  • Hero Member
  • *****
  • Posts: 2216
  • Fifty shades of code.
    • Delphi & FreePascal
Re: Pack
« Reply #35 on: February 19, 2024, 05:50:26 pm »
@KodeZwerg thank you for the test results.
My pleasure!

Can you tell me what the source file is?
Sure, here is the animated tiger GIF that I used for my tests.

Reading more of your thread, I really start to wonder how it is done: the transfer from an input file into compressed data that is written multithreaded to an SQLite database, and the opposite, reading multithreaded from a BLOB and storing it on disk.
You made me curious; I can't wait to look at the sources to learn from them!

I did use ZipX in my tests, @domasz; those are the filenames with the extension .zipx :D
I was actually not aware of what you wrote: that ZipX first decodes the original data to achieve better compression, while on decompression the original comes back byte by byte.
I need to check it out, since it sounds very clever, but to me it also sounds impossible to recode uncompressed data into a file format where the byte-by-byte check still matches.


I do agree, SynLZ from mORMot is cool, but the 7z support in mORMot 2 is even better O:-) (it depends on the use case, of course).
ZLib is also quite fast IMHO, and it compresses using Deflate (slightly behind the original PKZip).
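On the ZLib point: zlib output is indeed a Deflate stream with a small wrapper, and zip files store the same stream raw. A tiny illustration (Python used here just for brevity):

```python
import zlib

data = b"Hello, Deflate!" * 100

# zlib.compress wraps a raw Deflate stream in a 2-byte header plus an Adler-32 checksum.
packed = zlib.compress(data, 6)
assert zlib.decompress(packed) == data

# The same Deflate stream without the zlib wrapper, as stored inside zip files
# (negative wbits means "raw Deflate" in the zlib API).
co = zlib.compressobj(wbits=-15)
raw = co.compress(data) + co.flush()
assert zlib.decompress(raw, wbits=-15) == data
print(len(data), len(packed), len(raw))   # raw omits the 6-byte zlib wrapper
```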

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #36 on: February 20, 2024, 10:14:53 am »
Thank you @KodeZwerg.
About the Cheetah GIF: as we guessed, it is the cost of the header that makes Pack produce a 7 KB file. I am not worried about it, as packing an incompressible file of less than 10 KB is not an everyday task. And even then, Pack's other features, like speed, can cover that cost most of the time.

But to have a little fun and show the power of handling multiple files, I made 10,000 copies of the file in a directory and tested them. Here are the results:
   tar.gz: 670 ms, 484 KB
   RAR: 3460 ms, 56.2 MB
   RAR (-s, Solid Archive): 2500 ms, 961 KB (as suggested by @marcov)
   7z: 1528 ms, 20 KB
   Pack: 103 ms, 772 KB

tar.gz is significantly faster this time, but still more than 6X slower than Pack.
« Last Edit: February 20, 2024, 02:17:29 pm by O »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11513
  • FPC developer.
Re: Pack
« Reply #37 on: February 20, 2024, 01:50:38 pm »
(with RAR you might have to enable solid archives to compare them on an equal footing)

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #38 on: February 20, 2024, 02:25:53 pm »
Thank you, @marcov, for the suggestion; I updated the previous post with it. However, it was a fun test and not a real-life use case.
I am more focused on the default behavior out of the box, as most people will use that.
The solid mode is faster than RAR's default and has a much smaller output size.
And I don't think it is equal footing; as far as I can find, solid archives make updating and random access slower. I do not know how accurate this is, as I had never tested the solid feature until you suggested it.
Anyway, Pack produces smaller files, is significantly faster, and still has no loss of speed on random access.

@marcov did you get to test Pack? I value your opinion.

KodeZwerg

  • Hero Member
  • *****
  • Posts: 2216
  • Fifty shades of code.
    • Delphi & FreePascal
Re: Pack
« Reply #39 on: February 20, 2024, 02:36:03 pm »
   RAR (-s, Solid Archive): 2500 ms, 961 KB (as suggested by @marcov)
That result must be wrong (I mean the size).
RAR's solid option compresses all files as one overlapping stream, so the final result is smaller than compressing each file individually into the container.
Whenever I am bored, I will try to repeat your RAR-based test.

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #40 on: February 20, 2024, 02:38:06 pm »
This is what I tried, if I should do something else, please let me know.
Code: [Select]
rar a -s solid.rar .\Cheetah10000\

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #41 on: February 20, 2024, 02:47:39 pm »
The Linux version of the CLI client has been released.
Check the first post for more info.
You can get the latest CLI client here: https://pack.ac/ or https://pack.ac/releases/linux/pack

On Linux, and especially on ext4, Pack is even faster, by nearly 30%. So the Linux source code test from the first post, packing included, is done in 960 ms. The Linux kernel and ext4 are fast.

Linux source code test:
Pack on Windows (NTFS): 1350 ms
Pack on Linux (ext4): 960 ms (28% faster)

tar.gz on Windows (NTFS): 28.5 s
tar.gz on Linux (ext4): 27.5 s (3% faster)

With its approach, Pack can use the ext4 file system significantly better and be even faster. I am surprised too.
« Last Edit: February 29, 2024, 08:14:16 pm by O »

KodeZwerg

  • Hero Member
  • *****
  • Posts: 2216
  • Fifty shades of code.
    • Delphi & FreePascal
Re: Pack
« Reply #42 on: February 20, 2024, 02:51:14 pm »
This is what I tried, if I should do something else, please let me know.
Code: [Select]
rar a -s solid.rar .\Cheetah10000\
I do not own the CLI tools, but I assume the parameters are the same:
Code: [Select]
"x:\y\WinRAR.exe" a -s -m5 "archive.rar" ".\test\*"
That would create a solid archive with best compression turned on. And yes, I agree with the counterpart: updating a solid archive takes long, since the stream must be altered as a whole; the same goes for single-file extraction, where the stream must be processed to reach that one piece.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11513
  • FPC developer.
Re: Pack
« Reply #43 on: February 20, 2024, 03:02:27 pm »
And I don't think it is equal footing; as far as I can find, solid archives make updating and random access slower. I do not know how accurate this is, as I had never tested the solid feature until you suggested it.

This is true, but that also goes for .tar.gz and, afaik, also for 7z. These also have to decompress the files that come before the file you are extracting in a single-extraction scenario, and repack everything when a file is added.

Quote
Anyway, Pack produces smaller files, is significantly faster and still has no loss of speed on random access.

Solid does not mean that extracting random files is disallowed in general; it means the archive does not have multiple streams. When an archive is not solid, the first byte that rolls out of the decompressor is the first byte of the file you want (as in zip, arj and other older compressors). So compression restarts several times within the archive, and even in a corrupted archive you can scan for such headers to do partial recovery.

So basically, tar.gz first archives and then compresses into one stream, while classic zip is an archive of compressed streams.
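That difference is easy to see in a few lines; a small Python sketch (nothing to do with Pack's code) that stores 100 identical 1 KB files both ways, where the solid tar.gz can exploit redundancy across files while zip restarts compression for every member:

```python
import io
import tarfile
import zipfile

payload = (b"The quick brown fox jumps over the lazy dog. " * 23)[:1024]
names = [f"file{i:03d}.txt" for i in range(100)]

# zip: an archive of compressed streams; compression restarts for every member.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name in names:
        zf.writestr(name, payload)

# tar.gz: first archive, then compress the whole tar as one (solid) stream.
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
    for name in names:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

# The solid stream sees the same 1 KB repeated 100 times and crushes it;
# the zip pays per-member headers and restarts, so it stays much larger.
print(len(zbuf.getvalue()), len(tbuf.getvalue()))
```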

Quite often the solid aspect (everything is one big compression stream) is combined with some form of heuristic sorting of the files (by extension, or based on short entropy or histogram tests) to put files that are alike after each other in the compression process. Similarly, archive directories are often compressed as well.

There are also compromise scenarios (iirc the UCS(2) compressor had that) in which the archiver reorders files to aggregate small files into blocks that are solid (back then, in 486/Pentium I/II times, blocks of a few MB; nowadays it would probably be larger), i.e. limiting the "extra" bytes that come out of the decompressor before the first wanted byte when extracting single files.

(did some updates after reading KodeZwerg's post that reminded me of some more things)
« Last Edit: February 20, 2024, 03:08:25 pm by marcov »

O

  • New Member
  • *
  • Posts: 39
  • Creator of Pack
    • Pack
Re: Pack
« Reply #44 on: February 20, 2024, 04:20:39 pm »
In that sense, Pack is a hybrid.
It creates Contents (packages of raw data) from the input, but unlike tar, it does not compress all files as a whole, and unlike zip, it does not process each file one by one.
Depending on the input, each piece of content can hold a chunk of a file, a whole file, or many of the input files. After that, the contents are compressed, if they are worth it.
For now, each content size is chosen dynamically; most of the time it is 8 MB. This lets Pack access a file, or a part of one, by reading and decompressing only a small part: similar to pages in file systems, but bigger, and compressed. And luckily, the SQLite file format enables this random access, fast and safe.
It makes updating a file or adding items easier, since only what is needed has to be touched, not the whole Pack file.
This is mandatory, as Pack's locking mechanism will let users access a file without forcing them to decrypt and decompress all the files, reducing resource use, preventing temp files, and increasing speed and security.
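A toy illustration of that idea (my own sketch, not Pack's actual schema or code): split the input into fixed-size contents, compress each one only when that shrinks it, store the rows in SQLite, and read any single chunk back without touching the rest:

```python
import sqlite3
import zlib

CHUNK = 8 * 1024 * 1024  # illustrative; Pack chooses content sizes dynamically

def store(db, data, chunk=CHUNK):
    # Each row is one independently accessible "content".
    db.execute("CREATE TABLE IF NOT EXISTS content"
               "(id INTEGER PRIMARY KEY, packed INTEGER, body BLOB)")
    for off in range(0, len(data), chunk):
        piece = data[off:off + chunk]
        c = zlib.compress(piece, 1)
        packed = len(c) < len(piece)          # compress only if it is worth it
        db.execute("INSERT INTO content(packed, body) VALUES (?, ?)",
                   (int(packed), c if packed else piece))

def load_chunk(db, chunk_id):
    # Random access: read one row, decompress one chunk, touch nothing else.
    packed, body = db.execute(
        "SELECT packed, body FROM content WHERE id = ?", (chunk_id,)).fetchone()
    return zlib.decompress(body) if packed else bytes(body)

db = sqlite3.connect(":memory:")
data = b"compressible " * 200_000
store(db, data, chunk=64 * 1024)              # small chunk size for the demo
ids = [row[0] for row in db.execute("SELECT id FROM content ORDER BY id")]
assert b"".join(load_chunk(db, i) for i in ids) == data
```

The table and function names here are hypothetical; the point is only the mechanism: per-chunk compression keeps random reads and in-place updates cheap, because each row can be decoded on its own.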

 
