
Topic: Pack (Read 8800 times)

O

  • New Member
  • Posts: 39
  • Creator of Pack
Pack
« on: February 18, 2024, 08:47:51 am »
(Update 2024-03-18: Pack version 2 has been released. It adds List, Partial Unpack, and more; see the version 2 announcement for details.)



Hello to you all.

I’ve been working on a new container format that can support files and raw data.
It is made to be safe, fast, and reliable for years to come.
It is designed to provide random access to the content while compressing as well as or better than most similar formats.
And in almost all cases it is faster than similar tools such as Zip, gzip, tar, RAR, and 7z.

And it is called Pack.

You can get the latest CLI client from here: https://pack.ac/
Windows: https://pack.ac/releases/windows/pack.exe
Linux: https://pack.ac/releases/linux/pack
Source: https://pack.ac/releases/source/Source.pack (or from GitHub) (Build)

To pack a folder, simply run:
Code: [Select]
pack ./test/
And to unpack:
Code: [Select]
pack ./test.pack
Use the `--help` parameter to see more options.
For example, here is how to overwrite an existing output file:
Code: [Select]
pack -i ./test/ -o ./test.pack -w
Some numbers:
Packing a copy of the Linux source code (more than 81K files, about 1.25 GB) on Windows (NTFS):
Code: [Select]
ZIP:     253 MB,  146 s      = 1
tar.gz:  214 MB,  28.5 s     ====== 6
tar:    1345 MB,  4.7 s      ====== 6
RAR:     235 MB,  27.5 s     ====== 6
7z:      135 MB,  54.2 s     ===== 5
Pack:    194 MB,  1.3 s      ================================================================================================================================================= 145

On Linux (with ext4) it is even faster:
   tar.gz: 27.5 s
   Pack: 0.96 s

And even without any compression, it is faster than tar:
   Pack (with no Press): 1.8 s, 1.25 GB
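To read the Windows table in relative terms, here is a small Python sketch (not part of Pack) that derives each tool's output size as a fraction of the uncompressed tar and its speedup over ZIP. It uses only the sizes and times listed in the table; nothing else is assumed.

```python
# Sizes (MB) and times (s) copied from the Windows/NTFS table above.
RESULTS = {
    "ZIP": (253, 146.0),
    "tar.gz": (214, 28.5),
    "tar": (1345, 4.7),
    "RAR": (235, 27.5),
    "7z": (135, 54.2),
    "Pack": (194, 1.3),
}


def relative(results, size_base="tar", time_base="ZIP"):
    """Return {tool: (size as a fraction of size_base, speedup over time_base)}."""
    base_size = results[size_base][0]
    base_time = results[time_base][1]
    return {
        tool: (size / base_size, base_time / seconds)
        for tool, (size, seconds) in results.items()
    }


if __name__ == "__main__":
    for tool, (size_frac, speedup) in relative(RESULTS).items():
        print(f"{tool:7} {size_frac:6.1%} of tar size, {speedup:6.1f}x ZIP speed")
```

For example, Pack's 1.3 s against ZIP's 146 s works out to roughly 112x, and its 194 MB is about 14% of the uncompressed tar.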

Please test and report back any suggestions or issues. I will look into it all.
You can find me here, or email me at o at pack.ac.
« Last Edit: March 19, 2024, 03:57:37 pm by O »

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #1 on: February 18, 2024, 08:48:09 am »
Notes:

What you may like to know:
- It is free and will remain so. I made it to help people have a safer, easier, and faster life.
- It is made using the latest versions of FreePascal and Lazarus (but no FCL or LCL) and internally uses SQLite and ZSTD, linked statically; there are no external dependencies besides the OS.
- Source code is available under a permissive licence.
- It is in beta; keep your input files. It is designed to be safe and crash-resistant, and to reject problematic data so that it cannot be exploited through many common vulnerabilities, but for now it is intended for evaluation only.
- It is fast, really fast.
- It is smart. It configures itself as needed; there are not many dials to play with.
- It is very resource-friendly.
- More updates are coming, including encryption.


If you would like to know more, read on.

Safety: Everything is made with safety as the first priority.
If you want to verify the packing process, use these parameters: `--activate-other-options --verify-pack`
It will pack, unpack, and compare the input and output to make sure everything went correctly.
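`--verify-pack` does this for you. If you want to check a round trip by hand instead, a comparison like the following works. It is a minimal Python sketch, not Pack's own verification code: it hashes every file under the original folder and under the unpacked folder and reports any path that is missing, extra, or changed. It compares file contents only, not empty folders or timestamps.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_digest(full)
    return manifest


def trees_match(original_root, unpacked_root):
    """Return (ok, differences); differences lists missing, extra, and changed files."""
    a = tree_manifest(original_root)
    b = tree_manifest(unpacked_root)
    diffs = sorted(
        [f"missing: {p}" for p in a.keys() - b.keys()]
        + [f"extra: {p}" for p in b.keys() - a.keys()]
        + [f"changed: {p}" for p in a.keys() & b.keys() if a[p] != b[p]]
    )
    return not diffs, diffs
```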

Privacy: No private metadata will be written to the file.

Benchmarking: Numbers are from my personal machine, using each format's official program in its out-of-the-box configuration. All runs are warm (not first) runs with no antivirus interference. Please test for yourself and report back if possible. On the first read of many files, Windows Defender (or any other antivirus) slows everything down while it scans them, so run any test at least twice.
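Following the "run any test at least twice" advice can be automated. Below is a small Python sketch (a generic timing helper, not something shipped with Pack) that runs a command a few times, throws away the warm-up run, and reports the minimum and median of the rest. The default run counts are arbitrary choices.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=3, warmup=1):
    """Run argv (warmup + runs) times and return only the timed runs, in seconds.

    Warm-up runs are discarded so that the OS file cache (and any antivirus
    scan triggered by the first read) does not skew the numbers.
    """
    timings = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return timings


def summarize(timings):
    """Min and median are less sensitive to one-off stalls than the mean."""
    return {"min": min(timings), "median": statistics.median(timings)}
```

For example, `summarize(time_command(["pack", "-i", "./test/", "-o", "./test.pack", "-w"]))` reuses the overwrite command from the first post, so each run replaces the previous output. The timings include process start-up.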

File Format: This version is Draft 0. At the end of this stage, it will become Final 1. Any file made now will be supported by future versions, so rest easy.
Internally, it is based on SQLite3 and tries to inherit its strengths, such as reliability, safety, and speed, but the file header is changed to prevent mistakes and to ensure future compatibility.
You can use the `--activate-other-options --transform-to-sqlite3` parameter to convert a Pack file to SQLite3, and use the `--activate-other-options --transform-to-pack` to go back.
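Because a Pack file is a SQLite3 database with a different header, the header is a quick way to tell the two apart. Here is a Python sketch that relies only on the documented SQLite 3 magic string. The post does not give Pack's own header bytes, so the code makes no attempt to recognise them. Based on the description above, a .pack file would be expected to fail `looks_like_sqlite3` until it is converted with `--activate-other-options --transform-to-sqlite3`. After conversion, any SQLite tool (here Python's sqlite3 module) can list its tables. The table names you would see are not documented in this post.

```python
import sqlite3

# Every SQLite 3 database file begins with this 16-byte string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite3(path):
    """True if the file starts with the standard SQLite 3 header string."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def list_tables(path):
    """Names of the tables in a plain SQLite 3 file (e.g. one converted from Pack)."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        return [name for (name,) in rows]
    finally:
        con.close()
```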

Platforms: For now, it is available only for Windows and Linux. It is designed to run on any architecture supported by FreePascal. Support for macOS and more will come soon.

Program Version: Programs use incremental versioning: the first public version is 1, the next one with fixes will be 2, and so on.

Pascal: Other than the static builds of SQLite and Zstandard, all the code is in Pascal, in Delphi mode (as I prefer its style of generics), and it uses only the system and objfpc units, so no FCL or LCL. I have no plans to support the Delphi compiler, as it has been years since I last tried it.

Design: The format and the programs are separate from each other.

Data structure:
The standard format in the Draft 0 version consists of three tables in SQLite:
- Items: information and structure for files or data, for example whether an item is a folder or a file, and its name.
- Contents: packages of raw data holding a chunk of one item, a whole item, or several items. They may be compressed if needed.
- A relationship table between Items and Contents, showing which part of which content is used for which part of which item.
This approach, while simple at first glance, lets similar data be grouped together for better compression while keeping random access fast.
All the columns were chosen carefully (to match the SQLite3 file format) to keep overhead to a minimum and let the reader extract information as needed. As a result, getting a list of items (or files) is a matter of milliseconds.
Using such a popular, safe, and well-tested database ensures safety, crash resistance, transactional updates, and future compatibility, and SQLite can even be faster than the file system in some cases. Pack takes this further and is faster than the file system in more cases, thanks to its handling of small files and its fast compressed data.
For the curious: Look at the previous `File Format` note to convert Pack files to SQLite3 and back.
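The three-table layout can be illustrated with a toy version in Python's sqlite3 module. This is a sketch under stated assumptions, not the Draft 0 schema: the table and column names (Item, Content, ItemContent, Packed, and so on) and the folder/file encoding are hypothetical, and zlib stands in for Zstandard. It shows two small files grouped into one compressed content, a listing that never touches the content data, and a read that rebuilds one file from the relationship rows.

```python
import sqlite3
import zlib

# Hypothetical schema mirroring the three tables described above.
# Table and column names are illustrative; zlib stands in for Zstandard.
SCHEMA = """
CREATE TABLE Item (
    ID     INTEGER PRIMARY KEY,
    Parent INTEGER,           -- NULL for top-level items
    Kind   INTEGER NOT NULL,  -- 0 = folder, 1 = file (assumed encoding)
    Name   TEXT NOT NULL
);
CREATE TABLE Content (
    ID     INTEGER PRIMARY KEY,
    Packed INTEGER NOT NULL,  -- 1 if Value is compressed
    Value  BLOB NOT NULL
);
CREATE TABLE ItemContent (
    Item          INTEGER NOT NULL REFERENCES Item(ID),
    ItemOffset    INTEGER NOT NULL,  -- where this piece goes in the item
    Content       INTEGER NOT NULL REFERENCES Content(ID),
    ContentOffset INTEGER NOT NULL,  -- where it starts in the decompressed content
    Size          INTEGER NOT NULL
);
"""


def add_small_files(con, files):
    """Group several small files into ONE compressed content, so similar data
    compresses together, and record where each file sits inside it."""
    blob = b"".join(data for _name, data in files)
    content_id = con.execute(
        "INSERT INTO Content (Packed, Value) VALUES (1, ?)",
        (zlib.compress(blob),)).lastrowid
    offset = 0
    for name, data in files:
        item_id = con.execute(
            "INSERT INTO Item (Parent, Kind, Name) VALUES (NULL, 1, ?)",
            (name,)).lastrowid
        con.execute(
            "INSERT INTO ItemContent VALUES (?, 0, ?, ?, ?)",
            (item_id, content_id, offset, len(data)))
        offset += len(data)


def list_items(con):
    """Listing never touches the Content table, which is why it stays fast."""
    return [name for (name,) in con.execute("SELECT Name FROM Item ORDER BY ID")]


def read_item(con, name):
    """Rebuild one item from its pieces, decompressing only the contents it uses."""
    rows = con.execute(
        """SELECT ic.ContentOffset, ic.Size, c.Packed, c.Value
           FROM Item AS i
           JOIN ItemContent AS ic ON ic.Item = i.ID
           JOIN Content AS c ON c.ID = ic.Content
           WHERE i.Name = ?
           ORDER BY ic.ItemOffset""",
        (name,)).fetchall()
    out = bytearray()
    for content_offset, size, packed, value in rows:
        raw = zlib.decompress(value) if packed else value
        out += raw[content_offset:content_offset + size]  # pieces assumed contiguous
    return bytes(out)
```

Usage: after `con = sqlite3.connect(":memory:")` and `con.executescript(SCHEMA)`, calling `add_small_files(con, [("a.txt", b"alpha " * 50), ("b.txt", b"beta " * 50)])` stores both files in a single content. Grouping improves the compression ratio. The cost is that reading either file decompresses the shared content, which is the trade-off a real packer has to weigh when it decides how to split items into contents.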

Zstandard, a widely used, standard, and modern alternative to DEFLATE, lets Pack compress fast and decompress even faster. Although Zstandard comes with its own multi-threading, Pack uses its own threading, configured as needed, to get even more speed out of this great library. This makes it slightly faster than Zstandard's official client, even though Pack is a container format that supports multiple files and folders and random access. All the compressed data carries a Zstandard checksum, computed with the extremely fast xxHash.
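One common way to get both parallel compression and random access is to compress data in independent chunks. Any thread can compress any chunk, and any chunk can later be decompressed on its own. The Python sketch below illustrates that idea. It is not Pack's code. The fixed chunk size is a hypothetical value, the standard zlib module stands in for Zstandard, and `zlib.crc32` stands in for the xxHash checksum.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, for illustration only


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress each chunk independently and record a checksum of its raw bytes."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        raw = data[start:start + chunk_size]
        # crc32 stands in for the xxHash checksum that Zstandard frames carry
        chunks.append((start, len(raw), zlib.crc32(raw), zlib.compress(raw)))
    return chunks


def read_range(chunks, offset, size):
    """Return data[offset:offset + size], decompressing only the overlapping chunks."""
    out = bytearray()
    end = offset + size
    for start, length, checksum, packed in chunks:
        if start + length <= offset or start >= end:
            continue  # no overlap with the requested range: never decompressed
        raw = zlib.decompress(packed)
        if zlib.crc32(raw) != checksum:
            raise ValueError(f"checksum mismatch in chunk at offset {start}")
        lo = max(offset, start) - start
        hi = min(end, start + length) - start
        out += raw[lo:hi]
    return bytes(out)
```

Because chunks never depend on each other, a corrupted chunk is detected by its checksum only when it is actually read. Reads that do not overlap it are unaffected.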

Programs:
A basic Pack and Unpack could be implemented in perhaps a couple of hundred lines (and I may write one as a demo). The officially provided programs, however, use a multithreaded queue approach (the number of threads depends on the CPU and the input data). They scan all the input files or data, estimate how to split them into contents, and decide whether each content is worth compressing. Then they read the data asynchronously, compress it with Zstandard if needed, and store it synchronously into SQLite. All item information is stored at the start of the file so it can be accessed faster and with fewer disk seeks.
It is designed for newer hardware: multicore CPUs, fast SSDs, and NVMe drives. It still performs well with a single core and an old HDD, but the difference is much more visible on newer hardware.
What makes Pack stand out is that it configures SQLite and Zstandard for their best performance, and the code uses a custom, optimized standard library, so almost no memory allocation is needed and, hopefully, no CPU cycle is wasted.
Given its resource usage, user experience, and speed, it suits everyday use because it goes easy on end-user hardware. Because of its high throughput relative to the resources it needs, it also suits servers and heavy-duty tasks.
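The queue approach described above can be sketched in Python. The sketch covers three parts: parallel compression, a single writer into SQLite, and a per-content "is it worth compressing?" decision. It is an illustration, not the official programs' code. zlib stands in for Zstandard. The 5% threshold and the four worker threads are hypothetical. The `Content` table with `Packed` and `Value` columns is also hypothetical.

```python
import sqlite3
import zlib
from concurrent.futures import ThreadPoolExecutor

MIN_SAVING = 0.05  # hypothetical: keep the compressed form only if it saves >= 5%


def compress_if_worth_it(raw):
    """Return (packed_flag, payload); fall back to raw bytes when compression does not pay."""
    packed = zlib.compress(raw)
    if len(packed) <= len(raw) * (1 - MIN_SAVING):
        return 1, packed
    return 0, raw


def store_contents(con, contents, workers=4):
    """Compress on worker threads; write to SQLite from this thread only, in one transaction."""
    with ThreadPoolExecutor(max_workers=workers) as pool, con:
        # map() yields results in input order, so the write order is deterministic
        for packed, payload in pool.map(compress_if_worth_it, contents):
            con.execute("INSERT INTO Content (Packed, Value) VALUES (?, ?)",
                        (packed, payload))
```

All SQLite writes happen on one thread, which mirrors "store them synchronously into SQLite". Wrapping them in a single transaction (`with con`) means a failed run leaves nothing half-written. Already-compressed input, such as random bytes, fails the threshold and is stored as-is.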


Final Note:
SQLite and Zstandard are fantastic; I felt that tying them together with a fresh approach in Pack would produce an interesting outcome, and so far the results show that. Besides these, without dear FreePascal and Lazarus my life would be harder. My respect to the teams.
« Last Edit: March 01, 2024, 08:08:35 am by O »

bobby100

  • Full Member
  • Posts: 176
Re: Pack
« Reply #2 on: February 18, 2024, 09:08:04 am »
Short translation:

Hi, I am new here, and here is an EXE file for you to download and start on your PC. It has nothing to do with Lazarus/FreePascal, but start it anyway.
https://gitlab.com/bobby100 - my Lazarus components and units
https://sourceforge.net/u/boban_spasic/profile/ - my open source apps

https://malzilla.org/ - reminder of my previous life as a web security expert

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #3 on: February 18, 2024, 09:18:00 am »
That's fair, but the source code needs more work.
As stated in the notes, it is written using FreePascal and Lazarus.
Here is a link to an online antivirus check:
https://www.virustotal.com/gui/file/ca06985ffe1bb9f013a4938401e6c3c81404b7642e2963526a3ecb40facde64b

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9979
  • Debugger - SynEdit - and more
Re: Pack
« Reply #4 on: February 18, 2024, 09:46:27 am »
I removed the links to the exe.

Please explain and show how this relates to Pascal.
« Last Edit: February 18, 2024, 10:58:50 am by Martin_fr »

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #5 on: February 18, 2024, 09:52:02 am »
Hello Martin_fr, how can I show the relation other than by saying it is made with FreePascal? Let me know and I will be happy to address it.
On https://www.virustotal.com/gui/file/ca06985ffe1bb9f013a4938401e6c3c81404b7642e2963526a3ecb40facde64b/details (Details tab), you can see that it is compiled with FreePascal.
I have also updated the original post and added some notes about Pascal.

As for why I released it here first: I wanted to show appreciation for the tool I mainly used to develop it.
« Last Edit: March 01, 2024, 11:20:09 am by O »

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9979
  • Debugger - SynEdit - and more
Re: Pack
« Reply #6 on: February 18, 2024, 10:28:06 am »
Well, welcome to the forum. Let's see where this goes.

To all others: please note that this remains an exe of unknown origin, with all the risks this may carry.
For those willing to take that risk: His site is "pack.ac"
« Last Edit: February 18, 2024, 10:34:36 am by Martin_fr »

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #7 on: February 18, 2024, 10:30:38 am »
Thank you. Will you bring back the URLs?
I had reserved the third post for version history, as I expect to release more versions with bug fixes soon.

P.S. I cannot message you and I do not have your email, but I will be happy to send you the codebase if you would like to check it.

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9979
  • Debugger - SynEdit - and more
Re: Pack
« Reply #8 on: February 18, 2024, 10:30:50 am »
You're packing into a database? Interesting choice.

Have you considered that this may cause overhead? E.g. that you could pack even better if you skipped that?

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #9 on: February 18, 2024, 10:38:14 am »
Yes. And yes. I experimented with many designs.
SQLite gives you the ability to perform a transactional update. As I noted, the first priority was and is safety.
As far as I know, it is the safest verified way to update a container file like this without risking it being corrupted.
Also, it is the fastest and easiest way to query the structure to create explorer-like abilities, so unlike similar formats, Pack can give you the list of files for a big pack in milliseconds.
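The all-or-nothing update mentioned here is what a SQLite transaction provides. Here is a minimal Python sketch with the standard sqlite3 module, using a hypothetical Item table. If anything fails partway through a batch, the rollback leaves the database exactly as it was before the batch started. On disk, SQLite's rollback journal or write-ahead log is what extends the same guarantee to crashes and power loss.

```python
import sqlite3


def add_items(con, names):
    """Insert every name or none of them."""
    with con:  # COMMIT if the block finishes, ROLLBACK if it raises
        for name in names:
            if not name:
                # stands in for any failure partway through an update
                raise ValueError("empty item name")
            con.execute("INSERT INTO Item (Name) VALUES (?)", (name,))
```

Usage: after `add_items(con, ["a.txt", "b.txt"])` succeeds, a later call such as `add_items(con, ["c.txt", ""])` raises. Because of the rollback, "c.txt" is not left behind either.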

And yes, there is overhead, but I engineered it to be low: usually less than 1% for big packs with many files, such as the Linux codebase with more than 81K files and folders, and almost none for one or two big files, such as a couple of music files.
The smallest possible output Pack file is currently 2 KB.

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #10 on: February 18, 2024, 10:40:59 am »
@Martin_fr, I updated the first post with the links, but if there is something in need of change, please let me know.
Can you bring back the third post with Version history so I can update it in future?

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 9979
  • Debugger - SynEdit - and more
Re: Pack
« Reply #11 on: February 18, 2024, 10:47:29 am »
Quote from: O
@Martin_fr, I updated the first post with the links, but if there is something in need of change, please let me know.
Can you bring back the third post with Version history so I can update it in future?

Sorry, once removed, they are not stored. So unfortunately it needs to be redone.

I'll see what the other moderators think about having the links for now (i.e. until we know you better).

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #12 on: February 18, 2024, 10:57:51 am »
That's fine. I'll do it another way.
Thank you.

I hope others find Pack useful.

paweld

  • Hero Member
  • Posts: 1028
Re: Pack
« Reply #13 on: February 18, 2024, 02:14:38 pm »
I tested your application and it looks promising. I ran the test on 2 VPSes (4 cores, 8 GB RAM, 50 GB NVMe drive, Windows 10 Pro x64): the first was used for compression and the second for decompression. I compared your solution to the 7z and zip formats, using 7-Zip 23.01 x64.
Unfortunately, due to an untrusted source and lack of testing, I treat this as a curiosity for now.
Attached are the sources of the test programs.
Pack log:
Code: [Select]
================================================
folder name: test1
folder content: 7z2301-extra
Files:        18
Directories:  3
Size:         3.959.775 bytes
Size on disk: 4.038.656 bytes
------------------------------------------------
    test1.7z -> time:     516 ms; size:    1083726 b
   test1.zip -> time:     218 ms; size:    1914301 b
  test1.pack -> time:     156 ms; size:    1597440 b

================================================
folder name: test2
folder content: fpc src
Files:        24239
Directories:  1747
Size:         310.947.442 bytes
Size on disk: 459.063.296 bytes
------------------------------------------------
    test2.7z -> time:   29328 ms; size:   34315715 b
   test2.zip -> time:   12204 ms; size:   64853627 b
  test2.pack -> time:    1328 ms; size:   48402432 b

================================================
folder name: test3
folder content: lazarus src
Files:        17703
Directories:  1008
Size:         223.762.871 bytes
Size on disk: 327.360.512 bytes
------------------------------------------------
    test3.7z -> time:   22734 ms; size:   32914271 b
   test3.zip -> time:    7516 ms; size:   64292829 b
  test3.pack -> time:     937 ms; size:   44597248 b

================================================
folder name: test4
folder content: fpc and lazarus src
Files:        41944
Directories:  2756
Size:         667.049.996 bytes
Size on disk: 918.773.760 bytes
------------------------------------------------
    test4.7z -> time:   61531 ms; size:  178884598 b
   test4.zip -> time:   23734 ms; size:  252376590 b
  test4.pack -> time:    2516 ms; size:  207597568 b

================================================
folder name: test5
folder content: mssql db backup
Files:        1
Directories:  1
Size:         1.897.227.776 bytes
Size on disk: 1.897.230.336 bytes
------------------------------------------------
    test5.7z -> time:  135985 ms; size:  187261769 b
   test5.zip -> time:  194641 ms; size:  344350196 b
  test5.pack -> time:    3187 ms; size:  287600640 b

================================================
folder name: test6
folder content: installers (x64): firefox, lazarus, winmerge, 7z, notepad++
Files:        5
Directories:  1
Size:         291.381.288 bytes
Size on disk: 291.393.536 bytes
------------------------------------------------
    test6.7z -> time:   23156 ms; size:  291090572 b
   test6.zip -> time:    7641 ms; size:  291203917 b
  test6.pack -> time:     719 ms; size:  291467264 b

================================================
folder name: test7
folder content: another mssql db backup
Files:        1
Directories:  1
Size:         319.930.368 bytes
Size on disk: 319.930.368 bytes
------------------------------------------------
    test7.7z -> time:   18609 ms; size:   92631455 b
   test7.zip -> time:   23234 ms; size:  103393515 b
  test7.pack -> time:     703 ms; size:  100556800 b
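The pack log makes the size side easy to quantify. The short Python sketch below copies the numbers from the log and assumes nothing else. It computes each .pack output as a fraction of its input. Test6 is worth a separate look because the installers are largely already compressed (7z and zip barely shrink them either). The small growth there is therefore a rough measure of Pack's container overhead on a few large files.

```python
# (input size, .pack size) in bytes, copied from the pack log above
PACK_LOG = {
    "test1 7z2301-extra": (3_959_775, 1_597_440),
    "test2 fpc src": (310_947_442, 48_402_432),
    "test3 lazarus src": (223_762_871, 44_597_248),
    "test4 fpc and lazarus src": (667_049_996, 207_597_568),
    "test5 mssql db backup": (1_897_227_776, 287_600_640),
    "test6 installers": (291_381_288, 291_467_264),
    "test7 another mssql db backup": (319_930_368, 100_556_800),
}


def size_ratio(name):
    """Output size as a fraction of input size (above 1.0 means the pack grew)."""
    size_in, size_out = PACK_LOG[name]
    return size_out / size_in


if __name__ == "__main__":
    for name in PACK_LOG:
        print(f"{name:30} {size_ratio(name):7.2%}")
    # Already-compressed input, so any growth is container overhead.
    print(f"test6 overhead: {size_ratio('test6 installers') - 1:.4%}")
```

On test6 the pack is 85,976 bytes larger than its input, an overhead of about 0.03%.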
Unpack log:
Code: [Select]
------------------------------------------------
  test1.pack -> time:     141 ms
folder name: test1
folder content: 7z2301-extra
Files:        18
Directories:  4
Size:         3.959.775 bytes
Size on disk: 4.042.752 bytes
------------------------------------------------
    test1.7z -> time:      78 ms
folder name: test1
folder content: 7z2301-extra
Files:        18
Directories:  4
Size:         3.959.775 bytes
Size on disk: 4.042.752 bytes
------------------------------------------------
   test1.zip -> time:      62 ms
folder name: test1
folder content: 7z2301-extra
Files:        18
Directories:  4
Size:         3.959.775 bytes
Size on disk: 4.042.752 bytes

------------------------------------------------
  test2.pack -> time:    6172 ms
folder name: test2
folder content: fpc src
Files:        24239
Directories:  1748
Size:         310.947.442 bytes
Size on disk: 459.067.392 bytes
------------------------------------------------
    test2.7z -> time:   15844 ms
folder name: test2
folder content: fpc src
Files:        24239
Directories:  1748
Size:         310.947.442 bytes
Size on disk: 459.067.392 bytes
------------------------------------------------
   test2.zip -> time:   17094 ms
folder name: test2
folder content: fpc src
Files:        24239
Directories:  1748
Size:         310.947.442 bytes
Size on disk: 459.067.392 bytes

------------------------------------------------
  test3.pack -> time:    4969 ms
folder name: test3
folder content: lazarus src
Files:        17703
Directories:  1009
Size:         223.762.871 bytes
Size on disk: 327.364.608 bytes
------------------------------------------------
    test3.7z -> time:   11547 ms
folder name: test3
folder content: lazarus src
Files:        17703
Directories:  1009
Size:         223.762.871 bytes
Size on disk: 327.364.608 bytes
------------------------------------------------
   test3.zip -> time:   17719 ms
folder name: test3
folder content: lazarus src
Files:        17703
Directories:  1009
Size:         223.762.871 bytes
Size on disk: 327.364.608 bytes

------------------------------------------------
  test4.pack -> time:   14203 ms
folder name: test4
folder content: fpc and lazarus src
Files:        41944
Directories:  2757
Size:         667.049.996 bytes
Size on disk: 918.777.856 bytes
------------------------------------------------
    test4.7z -> time:   28922 ms
folder name: test4
folder content: fpc and lazarus src
Files:        41944
Directories:  2757
Size:         667.049.996 bytes
Size on disk: 918.777.856 bytes
------------------------------------------------
   test4.zip -> time:   39750 ms
folder name: test4
folder content: fpc and lazarus src
Files:        41944
Directories:  2757
Size:         667.049.996 bytes
Size on disk: 918.777.856 bytes

------------------------------------------------
  test5.pack -> time:    4641 ms
folder name: test5
folder content: mssql db backup
Files:        1
Directories:  2
Size:         1.897.227.776 bytes
Size on disk: 1.897.234.432 bytes
------------------------------------------------
    test5.7z -> time:    4703 ms
folder name: test5
folder content: mssql db backup
Files:        1
Directories:  2
Size:         1.897.227.776 bytes
Size on disk: 1.897.234.432 bytes
------------------------------------------------
   test5.zip -> time:    7484 ms
folder name: test5
folder content: mssql db backup
Files:        1
Directories:  2
Size:         1.897.227.776 bytes
Size on disk: 1.897.234.432 bytes

------------------------------------------------
  test6.pack -> time:    2609 ms
folder name: test6
folder content: installers (x64): firefox, lazarus, winmerge, 7z, notepad++
Files:        5
Directories:  2
Size:         291.381.288 bytes
Size on disk: 291.397.632 bytes
------------------------------------------------
    test6.7z -> time:    1266 ms
folder name: test6
folder content: installers (x64): firefox, lazarus, winmerge, 7z, notepad++
Files:        5
Directories:  2
Size:         291.381.288 bytes
Size on disk: 291.397.632 bytes
------------------------------------------------
   test6.zip -> time:    1797 ms
folder name: test6
folder content: installers (x64): firefox, lazarus, winmerge, 7z, notepad++
Files:        5
Directories:  2
Size:         291.381.288 bytes
Size on disk: 291.397.632 bytes

------------------------------------------------
  test7.pack -> time:    1250 ms
folder name: test7
folder content: another mssql db backup
Files:        1
Directories:  2
Size:         319.930.368 bytes
Size on disk: 319.934.464 bytes
------------------------------------------------
    test7.7z -> time:    1609 ms
folder name: test7
folder content: another mssql db backup
Files:        1
Directories:  2
Size:         319.930.368 bytes
Size on disk: 319.934.464 bytes
------------------------------------------------
   test7.zip -> time:    1609 ms
folder name: test7
folder content: another mssql db backup
Files:        1
Directories:  2
Size:         319.930.368 bytes
Size on disk: 319.934.464 bytes
Best regards
paweld

O

  • New Member
  • Posts: 39
  • Creator of Pack
Re: Pack
« Reply #14 on: February 18, 2024, 02:30:17 pm »
Thank you, paweld, for the results. At this stage, curiosity is welcome, as I want feedback so that potential issues can be fixed.
Did you run the tests multiple times, or was it on cold data? If it was cold, the first run is slower because of the OS, the disk, and antivirus scanning.
Also note that if you unpack or decompress repeatedly on NTFS, it "gets tired" and shows slower results, so the last test shows a higher time.

 
