@x2nie
I cannot use anything else then HEX, I must keep the compatibility with the older version.
Very nice, so you will only deal with the earlier "word" problem (16 bit fixed length), then.
Okay, the next idea is what Zip does (at least AFAIK).
I didn't really know what are deflate, inflate, etc. But, they will help you more in detailed technique.
Here we are talking about
get benefit of such pattern of bit:
1. we can split all bits into groups.
let say, you can split each of 4 bits: 1010 1100 1100 0010 // = 4 group of 4bit
or 5 bits : 1 01011 00110 00010 // = 4 group of 5 bit
(or whatever as needed later).
2. make constants of known pattern. Below pattern should be applied if you choose 5bit of pattern length.
patternFirst = {'00001'} $1;
pattern02 = {'00010'} $2; etc...
3. check if a pattern is repeated twice or more.
save it in a few bits. let say this info need 2 bit
this way, this additional repetation bit info will decrease the maximum length (16 - 3 = 13)
but you can play with this yet unused 13 bit.
4. the amount of known pattern will take place into account.
let say your known pattern constant amount is 32 { = 5bit}, you have 13 - 5 = 8 bit unused.
let take all together:
* picked pattern = 3 group of 5 bit each.
* pattern repetitive ................ = 3bit
* known pattern = 32 pattern = 5bit
-----------------------------------------------+
* used bit=...............................= 8
* total bit ................................= 16
---------------------------------------------- -
* unused bit.............................= 8
so for such actual number [00010 00010 00010 00010], will be stored as [ 011 00010]
explanation right to left:
00010 = 2 = second pattern = 00010
011 = 3 = the '00010' repeated three times: as first and second and 3rd
let name it SCHEME#1
Well, this situation:
+ you have a range of well recognized (parse-able) bits.
+ you have a large (

unused bit
+ you lost another range of non-parse-able 16bit numbers.
Nice, 8 used bit + 8 unused bit. !
The trick is we use 8 unused bit for our other format.
Remember, above is just first format.
8 unused bit = 256 format.
(How complex it would be?)
You need to repeat the first step for creating another new scheme.
In real world, it wouldn't be that large, maybe you only need 16 scheme (4bit).
each scheme can have different parsing (bit allocation).
If so, there are only 4bit fixed being allocated for scheme flag for any 16bit numbers.
other than this 4bit is parsed depends of scheme used.
in this stage, before parse anything, you need to check this 4bit.
Let say the first 4 bit value = $0000; the you do above (SCHEME#1) parsing.
if the first 4 bit = $0001; then you do SCHEME#2
Scheme#2, may use different group & split of the rest bit.
Scheme#3, will use completley different calculation
Scheme#n may use constant of 16bit values.
At this level,
for such actual number
[00010 00010 00010 00010],
will be stored as
[011 00010 0000]explanation right to left:
0000 = SCHEME#1
00010 = 2 = second pattern = 00010
011 = 3 = the '00010' repeated three times: as first and second and 3rd
If you aren't satisfied with this stupid result, you can iterate through all available scheme, finding which result among them is less significant (less bit needed).
So you may compare them to get the best result.
It is specially needed to reduce bits needed, and in the same time to increase the amount of parse-able numbers.
-------------
That's all what I know about compression.
Remember, you have 16 SCheme (if you choose 4bit). So use your imagination to fill that very large possibilities.
Well, is it proof of concept? Yes. it is.
In the Scheme#1, it really take benefit of such pattern existed in bits value.
In the end, you will need to find which 16bit values are not catched by any schemes.
in this situation, your last scheme will just a map headed of one to one between number as index and the value it should return.
I think you need to learn deeply of what zip did, it is open-source, AFAIK. so good luck!