I am currently collating a few 32-bit pseudo-random number generators (PRNGs). My focus is on speed, small state and good statistical quality. I came across a scheme which is, I believe, not so widely known: the invertible random mapping. An interesting topic I would like to share with those interested in PRNGs.
What is an invertible mapping? Here is an example: two variables, rotated and mutually added up. It is not quite straightforward to see what the code really does; it is in fact a simple PRNG, and an extremely fast one (0.49 ns in a microbenchmark, 3x faster than xor32 on my PC).
function Next: dword;
const  // writable typed constants keep the state between calls
  A: dword = 1;
  B: dword = 2;
begin
  B := rordword (B, 13) + A;  // rotate right, then combine
  A := rordword (A, 25) - B;
  result := A;
end;
"Invertible" means that the function can be written to run backwards, producing the exact same sequence just in opposite direction. Therefore each sequence must form a closed ring without branching. In contrast, Von Neumann's old middle-square method is a non-invertible mapping: numbers may have several predecessors, there is no unique path backwards, but many branches and a rather high chance to end up in short cycles.
What is the period of this generator? It depends on the seed. With a 16-bit version (two 16-bit state variables, i.e. 2^32 states in total), the period can be measured for all possible seeds simply by counting. Here is the result (cycle lengths):
3613707346
655021847
12877974
9726090
2549931
682853
340660
47897
6309
2931
2213
782
233
166
39
14
8
1
So there are 18 possible cycles in total; which one runs depends solely on the initial seed. Each of the 2^32 seeds must be part of exactly one cycle, i.e. the sum of the cycle lengths must be 2^32. The cycles are strictly separate; there is no branching from one cycle into another, the invertibility does not allow that.
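For reference, a sketch of how such a cycle census can be computed (my own illustration; the 16-bit rotation constants 7 and 9 are placeholders, not necessarily the ones behind the table above):

// Cycle census for a 16-bit variant: walk every unvisited state's cycle
// and record its length. Needs a bitmap of 2^32 bits = 512 MB.
procedure CycleCensus;
var
  visited: array of byte;
  seed, s: dword;
  i, len: qword;
  A, B, A0, B0: word;
begin
  SetLength (visited, 1 shl 29);  // 2^32 states, one bit each
  for i := 0 to high (dword) do
  begin
    seed := dword (i);
    if visited[seed shr 3] and (1 shl (seed and 7)) <> 0 then
      continue;  // state already belongs to a counted cycle
    A0 := seed and $FFFF;  A := A0;
    B0 := seed shr 16;     B := B0;
    len := 0;
    repeat
      s := dword (B) shl 16 or A;
      visited[s shr 3] := visited[s shr 3] or (1 shl (s and 7));
      B := rorword (B, 7) + A;   // one forward step, 16-bit version
      A := rorword (A, 9) - B;
      inc (len);
    until (A = A0) and (B = B0);  // invertibility: we must return to the seed
    writeln ('cycle length: ', len);
  end;
end;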
Obviously the smallest cycle length is 1, i.e. with bad luck in seeding, the generator may get stuck, or output the same few numbers forever. We can see from the table that the probability to select exactly the catastrophic seed is 2^-31 (16-bit version), while the chance to choose a good seed on the largest ring is 3613707346 / 2^32 = 0.84. This depends somewhat on the ror constants, but as a general rule we find that:
- The expected period is approx 2^(bits-1), i.e. half of all states lie on the largest ring, a quarter on the next smaller ring, etc. Approximately! This can vary depending on the ror constants.
- The risk of a period collapse by a factor r relative to the expected period is about 1/r.
The academic literature (Flajolet / Odlyzko, "Random Mapping Statistics") is way above my head, but for a very understandable summary see
https://www.pcg-random.org/posts/random-invertible-mapping-statistics.html
I found it surprising that the 32-bit version of this generator passes the Rabbit tests up to at least 1 GB of stream size, which is far more than can be said for many of the old designs. It is not perfect, of course: if you build a stream from pairs of output dwords xored together, the tests will fail. No surprise, the xor partly undoes the simple mixing; this could be prevented by measures such as a final multiplication by a constant, but that is not the issue here. The major problem of this generator is the risk of small cycles.
And this is systemic. For this type of generator, only probabilistic assertions about the period are possible. This is the main difference to virtually all classic generators such as LCG, MWC, Tausworthe, Mersenne Twister etc., and also the modern xoroshiro class: their period is calculable and guaranteed by number theory. While this is of course a desirable property, it is not alone sufficient to make a good generator. Thirty years ago, when the randomness test suites came up, many of the older designs, however firmly based on mathematical foundations, were discovered to have serious deficiencies.
In random mapping generators, we can luckily reduce the risk of small cycles simply by increasing the state. Bob Jenkins' "small fast" generator (www.burtleburtle.net), AFAIK the first published generator based on a random mapping, uses 4x32 bits of state. Melissa O'Neill has tested it extensively; she found no flaws and a vanishing risk of small cycles, in her words: "It's more likely that he would have won the lottery and then been immediately struck by lightning than that he happened to make jsf32's seeding method accidentally capable of hitting a short cycle." See
https://www.pcg-random.org/posts/bob-jenkins-small-prng-passes-practrand.html
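For reference, here is jsf32 in a Pascal translation of mine, following the C original on burtleburtle.net (4x32 bit state, rotation constants 27 and 17; the seeding constant $f1ea5eed and the 20 warm-up rounds are also from Jenkins' code):

var jsfa, jsfb, jsfc, jsfd: dword;  // 4x32 bit state

function JsfNext: dword;
var e: dword;
begin
  e    := jsfa - roldword (jsfb, 27);
  jsfa := jsfb xor roldword (jsfc, 17);
  jsfb := jsfc + jsfd;
  jsfc := jsfd + e;
  jsfd := e + jsfa;
  result := jsfd;
end;

procedure JsfSeed (seed: dword);
var i: integer;
begin
  jsfa := $f1ea5eed;
  jsfb := seed;  jsfc := seed;  jsfd := seed;
  for i := 1 to 20 do
    JsfNext;  // discard warm-up outputs
end;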
Still, one may feel uneasy with the risk of small cycles, so let's make a small modification: we simply weave in an independent counter, like here:
function Next: dword;
const  // writable typed constants keep the state between calls
  A: dword = 1;
  B: dword = 2;
  counter: dword = 0;
begin
  inc (counter);
  B := A + rordword (B, 13);
  A := (rordword (A, 25) - B) xor counter;  // weave in the counter
  result := A;
end;
The counter safely prevents periods shorter than 2^32. For exactly this minimum period to occur, A and B would need to have the exact same values when the counter started and when it reaches zero again, a chance of 2^-64. The counter has additional benefits: it also increases the expected period by a factor of 2^32, i.e. it acts like an extra state variable.
The idea of adding a counter (a "Weyl sequence") goes back to Chris Doty-Humphrey, the author of PractRand. He says about his generators (SFC for "small fast chaotic"): "The sfc* RNGs are the fastest of the recommended RNGs and one of the smallest. The 32 and 64 bit variants have no known drawbacks, though the 16 bit variant is considered inadequate for parallel uses."
https://pracrand.sourceforge.net/RNG_engines.txt

There are a few extra points worth mentioning:
If you want to enforce a larger minimum period, then instead of a 32-bit counter use a 64-bit one, or, why not, a conventional PRNG, e.g. an MWC with a period around 2^63, as input. The invertible mapping step will increase the period of the "input" generator by an (expected) factor of 2^(bits-1), and at the same time cancel out the input generator's statistical deficiencies. This is plausible, given that even a simple counter as input results in a statistically good output stream. The mapping step acts similar to a hash function.
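A sketch of that idea (my own illustration; the MWC multiplier $FFFEB81B is an assumption on my part, any multiplier a for which a*2^32 - 1 is a safe prime gives a period of about a*2^31):

// Sketch: an MWC generator as input to the mapping step instead of a counter.
// The multiplier $FFFEB81B and the MWC seeds are illustrative assumptions.
function NextMWC: dword;
const  // writable typed constants keep the state between calls
  A: dword = 1;
  B: dword = 2;
  mwcx: dword = 123456789;  // MWC state word (must not be seeded with 0)
  mwcc: dword = 362436;     // MWC carry
var
  t: qword;
begin
  // one MWC step: t = a*x + c; new x = low dword, new carry = high dword
  t := qword ($FFFEB81B) * mwcx + mwcc;
  mwcx := dword (t);
  mwcc := t shr 32;
  // invertible mapping step, MWC output woven in where the counter was
  B := A + rordword (B, 13);
  A := (rordword (A, 25) - B) xor mwcx;
  result := A;
end;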
Sometimes, e.g. for massively parallel simulations, you want independent random streams without overlaps, to avoid bias. I believe (no proof!) it will suffice to use a different counter increment for each generator instance, as in the sketch below. The increment should be odd of course, which allows for 2^31 independent and uncorrelated instances.
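A minimal sketch of such per-instance streams (my own illustration; the record layout and the seeding of A and B are only illustrative, not from the text above):

// Per-instance state with an odd, instance-specific counter increment.
// Whether distinct increments really give independent streams is the
// conjecture stated above, not a proven property.
type
  TStream32 = record
    A, B, counter, increment: dword;
  end;

procedure InitStream (var s: TStream32; seed, streamindex: dword);
begin
  s.A := seed;           // illustrative seeding only
  s.B := not seed;
  s.counter := 0;
  s.increment := streamindex * 2 + 1;  // always odd: 2^31 distinct streams
end;

function StreamNext (var s: TStream32): dword;
begin
  inc (s.counter, s.increment);
  s.B := s.A + rordword (s.B, 13);
  s.A := (rordword (s.A, 25) - s.B) xor s.counter;
  result := s.A;
end;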
The generator may be extended to an arbitrarily large period, simply by replacing one state variable with an array:
function Next: dword;
const  // writable typed constants keep the state between calls
  state: array [0..7] of dword = (0, 1, 2, 3, 4, 5, 6, 7);
  B: dword = 42;
  counter: dword = 0;
var
  i: dword;
begin
  inc (counter);
  i := counter mod length (state);  // power-of-two length: compiles to an 'and'
  B := state[i] + rordword (B, 13);
  state[i] := (rordword (state[i], 25) - B) xor counter;
  result := state[i];
end;
The state can be made any size without changing anything else. For performance reasons, it should be of size 2^N, so the compiler can optimise the mod operation into a bit mask. When testing such a generator, mind that a large state is no guarantee of good statistical quality; even if testing shows no flaws, a large state may well conceal systemic flaws from the tests. I believe this generator is quite good, but I would not be surprised if it failed tests on a larger scale than I can run. After all, the mixing is not too strong, only two stages; for demanding applications, I would recommend adding a third mixing stage, as sketched below. This is another point I like: the concept allows to choose a tradeoff between quality and speed.
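What such a third stage could look like (my own illustration, constants untuned; the extra line keeps the step invertible, since each statement can still be undone in reverse order):

// Illustrative three-stage variant: one more rotate/add line per step.
// The rotation constants 13, 9, 25 are untuned placeholders.
function Next3: dword;
const  // writable typed constants keep the state between calls
  A: dword = 1;
  B: dword = 2;
  C: dword = 3;
  counter: dword = 0;
begin
  inc (counter);
  B := A + rordword (B, 13);
  C := B + rordword (C, 9);   // the additional mixing stage
  A := (rordword (A, 25) - C) xor counter;
  result := A;
end;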
On the ror / rol constants: I chose them empirically for best mixing; Bob Jenkins' rngav.c shows how to do it. In principle, you run two generators with seeds differing only in a single bit, see how the outputs diverge after a few rounds, and choose the constants which result in the maximum difference. The code is small; I translated it to Pascal, but the result is so messy I don't want to post it. The choice of constants is not too critical anyway, as long as you avoid the trivial values like 0 and 32, and probably also 1, 31 and 16.
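A stripped-down sketch of the idea (my own code, not a translation of rngav.c; the number of trials and rounds is arbitrary):

// Avalanche score for a pair of rotation constants: run two copies whose
// seeds differ in one bit and count differing output bits after a few
// rounds. Higher scores indicate better mixing.
function AvalancheScore (r1, r2: byte): integer;
var
  a1, b1, a2, b2: dword;
  trial, round: integer;
begin
  result := 0;
  for trial := 0 to 999 do
  begin
    a1 := trial;  b1 := 12345;
    a2 := a1 xor (dword (1) shl (trial mod 32));  // flip a single seed bit
    b2 := b1;
    for round := 1 to 4 do
    begin
      b1 := a1 + rordword (b1, r1);  a1 := rordword (a1, r2) - b1;
      b2 := a2 + rordword (b2, r1);  a2 := rordword (a2, r2) - b2;
    end;
    inc (result, popcnt (a1 xor a2));  // count bits that differ
  end;
end;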
Final remark: these code snippets are just meant to explain the principle. For practical use I have made a small library, including an entropy collector for randomisation and a proper seeding algorithm, but it is still a work in progress.