Recent

Author Topic: Output of Random(SomeInteger) appears to be biased  (Read 4633 times)

Pangea

  • New member
  • *
  • Posts: 8
Output of Random(SomeInteger) appears to be biased
« on: July 24, 2019, 10:22:49 pm »
As suggested i open a forum thread for the problem discussed in https://bugs.freepascal.org/view.php?id=35878. If this is the wrong subforum, please move the thread.

Background information: I noticed that the output of the 64-bit random function appears to be biased towards smaller numbers. I originally thought, it would be enough to adapt the same trick the 32 bit implementation uses to the 64 bit function and opened the bug tracker entry. But this was a mistake, as this also seems to be insufficient. It is possible to construct two really simple programs to show that both (the 32 and the 64-bit version) do not produce unbiased output. I apologize for this confusion.

This is the program to reproduce this bug for the 64-bit version ("function Random(l:int64):int64;")
Code: Pascal  [Select][+][-]
  1. program random_test;
  2. {$mode objfpc}{$H+}
  3. {$APPTYPE CONSOLE}
  4. const
  5.    l: UInt64 = 6148914691236517205;
  6. var
  7.    s,n: UInt64;
  8.    i,j: UInt64;
  9. begin
  10.   WriteLn('Experiment:', LineEnding);
  11.   WriteLn(' Draw a natural number r from the intervall [0,l-1] and');
  12.   WriteLn(' increment a counter s when r < l div 2 is satisfied.');
  13.   WriteLn(' Repeat this step n times and calculate the ratio s/n.', LineEnding);
  14.  
  15.   WriteLn(' Expected ratio: ', (l div 2)/l:30, LineEnding);
  16.  
  17.   WriteLn('Input size n':16, 'Observed ratio s/n':30);
  18.   l := 6148914691236517205;
  19.   for j := 4 to 63 do
  20.   begin
  21.     n := (Int64(1) shl j);
  22.     s := 0;
  23.     for i := 0 to n-1 do
  24.       if Random(Int64(l)) < l div 2 then
  25.         Inc(s);
  26.     WriteLn( (UInt64(1) shl j):16, s/n:30);
  27.   end;
  28. end.
  29.  

And this is the code for the 32 bit version ("function Random(l:longint):longint;"):

Code: Pascal  [Select][+][-]
  1. program random_test;
  2. {$mode objfpc}{$H+}
  3. {$APPTYPE CONSOLE}
  4. var
  5.    s,n: longint;
  6.    i,j: longint;
  7.    l: LongInt;
  8. begin
  9.   l := 1227133513;
  10.  
  11.   WriteLn('Experiment:', LineEnding);
  12.   WriteLn(' Draw a natural number r from the intervall [0,l-1] and');
  13.   WriteLn(' increment a counter s when r mod 2 = 0 is satisfied.');
  14.   WriteLn(' Repeat this step n times and calculate the ratio s/n.', LineEnding);
  15.  
  16.   WriteLn(' Expected ratio: ', (l div 2)/l:30, LineEnding);
  17.  
  18.   WriteLn('Input size n':16, 'Observed ratio s/n':30);
  19.   for j := 4 to 31 do
  20.   begin
  21.     n := (LongInt(1) shl j);
  22.     s := 0;
  23.     for i := 0 to n-1 do
  24.       if Random(LongInt(l)) mod 2 = 0 then
  25.         Inc(s);
  26.     WriteLn( (LongInt(1) shl j):16, s/n:30);
  27.   end;
  28. end.

Both programs have an expectation value of 1/2, but the current random implementation does not converge to this.

My take on this matter: This is not a problem of the underlying random number generator, but the way the output range of the PRNG is mapped onto the smaller interval. Therefore it is completly possible for the output of this functions to be biased, while the PRNG reproduces the values of the reference implementation. The bias is not introduced in the generator, but in the map. I also think it is necessary to adapt rejection sampling to guarantee an unbiased result.

Feel free to tear this apart. : )

Edit: I tested the programs against version 3.0.4.




« Last Edit: July 24, 2019, 10:26:23 pm by Pangea »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9913
  • Debugger - SynEdit - and more
    • wiki
Re: Output of Random(SomeInteger) appears to be biased
« Reply #1 on: July 25, 2019, 01:08:50 am »
Actually you appear to be right.

From system unit:
Code: Pascal  [Select][+][-]
  1. function random(l:longint): longint;
  2. begin
  3.   { otherwise we can return values = l (JM) }
  4.   if (l < 0) then
  5.     inc(l);
  6.   random := longint((int64(cardinal(genrand_MT19937))*l) shr 32);
  7. end;
  8.  
  9. function random(l:int64): int64;
  10. begin
  11.   { always call random, so the random generator cycles (TP-compatible) (JM) }
  12.   random := int64((qword(cardinal(genrand_MT19937)) or ((qword(cardinal(genrand_MT19937)) shl 32))) and $7fffffffffffffff);
  13.   if (l<>0) then
  14.     random := random mod l
  15.   else
  16.     random := 0;
  17. end;
  18.  

genrand_MT19937 returns a random 32 bit value. Lets assume that part to be as correct as can be.


There is a generic problem.
If you do "Random(100)" you need to map the 2^32 values to 0..99. But 2^31 is not int dividable by 100. So some of 0..99 will be more likely.
Once you evenly distributed ‭4,294,967,200 out of the ‭4,294,967,296‬ values, you have 96 values left, to evenly distribute.
How to you map 96 values to 0..99 so that each of 0..99 is equally likely chosen?


This problem then manifests in both of the implementations
64bit:
    random := random mod l
means that  96 to 99 are slightly less likely.
(If it was QWord) With random (2^63 + n), the last 2*n values are only half as likely to occur.

32bit:
The numbers with different likelihood are equally spread across the requested range.

--
The issue is that any stable function to map the result of genrand_MT19937 to a given range will have the same problem.

You would need additional inputs to map them different (e.g counting how often "random" was called.)
But such additional input may be guess-able. And if it is part of the random result, then an attacker may be able to make predictions based on that.

Of course if an attacker knows your range, and that you use the fpc implementation, said attacker can make guesses too.

--
Since using "Mod" the bias on the numbers depends on how big the range is (relative to the max of genrand_MT19937 ), you can get a bigger random number, by calling genrand_MT19937  several times.
It still will be biased, but statistically less likely.
« Last Edit: July 25, 2019, 01:21:12 am by Martin_fr »

Thaddy

  • Hero Member
  • *****
  • Posts: 14393
  • Sensorship about opinions does not belong here.
Re: Output of Random(SomeInteger) appears to be biased
« Reply #2 on: July 25, 2019, 07:12:48 am »
It can be right, it can likely be solved by scaling back and forth by the forrmulea -slightly adjusted - I gave and its inverse. That will give a best fit but is slow. The algorithm itself, however is a standard.
Still: we can't change that, because that is how it worked since TP days.  Existing data will become unusable and that is not acceptable.
(I have quite a lot of such data dating as far back as its introduction in FreePascal. There are billions of such data sets, because the algorithm is so widely used in multiple languages.
My suggestion is, and the focus of a solution I will propose on the mailing list, is to overload and where not possible add differently named functionality.
Caveat is a platform without an FPU must support soft float and that is not always the case so that is frankly not a solution. I will investigate if it is possible to implement the first suggestion, but only for ranged randoms: these are not part of the actual standard. The good news is I have the 64 bit MT working and I am now testing if the output is equal to the original test data at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/mt19937-64.out.txt.
If that works I will work on the problem we discuss here so it can be solved for both 32 an 64 bit.
Reason is that I want the 64 bit version standards compliant first, then add it to my compendium and then propose a patch. Non- Xplatform solutions will not be considered by me.
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

ASerge

  • Hero Member
  • *****
  • Posts: 2250
Re: Output of Random(SomeInteger) appears to be biased
« Reply #3 on: July 25, 2019, 08:01:37 am »
For those who want a quick workaround, I can suggest adding this to the beginning of the test:
Code: Pascal  [Select][+][-]
  1. function Random(X: SizeInt): SizeInt;
  2. begin
  3.   Result := Round(System.Random * X);
  4. end;

Pangea

  • New member
  • *
  • Posts: 8
Re: Output of Random(SomeInteger) appears to be biased
« Reply #4 on: July 25, 2019, 11:15:04 am »
For those who want a quick workaround, I can suggest adding this to the beginning of the test:
Code: Pascal  [Select][+][-]
  1. function Random(X: SizeInt): SizeInt;
  2. begin
  3.   Result := Round(System.Random * X);
  4. end;

This still fails when SizeInt has 64 bit and L is large. As a test please consider this adapted version of the 64-bit program (i am using the 64-bit compiler, so sizeint has 64-bit):

Code: Pascal  [Select][+][-]
  1. program random_test;
  2.  
  3. {$mode objfpc}{$H+}
  4. {$APPTYPE CONSOLE}
  5.  
  6. function RoundRandom(x: SizeInt): SizeInt;
  7. begin
  8.   Result := round(x*Random);
  9. end;
  10.  
  11. var
  12.    s,n: Int64;
  13.    i,j: Int64;
  14.    l: Int64;
  15. begin
  16.   l := 6148914691236517205;
  17.  
  18.   WriteLn('Experiment:', LineEnding);
  19.   WriteLn(' Draw a natural number r from the intervall [0,l-1] and');
  20.   WriteLn(' increment a counter s when r mod 2 = 0 is satisfied.');
  21.   WriteLn(' Repeat this step n times and calculate the ratio s/n.', LineEnding);
  22.  
  23.   WriteLn(' Expected ratio: ', (l div 2)/l:30, LineEnding);
  24.  
  25.   WriteLn('Input size n':16, 'Observed ratio s/n':30);
  26.   for j := 4 to 31 do
  27.   begin
  28.     n := (Int64(1) shl j);
  29.     s := 0;
  30.     for i := 0 to n-1 do
  31.       if RoundRandom(Int64(l)) mod 2 = 0 then
  32.         Inc(s);
  33.     WriteLn( (Int64(1) shl j):16, s/n:30);
  34.   end;
  35. end.    

So here is an variant using rejection sampling (no floating point arithmetic, just reject the samples you cannot map correctly):

Code: Pascal  [Select][+][-]
  1.   function RandomRejectionSampling(L: UInt64): Uint64;
  2.   var
  3.     thr, rem: UInt64;
  4.   begin
  5.     if L = 0 then
  6.       Result := 0
  7.     else
  8.     begin
  9.       // We need to calculate High(UInt64)+1 mod L without using 128 bit arithmetic.
  10.       rem := (High(UInt64) mod L) + 1;
  11.       if rem = L then
  12.         rem := 0;
  13.       // Calculate the threshold
  14.       thr := (High(UInt64) - rem);
  15.       if thr < L-1 then
  16.         thr := L-1;
  17.       // Perform rejection sampling
  18.       repeat
  19.         Result := ((UInt64(GenRand32) shl 32) or UInt64(GenRand32));
  20.       until Result <= thr;
  21.       Result := Result mod L;
  22.     end;
  23.   end;  
  24.  

It's just the very basic idea, so there might be room for optimization.

Edit: Fixed a bug in the code.




« Last Edit: July 25, 2019, 12:09:17 pm by Pangea »

ASerge

  • Hero Member
  • *****
  • Posts: 2250
Re: Output of Random(SomeInteger) appears to be biased
« Reply #5 on: July 25, 2019, 11:25:45 am »
This still fails when SizeInt has 64 bit and L is large. As a test please consider this adapted version of the 64-bit program (i am using the 64-bit compiler, so sizeint has 64-bit):
And where fails? I've added my function to both your implementations. Everything was compiled and started without problems. And the distribution became about 0.5.

Pangea

  • New member
  • *
  • Posts: 8
Re: Output of Random(SomeInteger) appears to be biased
« Reply #6 on: July 25, 2019, 11:41:58 am »
It only produces even numbers for large L. You can test it against the 64 bit program when you change the condition to random(l) mod 2 = 0.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9913
  • Debugger - SynEdit - and more
    • wiki
Re: Output of Random(SomeInteger) appears to be biased
« Reply #7 on: July 25, 2019, 12:27:01 pm »
Quote
Code: Pascal  [Select][+][-]
  1.      // Perform rejection sampling
  2.       repeat
  3.         Result := ((UInt64(GenRand32) shl 32) or UInt64(GenRand32));
  4.       until Result <= thr;

Depending on the underlying random, you may have to reject a lot of numbers...

Also this makes the random function, take different amount of time depending on where in the underlaying random sequence it is. Are you sure this will not open the door for timing attacks?

Pangea

  • New member
  • *
  • Posts: 8
Re: Output of Random(SomeInteger) appears to be biased
« Reply #8 on: July 25, 2019, 01:38:45 pm »
I attached a plot showing the acceptance probability over the choice L. The acceptance should never be smaller than 1/2 (unless there is a bug).

My personal opinion on the timing part is: The underlying PRNG is not cryptographically secure in any way, so securing the random number generation against side channel attacks does not really help.

ASerge

  • Hero Member
  • *****
  • Posts: 2250
Re: Output of Random(SomeInteger) appears to be biased
« Reply #9 on: July 25, 2019, 01:47:38 pm »
It only produces even numbers for large L. You can test it against the 64 bit program when you change the condition to random(l) mod 2 = 0.
That's expected. The accuracy of fraction part for double is 52, for large numbers only the exponent is used.
There is no such problem for x32, of course.

Thaddy

  • Hero Member
  • *****
  • Posts: 14393
  • Sensorship about opinions does not belong here.
Re: Output of Random(SomeInteger) appears to be biased
« Reply #10 on: July 25, 2019, 04:03:21 pm »
My personal opinion on the timing part is: The underlying PRNG is not cryptographically secure in any way, so securing the random number generation against side channel attacks does not really help.
It is not secure, but it is compiled in B+ mode, so Boolean shortcuts have no timing influence because every expression is fully evaluated.. This was done on my request.
[edit]
This is not true for random, but for the hashes. But timing attacks can be resolved by B+.
« Last Edit: July 25, 2019, 04:24:28 pm by Thaddy »
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Pangea

  • New member
  • *
  • Posts: 8
Re: Output of Random(SomeInteger) appears to be biased
« Reply #11 on: July 25, 2019, 06:53:04 pm »
I don't think it is necessary to harden random against timing attacks. It is not designed to be used as cryptographic primitive.

I eliminated an unnecessary comparison and this is, what i would propose:
Code: Pascal  [Select][+][-]
  1.   function Random(L: UInt32): UInt32; overload;
  2.   var
  3.     thr, rem: UInt32;
  4.   begin
  5.     if L = 0 then
  6.       Result := 0
  7.     else
  8.     begin
  9.       rem := (UInt64(High(UInt32)) + 1) mod L;
  10.       thr := (High(UInt32) - rem);
  11.       repeat
  12.         Result := genrand_MT19937;
  13.       until Result <= thr;
  14.       Result := Result mod L;
  15.     end;
  16.   end;    
  17.  
  18.   function Random(L: UInt64): UInt64; overload;
  19.   var
  20.     thr, rem: UInt64;
  21.   begin
  22.     if L = 0 then
  23.       Result := 0
  24.     else
  25.     begin
  26.       // We need to calculate High(UInt64)+1 mod L without using 128 bit arithmetic.
  27.       rem := (High(UInt64) mod L) + 1;
  28.       if rem = L then
  29.         rem := 0;
  30.       // Calculate the threshold
  31.       thr := (High(UInt64) - rem);
  32.       repeat
  33.         Result := ((UInt64(genrand_MT19937) shl 32) or UInt64(genrand_MT19937));
  34.       until Result <= thr;
  35.       Result := Result mod L;
  36.     end;
  37.   end;
  38.  

However this does NOT behave like the old implementation. This is especially relevant for former negative input. If negative input is desired, maybe this is more appropriate (Also the code is simpler):

Code: Pascal  [Select][+][-]
  1. function Random(L: Int32): Int32; overload;
  2. const
  3.   UpperBound: UInt32 = (High(UInt32) shr 1);
  4.  var
  5.    thr, rem: UInt32;
  6.  begin
  7.    if L <= 0 then
  8.      Result := 0
  9.    else
  10.    begin
  11.      rem := UInt32(UpperBound+1) mod UInt32(L);
  12.      thr := (UpperBound - rem);
  13.      repeat
  14.        Result := genrand_MT19937 and UpperBound;
  15.      until UInt32(Result) <= thr;
  16.      Result := Result mod L;
  17.    end;
  18.  end;
  19.  
  20. function Random(L: Int64): Int64; overload;
  21. const
  22.   UpperBound: UInt64 = (High(UInt64) shr 1);
  23.  var
  24.    thr, rem: UInt64;
  25.  begin
  26.    if L <= 0 then
  27.      Result := 0
  28.    else
  29.    begin
  30.      rem := (UpperBound +1) mod UInt64(L);
  31.      thr := (UpperBound - rem);
  32.      repeat
  33.        Result :=  ((UInt64(genrand_MT19937) shl 32) or UInt64(genrand_MT19937)) and UpperBound;
  34.      until UInt64(Result) <= thr;
  35.      Result := Result mod L;
  36.    end;
  37.  end;    
  38.  

However this still behaves differently for neagtive L in the 64 bit case. Maybe it should be clarified in the documentation what happens when L is smaller or euqal to zero.

Edit: I read the old code wrong. If it is negative it should return a negative number in the respective range. So this is still insufficient.
Edit: It is too hot. I can't think clearly and i do not trust myself. Currently random(Int32(-5)) seems to return numbers in the interval [-4; 0], but random(Int64(-5)) returns numbers in the interval [0, 4]??? Can someone clarify this please?
« Last Edit: July 25, 2019, 07:17:23 pm by Pangea »

PascalDragon

  • Hero Member
  • *****
  • Posts: 5486
  • Compiler Developer
Re: Output of Random(SomeInteger) appears to be biased
« Reply #12 on: July 26, 2019, 09:23:42 am »
I don't think it is necessary to harden random against timing attacks. It is not designed to be used as cryptographic primitive.
Correct. That's not its purpose.

devEric69

  • Hero Member
  • *****
  • Posts: 648
Re: Output of Random(SomeInteger) appears to be biased
« Reply #13 on: July 26, 2019, 10:31:15 am »
If it's a question of having a real random number generator (a long sequence of numbers, stored and studied in a spreadsheet program, to see if it's roughly (no computer generator is perfect: it's a question of numbers generated, before it becomes biased. So, the real question is: does the count, of random numbers that I will use, remain more or less uniformly distributed in the considered range of random numbers that interests me, with this generator?) formally distributed according to a bell curve, over a given range of numbers, centered, and according to a normal or other statistical law), then it's preferable to use a more advanced algorithm like Tore, or another one.

My 2 cents.
« Last Edit: July 26, 2019, 11:03:02 am by devEric69 »
use: Linux 64 bits (Ubuntu 20.04 LTS).
Lazarus version: 2.0.4 (svn revision: 62502M) compiled with fpc 3.0.4 - fpDebug \ Dwarf3.

Pangea

  • New member
  • *
  • Posts: 8
Re: Output of Random(SomeInteger) appears to be biased
« Reply #14 on: July 27, 2019, 09:53:27 pm »
it's preferable to use a more advanced algorithm like Tore
I googled this algorithm and apparently this is a way to construct low discrepancy sequences by taking the fractional part of multiplies of irrational numbers (e.g. the square roots of primes). Is this correct?

I also found [1] and [2], both giving an overview over different sampling methods. I went ahead and implemented them together with a simple check, wether their output is really unbiased (a.k.a. i did not implement a bug). In my opinion the one using the intrinsic (no division) and the one with only a single division are really interesting.

[1]  https://arxiv.org/pdf/1805.10941
[2]  http://www.pcg-random.org/posts/bounded-rands.html

Edit: Removed source.
« Last Edit: July 28, 2019, 04:53:45 am by Pangea »

 

TinyPortal © 2005-2018