As alluded to in my previous post in this series, one of the most straightforward ways to test whether a PRNG is generating good random sequences is to examine the frequency of 0's and 1's.  This takes just a couple of lines in Maple using the Statistics package:

(**) r1 := rand(0..1): L := [seq(r1(), i=1..10000)]:
(**) n := nops(L); tally := `+`(op(L));
(**) Statistics:-ChiSquareGoodnessOfFitTest(
[n-tally, tally], [n/2, n/2], ':-output'=':-hypothesis');

This tests the null hypothesis that the observed sample does not differ from a uniform random sample. A return of "true" means there is no evidence that the sample, L, is not random. The Statistics command used here will give more verbose output to this effect if you set

(**) infolevel[Statistics] := 1;
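For readers following along outside Maple, the same frequency test boils down to a one-degree-of-freedom chi-square statistic. Here is an illustrative Python sketch (the function name is mine, not part of the worksheet):

```python
def frequency_chi2(bits):
    """Chi-square statistic for the 0/1 frequency test (1 degree of freedom)."""
    n = len(bits)
    ones = sum(bits)
    half = n / 2
    # (observed - expected)^2 / expected, summed over the two buckets
    return (ones - half) ** 2 / half + ((n - ones) - half) ** 2 / half

# the 5% critical value for 1 degree of freedom is about 3.841;
# a statistic below it gives no evidence against uniformity
print(frequency_chi2([0, 1] * 5000))   # a perfectly balanced sample scores 0.0
```

A sequence fails this test at the 5% level when the statistic exceeds 3.841, which is exactly what ChiSquareGoodnessOfFitTest is checking internally.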

Of course, we have seen examples of PRNGs that are not very random but still pass this simple frequency test, and so we want other tests. As discussed in Monte Carlo Pi, good tests are often targeted towards how you intend to use your pseudo-random sequences. One way we use PRNGs in computer algebra and computational number theory is to generate random matrices, and so it makes sense that computing the rank of matrices formed from random sequences is one of the tests in the DIEHARD test suite.

In order to use matrix ranks in a statistical test of randomness, we need to know the probability that an m by n matrix with uniformly random entries has rank r. This probability is easiest to derive recursively over GF(2) (the integers modulo 2) and is:

2^(r*(n+m-r)-m*n)*product(((1-2^(l-n))*(1-2^(l-m)))/(1-2^(l-r)),l=0..(r-1))
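In case it is useful outside Maple, the same probability can be computed exactly with Python rationals (an illustrative sketch; the function name is mine). A good sanity check is that summing it over r = 0..min(m,n) gives exactly 1:

```python
from fractions import Fraction

def gf2_rank_probability(m, n, r):
    """P(rank = r) for a uniformly random m x n matrix over GF(2)."""
    p = Fraction(2) ** (r * (n + m - r) - m * n)
    for l in range(r):
        p *= (1 - Fraction(2) ** (l - n)) * (1 - Fraction(2) ** (l - m)) \
             / (1 - Fraction(2) ** (l - r))
    return p

# the probabilities over all possible ranks must sum to exactly 1
print(sum(gf2_rank_probability(4, 6, r) for r in range(5)) == 1)   # True
```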

And so our test will be: for a given m and n, take our sequence of bits and form it into a collection of m by n matrices.  Then compute the ranks of those matrices using ?LinearAlgebra,Modular,Rank, and use ?Statistics,ChiSquareGoodnessOfFitTest with the probabilities above to test whether the distribution of ranks is consistent with randomly chosen matrices.  Here is my procedure to do just that:

GF2RankProbability := (m,n,r) -> 2^(r*(n+m-r)-m*n)*
    mul((1-2^(l-n))*(1-2^(l-m))/(1-2^(l-r)), l=0..(r-1));

BinaryRank := proc(L::list, {dimension::{posint,list}:=16, level::numeric:=0.05}, $)
local m, n, N, ranks, r, expected, rcutoff;

    # dimension can be a single size or a [rows, columns] pair
    if type(dimension, 'list') then
        m := dimension[1];
        n := dimension[2];
    else
        m := dimension;
        n := dimension;
    end if;

    r := min(m, n);

    # number of complete m x n matrices we can pack from the bits in L
    N := floor( nops(L) / m / n );

    if N < 1 then
        error "not enough bits to form a matrix of the desired size";
    end if;

    # form the bits into N matrices and compute their ranks over GF(2)
    ranks := [seq(
        LinearAlgebra:-Modular:-Rank(2,
            Matrix(m, n, (j,k) -> L[m*n*(i-1)+(j-1)*n+k], datatype=float[8])
        ),
        i = 1..N )];

    ranks := table(Statistics:-Tally(ranks), sparse);

    expected := [seq(evalhf(GF2RankProbability(m,n,i)), i=0..r)];

    # keep individual buckets only for the ranks likely to occur
    rcutoff := min(r, nops(select(`>`, expected, evalf( 0.05 / log[10](N) ))));

    # lump all the unlikely low ranks into a single bucket
    expected := [ evalhf(N*add(GF2RankProbability(m,n,i), i=0..r-rcutoff)),
                  seq(1.*N*expected[i], i=r-rcutoff+2..r+1) ];

    ranks := [ evalf(add(ranks[i], i=0..r-rcutoff)),
               seq(1.*ranks[i], i=r-rcutoff+1..r) ];

    return Statistics:-ChiSquareGoodnessOfFitTest( ranks, expected,
        ':-output'=[':-hypothesis', ':-pvalue'], ':-level'=level );

end proc: # BinaryRank
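?LinearAlgebra,Modular,Rank is doing Gaussian elimination modulo 2 under the hood. For readers without Maple, here is a minimal Python sketch of that rank computation, with each row stored as an integer bitmask (the names are my own, and this is not the Maple implementation):

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    pivots = {}          # leading-bit position -> pivot row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead in pivots:
                row ^= pivots[lead]      # eliminate this row's leading bit
            else:
                pivots[lead] = row       # this row becomes a new pivot
                break
    return len(pivots)

# the 3 x 3 identity has full rank; a repeated row adds nothing
print(gf2_rank([0b100, 0b010, 0b001]), gf2_rank([0b11, 0b11]))   # 3 1
```

Storing rows as bitmasks makes the elimination a single XOR per pivot, which is why rank tests over GF(2) are cheap even for many matrices.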

Consistent with DIEHARD, this routine looks at the individual frequencies of the ranks likely to occur, and lumps all the rest into a single bucket. So, looking at 16 by 16 matrices, we have buckets for ranks 16, 15, and 14 individually, and everything of rank 13 and below together. For 100 matrices, we expect to get no matrices below rank 13, 1 of rank 13, 13 of rank 14, 58 of rank 15, and 28 of rank 16.
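Those bucket counts come straight from GF2RankProbability. As a quick cross-check, here is the same computation in Python floating point (an illustrative sketch; the function name is mine):

```python
def gf2_rank_probability(m, n, r):
    """P(rank = r) for a uniformly random m x n matrix over GF(2)."""
    p = 2.0 ** (r * (n + m - r) - m * n)
    for l in range(r):
        p *= (1 - 2.0 ** (l - n)) * (1 - 2.0 ** (l - m)) / (1 - 2.0 ** (l - r))
    return p

# expected counts of each rank among 100 random 16 x 16 matrices over GF(2)
for r in (16, 15, 14, 13):
    print(r, round(100 * gf2_rank_probability(16, 16, r), 1))
```

The exact expectations are about 28.9, 57.8, 12.8, and 0.5 matrices for ranks 16 down to 13, matching the rounded counts above.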

To get significant results, I found you need to look at a lot of bits, so here are some trials of this test on sequences of 10^6 bits. First, the Mersenne Twister generator in Maple:

(**) r1 := rand(0..1): L1 := [seq(r1(), i=1..1000000)]:
(**) BinaryRank(L1);
# [(14.)::(20.6430562981839252), (480.)::(501.3208333), (2301.)::(2256.012599), (1111.)::(1128.023512)]
# true, 0.2408079888

Here I am printing out (actual number of matrices)::(expected number) for each bucket, followed by the output from Statistics: the hypothesis (true or false) and the p-value. Remember, "false" means probably not random, and the p-value describes how significant that answer is: the closer to 0, the less likely the sequence is to be random.

Second, the biased coin flip:

(**) r2a := rand(0..2): r2 := ()-> r2a() mod 2: L := [seq(r2(), i=1..1000000)]:
(**) BinaryRank(L);
# [(32.)::(20.6430562981839252), (529.)::(501.3208333), (2288.)::(2256.012599), (1057.)::(1128.023512)]
# false, 0.0053281097

And, finally, the bad linear congruence from the previous posts:

(**) LCState := 1: r3 := proc() global LCState;
(**) LCState := irem(89853*LCState, 50027); irem(LCState, 2); end proc:
(**) L := [seq(r3(), i=1..1000000)]:
(**) BinaryRank(L);
# [(0.)::(20.6430562981839252), (501.)::(501.3208333), (2283.)::(2256.012599), (1122.)::(1128.023512)]
# false, 0.0001053637
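For readers without Maple, this weak generator is easy to reproduce. Here is an illustrative Python version (the function name is my own choosing):

```python
def lcg_bits(count, seed=1):
    """Low-order bits of the weak linear congruence x -> 89853*x mod 50027."""
    state = seed
    bits = []
    for _ in range(count):
        state = (89853 * state) % 50027
        bits.append(state % 2)
    return bits

print(lcg_bits(10))   # the first few low-order bits of the stream
```

Because the modulus 50027 is so small, the stream of low-order bits has a short period and plenty of linear structure, which is exactly the kind of defect the rank test is built to catch.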

As always, the computations are in this worksheet: BinaryRank.mw. Also tested in that worksheet are the digits of Pi, RandomTools:-LinearCongruence, and RandomTools:-QuadraticCongruence, all three of which passed.
