mmcdara

22 Badges

Member for: 9 years, 219 days

MaplePrimes Activity

These are Posts that have been published by mmcdara

Are Maple's pseudo random number generat...

Posted: mmcdara 7896

December 04 2019

I'm particularly interested in data analysis and more specifically in statistical analysis of computer code outputs.

One of the main activities of this very broad field is named Uncertainty Propagation. In a few words, it consists in perturbing the inputs of a computational code in order to understand (and quantify) how these perturbations propagate to the outputs of this code.

At the core of uncertainty propagation is the ability to generate large numbers of "random" variations of the inputs. Since these inputs may number in the tens, the first problem consists in generating "random" points in a space of potentially very large dimension.
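For readers who want a concrete picture, here is a minimal Python sketch of uncertainty propagation. The function model, the nominal values and the spreads are all hypothetical and only serve as an illustration; a real study would call the actual computational code instead.

```python
import random
import statistics

def model(x1, x2, x3):
    """Hypothetical stand-in for a computational code with three inputs."""
    return x1 * x2 + x3 ** 2

def propagate(n_samples, seed=1):
    """Perturb the inputs around assumed nominal values and collect the outputs."""
    rng = random.Random(seed)  # CPython's generator is itself a Mersenne Twister
    outputs = []
    for _ in range(n_samples):
        x1 = rng.gauss(1.0, 0.05)    # assumed Gaussian perturbation
        x2 = rng.gauss(2.0, 0.10)    # assumed Gaussian perturbation
        x3 = rng.uniform(0.9, 1.1)   # assumed uniform perturbation
        outputs.append(model(x1, x2, x3))
    return statistics.mean(outputs), statistics.stdev(outputs)

mean, std = propagate(10_000)
print(f"output mean = {mean:.4f}, output std = {std:.4f}")
```

The quality of such estimates rests entirely on the quality of the underlying generator, which is the subject of the rest of this post.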

Even among my mathematician colleagues, an impressive number are completely ignorant of the way "random" numbers are generated. I guess that a lot of MaplePrimes users are too. My purpose is not to give a course on this topic, and the available literature is vast enough for anyone interested to find information at any level of complexity.
Among those who have some knowledge about Pseudo Random Number Generators (PRNG), only a few know that a PRNG has to pass severe tests ("tests of randomness") before the streams of numbers it generates can be qualified as "reasonably random", and therefore before this PRNG can be released.

One of the most famous examples of a bad PRNG is "randu" (IBM 1966, and probably used in Fortran libraries for more than 30 years), the very PRNG that Knuth himself described as the "infamous generator".
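To see why randu is so bad, recall its recurrence x[k+1] = 65539 * x[k] mod 2^31. The short Python sketch below (the function name randu and the seed are my own illustrative choices) checks the well-known linear relation between three consecutive values, which is what confines consecutive triples to a small number of planes.

```python
def randu(seed, n):
    """Return n successive RANDU values: x_{k+1} = 65539 * x_k mod 2^31."""
    values = []
    x = seed
    for _ in range(n):
        x = (65539 * x) % 2**31
        values.append(x)
    return values

xs = randu(seed=1, n=1000)

# Since 65539 = 2^16 + 3, we have 65539^2 = 6*65539 - 9 (mod 2^31), hence
# every triple satisfies x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31).
defect_holds = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2**31 == 0
    for k in range(len(xs) - 2)
)
print(defect_holds)  # True
```

A generator whose successive outputs obey such a simple relation cannot sample a three-dimensional space properly, which is exactly the kind of defect tests of randomness are meant to detect.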

These tests of randomness are generally gathered in dedicated libraries, and Diehard is probably one of the best known of them.
Diehard was originally developed by George Marsaglia more than twenty years ago and it is still widely used today.

I recently decided, not because I have doubts about the quality of the work done by Maplesoft, to test Maple's PRNG named "Mersenne Twister". First, because it can do no harm to publish quantitative information that lets everyone know they are using a proven PRNG; second, because the (very simple) approach used here can fill the gaps I have mentioned above.

Mersenne Twister (often dubbed mt19937) is considered a very good PRNG; it is used in a lot of applications (including finance, where it is not so rare to sample input spaces of dimension larger than 1000). I know that mt19937 is often considered a poor candidate for cryptographic applications, but that is not my concern here.

I have thus decided to spend some time running the Diehard suite of tests on a sequence of integers generated by RandomTools[MersenneTwister].

restart:

DIEHARD test suite for Pseudo Random Number Generators (PRNG)

Reference: http://webhome.phy.duke.edu/~rgb/General/dieharder.php

The installation procedure (Mac OSX) can be found here
    https://gist.github.com/blixt/9abfafdd0ada0f4f6f26
or here
    http://macappstore.org/dieharder/

For other operating systems, please search the web.

dieharder [-h]   # for inline help
dieharder -l      # to list all the available tests

A description of the many tests can be found here:
    https://en.wikipedia.org/wiki/Diehard_tests
    https://sites.google.com/site/astudyofentropy/background-information/the-tests/dieharder-test-descriptions
    https://www.stata.com/support/cert/diehard/randnumb_mt64.out

General theory about PRNG testing can be found here (a reference among many):
    http://liu.diva-portal.org/smash/get/diva2:740158/FULLTEXT01.pdf

or here (more oriented to the NIST test suite)
    https://www.random.org/analysis/Analysis2005.pdf
    https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-22r1a.pdf

In a terminal window execute the following commands for an exhaustive testing ("-a" option).
The "-g 202" option means that the generator is replaced by a text format input file
(use dieharder -h for more details).

cd //..../Desktop/DIEHARD

dieharder -g 202 -f SomeAsciiFile -a > //..../Desktop/DIEHARD/TheResultFile.txt

Be careful: the complete testing takes several hours (about 5 on my computer).

__________________________________________________________________________________

Maple's Mersenne Twister Generator

Maple help page : RandomTools[MersenneTwister][GenerateInteger]
(see the included references to the Mersenne Twister PRNG).

Note: in the sequel this generator will be dubbed mt19937

The Mersenne Twister is implemented in many software packages.
It is highly likely that this PRNG (and the others these packages offer) has been intensively
tested with one of the existing PRNG testing libraries.
Unfortunately only a few vendors have made the results of these tests public (probably because
the implementation in itself is rarely questioned... but a code typo is always a possibility).

One exception is the software STATA.
A summary of the results can be found here
https://www.stata.com/support/cert/diehard/.
A complete description of the results of the tests passed is given here
https://www.stata.com/support/cert/diehard/randnumb_mt64.out

The classical pattern of the performances of mt19937 can be found here

http://www2.ic.uff.br/~celso/artigos/pjo6.ps.

and the table below comes from it (P means "Passed", F means "Failed"):

____________________________________________________________________________

In the Maple code below, a sequence of N UnsignedInt32 numbers is generated with
Maple's Mersenne Twister and the result is exported to an ASCII file.
The seed is set to 1 (SetState(state=1)) so that, for a small value of N (let's say N=10),
the sequence produced by Maple's mt19937 can be compared with the sequence of the same length generated
by Diehard's mt19937.
To generate this latter sequence and save it in the file Diehard_mt19937, just run in a terminal window
the command (-S 1 means "seed = 1", -t 10 means "a sequence of length 10"):
dieharder -S 1 -B -o -t 10 > Diehard_mt19937
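As a third point of comparison, one can also write the reference MT19937 algorithm by hand. The Python sketch below follows the published 32-bit algorithm with the usual single-integer initialization; the class name MT19937 is my own, and whether Maple's SetState(state=1) and dieharder's -S 1 map the seed to the internal state in exactly this way is something to check rather than assume.

```python
class MT19937:
    """Plain 32-bit Mersenne Twister (reference algorithm, integer seeding)."""
    N, M = 624, 397

    def __init__(self, seed):
        # Standard initialization by a linear recurrence on the seed.
        self.state = [seed & 0xFFFFFFFF]
        for i in range(1, self.N):
            prev = self.state[-1]
            self.state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
        self.index = self.N  # force a twist before the first output

    def _twist(self):
        s = self.state
        for i in range(self.N):
            y = (s[i] & 0x80000000) | (s[(i + 1) % self.N] & 0x7FFFFFFF)
            s[i] = s[(i + self.M) % self.N] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
        self.index = 0

    def next_uint32(self):
        if self.index >= self.N:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        # Tempering step.
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF

gen = MT19937(1)
first_ten = [gen.next_uint32() for _ in range(10)]
print(first_ten)
```

If the ten integers printed here coincide with those produced by Maple and by dieharder for the same seed, the three implementations agree on both the algorithm and the seeding convention.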

About the value of N:

In http://webhome.phy.duke.edu/~rgb/General/dieharder.php it is recommended that N be at least
equal to 2.5 million; STATA used N=3 million.
Other web sources say this value is too small.
With N=10 million, Maple's mt19937 does not pass all the tests.
I used here N=50 million (the resulting ASCII file has a size of 537 MB).

Name of the input file.

The file generated by Maple is named Maple_mt19937_N=5e7.txt

One important thing is the preamble of a valid input file.

This preamble must have 6 lines (the value 10 to the right of "count:" must be replaced by the value of N).
A valid preamble is of the form:

#==================================================================
# some text indicating the generator used
#==================================================================
type: d
count: 10
numbit: 32

As Maple_mt19937_N=5e7.txt is generated from an ExportMatrix command, this preamble is added
by hand.
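Adding the preamble by hand to a 537 MB file is not very comfortable. As an alternative, here is a minimal Python sketch that writes the six preamble lines followed by the numbers. The function name write_dieharder_input, the file name and the three sample values are hypothetical; the layout is simply the one described above, with one integer per line.

```python
import os
import tempfile

def write_dieharder_input(path, numbers, description):
    """Write unsigned 32-bit integers preceded by the 6-line preamble."""
    bar = "#" + "=" * 66
    with open(path, "w") as f:
        f.write(f"{bar}\n# {description}\n{bar}\n")
        f.write(f"type: d\ncount: {len(numbers)}\nnumbit: 32\n")
        for value in numbers:
            f.write(f"{value}\n")

# Illustrative values only, not the output of any particular generator.
values = [12345, 4294967295, 0]
path = os.path.join(tempfile.mkdtemp(), "example_input.txt")
write_dieharder_input(path, values, "generator: illustrative values only")

with open(path) as f:
    lines = f.read().splitlines()
print("\n".join(lines[:6]))
```

The count field is computed from the data, which avoids a mismatch between the declared and the actual number of integers.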

Running multiple Diehard tests

To run the same tests used to qualify STATA's Mersenne Twister, open a terminal window,
go to the directory that contains input file Maple_mt19937_N=5e7.txt and run this script:

for i in {0,1,2,3,4,8,9,10,11,12,13,14,15,16}; do
  dieharder -g 202 -f Maple_mt19937_N=5e7.txt -d $i >> Diehard___Maple_mt19937_N=5e7
done

The results are then appended to the ASCII file Diehard___Maple_mt19937_N=5e7

>	with(RandomTools[MersenneTwister]):

>	dir := cat("/", currentdir(), "Desktop/DIEHARD/"): InputFile := cat(dir, "Maple_mt19937_N=5e7.txt"):

>	SetState(state=1); N := 5*10^7: st := time(): S := convert([seq(GenerateUnsignedInt32(), i=1..N)], Matrix)^+; time()-st;

S := 50000000 x 1 Matrix (Data Type: anything, Storage: rectangular, Order: Fortran_order)

(1)

>	st := time(): ExportMatrix(InputFile, S, format=rectangular, mode=ascii); time()-st;

(2)

Diehard's results

Full test suite (about 5 hours of computational time)

Command :
dieharder -g 202 -f Maple_mt19937_N=5e7.txt -a > Diehard___ALL___Maple_mt19937_N=5e7

The results are compared to those obtained for Diehard's mt19937.
Two ways are used :

  - 1 - In a first stage one generates a stream of pseudo random numbers and stores it in an ASCII file (just as we did with Maple).
         The whole suite of tests is then run on this file.
         Commands (-g 013 codes for mt19937):

         dieharder -S 1 -g 013 -o -t 50000000 > Diehard_mt19937_N=5e7.txt
         dieharder -g 202 -f Diehard_mt19937_N=5e7.txt -a > Diehard___ALL___Diehard_mt19937_N=5e7

  - 2 - The whole suite is run by invoking mt19937 directly "online"
         Commands :
         dieharder -S 1 -g 013 -t 50000000 -a > Diehard___ALL___Online

A UNIX diff command was used to verify that the two files Maple_mt19937_N=5e7.txt and
Diehard_mt19937_N=5e7.txt were identical (they were).

Note that Diehard does not respond identically depending on whether the stream of random numbers comes from a file
or is generated online (the latter situation [- 2 -] seems to give better results).

Summary (114 tests):
   - * - Maple's and Diehard's mt19937 respond exactly the same way when the stream of random
          numbers is read from an ASCII file (8 tests failed (******) and 6 weak (**)).
   - * - Diehard's mt19937 fails no test and is weak on 4 tests when the stream is generated online.

restart:

dir := currentdir():
FromMapleFile     := cat(dir, "Diehard___ALL___Maple_mt19937_N=5e7"):
FromDiehardFile   := cat(dir, "Diehard___ALL___Diehard_mt19937_N=5e7"):   # same case as the file written by the dieharder command above
FromDiehardNoFile := cat(dir, "Diehard___ALL___Online"):

printf("                           ======================|======================|======================|\n"):
printf("                          |   From Maple's file | From Diehard's File | Diehard online test |\n"):
printf("==========================|======================|======================|======================|\n"):
printf("          test       ntup | p.value   Assessment | p.value   Assessment | p.value   Assessment |\n"):
printf("==========================|======================|======================|======================|\n"):

for k from 1 to 9 do
  LMF := readline(FromMapleFile):
  LDF := readline(FromDiehardFile):
  LDNF := readline(FromDiehardNoFile):
end do:

while LMF <> 0 do
  if StringTools:-Search("|", LMF) > 0 then
    res := StringTools:-StringSplit(LMF, "|")[[1, 2, 5, 6]];
    printf("%-20s %3d | %1.7f ", res[1], parse(res[2]), parse(res[3]));
      if StringTools:-Search("WEAK" , res[4]) > 0 then printf("    **     |")
    elif StringTools:-Search("FAILED", res[4]) > 0 then printf(" ******   |")
    else printf(" PASSED   |")
    end if:
  end if:
  LMF := readline(FromMapleFile):

  if StringTools:-Search("|", LDF) > 0 then
    res := StringTools:-StringSplit(LDF, "|")[[5, 6]];
    printf(" %1.7f ", parse(res[1]));
      if StringTools:-Search(" WEAK" , res[2]) > 0 then printf("     **    |")
    elif StringTools:-Search(" FAILED", res[2]) > 0 then printf("   ****** |")
    else printf("   PASSED |")
    end if:
  end if:
  LDF := readline(FromDiehardFile):

  if StringTools:-Search("|", LDNF) > 0 then
    res := StringTools:-StringSplit(LDNF, "|")[[5, 6]];
    printf(" %1.7f ", parse(res[1]));
      if StringTools:-Search("WEAK" , res[2]) > 0 then printf("     **    |")
    elif StringTools:-Search("FAILED", res[2]) > 0 then printf("   ******    |")
    else printf("   PASSED |")
    end if:
    printf("\n"):
  end if:
  LDNF := readline(FromDiehardNoFile):

end do:
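The Maple loop above keeps fields 1, 2, 5 and 6 of each "|"-separated result line (test name, ntup, p-value, assessment). For readers who prefer to post-process the result files outside Maple, here is a minimal Python sketch of the same parsing. The three sample lines are invented for illustration and are not actual results; the six-column layout is assumed from the field indices used in the Maple code.

```python
from collections import Counter

def parse_result_line(line):
    """Return (test, ntup, p_value, assessment), or None for non-result lines."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 6:
        return None
    try:
        return fields[0], int(fields[1]), float(fields[4]), fields[5]
    except ValueError:
        return None  # header line such as "test_name |ntup| ..."

# Invented sample report, for illustration only.
sample_report = """\
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.50000000|  PASSED
      diehard_operm5|   0|   1000000|     100|0.00300000|   WEAK
  diehard_rank_32x32|   0|     40000|     100|0.00000000|  FAILED
"""

results = [r for r in map(parse_result_line, sample_report.splitlines()) if r]
tally = Counter(assessment for _, _, _, assessment in results)
for name, ntup, p_value, assessment in results:
    print(f"{name:<20s} {ntup:3d} | {p_value:.7f} | {assessment}")
print(dict(tally))
```

Applying the same function to each of the three result files and comparing the tallies gives the summary counts without any manual reading of the reports.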

Download DIEHARD_test_of_MAPLE_MersenneTwister.mw

A lot of supplementary details are given in the attached file.
I leave it to readers to discover for themselves whether Maple's implementation of the Mersenne Twister PRNG is correct or not.
Beyond this exercise, I hope this work will be useful to people who might be tempted to test their own generator.

Representing a hierarchical table as a t...

Posted: mmcdara 7896 Product: Maple

November 08 2019

In the applications I am working on, information is often represented by hierarchical tables (that is, tables where some entries can also be tables, and so on).
To help people understand how this information is organized, I thought of representing such a hierarchical table as a tree graph.
Once this graph is built, it becomes very simple to find where a "terminal leaf", that is an entry which is no longer a table, is located in the original table (by location I mean the sequence of indices that leads to this "terminal leaf").

The code provided here is unpretentious and I do not doubt for a second that people here will be able to improve it.
I published it because I thought other people could face the same kind of problem that I do.

restart

>	with(GraphTheory): interface(version);

(1)

gh := proc(T)
  global s, counter, types:
  local i:
  if type(T, table) then
    for i in [indices(T, nolist)] do
      if type(T[i], table) then
         s := s, op(map(u -> [i, u], [indices(T[i], nolist)] ));
      else
         counter := counter+1:
         types   := types, _Z_||counter = whattype(T[i]);
         s       := s, [i, _Z_||counter];
      end if:
      thisproc(T[i]):
    end do:
  else
    return s
  end if:
end proc:

>	t := table([a1=[alpha=1, beta=2], a2=table([a21=2, a22=table([a221=x, a222=table([a2221={1, 2, 3}, a2222=Matrix(2, 2), a2223=u3, a2224=u4])])]), a3=table([a31=u, a32=v])]); global s, counter, types: s := NULL: counter := 0: types := NULL: ghres := gh(t): types := [types]:

t := table([a1 = [alpha = 1, beta = 2], a3 = table([a32 = v, a31 = u]), a2 = table([a22 = table([a222 = table([a2222 = (Matrix(2, 2, {(1, 1) = 0, (1, 2) = 0, (2, 1) = 0, (2, 2) = 0})), a2223 = u3, a2221 = {1, 2, 3}, a2224 = u4]), a221 = x]), a21 = 2])])

(2)

The three statements below determine the set of edges of the form ['t', v] that have not been captured by procedure gh.
They correspond to the "first level" indices of table t (v in {a1, a2, a3} in the example above).

>	L := convert(op~(1, [ghres]), set): R := convert(op~(2, [ghres]), set): FirstLevelEdges := map(u -> ['t', u], L union R minus R):

Complete the set of the edges, build the graph representation TG of table t and draw TG.

>	edges := convert~({ghres, FirstLevelEdges[]}, set): TG := Graph(edges): HighlightVertex(TG, Vertices(TG), white): p := DrawGraph(TG, style=tree, root='t'):

The subs command below replaces the "terminal leaves" named _Z_n by their type.

>	eval(t); p := subs(types, p): enlarge := plottools:-transform((x,y) -> [3*x, y]): plots:-display(enlarge(p), size=[1000, 400]);

table([a1 = [alpha = 1, beta = 2], a3 = table([a32 = v, a31 = u]), a2 = table([a22 = table([a222 = table([a2222 = (Matrix(2, 2, {(1, 1) = 0, (1, 2) = 0, (2, 1) = 0, (2, 2) = 0})), a2223 = u3, a2221 = {1, 2, 3}, a2224 = u4]), a221 = x]), a21 = 2])])

This procedure is used to find the "indices path" to a terminal leaf.
FindLeaf is then applied to all the terminal leaves.

FindLeaf := proc(TG, leaf)
   local here:
   here := GraphTheory:-ShortestPath(TG, 't', leaf)[1..-2]:
   here := cat(convert(here[1], string), convert(here[2..-1], string)):
   here := StringTools:-SubstituteAll(here, ",", "]["):
   here := parse(here);
end proc:

# where is a2221

printf("%a\n", FindLeaf(TG, a2221));

t[a2][a22][a222]
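The same idea can be sketched without any graph package by walking the nested structure directly. In the Python sketch below, nested dictionaries play the role of Maple tables; the function name leaf_paths is my own, and the Matrix entry a2222 of the original table is omitted. Like the graph approach, the lookup at the end assumes that index names are unique across the whole table.

```python
def leaf_paths(table, prefix=()):
    """Yield (path, leaf) pairs; a leaf is any value that is not a dict."""
    for key, value in table.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path, value

t = {
    "a1": [("alpha", 1), ("beta", 2)],
    "a2": {
        "a21": 2,
        "a22": {
            "a221": "x",
            "a222": {"a2221": {1, 2, 3}, "a2223": "u3", "a2224": "u4"},
        },
    },
    "a3": {"a31": "u", "a32": "v"},
}

# Map each final index to the sequence of indices of the table that holds it.
paths = {path[-1]: path[:-1] for path, _ in leaf_paths(t)}
print(paths["a2221"])  # ('a2', 'a22', 'a222')
```

The printed tuple corresponds to the location t[a2][a22][a222] returned by FindLeaf.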

Download Table_Unfolding_2.mw

An improved approximation of the Inverse...

Posted: mmcdara 7896 Product: Maple

August 15 2019

Looking for fast approximate formulas to compute (a huge number of) quantiles of a Gaussian random variable (here the standard one, but the extension to any Gaussian RV is straightforward), I found a few of them in the Abramowitz and Stegun book, page 933, relations 26.2.22 and 26.2.23.
Each approximation model is expressed as a rational fraction, the second one being the more accurate.
The models depend on 4 and 6 parameters respectively, which were estimated (I guess it was done this way) through a least-squares-like method.

See here for an online access http://people.math.sfu.ca/~cbm/aands/page_933.htm.

These approximations, and especially the most accurate one (formula 26.2.23), seem to be still widely used today⁽¹⁾ (see for instance https://www.johndcook.com/blog/normal_cdf_inverse/ ).

For fun, I decided to compute the best fit by using the Statistics:-NonlinearFit procedure and a sample of (probability, quantile) points where the probability ranges in [0.5, 1-1/1000] (the range used in formulas 26.2.22 and 26.2.23 is (0, 0.5], but this does not matter).
Surprisingly, Statistics:-NonlinearFit returned, for the two formulas, parameter estimates substantially different from the ones given in Abramowitz & Stegun's book. A reason could be that the points I used for the fits were not the ones they used (unfortunately they give no information about this).

More interestingly, whichever formula I refitted, NonlinearFit produced an approximation whose absolute error was smaller by about two orders of magnitude than the ones provided by Abramowitz and Stegun.
For instance they wrote that the most accurate formula (26.2.23) had an absolute approximation error less than 4.5*10^-4, whereas I obtained a value around 10^-6!

(1) To get an idea of the persistence of the use of the formula 26.2.23, just type the value 2.515517 of its parameter c[0] in any search engine.
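As a quick check of the error bound quoted by Abramowitz and Stegun, formula 26.2.23 can be evaluated outside Maple. The Python sketch below uses the six published coefficients and compares the approximation with the quantile function of the standard library on [0.5, 0.999]; the function name quantile_as and the evaluation grid are my own choices.

```python
import math
from statistics import NormalDist

# Coefficients of Abramowitz & Stegun formula 26.2.23.
C = (2.515517, 0.802853, 0.010328)
D = (1.432788, 0.189269, 0.001308)

def quantile_as(alpha):
    """Approximate standard normal quantile for 0.5 <= alpha < 1."""
    t = math.sqrt(-2.0 * math.log(1.0 - alpha))
    numerator = C[0] + C[1] * t + C[2] * t**2
    denominator = 1.0 + D[0] * t + D[1] * t**2 + D[2] * t**3
    return t - numerator / denominator

exact = NormalDist()  # standard normal, mean 0 and standard deviation 1
grid = [0.5 + 0.499 * k / 1000 for k in range(1001)]
max_err = max(abs(quantile_as(a) - exact.inv_cdf(a)) for a in grid)
print(f"max abs error on [0.5, 0.999]: {max_err:.2e}")
```

The same loop, run with refitted coefficients in place of C and D, gives a direct comparison of the two sets of parameters on identical points.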

In the plots below the gray rectangle refers to the region where the approximate ICDF is used for extrapolation.

restart:

>	with(Statistics):

>	cdf := unapply(evalf(CDF(Normal(0, 1), x)), x): X := [seq(0..5, 0.1)]: A := cdf~(X): T := alpha -> sqrt(-2*log(1-alpha)): q := Quantile~(Normal(0, 1), A): Aq := convert([A,q], Matrix)^+:

r := 1:

J := z -> z - add(a__||k*z^k, k=0..r)/(1+add(b__||k*z^k, k=1..r+1)):

model := J(T(alpha)):
NL_fit := unapply(NonlinearFit(model, Aq, alpha), alpha);

# these lines are for estimating the performances
B := Sample(Uniform(0.5, 1), 10^4):
CodeTools:-Usage(Quantile~(Normal(0, 1), B)):
CodeTools:-Usage(Quantile~(Normal(0, 1), B, numeric)):
CodeTools:-Usage(NL_fit~(B)):
#-----------------------------------------------------
Y := [seq(0..6, 0.01)]:
B := cdf~(Y):
R1 := Quantile~(Normal(0, 1), B, numeric):
R2 := NL_fit~(B):

plots:-display(
ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),
plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

proc (alpha) options operator, arrow; (-2*ln(1-alpha))^(1/2)-(HFloat(2.5454311687345044)+HFloat(0.8058592540791468)*(-2*ln(1-alpha))^(1/2))/(1+HFloat(1.4689746699940707)*(-2*ln(1-alpha))^(1/2)-HFloat(0.34455942407858625)*ln(1-alpha)) end proc

memory used=170.31MiB, alloc change=76.01MiB, cpu time=3.06s, real time=3.05s, gc time=54.87ms

memory used=171.59MiB, alloc change=256.00MiB, cpu time=3.12s, real time=3.03s, gc time=154.77ms

memory used=8.24MiB, alloc change=0 bytes, cpu time=95.00ms, real time=95.00ms, gc time=0ns

r := 2:

J := z -> z - add(a__||k*z^k, k=0..r)/(1+add(b__||k*z^k, k=1..r+1)):

model := J(T(alpha)):
NL_fit := unapply(NonlinearFit(model, Aq, alpha), alpha);

# these lines are for estimating the performances
B := Sample(Uniform(0.5, 1), 10^4):
CodeTools:-Usage(Quantile~(Normal(0, 1), B)):
CodeTools:-Usage(Quantile~(Normal(0, 1), B, numeric)):
CodeTools:-Usage(NL_fit~(B)):
#-----------------------------------------------------

Y := [seq(0..6, 0.01)]:
B := cdf~(Y):
R1 := Quantile~(Normal(0, 1), B, numeric):
R2 := NL_fit~(B):

plots:-display(
ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),
plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

proc (alpha) options operator, arrow; (-2*ln(1-alpha))^(1/2)-(HFloat(2.9637294443959394)+HFloat(4.527738737327481)*(-2*ln(1-alpha))^(1/2)-HFloat(0.9571637188191973)*ln(1-alpha))/(1+HFloat(3.472400103322335)*(-2*ln(1-alpha))^(1/2)-HFloat(3.426536241250657)*ln(1-alpha)+HFloat(0.08875278252087411)*(-2*ln(1-alpha))^(3/2)) end proc

memory used=170.09MiB, alloc change=32.00MiB, cpu time=3.29s, real time=3.11s, gc time=268.60ms

memory used=170.85MiB, alloc change=0 bytes, cpu time=3.23s, real time=3.10s, gc time=201.52ms
memory used=10.76MiB, alloc change=0 bytes, cpu time=127.00ms, real time=127.00ms, gc time=0ns

# Optimized "r=2" computation

z_fit := simplify(subs(alpha=-exp(-(1/2)*z^2)+1, NL_fit(alpha))) assuming z > 0:
z_fit := unapply(convert~(%, horner), z);

p := proc(alpha)
  local z:
  z := sqrt(-2*log(1-alpha)):
  z_fit(z):
end proc:

R3 := CodeTools:-Usage(p~(B)):

plots:-display(
  ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),
  plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

memory used=1.67MiB, alloc change=0 bytes, cpu time=14.00ms, real time=15.00ms, gc time=0ns

AS stands for Abramowitz & Stegun

J_AS := unapply(normal(eval(J(t), [a__0=2.515517, a__1=0.802853, a__2=0.010328, b__1=1.432788, b__2=0.189269, b__3=0.001308])), t):
J_AS(t);

# for comparison:

print():
z_fit := simplify(subs(alpha=-exp(-(1/2)*z^2)+1, NL_fit(alpha))) assuming z > 0:
map(sort, %, z);

plot([z_fit(z), J_AS(z)], z=0.5..1, color=[blue, red], legend=[mmcdara, Abramowitz_Stegun], gridlines=true);

print():
R2_AS := CodeTools:-Usage(J_AS~(T~(B))):
print():

plots:-display(
  ScatterPlot(R1, log[10]~(abs~(R2_AS-~R1)), legend=Abramowitz_Stegun, gridlines=true, size=[700, 400]),
  ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red),
  plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6)
);

(0.1308000000e-2*t^4+.1892690000*t^3+1.422460000*t^2+.1971470000*t-2.515517000)/(0.1308000000e-2*t^3+.1892690000*t^2+1.432788000*t+1.)

(0.8875278252e-1*z^4+1.713268121*z^3+2.993818244*z^2-3.527738737*z-2.963729444)/(0.8875278252e-1*z^3+1.713268121*z^2+3.472400103*z+1.)

memory used=2.92MiB, alloc change=0 bytes, cpu time=25.00ms, real time=25.00ms, gc time=0ns

Download InverseNormalCDF.mw
