Carl Love

Carl Love

19692 Reputation

24 Badges

8 years, 47 days
Mt Laurel, New Jersey, United States
My name was formerly Carl Devore.

MaplePrimes Activity


These are Posts that have been published by Carl Love

When re-initializing a module (for example, while executing its ModuleApply), you'd likely want to re-initialize all of the remember tables and caches of its procedures. Unfortunately, forget(thismodule) (a direct self reference) doesn't work (I wonder why not?), and it doesn't even tell you that it didn't work. You could do forget(external name of module(an indirect self reference), but I consider that not robust because you'd need to change it if you change the external name. Even less robust is forget~([procedure 1, procedure 2, submodule 3, ...]). Here's something that works and is robust:

forget~([seq(x, x= thismodule)])[]

Or, using Maple 2019 or later syntax,

forget~([for local x in thismodule do x od])[]

Both of these handle submodules and ignore non-procedures. Both handle both locals and exports.

We need a check-off box for Maple Companion in the Products category of Question and Post headers.

While you're looking at that, there's also a bug that the Product indication gets stripped off when converting a Post to a Question, which is a common Moderator action.

I experienced a significant obstacle while trying to generate independent random samples with Statistics:-Sample on different nodes of a Grid multi-processing environment. After many hours of trial-and-error, I discovered an astonishing workaround, and I achieved excellent time and memory performance. Since this seems like a generally useful computation, I thought that it was worthy of a Post.

This Post is also worth reading to learn how to use Grid when you need to initialize a substantial environment on each node before using Grid:-Map or Grid:-Seq.

All remaining details are in the following worksheet.
 

How to use Statistics:-Sample in the `Grid` environment

Author: Carl Love <carl.j.love@gmail.com> 1 August 2019

 

I experienced a significant obstacle while trying to generate indenpendent random samples with Statistics:-Sample on the nodes of a multi-processor Grid (on a single computer). After several hours of trial-and-error, I discovered that two things are necessary to do this:

1. 

The random number generator needs to be seeded differently in each node. (The reason for this is easy to understand.)

2. 

The random variables generated by Statistics:-RandomVariable need to have different names in each node. This one is mind-boggling to me. Afterall, each node has its own kernel and so its own memory It's as if the names of random variables are stored in a disk file which all kernels access. And also the generator has been seeded differently in each node.

 

Once these things were done, the time and memory performance of the computation were excellent.

restart
:

Digits:= 15
:

#Specify the size of the computation:
(n1,n2,n3):= (100, 100, 1000):
# n1 = size of each random sample;
# n2 = number of samples in a batch;
# n3 = number of batches.

#
#Procedure to initialize needed globals on each node:
Init:= proc(n::posint)
local node:= Grid:-MyNode();
   #This is wrapped in parse so that it'll act globally. Otherwise, an environment
   #variable would be reset when this procedure ends.
   parse("Digits:= 15;", 'statement');

   randomize(randomize()+node); #Initialize independent RNG for this node.
   #If repeatability of results is desired, remove the inner randomize().

   (:-X,:-Y):= Array(1..n, 'datatype'= 'hfloat') $ 2;

   #Perhaps due to some oversight in the design of Statistics, it seems necessary that
   #r.v.s in different nodes **need different names** in order to be independent:
   N||node:= Statistics:-RandomVariable('Normal'(0,1));
   :-TRS:= (X::rtable)-> Statistics:-Sample(N||node, X);
   #To verify that different names are needed, change N||node to N in both lines.
   #Doing so, each node will generate identical samples!

   #Perform some computation. For the pedagogical purpose of this worksheet, all that
   #matters is that it's some numeric computation on some Arrays of random Samples.
   :-GG:= (X::Array, Y::Array)->
      evalhf(
         proc(X::Array, Y::Array, n::posint)
         local s, k, S:= 0, p:= 2*Pi;
            for k to n do
               s:= sin(p*X[k]);  
               S:= S + X[k]^2*cos(p*Y[k])/sqrt(2-sin(s)) + Y[k]^2*s
            od
         end proc
         (X, Y, n)
      )      
   ;
   #Perform a batch of the above computations, and somehow numerically consolidate the
   #results. Once again, pedagogically it doesn't matter how they're consolidated.  
   :-TRX1:= (n::posint)-> add(GG(TRS(X), TRS(Y)), 1..n);
   
   #It doesn't matter much what's returned. Returning `node` lets us verify that we're
   #actually running this on a grid.
   return node
end proc
:

The procedure Init above uses the :- syntax to set variables globally for each node. The variables set are X, Y, N||node, TRS, GG, and TRX1. Names constructed by concatenation, such as N||node, are always global, so :- isn't needed for those.

#
#Time the initialization:
st:= time[real]():
   #Send Init to each node, but don't run it yet:
   Grid:-Set(Init)
   ;
   #Run Init on each node:
   Nodes:= Grid:-Run(Init, [n1], 'wait');
time__init_Grid:= time[real]() - st;

Array(%id = 18446745861500764518)

1.109

The only purpose of array Nodes is that it lets us count the nodes, and it lets us verify that Grid:-MyNode() returned a different value on each node.

num_nodes:= numelems(Nodes);

8

#Time the actual execution:
st:= time[real]():
   R1:= [Grid:-Seq['tasksize'= iquo(n3, num_nodes)](TRX1(k), k= [n2 $ n3])]:
time__run_Grid:= time[real]() - st

4.440

#Just for comparison, run it sequentially:
st:= time[real]():
   Init(n1):
time__init_noGrid:= time[real]() - st;

st:= time[real]():
   R2:= [seq(TRX1(k), k= [n2 $ n3])]:
time__run_noGrid:= time[real]() - st;

0.16e-1

24.483

R1 and R2 will be different because different random numbers were used, but they should have similar histograms.

plots:-display(
   Statistics:-Histogram~(
      <R1 | R2>, #side-by-side plots
      'title'=~ <<"With Grid\n"> | <"Without Grid\n">>,
      'gridlines'= false
   )
);

(Plot output deleted because MaplePrimes cannot handle side-by-side plots!)

They look similar enough to me!

 

Let's try to quantify the benefit of using Grid:

speedup_factor:= time__run_noGrid / time__run_Grid;

5.36319824753560

Express that as a fraction of the theoretical maximum speedup:

efficiency:= speedup_factor / num_nodes;

.670399780941950

I think that that's really good!

 

The memory usage of this code is insignificant, which can be verified from an external memory monitor such as Winodws Task Manager. It's just a little bit more than that needed to start a kernel on each node. It's also possible to measure the memory usage programmatically. Doing so for a Grid:-Seq computation is a little bit beyond the scope of this worksheet.

 


 

Download GridRandSample.mw

Here are the histograms:

The procedure presented here does independence tests of a contingency table by four methods:

  1. Pearson's chi-squared (equivalent to Statistics:-ChiSquareIndependenceTest),
  2. Yates's continuity correction to Pearson's,
  3. G-chi-squared,
  4. Fisher's exact.

(All of these have Wikipedia pages. Links are given in the code below.) All computations are done in exact arithmetic. The coup de grace is Fisher's. The first three tests are relatively easy computations and give approximations to the p-value (the probability that the categories are independent), but Fisher's exact test, as its name says, computes it exactly. This requires the generation of all matrices of nonnegative integers that have the same row and column sums as the input matrix, and for each of these matrices computing the product of the factorials of its entries. So, there are relatively few implementations of it, and perhaps none that do it exactly. (Could some with access check Mathematica please?)

Our own Joe Riel's amazing and fast Iterator package makes this computation considerably easier and faster than it otherwise would've been, and I also found inspiration in his example of recursively counting contingency tables found at ?Iterator,BoundedComposition

ContingencyTableTests:= proc(
   O::Matrix(nonnegint), #contingency table of observed counts 
   {method::identical(Pearson, Yates, G, Fisher):= 'Pearson'}
)
description 
   "Returns p-value for Pearson's (w/ or w/o Yates's continuity correction)" 
   " or G chi-squared or Fisher's exact test."
   " All computations are done in exact arithmetic."
;
option
   author= "Carl Love <carl.j.love@gmail.com>, 27-Oct-2018",
   references= (                                                           #Ref #s:
      "https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test",         #*1
      "https://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity",  #*2
      "https://en.wikipedia.org/wiki/G-test",                               #*3
      "https://en.wikipedia.org/wiki/Fisher%27s_exact_test",                #*4
      "Eric W Weisstein \"Fisher's Exact Test\" _MathWorld_--A Wolfram web resource:"
      " http://mathworld.wolfram.com/FishersExactTest.html"                 #*5
   )
;
uses AT= ArrayTools, St= Statistics, It= Iterator;
local
   #column & row sums: 
   C:= AT:-AddAlongDimension(O,1), R:= AT:-AddAlongDimension(O,2),
   r:= numelems(R), c:= numelems(C), #counts of rows & columns
   n:= add(R), #number of observations
   #matrix of expected values under null hypothesis (independence):
   #(A 0 entry would mean a 0 row or column in the original, which is not allowed.)
   E:= Matrix((r,c), (i,j)-> R[i]*C[j], datatype= 'positive') / n,
   #Pearson's, Yates's, and G all use a chi-sq statistic, each computed by 
   #slightly different formulae.
   Chi2:= add@~table([
       'Pearson'= (O-> (O-E)^~2 /~ E),                     #see *1
       'Yates'= (O-> (abs~(O - E) -~ 1/2)^~2 /~ E),        #see *2
       'G'= (O-> 2*O*~map(x-> `if`(x=0, 0, ln(x)), O/~E))  #see *3
   ]), 
   row, #alternative rows generated for Fisher's
   Cutoff:= mul(O!~), #cut-off value for less likely matrices
   #Generate recursively all contingency tables whose row and column sums match O.
   #Compute their probabilities under independence. Sum probabilities of all those
   #at most as probable as O. (see *5, *4)
   #Parameters: 
   #   C = column sums remaining to be filled; 
   #   F = product of factorials of entries of contingency table being built;
   #   i = row to be chosen this iteration
   AllCTs:= (C, F, i)->
      if i = r then #Recursion ends; last row is C, the unused portion of column sums. 
         (P-> `if`(P >= Cutoff, 1/P, 0))(F*mul(C!~))
      else
         add(
            thisproc(C - row[], F*mul(row[]!~), i+1), 
            row= It:-BoundedComposition(C, R[i])
         )
      fi      
;
   userinfo(1, ContingencyTableTests, "Table of expected values:", print(E));
   if method = 'Fisher' then AllCTs(C, 1, 1)*mul(R!~)*mul(C!~)/n!
   else 1 - St:-CDF(ChiSquare((r-1)*(c-1)), Chi2[method](O)) 
   fi   
end proc:

The worksheet below contains the code above and one problem solved by the 4 methods


 

 

DrugTrial:= <
   20, 11, 19;
   4,  4,  17
>:

infolevel[ContingencyTableTests]:= 1:

ContingencyTableTests(DrugTrial, method= Pearson):  % = evalf(%);

ContingencyTableTests: Table of expected values:

Matrix(2, 3, {(1, 1) = 16, (1, 2) = 10, (1, 3) = 24, (2, 1) = 8, (2, 2) = 5, (2, 3) = 12})

exp(-257/80) = 0.4025584775e-1

#Compare with:
Statistics:-ChiSquareIndependenceTest(DrugTrial);

hypothesis = false, criticalvalue = HFloat(5.991464547107979), distribution = ChiSquare(2), pvalue = HFloat(0.04025584774823787), statistic = 6.425000000

infolevel[ContingencyTableTests]:= 0:
ContingencyTableTests(DrugTrial, method= Yates):  % = evalf(%);

exp(-1569/640) = 0.8615885805e-1

ContingencyTableTests(DrugTrial, method= G):  % = evalf(%);

exp(-20*ln(5/4)+4*ln(2)-11*ln(11/10)-4*ln(4/5)-19*ln(19/24)-17*ln(17/12)) = 0.3584139703e-1

CodeTools:-Usage(ContingencyTableTests(DrugTrial, method= Fisher)):  % = evalf(%);

memory used=0.82MiB, alloc change=0 bytes, cpu time=0ns, real time=5.00ms, gc time=0ns

747139720973921/15707451356376611 = 0.4756594205e-1

 


 

Download FishersExact.mw

The attached worksheet shows how to evaluate and graphically analyze an autonomous first-order nonlinear recurrence with two dependent variables and multiple symbolic parameters. 

This worksheet shows how a small module that simply encapsulates the given information of a problem combined with some use statements can greatly facilitate the organization of one's work, can encapsulate the setting of parameter values, and can allow one to work with symbolic parameters.

Edit: In the first version of this Post, I forgot to include the qualifier "autonomous".  The system being autonomous substantially simplifies its treatment.
 

Autonomous first-order nonlinear recurrences with parameters and multiple dependent variables

Author: Carl Love <carl.j.love@gmail.com> 20-Oct-2018

 

The techniques used in this worksheet can be applied to most autonomous first-order nonlinear recurrences with multiple dependent variables and parameters.

 

This worksheet shows how a small module that simply encapsulates the given information of a problem combined with some use statements

• 

can greatly facilitate the organization of one's work,

• 

can encapsulate the setting of parameter values,

• 

can allow one to work with symbolic parameters.

 

A Problem from MaplePrimes: A discrete Lottka-Volterra population model is applied to an isolated island with a population of predators (foxes), R, and prey (rabbits), K. [Note that R is the foxes, not the rabbits! Perhaps this problem statement originated in another language.] The change over one time period is given by

K[n+1]:= K[n]*(-b*R[n]+a+1);  R[n+1]:= R[n]*(b*e*K[n]-c+1),

where a, b, c, e are parameters of the model. In this problem we will use a= 0.15, b= 0.01, c= 0.02, e= 0.01, when numeric values are needed.

 

a) Show that there exists an equilibrium (values of K[n] and R[n] such that K[n+1] = K[n] and R[n+1] = R[n]).

 

b) Write Maple code that solves the recurrence numerically. Assume that if any population is less than 0.5 then it has gone extinct and set the value to 0. Check that your program is idempotent at the equilibrium.

 

restart:

We begin by collecting all the given information (except for specific numeric values) into a module. The ModuleApply lets the user set the numeric values later.

 

For all two-element vectors used in this worksheet, K is the first value and R is the second value.

KandR:= module()
local
   a, b, c, e, #parameters

   #procedure that lets user set parameter values:
   ModuleApply:= proc({
       a::algebraic:= KandR:-a, b::algebraic:= KandR:-b,
       c::algebraic:= KandR:-c, e::algebraic:= KandR:-e
   })
   local k;
      for k to _noptions do thismodule[lhs(_options[k])]:= rhs(_options[k]) od;
      return
   end proc,

   Extinct:= (x::realcons)-> `if`(x < 0.5, 0, x) #force small, insignificant values to 0
;
export
   #Procedure that does one symbolic iteration
   #(Note that this procedure uses Vector input and output.)
   iter_symb:= KR-> KR *~ <-b*KR[2]+a+1, b*e*KR[1]-c+1>, 

   #Such simple treatment as above is only possible for autonomous
   #recurrences.

  
   iter_num:= Extinct~@iter_symb #one numeric iteration
;
end module:

#The following expression is the discrete equivalent of the derivative (or gradient).
#It represents the change over one time period.
P:= <K,R>:  
OneStep:= KandR:-iter_symb(P) - P

Vector(2, {(1) = K*(-R*b+a+1)-K, (2) = R*(K*b*e-c+1)-R})

#An equilibrium occurs when the gradient is 0.
Eq:= <K__e, R__e>:
Eqs:= solve({seq(eval(OneStep=~ 0, [seq(P=~ Eq)]))}, [seq(Eq)]);

[[K__e = 0, R__e = 0], [K__e = c/(b*e), R__e = a/b]]

#We're only interested here in nonzero solutions.
EqSol:= remove(S-> 0 in rhs~(S), Eqs)[];

[K__e = c/(b*e), R__e = a/b]

#Set parameters:
KandR(a= 0.15, b= 0.01, c= 0.02, e= 0.01);

#Show idempotency at equilibrium:
use KandR in Eq0:= eval(Eq, EqSol); print(Eq0 = iter_num(Eq0)) end use:

(Vector(2, {(1) = 200.0000000, (2) = 15.00000000})) = (Vector(2, {(1) = 200.0000000, (2) = 15.00000000}))

#procedure that fills a Matrix with computed values of a 1st-order recurrence.
#(A more-efficient method than this can be used for linear recurrences.)
#This procedure has no dependence on the module.
Iterate:= proc(n::nonnegint, iter, init::Vector[column])
local M:= Matrix((n+1, numelems(init)), init^+, datatype= hfloat), i;
   for i to n do M[i+1,..]:= iter(M[i,..]) od;
   M
end proc:

We want to see what happens if the initial conditions deviate slightly from the equilibrium. It turns out that any deviation (as long as the
initial values are still nonnegative!) will cause the same effect. I simply chose the deviation <7,2> because it was the smallest for which

the plot clearly showed what happens using the scale that I wanted to show the plot at. By using a finer scale, it is possible to see the

"outward spiral" efffect from even the tiniest deviation.

dev:= <7,2>:
use KandR in KR:= Iterate(1000, iter_num, Eq0 + dev) end use:

plot(
   [
       KR, #trajectory of population
       KR[[1,1],..], #1st point
       KR[-[1,1],..], #last point,
       <Eq0|Eq0>^+, #equilibrium
       #every 100th point (helps show time scale):
       KR[100*[$1..iquo(numelems(KR[..,1]), 100)-1], ..]
   ],
   #This group of options are all lists, each element of which corresponds
   #to one of the above components of the plot:
   style= [line, point$4],
   symbol= [solidcircle$4, soliddiamond],
   symbolsize= [18$4, 12],
   color= [black, green, red, brown, blue],
   thickness= [0$5],
   legend= [`pop.`, init, final, equilibrium, `100 periods`],

   #This group of options are lists, each element of which corresponds to one
   #coordinate axis (horizontal, then vertical).
   view= [0..max(KR[..,1]), 0..max(KR(..,2))],
   labels= [rabbits, foxes],
   labeldirections= [horizontal,vertical],
   size= [700,700], #measured in pixels

   #options applied to whole plot:
   labelfont= [TIMES, BOLDITALIC, 14],
   title= "Population of foxes and rabbits over time" "\n", titlefont= [TIMES,16],
   caption=
      "\n" "Choosing an initial point near the equilibrium causes"
      "\n" "outward spiraling divergence." "\n",
   gridlines= false
);
 

A fieldplot helps show what happens for any starting values. An arrow is drawn from each of a 2-D grid of point. The magnitude and direction of the arrow show the gradient (as a vector) in this case.

plots:-fieldplot(
   rtable_eval(OneStep),
   K= 0..max(KR[..,1]),  R= 0..max(KR[..,2]), grid= [16,16],

   #arrow-specific options:
   anchor= tail, fieldstrength= log, arrows= slim, color= "DarkGreen",

   #other options (same as any 2D plot):
   labels= [rabbits, foxes], labeldirections= [horizontal,vertical],
   labelfont= [TIMES, BOLDITALIC, 14],
   title= "One-step population changes from any point" "\n", titlefont= [TIMES,16],
   caption= "\n" "All trajectories spiral outward from the equilibrium." "\n",
   size= [700,700],
   gridlines= false
);

The above plot is computed only from the symbolic discrete gradient expression OneStep; it does not use the computed population values from the first plot. It only uses the maxima of those computed values to determine the length of the axes.

 

Conclusion: While this is interesting stuff mathematically, and makes for great plots, divergence from the equilibrium doesn't seem realistic to me.

 


 

Download FoxesAndRabbits.mw

1 2 3 4 5 Page 1 of 5