Carl Love

Carl Love

23953 Reputation

25 Badges

9 years, 311 days
Natick, Massachusetts, United States
My name was formerly Carl Devore.

MaplePrimes Activity


These are Posts that have been published by Carl Love

About a half hour ago, there was a Post titled "Read binary file" from a new user. I converted it to a Question. Now that I want to Answer the Question, I can't find it. If you are the author of that Question, and you still want an answer, please post it again, but put it in the Questions area.

(I think that there may be a bug in MaplePrimes that makes this happen. I've had it happen several times before.)

Anyway, please respond to this regardless of what you want regarding the Question. That'll help me figure out what went wrong.

Yesterday, user @lcz , while responding as a third party to one of my Answers regarding GraphTheory, asked about breadth-first search. So, I decided to write a more-interesting example of it than the relatively simple example that was used in that Answer. I think that this is general enough to be worthy of a Post.

This application generates all maximal paths in a graph that begin with a given vertex. (I'm calling a path maximal if it cannot be extended and remain a path.) This code requires Maple 2019 or later and 1D input. This works for both directed and undirected graphs. Weights, if present. are ignored.

restart:

AllMaximalPaths:= proc(G::GRAPHLN, v)
description 
    "All maximal paths of G starting at v by breadth-first search"
;
option `Author: Carl Love <carl.j.love@gmail.com> 2021-Mar-17`;
uses GT= GraphTheory;
local 
    P:= [rtable([v])], R:= rtable(1..0),
    VL:= GT:-Vertices(G), V:= table(VL=~ [$1..nops(VL)]),
    Departures:= {op}~(GT:-Departures(G))
;
    while nops(P) <> 0 do
        P:= [
            for local p in P do
                local New:= Departures[V[p[-1]]] minus {seq}(p);
                if New={} then R,= [seq](p); next fi;                
                (
                    for local u in New do 
                        local p1:= rtable(p); p1,= u
                    od
                )       
            od
        ]
    od;
    {seq}(R)  
end proc
:
#large example:
GT:= GraphTheory:
K9:= GT:-CompleteGraph(9):
Pa:= CodeTools:-Usage(AllMaximalPaths(K9,1)):
memory used=212.56MiB, alloc change=32.00MiB, 
cpu time=937.00ms, real time=804.00ms, gc time=312.50ms

nops(Pa);
                             40320
#fun example:
P:= GT:-SpecialGraphs:-PetersenGraph():
Pa:= CodeTools:-Usage(AllMaximalPaths(P,1)):
memory used=0.52MiB, alloc change=0 bytes, 
cpu time=0ns, real time=3.00ms, gc time=0ns

nops(Pa);
                               72

Pa[..9]; #sample paths
    {[1, 2, 3, 4, 10, 9, 8, 5], [1, 2, 3, 7, 8, 9, 10, 6], 
      [1, 2, 9, 8, 7, 3, 4, 5], [1, 2, 9, 10, 4, 3, 7, 6], 
      [1, 5, 4, 3, 7, 8, 9, 2], [1, 5, 4, 10, 9, 8, 7, 6], 
      [1, 5, 8, 7, 3, 4, 10, 6], [1, 5, 8, 9, 10, 4, 3, 2], 
      [1, 6, 7, 3, 4, 10, 9, 2]}

Notes on the procedure:

The two dynamic data structures are

  • P: a list of vectors of vertices. Each vector contains a path which we'll attempt to extend.
  • R: a vector of lists of vertices. Each list is a maximal path to be returned.

The static data structures are

  • V: a table mapping vertices (which may be named) to their index numbers.
  • Departures: a list of sets of vertices whose kth set is the possible next vertices from vertex number k.

On each iteration of the outer loop, P is completely reconstructed because each of its entries, a path p, is either determined to be maximal or it's extended. The set New is the vertices that can be appended to the (connected to vertex p[-1]). If New is empty, then p is maximal, and it gets moved to R


The following code constructs an array plot of all the maximal paths in the Petersen graph. I can't post the array plot, but you can see it in the attached worksheet: BreadthFirst.mw

#Do an array plot of each path embedded in the graph:
n:= nops(Pa):
c:= 9: 
plots:-display(
    (PA:= rtable(
        (1..ceil(n/c), 1..c),
        (i,j)-> 
            if (local k:= (i-1)*ceil(n/c) + j) > n then 
                plot(axes= none)
            else 
                GT:-DrawGraph(
                    GT:-HighlightTrail(P, Pa[k], inplace= false), 
                    stylesheet= "legacy", title= typeset(Pa[k])
                )
            fi
    )),
    titlefont= [Times, Bold, 12]
);

#And recast that as an animation so that I can post it:
plots:-display(
    [seq](`$`~(plots:-display~(PA), 5)),
    insequence
); 

 

When re-initializing a module (for example, while executing its ModuleApply), you'd likely want to re-initialize all of the remember tables and caches of its procedures. Unfortunately, forget(thismodule) (a direct self reference) doesn't work (I wonder why not?), and it doesn't even tell you that it didn't work. You could do forget(external name of module(an indirect self reference), but I consider that not robust because you'd need to change it if you change the external name. Even less robust is forget~([procedure 1, procedure 2, submodule 3, ...]). Here's something that works and is robust:

forget~([seq(x, x= thismodule)])[]

Or, using Maple 2019 or later syntax,

forget~([for local x in thismodule do x od])[]

Both of these handle submodules and ignore non-procedures. Both handle both locals and exports.

We need a check-off box for Maple Companion in the Products category of Question and Post headers.

While you're looking at that, there's also a bug that the Product indication gets stripped off when converting a Post to a Question, which is a common Moderator action.

I experienced a significant obstacle while trying to generate independent random samples with Statistics:-Sample on different nodes of a Grid multi-processing environment. After many hours of trial-and-error, I discovered an astonishing workaround, and I achieved excellent time and memory performance. Since this seems like a generally useful computation, I thought that it was worthy of a Post.

This Post is also worth reading to learn how to use Grid when you need to initialize a substantial environment on each node before using Grid:-Map or Grid:-Seq.

All remaining details are in the following worksheet.
 

How to use Statistics:-Sample in the `Grid` environment

Author: Carl Love <carl.j.love@gmail.com> 1 August 2019

 

I experienced a significant obstacle while trying to generate indenpendent random samples with Statistics:-Sample on the nodes of a multi-processor Grid (on a single computer). After several hours of trial-and-error, I discovered that two things are necessary to do this:

1. 

The random number generator needs to be seeded differently in each node. (The reason for this is easy to understand.)

2. 

The random variables generated by Statistics:-RandomVariable need to have different names in each node. This one is mind-boggling to me. Afterall, each node has its own kernel and so its own memory It's as if the names of random variables are stored in a disk file which all kernels access. And also the generator has been seeded differently in each node.

 

Once these things were done, the time and memory performance of the computation were excellent.

restart
:

Digits:= 15
:

#Specify the size of the computation:
(n1,n2,n3):= (100, 100, 1000):
# n1 = size of each random sample;
# n2 = number of samples in a batch;
# n3 = number of batches.

#
#Procedure to initialize needed globals on each node:
Init:= proc(n::posint)
local node:= Grid:-MyNode();
   #This is wrapped in parse so that it'll act globally. Otherwise, an environment
   #variable would be reset when this procedure ends.
   parse("Digits:= 15;", 'statement');

   randomize(randomize()+node); #Initialize independent RNG for this node.
   #If repeatability of results is desired, remove the inner randomize().

   (:-X,:-Y):= Array(1..n, 'datatype'= 'hfloat') $ 2;

   #Perhaps due to some oversight in the design of Statistics, it seems necessary that
   #r.v.s in different nodes **need different names** in order to be independent:
   N||node:= Statistics:-RandomVariable('Normal'(0,1));
   :-TRS:= (X::rtable)-> Statistics:-Sample(N||node, X);
   #To verify that different names are needed, change N||node to N in both lines.
   #Doing so, each node will generate identical samples!

   #Perform some computation. For the pedagogical purpose of this worksheet, all that
   #matters is that it's some numeric computation on some Arrays of random Samples.
   :-GG:= (X::Array, Y::Array)->
      evalhf(
         proc(X::Array, Y::Array, n::posint)
         local s, k, S:= 0, p:= 2*Pi;
            for k to n do
               s:= sin(p*X[k]);  
               S:= S + X[k]^2*cos(p*Y[k])/sqrt(2-sin(s)) + Y[k]^2*s
            od
         end proc
         (X, Y, n)
      )      
   ;
   #Perform a batch of the above computations, and somehow numerically consolidate the
   #results. Once again, pedagogically it doesn't matter how they're consolidated.  
   :-TRX1:= (n::posint)-> add(GG(TRS(X), TRS(Y)), 1..n);
   
   #It doesn't matter much what's returned. Returning `node` lets us verify that we're
   #actually running this on a grid.
   return node
end proc
:

The procedure Init above uses the :- syntax to set variables globally for each node. The variables set are X, Y, N||node, TRS, GG, and TRX1. Names constructed by concatenation, such as N||node, are always global, so :- isn't needed for those.

#
#Time the initialization:
st:= time[real]():
   #Send Init to each node, but don't run it yet:
   Grid:-Set(Init)
   ;
   #Run Init on each node:
   Nodes:= Grid:-Run(Init, [n1], 'wait');
time__init_Grid:= time[real]() - st;

Array(%id = 18446745861500764518)

1.109

The only purpose of array Nodes is that it lets us count the nodes, and it lets us verify that Grid:-MyNode() returned a different value on each node.

num_nodes:= numelems(Nodes);

8

#Time the actual execution:
st:= time[real]():
   R1:= [Grid:-Seq['tasksize'= iquo(n3, num_nodes)](TRX1(k), k= [n2 $ n3])]:
time__run_Grid:= time[real]() - st

4.440

#Just for comparison, run it sequentially:
st:= time[real]():
   Init(n1):
time__init_noGrid:= time[real]() - st;

st:= time[real]():
   R2:= [seq(TRX1(k), k= [n2 $ n3])]:
time__run_noGrid:= time[real]() - st;

0.16e-1

24.483

R1 and R2 will be different because different random numbers were used, but they should have similar histograms.

plots:-display(
   Statistics:-Histogram~(
      <R1 | R2>, #side-by-side plots
      'title'=~ <<"With Grid\n"> | <"Without Grid\n">>,
      'gridlines'= false
   )
);

(Plot output deleted because MaplePrimes cannot handle side-by-side plots!)

They look similar enough to me!

 

Let's try to quantify the benefit of using Grid:

speedup_factor:= time__run_noGrid / time__run_Grid;

5.36319824753560

Express that as a fraction of the theoretical maximum speedup:

efficiency:= speedup_factor / num_nodes;

.670399780941950

I think that that's really good!

 

The memory usage of this code is insignificant, which can be verified from an external memory monitor such as Winodws Task Manager. It's just a little bit more than that needed to start a kernel on each node. It's also possible to measure the memory usage programmatically. Doing so for a Grid:-Seq computation is a little bit beyond the scope of this worksheet.

 


 

Download GridRandSample.mw

Here are the histograms:

1 2 3 4 5 Page 1 of 5