Grid computing toolbox on local multi-core server

michalkvasnicka — Mon, 22 Feb 2021 16:21:17 Z

Is possible to operate Grid Computing Toolbox together with Maple on one local multicore machine?

If yes, what is the proper instalation/configuration procedure on Windows or Linux?

Grid Toolbox C Compile

kfli — Tue, 31 Dec 2019 10:36:14 Z

Hello,

is it possible to send compiled Maple functions (C function) to the nodes with the Grid toolbox? I can run the non-compiled function on the nodes but when I try to send a compiled version of the same function I get segmentation error. I have run the compiled function just fine in other nonparallel codes, and also with the Threads package. So I was wondering if there is something else you need to do to get it to work? Maybe compile on each node?

How to use the Grid Computing Toolbox with Groebner?

emendes — Fri, 11 Nov 2016 20:54:45 Z

Hello

I have a procedure that builds an ideal from a specific set of polynomials and then calls the Groebner basis package to eliminate some of the variables. Even though the procedure is running on a machine with 2 processors, 24 cores and 72 GB of ram, only one core has been used (and is always on a 100% usage). Would the Grid Computing Toolbox be of some hope in this case? If so, how to insert the Grid commands so that Maple sends the calculations to the other cores (I find the document rather confusing)? If I am talking non sense, please let me know.

Many thanks

Grid Setup - HPC

rstellian — Wed, 10 Aug 2016 16:31:31 Z

Hi everyone,

For my first question, I am looking for some help about the following. I have the opportunity to run a worksheet in parallel on a cluster of sixteen workstations, each one endowed with twelve CPUs, through the GRID Computing Toolbox. However, I have troubles concerning how to do that.

I join the worksheet at issue: abmm_ma_1.mw The aim is to run large-scale numerical simulations of a dynamic system, depending on the values given to the initial conditions and to the parameters. The worksheet is organized in four execution groups:

The required packages (combinat and LinearAlgebra).
Calibration of the parameters and initial conditions.
The system, which is embedded into a procedure called SIM.
The activation of SIM, whose outputs are nine .mla files, each one being made of a real-number matrix.

The truth is, I do not clearly see how to modify the worksheet with some elements of the GRID package. Besides, the cluster operates under HTCondor so that running the worksheet requires beforehand the creation of a .sub file. This should be done in consistency with the aforesaid modification.

Any help is welcome, thanks a lot.

Money Ball with Maple: How to Optimize the 2013 Blue Jays Lineup So They Can Start Winning

Paul — Mon, 27 May 2013 18:45:33 Z

One of the most basic decisions a baseball manager has to make is how to arrange the batting order. There are many heuristics and models for estimating the productivity of a given order. My personal favourite is the use of simulation, but by far the most elegant solution from a mathematical perspective uses probability matrices and Markov chains. An excellent treatment of this topic can be found in Dr. Joel S. Sokol's article, An Intuitive Markov Chain Lesson From Baseball.

The basic idea is to define a set of states that correspond to the number of outs and the configuration of runners on base. There are 8 possible runner locations and 3 possible number of outs, giving a total of 8x3 = 24 states. With each state there is a corresponding probability of getting to that state. 24 states is good enough for a half inning of baseball. The model needs to be extended to cover 9 innings, plus a "game over" state. By iteratively applying transition matrices in the same order as the lineup, a game can be simulated to calculate the expected number of runs scored.

The full worksheet can be found here: BJMarkovBall.mw

Let's use one of the starting lineups the first quarter of the 2013 Blue Jays. We'll use the 2012 batting data to build the model:

data := Matrix( [ #HR 3B 2B 1B BB OUT Name [11, 3, 26, 95, 33, 359, Brett_Lawrie_3B], [11,10, 25, 113, 36, 400, Melky_Cabrera_DH], [27, 0, 14, 39, 59, 252, Jose_Bautista_RF], [42, 0, 24, 86, 84, 390, Edwin_Encarnacion_1B], [18, 0, 16, 47, 18, 266, JP_Arencibia_C], [ 8, 3, 24, 80, 29, 332, Rajai_Davis_LF], [23, 5, 21, 77, 47, 439, Colby_Rasmus_CF], [ 0+1, 0+4, 5+3, 11+55, 14+25, 69+181, Mark_DeRosa_2B_Emilio_Bonifacio_2B], [ 2, 0, 11, 61, 25, 215, Maicer_Izturis_SS] ]):#no Adam Lind

The code for setting up these calculations is in the code-region below.

# Markov Model
createmat := proc( stats )
     local i, j, total, h, t, d, s, w, o, smallmat, transmat; 
     if min(stats) < 0 then
       error "probabilities can not be negative";
     end if;

     # convert counts to probabilities
     total := add(i,i in stats);
     h := stats[1] / total;
     t := stats[2] / total;
     d := stats[3] / total;
     s := stats[4] / total;
     w := stats[5] / total;
     o := stats[6] / total;

     # this submatrix will appear for each inning and each number of outs -
     # it gives the probabilities of changing states when an out does not
     # occur.

     smallmat := Matrix(8,8,datatype=float[8]);
     smallmat(1,..) := ;
     smallmat(2,..) := ;
     smallmat(3,..) := ;
     smallmat(4,..) := ;
     smallmat(5,..) := ;
     smallmat(6,..) := ;
     smallmat(7,..) := ;
     smallmat(8,..) := ;

     transmat := Matrix( 9*24+1 , 9*24+1, datatype=float[8] );

     for i from 1 to 9*3 do
       transmat( (i-1)*8+1 .. (i-1)*8+8 , (i-1)*8+1 .. (i-1)*8+8 ) := smallmat;
     end do;

     # Now, when an out occurs and it's not the third out, just advance to
     # the same inning and same on-base state with one more out.

     for i from 1 to 9 do
       for j from 1 to 2 do
         transmat( (i-1)*24+(j-1)*8+1 .. (i-1)*24+(j-1)*8+8 , 
                   (i-1)*24+(j-1)*8+9 .. (i-1)*24+(j-1)*8+16 ) := 
                o . LinearAlgebra:-IdentityMatrix(8);
       end do;

       # In each inning, the third out goes to the next inning's
       # "0 out, 0 on base" state regardless of who was on base before.
    
       transmat( (i-1)*24+17 .. (i-1)*24+24 , i*24+1 ) := o . Matrix(8,1,fill=1);
     end do;

     # The final "game over" state can only go to itself.

     transmat( 9*24+1 , 9*24+1 ) := 1;

end proc:
    

playermatrices := Array(1..9):
for i from 1 to 9 do 
     playermatrices[i] := createmat(data[i,1..-2]);
end do:

smallmat := Matrix([
    [ 1, 0, 0, 0, 0, 0, 0, 0 ],
    [ 2, 1, 1, 1, 0, 0, 0, 0 ],
    [ 2, 1, 1, 1, 0, 0, 0, 0 ],
    [ 2, 1, 1, 1, 0, 0, 0, 0 ],
    [ 3, 2, 2, 2, 1, 1, 1, 0 ],
    [ 3, 2, 2, 2, 1, 1, 1, 0 ],
    [ 3, 2, 2, 2, 1, 1, 1, 0 ],
    [ 4, 3, 3, 3, 2, 2, 2, 1 ]
]):

createrunmatrix := proc( smalllmat )
    local i;
    local runmatrix := Matrix(9*24+1,9*24+1,datatype=float[8]);

    for i from 1 to 9*3 do
      runmatrix( (i-1)*8+1 .. (i-1)*8+8 , (i-1)*8+1 .. (i-1)*8+8 ) := smallmat;
    end do;
    runmatrix;
end proc:
runmatrix := createrunmatrix(smallmat):

calculate := proc( ord, playermatrices, runmatrix)
  uses ArrayTools;
  local M, situation, runs, batter;

  situation := Matrix( 1 , 9*24+1, datatype=float[8] );
  situation(1) := 1;

  runs := 0;
  batter := 0;

  while situation(9*24+1) < 0.99  do
      M := playermatrices[ord[batter+1]];
      runs := runs + add(i, i in situation . (runmatrix *~ M));
      situation := situation . M;

      # Get next batter
      batter := (batter + 1) mod 9;
  end do;

  runs;
end proc:

MaxWithIndex := proc( r ) 
    rtable_scanblock(r, [1 .. upperbound(r,1)], 
        (val, ind, res)->`if`(res[2] < val, [ind, val], res), [[1], r[1]]);
end proc:

MinWithIndex := proc( r ) 
    rtable_scanblock(r, [1 .. upperbound(r,1)], 
        (val, ind, res)->`if`(res[2] > val, [ind, val], res), [[1], r[1]]);
end proc:

Now we simply pick an order permutation, say, in the given order:

ord := [1,2,3,4,5,6,7,8,9]:

Then we can calculate the expected number of runs this lineup will generate.

runs := calculate(ord,playermatrices,runmatrix);

4.330150441827209

The permutation that has the 9th batter lead-off will, as expected, generate fewer runs (but not by much!).

ord := [9,2,3,4,5,6,7,8,1]: 
runs := calculate(ord,playermatrices,runmatrix);

4.320882424139622

But what is the best possible ordering? We can iterate through all possibilities. There are 9! permutations

combinat:-numbperm(9);

362880

Trying a smaller sample -- the first 100 permutations, we can get a sense of how long this will take. At the same time we'll store the result of each calculation in an array. The unrankperm command helps us iterate through all possible permutations.

Nperm := 100:
num_players := upperbound(playermatrices,1):
r := Array(1..Nperm,datatype=float[8]):
t := time[real]():
 for i from 1 to Nperm do
    ord := combinat:-unrankperm(i,num_players);
    r[i] := calculate(ord,playermatrices,runmatrix);
 od:
printf("serial time for %d permutations: %0.3f seconds\n",Nperm,time[real]()-t);

serial time for 100 permutations: 7.997 seconds

Before optimizing the code, the time to calculate 20 permutations was 12.044 seconds. That means it would have taken about 12.044*9!/(60*(20*60)) = 60.70176001 hours to iterate through all possibilities. Even with optimized serial code, I was looking at an 8 hour computation. Here's how the code looks after being restructured to split up the problem over many cpus.


calculateall := proc( playermatrices, runmatrix, { NP::posint := 0 }  )
    uses Grid, combinat;
    local r, me, a, b, span, ord, n, i, num_perm, num_players, file;
    global calculate;

    num_players := upperbound(playermatrices,1);
    num_perm := `if`( NP > 0, NP, numbperm(num_players));

    # decide which partition this node should work on
    me := MyNode();
    n := NumNodes();
    span := num_perm / n;
    a := trunc(me*span+1);
    b := trunc((me+1)*span);

    # calculate results
    r := Array(1..b-a+1,datatype=float[8]):
    printf("Node %d/%d handling permutations %d .. %d\n", me,n,a,b);
    for i from a to b do
        ord := combinat:-unrankperm(i,num_players);
        r[i-a+1] := calculate(ord,playermatrices,runmatrix);
    od:

    #uncomment to store results
    #file := cat("markovresult",me,".m");
    #save(r, file);

    # send results back to node 0
    if me = 0 then
        ArrayTools:-Concatenate( 2, r, seq(Receive(i),i=1..n-1) );
    else
        Send(0,r);
    end if;

end proc:

Here's a sample run using the parallel method.

t := time[real]():
r := Grid:-Launch(calculateall,playermatrices,runmatrix,NP=100,imports=[calculate],allexternal=false );
printf("parallel time for 100 permutations: %0.3f\n",time[real]()-t);


Node 1/4 handling permutations 26 .. 50
Node 2/4 handling permutations 51 .. 75
Node 3/4 handling permutations 76 .. 100
Node 0/4 handling permutations 1 .. 25
[1 .. 100 Array      ]
[Data Type: anything ]
[Storage: rectangular]
[Order: Fortran_order]
parallel time for 100 permutations: 5.375

This parallel method simply splits the calculations up evenly based on the number of nodes in the pool. It doesn't account for situations where it takes longer to process some of the permutations. Scaling this up to NP=1000, we observed a serial time of 71.7 seconds, compared to a parallel time of 35.4 seconds on an older 4-core box. It is my guess that memory contention severely limited the performance on a single computer, where 4 processors had to share access to the heap, and at 1000 permutations, about 5GB of memory is accessed -- "memory used" from a pool of 95MB per process. We'll look at performance over a distributed cluster below, after we analyse the results. Examining the result computed after the first set of iterations, we see a better predicted runs from the 26th permutation. The recommendation so far is to swap the #5 and #6 hitters and drop the 8th batter down to the 9th spot.

MaxWithIndex(r);

[[26], 4.332527095304076]

combinat:-unrankperm(26,9);

[1, 2, 3, 4, 6, 5, 7, 9, 8]

We need to run through the full computation to find the best results.

It turns out the best result is as follows:

best_perm := [[298945], 4.35010071731]:
ord := combinat:-unrankperm(best_perm[1][1],9);

[8, 4, 3, 2, 5, 1, 6, 7, 9]

best := data[ord];

This seems like a very surprising result until you show some further statistics -- each batters slugging average, on-base-percentage, and on-base-plus-slugging. The OPS column hints at how the new order was arrived at. The stand-out statistic is something called on-base-plus-slugging (OPS), which is a measure of a player's ability to both get on base, and get extra-base hits. It also turns out that the #1 spot in the lineup is not the most important one, rather #2, #3, and #4 are the key spots. And yes, even though Jose Bautista hit only .241 last year, he still deserves to be in one of the clean-up spots (at least in this Jays lineup).

show_stats := proc( lineup )
   local S, i, Hits, AB;
   S := Array(1..10,1..11):
   S[1,..] := :
   S[2..10,..] := lineup:
   for i from 1 to 9 do
       Hits := add(lineup[i,j],j=1..4);
       AB := Hits + lineup[i,6];
       S[i+1,8] := evalf(Hits/AB,3);
       S[i+1,9] := evalf((4*lineup[i,1]+3*lineup[i,2]+2*lineup[i,3]+lineup[i,4])/AB,3);
       S[i+1,10] := evalf((Hits+best[i,5])/AB,3):
       S[i+1,11] := S[i+1,8] + S[i+1,9];
   od:
   interface(rtablesize=15):
   S;
end proc:
show_stats( best );

A key thing to note is that changing the order of a lineup of major-leaguers does not make a huge impact on the expected run production (from a statistical point of view). Here we see the worst possible ordering will only lead to a drop in run production of 1 run every 5 or 6 games. The impact is much greater when putting together a beer-league lineup where the players have a much wider range of ability.


worst_perm := [[141719], 4.17480299108879]:

ord := combinat:-unrankperm(worst_perm[1][1],9);

[4, 6, 1, 8, 9, 7, 5, 2, 3]

show_stats(data[ord]);


ExpectedChangeInRunProduction := best_perm[2] - worst_perm[2];

0.175297726

This result was computed using 256 nodes on an Opteron-based computer in 192 seconds. A sample run of 100 permutations in serial took 12.79 seconds, leading to an estimate of 12.79*9!*(1/100) = 46412.35200 seconds for a full serial run. This is very close to the total parallel time -- 192*256 = 49152 seconds, indicating that the distribitued computation scaled linearly. But, instead of waiting almost 13 hours for that serial run to complete, the parallel answer was available in 3.2 minutes (using only 3% of the capacity of orca's 8320 cores).


Statistics:-ColumnGraph([46412, 192], 
   datasetlabels = ["serial", "parallel"], 
   labels = ["","time in seconds"]);

MaplePrimes - Grid Computing Posts and Questions

Grid computing toolbox on local multi-core server

Grid Toolbox C Compile

How to use the Grid Computing Toolbox with Groebner?

Grid Setup - HPC

Money Ball with Maple: How to Optimize the 2013 Blue Jays Lineup So They Can Start Winning