You wrote of transmission0 and reflection0, that, "all they contain are simple floats."
As described, however, the code is generating a great many so-called software floats. Maple will dispose of those, once no longer needed, during garbage collection. But that extra garbage collection adds to the total overhead. If you really just need the Matrices to store floating-point numbers, and if hardware double-precision is sufficient, then consider placing an appropriate datatype on the Matrices.
For example,
> transmission0:=Matrix(1200,2,datatype=float[8]):
and similarly for reflection0, re, and te.
You might also be able to fill the first column of both transmission0 and reflection0 more quickly, by pulling the assignments (into their first columns) out of the loop. Completely outside of the loop, create a Vector,
> firstcol := Vector(1200,(j)->0.0416666666*j,datatype=float[8]):
and also do, outside the loop,
> transmission0[1..1200,1] := firstcol:
> reflection0[1..1200,1] := firstcol:
I hope I got that right.
There might be more efficient ways to initialize the second column too, but it's not possible to say without knowing how rcwa writes to te and re depending on M and how M changes with m.
Also, if you made rcwa accept re and te as parameters (it should still be able to update them efficiently, inplace, if they had float[8] datatype), while also making rcwa return M instead of writing to it as a global, then perhaps you could then run rcwa under the faster purely floating-point evalhf interpreter. See the help-page ?evalhf for details.
acer

This functionality appears to be new to Maple 10. I didn't find it under the ?updates help-pages.
Previously, map() would return a new object that had no indexing function. And so it given a Matrix created with shape=identity it would return a new Matrix with no such indexing function and would do the mapping as requested.
It's clearly tricky to get this functionality just right, so that everyone's happy. For example one might not want the idiosyncratic behaviour of LinearAlgebra:-Map, which can act twice as it walks the elements of a Matrix with symmetric indexing function. And it seems that map is now aware of the symmetric indexing function, and only operates once per pair of symmetrically equal entries.
But is map aware of all built-in system indexing functions? For example, is this right below?
> m := Matrix(2,2,[[0,2],[-2,0]],shape=skewsymmetric,storage=rectangular);
[ 0 2]
m := [ ]
[-2 0]
> newm := map(x->x+1,m);
[ 0 3]
newm := [ ]
[-1 0]
> lprint(newm);
Matrix(2,2,{(1, 2) = 3, (2, 1) = -1},datatype = anything,storage = rectangular, order = Fortran_order,shape = [skewsymmetric])
And surely map will not know what to do with user-defined indexing functions. So it may not just be empty-storage indexing functions which give strange results.
Is it inconsistent for map to be clever about the symmetric indexing function but not about the identity?
acer

There are a number of possible efficiency improvements that might help you, but it depends on the answers to a few questions.
When you speak of a "larger data set" do you mean simply more iterations, or Matrices larger than 2x2? If this is really not a pared down version of your program, then there are considerable savings to be had. But I'll mention some things which are more generally true as well, just in case you don't intend to work with the 2x2 case.
I couldn't tell, about whether the code was producing what you really wanted. The Matrix p, for example, seems to stay the same at each iteration. Maybe it depends on whether you plan to change B and L.
First of all, it does look like there's a memory leak in high precision operations done on Matrices (rtables) with datatype=sfloat. Ie, software floating-point Matrices. One way to get around that would be to use linalg and lower-case matrix instead of LinearAlgebra and upper-case Matrix. Even if you don't try that, try not to mix and match the two, for example when you call linalg[exponential] on -B which is a Matrix. So try not to put convert() calls inside your loop in order to mix linalg and LinearAlgebra, as that's an expense which can be relieved by creating the appropriate object in the first place.
Second, a great deal of the computation time is taken up in function calls. Your loop is going to have 100000-300000 iterations, after all. So if you do use LinearAlgebra functions within your loop then make sure that you use, say, MatrixMatrixMultiply instead of `.`, since that is more direct, and saves a few function calls. (I think that `.` can call `rtable/Product` which can call LinearAlgebra:-Multiply which can call LinearAlgebra:-MatrixMatrixMultiply, or something like that.) And don't just use LinearAlgebra:-MatrixMatrixMultiply, but save yet another function call and use LinearAlgebra:-LA_Main:-MatrixMatrixMultiply, where all the optional parameters are specified precisely.
By the way, does it matter to you that your calls to stats[random], as written, will not produce numbers with more digits, even when you raise Digits? If it does matter to you, then you could replace,
uniformVariate := random[uniform]();
inside the loop as follows. Outside the loop, do,
randgen := random[uniform]('generator'[Digits]);
and then inside the loop do,
uniformVariate := randgen();
Now, let's assume that your 2x2 code is really the simple case that you want. You can simply "unroll" a lot of the matrix arithmetic.
Even if you don't just want the 2x2 case, observe that p could be created as a row Vector instead of a Matrix, and eps could be a column Vector. Using Vectors for those, then some of the calculations would naturally produce scalar values, and you could drop a lot of the [1,1] indexing of elements in the code.
But let's look at the objects more closely. The Matrix -B is diagonal, so MatrixExponential(-B,t) is just going to produce,
Matrix(2,2,[[exp(-B[1,1]*t),0],[0,[exp(-B[2,2]*t)]])
The MatrixExponential routine may recognize your Matrix as being diagonal, and produce this result without computing eigenvalues. But you can save more function calls by forming this result directly.
And of course the multiplications can be unrolled as elementwise sums of products. If the MatrixExponential result really does have zeroes in it, then quite a few of the terms in the sums will vanish. No need to make Maple multiply 0 times some software float a few hundreds of thousands of times.
By the way, with so much of the cost being in function calls, it's not even going to matter so much whether any external Matrix operations are done in hardware double precision or software extended precision.
Here's a version of the code, with everything in the loop done explicitly in scalar arithmetic. This does 100000 iterations in about six minutes on a fast machine. The resident memory stays below 20MB. There is some small leakage due to the operations on software float Matrices done outside the loop.
restart:
with(linalg):
with(LinearAlgebra):
with(stats):
Digits := 30;
randomize();
randgen := random[uniform]('generator'[Digits]);
lambda := 5.0;
c := 1.0;
Gamma := 0.0; #Fix Gamma for correlation decay slope
P := 1/2 + (1/2)*(sqrt( (c^2 - 1)/(c^2 + 1) ));
p := Vector[row](2,[P, (1-P)],datatype=float);
B := Matrix(2,2,[[lambda*2*P, 0],[0, lambda*2*(1-P)]],datatype=float);
MB := Matrix(-B,datatype=anything);
mb := convert(MB,matrix);
eps := Vector[column](2,1,datatype=float);
GDiag := Matrix(2,2,[[Gamma,0],[0,Gamma]],datatype=float);
V := MatrixInverse(B);
Y := (1-Gamma)*eps.p + GDiag;
L := MatrixMatrixMultiply(B,Y);
mean := (p.V.eps);
var := 2*(p.(V^2).eps) - ( (p.V.eps) )^2;
squaredCoeffOfVar := var/(mean^2);
numerator := (p.(V^2).eps) - ( (p.V.eps))^2 ;
var := 2*(p.(V^2).eps) - ( (p.V.eps) )^2;
lag1CorrCoeff := (numerator/var)*(Gamma^1);
####################################
TEMPMATRIX1 := MatrixExponential(MB,t).eps;
pNew := Vector[row](2,datatype=float);
temp1 := Matrix(2,2,datatype=float);
temp2 := Matrix(2,2,datatype=float);
for i from 1 to 10000 do
uniformVariate := randgen();
fun := p[1]*TEMPMATRIX1[1] + p[2]*TEMPMATRIX1[2] = (1.0 - uniformVariate);
nextIAT := fsolve(fun,t,0..infinity);
############# compute entrance vector for next event ###########
temp1[1,1],temp1[2,2] := exp(mb[1,1]*nextIAT),exp(mb[2,2]*nextIAT);
temp2[1,1],temp2[2,1],temp2[1,2],temp2[2,2] :=
temp1[1,1]*L[1,1], temp1[2,2]*L[2,1],
temp1[1,1]*L[1,2], temp1[2,2]*L[2,2];
pNew[1],pNew[2] := p[1]*temp2[1,1]+p[2]*temp2[2,1],
p[1]*temp2[1,2]+p[2]*temp2[2,2];
temp2 := pNew[1]*eps[1]+pNew[2]*eps[2];
p[1],p[2] := pNew[1]/temp2,pNew[2]/temp2;
if( is( (i mod 500)=0) )
then
printf("Processing arrival: %f \n",i);
fi;
end do:
Here's a version using the LinearAlgebra:-LA_Main routines inside the loop. This is faster than the original by several times, but still exhibits the memory leak in Maple 10 when Digits is greater than trunc(evalhf(Digits)). Look at this if you're going to be changing things to be larger than 2x2. Sorry if my indentation gets lost.
restart:
with(linalg):
with(LinearAlgebra):
with(stats):
Digits := 30;
randomize();
randgen := random[uniform]('generator'[Digits]);
lambda := 5.0;
c := 1.0;
Gamma := 0.0; #Fix Gamma for correlation decay slope
P := 1/2 + (1/2)*(sqrt( (c^2 - 1)/(c^2 + 1) ));
p := Vector[row](2,[P, (1-P)],datatype=float);
B := Matrix(2,2,[[lambda*2*P, 0],[0, lambda*2*(1-P)]],datatype=float);
MB := Matrix(-B,datatype=anything);
eps := Vector[column](2,1,datatype=float);
GDiag := Matrix(2,2,[[Gamma,0],[0,Gamma]],datatype=float);
V := MatrixInverse(B);
Y := (1-Gamma)*eps.p + GDiag;
L := MatrixMatrixMultiply(B,Y);
mean := (p.V.eps);
var := 2*(p.(V^2).eps) - ( (p.V.eps) )^2;
squaredCoeffOfVar := var/(mean^2);
numerator := (p.(V^2).eps) - ( (p.V.eps))^2 ;
var := 2*(p.(V^2).eps) - ( (p.V.eps) )^2;
lag1CorrCoeff := (numerator/var)*(Gamma^1);
####################################
TEMPMATRIX1 := MatrixExponential(MB,t).eps;
for i from 1 to 10000 do
uniformVariate := randgen();
fun := p[1]*TEMPMATRIX1[1] + p[2]*TEMPMATRIX1[2] = (1.0 - uniformVariate);
nextIAT := fsolve(fun,t,0..infinity);
############# compute entrance vector for next event ###########
pNew := LA_Main:-VectorMatrixMultiply(
p,
LA_Main:-MatrixMatrixMultiply(
LA_Main:-MatrixExponential(
MB,
nextIAT,
'outputoptions'=[]),
L,
'inplace'='false',
'outputoptions'=[]),
'outputoptions'=[]);
pNewNormalized := LA_Main:-VectorScalarMultiply(
pNew,
1/LA_Main:-DotProduct(
pNew,
eps,
NoUserValue,
'conjugate'='false'),
'inplace'='false',
'outputoptions'=[]);
p := pNewNormalized;
if( is( (i mod 500)=0) )
then
printf("Processing arrival: %f \n",i);
fi;
end do:
Here's a mix. This version has the explicit elementwise arithmetic within the loop, but without assumptions about the structure of B and L. Hence the sums of products have all their terms in them. Also, it uses linalg[exponential] inside the loop, since like elementwise access that too doesn't elicit the memory leak problem. It takes about ten minutes for 100000 iterations, and the resident memory use stays below 20MB. There is some small leakage due to the operations on software float Matrices done outside the loop.
restart:
with(linalg):
with(LinearAlgebra):
with(stats):
Digits := 30;
randomize();
randgen := random[uniform]('generator'[Digits]);
lambda := 5.0;
c := 1.0;
Gamma := 0.0; #Fix Gamma for correlation decay slope
P := 1/2 + (1/2)*(sqrt( (c^2 - 1)/(c^2 + 1) ));
p := Vector[row](2,[P, (1-P)],datatype=float);
B := Matrix(2,2,[[lambda*2*P, 0],[0, lambda*2*(1-P)]],datatype=float);
MB := Matrix(-B,datatype=anything);
mb := convert(MB,matrix);
eps := Vector[column](2,1,datatype=float);
GDiag := Matrix(2,2,[[Gamma,0],[0,Gamma]],datatype=float);
V := MatrixInverse(B);
Y := (1-Gamma)*eps.p + GDiag;
L := MatrixMatrixMultiply(B,Y);
mean := (p.V.eps);
var := 2*(p.(V^2).eps) - ( (p.V.eps) )^2;
squaredCoeffOfVar := var/(mean^2);
numerator := (p.(V^2).eps) - ( (p.V.eps))^2 ;
var := 2*(p.(V^2).eps) - ( (p.V.eps) )^2;
lag1CorrCoeff := (numerator/var)*(Gamma^1);
####################################
TEMPMATRIX1 := MatrixExponential(MB,t).eps;
pNew := Vector[row](2,datatype=float);
temp2 := Matrix(2,2,datatype=float);
for i from 1 to 10000 do
uniformVariate := randgen();
fun := p[1]*TEMPMATRIX1[1] + p[2]*TEMPMATRIX1[2] = (1.0 - uniformVariate);
nextIAT := fsolve(fun,t,0..infinity);
############# compute entrance vector for next event ###########
temp1 := exponential(mb,nextIAT);
temp2[1,1],temp2[2,1],temp2[1,2],temp2[2,2] :=
temp1[1,1]*L[1,1]+temp1[1,2]*L[2,1],
temp1[2,1]*L[1,1]+temp1[2,2]*L[2,1],
temp1[1,1]*L[1,2]+temp1[1,2]*L[2,2],
temp1[2,1]*L[1,2]+temp1[2,2]*L[2,2];
pNew[1],pNew[2] := p[1]*temp2[1,1]+p[2]*temp2[2,1],
p[1]*temp2[1,2]+p[2]*temp2[2,2];
temp2 := pNew[1]*eps[1]+pNew[2]*eps[2];
p[1],p[2] := pNew[1]/temp2,pNew[2]/temp2;
if( is( (i mod 500)=0) )
then
printf("Processing arrival: %f \n",i);
fi;
end do:
The original posted version took over 2.5 minutes to do only 10000 iterations instead of 100000, and the resident memory use had already gotten to 750MB (while Maple's incorrectly showed bytesused as 11MB).
I'll leave off here. Be sure to check my work, before relying on it. We all can make mistakes. Hopefully some of the comments are more generally useful as well.