Axel made some good points. I believe that his comment on Theta being uninitialized is mistaken, however. I too am not a big fan of 2Dmath for authoring maple programs. I hope that my conversion of to 1dmath input is correct. Here are some suggestions and comments. - Use float[8] an complex[8] datatypes, if possible. This helps avoid temporary memory allocation by pushing some of the computation out to compiled external code. - Initialize x, R, and Theta using the initializer available within the Matrix constructor. This is more efficient than initializing them after creation, with loops. - Vpq is enormous, requiring >800MB of memory with complex[8] datatype. This may not be avoidable. More on this at the end. - Rpq is as big as Vpq, but it is not needed. I understand that Vpq may be wanted, after the work is done. (It could be returned, with moments, instead of being a global.) But it seems that only two or three entries of Rpq are needed at any time within the inner part of the double-loop at the end of the code. So by getting rid of Rpq you might be able to save >800MB. - Use ArrayTools[Alias] instead of ArrayTools[Reshape]. The latter produces an unnecessary copy, while the former does not. - Use evalhf where possible. This gives you double-precision, and actually saves memory too. - In the double-loop to initialize pq, observe that the inner ("and") condition of q<=p can be dropped if the inner loop variable q is only allowed to go as high as p. - Use evalhf for all the computations done in the nasty, super expensive double loop at the end of the procedure. More on this later. - The moments Vector can be created by the muliplication. You don't need to create it, and then loop to assign to it. - Don't use A . B . C for efficient linear algebra. Make use of the LinearAlgebra commands directly. That saves at least one unnecessary copy, or perhaps two. Make sure you control these operations. Scale moments by coef, inplace after the muliplication. (The worst scenario would be scaling Vpq which is enormous by coef. Bad because it's so big, and horrible if it produces an enourmous scaled copy.) For the dim=200,pmax=70,qmax=70 case this code below took about 830MB and 3h20min on a fast P4. Please check that I didn't make mistakes in the translation. You can try running smaller examples against original and mine. If I did make mistakes then they can likely be corrected while remaining efficient. Most of my changes are in somewhat independent chunks. I tried to check that moments[1] stayed the same, but I didn't test Vpq results. zernike:=proc(dim,pmax,qmax,image) local i,j,y,kmax,k,r,t,Z,Ztot,coef,exponente,size,p,q,pq,s, H1,H2,H3,im,imtot,Theta,R,Image,Vpq,moments,x, Rpqis,Rpqis1,Rpqis2,bainit,st,oldDigits; st,bainit := time(),kernelopts(bytesalloc); # Raise Digits during initialization of x, R, and Theta. oldDigits,Digits := Digits,max(Digits,trunc(evalhf(Digits))); x:=Matrix(dim,1,(i)->evalf(((sqrt(2)*(i-1)))/((dim-1))-1/(sqrt(2))), datatype=float); y:=x; R:=Matrix(dim,dim,(j,i)->evalf(sqrt(x[i,1]^2+y[dim-j+1,1]^2)), datatype=float); Theta:=Matrix(dim,dim,(j,i)->evalf(arctan(y[dim-j+1,1],x[i,1])), datatype=float); Digits := oldDigits; if type(pmax,odd)=true then size:= 1/4*(pmax+1)*(pmax+3); pq:=Matrix(size,2,datatype=float); else size:=1/4*(pmax+2)^2; pq:= Matrix(size,2, datatype=float ); end if; userinfo(1,`zernike`,`stage 1`, time()-st, kernelopts(bytesalloc)-bainit ); i:=0; for p from 0 to pmax do for q from 0 to p do if type(p-abs(q),even)=true then i:= i+1; pq[i,1]:= p; pq[i,2]:= q; end if; end do; end do; userinfo(1,`zernike`,`stage 2`, time()-st, kernelopts(bytesalloc)-bainit ); R:=ArrayTools[Alias](R,[dim^2,1]); Theta:=ArrayTools[Alias](Theta,[dim^2,1]); Image:=ArrayTools[Alias](image,[dim^2]); Vpq:=Matrix(size,dim^2,datatype=complex(float)); userinfo(1,`zernike`,`stage 3`, time()-st, kernelopts(bytesalloc)-bainit ); Rpqis1,Rpqis2:=0.0,0.0; for s from size by -1 to 1 do for i to dim^2 do if pq[s,1]=pq[s,2] then Rpqis:=evalhf( R[i,1]^(pq[s,1]) ); elif pq[s,1]-pq[s,2]=2 then Rpqis:= evalhf( pq[s,1]*R[i,1]^(pq[s,1]) - (pq[s,1]-1)*R[i,1]^((pq[s,1]-2)) ); else H3:= evalhf( (-4*(pq[s+2,2]-2)*(pq[s+2,2]-3))/ ((pq[s,1]+pq[s+2,2]-2)*(pq[s,1]-pq[s+2,2]+4)) ): H2:= evalhf( (H3*(pq[s,1]+pq[s+2,2])*(pq[s,1]-pq[s+2,2]+2))/ (4*(pq[s+2,2]-1))+(pq[s+2,2]-2) ): H1:= evalhf( (pq[s+2,2]*(pq[s+2,2]-1))/2 -pq[s+2,2]*H2+ (H3*(pq[s,1]+pq[s+2,2]+2)*(pq[s,1]-pq[s+2,2]))/8 ): Rpqis:= evalhf( H1*Rpqis2+(H2+H3/(R[i,1]^2))*Rpqis1 ): Rpqis2,Rpqis1:=Rpqis1,Rpqis; end if; Vpq[s,i]:= evalhf( Rpqis*(exp(I*pq[s,2]*Theta[i,1])) ); end do; coef:=evalhf( ((2*pq[s,1]+2))/(Pi*((dim))^2) ); end do; userinfo(1,`zernike`,`stage 4`, time()-st, kernelopts(bytesalloc)-bainit ); moments := LinearAlgebra[MatrixVectorMultiply](Vpq,Image); LinearAlgebra[VectorScalarMultiply](moments,coef,inplace=true); userinfo(1,`zernike`,`stage 5`, time()-st, kernelopts(bytesalloc)-bainit ); return moments, Vpq; end proc: kernelopts(printbytes=false): #(dim,pmax,qmax):=200,70,70: (dim,pmax,qmax):=20,7,7; # Just for testing. mI := LinearAlgebra[RandomMatrix](dim,dim,outputoptions=[datatype=complex(float)]): # Let it be verbose, about how it's doing. infolevel[`zernike`]:=1: solmom,solVpq:=zernike(dim,pmax,qmax,mI): solmom[1]; Now, what can be said about this double loop, with something like 1200 and 10000 iterations at each level? That's where almost all of the 3hr20min of computation time goes. If it didn't have exp(I*pq[s,2]*Theta[i,1]) in it you could stick it inside its own procedure and run Compiler[Compile] against it. You still could do that, actually, by wrapping an eval() around that problematic line involving nonreal, complex computation. But then memory usage might go up high. I'm not sure how much garbage collection is possible when a Compiled routine runs. In order to try this, you would have to use ArrayTools[ComplexAsFloat] on Vpq. I was thinking something like this... first, get rid of the double loop at the end of zernike, and instead make that just one line like, say, coef := stage4_compiled(dim,size,pq,R,real_Vpq,Theta); And then, outside of zernike, have something like all this. st4_r := proc(x,y,z) local temp; temp := x*exp(I*y*z); return Re(temp); end proc: st4_i := proc(x,y,z) local temp; temp := x*exp(I*y*z); return Im(temp); end proc: stage4 := proc(dim::integer,size::integer,pq::Matrix(datatype=float[8]),R::Matrix(datatype=float[8]),Vpq::Matrix(datatype=float[8]),Theta::Matrix(datatype=float[8])) local Rpqis, Rpqis1, Rpqis2, H1, H2, H3, s, i, coef; global st4_r,st4_i; Rpqis1,Rpqis2:=0.0,0.0; for s from size by -1 to 1 do for i to dim^2 do if pq[s,1]=pq[s,2] then Rpqis:= R[i,1]^(pq[s,1]); elif pq[s,1]-pq[s,2]=2 then Rpqis:= pq[s,1]*R[i,1]^(pq[s,1]) - (pq[s,1]-1)*R[i,1]^((pq[s,1]-2)); else H3:= (-4*(pq[s+2,2]-2)*(pq[s+2,2]-3))/ ((pq[s,1]+pq[s+2,2]-2)*(pq[s,1]-pq[s+2,2]+4)): H2:= (H3*(pq[s,1]+pq[s+2,2])*(pq[s,1]-pq[s+2,2]+2))/ (4*(pq[s+2,2]-1))+(pq[s+2,2]-2): H1:= (pq[s+2,2]*(pq[s+2,2]-1))/2 -pq[s+2,2]*H2+ (H3*(pq[s,1]+pq[s+2,2]+2)*(pq[s,1]-pq[s+2,2]))/8: Rpqis:= H1*Rpqis2+(H2+H3/(R[i,1]^2))*Rpqis1: Rpqis2,Rpqis1:=Rpqis1,Rpqis; end if; Vpq[2*s-1,i]:= eval(st4_r(Rpqis,pq[s,2],Theta[i,1])); Vpq[2*s,i]:= eval(st4_i(Rpqis,pq[s,2],Theta[i,1])); end do; coef:= ((2*pq[s,1]+2))/(Pi*((dim))^2); end do; return coef; end proc: stage4_compiled:=Compiler[Compile](stage4); You'd almost certainly want to check that I hadn't done the wrong thing, with the assignments into Vpq above. You might also try to run CodeGeneration[C] on something like stage4, and replacing the exp call by a dummy unrecognized name. Then afterwards you could replace that dummy in the produced C souce with a call to cexp(). Then you could compile it into a dynamic library outside of Maple. And then replace all the stuff for it in the example by setting up a Maple procedure to point to the compiled external function using define_external(). acer