MaplePrimes - answers and comments on Question, Advice on profiling?

LinearAlgebra efficiency

Robert Israel — Thu, 10 Jan 2008 00:09:05 Z

LinearAlgebra does make extensive use of compiled external code, the NAG routines. In fact, this might be connected to the troubles you're having in profiling. See the help page "Efficient Numeric Linear Algebra Computation" (?EffNumLA) for pointers on how to make your code more efficient. In particular, it may make a significant difference if you use datatype=float[8].
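Here is a minimal sketch of the datatype=float[8] advice. The size n and the random entries are arbitrary choices, not taken from the thread. It builds two hardware-float Matrices with RandomMatrix, multiplies them, and reports the result's datatype with rtable_options. time() gives only a rough timing, and any speedup will depend on the Maple version and the operations involved.

```maple
# Sketch: hardware-float (float[8]) Matrices, so that LinearAlgebra can hand
# the product to its compiled external routines.
# n and the random entries are illustrative only.
with(LinearAlgebra):
n := 100:
A := RandomMatrix(n, n, generator = 0.0 .. 1.0,
                  outputoptions = [datatype = float[8]]):
B := RandomMatrix(n, n, generator = 0.0 .. 1.0,
                  outputoptions = [datatype = float[8]]):
t0 := time():
C := MatrixMatrixMultiply(A, B):
printf("float[8] product: %.3f s\n", time() - t0);
rtable_options(C, 'datatype');   # datatype of the result
```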

code sample

Dave L — Thu, 10 Jan 2008 00:39:36 Z

If you can post a portion of the code, that would give the best chance of useful advice.

Dave Linder
Mathematical Software, Maplesoft

Digits=14

DJKeenan — Thu, 10 Jan 2008 01:24:32 Z

Having Digits=14 does not guarantee that the computations are done with hardware arithmetic. The Digits value indicates how many digits of precision you would like in the result of a primitive computation. The computation itself might be done with, say, Digits+2 precision, in order to achieve Digits precision in the result. Some Maple routines increase Digits internally. I do not know whether that is the case with the LinearAlgebra routines you are using. To force hardware floats, I think you should set UseHardwareFloats := true and ensure that your Matrices and Vectors have their datatype set appropriately (e.g. float[8] or complex[8]). See ?EffNumLA for details. I do not have much experience with this; perhaps others can suggest more.
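Here is a small sketch of that suggestion. It assumes a Maple version in which the environment variable UseHardwareFloats accepts the value true. The Matrix and Vector are created with datatype=float[8], so the product should be carried out on hardware doubles whatever Digits is. The Hilbert-type entries are only illustrative. Note that assigning UseHardwareFloats at the top level affects the rest of the session.

```maple
# Sketch: ask for hardware floats explicitly and give the data a hardware
# datatype; Digits by itself only states the precision wanted in results.
UseHardwareFloats := true:
M := Matrix(3, 3, (i, j) -> 1.0/(i + j - 1), datatype = float[8]):
v := Vector(3, [1.0, 2.0, 3.0], datatype = float[8]):
w := LinearAlgebra:-MatrixVectorMultiply(M, v):
# inspect the datatypes of input and result
rtable_options(M, 'datatype'), rtable_options(w, 'datatype');
```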

Two example procedures...

quantum — Thu, 10 Jan 2008 03:58:32 Z

Thank you for your comments. I know that my question was/is rather general and therefore hard to answer. In principle I am aware of the following rules of thumb:
- use appropriate datatypes in the matrices, e.g. float[8] or complex[8]
- use programming-layer commands of the LinearAlgebra package, i.e. LA_Main:-...
- use cache/remember tables when suitable
- use inlining (typically not possible)

Below I'll show some sample procedures to give some idea...


#################################################################################
#################################################################################

Parametrize_SU_Euler := proc(N::posint, params)

#
# returns a NxN SU(N) unitary matrix as described by the given list of
# parameters (or the keyword "random").
# The procedure follows [Tilma, Sudarshan, J. Phys. A 35 (2002) 10467]
# (see Eq. (19))
#

options hfloat;

local lambda, used_lambdas, temp, i, j, m, l, k, A, alpha, X, param_list, param_ranges, U;

if params::list then
   #
   # if a list of parameters [alpha[1], ..., alpha[N^2-1]]
   # is given they are checked to be within the valid ranges.
   #
   param_ranges := evalf(Feynman_parameters("SU", "Euler angles", N));
   if not nops(param_ranges) = nops(params) then
      error(`\: incorrect number of parameters! Expected a list of`, nops(param_ranges),` parameters.`);
   end if;

   alpha := params;

elif params = "random" then
   #
   # if no explicit list of numerical parameters is given but
   # the key "random" then the necessary parameters are generated
   # randomly
   #
   X := Statistics[RandomVariable](Uniform(0,1000));
   alpha := convert(Statistics[Sample](X, N^2-1), list);


else
   error(`\: Either a list of numerical parameters is expected as second argument or the keyword "random".`)
end if;

#
# first, a list of the generalized Gell-Mann matrices is needed as
# a hermitian basis of the space of NxN matrices.
#
lambda := Hermitian_basis(N, "float");

#
# define auxiliary function j(m) as in the reference
#
#j := m -> piecewise(m=N, 0, sum(2*(m+l), l=0..N-m-1));
#
# actually the same but easier is the following
j := m -> (N+m-1)*(N-m);


#
# create a list of the lambda matrices that are actually used
# (in the order of later use, including multiple occurrences)
#
used_lambdas := Vector(N^2-1, datatype=integer):
i := 1;
for m from N to 2 by -1 do
   for k from 2 to m do
      used_lambdas[i]   := 3;
      used_lambdas[i+1] := (k-1)^2 + 1;
      i := i+2;
   end do:
end do:
for m from 2 to N do
   used_lambdas[i] := m^2-1;
   i := i+1;
end do:

temp := LinearAlgebra:-LA_Main:-MatrixScalarMultiply(
              lambda[used_lambdas[1]], I*alpha[1],
              inplace=false, outputoptions=[datatype=complex[8]]);
U := LinearAlgebra:-LA_Main:-MatrixFunction(
              temp, exp(dummy), dummy, outputoptions=[datatype=complex[8]]);

for k from 2 to op(1,used_lambdas) do
   temp := LinearAlgebra:-LA_Main:-MatrixScalarMultiply(
                 lambda[used_lambdas[k]], I*alpha[k],
                 inplace=false, outputoptions=[]);
   U := LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(
                 U, LinearAlgebra:-LA_Main:-MatrixFunction(
                             temp, exp(dummy), dummy, outputoptions=[]),
                             inplace=false, outputoptions=[]);
end do:

return U;

end proc:
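The used_lambdas index list built in Parametrize_SU_Euler depends only on N. That makes it a natural candidate for the cache/remember tables listed among the rules of thumb. The sketch below moves the same bookkeeping into its own procedure with option remember, so it is computed once per N. UsedLambdaIndices is a hypothetical name, not part of the original code. It returns a list rather than the Vector, because a remembered mutable Vector could be modified by a caller and silently corrupt the cache.

```maple
# Hypothetical helper: the used_lambdas bookkeeping from Parametrize_SU_Euler,
# cached per N via option remember.
UsedLambdaIndices := proc(N::posint)
    option remember;
    local idx, i, m, k;
    idx := Vector(N^2 - 1, datatype = integer);
    i := 1;
    for m from N to 2 by -1 do
        for k from 2 to m do
            idx[i]     := 3;
            idx[i + 1] := (k - 1)^2 + 1;
            i := i + 2;
        end do;
    end do;
    for m from 2 to N do
        idx[i] := m^2 - 1;
        i := i + 1;
    end do;
    # return an immutable list so the cached value cannot be altered
    convert(idx, list);
end proc:
```

Inside Parametrize_SU_Euler one would then write used_lambdas := UsedLambdaIndices(N). The loop bound op(1,used_lambdas) must then become nops(used_lambdas), because op(1, ...) of a list is its first element, not its length. The same reasoning may apply to Hermitian_basis(N, "float"), provided its result is never modified afterwards.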

#################################################################################
#################################################################################
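For a single factor of the product in Parametrize_SU_Euler, LinearAlgebra:-MatrixExponential can be used in place of MatrixFunction(temp, exp(dummy), dummy). The sketch below uses illustrative values rather than the thread's data: the Pauli matrix diag(1, -1) and alpha = 0.3. Whether this is faster than MatrixFunction here would have to be measured. Because the basis matrix is Hermitian, exp(I*alpha*lambda) is unitary, which gives a cheap correctness check.

```maple
# Sketch: one factor exp(I*alpha*lambda) computed with MatrixExponential.
# lambda3 and alpha are illustrative values, not from the original code.
lambda3 := Matrix([[1.0, 0.0], [0.0, -1.0]], datatype = float[8]):
alpha := 0.3:
T := LinearAlgebra:-MatrixScalarMultiply(lambda3, I*alpha,
        outputoptions = [datatype = complex[8]]):
E := LinearAlgebra:-MatrixExponential(T):
# unitarity check: E^H . E - Id should be close to the zero Matrix
err := LinearAlgebra:-Norm(LinearAlgebra:-HermitianTranspose(E) . E
                           - LinearAlgebra:-IdentityMatrix(2)):
```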


#################################################################################
#################################################################################


Partial_trace := proc(rho::'Matrix'(square), d::list(posint), trace_list::list(posint))

#
# returns the partial trace of a square (density) matrix with respect
# to one or more subsystems. 'd' is the list of the dimensions of each
# subspace, 'trace_list' is the list of the subsystem indices which are
# traced out.
# A matrix is returned
#

local rho_dim, i, j, k, d_in, d_out, in_list, out_list, U, reordered_rho,
      new_rho;

#
# check if dimension of the given matrix is compatible with the specified subspace
# dimensions
#
rho_dim := mul(i, i=d);   # product of the subsystem dimensions (i is local)
if not LinearAlgebra[RowDimension](rho) = rho_dim then
   error(`\: The given matrix does not match the specified subspace dimensions.`);
end if;

if max(op(trace_list)) > nops(d) then
   error(`\: One (or more) of the specified target subspaces is invalid.`);
end if;


#
# reorder the subsystems so that the surviving ones (in ascending order)
# come first and the one to be traced out are last
#
out_list, in_list := selectremove(has, [seq(1..nops(d))], trace_list);
d_in  := mul(d[i], i=in_list);  # dimension of the remaining subspace
d_out := mul(d[i], i=out_list); # dimension of the subspace to be traced out

if not [op(in_list), op(out_list)] = [seq(i,i=1..nops(d))] then
   # test the datatype of the Matrix itself, not the type of a single entry
   if rtable_options(rho, 'datatype') = complex[8] then
      U := Matrix(rho_dim, rho_dim, Permutation_matrix([op(in_list), op(out_list)], d), datatype=complex[8]);
      new_rho := Matrix(d_in, d_in, datatype=complex[8]);
   else
      U := Feynman_permutation_matrix([op(in_list), op(out_list)], d);
      new_rho := Matrix(d_in, d_in);
   end if;
   # reordered_rho := U . rho . U^H
   reordered_rho := LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(
                       LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(
                          U, rho, inplace=false, outputoptions=[]),
                       LinearAlgebra:-LA_Main:-HermitianTranspose(
                          U, inplace=false, outputoptions=[]),
                       inplace=false, outputoptions=[]);
else
   reordered_rho := rho;
   if rho::'Matrix'(complex[8]) then
      new_rho := Matrix(d_in, d_in, datatype=complex[8]);
   else
      new_rho := Matrix(d_in, d_in);
   end if;
end if;

for i from 1 to d_in do
   for j from 1 to d_in do
      new_rho[i,j] := add(reordered_rho[(i-1)*d_out+k, (j-1)*d_out+k], k=1..d_out);
   end do;
end do;


return(new_rho);

end proc:
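Once the traced-out subsystems come last, the final double loop in Partial_trace just sums diagonal blocks. The sketch below pulls that loop out on its own as TraceOutLast, a hypothetical helper that is not part of the original code. It can be checked on a product state: for rho = KroneckerProduct(A, B) with trace(B) = 1, tracing out the second factor should return A. The test matrices are illustrative.

```maple
# Hypothetical helper: trace out the LAST subsystem (dimension d_out) of a
# (d_in*d_out) x (d_in*d_out) Matrix, using the same index formula as the
# i,j loop in Partial_trace. The result keeps rho's datatype.
TraceOutLast := proc(rho::Matrix, d_in::posint, d_out::posint)
    local new_rho, i, j, k;
    new_rho := Matrix(d_in, d_in, datatype = rtable_options(rho, 'datatype'));
    for i to d_in do
        for j to d_in do
            new_rho[i, j] := add(rho[(i-1)*d_out + k, (j-1)*d_out + k],
                                 k = 1 .. d_out);
        end do;
    end do;
    new_rho;
end proc:
```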


#################################################################################
#################################################################################

Problems become apparent, e.g., when I combine several such procedures, often involving eigenvalues of matrices, in one higher-level command. When such a command is used with the Optimization package or the Global Optimization Toolbox, evalhf is typically not possible (as is the case for most commands involving matrices). In such scenarios some procedures are called very often, and that is when the bottlenecks really kick in...
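Since the question is about profiling, one simple way to locate such bottlenecks is to time a candidate procedure over many repeated calls, which is exactly the regime described above. TimeCalls below is a hypothetical helper that measures CPU time with time(). Reading kernelopts(bytesused) before and after the loop in the same way gives a rough idea of how much memory each call allocates.

```maple
# Hypothetical timing harness: call f with the same arguments reps times and
# return the average CPU time per call in seconds, as measured by time().
TimeCalls := proc(f, fargs::list, reps::posint)
    local t0, r;
    t0 := time();
    for r to reps do
        f(op(fargs));
    end do;
    (time() - t0)/reps;
end proc:
```

For example, with M a 16x16 float[8] Matrix, TimeCalls(LinearAlgebra:-Eigenvalues, [M], 100) estimates the cost of one eigenvalue computation. Comparing such numbers before and after a change shows whether the change actually helped.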

a little more

acer — Sat, 12 Jan 2008 08:50:30 Z

A little more. The call

LinearAlgebra:-LA_Main:-MatrixScalarMultiply(temp, I*alpha[1], inplace=true, outputoptions=[datatype=complex[8]]):

in the final loop could be replaced by

msm := LinearAlgebra:-LA_Main:-LA_External:-MatrixScalarMultiply:
msm(temp, I*alpha[1]):   # safer is msm(temp, evalf(I*alpha[1]))

where the assignment to `msm` is done right after that to `mmm`, before the loop begins.

Also, there is a call to LinearAlgebra:-LA_Main:-MatrixScalarMultiply inside `matexp`. This could be set up to call an external function directly, just as is done for addition, norm, etc. The following lines could be added in `matexp` in the relevant places, with ExtMSM declared as a new local:

ExtMSM := ExternalCalling:-DefineExternal('hw_f06jdf', extlib);
ExtMSM := ExternalCalling:-DefineExternal('sw_f06jdf', extlib);

Then the call in `matexp`

LinearAlgebra:-LA_Main:-MatrixScalarMultiply(a, M, 'inplace' = 'true', 'outputoptions' = []);

could be replaced by

ExtMSM(n*n, M, a, 1);

Together these give a further 10%-15% time saving over the original at size 16x16.

Keep in mind that all this deliberately bypasses a lot of sanity checks. The Matrices had better be of complex[8] datatype with full rectangular storage, or else it will crash and burn.

It's not just garbage collection that slows Maple down for these computations. It's also the cost and overhead of Maple function calls, some of which have been avoided in the code I've posted on this example. Maple being a general system capable of exact or arbitrary-precision floating-point computation brings with it the overhead of smart runtime selection of modes. Systems like Matlab don't necessarily have that sort of overhead, if they are primarily pure hardware double-precision engines.

There are alternative schemes for a general-purpose system like Maple. One possibility is on-the-fly generation of code tailored to just a single mode of computation (exact, hardware, arbitrary precision, etc.). Another is giving very low-level routines like the BLAS direct, individual interfaces at the user level.

acer
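This reply relies on two ideas: look a routine up once before the loop, and work in place on a preallocated complex[8] Matrix. Both can be sketched with documented LinearAlgebra commands instead of the internal LA_External and DefineExternal entry points. WeightedSum is a hypothetical example that forms alpha[1]*L[1] + ... + alpha[m]*L[m]. It keeps the sanity checks that the external calls above bypass, so it is unlikely to match their speed, and any gain should be measured.

```maple
# Hypothetical example: accumulate H = alpha[1]*L[1] + ... + alpha[m]*L[m]
# in a preallocated complex[8] workspace. The module export is looked up
# once, before the loop, and each update overwrites H in place.
WeightedSum := proc(L::list(Matrix), alpha::list)
    local madd, H, k, n;
    madd := LinearAlgebra:-MatrixAdd;          # hoisted lookup
    n := LinearAlgebra:-RowDimension(L[1]);
    H := Matrix(n, n, datatype = complex[8]);  # zero-filled workspace
    for k to nops(L) do
        # H := 1*H + alpha[k]*L[k], stored back into H
        madd(H, L[k], 1, evalf(alpha[k]), inplace = true);
    end do;
    H;
end proc:
```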

Perfect target for partial evaluation

JacquesC — Tue, 22 Jan 2008 06:53:07 Z

As I have mentioned before, this kind of inlining and optimization of Maple code can actually be automated via partial evaluation. Thanks for providing an additional test case.