Actually, I would be interested to hear whether there are any updates in Maple 12 (or future versions?) concerning the use of optimized external libraries. Moreover, I would appreciate some comments on the Windows equivalents of the Linux "hacks" above. Are there ways to make better use of the now widespread multi-core processors? Thanks for any comments...

Yes, I see the idea. This will be helpful when working with Matrices/Arrays. However, I seem to have some problems accessing an external Fortran library from Maple under Windows. Even simple examples like adding two complex numbers and returning the sum's absolute value seem to be difficult and require the 'WRAPPER' keyword in 'define_external'.

E.g. consider the Fortran code

```
DOUBLE PRECISION function f (a, b)
!DEC$ ATTRIBUTES DLLEXPORT :: f
COMPLEX*8 :: a, b
f = cabs(a+b)
return
end function f
```

which I compile to a DLL library.

Then I start Maple from the Visual C++ command prompt (to make sure all environment variables for the C compiler are set).

I try to access the function from within Maple as follows:

```
fortran_adder := define_external(
    'myadder', 'FORTRAN', 'WRAPPER',
    LIB = "C:\\shared\\myadder.dll",
    'a'::complex[8],
    'b'::complex[8],
    'RETURN'::float[8],
    INC_PATH = "C:\\Program Files\\Maple 12\\extern\\include");
```

Whatever I do (with or without the WRAPPER option), I get errors about Maple trying to load the C runtime library...

I must be missing something. With simple C/C++ examples it seemed to work better...

Thanks for pointing that out. I should have noticed that, and it is also documented... Although it would have been nice if the CodeGeneration command had respected the explicit definition of my argument and result types as complex[8] rather than complex[4]!

However, even after correcting that mistake, it seems one has to fiddle around with some wrapper generation, as discussed below.
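For reference, one way to make the types agree (a sketch, assuming the goal is double precision throughout: COMPLEX*16 on the Fortran side matches Maple's complex[8], whereas the original COMPLEX*8 corresponds to complex[4]):

```
! Double-precision version of the adder, matching complex[8]/float[8]
! in the define_external call above.
DOUBLE PRECISION function f (a, b)
!DEC$ ATTRIBUTES DLLEXPORT :: f
COMPLEX*16 :: a, b
f = abs(a+b)   ! generic abs; cabs is the single-precision intrinsic
return
end function f
```

The alternative would be to keep COMPLEX*8 in the Fortran source and declare the arguments as complex[4] in define_external instead.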

Yes, you're right. For personal use I also prefer to have code in text files. However, I was unaware of the $include or macro facility, so thanks for that hint.

Concerning the Digits variable, I will have to live with the fact that I cannot change it when loading my package. It would have been nice, but it's not a tragedy...
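For readers who, like me, had missed the preprocessor: Maple's $include and $define directives let one keep a package in several text files and assemble them when the source is read. A minimal sketch (file names are made up; directives must start at the beginning of a line):

```
# main.mpl -- load with:  read "main.mpl";
$define PKG_VERSION "1.0"
MyPackage := module()
    option package;
    export f;
    # the preprocessor splices the file in textually at this point
$include "f_implementation.mpl"
end module:
```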

OK, thanks for clarifying. As you mentioned before, it might anyway be more worthwhile to identify the real Maple bottlenecks in one's code.

It is interesting to know that 32-bit vs. 64-bit really makes such a difference. I actually didn't expect that...

I read your note www.mapleprimes.com/blog/dave-l/blas-in-maple#comment-7274

and I already use OMP_NUM_THREADS=2. But there you also mention the 'unofficial' upgrade of the BLAS (under Linux). Is there a special reason why this should not be possible under Windows? If not, how could I use that BLAS as the standard library in Maple? By the way, thanks for all these nice insider tricks! :-)
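For completeness, on Windows the environment variable has to be set before Maple starts, e.g. in the same command prompt used to launch it (a sketch; the installation path is illustrative):

```
rem Windows command prompt, before launching Maple
set OMP_NUM_THREADS=2
"C:\Program Files\Maple 12\bin.win\maplew.exe"
```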

Just out of curiosity, why is the above code so much slower on my laptop (Core2 2.33 GHz, 2 GB RAM, 32-bit WinXP)?

```
restart;
Digits := trunc(evalhf(Digits));
with(LinearAlgebra):
TotalTime := 0:
for i from 1 to 100 do
    a := RandomMatrix(1500, density = 0.5, generator = 0.0 .. 1.0,
                      outputoptions = [datatype = float[8]]):
    t := time():
    #Map(erf, a):
    map[evalhf, inplace](erf, a):
    TotalTime := TotalTime + time() - t:
end do:
print(TotalTime);
```

yields 49.266 sec. As the processors should be roughly comparable, I wonder whether it's the 64-bit version that gives you a speed-up of more than a factor of 2!?

I also wanted to try this on my WinXP installation. When adapting the path LIB="c:/.../libgsl.dll" to my copy of the GSL library, it looks like Maple finds the "libgsl.dll". However, it returned an error about not being able to find the "libgslcblas.dll", which is present in the same directory. Could somebody give a hint as to what I'm doing wrong here? Thanks!

EDIT: OK, sorry... I just realized that the location of libgslcblas.dll has to be in the system PATH variable, or the worksheet and libgslcblas.dll have to be in the same directory.
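In other words, the Windows loader must be able to find the dependent DLL, e.g. by prepending its directory to PATH before starting Maple (a sketch; the directory is illustrative):

```
rem Make libgslcblas.dll visible to the Windows DLL loader
set PATH=C:\shared\gsl\bin;%PATH%
```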

As I am one of those people who have profited tremendously from Acer's experience and his forum replies I think this award is really justified. Acer's posts often show techniques that are of practical use and also help to gain a better general understanding of how Maple works and how to make efficient use of this great tool.
So, thank you Acer and keep it up! :-)

This sounds quite interesting, but also complicated. It seems one really has to dive deep into Maple's guts...
Do you have some estimate of how much time this would save? I mean, there should be a balance between readability/maintainability of the code on the one hand and possible performance gains on the other.
Whatever the answer may be, the measures you explained above were already very interesting and helpful (also for understanding better the (performance) differences between Maple and Matlab).

By using your code (including the msm and ExtMSM tweaks above), I ran the whole Parametrize_SU_Euler procedure again, now using the optimized code. Unfortunately, even "only" 1000 8x8 matrices take 44 sec (where Matlab needs 11.76 sec for 10,000(!) 8x8 matrices). The profiling below shows that the main chunk is of course taken by the newly optimized matrix_exp() function, which was called 63,000 times.
So I guess the (relatively) poor performance is due to the many procedure calls, and I cannot do anything further about that, right?
```
Parametrize_SU_Euler := proc(N::posint, params)
local lambda, used_lambdas, temp, i, j, m, l, k, A, alpha, X, param_list, param_ranges, U, a, temp1, temp2, s, RESMAT, workuaf, onevec, PROD, mmm, msm, old_hardware_floats;
|Calls Seconds Words|
PROC | 1000 44.001 297621258|
1 | 1000 0.000 1836| if params::list then
2 | 0 0.000 0| param_ranges := evalf(Feynman_parameters("SU","Euler angles",N));
3 | 0 0.000 0| if not nops(param_ranges) = nops(params) then
4 | 0 0.000 0| error `: incorrect number of parameters! Expected a list of`, nops(param_ranges), ` parameters.`
end if;
5 | 0 0.000 0| alpha := params
elif params = "random" then
6 | 1000 0.921 681478| X := Statistics[RandomVariable](Uniform(0,1000));
7 | 1000 0.609 5121203| alpha := convert(Statistics[Sample](X,N^2-1),list)
else
8 | 0 0.000 0| error `: Either a list of numerical parameters is expected as second argument or the keyword "random".`
end if;
9 | 1000 0.078 303927| lambda := Feynman_hermitian_basis(N,"float");
10 | 1000 0.000 31963| j := m -> (N+m-1)*(N-m);
11 | 1000 0.031 749337| used_lambdas := Vector(N^2-1,datatype = integer);
12 | 1000 0.000 0| i := 1;
13 | 1000 0.000 0| for m from N by -1 to 2 do
14 | 7000 0.032 0| for k from 2 to m do
15 |28000 0.080 232101| used_lambdas[i] := 3;
16 |28000 0.062 242550| used_lambdas[i+1] := (k-1)^2+1;
17 |28000 0.000 0| i := i+2
end do
end do;
18 | 1000 0.000 0| for m from 2 to N do
19 | 7000 0.000 113343| used_lambdas[i] := m^2-1;
20 | 7000 0.000 0| i := i+1
end do;
21 | 1000 0.047 1020696| a := Matrix(N,N,datatype = ('complex')('float'));
22 | 1000 0.079 821553| temp1 := Matrix(N,N,datatype = ('complex')('float'));
23 | 1000 0.031 820575| temp2 := Matrix(N,N,datatype = ('complex')('float'));
24 | 1000 0.000 822536| s := Matrix(N,N,datatype = ('complex')('float'));
25 | 1000 0.062 818949| RESMAT := Matrix(N,N,datatype = ('complex')('float'));
26 | 1000 0.062 402396| workuaf := Vector[row](N,('datatype') = ('float'));
27 | 1000 0.079 513260| onevec := Vector(N,('fill') = 1.0,('datatype') = ('complex')('float'));
28 | 1000 0.047 820535| PROD := Matrix(N,N,datatype = ('complex')('float'));
29 | 1000 0.406 3058690| U := LinearAlgebra:-IdentityMatrix(N,N,compact = false,outputoptions = [datatype = complex[8]]);
30 | 1000 0.030 862553| temp := Matrix(N,N,datatype = complex[8]);
31 | 1000 0.015 2097| kernelopts(opaquemodules = false);
32 | 1000 0.032 4258| mmm := LinearAlgebra:-LA_Main:-LA_External:-MatrixMatrixMultiply;
33 | 1000 0.000 1300| msm := LinearAlgebra:-LA_Main:-LA_External:-MatrixScalarMultiply;
34 | 1000 0.000 0| old_hardware_floats := UseHardwareFloats;
35 | 1000 0.000 14923| UseHardwareFloats := true;
36 | 1000 0.157 54| for k to op(1,used_lambdas) do
37 |63000 0.278 981181| ArrayTools:-Copy(N*N,lambda[used_lambdas[k]],0,1,temp,0,1);
38 |63000 2.325 8443036| msm(temp,I*alpha[k]);
39 |63000 36.270 248603941| matrix_exp(temp,a,temp1,temp2,s,workuaf,onevec,RESMAT);
40 |63000 2.096 21058532| mmm(U,RESMAT,PROD,('inplace') = false);
41 |63000 0.172 1072455| ArrayTools:-Copy(N*N,PROD,0,1,U,0,1)
end do;
42 | 1000 0.000 0| UseHardwareFloats := old_hardware_floats;
43 | 1000 0.000 0| return U
end proc
```