Hello,
I would like to use Maple in Statistics, but I have a problem with a speed of base functions, like generation, Mean, Range and Standard deviation. Maple computes slowly than MathCad.
I have 4 test functions:
Test of generation speed
> restart:
>
> Digits :=10:
>
> test3:=proc(N,n)
> local i,a,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> for i from 1 to N do
> a:=Sample(X,n):
> end do:
> end use;
> end proc:
> time(test3(30000,100));
Test of Mean speed
> restart:
> Digits :=10;
>
> test4:=proc(N,n)
> local i,a,b,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> a:=Sample(X,n):
> for i from 1 to N do
> b:=Mean(a)
> end do:
> end use;
> end proc:
> time(test4(30000,100));
Test of StandardDeviation speed
> restart:
> Digits :=10;
>
> test5:=proc(N,n)
> local i,a,b,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> a:=Sample(X,n):
> for i from 1 to N do
> b:=StandardDeviation(a)
> end do:
> end use;
> end proc:
> time(test5(30000,100));
Test of Range speed
> restart:
> Digits :=10;
>
> test6:=proc(N,n)
> local i,a,b,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> a:=Sample(X,n):
> for i from 1 to N do
> b:=Range(a)
> end do:
> end use;
> end proc:
> time(test6(30000,100));
Times for N=30,000 , n=100
Maple - Mathcad
63,2 - 3,20
13,3 - 0,03
13,9 - 0,33
19,5 - 0,14
I think, that I do something bad. Could you help me please?
Thank you
Vladimir
Comments
Speed of dynamic argument checking in an interpreter
If you trace through what all Range does (for example), you'll see that it spends well over 90% of its time parsing its arguments, dispatching to the right routines (dynamically), setting up complicated options, before finally doing the computation. The other commands are likewise.
Basically you are feeling the cost of implementing a very feature-full package (like Statistics) in a dynamically-typed, interpreted language like Maple, where the implementation is done using a lot of (clever!) dynamic dispatch techniques and generic programming ideas.
MathCad is faster because it has fewer features, and those features are implemented in a compiled language. That means that, in the long run, Maple will gain more features faster than MathCad, but is likely to remain a factor of between 50 and 200 times slower on computations like the above. Unless of course those routines are re-implemented in the Maple kernel, at which time they will be competitive with MathCad's, but the development cost will be quite high (i.e. unlikely to happen soon, IMHO).
How should I trace it?
Hello,
your answer is very interesting for me. How should I trace the procedure to see what you say?
Is the following code the most usefull?
with(CodeTools[Profiling]):
Profile(test3):
PrintProfiles();
Well, How should I use Maple to solve statistics problem?
Thank you
Vladimir
Static and Dynamic trace
You should read my blog post on old timer techniques.
You can use printlevel or CodeTools[Profile]() to get results about absolutely everything (dynamic trace). This can be a lot of data.
Or you can look at the code. You'll need to dig around some, but these were what I did:
> with(Statistics): interface(verboseproc=3): kernelopts(opaquemodules=false):
> showstat(Range);
> showstat(Statistics:-Quantities:-GetValue);
> showstat(Statistics:-Quantities:-GetValueTab[DescriptiveStatistics]);
which was enough to show me all the overhead incurred. One can continue to dig too.
Why use kernelopts(opaquemodules=false): ?
Hello,
I forgot that I use verboseproc=3 to show code of procedure.
e.g.
> interface(verboseproc=3):
> print(dsolve);
I have already used showstat too.
But why should I use kernelopts(opaquemodules=false): ?
Thanks for a link to old timer techniques, too. It is very useful :o)
Vladimir
Most of Statistics is hidden
Some of the things inside Statistics are local variables of the respective modules, and you can't see their values unless you use kernelopts(opaquemodules=false).
Modules being opaque by default is very good from a software engineering point of view, as you do want code to be as opaque as possible (information hiding and all that good stuff). But when you are debugging, all these things seriously get in the way. In this respect, Maple strikes a really nice balance.
Thanks
Thanks a lot. It is very usefull for me.
Vladimir
One possibility is to
One possibility is to perform all of the the preprocessing once. This should make things a bit faster. Consider:
restart:
Digits :=10:
test3:=proc(N,n)
local i,a,X;
use Statistics in
X := RandomVariable(Normal(0,1)):
for i from 1 to N do
a:=Sample(X,n):
end do:
end use;
end proc:
time(test3(30000,100));
12.733
Note, that the call to Sample below creates a procedure for generating random samples drawn from the normal distribution. So it is not longer necessary to do the parameter processing over and over again.
restart:
Digits :=10:
test3:=proc(N,n)
local i,a,X, S;
use Statistics in
X := RandomVariable(Normal(0,1)): S := Sample(X);
for i from 1 to N do
a:=S(n):
end do:
end use;
end proc:
time(test3(30000,100));
1.446
Of course, it is even more efficient to do the simulation once.
restart:
Digits :=10:
test3:=proc(N,n)
local i,a,X, S;
use Statistics in
X := RandomVariable(Normal(0,1)): S := Sample(X);
a := ArrayTools:-Alias(S(N*n), [1..N, 1..n]);
end use;
end proc:
time(test3(30000,100));
0.268
restart:
Digits :=10;
test4:=proc(N,n)
local i,a,b,X;
use Statistics in
X := RandomVariable(Normal(0,1)):
a:=Sample(X,n):
for i from 1 to N do
b:=Mean(a)
end do:
end use;
end proc:
time(test4(30000,100));
10
7.126
Again, we can create a procedure for computing the mean of a data sample.
restart:
Digits :=10;
test4:=proc(N,n)
local i,a,b,X,M;
use Statistics in
X := RandomVariable(Normal(0,1)):
a:=Sample(X,n): M := MakeProcedure(Mean, sample);
for i from 1 to N do
b:=M(a);
end do:
end use;
end proc:
time(test4(30000,100));
10
0.277
Sample all values at once...
test3a :=proc(N,n) local i,a,X,A; use Statistics in X := RandomVariable(Normal(0,1)): A := ArrayTools:-Reshape(Sample(X,n*N),N,n); for i from 1 to N do a := A[i,1..-1]; end do: end use; end proc:This is an order of magnitude faster.I have read it.
Hello,
I have read your comments and thank you for it. But I posted this for my friend. I will try to rewrite the procedures. I thought that the slow is because of Statistics package.
Vladimir
apologies
Hi Vladimir,
I apologize for saying you weren't paying attention. Actually, considering that it took me a few minutes to recall what I had previously posted, I wouldn't be surprised if you had forgotten it. Note that, as Jacques points out, the Maple Statistics package is not going to be the fastest gun around. That being said, there are ways to make things go faster, so we need to learn how to utilize them. By creating all the samples at once, you only have to pay the overhead of invoking the Sample procedure one time.
What does it mean A[i,1..-1]?
Hi,
I do not know, what does A[i,1..-1] mean? Why 1..-1?
Thanks
Vladimir
negative indices in Matrix
In this case it is equivalent to A[i,1..n]. It is a convention used in rtable (Vectors, Matrices) for accessing all the elements of a row/column. You can also access a partial row in the same way: A[1,2..-2]. The -1 means the last value of the index, -2 the previous to last index, etc. This doesn't necessarily work in Arrays, which permit actual negative indices.
More efficiency.
Thank you for your help. If I understand well, I am not able to rewrite test4, test5 and tes6 procedure to get faster because I call the Sample(...) only once. Well, procedures of Statistics are very slow and then my procedures are slow as well. Is it right?
Thank you
Vladimir
memory efficiency
Keeping the total memory allocation down is often an important part of efficiency. But, also, keeping the memory usage (garbage production and cleanup) down can also leads to time savings.
I suspect that ArrayTools:-Reshape produce a full copy of the rtable argument, while ArrayTools:-Alias actually provides an alternate view of the same data in memory without making a copy. So, using Alias can be more efficient than using Reshape.
Also, in the test3 as given, it was not necessary to keep all of the instances of Vector a around. And indeed in Joe's test3a the Vector a was overwritten due to a being reassigned. But each assignment to a in test3a extracts a new row of A, producing a new Vector each time. Memeory usage can be reduced by instead forming the container for Vector a just once, and then doing a fast copy of the appropriate row of A into Vector a.
So, assuming that I coded it correctly,
test3b := proc(N,n)
local i,a,X,A;
use Statistics in
X := RandomVariable(Normal(0,1)):
A := ArrayTools:-Alias(Sample(X,n*N),[N,n],Fortran_order);
a := Vector[row](n,datatype=float,order=Fortran_order);
for i from 1 to N do
ArrayTools:-Copy(n,A,n*(i-1),1,a,0,1);
end do;
end use;
end proc:
Now, to measure the efficiency, let's also look at memory used and memory allocated, as well as time.
> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test3(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
18.419, 8518120, 899923608
> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test3a(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
0.711, 56088544, 79456288
> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test3b(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
0.571, 31844664, 35960072
Of course, what would be really nice would be to get the speed of test3b or test3a and the lower memory allocation (by at least a factor of four) of test3. This may not be currently possible. One way to get it might be with an enhancement to the Statistics:-Sample routine, so that it accepts as an optional argument the container Vector for the result. It could re-use the same Vector container, and fill it with new sample data, operating in-place. That could allow the total memory allocation to stay low, to generate only n pieces of data at any one time, but avoid production of a lot of Vectors as garbage.
It's a bit of a shame, that it isn't implemented in this way.
Since at default Digits=10 the Vectors have hardware double precision float[8] datatype, there wouldn't be garbage produced from all the entries as software floats.
ps. It may be that A should be C_order, according to how I set up the strides for ArrayTools:-Alias, so that it walks rows of A and not columns. It shouldn't affect the timings much.
acer
Undocumented Sample extension
While it doesn't improve the efficiency, and is undocumented, Statistics:-Sample permits a more general calling form than Sample(X, n); specifically, n can be a list of ranges, each specifying an rtable index. Thus test3b could be
test3c := proc(N,n) local i,a,X,A; uses Statistics; X := RandomVariable(Normal(0,1)): A := Sample(X,[1..N,1..n]); a := Vector[row](n,datatype=float,order=Fortran_order); for i from 1 to N do ArrayTools:-Copy(n,A,n*(i-1),1,a,0,1); end do; end proc:A has Fortran_order.
Sample documentation
That's quite neat, Joe.
Looking at the help-page, ?Statistics,Sample , I don't see where even the simpler invocation Sample(X) is shown.
When it is called as Sample(X), for a random variable X, then omitting the second parameter N will result a procedure being returned. That procedure can itself be polled, to return separate Vector samples. This usage appears in several of the examples higher up in this thread. The first person to show it may have been user "alex".
The ?Statistics,Sample help-page could be filled with more details on these otherwise hidden bits of functionality. I wonder what other eggs may be found in Statistics.
acer
User "alex"
is actually the original author of the Statistics package. Unfortunately, he has moved on to work in the Financial industry. I do not think he looks at mapleprimes anymore.
Improving Sample (passing an rtable)
While it is true that Sample does not currently allow passing in an rtable, the external code that fills an existing rtable with samples exists. It shouldn't be too hard for Sample to be extended. Here is a proof of concept illustrating that this can be achieved (I have to use kernelopts(opaquemodules=false) to permit accessing some local procedures).
SampleRtable := proc(X) local p,ds,opacity; description "return a procedure that fills an rtable with samples of X"; try opacity := kernelopts('opaquemodules' = false); p := Statistics:-ExternalSupport:-DefineExternal('MapleRandomSample'); # this is a hack, it can be extended to support full options of Sample ds := ( Statistics :-Simulation :-DefaultGenerator :-GetParameters )(X , table([('range') = 'deduce' ,('tolerance') = 1000 ,('updates') = 100 ,('basepoints') = 'deduce' ]) ); finally kernelopts('opaquemodules' = opacity); end try; proc(R) p(R, 'NULL', [ds], true) end proc; end proc: X := RandomVariable(Normal(2,3)): p := SampleRtable(X): R := rtable(1..3, datatype=float[8]): p(R); [0.561974199090320647, 4.35252109919989660, 0.498540275504532837]Here is a test4 that uses SampleRtable
test4 := proc(N,n) local R,X,p; uses Statistics; X := RandomVariable(Normal(0,1)): R := Vector['row'](n, 'datatype=float[8]', 'order=Fortran_order'); p := SampleRtable(X); to N do p(R); end do; NULL; end proc:This is just as fast as test3b, but with reduced memory usage, since R is reused.
brilliant
Very nice, as proof of concept, and in performance.
> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test4(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
0.322, 3276200, 3816512
Even the popular C rng's re-use arrays like this.
Your code shows the benefit so directly. Both the speed and memory usage are lower. With normal use of current Statistics, it's only possible to get one of the two so low while the other remains several powers of 2 greater.
An interesting question then becomes: should Statistics:-Sample(X) return a proc that allowed an optional Vector argument, or would processing for that option add too much overhead? Would it better to have a new routine like SampleRtable(X) which returned a proc which expected exactly two arguments? What's the tradeoff, between the confusion of more user-level routines and the ability to get very low overhead generated procedures?
I mention the idea of two arguments to the procedure returned by SampleRtable(X), since one might only want to generate less random numbers than fit in the rtable.
acer
cost of branching/type checking
I wondered the same thing, that is, whether Sample(X) should return a procedure that can operate on posints (to return a Vector), lists of ranges (to return a multidimensional rtable), or an an rtable directly. Another possibility is provide a keyword to Sample to tell it what type arguments a generated procedure should expect.
I did some timing, testing the following two variations (of returned procedure):
# original proc(R) p(R, 'NULL', [ds], true) end proc; # type check and branch proc(R :: Or(posint,list(range(integer)),rtable(datatype=float[8]))) if R::rtable then p(R, 'NULL', [ds], true); else error "not implemented"; end if; end proc # just branch proc(R) if R::rtable then p(R, 'NULL', [ds], true); else error "not implemented"; end if; end procThe original and the 'just branch' versions are about the the same. Adding type checking ignificantly slows the operation (3x) and increases memory allocation (40x??) Actually, the real problem appears to be including the 'datatype=float[8]' option in the rtable type declaration. If that is changed to just rtable, then 'type check and branch' is essentially the same as 'just branch'.
What do you expect the optional second argument to do? Specify subdimensions to be filled?
just ideas
These are just ideas for amending Sample to act inplace. They don't actually work.
Joe, the ability to handle ranges nicely, which you recently showed for the copy case, could be trickier.
with(Statistics):
X := RandomVariable(Normal(1,0)):
# This next could operate inplace on a Vector argument,
# and fill its first N entries with values.
S:=Sample(X,N,V);
# This next could return a procedure that expects
# two arguments of types posint,Vector.
S:=Sample(X,inplace=true);
S(N,V);
# This next could be the same as the current Sample(X).
S:=Sample(X,inplace=false):
S(N);
An optional offset=<nonnegint> parameter is also possible, as opposed to a range. It would look slightly awkward, to allow an offset for dealing with the inplace Vector when that's not currently supported in the copy (current, non-inplace) implementation. Also, if were to take an offset option, then why not a stride option too? Thinking about it a bit harder, maybe neither offset nor stride might be implemented since one might be able to modify the programmatic access to the entries to get the same effects?
acer
off stride
Thank you for elaborating. By "programmatic access" do you mean via, say, ArrayTools:-Alias? With the SampleRtable procedure one can do
X := RandomVariable(Normal(2,3)): p := SampleRtable(X): R := Matrix(3,4, 'datatype=float[8]', 'order=Fortran_order'): AR := ArrayTools:-Alias(R,9,[3]): p(AR); R; [0. 0. 0. 0.561974199090319981] [ ] [0. 0. 0. 4.35252109919989660 ] [ ] [0. 0. 0. 0.498540275504532948]That won't, however, allow inserting random variables into a row of a Matrix in column-major order (Fortran_order). Permitting a stride would be useful there; however, I suspect that the utility might be insufficient to justify the additional complexity.
cost of typechecking
I think you have just identified a new, real bug -- adding that extra type-check should not have that dramatic an effect! The time and memory allocation traces there point to a real issue that needs to be looked at by someone in the kernel group.