Speed of Statistics computation

zakyn's picture

Hello,

I would like to use Maple in Statistics, but I have a problem with a speed of base functions, like generation, Mean, Range and Standard deviation. Maple computes slowly than MathCad.

I have 4 test functions:

Test of generation speed
> restart:
>
> Digits :=10:
>
> test3:=proc(N,n)
> local i,a,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> for i from 1 to N do
> a:=Sample(X,n):
> end do:
> end use;
> end proc:
> time(test3(30000,100));


Test of Mean speed

> restart:
> Digits :=10;
>
> test4:=proc(N,n)
> local i,a,b,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> a:=Sample(X,n):
> for i from 1 to N do
> b:=Mean(a)
> end do:
> end use;
> end proc:
> time(test4(30000,100));


Test of StandardDeviation speed

> restart:
> Digits :=10;
>
> test5:=proc(N,n)
> local i,a,b,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> a:=Sample(X,n):
> for i from 1 to N do
> b:=StandardDeviation(a)
> end do:
> end use;
> end proc:
> time(test5(30000,100));


Test of Range speed

> restart:
> Digits :=10;
>
> test6:=proc(N,n)
> local i,a,b,X;
> use Statistics in
> X := RandomVariable(Normal(0,1)):
> a:=Sample(X,n):
> for i from 1 to N do
> b:=Range(a)
> end do:
> end use;
> end proc:
> time(test6(30000,100));

Times for N=30,000 , n=100
Maple - Mathcad
63,2 - 3,20
13,3 - 0,03
13,9 - 0,33
19,5 - 0,14

I think, that I do something bad. Could you help me please?

Thank you

Vladimir

Download 551_test3456.mws
View file details

Comments

JacquesC's picture

Speed of dynamic argument checking in an interpreter

If you trace through what all Range does (for example), you'll see that it spends well over 90% of its time parsing its arguments, dispatching to the right routines (dynamically), setting up complicated options, before finally doing the computation. The other commands are likewise.

Basically you are feeling the cost of implementing a very feature-full package (like Statistics) in a dynamically-typed, interpreted language like Maple, where the implementation is done using a lot of (clever!) dynamic dispatch techniques and generic programming ideas.

MathCad is faster because it has fewer features, and those features are implemented in a compiled language. That means that, in the long run, Maple will gain more features faster than MathCad, but is likely to remain a factor of between 50 and 200 times slower on computations like the above. Unless of course those routines are re-implemented in the Maple kernel, at which time they will be competitive with MathCad's, but the development cost will be quite high (i.e. unlikely to happen soon, IMHO).

zakyn's picture

How should I trace it?

Hello,

your answer is very interesting for me. How should I trace the procedure to see what you say?

Is the following code the most usefull?


with(CodeTools[Profiling]):
Profile(test3):
PrintProfiles();

Well, How should I use Maple to solve statistics problem?

Thank you

Vladimir

JacquesC's picture

Static and Dynamic trace

You should read my blog post on old timer techniques.

You can use printlevel or CodeTools[Profile]() to get results about absolutely everything (dynamic trace). This can be a lot of data.

Or you can look at the code. You'll need to dig around some, but these were what I did:

> with(Statistics): interface(verboseproc=3): kernelopts(opaquemodules=false):
> showstat(Range);
> showstat(Statistics:-Quantities:-GetValue);
> showstat(Statistics:-Quantities:-GetValueTab[DescriptiveStatistics]);

which was enough to show me all the overhead incurred. One can continue to dig too.

zakyn's picture

Why use kernelopts(opaquemodules=false): ?

Hello,

I forgot that I use verboseproc=3 to show code of procedure.
e.g.
> interface(verboseproc=3):
> print(dsolve);

I have already used showstat too.

But why should I use kernelopts(opaquemodules=false): ?

Thanks for a link to old timer techniques, too. It is very useful :o)

Vladimir

JacquesC's picture

Most of Statistics is hidden

Some of the things inside Statistics are local variables of the respective modules, and you can't see their values unless you use kernelopts(opaquemodules=false).

Modules being opaque by default is very good from a software engineering point of view, as you do want code to be as opaque as possible (information hiding and all that good stuff). But when you are debugging, all these things seriously get in the way. In this respect, Maple strikes a really nice balance.

zakyn's picture

Thanks

Thanks a lot. It is very usefull for me.

Vladimir

One possibility is to

One possibility is to perform all of the the preprocessing once. This should make things a bit faster. Consider:

restart:

Digits :=10:

test3:=proc(N,n)
local i,a,X;
use Statistics in
X := RandomVariable(Normal(0,1)):
for i from 1 to N do
a:=Sample(X,n):
end do:
end use;
end proc:
time(test3(30000,100));

12.733

Note, that the call to Sample below creates a procedure for generating random samples drawn from the normal distribution. So it is not longer necessary to do the parameter processing over and over again.

restart:

Digits :=10:

test3:=proc(N,n)
local i,a,X, S;
use Statistics in
X := RandomVariable(Normal(0,1)): S := Sample(X);
for i from 1 to N do
a:=S(n):
end do:
end use;
end proc:
time(test3(30000,100));

1.446

Of course, it is even more efficient to do the simulation once.

restart:

Digits :=10:

test3:=proc(N,n)
local i,a,X, S;
use Statistics in
X := RandomVariable(Normal(0,1)): S := Sample(X);
a := ArrayTools:-Alias(S(N*n), [1..N, 1..n]);
end use;
end proc:
time(test3(30000,100));

0.268
restart:
Digits :=10;

test4:=proc(N,n)
local i,a,b,X;
use Statistics in
X := RandomVariable(Normal(0,1)):
a:=Sample(X,n):
for i from 1 to N do
b:=Mean(a)
end do:
end use;
end proc:
time(test4(30000,100));

10
7.126

Again, we can create a procedure for computing the mean of a data sample.

restart:
Digits :=10;

test4:=proc(N,n)
local i,a,b,X,M;
use Statistics in
X := RandomVariable(Normal(0,1)):
a:=Sample(X,n): M := MakeProcedure(Mean, sample);
for i from 1 to N do
b:=M(a);
end do:
end use;
end proc:
time(test4(30000,100));

10
0.277

Sample all values at once...

You weren't paying attention in my response to a previous post of yours. Go see very-slow-evaluating-matrix-computation. As an example, here is your test3 rewritten:
test3a :=proc(N,n)
local i,a,X,A;
    use Statistics in
        X := RandomVariable(Normal(0,1)):
        A := ArrayTools:-Reshape(Sample(X,n*N),N,n);
        for i from 1 to N do
            a := A[i,1..-1];
        end do:
    end use;
end proc:
This is an order of magnitude faster.
zakyn's picture

I have read it.

Hello,

I have read your comments and thank you for it. But I posted this for my friend. I will try to rewrite the procedures. I thought that the slow is because of Statistics package.

Vladimir

apologies

Hi Vladimir,

I apologize for saying you weren't paying attention. Actually, considering that it took me a few minutes to recall what I had previously posted, I wouldn't be surprised if you had forgotten it. Note that, as Jacques points out, the Maple Statistics package is not going to be the fastest gun around. That being said, there are ways to make things go faster, so we need to learn how to utilize them. By creating all the samples at once, you only have to pay the overhead of invoking the Sample procedure one time.

zakyn's picture

What does it mean A[i,1..-1]?

Hi,

I do not know, what does A[i,1..-1] mean? Why 1..-1?

Thanks

Vladimir

negative indices in Matrix

In this case it is equivalent to A[i,1..n]. It is a convention used in rtable (Vectors, Matrices) for accessing all the elements of a row/column. You can also access a partial row in the same way: A[1,2..-2]. The -1 means the last value of the index, -2 the previous to last index, etc. This doesn't necessarily work in Arrays, which permit actual negative indices.

zakyn's picture

More efficiency.

Thank you for your help. If I understand well, I am not able to rewrite test4, test5 and tes6 procedure to get faster because I call the Sample(...) only once. Well, procedures of Statistics are very slow and then my procedures are slow as well. Is it right?

Thank you

Vladimir

acer's picture

memory efficiency

Keeping the total memory allocation down is often an important part of efficiency. But, also, keeping the memory usage (garbage production and cleanup) down can also leads to time savings.

I suspect that ArrayTools:-Reshape produce a full copy of the rtable argument, while ArrayTools:-Alias actually provides an alternate view of the same data in memory without making a copy. So, using Alias can be more efficient than using Reshape.

Also, in the test3 as given, it was not necessary to keep all of the instances of Vector a around. And indeed in Joe's test3a the Vector a was overwritten due to a being reassigned. But each assignment to a in test3a extracts a new row of A, producing a new Vector each time. Memeory usage can be reduced by instead forming the container for Vector a just once, and then doing a fast copy of the appropriate row of A into Vector a.

So, assuming that I coded it correctly,

test3b := proc(N,n)
local i,a,X,A;
use Statistics in
X := RandomVariable(Normal(0,1)):
A := ArrayTools:-Alias(Sample(X,n*N),[N,n],Fortran_order);
a := Vector[row](n,datatype=float,order=Fortran_order);
for i from 1 to N do
ArrayTools:-Copy(n,A,n*(i-1),1,a,0,1);
end do;
end use;
end proc:

Now, to measure the efficiency, let's also look at memory used and memory allocated, as well as time.

> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test3(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
18.419, 8518120, 899923608

> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test3a(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
0.711, 56088544, 79456288

> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test3b(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
0.571, 31844664, 35960072

Of course, what would be really nice would be to get the speed of test3b or test3a and the lower memory allocation (by at least a factor of four) of test3. This may not be currently possible. One way to get it might be with an enhancement to the Statistics:-Sample routine, so that it accepts as an optional argument the container Vector for the result. It could re-use the same Vector container, and fill it with new sample data, operating in-place. That could allow the total memory allocation to stay low, to generate only n pieces of data at any one time, but avoid production of a lot of Vectors as garbage.

It's a bit of a shame, that it isn't implemented in this way.

Since at default Digits=10 the Vectors have hardware double precision float[8] datatype, there wouldn't be garbage produced from all the entries as software floats.

ps. It may be that A should be C_order, according to how I set up the strides for ArrayTools:-Alias, so that it walks rows of A and not columns. It shouldn't affect the timings much.

acer

Undocumented Sample extension

While it doesn't improve the efficiency, and is undocumented, Statistics:-Sample permits a more general calling form than Sample(X, n); specifically, n can be a list of ranges, each specifying an rtable index. Thus test3b could be

test3c := proc(N,n)
local i,a,X,A;
uses Statistics;
    X := RandomVariable(Normal(0,1)):
    A := Sample(X,[1..N,1..n]);
    a := Vector[row](n,datatype=float,order=Fortran_order);
    for i from 1 to N do
        ArrayTools:-Copy(n,A,n*(i-1),1,a,0,1);
    end do;
end proc:

A has Fortran_order.

acer's picture

Sample documentation

That's quite neat, Joe.

Looking at the help-page, ?Statistics,Sample , I don't see where even the simpler invocation Sample(X) is shown.

When it is called as Sample(X), for a random variable X, then omitting the second parameter N will result a procedure being returned. That procedure can itself be polled, to return separate Vector samples. This usage appears in several of the examples higher up in this thread. The first person to show it may have been user "alex".

The ?Statistics,Sample help-page could be filled with more details on these otherwise hidden bits of functionality. I wonder what other eggs may be found in Statistics.

acer

JacquesC's picture

User "alex"

is actually the original author of the Statistics package. Unfortunately, he has moved on to work in the Financial industry. I do not think he looks at mapleprimes anymore.

Improving Sample (passing an rtable)

While it is true that Sample does not currently allow passing in an rtable, the external code that fills an existing rtable with samples exists. It shouldn't be too hard for Sample to be extended. Here is a proof of concept illustrating that this can be achieved (I have to use kernelopts(opaquemodules=false) to permit accessing some local procedures).

SampleRtable := proc(X)
local p,ds,opacity;
description "return a procedure that fills an rtable with samples of X";
    try
        opacity := kernelopts('opaquemodules' = false);
        p := Statistics:-ExternalSupport:-DefineExternal('MapleRandomSample');
        # this is a hack, it can be extended to support full options of Sample
        ds := ( Statistics
                :-Simulation
                :-DefaultGenerator
                :-GetParameters
              )(X , table([('range')       = 'deduce'
                           ,('tolerance')  = 1000
                           ,('updates')    = 100
                           ,('basepoints') = 'deduce'
                          ])
               );
    finally
        kernelopts('opaquemodules' = opacity);
    end try;
    proc(R) p(R, 'NULL', [ds], true) end proc;
end proc:

X := RandomVariable(Normal(2,3)):
p := SampleRtable(X):
R := rtable(1..3, datatype=float[8]):
p(R);
                     [0.561974199090320647, 4.35252109919989660, 0.498540275504532837]

Here is a test4 that uses SampleRtable

test4 := proc(N,n)
local R,X,p;
uses Statistics;
    X := RandomVariable(Normal(0,1)):
    R := Vector['row'](n, 'datatype=float[8]', 'order=Fortran_order');
    p := SampleRtable(X);
    to N do
        p(R);
    end do;
    NULL;
end proc:

This is just as fast as test3b, but with reduced memory usage, since R is reused.

acer's picture

brilliant

Very nice, as proof of concept, and in performance.

> (st,ba,bu):=time(),kernelopts(bytesalloc),kernelopts(bytesused):
> test4(30000,100):
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
0.322, 3276200, 3816512

Even the popular C rng's re-use arrays like this.

Your code shows the benefit so directly. Both the speed and memory usage are lower. With normal use of current Statistics, it's only possible to get one of the two so low while the other remains several powers of 2 greater.

An interesting question then becomes: should Statistics:-Sample(X) return a proc that allowed an optional Vector argument, or would processing for that option add too much overhead? Would it better to have a new routine like SampleRtable(X) which returned a proc which expected exactly two arguments? What's the tradeoff, between the confusion of more user-level routines and the ability to get very low overhead generated procedures?

I mention the idea of two arguments to the procedure returned by SampleRtable(X), since one might only want to generate less random numbers than fit in the rtable.

acer

cost of branching/type checking

I wondered the same thing, that is, whether Sample(X) should return a procedure that can operate on posints (to return a Vector), lists of ranges (to return a multidimensional rtable), or an an rtable directly. Another possibility is provide a keyword to Sample to tell it what type arguments a generated procedure should expect.

I did some timing, testing the following two variations (of returned procedure):

 
# original
    proc(R) p(R, 'NULL', [ds], true) end proc;
# type check and branch
    proc(R :: Or(posint,list(range(integer)),rtable(datatype=float[8])))
        if R::rtable then
            p(R, 'NULL', [ds], true);
        else
            error "not implemented";
        end if;
    end proc
# just branch
    proc(R)
        if R::rtable then
            p(R, 'NULL', [ds], true);
        else
            error "not implemented";
        end if;
    end proc

The original and the 'just branch' versions are about the the same. Adding type checking ignificantly slows the operation (3x) and increases memory allocation (40x??) Actually, the real problem appears to be including the 'datatype=float[8]' option in the rtable type declaration. If that is changed to just rtable, then 'type check and branch' is essentially the same as 'just branch'.

What do you expect the optional second argument to do? Specify subdimensions to be filled?

acer's picture

just ideas

These are just ideas for amending Sample to act inplace. They don't actually work.

Joe, the ability to handle ranges nicely, which you recently showed for the copy case, could be trickier.

with(Statistics):

X := RandomVariable(Normal(1,0)):

# This next could operate inplace on a Vector argument,
# and fill its first N entries with values.
S:=Sample(X,N,V);

# This next could return a procedure that expects
# two arguments of types posint,Vector.
S:=Sample(X,inplace=true);
S(N,V);

# This next could be the same as the current Sample(X).
S:=Sample(X,inplace=false):
S(N);

An optional offset=<nonnegint> parameter is also possible, as opposed to a range. It would look slightly awkward, to allow an offset for dealing with the inplace Vector when that's not currently supported in the copy (current, non-inplace) implementation. Also, if were to take an offset option, then why not a stride option too? Thinking about it a bit harder, maybe neither offset nor stride might be implemented since one might be able to modify the programmatic access to the entries to get the same effects?

acer

off stride

Thank you for elaborating. By "programmatic access" do you mean via, say, ArrayTools:-Alias? With the SampleRtable procedure one can do

X := RandomVariable(Normal(2,3)):
p := SampleRtable(X):
R := Matrix(3,4, 'datatype=float[8]', 'order=Fortran_order'):
AR := ArrayTools:-Alias(R,9,[3]):
p(AR);
R;
                       [0.    0.    0.    0.561974199090319981]
                       [                                      ]
                       [0.    0.    0.    4.35252109919989660 ]
                       [                                      ]
                       [0.    0.    0.    0.498540275504532948]

That won't, however, allow inserting random variables into a row of a Matrix in column-major order (Fortran_order). Permitting a stride would be useful there; however, I suspect that the utility might be insufficient to justify the additional complexity.

JacquesC's picture

cost of typechecking

I think you have just identified a new, real bug -- adding that extra type-check should not have that dramatic an effect! The time and memory allocation traces there point to a real issue that needs to be looked at by someone in the kernel group.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
}