acer

32562 Reputation

29 Badges

20 years, 26 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are Posts that have been published by acer

Way back in Maple 6, the rtable was introduced. You might be more familiar with its three types: Array, Matrix, and Vector. The name rtable is named after "rectangular table", since its entries can be stored contiguously in memory which is important in the case of "hardware" datatypes. This is a key aspect of the external-calling mechanism which allows Maple to use functions from the NAG and CLAPACK external libraries. In essence, the contiguous data portion of a hardware datatype rtable can be passed to a compiled C or Fortran function without any need for copying or preliminary conversion. In such cases, the data structure in Maple is storing its numeric data portion in a format which is also directly accessible within external functions.

You might have noticed that Matrices and Arrays with hardware datatypes (eg. float[8], integer[4], etc) also have an order. The two orders, Fortran_order and C_order, correspond to column-major and row-major storage respectively. The Wikipedia page row-major  explains it nicely.

There is even a help-page which illustrates that the method of accessing entries can affect performance. Since Fortran_order means that the individual entries in any column are contiguous in memory then code which accesses those entries in the same order in which they are stored in memory can perform better. This relates to the fact that computers cache data: blocks of nearby data can be moved from slower main memory (RAM) to very fast cache memory, often as a speculative process which often has very real benefits.

What I'd like to show here is that the relatively small performance improvement (due to matching the entry access to the storage order) when using evalhf can be a more significant improvement when using Maple's Compile command. For procedures which walk all entries of a hardware datatype Matrix or multidimensional Array, to apply a simple operation upon each value, the improvement can involve a significant part of the total computation time.

What makes this more interesting is that in Maple the default order of a float[8] Matrix is Fortran_order, while the default order of a float[8] Array used with the ImageTools package is C_order. It can sometimes pay off, to write your for-do loops appropriately.

If you are walking through all entries of a Fortran_order float[8] Matrix, then it can be beneficial to access entries primarily by walking down each column. By this I mean accessing entries M[i,j] by changing i in ther innermost loop and j in the outermost loop. This means walking the data entries, one at a time as they are stored. Here is a worksheet which illustrates a performance difference of about 30-50% in a Compiled procedure (the precise benefit can vary with platform, size, and what else your machine might be doing that interferes with caching).

Matrixorder.mw

If you are walking through all entries of an m-by-n-by-3 C_order float[8] Array (which is a common structure for a color "image" used by the ImageTools package) then it can be beneficial to access entries A[i,j,k] by changing k in the innermost loop and i in the outermost loop. This means walking the data entries, one at a time as they are stored. Here is a worksheet which illustrates a performance difference of about 30-50% in a Compiled procedure (the precise benefit can vary with platform, size, and what else your machine might be doing that interferes with caching).

Arrayorder.mw

Using techniques previously used for generating color images of logistic maps and complex argument, attached is a first draft of a new Mandelbrot set fractal image applet.

A key motive behind this is the need for a faster fractal generator than is currently available on the Application Center as the older Fractal Fun! and Mandelbrot Mania with Maple entries. Those older apps warn against being run with too high a resolution for the final image, as it would take too long. In fact, even at a modest size such as 800x800 the plain black and white images can take up to 40 seconds to generate on a fast Intel i7 machine when running those older applications.

The attached worksheet can produce the basic 800x800 black and white image in approximately 0.5 seconds on the same machine. I used 64bit Maple 15.01 on Windows 7 for the timings. The attached implementration uses the Maple Compiler to attain that speed, but should fall back to Maple's quick evalhf mode in the case that the Compiler is not properly configured or enabled.

The other main difference is that this new version is more interactive: using sliders and other Components. It also inlines the image directly (using a Label), instead of as a (slow and resource intensive) density plot.

Run the Code Edit region, to begin. Make sure your GUI window is shown large enough for you to see the sides of the GUI Table conveniently.

The update image appearing in the worksheet is stored in a file, the name of which is currently set to whatever the following evaluates to in your Maple,

cat(kernelopts('homedir'),"/mandelbrot.jpg"):

You can copy the current image file aside in your OS while experimenting with the applet, if you want to save it at any step. See the start of the Code Edit region, to change this filename setting.

Here's the attachment. Comments are welcome, as I'd like to make corrections before submitting to the Application Center. Some examples of images (reduced in size for inclusion here) created with the applet are below.

 

The Locator object is a nice piece of Mathematica's Manipulate command's functionality. Perhaps Maple's Explore command could do something as good.

Here below is a roughly laid out example, as a Worksheet. Of course, this is not...

Here is a short wrapper which automates repeated calls to the DirectSearch 2 curve-fitting routine. It offers both time and repetition (solver restart) limits.

The global optimization package DirectSearch 2 (see Application Center link, and here) has some very nice features. One aspect which I really like is that it can do curve-fitting: to fit an expression using tabular data. By this, I mean that it can find optimal values of parameters present in an expression (formula) such that the residual error between that formula and the tabular data is minimized.

Maple itself has commands from the CurveFitting and Statistics packages for data regression, such as NonlinearFit, etc. But those use local optimization solvers, and quite often for the nonlinear case one may need a global optimizer in order to produce a good fit. The nonlinear problem may have local extrema which are not even close to being globally optimal or provide a close fit.

Maplesoft offers the (commercially available) GlobalOptimization package as an add-on to Maple, but its solvers are not hooked into those mentioned curve-fitting commands. One has to set up the proper residual-based objective function onself in order to use this for curve-fitting, and some of the bells and whistles may be harder to do.

So this is why I really like the fact that the DirectSearch 2 package has its own exported commands to do curve-fitting, integrated with its global solvers.

But as the DirectSearch package's author mentions, the fitting routine may sometimes exit too early. Repeat starts of the solver, for the very same parameter ranges, can produce varying results due to randomization steps performed by the solver. This post is branched off from another thread which involved such a problematic example.

Global optimization is often a dark art. Sometimes one may wish to simply have the engine work for 24 hours, and produce whatever best result it can. That's the basic enhancement this wrapper offers.

Here is the wrapper, and a few illustrative calls to it on the mentioned curve-fitting example that show informative  progress status messages, etc. I've tried to make the wrapper pretty generic. It could be reused for other similar purposes.

Other improvements are possible, but might make it less generic. A target option is possible, where attainment of the target would cause an immediate stop. The wrapper could be made into an appliable module, and the running best result could be stored in a module local so that any error (and ensuing halt) would not wipe out the best result from potentially hours and hours worth of conputation.

restart:
randomize():

repeater:=proc(  funccall::uneval
               , {maxtime::numeric:=60}
               , {maxiter::posint:=10}
               , {access::appliable:=proc(a) SFloat(a[1]); end proc}
               , {initial::anything:=[infinity]}
              )
          local best, current, elapsed, i, starttime;
            starttime:=time[real]();
            elapsed:=time[real]()-starttime;
            i:=1; best:=[infinity];
            while elapsed<maxtime and i<=maxiter do
              userinfo(2,repeater,`iteration `,i);
              try
                timelimit(maxtime-elapsed,assign('current',eval(funccall)));
              catch "time expired":
              end try;
              if is(access(current)<access(best)) then
                best:=current;
                userinfo(1,repeater,`new best `,access(best));
              end if;
              i:=i+1;
              elapsed:=time[real]()-starttime;
              userinfo(2,repeater,`elapsed time `,elapsed);
            end do;
            if best<>initial then
              return best;
            else
              error "time limit exceeded during first attempt";
            end if;
          end proc:


X := Vector([seq(.1*j, j = 0 .. 16), 1.65], datatype = float): 

Y := Vector([2.61, 2.62, 2.62, 2.62, 2.63, 2.63, 2.74, 2.98, 3.66,
             5.04, 7.52, 10.74, 12.62, 10.17, 5, 2.64, 11.5, 35.4],
            datatype = float):

F := a*cosh(b*x^c*sin(d*x^e));

                                    /   c    /   e\\
                         F := a cosh\b x  sin\d x //

infolevel[repeater]:=2: # or 1, or not at all (ie. 0)
interface(warnlevel=0): # disabling warnings. disable if you want.

repeater(DirectSearch:-DataFit(F
                      , [a=0..10, b=-10..10, c=0..100, d=0..7, e=0..4]
                      , X, Y, x
                      , strategy=globalsearch
                      , evaluationlimit=30000
                              ));
repeater: iteration  1
repeater: new best  9.81701944539358706
repeater: elapsed time  15.884
repeater: iteration  2
repeater: new best  2.30718902535293857
repeater: elapsed time  22.354
repeater: iteration  3
repeater: new best  0.627585701120743822e-4
repeater: elapsed time  30.777
repeater: iteration  4
repeater: elapsed time  47.959
repeater: iteration  5
repeater: new best  0.627585700905294148e-4
repeater: elapsed time  55.221
repeater: iteration  6
repeater: elapsed time  60.009
 [0.0000627585700905294, [a = 2.61748237902808, b = 1.71949329097179, 

   c = 2.30924401405164, d = 1.50333106110324, e = 1.84597267458055], 4333]


# without userinfo messages printed
infolevel[repeater]:=0:
repeater(DirectSearch:-DataFit(F
                      , [a=0..10, b=-10..10, c=0..100, d=0..7, e=0..4]
                      , X, Y, x
                      , strategy=globalsearch
                      , evaluationlimit=30000
                              ));

 [0.0000627585701341043, [a = 2.61748226209478, b = 1.71949332125427, 

   c = 2.30924369227236, d = 1.50333090706676, e = 1.84597294290477], 6050]


# illustrating early timeout
infolevel[repeater]:=2:
repeater(DirectSearch:-DataFit(F
                      , [a=0..10, b=-10..10, c=0..100, d=0..7, e=0..4]
                      , X, Y, x
                      , strategy=globalsearch
                      , evaluationlimit=30000
                              ),
         maxtime=2);

repeater: iteration  1
repeater: elapsed time  2.002
Error, (in repeater) time limit exceeded during first attempt

# illustrating iteration limit cutoff
infolevel[repeater]:=2:
repeater(DirectSearch:-DataFit(F
                      , [a=0..10, b=-10..10, c=0..100, d=0..7, e=0..4]
                      , X, Y, x
                      , strategy=globalsearch
                      , evaluationlimit=30000
                              ),
         maxiter=1);

repeater: iteration  1
repeater: new best  5.68594272127419575
repeater: elapsed time  7.084
 [5.68594272127420, [a = 3.51723075672918, b = -1.48456068506828, 

   c = 1.60544055207338, d = 6.99999999983179, e = 3.72070034285212], 2793]


# giving it a large total time limit, with reduced userinfo messages
infolevel[repeater]:=1:
Digits:=15:
repeater(DirectSearch:-DataFit(F
                      , [a=0..10, b=-10..10, c=0..100, d=0..7, e=0..4]
                      , X, Y, x
                      , strategy=globalsearch
                      , evaluationlimit=30000
                              ),
         maxtime=2000, maxiter=1000);

repeater: new best  3.10971990123465947
repeater: new best  0.627585701270853103e-4
repeater: new best  0.627585700896181428e-4
repeater: new best  0.627585700896051324e-4
repeater: new best  0.627585700895833535e-4
repeater: new best  0.627585700895607885e-4
 [0.0000627585700895608, [a = 2.61748239185387, b = -1.71949328487160, 

   c = 2.30924398692221, d = 1.50333104262348, e = 1.84597270535142], 6502]

Suppose that you wish to animate the whole view of a plot. By whole view, I mean that it includes the axes and is not just a rotation of a plotted object such as a surface.

One simple way to do this is to call plots:-animate (or plots:-display on a list of plots supplied in a list, with its `insequence=true` option). The option `orientation` would contain the parameter that governs the animation (or generates the sequence).

But that entails recreating the same plot each time. The plot data might not even change. The key thing that changes is the ORIENTATION() descriptor within each 3d plot object in the reulting data structure. So this is inefficient in two key ways, in the worst case scenario.

1) It may even compute the plot's numeric results, as many times as there are frames in the resulting animation.

2) It stores as many instances of the grid of computed numeric data as there are frames.

We'd like to do better, if possible, reducing down to a single computation of the data, and a single instance of storage of a grid of data.

To keep this understandable, I'll consider the simple case of plotting a single 3d surface. More complicated cases can be handled with revisions to the techniques.

Avoiding problem 1) can be done in more than one way. Instead of plotting an expression, a procedure could be plotted, where that procedure has `option remember` so that it automatically stores computed results an immediately returns precomputed stored result when the arguments (x and y values) have been used already.

Another way to avoid problem 1) is to generate the unrotated plot once, and then to use plottools:-rotate to generate the other grids without necessitating recomputation of the surface. But this rotates only objects in the plot, and does alter the view of the axes.

But both 1) and 2) can be solved together by simply re-using the grid of computed data from an initial plot3d call, and then constructing each frame's plot data structure component "manually". The only thing that has to change, in each, is the ORIENTATION(...) subobject.

At 300 frames, the difference in the following example (Intel i7, Windows 7 Pro 64bit, Maple 15.01) is a 10-fold speedup and a seven-fold reduction is memory allocation, for the creation of the animation structure. I'm not inlining all the plots into this post, as they all look the same.

restart:
P:=1+x+1*x^2-1*y+1*y^2+1*x*y:

st,ba:=time(),kernelopts(bytesalloc):

plots:-animate(plot3d,[P,x=-5..5,y=-5..5,orientation=[A,45,45],
                       axes=normal,labels=[x,y,z]],
               A=0..360,frames=300);

time()-st,kernelopts(bytesalloc)-ba;

                                1.217, 25685408
restart:
P:=1+x+1*x^2-1*y+1*y^2+1*x*y:

st,ba:=time(),kernelopts(bytesalloc):

g:=plot3d(P,x=-5..5,y=-5..5,orientation=[-47,666,-47],
          axes=normal,labels=[x,y,z]):

plots:-display([seq(PLOT3D(GRID(op([1,1..2],g),op([1,3],g)),
                           remove(type,[op(g)],
                                  specfunc(anything,{GRID,ORIENTATION}))[],
                           ORIENTATION(A,45,45)),
                    A=0..360,360.0/300)],
               insequence=true);

time()-st,kernelopts(bytesalloc)-ba;

                                0.125, 3538296

By creating the entire animation data structure manually, we can get a further factor of 3 improvement in speed and a further factor of 3 reduction in memory allocation.

restart:
P:=1+x+1*x^2-1*y+1*y^2+1*x*y:

st,ba:=time(),kernelopts(bytesalloc):

g:=plot3d(P,x=-5..5,y=-5..5,orientation=[-47,666,-47],
          axes=normal,labels=[x,y,z]):

PLOT3D(ANIMATE(seq([GRID(op([1,1..2],g),op([1,3],g)),
                           remove(type,[op(g)],
                                  specfunc(anything,{GRID,ORIENTATION}))[],
                           ORIENTATION(A,45,45)],
                    A=0..360,360.0/300)));

time()-st,kernelopts(bytesalloc)-ba;

                                0.046, 1179432                            

Unfortunately, control over the orientation is missing from Plot Components, otherwise such an "animation" could be programmed into a Button. That might be a nice functionality improvement, although it wouldn't be very nice unless accompanied by a way to export all a Plot Component's views to GIF (or mpeg!).

The above example produces animations each of 300 frames. Here's a 60-frame version:

4 5 6 7 8 9 10 Last Page 6 of 29