Items tagged with programming programming Tagged Items Feed

If you want better performance then don't use 2D Math mode to enter procedures which call  `.` (dot).

The following timings are not the result of the order in which the cases are performed. The timings stay roughly the same if the blocks delimited by restarts are executed out of order.

Most of the comparisons (that I've seen so far) amongst Maple, Matlab, and Mathematica are either incomplete, inaccurate, or biased.

This collection of three articles [1 (1), 2 (2),

Consider the following C code:

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define NUM_THREADS 4
#define NUM_ITERATIONS (1L << 32)

long *data;

void *thread_function( void *args )
{
long i;
long j = (long)args;

for ( i = 0; i < NUM_ITERATIONS; i++ )
{
data[j] = i;
}

return NULL;
}

int main( int argc, char **argv )
{
long i, j;
pthread_t ids[NUM_THREADS-1];
struct timeval start, end;
double d;

data = (long*)malloc( sizeof(long)*NUM_THREADS );

for ( j = 1; j <= NUM_THREADS; j++ )
{
for ( i = 0; i < NUM_THREADS; i++ )
{
data[i] = 0;
}

gettimeofday( &start, NULL );
for ( i = 0; i < j-1; i++ )
{
pthread_create( ids+i, NULL, thread_function, (void*)i );
}

thread_function( (void*)i );

for ( i = 0; i < j-1; i++ )
{
pthread_join( ids[i], NULL );
}

gettimeofday( &end, NULL );

d = end.tv_sec - start.tv_sec + (1.0e-6)*end.tv_usec - (1.0e-6)*start.tv_usec;

printf( "%ld) %g %g ips\n", j, d, ((double)j*NUM_ITERATIONS)/d );
}

return 0;
}

Basically this code

   for ( i = 0; i < NUM_ITERATIONS; i++ )
{
data[j] = i;
}

runs in parallel for 1 to NUM_THREADS threads and prints how long it takes to run and the number of iterations executed per second. data is an array and j is a thread specific index into the array. Thus each thread is given its own memory address. This is important to recognize, the threads are not sharing data.

As we increase the number of threads (up to the number of cores), the running time should stay the same, but the number of iterations per second should increase linearly (remember each thread does NUM_ITERATIONS, so adding a new thread adds NUM_ITERATIONS of work). Let's see what happens when we actually run this code:

1) 15.758 2.72558e+08 ips
2) 28.8921 2.97311e+08 ips
3) 40.1948 3.20561e+08 ips
4) 46.6236 3.6848e+08 ips

Well, that is nothing like what we were expecting. The running time increases significantly and the iterations per second increase only slightly. It would appear as though the threads are interfering with each other, however we can see that they don't actually share any data. What's happening?

This is an example of false sharing. False sharing occurs when multiple threads modify data that share the same cache line. I'm not going to go into too much detail about caching, see this wikipedia article for more details. Briefly, each core has a small private cache that stores a copy of a region of main memory (accessing the local copy is faster than accessing main memory). When the corresponding region in main memory changes, the value stored in the cache is no longer correct. Thus if the core wants to use anything in that memory region, it needs to make a new copy. When multiple threads share a single cache line, modification of the memory in one thread will force the other threads to refresh their caches. This refresh forces the threads to wait for memory transfers far more then they would if the thread was running exclusively. If there are enough threads with enough memory requests it is also possible to saturate the memory bus, which can delay memory transfers even more.

To fix this we need to make sure that the threads aren't sharing cache lines. The easiest way to fix a problem like this is to add padding around the data.

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define NUM_THREADS 4
#define NUM_ITERATIONS (1L << 32)

#define CACHE_SIZE 128

typedef struct {
long d;
char buffer[CACHE_SIZE-sizeof(long)];
} Data;

Data *data;

void *thread_function( void *args )
{
long i;
long j = (long)args;

for ( i = 0; i < NUM_ITERATIONS; i++ )
{
data[j].d = i;
}

return NULL;
}

int main( int argc, char **argv )
{
long i, j;
pthread_t ids[NUM_THREADS-1];
struct timeval start, end;
double d;

data = (Data*)malloc( sizeof(Data)*NUM_THREADS );

for ( j = 1; j <= NUM_THREADS; j++ )
{
for ( i = 0; i < NUM_THREADS; i++ )
{
data[i].d = 0;
}

gettimeofday( &start, NULL );
for ( i = 0; i < j-1; i++ )
{
pthread_create( ids+i, NULL, thread_function, (void*)i );
}

thread_function( (void*)i );

for ( i = 0; i < j-1; i++ )
{
pthread_join( ids[i], NULL );
}

gettimeofday( &end, NULL );

d = end.tv_sec - start.tv_sec + (1.0e-6)*end.tv_usec - (1.0e-6)*start.tv_usec;

printf( "%ld) %g %g ips\n", j, d, ((double)j*NUM_ITERATIONS)/d );
}

return 0;
}

We have changed our array from type long, to type Data, a struct that contains padding. This padding guarantees that one struct's data won't share a cache line with other structs, in particular structs used in other threads. Now let's run this code:

1) 15.6487 2.74462e+08 ips
2) 15.0241 5.71744e+08 ips
3) 13.5294 9.52364e+08 ips
4) 12.5411 1.36989e+09 ips

This is what we were expecting (perhaps even a little better).

So how does this effect Maple? I originally tried to write these examples in Maple, but I didn't get significant problems. This is because the overhead of Maple evaluating a statement probably clobbers the shared cache line so it will need to be reloaded in either case. However, as we make Maple faster, it may be possible for false sharing to become a problem in your Maple code.

If you want to download and try executing this code for yourself, be careful of compiler optimizations.  The function that we execute in the threads can easily be optimized into one that does not use the shared memory in each iteration.

I built this example on Linux, using

gcc falsesharing.c -lpthread -o falsesharing

Dear all,

 I have a problem in programming with maple if statement:

 restart;Qs:=24; p:=20; m:=3; q:=Qs/(2*p*m); if (q=1) then
> yc:=Qs/(2*p); elif (frem(q,1)~=0) then if (evalf(Qs/(2*p))<1) then
> yc:=1; else yc:=floor(Qs/(2*p)); end if; else yc:=round(0.8*Qs/(2*p)); end if; 

The result is yc:=0; the result must be yc:=1;

 In Matlab:

 

There is a probem in the Optimization package's (nonlinear programming problem) NLPSolve routine which affects multivariate problems given in so-called Operator form. That is, when the objective and constraints are not provided as expressions. Below is a workaround for this.

Usually, an objective (or constraint) gets processed by Optimization using automatic differentiation tools ...

It has been a while since my last post, mostly because of a combination of getting Maple 14 ready to ship and a lack of meaty topics to write about. I am trying to get back into the habit of posting more regularly. You can help me achieve my goal by posting questions about parallel programming. I'll do my best to answer. However for now, I'll give a brief overview of the new parallel programming features in Maple 14.

A new function has been added to the Task Programming Model. The Threads:-Task:-Return function allows a parallel algorithm implemented on top of the Task Programming Model to perform an early bail out. Lets imagine that you have implemented a parallel search. You are looking for a particular element in a large set of data. Using the Task Programming Model, you've created a bunch of tasks, each searching a particular subset of the data. However one of the first tasks to execute finds the element you are looking for. In Maple 13, there was no built in way of telling the other tasks that the result have been found and they they should not execute. In Maple 14, the Return function allows one task to specify a return value (which will be returned from Threads:-Task:-Start) and signal the other tasks that the algorithm is complete and that additional tasks should not be executed. Tasks that are already running will still run to completion, but tasks that have not started executing will not be started.

You may have noticed that there is a race condition with Return. What happens if two tasks both call Return in parallel? Only one of the values will become the value that is passed to Threads:-Task:-Start. I suppose I could say the "first" value is the one that is used, but really, what does that mean? If you call Return, then the value passed to Return should be an acceptable result for the algorithm.  If you call Return more than once, any of those values should be valid, thus it shouldn't matter which one becomes the return value.  That said, the Return function does give some feedback. In the task that succeeds in having its value accepted, Return will return true. In all other tasks that call Return, it will return false. This allows the code to know if a particular result was or was not accepted.

Maple 14 also adds the Task Programming Model to the C External Calling API. This means that you can write your algorithms in C and make use of the Task Programming Model. The C API is similar to the Maple API, with a few differences. In particular, you need to create each child task individually, instead of as a single call to Continue, as you would in Maple. As well, because it is C code, you need to worry about a few details like memory management that are handled automatically in Maple.  Using External Call is fairly advanced, so I won't go into too much detail here.  If you'd like to see more details of using the Task Programming Model in External Calling, I can write a seperated post dedicated to that.

As with every release of Maple, we spent some time trying to make our existing functionality faster and more stable. For parallel programming, we reduced the overhead of using the Task Programming Model, as well as reducing the locking in the kernel (which should help improve parallelism). Of course many bugs have been fixed, which should make parallel programming more reliable in Maple 14.

On another topic, I was curious about programming style in Maple.

If I declare these large matrices as local outside my procedures, can I pass them in as arguments by reference (like C++)?  Or should I just make them global?  What is the typically protocol here?

I am a little confused with exactly how arguments are passed into procedures.  More specifically, I am unsure what conditions determine whether or not a variable passed into a procedure...

Eleven years ago, one of the Maplesoft developers sent around the office this Maple language port of the first example of obfuscated code here.

This code below is text, for insertion in 1D Maple Notation, and runs in

The program mint, bundled with Maple, is a very useful syntax checker and program analyzer.

As provided, `mint` works best with Maple program source when contained in plaintext files. Inside Maple itself there is a command maplemint which does some of the same tasks as the stand-alone program `mint`. Unfortunately `maplemint` is quite a bit weaker than `mint` is, for quite a selection of procedures. Also, `maplemint` doesn't have the sort of flexible control that `mint` provides through its optional calling parameters.

I had previously posted a Maple language procedure for the purpose of calling out to `mint` while inside Maple (Standard GUI, or other). Here it is below, cleaned up a little. Hopefully it now works better across multiple operating systems, and also provides its optional parameters better.

call stack

November 22 2009 by acer 3342 Maple

Very few people would ever need this, I think, even while programming. But sometimes the details of the call stack are just what one wants.

So, for example,...

> h:=proc(x)
>   debugopts('callstack');
> end proc:

> m:=module() export f; local g;
>   f:=proc(x)
>     g(x^2);
>   end proc:
>   g:=proc(x)
>     global h;
>     h(x^3);
>   end proc:
> end module:

> m:-f(a);
[DEBUGSTACK, h, `debugopts('callstack')`, [a^6...

% in a proc

September 19 2009 by acer 3342 Maple

Could there be a useful performance gain if Maple were changed so that %, %%, and %%% did not function inside a procedure (proc) body?

Should one ever use % in a procedure? Would using it just be obfuscation where none is needed, or could it serve a special purpose?

While on this topic, how good or bad would it be to use a Standard GUI equation label inside the body of a proc that was entered in 2D Math?

acer

I was just reminded of an aspect of Maple GUI Components, new to Maple 13, that has sometimes come in very useful to me. It is the ability to refresh a component programatically.

I should explain. The old (and current default) behaviour is for the GUI not to refresh any other components until the current component is finished (ie. returns control).

The relevant situation in which this matters is when a given component has action code whose

I recently ran into an interesting twist on the infamous Maple anti-pattern:

# A very garbagey way to build a list
l := [];
for i from 1 to 10000 by 100 do
l := [op(l), isqrt(abs(floor((5000-i)/2)))];
end do;

A lot of users fall prey to this method of building a list rather than using :

# generate a list without extraneous garbage
l := [seq(isqrt(abs(floor((5000-i)/2))),i=1..10000,100)];

It would be nicer if the nprofile commandline utility had its own script in $MAPLE/bin similar to the maple and xmaple scripts.

That is the place that one usually either looks for Maple executables or appends to one's PATH. A single location makes more sense and is easier.

It would also be nicer if nprofile help-page mentioned something like the exprofile help-page's comment that, "The preferred method for creating the output  file is with writeto() and/or appendto()."

There are a number of facilities in Maple which may be extended. Included amongst those are `type`, `print`, `evalf`, and `latex`. The help-page ?extension_mechanism claims that all the built-in functions allow for extension. It also mentions a few system Library routines such as `verify` (but does not mention `latex`).

There are some descriptions of varying completeness in a few...

1 2 Page 1 of 2