Maplesoft Blog

The Maplesoft blog contains posts coming from the heart of Maplesoft. Find out what is coming next in the world of Maple, and get the best tips and tricks from the Maple experts.

Latest Posts Latest Blog Posts Feed

Consider the following C code:

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define NUM_THREADS 4
#define NUM_ITERATIONS (1L << 32)

long *data;

void *thread_function( void *args )
{
long i;
long j = (long)args;

for ( i = 0; i < NUM_ITERATIONS; i++ )
{
data[j] = i;
}

return NULL;
}

int main( int argc, char **argv )
{
long i, j;
pthread_t ids[NUM_THREADS-1];
struct timeval start, end;
double d;

data = (long*)malloc( sizeof(long)*NUM_THREADS );

for ( j = 1; j <= NUM_THREADS; j++ )
{
for ( i = 0; i < NUM_THREADS; i++ )
{
data[i] = 0;
}

gettimeofday( &start, NULL );
for ( i = 0; i < j-1; i++ )
{
pthread_create( ids+i, NULL, thread_function, (void*)i );
}

thread_function( (void*)i );

for ( i = 0; i < j-1; i++ )
{
pthread_join( ids[i], NULL );
}

gettimeofday( &end, NULL );

d = end.tv_sec - start.tv_sec + (1.0e-6)*end.tv_usec - (1.0e-6)*start.tv_usec;

printf( "%ld) %g %g ips\n", j, d, ((double)j*NUM_ITERATIONS)/d );
}

return 0;
}

Basically this code

   for ( i = 0; i < NUM_ITERATIONS; i++ )
{
data[j] = i;
}

runs in parallel for 1 to NUM_THREADS threads and prints how long it takes to run and the number of iterations executed per second. data is an array and j is a thread specific index into the array. Thus each thread is given its own memory address. This is important to recognize, the threads are not sharing data.

As we increase the number of threads (up to the number of cores), the running time should stay the same, but the number of iterations per second should increase linearly (remember each thread does NUM_ITERATIONS, so adding a new thread adds NUM_ITERATIONS of work). Let's see what happens when we actually run this code:

1) 15.758 2.72558e+08 ips
2) 28.8921 2.97311e+08 ips
3) 40.1948 3.20561e+08 ips
4) 46.6236 3.6848e+08 ips

Well, that is nothing like what we were expecting. The running time increases significantly and the iterations per second increase only slightly. It would appear as though the threads are interfering with each other, however we can see that they don't actually share any data. What's happening?

This is an example of false sharing. False sharing occurs when multiple threads modify data that share the same cache line. I'm not going to go into too much detail about caching, see this wikipedia article for more details. Briefly, each core has a small private cache that stores a copy of a region of main memory (accessing the local copy is faster than accessing main memory). When the corresponding region in main memory changes, the value stored in the cache is no longer correct. Thus if the core wants to use anything in that memory region, it needs to make a new copy. When multiple threads share a single cache line, modification of the memory in one thread will force the other threads to refresh their caches. This refresh forces the threads to wait for memory transfers far more then they would if the thread was running exclusively. If there are enough threads with enough memory requests it is also possible to saturate the memory bus, which can delay memory transfers even more.

To fix this we need to make sure that the threads aren't sharing cache lines. The easiest way to fix a problem like this is to add padding around the data.

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define NUM_THREADS 4
#define NUM_ITERATIONS (1L << 32)

#define CACHE_SIZE 128

typedef struct {
long d;
char buffer[CACHE_SIZE-sizeof(long)];
} Data;

Data *data;

void *thread_function( void *args )
{
long i;
long j = (long)args;

for ( i = 0; i < NUM_ITERATIONS; i++ )
{
data[j].d = i;
}

return NULL;
}

int main( int argc, char **argv )
{
long i, j;
pthread_t ids[NUM_THREADS-1];
struct timeval start, end;
double d;

data = (Data*)malloc( sizeof(Data)*NUM_THREADS );

for ( j = 1; j <= NUM_THREADS; j++ )
{
for ( i = 0; i < NUM_THREADS; i++ )
{
data[i].d = 0;
}

gettimeofday( &start, NULL );
for ( i = 0; i < j-1; i++ )
{
pthread_create( ids+i, NULL, thread_function, (void*)i );
}

thread_function( (void*)i );

for ( i = 0; i < j-1; i++ )
{
pthread_join( ids[i], NULL );
}

gettimeofday( &end, NULL );

d = end.tv_sec - start.tv_sec + (1.0e-6)*end.tv_usec - (1.0e-6)*start.tv_usec;

printf( "%ld) %g %g ips\n", j, d, ((double)j*NUM_ITERATIONS)/d );
}

return 0;
}

We have changed our array from type long, to type Data, a struct that contains padding. This padding guarantees that one struct's data won't share a cache line with other structs, in particular structs used in other threads. Now let's run this code:

1) 15.6487 2.74462e+08 ips
2) 15.0241 5.71744e+08 ips
3) 13.5294 9.52364e+08 ips
4) 12.5411 1.36989e+09 ips

This is what we were expecting (perhaps even a little better).

So how does this effect Maple? I originally tried to write these examples in Maple, but I didn't get significant problems. This is because the overhead of Maple evaluating a statement probably clobbers the shared cache line so it will need to be reloaded in either case. However, as we make Maple faster, it may be possible for false sharing to become a problem in your Maple code.

If you want to download and try executing this code for yourself, be careful of compiler optimizations.  The function that we execute in the threads can easily be optimized into one that does not use the shared memory in each iteration.

I built this example on Linux, using

gcc falsesharing.c -lpthread -o falsesharing
Maplesoft Employee

Understanding the Question

July 23 2010 by rlopez 263 Maple

The hardest and/or most important part of answering a question is making sure the real question is understood. The July 1, 2010 question Using fsolve with a dispersion relation posted to MaplePrimes seemed to be about obtaining a numeric solution of an equation. Turns out it was more a question about the behavior of an implicit function.

This week, I had the pleasure of attending a rock concert with my son Eric who is now about to turn 15 and who has turned out to possess non-trivial interests and talents in music. The concert was by the band Rush who, to the uninitiated, would be yet another big, loud, over-produced rock band. But to a generation of technocrats (e.g. yours truly) educated from the late 1970’s and on, they are the band of choice due to an intriguing mix of musicianship, technological...

The greatest benefits from bringing Maple into the classroom are realized when the static pedagogy of a printed textbook is enlivened by the interplay of symbolic, graphic, and numeric calculations made possible by technology.  It is not enough merely to compute or check answers with Maple.  To stop after noting that indeed, Maple can compute the correct answer is not a pedagogical breakthrough.

...

It has been a while since my last post, mostly because of a combination of getting Maple 14 ready to ship and a lack of meaty topics to write about. I am trying to get back into the habit of posting more regularly. You can help me achieve my goal by posting questions about parallel programming. I'll do my best to answer. However for now, I'll give a brief overview of the new parallel programming features in Maple 14.

A new function has been added to the Task Programming Model. The Threads:-Task:-Return function allows a parallel algorithm implemented on top of the Task Programming Model to perform an early bail out. Lets imagine that you have implemented a parallel search. You are looking for a particular element in a large set of data. Using the Task Programming Model, you've created a bunch of tasks, each searching a particular subset of the data. However one of the first tasks to execute finds the element you are looking for. In Maple 13, there was no built in way of telling the other tasks that the result have been found and they they should not execute. In Maple 14, the Return function allows one task to specify a return value (which will be returned from Threads:-Task:-Start) and signal the other tasks that the algorithm is complete and that additional tasks should not be executed. Tasks that are already running will still run to completion, but tasks that have not started executing will not be started.

You may have noticed that there is a race condition with Return. What happens if two tasks both call Return in parallel? Only one of the values will become the value that is passed to Threads:-Task:-Start. I suppose I could say the "first" value is the one that is used, but really, what does that mean? If you call Return, then the value passed to Return should be an acceptable result for the algorithm.  If you call Return more than once, any of those values should be valid, thus it shouldn't matter which one becomes the return value.  That said, the Return function does give some feedback. In the task that succeeds in having its value accepted, Return will return true. In all other tasks that call Return, it will return false. This allows the code to know if a particular result was or was not accepted.

Maple 14 also adds the Task Programming Model to the C External Calling API. This means that you can write your algorithms in C and make use of the Task Programming Model. The C API is similar to the Maple API, with a few differences. In particular, you need to create each child task individually, instead of as a single call to Continue, as you would in Maple. As well, because it is C code, you need to worry about a few details like memory management that are handled automatically in Maple.  Using External Call is fairly advanced, so I won't go into too much detail here.  If you'd like to see more details of using the Task Programming Model in External Calling, I can write a seperated post dedicated to that.

As with every release of Maple, we spent some time trying to make our existing functionality faster and more stable. For parallel programming, we reduced the overhead of using the Task Programming Model, as well as reducing the locking in the kernel (which should help improve parallelism). Of course many bugs have been fixed, which should make parallel programming more reliable in Maple 14.

The term “from months to days” is a favorite slogan of mine and I have relied on it religiously for over two decades to illustrate the fundamental benefit of symbolic computation. Whether it’s the efficient development of complex physical models using MapleSim, or exploration of parametric design surface equations (my dissertation) using good old fashioned Maple V Release 2, the punch that symbolic computation provided was to automate the algebraic mechanics...

 

In a recent blog post, I pointed out that Maple did not have a built-in functionality for drawing graphs that arise in computing volumes by slices. However, I did provide several examples of ad-hoc visualizations that one could build with the graphing tools in Maple.

 

Recently, a user called attention to a weakness in the Student Calculus 1 command, VolumeOfRevolution. This command (and the tutor built on it) will draw a surface of revolution bounded by the surfaces generated by revolving the graph of one or two functions.

Points and lines, and the relationships between them, are essential ingredients of so many problems in, for example, calculus. In particular, obtaining the equation of the perpendicular bisector of a line segment, dropping a perpendicular from a point to a given line, and calculating the distance from a point to a line are three tasks treated in elementary analytic geometry that recur in the applications....

I spend much of my time traveling for business. These trips often last a week, and we try to visit as many potential customers as possible, and in the most efficient order. This involves matching our hosts' calendars with our own, booking the most cost effective travel options, and coping with last-minute cancellations and changes. It isn’t easy!

This has become so much easier with the advent of shareable calendars and mapping services, like Google Maps. ...

My wife will testify that I am horrible when it comes to keeping things organized and tidy. My colleagues who have seen my office can attest to this as well. My usual defence is that a messy environment is an indication of how busy you are (consequently how productive you are) and basic creativity. But every once in a while, usually when I hit a mental block, I launch into clean up mode to do something completely different hoping that when I’m done, my mental block will be gone. I just went through one of these moments. This time, my cleansing took me to the bottom of one of my office desk drawers to a pile of photos that I had stashed in there ten years ago. Glancing through these, four immediately jumped out and helped me flash back to some key moments in my life. Yes, my 15 minute sabbatical digging through my desk was one of the most productive quarter hours I’ve had in a long time. Here, then, are these four photos that respectively offer a compelling reason to reflect a bit on the past ...

Back in July of 2005, one of the early Tips & Techniques articles (since updated) in the Maple Reporter was a comparison of two different approaches to fitting a circle to 3D data points. The impetus for the comparison was Carl Cowen's article on the subject. His approach was algebraic - he used the singular value decomposition to obtain a basis for the...

Maplesoft Employee

Parametrizing a Curve

May 13 2010 by rlopez 263

3

0

2

In 1988, Keith Geddes and others involved with the Maple project at the University of Waterloo published a Maple Calculus Workbook of interesting calculus problems and their solutions in Maple. Over the years, I've paged through this book, extracting some of its more unique problems. Recently, I extracted the following problem from this book, and added it to my Clickable Calculus collection, which I use for workshops and web-based presentations.

Maplesoft Employee

Stepwise Solutions in Maple

May 12 2010 by rlopez 263

1

0

0

Three recent articles in the Tips & Techniques series addressed the question of stepwise solutions in Maple. Just what is it that Maple provides by way of stepwise solutions for standard calculations in the mathematical curricula? There are commands, assistants, tutors, and task templates that provide stepwise calculations in precalculus, calculus, linear algebra, and vector calculus. In addition, since Maple can implement nearly any mathematical operation, any stepwise calculation can be reproduced in Maple by assembling the appropriate intermediate steps, just as they would be assembled when working with pencil and paper.

Maplesoft Employee

2

0

2

I have to thank my friend John Wass, an editor from Scientific Computing magazine who began a recent article with the clear warning “Attention Engineers! The developers at Maplesoft rarely sit still for very long.”  This was a comment on the thrilling speed that enhancements are flowing from the MapleSim pipeline. Although his quote refers to a MapleSim 3 article he wrote, I chuckled as the sentiment still rings true as my colleagues and I catch our breaths after the recent release of MapleSim 4.  Yes, the engineering community has definitely taken notice that MapleSim, in such a short amount of time, is already making a big difference in the way we do and think about modeling.

Sometimes the obvious escapes me, and it’s only due to some chance observation that I realize the same fundamental principles are everywhere.

A short time ago, I created a simple hydraulic network in MapleSim, and after experimenting with some of the parameters, found it gave the same behaviour as an electric circuit I’d modeled earlier.

1 2 3 4 5 6 7 Page 1 of 9