Maplesoft Blogger Profile: Darin Ohashi

Senior Software Developer

I am a Senior Software Developer in the Kernel Group, working on the Maple language interpreter. I have been working at Maplesoft since 2001 on many aspects of the Kernel; recently, however, I have been focusing on enabling parallel programming in Maple. I have added various parallel programming tools to Maple and have been trying to teach parallel programming techniques to Maple programmers. I have a Master's degree in Mathematics (although really Computer Science) from the University of Waterloo. My research focused on algorithm and data structure design and analysis.

Posts by Darin Ohashi

This post will explain how to configure the compiler and other tools you will need to build the External Calling examples that will come in later posts.  This is an advanced topic, so this post is fairly complex.

First, I am going to be using the compilers via the command line, so you will need to familiarize yourself with the terminal program on your particular OS.  You'll have to do this for yourself, but here are a few starting points:

Windows

Apple

I am going to assume that Linux and Solaris users are familiar with using the terminal.

For Linux, Apple and Solaris, I am going to use gcc as the compiler.  For Linux you should use your distribution's package management system to get it, for Apple you need to install Xcode, and for Solaris, well, gcc is probably already installed or you'll want to talk to your sys admin to have it installed (or if you are your own sys admin, you probably know how to install gcc for yourself).  For Windows, you need to install the Windows Software Development Kit.  If you already have a copy of Visual C++ (Express or Professional) installed, then you already have these tools.

I am also going to use the "make" program to manage building the examples, so you will need to install a version of make as well (you won't need to learn how make works unless you want to modify the examples).  I will be using GNU make, which should be easy to install on Linux and Solaris (similar to how you installed gcc), and it is included in Xcode for Apple.  For Windows, use this:

http://gnuwin32.sourceforge.net/packages/make.htm

Installing the 32 bit make on 64 bit Windows is fine.

Now you'll need to launch a terminal.  For Linux, Apple and Solaris this should be easy.  On Windows, go to the Windows SDK folder (or Windows Visual Studio folder) on the Start menu; there should be an icon for the Windows SDK Command Prompt.  Click that to launch the terminal.  This version of the terminal has the environment configured to run the compiler.

On Windows you'll also have to add the location where you installed make to your path, which can be done on 32 bit Windows like this:

path=%PATH%;C:\Program Files\GnuWin32\bin

and on 64 bit Windows like this:

path=%PATH%;C:\Program Files (x86)\GnuWin32\bin

assuming you used the default install location for make.

You can test this by running "make" in the terminal.  If everything is set up correctly, make will run, fail to find a makefile and raise an error.  If the path is not set properly, make won't be found and you'll get a message saying so.

Path not set properly:

C:\Program Files\Microsoft SDKs\Windows\v7.1>make
'make' is not recognized as an internal or external command,
operable program or batch file.

Set the path:

C:\Program Files\Microsoft SDKs\Windows\v7.1>path=%PATH%;"C:\Program Files (x86)\GnuWin32\bin"

Make is now found, but there is no makefile in the current directory:

C:\Program Files\Microsoft SDKs\Windows\v7.1>make
make: *** No targets specified and no makefile found.  Stop.

As a final test, I've attached a small example (test.zip) that contains a Makefile and a simple source file.  If you extract the files to a new directory, go to that directory in the terminal and run make (with make added to the path as described above), it should build an executable (test or test.exe).  You can run the executable by executing "test" on the command line.

By default the Makefile is configured for Windows, so Windows users won't need to change it.  Other users, however, will need to comment out the

WINDOWS=true

line in the Makefile by changing it to

#WINDOWS=true

I know this is a little confusing, especially if you are not familiar with the command line interface, so I encourage you to post replies if you have problems.  Hopefully we will be able to answer your questions.  Once everyone has figured out how to get this simple example to compile and run on their system, the upcoming external calling examples will be (relatively) easy.

Good Luck!

Darin

I've had a few requests to provide some more information on External Calling, so I thought I would make a few posts about it. This first post will be a high level description of External Calling and how it works, with examples coming later. As External Calling is an advanced topic, I am going to assume you know how to compile a shared library and are generally familiar with the C language. This first post, however, won't require any real programming knowledge.

What is External Calling?

External Calling is the name for Maple's ability to connect to and call functions from other programming languages. Maple uses this for various reasons. We have written our own libraries in C, C++ and Java to solve particular problems. We partner with various labs around the world who have developed code, often in languages like C or C++, so external calling is used to interface with their code. We also connect to high performance libraries like NAG and BLAS to provide those high performance routines in Maple. Of course, you can use External Calling to connect Maple to your code as well.

Although Maple can call various programming languages, the most common languages we connect to are C and C++, and those are the languages I am going to focus on.

How does it work?

In Maple, you call ?define_external or use the ?ExternalCalling package. Both of these methods take a description of the function that you want to call and return a Maple procedure. Normally you would assign the procedure to a name and then call the externally defined function just like any other Maple procedure.
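For example, here is a minimal sketch of that flow using wrapperless external calling (described below). The library name "mylib.dll" and the exported C function my_add (which takes and returns 32 bit integers) are made up for illustration:

# my_add is assumed to be exported from mylib.dll as:  int my_add( int a, int b );
my_add := define_external( 'my_add',
    'a' :: integer[4], 'b' :: integer[4],
    'RETURN' :: integer[4],
    'LIB' = "mylib.dll" ):

my_add( 2, 3 );   # call the external function just like a Maple procedure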

There are a couple of different ways to use define_external to connect to a shared library; the differences are mostly concerned with how the parameters given in Maple are converted to parameters used in the external function.

  • Wrapperless external calling: With wrapperless external calling, Maple calls a function implemented in the shared library by automatically converting the values given in Maple into valid types for the external function.
  • Generated wrappers: With generated wrappers, Maple automatically generates a small C library that handles conversions from Maple values to the values used in the external function. Using generated wrappers allows Maple to handle more data types, like call back procedures.
  • Custom wrappers: A custom wrapper is a C function that you write yourself. This function accepts arguments as Maple data structures and returns a Maple data structure. You are responsible for converting the Maple data structures into whatever forms you need and converting your computed value back into a Maple data structure. Maple provides the External Calling API to assist in working with Maple from the externally defined function.

The first two forms of external calling are the easiest to do, however they are also the most limited. Internally we exclusively (I think) use the third, custom wrapper, form of external calling. That is the form I am going to talk about.

Custom Wrapper

The name "custom wrapper" is a bit of a misnomer. The function that you write does not need to "wrap" anything, it can implement anything you want. As long as you can convert the result into a Maple data structure, you can pass it back into Maple. In fact Maple also supports returning generic data, via the ?MaplePointer routines, but that is a more complex topic for a later blog post.

Your external function is simply a C function with the following calling convention:

ALGEB CustomWrapper( MKernelVector kv, ALGEB args )

ALGEB is the C data type that represents a Maple data structure. The MKernelVector is a data structure that acts as an intermediary between your external calling routines and the Maple engine. You will need to pass this structure back into the External Calling API functions. Both of these types, plus the External Calling API functions, are defined in a header, maplec.h, that needs to be included in your code. I will provide more details when I provide examples.
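On the Maple side, a custom wrapper is attached by passing the MAPLE option to define_external. This tells Maple that the function uses the calling convention shown above, so no argument descriptions are needed. As a small sketch (the library name "mywrapper.dll" is made up):

MyFunction := define_external( 'CustomWrapper', 'MAPLE', 'LIB' = "mywrapper.dll" ):
MyFunction( 1, 2, 3 );   # the arguments arrive in CustomWrapper as the ALGEB expression sequence args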

The External Calling API

The External Calling API is a set of functions that we make available for working with the Maple Engine from external code. Maple also allows third party applications to load the Maple engine as a shared library; we call this ?OpenMaple. The External Calling functions are also available in OpenMaple, so you will often see OpenMaple used in the Maple help pages. Most functions can be used in both OpenMaple and External Calling, except for a few that are OpenMaple specific and involve starting and stopping the Maple Engine.

Maple's help system documents all the External Calling functions, so you can see what is available. There is an overview of the external calling functions on this page: ?ExternalCalling,C,API. Briefly, there are functions for converting Maple types to C and back, creating and interacting with Maple data structures (list, set, rtable, table, string, etc.), creating and interacting with Maple language elements (names, procedures, etc.), printing to the Maple interface, allocating memory, evaluating Maple statements and raising exceptions. There is even a C interface to the Task Programming Model.

Next Time...

In my next post I will provide some examples of using the External Calling API to actually do stuff in an externally defined procedure.  However, I am going to spend some time trying to figure out the easiest way for you to get the tools you'll need to be able to develop externally defined functions yourself, so my next post might take a bit of time.

Darin

Consider the following C code:

It has been a while since my last post, mostly because of a combination of getting Maple 14 ready to ship and a lack of meaty topics to write about. I am trying to get back into the habit of posting more regularly. You can help me achieve my goal by posting questions about parallel programming. I'll do my best to answer. However for now, I'll give a brief overview of the new parallel programming features in Maple 14.

A new function has been added to the Task Programming Model. The Threads:-Task:-Return function allows a parallel algorithm implemented on top of the Task Programming Model to perform an early bail out. Let's imagine that you have implemented a parallel search. You are looking for a particular element in a large set of data. Using the Task Programming Model, you've created a bunch of tasks, each searching a particular subset of the data. However, one of the first tasks to execute finds the element you are looking for. In Maple 13, there was no built-in way of telling the other tasks that the result had been found and that they should not execute. In Maple 14, the Return function allows one task to specify a return value (which will be returned from Threads:-Task:-Start) and signal the other tasks that the algorithm is complete and that additional tasks should not be executed. Tasks that are already running will still run to completion, but tasks that have not started executing will not be started.
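Here is a rough sketch of such a parallel search written on top of the Task Programming Model. The procedure names Search and SearchRange are my own, and a real implementation would keep splitting the range recursively instead of creating just two tasks:

SearchRange := proc( L, target, lo, hi )
    local i;
    for i from lo to hi do
        if L[i] = target then
            # Found it: make i the value returned from Threads:-Task:-Start
            # and signal that tasks which have not started should not run.
            Threads:-Task:-Return( i );
            return;
        end if;
    end do;
    NULL;
end proc:

Search := proc( L, target )
    local n;
    n := nops( L );
    # Replace this task with a continuation that returns FAIL if no task
    # called Return, and create two child tasks to search the two halves.
    Threads:-Task:-Continue( proc() FAIL end proc,
        Task = [ SearchRange, L, target, 1, iquo( n, 2 ) ],
        Task = [ SearchRange, L, target, iquo( n, 2 ) + 1, n ] );
end proc:

Threads:-Task:-Start( Search, [ seq( i^2, i = 1 .. 1000 ) ], 625 );   # returns 25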

You may have noticed that there is a race condition with Return. What happens if two tasks both call Return in parallel? Only one of the values will become the value that is returned from Threads:-Task:-Start. I suppose I could say the "first" value is the one that is used, but really, what does that mean? If you call Return, then the value passed to Return should be an acceptable result for the algorithm.  If you call Return more than once, any of those values should be valid, so it shouldn't matter which one becomes the return value.  That said, the Return function does give some feedback. In the task that succeeds in having its value accepted, Return will return true. In all other tasks that call Return, it will return false. This allows the code to know whether a particular result was or was not accepted.

Maple 14 also adds the Task Programming Model to the C External Calling API. This means that you can write your algorithms in C and make use of the Task Programming Model. The C API is similar to the Maple API, with a few differences. In particular, you need to create each child task individually, instead of as a single call to Continue, as you would in Maple. As well, because it is C code, you need to worry about a few details, like memory management, that are handled automatically in Maple.  Using External Calling is fairly advanced, so I won't go into too much detail here.  If you'd like to see more details of using the Task Programming Model in External Calling, I can write a separate post dedicated to that.

As with every release of Maple, we spent some time trying to make our existing functionality faster and more stable. For parallel programming, we reduced the overhead of using the Task Programming Model, as well as reducing the locking in the kernel (which should help improve parallelism). Of course many bugs have been fixed, which should make parallel programming more reliable in Maple 14.

Atomic operations are CPU instructions that are guaranteed to execute as a single, indivisible step. This means that an atomic operation is guaranteed to complete without being interrupted by the actions of another thread. Although this may not sound too exciting, careful programming using these instructions can lead to algorithms and data structures that can be used in parallel without needing locks. Maple currently does not support atomic operations, however they are an interesting tool and are used in the kernel to help improve Maple's parallelism in general.

In this post I'll take a closer look at the ways in which Maple code can be thread unsafe. If you have not already seen my post on Thread Safety, consider reading that post first. As a brief review, a procedure is thread safe if it works correctly when run in parallel.

The most obvious way in which procedures can be thread unsafe is if they share data without synchronizing access (using a Mutex, for example). So how can two threads share data?
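One obvious answer is through global variables. As a rough sketch (the names are made up), the two tasks below both update the global name counter without any synchronization, so increments can be lost and the final value can come out less than 200000:

counter := 0:

Bump := proc( n )
    global counter;
    local i;
    for i to n do
        counter := counter + 1;   # unsynchronized read-modify-write of shared data
    end do;
    NULL;
end proc:

Threads:-Task:-Start( proc()
    Threads:-Task:-Continue( proc() global counter; counter end proc,
        Task = [ Bump, 100000 ], Task = [ Bump, 100000 ] );
end proc );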

I'd like to start by thanking all those readers who left feedback on my last post. It was good to hear that most of you enjoy reading my posts and that they are generally helpful. I would like to encourage you to continue posting feedback, especially questions or comments about anything that I fail to explain sufficiently.

The following is a discussion of the limitations of parallel programming in Maple. These are the issues that we are aware of and are hoping to fix in future releases.

In my previous posts I have discussed various difficulties encountered when writing parallel algorithms. At the end of the last post I concluded that the best way to solve some of these problems is to introduce a higher level programming model. This blog post will discuss the Task Programming Model, the high level parallel programming model introduced in Maple 13.
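As a taste of what the model looks like, here is a minimal sketch that sums the squares of the first million integers by recursively splitting the range into child tasks (the helper name AddRange is my own):

AddRange := proc( lo, hi )
    local mid, i;
    if hi - lo < 1000 then
        # The range is small enough: do the work directly.
        add( i^2, i = lo .. hi );
    else
        # Split the range in two and replace this task with a continuation
        # that adds the results of the two child tasks.
        mid := iquo( lo + hi, 2 );
        Threads:-Task:-Continue( `+`,
            Task = [ AddRange, lo, mid ],
            Task = [ AddRange, mid + 1, hi ] );
    end if;
end proc:

Threads:-Task:-Start( AddRange, 1, 10^6 );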

In my previous posts I discussed the basic difference between parallel programming and single threaded programming. I also showed how controlling access to shared variables can be used to solve some of those problems. For this post, I am going to discuss more difficulties of writing good parallel algorithms.

Here are some definitions used in this post:

  • scale: the ability of a program to get faster as more cores are available
  • load balancing: how effectively work is distributed over the available cores
  • coarse grained parallelism: parallelizing routines at a high level
  • fine grained parallelism: parallelizing routines at a low level

Consider the following example

In the previous post, I described why parallel programming is hard. Now I am going to start describing techniques for writing parallel code that works correctly.

First some definitions.

  • thread safe: code that works correctly even when called in parallel.
  • critical section: an area of code that will not work correctly if run in parallel.
  • shared: a resource that can be accessed by more than one thread.
  • mutex: a programming tool that controls access to a section of code (see the sketch below).
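As a rough sketch of these ideas (the procedure and variable names are made up), the critical section below is the update of the shared global total, and a mutex from the Threads:-Mutex package ensures that only one thread executes it at a time:

m := Threads:-Mutex:-Create():
total := 0:

SafeBump := proc( n )
    global total, m;
    local i;
    for i to n do
        Threads:-Mutex:-Lock( m );     # enter the critical section
        total := total + 1;            # update the shared resource
        Threads:-Mutex:-Unlock( m );   # leave the critical section
    end do;
    NULL;
end proc: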

In my previous post, I tried to convince you that going parallel is necessary for high performance applications. In this post, I am going to show what makes parallel programming different, and why that makes it harder.

Here are some definitions used in this post:

  • process: the operating system level representation of a running executable. Each process has memory that is protected from other processes.
  • thread: within a single process each thread represents an independent, concurrent path of execution. Threads within a process share memory.

We think of a function as a series of discrete steps. Let's say a function f is composed of steps f1, f2, f3... e.g.

Computers with multiple processors have been around for a long time, and people have been studying parallel programming techniques for just as long. However, only in the last few years have multi-core processors and parallel programming become truly mainstream. What changed?

Here are some definitions for terms used in this post:

  • core: the part of a processor responsible for executing a single series of instructions at a time.
  • processor: the physical chip that plugs into a motherboard. A computer can have multiple processors, and each processor can have multiple cores.
  • process: a running instance of a program. A process's memory is usually protected from access by other processes.
  • thread: a running instance of a process's code. A single process can have multiple threads, and multiple threads can be executing at the same time on multiple cores.
  • parallel: the ability to utilize more than one processor at a time to solve problems more quickly, usually by being multi-threaded.

For years, processor designers had been able to increase the performance of processors by increasing their clock speeds. However, a few years ago they ran into a few serious problems. RAM access speeds were not able to keep up with the increased speed of processors, causing processors to waste clock cycles waiting for data. The speed at which electrons can flow through wires is limited, leading to delays within the chip itself. Finally, increasing a processor's clock speed also increases its power requirements. Increased power requirements lead to the processor generating more heat (which is why overclockers come up with such ridiculous cooling solutions). All of these issues meant that it was getting harder and harder to continue to increase clock speeds.  The designers realized that instead of increasing the core's clock speed, they could keep the clock speed fairly constant, but put more cores on the chip. Thus was born the multi-core revolution.

My name is Darin Ohashi and I am a senior kernel developer at Maplesoft. For the last few years I have been focused on developing tools to enable parallel programming in Maple. My background is in Mathematics and Computer Science, with a focus on algorithm and data structure design and analysis. Much of my experience with parallel programming has been acquired while working at Maplesoft, and it has been a very interesting ride.

In Maple 13 we added the Task Programming Model, a high level parallel programming system. With the addition of this feature, and a few significant kernel optimizations, useful parallel programs can now be written in Maple. Although there are still limitations and lots more work to be done on our side, adventurous users may want to try writing parallel code for themselves.

To encourage those users, and to help make information about parallel programming more available, I have decided to write a series of blog posts here at Maple Primes. My hope is that I can help explain parallel programming in general terms, with a focus on the tools available in Maple 13. Along the way I may post links to sites, articles and blogs that discuss parallel programming issues, as well as related topics, such as GPU programming (CUDA, OpenCL, etc.).

In my next post, the first real one, I am going to explain why parallel programming has suddenly become such an important topic.
