I'd like to start by thanking all those readers who left feedback on my last post. It was good to hear that most of you enjoy reading my posts and that they are generally helpful. I would like to encourage you to continue posting feedback, especially questions or comments about anything that I fail to explain sufficiently.
The following is a discussion of the limitations of parallel programming in Maple. These are the issues that we are aware of and are hoping to fix in future releases.
So far I've been discussing the tools that Maple provides for parallel programming. However, we must also consider whether the other Maple routines are thread safe (for more information on thread safety see my previous post on this topic). As of Maple 13, we have verified that all the routines implemented in the kernel are thread safe, and thus should work properly when run in parallel. The Math Library, the code written directly in the Maple language, has not yet been verified as thread safe. This means that library functions called during parallel processing may not function correctly. To fix this, we are going through the library to identify routines that are thread safe, and to determine how best to fix those that are not. However, the library is a huge amount of code, so this process is going to take time. Given these limitations, the parallel programming routines should be viewed in the same vein as the Compiler or evalhf: they currently work with a subset of Maple's functionality.
The Maple garbage collector has worked well for a long time, but it was never designed to deal with multiple threads. In developing the multi-threaded kernel, we were able to make the old collector work, but it does not perform particularly well. The only fix for this is to replace the collector. We have begun working on a new garbage collector; however, this is a significant change to the kernel, so it will not be available for a release or two.
In Maple 13, a garbage collection mark and sweep is the only "stop dead" event in the kernel, that is, the only event during which every other thread must stop.
There are many locks in the kernel, but most will not have a significant performance impact. We have been optimizing the kernel to allow computations to be as lock-free as possible. However, the following locks can cause significant blocking:
Currently Maple assumes that external calls are not thread safe, so it will not allow more than one thread into an external call at a time. However, the ?define_external routine accepts the THREAD_SAFE option. This tells Maple that an external function is thread safe, so Maple will allow multiple threads to call that routine at the same time. This lock affects Maple library routines that use external calls, including functions built using the Compiler. As we verify that external routines are thread safe (and fix those that are not), we will add this flag as necessary.
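To make the effect of this lock concrete, here is a minimal sketch in Python that only models the behaviour described above. It is not Maple code and not how the kernel is implemented. The names `define_external_model` and `peak_threads_inside` are hypothetical, and the `thread_safe` flag merely plays the role of the THREAD_SAFE option.

```python
import threading
import time

# Hypothetical model only: one global lock shared by every "unsafe" external call.
_external_call_lock = threading.Lock()


def define_external_model(func, thread_safe=False):
    """Wrap func the way the text describes external calls being guarded.

    thread_safe=True stands in for the THREAD_SAFE option: the wrapper skips
    the global lock, so several threads may be inside func at once.
    """
    if thread_safe:
        return func

    def guarded(*args, **kwargs):
        with _external_call_lock:  # only one thread may be inside at a time
            return func(*args, **kwargs)

    return guarded


def peak_threads_inside(thread_safe, n_threads=4):
    """Call a dummy external routine from n_threads threads and return the
    largest number of threads that were inside it at the same moment."""
    state = {"inside": 0, "peak": 0}
    state_lock = threading.Lock()  # protects the bookkeeping counters only

    def external_routine():
        with state_lock:
            state["inside"] += 1
            state["peak"] = max(state["peak"], state["inside"])
        time.sleep(0.1)  # stand-in for work done inside the external library
        with state_lock:
            state["inside"] -= 1

    call = define_external_model(external_routine, thread_safe=thread_safe)
    threads = [threading.Thread(target=call) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

Without the flag, the peak is always one thread, so four calls take roughly four times as long as one. With the flag set, the calls overlap, which is only acceptable if the external routine really is thread safe.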
There is a lock around the communication with the interface. For computations that interact with the interface (printing, system commands, embedded components, etc.) this can cause threads to block. An obvious example of this is creating a "background" thread while the main thread is waiting at the prompt. In this case, if the background thread attempts to communicate with the interface, it will be blocked by the main thread.
Mutable data structures (tables, rtables, global names, etc.) require locking a mutex whenever a value is written to or read from such a structure. This is necessary to allow these assignments to occur correctly, but it may lead to contention issues. However, when reading from immutable data structures, locking is not required. Therefore, for shared values that are only read, it may be preferable to use an immutable data structure instead. For example, a list may be better than an rtable.
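The same trade-off can be sketched outside Maple. The Python example below is an analogy under stated assumptions, not Maple code: a tuple stands in for an immutable Maple list, a dict stands in for a table, and the lock is taken explicitly here, whereas in Maple the kernel takes the mutex for you. The function name `evaluate_all` and the coefficient values are made up for illustration.

```python
import threading

# Immutable shared data: any thread may read it without synchronisation.
COEFFICIENTS = (2, 3, 5, 7)  # represents 2 + 3*x + 5*x^2 + 7*x^3


def evaluate_all(xs):
    """Evaluate the polynomial at each x in its own thread.

    Reads of COEFFICIENTS take no lock because nothing can modify a tuple.
    Writes to the shared dict go through a lock, mirroring the mutex the
    text describes for mutable structures.
    """
    results = {}                    # mutable shared structure
    results_lock = threading.Lock()

    def worker(x):
        value = sum(c * x**i for i, c in enumerate(COEFFICIENTS))  # lock-free read
        with results_lock:          # contention can only happen here
            results[x] = value

    threads = [threading.Thread(target=worker, args=(x,)) for x in xs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The point of the design is that the read-only input never becomes a point of contention. Only the single write per thread has to be serialised.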
The Maple memory allocator has both a thread local component and a global component. A thread can allocate memory from the thread local component without needing to acquire a lock. However, the local component is fed memory blocks from the global component, and accessing the global component does require locking. So there is some locking associated with memory allocation. This, combined with the garbage collection issues, can make memory intensive algorithms hard to parallelize in Maple 13. The garbage collector work will also include a new memory management system.
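As a rough illustration of why a two-level allocator reduces locking but does not eliminate it, here is a small Python model. It is a sketch under assumptions and does not reflect Maple's actual allocator: the class names, the refill size of 64 blocks, and the use of integers as "blocks" are all invented for the example.

```python
import threading


class GlobalPool:
    """Global component: hands out batches of block ids under a lock."""

    def __init__(self, refill_size=64):
        self._lock = threading.Lock()
        self.refill_size = refill_size
        self.lock_acquisitions = 0
        self._next_block = 0

    def take_blocks(self):
        with self._lock:  # the only place a lock is needed
            self.lock_acquisitions += 1
            start = self._next_block
            self._next_block += self.refill_size
        return list(range(start, start + self.refill_size))


class ThreadLocalAllocator:
    """Thread local component: each thread pops from its own free list."""

    def __init__(self, pool):
        self._pool = pool
        self._local = threading.local()

    def allocate(self):
        free = getattr(self._local, "free", None)
        if not free:  # first call in this thread, or local list exhausted
            free = self._pool.take_blocks()
            self._local.free = free
        return free.pop()  # no lock: the list belongs to this thread alone


def run_demo(n_threads=4, allocs_per_thread=256):
    """Return (number of distinct blocks handed out, global lock acquisitions)."""
    pool = GlobalPool()
    allocator = ThreadLocalAllocator(pool)
    per_thread = [None] * n_threads  # one result slot per thread, no sharing

    def worker(index):
        per_thread[index] = [allocator.allocate() for _ in range(allocs_per_thread)]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    all_blocks = [b for blocks in per_thread for b in blocks]
    return len(set(all_blocks)), pool.lock_acquisitions
```

In this model, 1024 allocations need only 16 trips to the global pool. A memory intensive algorithm makes those trips far more often, and that is where threads start waiting on each other.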
The debugger lacks explicit support for parallel debugging. However, it does work on a single thread much like it works in single threaded mode. A problem can occur if more than one thread starts running the debugger at the same time. In Maple 13 you will get an independent debugging session for each thread, without any obvious indication of which session is attached to which thread. Further, when one thread is in the debugger, other threads continue to execute.
These are the issues that have caused problems in the past. As we fix these problems, new issues will be uncovered. Further, as users begin to use the parallel programming tools, they will discover issues as well. This is another reason I started this blog: to encourage users to start experimenting with the parallel programming tools, in hopes of generating more reports of real world experiences. These reports are very important to help guide our development.