Maple

In the previous post, I described why parallel programming is hard. Now I am going to start describing techniques for writing parallel code that works correctly.

First, some definitions.

• thread safe: code that works correctly even when called in parallel.
• critical section: an area of code that will not work correctly if run in parallel.
• shared: a resource that can be accessed by more than one thread.
• mutex: a programming tool that controls access to a section of code, allowing only one thread to execute it at a time.

Our goal, then, is to write thread safe code. In my previous post I described how parallel programming allows for a huge number of possible orderings for the execution of code, and I showed that some orderings can lead to the code working incorrectly. To write thread safe code, we must determine the sections of code where the order of execution matters, and then either enforce an order over those sections or find a way to write the code so that it has no unsafe sections. These dangerous sections of code are referred to as critical sections. Critical sections occur when multiple threads want to access a shared resource. Usually the shared resource is memory (global variables, etc.), but it could also be another kind of shared resource, such as a file handle.

Any value that is not shared can be used safely. Thus values referenced only by function locals are safe. Likewise, function parameters are safe, unless an argument is a reference to a shared data structure. Finally, calling a thread safe function is itself thread safe.
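As a quick illustration, here is a small procedure (sum_list is a hypothetical name, not from the original post) that touches only its parameter and its locals, and so can safely be called from any number of threads at once:

```(**) sum_list := proc( L :: list )
(**)     local i, total;
(**)
(**)     # total and i are locals, and L is a parameter that refers
(**)     # only to the list passed in, so nothing here is shared
(**)     total := 0;
(**)     for i from 1 to nops( L )
(**)     do
(**)         total := total + L[i];
(**)     end do;
(**)     total;
(**) end proc:
```

No matter how many threads call sum_list simultaneously, each call works on its own locals, so no ordering of the threads can produce a wrong answer.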

In many cases, the above restrictions are going to be too tight. Parallel algorithms often need threads to share data. So then the question becomes, how do we share data safely?

Consider this example from my previous post:

```(**) f := proc( n )
(**)     local i;
(**)     global common_variable;
(**)
(**)     for i from 1 to n
(**)     do
(**)         common_variable := common_variable+1;
(**)     end;
(**)     NULL;
(**) end proc;
f := proc(n)
local i;
global common_variable;
for i to n do common_variable := common_variable + 1 end do; NULL
end proc

(**) g := eval(f);
g := proc(n)
local i;
global common_variable;
for i to n do common_variable := common_variable + 1 end do; NULL
end proc

(**) common_variable := 0;
common_variable := 0

(**) print( common_variable );
1684
```

We identified the flaw in this code as the possibility that the following two instructions could be split apart:

```1: tmp := common_variable;
2: common_variable := tmp + 1;
```

If we modify this example to guarantee that these two instructions cannot be separated, it will be correct. To do this, we'll introduce the idea of a "mutex". A mutex is a structure that allows the programmer to prevent more than one thread from entering a section of code at a time. A mutex has two states: locked and unlocked. A thread that wants to enter the protected section acquires the lock before entering the region, and releases the lock when it leaves. If another thread wants to enter the section while the mutex is locked, it must wait until the mutex is unlocked.

Maple has a Mutex sub-package of the Threads package. So we can fix this example as follows (I've also simplified the code a bit):

```(**) f := proc( m,n )
(**)     local i;
(**)     global common_variable;
(**)
(**)     for i from 1 to n
(**)     do
(**)         Threads:-Mutex:-Lock( m );
(**)         common_variable := common_variable+1;
(**)         Threads:-Mutex:-Unlock( m );
(**)     end;
(**)     NULL;
(**) end proc;
f := proc(m, n)
local i;
global common_variable;
for i to n do
Threads:-Mutex:-Lock(m);
common_variable := common_variable + 1;
Threads:-Mutex:-Unlock(m);
end do;
NULL
end proc

(**) common_variable := 0;
common_variable := 0

(**) m := Threads:-Mutex:-Create();
m := 1

(**) print( common_variable );
2000
```

If you run this example multiple times, it should return 2000 every time.
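The session excerpt above doesn't show how f was launched. Judging from the result of 2000, the session presumably ran f in two threads of 1000 increments each. A minimal sketch of one way to do that, assuming the locking version of f and the Threads:-Create and Threads:-Wait commands, might look like:

```(**) m := Threads:-Mutex:-Create():
(**) common_variable := 0:
(**) # launch two threads, each incrementing the counter 1000 times
(**) id1 := Threads:-Create( f( m, 1000 ) ):
(**) id2 := Threads:-Create( f( m, 1000 ) ):
(**) # wait for both threads to finish before reading the result
(**) Threads:-Wait( id1, id2 ):
(**) print( common_variable );
```

Because every increment happens while holding the mutex, common_variable ends at 2000 regardless of how the two threads interleave.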

However, using a mutex can be problematic. By creating a mutual exclusion zone, you are inherently limiting parallelism. In fact, the parallelism of the above example is very low: almost all of the work is performed while holding the lock. In addition, locking and unlocking a mutex is relatively slow. So using a mutex in the naive way can seriously limit the performance gains of going parallel.

Thus it is often better to avoid using a mutex. One common way to avoid excessive locking is to use a divide and conquer approach. If a problem can be divided into smaller chunks, then each thread can be given its own distinct part of the problem to work on. Once each of the sub-problems is solved, the results can be combined into the solution. If the sub-problems can be solved without passing data between threads, then there is no need to use a mutex while solving the sub-problems. Consider this slightly less trivial example:

```(**) f := proc( m,A,first,last )
(**)     local i, l;
(**)     global common_variable;
(**)
(**)     l := 0;
(**)     for i from first to last
(**)     do
(**)         l := l+A[i];
(**)     end;
(**)
(**)     Threads:-Mutex:-Lock( m );
(**)     common_variable := common_variable + l;
(**)     Threads:-Mutex:-Unlock( m );
(**) end proc:
(**) common_variable := 0:
(**) A := LinearAlgebra:-RandomVector( 2000 ):
(**) print( common_variable );
1932

(**) print( add( A[i], i=1..2000 ) );
1932
```

Here each thread computes the total of its assigned sub-region of the array. Because the programmer gave each thread a disjoint region of the array, the threads do not need to lock while accessing A. Each thread locks the mutex only once, at the end of its computation, to update the shared variable. Thus we can greatly reduce the amount of locking by dividing the problem into disjoint sub-problems.
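Again, the session excerpt omits how the threads were started. One plausible way to run f over two disjoint halves of A (a sketch assuming the Threads:-Create and Threads:-Wait commands, not the original session's exact invocation) is:

```(**) m := Threads:-Mutex:-Create():
(**) common_variable := 0:
(**) # give each thread its own disjoint half of the 2000-element array,
(**) # so no locking is needed while the partial sums are computed
(**) id1 := Threads:-Create( f( m, A, 1, 1000 ) ):
(**) id2 := Threads:-Create( f( m, A, 1001, 2000 ) ):
(**) Threads:-Wait( id1, id2 ):
(**) print( common_variable );
```

Each thread acquires the mutex exactly once, so doubling the problem size adds essentially no extra locking overhead.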

Thread safe coding requires controlling access to shared resources. There are two main approaches: access to the shared resource can be limited using tools like a mutex, or the code can be written to avoid conflicts. Avoiding conflicts is often the better strategy; however, it can be tricky to structure code this way, and there are times when it is not possible.