## What Makes Parallel Programming Hard?

In my previous post, I tried to convince you that going parallel is necessary for high-performance applications. In this post, I am going to show what makes parallel programming different, and why that makes it harder.

Here are some definitions used in this post:

• process: the operating system level representation of a running executable. Each process has memory that is protected from other processes.
• thread: within a single process each thread represents an independent, concurrent path of execution. Threads within a process share memory.
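
The shared-memory point in the definitions above can be sketched in a few lines. This example uses Python rather than Maple, purely because it makes a compact, self-contained illustration: a worker thread writes into a list created by the main thread, and the main thread sees the write.

```python
import threading

# Threads within a process share memory: a worker thread can append to
# a list created by the main thread, and the main thread observes it.
shared = []

def worker():
    shared.append("written by worker thread")

t = threading.Thread(target=worker)
t.start()
t.join()  # wait for the worker thread to finish

print(shared)  # the main thread sees the worker's write
```

Two processes, by contrast, would each get their own copy of `shared`, and the main process would see an empty list.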

We think of a function as a series of discrete steps. Let's say a function f is composed of steps f1, f2, f3..., e.g.

```
f := proc()
f1;
f2;
f3;
.
.
.
end
```

and another function g is composed of steps g1, g2, g3...

If a single threaded application runs f and g, there are two possibilities:

```
f();
g();
```

or

```
g();
f();
```

This leads to two possible orders for the steps: f1, f2, f3..., g1, g2, g3... or g1, g2, g3..., f1, f2, f3.... Which order occurs is determined by the order in which f() and g() are called in the application, which, in turn, is determined by the programmer.

```
(**) f := proc( n )
(**)     local i, s;
(**)     for i to n
(**)     do
(**)         s := nprintf( "%s%d", procname, i );
(**)         print(s);
(**)     end;
(**) end proc;
f := proc(n)
local i, s;
for i to n do s := nprintf("%s%d", procname, i); print(s) end do
end proc

(**) g := eval(f);
g := proc(n)
local i, s;
for i to n do s := nprintf("%s%d", procname, i); print(s) end do
end proc

(**) f(10);
f1

f2

f3

f4

f5

f6

f7

f8

f9

f10

(**) g(10);
g1

g2

g3

g4

g5

g6

g7

g8

g9

g10
```

Now let's consider what happens if f and g are called in parallel, that is, at the same time. Although f2 will still execute before f3, there is no control over the order in which f2 executes relative to g2. Thus the order could be g1, g2, f1, g3, f2, f3..., or it could be f1, g1, f2, f3, g2, g3..., or it could be any other valid order. Further, each time these functions are executed there could be a different order, and given enough executions every possible order will eventually happen.

```
(**) Threads:-Wait( op(map( Threads:-Create, [ 'f(10)', 'g(10)' ] )) );
f1

f2

g1

f3

g2

f4

g3

f5

g4

f6

g5

f7

g6

f8

g7

f9

g8

f10

g9

g10
```

Try executing this code multiple times; you'll see a series of different orderings printed.
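
The same experiment can be sketched in Python (used here instead of Maple so the example is self-contained and easy to run anywhere). One property always holds, no matter how the runs interleave: each thread's own steps stay in order, even though the merge of the two sequences varies from run to run.

```python
import threading

log = []
log_lock = threading.Lock()  # protects the shared log list itself

def stepper(name, n):
    # Record steps name1 .. name_n. The steps of one thread always
    # appear in the log in order; how they interleave with the other
    # thread's steps can differ on every run.
    for i in range(1, n + 1):
        with log_lock:
            log.append((name, i))

t1 = threading.Thread(target=stepper, args=("f", 10))
t2 = threading.Thread(target=stepper, args=("g", 10))
t1.start(); t2.start()
t1.join(); t2.join()

# Within each thread the order is fixed, whatever the interleaving was:
f_steps = [i for (name, i) in log if name == "f"]
g_steps = [i for (name, i) in log if name == "g"]
print(f_steps)  # always [1, 2, ..., 10]
print(g_steps)  # always [1, 2, ..., 10]
```

Print `log` itself and run the script a few times to see the interleaving change while `f_steps` and `g_steps` never do.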

Suppose that, for some particular i and j, step fi must execute before step gj for the code to execute correctly. Then executing f and g in parallel will occasionally fail, even if fi is the first step of f and gj is the last step of g, because every possible ordering will eventually occur. Therefore, to write correct parallel code the programmer must guarantee that the code is correct for all possible orderings.
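
When such an ordering constraint exists, the programmer has to enforce it explicitly with a synchronization primitive. Here is a minimal Python sketch (the constraint "f3 must run before g1" is hypothetical, chosen just for illustration) using `threading.Event` to make one thread wait for another:

```python
import threading

# Hypothetical constraint: step f3 must execute before step g1.
# An Event lets one thread block until another thread signals it.
f3_done = threading.Event()
order = []

def f():
    order.append("f1")
    order.append("f2")
    order.append("f3")
    f3_done.set()   # signal that f3 has executed

def g():
    f3_done.wait()  # block here until f has run f3
    order.append("g1")
    order.append("g2")

tg = threading.Thread(target=g)
tf = threading.Thread(target=f)
tg.start(); tf.start()   # g starts first, but still waits for f3
tg.join(); tf.join()

print(order)  # "f3" always precedes "g1"
```

Even though g is started first, the `wait()` call guarantees the required ordering on every run.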

Consider the following example:

```
(**) f := proc( n )
(**)     local i;
(**)     global common_variable;
(**)
(**)     for i from 1 to n
(**)     do
(**)         common_variable := common_variable+1;
(**)     end;
(**)     NULL;
(**) end proc;
f := proc(n)
local i;
global common_variable;
for i to n do common_variable := common_variable + 1 end do; NULL
end proc

(**) g := eval(f);
g := proc(n)
local i;
global common_variable;
for i to n do common_variable := common_variable + 1 end do; NULL
end proc

(**) common_variable := 0;
common_variable := 0

(**) Threads:-Wait( op(map( Threads:-Create, [ 'f(1000)', 'g(1000)' ] )) );
(**) print( common_variable );
1684
```

In this example both f and g increment a global variable 1000 times, so we would expect the final value of the global variable to be 2000. Clearly it is not. What happened? Within the for loop, the increment is actually two steps:

```
1: tmp := common_variable;
2: common_variable := tmp + 1;
```

If these steps execute in this order:

```
f1: ftmp := common_variable;
g1: gtmp := common_variable;
g2: common_variable := gtmp+1;
f2: common_variable := ftmp+1;
```

the increment performed by g is lost.
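
The lost update can be replayed deterministically without any threads at all, simply by executing the four steps in the problematic order. This single-threaded Python sketch mirrors the interleaving above, with the same variable names:

```python
common_variable = 0

# Replay the bad interleaving one step at a time:
ftmp = common_variable          # f1: f reads 0
gtmp = common_variable          # g1: g also reads 0
common_variable = gtmp + 1      # g2: g writes 1
common_variable = ftmp + 1      # f2: f overwrites with 1; g's increment is lost

print(common_variable)  # 1, not 2
```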

There are orderings where the total will be 2000 and, in fact, every value between 1000 and 2000 could occur. So even for this simple example there are 1001 different possible outcomes, and far more possible orderings.
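
The standard fix is to make the read and the write a single atomic step, so no other thread can interleave between them. A Python sketch of the same two-thread counter, protected by a lock (Maple's Threads package offers mutexes that play the same role):

```python
import threading

common_variable = 0
lock = threading.Lock()

def f(n):
    global common_variable
    for _ in range(n):
        with lock:  # the read and the write happen as one atomic step
            common_variable = common_variable + 1

g = f  # g performs the same increments, as in the Maple example

t1 = threading.Thread(target=f, args=(1000,))
t2 = threading.Thread(target=g, args=(1000,))
t1.start(); t2.start()
t1.join(); t2.join()

print(common_variable)  # always 2000
```

With the lock, only one ordering of each read/write pair is possible, so the result is 2000 on every run.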

The order in which single-threaded code executes is determined by the programmer. This makes the execution of a single-threaded application relatively predictable. When run in parallel, the order in which the code executes is not predictable, which makes it much harder to guarantee that the code will execute correctly.