## Atomic Operations


Atomic operations are CPU instructions that are guaranteed to execute as a single, indivisible step. This means the operation completes without being interrupted by the actions of another thread. Although this may not sound too exciting, careful programming using these instructions can lead to algorithms and data structures that can be used in parallel without needing locks. Maple currently does not support atomic operations; however, they are an interesting tool, and they are used in the kernel to help improve Maple's parallelism in general.

Atomic operations are implemented as CPU instructions, so there is some variation in which operations are supported on which hardware. That said, most modern hardware (and recent operating systems, compilers, and libraries) provides some atomic operations.

• load, store: Most hardware implements reading and writing a word-sized memory location as an atomic operation.
• fetch and add (or subtract, etc.): Increment a value by a given amount, returning the value it held before the increment.
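In C these basic operations are exposed through C11's `<stdatomic.h>`; here is a minimal sketch (the names are the standard C library's, not anything in Maple):

```c
#include <stdatomic.h>

/* A shared counter that multiple threads could safely update. */
static atomic_int counter = 0;

/* Atomic store and load: safe word-sized writes and reads. */
void set_counter(int v) {
    atomic_store(&counter, v);
}

int read_counter(void) {
    return atomic_load(&counter);
}

/* Fetch and add: atomically adds `amount` and returns the value
 * the counter held *before* the increment. */
int bump(int amount) {
    return atomic_fetch_add(&counter, amount);
}
```

The "fetch" half of fetch-and-add is the return value: two threads calling `bump(1)` at the same time are each guaranteed to see a distinct previous value.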

Those are the basic operations, and although they are useful, they aren't too powerful. The real workhorse of atomic programming is

• compare and swap: Compare a value stored in memory with an expected value; if they are the same, store a third value into the memory location.

This one is a bit weird, so here is the operation as Maple code. Of course, this implementation is not atomic:

```
(**) compare_and_swap := proc( variable::name, old, new )
(**)     if ( eval(variable) = old ) then
(**)         variable := new;
(**)         true;
(**)     else
(**)         false;
(**)     end;
(**) end:
(**)
(**) a := 1:
(**)
(**) compare_and_swap( 'a', 1, 2 );
true

(**)
(**) a;
2

(**)
(**) compare_and_swap( 'a', 1, 3 );
false

(**)
(**) a;
2
```

Note that in C, variable would be a pointer to a memory location.
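Here is the same operation rendered in C11, built on the standard `atomic_compare_exchange_strong` (the wrapper name `compare_and_swap` and the variable `a` are chosen to mirror the Maple session above):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* The shared variable, playing the role of `a` in the Maple session. */
static atomic_int a = 1;

/* Returns true and stores new_val only if *variable currently holds
 * old_val, just like the Maple compare_and_swap above. */
bool compare_and_swap(atomic_int *variable, int old_val, int new_val) {
    /* atomic_compare_exchange_strong overwrites old_val with the value
     * it actually observed on failure; we discard that here so the
     * interface matches the Maple version. */
    return atomic_compare_exchange_strong(variable, &old_val, new_val);
}
```

Running the Maple session through this version gives the same results: swapping 1 for 2 succeeds, and the second attempt (expecting 1, but finding 2) fails and leaves the variable untouched.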

There is some variation in what the return value of a compare and swap should be: generally it either returns true or false depending on whether the swap occurred, or it returns the value stored in the variable before the operation. In either case, you can determine from the return value whether the swap occurred. The compare and swap operation is interesting because it allows for conditional behaviour, in particular: if the value is what I expect it to be, then perform the swap. As an example, we could implement fetch and add using compare and swap like this:

```
(**) fetch_and_add := proc( variable::name, value )
(**)     local old, new;
(**)     do
(**)         old := eval( variable );
(**)
(**)         new := old+value;
(**)
(**)         if ( compare_and_swap( variable, old, new ) ) then
(**)             break;
(**)         end;
(**)     end;
(**)
(**)     eval(variable);
(**) end:
(**)
(**) fetch_and_add( 'a', 5 );
7
```

Why is this correct? The value in new is the correct value to store in variable if and only if the value currently stored in variable is old. If the value changed between reading variable's value and the compare_and_swap, then the compare_and_swap will fail and fetch_and_add will execute the loop again. Only when we can store the correct value in variable will the operation complete. One might argue that this operation is not atomic, as it can be affected by the actions of other threads. However, because of the nondeterminism of parallel execution, the actions of other threads that caused the compare_and_swap to fail effectively happen before the fetch_and_add. When the fetch_and_add completes, we know that the value being stored is the correct value, as if the atomic operation had occurred at that moment.
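The same retry loop in C makes the structure easy to see (this mirrors the Maple fetch_and_add; the function names are a sketch, not a real library API):

```c
#include <stdatomic.h>

/* fetch_and_add built from compare-and-swap, mirroring the Maple proc:
 * keep retrying until the CAS succeeds, then return the stored value. */
int fetch_and_add(atomic_int *variable, int value) {
    int old, new_val;
    do {
        old = atomic_load(variable);   /* read the current value  */
        new_val = old + value;         /* compute the new value   */
        /* If another thread changed *variable since the load, the
         * CAS fails and we go around the loop again. */
    } while (!atomic_compare_exchange_strong(variable, &old, new_val));
    return new_val;
}
```

Real atomic libraries provide fetch-and-add directly, of course; the point of the loop is that compare and swap alone is enough to build it.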

If you consider the above code, we could do just about anything we want between reading variable's value and attempting the compare_and_swap. For example, if we are working with pointers (in C, say), we could make changes to data structures. With a lot of careful planning, you can use compare and swap to implement reasonably complex data structures that perform operations correctly in a multi-threaded environment without needing mutexes.
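As a sketch of what "changes to data structures" can look like, here is a lock-free stack push in C (a Treiber-style stack; the names are illustrative and have nothing to do with Maple's kernel code):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* A node in a singly linked stack. */
struct node {
    int value;
    struct node *next;
};

/* Top-of-stack pointer, updated only through compare-and-swap. */
static _Atomic(struct node *) top = NULL;

/* Push: build the node, point it at the current top, then try to
 * swing `top` over to the new node. If another thread moved `top`
 * in the meantime, the CAS fails and we retry with the freshly
 * observed top (the failed CAS writes it into n->next for us). */
void push(int value) {
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load(&top);
    while (!atomic_compare_exchange_weak(&top, &n->next, n)) {
        /* retry: n->next now holds the current top */
    }
}
```

Note that all the node construction happens off to the side; the one CAS on `top` is what publishes the change. The matching pop (read `top`, CAS it to `top->next`) is exactly where the ABA problem described below can bite.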

As an example, in Maple 13 we implemented an atomic simple table for the kernel. The simple table is a hash table used in the kernel to guarantee that we have unique representations for various mathematical objects. The Maple simple table is a heavily used structure and having a lock around the simple table was preventing even simple examples from parallelizing effectively. By implementing an atomic table we significantly reduced the amount of contention around this fundamental data structure. I hope in the future we will be able to re-use much of the code in the atomic simple table to implement atomic tables (as in table()) in Maple.

Unfortunately, compare and swap is not perfect. In particular, it can fall victim to the ABA problem: between the read of a value and the compare and swap, it is possible for the variable's value to change from A to B and then back to A. A compare and swap will not detect this kind of change, because the variable's value is back to the expected value. For something like the implementation of fetch_and_add this is not a problem, but when manipulating a data structure the ABA problem can be fatal.

One fix for this is the double compare and swap (DCAS). A DCAS is similar to a compare and swap (CAS), but it works on two words of memory: it has two expected values and two new values. One word is used as in a CAS, while the other is used like a time stamp; each update of the first word is accompanied by an increment of the second. This allows a DCAS to detect changes like the ABA problem. Unfortunately, DCAS instructions are not yet widely available.
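The time-stamp idea can be approximated on ordinary hardware by packing the value and a version counter into one wider word and CAS-ing both together. Here is a sketch using a 32-bit value plus a 32-bit version packed into a single 64-bit word (the packing scheme and function names are my own illustration, not a standard API):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

/* One 64-bit cell holding a 32-bit value (low half) and a 32-bit
 * version counter (high half), so a single CAS covers both. */
static atomic_uint_least64_t cell = 0;

static uint64_t pack(uint32_t value, uint32_t version) {
    return ((uint64_t)version << 32) | value;
}

/* Replace old_value with new_value only if both the value and its
 * version are unchanged; bump the version on success. */
bool versioned_cas(uint32_t old_value, uint32_t old_version,
                   uint32_t new_value) {
    uint64_t expected = pack(old_value, old_version);
    uint64_t desired  = pack(new_value, old_version + 1);
    return atomic_compare_exchange_strong(&cell, &expected, desired);
}
```

Even if the value goes from A to B and back to A, the version keeps climbing, so a CAS based on a stale read fails, which is exactly the ABA detection the time stamp provides.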

Atomic operations are powerful tools, but they are complex and require precise, meticulous programming. And if your brain is twisted like mine, they are also kind of fun.
