Using a fast, accelerated BLAS implementation is important, but once a major proportional gain over the generic reference BLAS has been attained, the smaller differences between fast implementations become less important. An exception is that threaded parallel implementations for SMP or multi-core machines can still bring a further major benefit. It's also often the case that resolving interpreted-code bottlenecks at the Maple Library level is a fruitful area for focus and improvement, especially once a fast BLAS is in use. In other words, once a fast BLAS is in use on a platform, further tweaking to squeeze a little more performance out of the BLAS is effort less well spent than finding other bottlenecks in the Maple Library -- with multithreading being a possible exception.
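As a rough illustration of the kind of Library-level bottleneck meant here, the following minimal Maple sketch compares an interpreted element-wise double loop against the equivalent single operation that exits to compiled kernel code. The size N=1000 and the particular matrices are arbitrary choices for demonstration, and actual timings will vary by machine and release.

  # Minimal sketch (arbitrary N): interpreted Library-level loop versus
  # one operation handled by compiled kernel code on float[8] Matrices.
  N := 1000:
  A := Matrix(N, N, (i,j) -> evalf(1/(i+j)), datatype=float[8]):
  B := Matrix(N, N, (i,j) -> evalf(i+j), datatype=float[8]):

  # Interpreted double loop: every iteration runs in the Maple interpreter.
  C1 := Matrix(N, N, datatype=float[8]):
  st := time[real]():
  for i to N do
    for j to N do
      C1[i,j] := A[i,j] + B[i,j];
    end do;
  end do:
  t_loop := time[real]() - st;

  # Same result in a single call that is handled by compiled code.
  st := time[real]():
  C2 := A + B:
  t_compiled := time[real]() - st;

  t_loop, t_compiled;

On most machines the gap between those two timings dwarfs the difference between any two fast BLAS implementations, which is the point about where tuning effort is best spent.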
Cleve Moler made a brief mention in Cleve's Corner, Winter 2000, to the effect that there was not much difference between MKL and ATLAS. That statement is dated, of course. Any difference is going to change over time, depending on which implementation makes best use of new chipset extensions in a timely fashion.
There is an AMD equivalent to Intel's MKL: the AMD ACML (AMD Core Math Library).
The past few releases of the Intel MKL have also been available for Linux. The Maple-NAG Connector toolbox can use generic, MKL, or ACML BLAS wherever the corresponding supported NAG C Library allows it.
Tip for today: The environment variable OMP_NUM_THREADS can be set to the number of SMP CPUs or cores on an MS-Windows machine. The Intel MKL should pick this up and allow parallel computation, which is especially noticeable in large floating-point level-3 BLAS calls. While this can bring a benefit on true multi-core machines, it can degrade performance on a single-core hyperthreaded CPU if favourable access to cached data is disrupted.
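For example, here is a minimal Maple sketch for confirming that the variable is visible to the session and for timing one large level-3 BLAS call (a hardware-float Matrix-Matrix product, which maps to dgemm). The size N=2000 is an arbitrary choice, and the variable must be set before Maple is launched in order for the MKL to see it.

  # Minimal sketch (arbitrary N=2000): time one large level-3 BLAS call.
  # OMP_NUM_THREADS must be set in the environment before Maple starts,
  # e.g. in a Windows command prompt:  set OMP_NUM_THREADS=2
  getenv("OMP_NUM_THREADS");   # show the value this Maple session sees, if set

  N := 2000:
  A := LinearAlgebra:-RandomMatrix(N, N, generator=0.0 .. 1.0,
                                   outputoptions=[datatype=float[8]]):
  B := LinearAlgebra:-RandomMatrix(N, N, generator=0.0 .. 1.0,
                                   outputoptions=[datatype=float[8]]):

  st := time[real]():          # wall-clock time, so threading shows up
  C := A . B:                  # float[8] Matrix product; dispatches to dgemm
  time[real]() - st;           # elapsed seconds

Running that product in two sessions, one with the variable unset and one with it set to the machine's core count, gives a direct measure of the threading benefit on a given machine.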
Dave Linder
Mathematical Software, Maplesoft