Matrix/Array access: order can be important

June 06 2012, acer
Maple

Way back in Maple 6, the rtable was introduced. You might be more familiar with its three types: Array, Matrix, and Vector. The name rtable comes from "rectangular table". Its entries can be stored contiguously in memory, which is important in the case of "hardware" datatypes. This is a key aspect of the external-calling mechanism which allows Maple to use functions from the NAG and CLAPACK external libraries. In essence, the contiguous data portion of a hardware datatype rtable can be passed to a compiled C or Fortran function without any need for copying or preliminary conversion. In such cases, Maple stores the numeric data in a format which is also directly accessible within external functions.

You might have noticed that Matrices and Arrays with hardware datatypes (e.g. float[8], integer[4], etc.) also have an order. The two orders, Fortran_order and C_order, correspond to column-major and row-major storage respectively. The Wikipedia page on row-major order explains it nicely.
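To make the two orders concrete, here is a small sketch of the offset arithmetic behind them. It is written in C rather than Maple, and the function names are my own for illustration, not part of any Maple API. Indices are 1-based, as they are for a Maple Matrix.

```c
#include <stddef.h>

/* Offset (in entries) of M[i,j] in an m-by-n matrix stored in
   column-major order (Fortran_order). Indices are 1-based. */
size_t fortran_offset(size_t i, size_t j, size_t m, size_t n)
{
    (void)n;                       /* n does not enter the formula */
    return (i - 1) + (j - 1) * m;  /* entries of one column are adjacent */
}

/* Offset of the same entry in row-major order (C_order). */
size_t c_offset(size_t i, size_t j, size_t m, size_t n)
{
    (void)m;                       /* m does not enter the formula */
    return (i - 1) * n + (j - 1);  /* entries of one row are adjacent */
}
```

For a 3-by-4 matrix, stepping from M[1,1] to M[2,1] moves one entry in Fortran_order but four entries in C_order.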

There is even a help-page which illustrates that the method of accessing entries can affect performance. Since Fortran_order means that the individual entries in any column are contiguous in memory, code which accesses those entries in the same order in which they are stored can perform better. This relates to the fact that computers cache data: blocks of nearby data can be moved from slower main memory (RAM) to very fast cache memory, often speculatively, and this frequently has very real benefits.
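One way to see why access order matters is to count how many consecutive accesses land on the very next entry in memory. The following is only an illustrative sketch in C (the helper name contiguous_steps is hypothetical, and it measures nothing about real cache behaviour), assuming an m-by-n matrix in column-major storage.

```c
#include <stddef.h>

/* Visit every entry of an m-by-n column-major matrix once and count
   how many steps move to the immediately following entry in memory.
   rows_innermost != 0: the row index i varies in the inner loop.
   rows_innermost == 0: the column index j varies in the inner loop. */
size_t contiguous_steps(size_t m, size_t n, int rows_innermost)
{
    size_t outer = rows_innermost ? n : m;
    size_t inner = rows_innermost ? m : n;
    size_t prev = 0, count = 0;
    int first = 1;

    for (size_t a = 0; a < outer; a++) {
        for (size_t b = 0; b < inner; b++) {
            size_t i = rows_innermost ? b : a;  /* 0-based row */
            size_t j = rows_innermost ? a : b;  /* 0-based column */
            size_t off = i + j * m;             /* column-major offset */
            if (!first && off == prev + 1)
                count++;
            prev = off;
            first = 0;
        }
    }
    return count;
}
```

For a 3-by-4 matrix, walking down the columns gives 11 contiguous steps out of 11, while walking along the rows gives none.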

What I'd like to show here is that the relatively small performance improvement (due to matching the entry access to the storage order) when using evalhf can be a more significant improvement when using Maple's Compile command. For procedures which walk all entries of a hardware datatype Matrix or multidimensional Array, to apply a simple operation upon each value, the improvement can involve a significant part of the total computation time.

What makes this more interesting is that in Maple the default order of a float[8] Matrix is Fortran_order, while the default order of a float[8] Array used with the ImageTools package is C_order. It can sometimes pay off to write your for-do loops accordingly.

If you are walking through all entries of a Fortran_order float[8] Matrix, then it can be beneficial to access entries primarily by walking down each column. By this I mean accessing entries M[i,j] by changing i in the innermost loop and j in the outermost loop. This means walking the data entries one at a time, in the order in which they are stored. Here is a worksheet which illustrates a performance difference of about 30-50% in a Compiled procedure (the precise benefit can vary with platform, size, and what else your machine might be doing that interferes with caching).

Matrixorder.mw
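For readers without the worksheet at hand, the loop structure being compared looks roughly like the following. This is a sketch in C, not the worksheet's Maple code. The function names and the choice of operation (scaling each entry) are assumptions made for illustration, and indices are 0-based here.

```c
#include <stddef.h>

/* Scale every entry of an m-by-n column-major matrix in place,
   visiting entries in storage order: i innermost, j outermost. */
void scale_by_columns(double *M, size_t m, size_t n, double s)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            M[i + j * m] *= s;
}

/* Same result, but with j innermost, so consecutive accesses
   are m entries apart in memory. */
void scale_by_rows(double *M, size_t m, size_t n, double s)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            M[i + j * m] *= s;
}
```

Both functions compute the same thing. Only the order in which memory is touched differs, and that is the kind of difference the worksheet times.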

If you are walking through all entries of an m-by-n-by-3 C_order float[8] Array (which is a common structure for a color "image" used by the ImageTools package) then it can be beneficial to access entries A[i,j,k] by changing k in the innermost loop and i in the outermost loop. This means walking the data entries one at a time, in the order in which they are stored. Here is a worksheet which illustrates a performance difference of about 30-50% in a Compiled procedure (the precise benefit can vary with platform, size, and what else your machine might be doing that interferes with caching).

Arrayorder.mw
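The three-dimensional case follows the same idea, with the last index varying fastest. Here is a sketch in C, again not the worksheet's code. The names image_offset and clamp_image are hypothetical, indices are 0-based, and clamping to the range 0..1 is just a stand-in for some simple per-entry operation.

```c
#include <stddef.h>

/* Offset of A[i,j,k] in an m-by-n-by-3 row-major (C_order) array,
   using 0-based indices. The last index k varies fastest. */
size_t image_offset(size_t i, size_t j, size_t k, size_t n)
{
    return (i * n + j) * 3 + k;
}

/* Clamp every value to [0, 1], visiting entries in storage order:
   k innermost, i outermost. */
void clamp_image(double *A, size_t m, size_t n)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < 3; k++) {
                double *p = &A[image_offset(i, j, k, n)];
                if (*p < 0.0) *p = 0.0;
                if (*p > 1.0) *p = 1.0;
            }
}
```

With this loop nest the offset increases by exactly one on every step, so the whole array is swept from its first stored entry to its last.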

acer
