An aggregated data array (or master array) is one which is partitioned such that each partition is replaced by a reference to an independent, non-aggregated array, called a sub-array, which contains the data.
Aggregated data arrays arise naturally from the aggregation process, but may be generated by other mechanisms (such as the Large Amounts of Massive Arrays (LAMA) functionality of cf-python).
The sub-array referred to by each partition exists either as an actual array (in memory, for example) or as another reference to an array contained within a file.
The master array may be partitioned along any subset of its dimensions. The choice of partition positions does not affect the master array. The only constraint on partition positions is that the matrix of partitions (the partition matrix) must be hyperrectangular, i.e. it must not be ragged along any of the partition dimensions.
To meet the requirement of a hyperrectangular partition matrix, it may be necessary to add virtual partitions to the master array. This will be the case if the master array’s sub-array edges are not all aligned. Virtual partitions allow the master array to view a sub-array as two or more independent sub-arrays without turning the sub-array itself into an aggregated data array. Each virtual partition is a reference to a part of a sub-array. Virtual partitions are positioned so as to ensure that all partition edges are aligned and therefore that the partition matrix is hyperrectangular (see example 1).
The master array makes no distinction between partitions and virtual partitions so, henceforth, both are referred to as partitions, and a partition’s data array is defined to be the part of the sub-array which is used by the partition. Note that:
Example 1: Partitions of a 2-dimensional master array
The 2-dimensional 8 x 7 master array in figure 1a is composed of 10 sub-arrays. These sub-arrays do not form a rectangular matrix (not all of their edges are aligned), so virtual partitions are created, resulting in 24 partitions arranged in a 4 x 6 matrix (figure 1b).
Figure 1a. The 10 sub-arrays of the master array.
Figure 1b. Each block of colour represents one of the 10 sub-arrays and each of the master array’s partitions is labelled Pyx
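As an illustration of how virtual partitions arise, the following Python sketch aligns the edges of a set of sub-arrays and derives the resulting partition matrix. It is not taken from any particular implementation: the function names, the use of half-open (start, stop) index ranges and the small three-sub-array layout are all assumptions made for the example, and the layout is not that of figure 1.

```python
from itertools import product


def aligned_edges(boxes, ndim):
    """Collect, for each dimension, every edge of every sub-array."""
    edges = [set() for _ in range(ndim)]
    for box in boxes.values():
        for dim, (start, stop) in enumerate(box):
            edges[dim].update((start, stop))
    return [sorted(e) for e in edges]


def build_partitions(boxes, ndim):
    """Return (pmshape, partitions) for sub-arrays that tile the master array.

    `boxes` maps a (hypothetical) sub-array name to the half-open index
    ranges it spans in the master array, one range per dimension.
    """
    edges = aligned_edges(boxes, ndim)
    # Consecutive pairs of edges give the partition ranges along each dimension.
    ranges = [list(zip(e[:-1], e[1:])) for e in edges]
    pmshape = tuple(len(r) for r in ranges)
    partitions = {}
    for index in product(*(range(n) for n in pmshape)):
        location = tuple(ranges[d][i] for d, i in enumerate(index))
        for name, box in boxes.items():
            inside = all(b0 <= l0 and l1 <= b1
                         for (l0, l1), (b0, b1) in zip(location, box))
            if inside:
                # Indices of this partition relative to the sub-array's origin.
                part = tuple((l0 - b0, l1 - b0)
                             for (l0, l1), (b0, _) in zip(location, box))
                partitions[index] = {"data": name,
                                     "location": location,
                                     "part": part}
                break
    return pmshape, partitions


# Hypothetical 4 x 6 master array built from three sub-arrays.
# "A" spans all four rows, so it has to be split into two virtual partitions.
boxes = {
    "A": ((0, 4), (0, 2)),
    "B": ((0, 2), (2, 6)),
    "C": ((2, 4), (2, 6)),
}
pmshape, partitions = build_partitions(boxes, ndim=2)
```

In this layout the three sub-arrays give a 2 x 2 partition matrix, with two of the four partitions referring to different parts of sub-array "A".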
There are properties of a sub-array which are arbitrary in the sense that, whilst these properties may differ from their equivalents in the master array, the sub-array may always be altered to conform with the master array with no loss of information.
The properties for which a sub-array may differ from its master array are:
When a partition’s data array is required by the master array, it needs to be conformed by doing any or none of:
It follows that an aggregated data array and its partitions may be completely specified by a small number of parameters.
Note that when a partition has a parameter value equal to that of the master array, there is some redundancy which may be exploited when storing the array by its parameters.
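One possible way of exploiting this redundancy is sketched below in Python, using the parameter names listed later in this section. The pairing of partition parameters with master array parameters, the dict representation, the use of booleans for directions and the function names are assumptions made for illustration only; no particular storage format is implied.

```python
# Assumed pairing of each partition parameter with its master array equivalent.
MASTER_EQUIVALENT = {
    "pdimensions": "dimensions",
    "pdirections": "directions",
    "punits": "units",
    "pcalendar": "calendar",
}


def compress(partition, master):
    """Drop partition parameters whose values equal the master array's."""
    return {
        key: value
        for key, value in partition.items()
        if not (key in MASTER_EQUIVALENT
                and master.get(MASTER_EQUIVALENT[key]) == value)
    }


def expand(stored, master):
    """Restore omitted partition parameters from the master array."""
    full = dict(stored)
    for pkey, mkey in MASTER_EQUIVALENT.items():
        if pkey not in full and mkey in master:
            full[pkey] = master[mkey]
    return full


# Hypothetical master array and partition descriptions.
master = {
    "units": "K",
    "dimensions": ["time", "lat"],
    "directions": [True, True],
}
partition = {
    "pdimensions": ["lat", "time"],   # differs from the master array: kept
    "pdirections": [True, True],      # same as the master array: omitted
    "punits": "K",                    # same as the master array: omitted
    "location": [(0, 12), (0, 4)],
    "part": [(0, 4), (0, 12)],
    "data": "subarray_0",
}
stored = compress(partition, master)
restored = expand(stored, master)
```

Here only pdimensions, location, part and data need to be stored for the partition; the omitted values are recovered from the master array when the partition is read back.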
The master array comprises:
- dtype
- The data type of the master array.
- units
- The units of the master array.
- calendar (if required by units)
- The calendar of the master array.
- dimensions
- An ordered list of the master array’s dimensions.
- shape
- An ordered list of the master array’s dimension sizes. The sizes correspond to the dimensions list.
- directions
- An ordered list of the master array’s dimension directions. The directions correspond to the dimensions list.
- pmdimensions
- An ordered list of the dimensions along which the master array is partitioned. Each of these dimensions is one of those specified by the dimensions parameter.
- pmshape
- An ordered list containing the number of partitions along each partitioned dimension of the master array. The sizes correspond to the pmdimensions list.
- Partitions
- A matrix of the master array’s partitions. Each partition is described by its partition parameters.
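The relationships between these parameters can be checked mechanically. The Python sketch below holds a master array description in a dict and tests the correspondences stated above. The dict layout, the flat list used for the partition matrix, the boolean directions, the example values and the function name are assumptions for illustration only.

```python
from math import prod


def check_master(master):
    """Return a list of inconsistencies found in a master array description."""
    problems = []
    ndim = len(master["dimensions"])
    if len(master["shape"]) != ndim:
        problems.append("shape does not correspond to dimensions")
    if len(master["directions"]) != ndim:
        problems.append("directions does not correspond to dimensions")
    if any(dim not in master["dimensions"] for dim in master["pmdimensions"]):
        problems.append("pmdimensions contains an unknown dimension")
    if len(master["pmshape"]) != len(master["pmdimensions"]):
        problems.append("pmshape does not correspond to pmdimensions")
    # The partition matrix holds one partition per element of pmshape.
    if len(master["Partitions"]) != prod(master["pmshape"]):
        problems.append("number of partitions does not match pmshape")
    return problems


# Hypothetical master array partitioned along two of its three dimensions.
master = {
    "dtype": "float64",
    "units": "K",
    "dimensions": ["time", "lat", "lon"],
    "shape": [12, 4, 6],
    "directions": [True, True, True],
    "pmdimensions": ["lat", "lon"],
    "pmshape": [2, 2],
    # Placeholders; each entry would hold the partition parameters.
    "Partitions": [{}, {}, {}, {}],
}
problems = check_master(master)
```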
Each partition of the partition matrix comprises:
- pdimensions
- An ordered list of the partition’s data array’s dimensions.
- pdirections
- An ordered list of the partition’s data array’s dimension directions. The directions correspond to the pdimensions list.
- punits
- The units of the partition’s data array.
- pcalendar (if required by punits)
- The calendar of the partition’s data array.
- location
- An ordered list of the ranges of indices for each dimension of the master array which describe the section of the master array spanned by this partition. The ranges correspond to the dimensions list.
- part
- An ordered list of indices for each dimension of the partition’s sub-array which describes the part of the sub-array which applies to this partition. The indices correspond to the pdimensions list.
- data
- A reference to the partition’s sub-array.
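To show how the partition parameters relate a sub-array to the master array, the following Python sketch maps an index in the master array to the corresponding index in a partition's sub-array, allowing for a different dimension order and reversed directions. It is a minimal sketch under several assumptions: ranges are half-open (start, stop) pairs, directions are booleans, pdimensions contains the same dimensions as the master array, the sub-array is a nested list, and the function names are hypothetical. Conversion of units and calendars is not shown.

```python
def subarray_index(master_index, partition, master):
    """Map an index into the master array onto the partition's sub-array."""
    master_dirs = dict(zip(master["dimensions"], master["directions"]))
    # Offset of the element from the start of the partition, per dimension.
    offset = {
        dim: i - start
        for dim, i, (start, _) in zip(master["dimensions"],
                                      master_index,
                                      partition["location"])
    }
    index = []
    for dim, direction, (start, stop) in zip(partition["pdimensions"],
                                             partition["pdirections"],
                                             partition["part"]):
        o = offset[dim]
        if direction != master_dirs[dim]:
            # The sub-array runs the other way along this dimension.
            o = (stop - start) - 1 - o
        index.append(start + o)
    return tuple(index)


def read_element(nested, index):
    """Read one element from a nested-list array."""
    for i in index:
        nested = nested[i]
    return nested


# Hypothetical 2 x 3 master array with a single partition.
master = {"dimensions": ["y", "x"], "shape": [2, 3],
          "directions": [True, True]}

# The sub-array is stored as (x, y), with x running in the reverse direction.
subarray = [[2, 12],
            [1, 11],
            [0, 10]]
partition = {
    "pdimensions": ["x", "y"],
    "pdirections": [False, True],
    "location": [(0, 2), (0, 3)],   # in master dimension order (y, x)
    "part": [(0, 3), (0, 2)],       # in pdimensions order (x, y)
    "data": subarray,
}

# Rebuild the partition's data array in the master array's layout.
conformed = [
    [read_element(partition["data"],
                  subarray_index((y, x), partition, master))
     for x in range(3)]
    for y in range(2)
]
```

With an array library, the same result would normally be obtained by slicing the sub-array with part, reversing the axes whose directions differ, and transposing to the master array's dimension order.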