A master data array is one which is constructed from one or more sub-arrays.
The simplest master data array has just one sub-array which is equal to itself, but in general, a master data array may have up to as many non-overlapping sub-arrays as it has elements, and each sub-array contains a contiguous subset of the master data array values. See figure 1a and figure 2a for examples of master data arrays and their constituent sub-arrays.
A master data array constructed from sub-arrays arises naturally from an aggregation process, such as one that follows the CF aggregation rules.
Sub-arrays may be combined to create hyperrectangular [1] master data arrays, which poses storage and access difficulties since there may be different numbers of sub-arrays across different sections of the same master data array dimension. For example, the master data array in figure 2a has three sub-arrays spanning the first two rows (values 0 to 6 and 7 to 13) but only two sub-arrays spanning the third row (values 14 to 20). Such an irregular construction would, for example, complicate finding which sub-array contains a particular element or, more importantly, complicate the recombination of sub-arrays into an equivalent master data array (an operation required by the CF aggregation process).
A solution is to decompose the master data array into a hyperrectangular partition matrix such that its elements (called partitions) span all or part of exactly one sub-array. A given partition’s data array is then a reference to a unique part of a unique sub-array. See figure 1b, figure 1c and figure 2b for examples.
The master data array may therefore be considered as being constructed from a hyperrectangular matrix of partitions. See example 1 and example 2.
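One way to obtain such a decomposition can be sketched as follows: every sub-array boundary along a dimension becomes a cut across the whole master data array, so each resulting grid cell lies inside exactly one sub-array. This is a minimal illustration, not the method of any particular implementation. The function name, the dictionary layout and the 3 x 4 example (which is not taken from the figures) are all assumptions, and the result is a valid partition matrix but not necessarily the most efficient one.

```python
from itertools import product


def partition_matrix(shape, subarrays):
    """Split a master array into a regular grid of partitions.

    shape     -- master data array shape, e.g. (3, 4)
    subarrays -- hypothetical mapping of sub-array name to its extent in the
                 master array, one half-open (start, stop) pair per dimension
    """
    # Along each dimension, every sub-array boundary becomes a grid line.
    edges = []
    for dim, size in enumerate(shape):
        cuts = {0, size}
        for extent in subarrays.values():
            cuts.update(extent[dim])
        edges.append(sorted(cuts))

    pmshape = tuple(len(e) - 1 for e in edges)
    partitions = {}
    for index in product(*(range(n) for n in pmshape)):
        location = tuple(
            (edges[dim][i], edges[dim][i + 1]) for dim, i in enumerate(index)
        )
        # Find the single sub-array that wholly contains this grid cell.
        for name, extent in subarrays.items():
            if all(s <= lo and hi <= e for (lo, hi), (s, e) in zip(location, extent)):
                # Express the cell relative to the sub-array's own origin.
                part = tuple(
                    (lo - s, hi - s) for (lo, hi), (s, _) in zip(location, extent)
                )
                partitions[index] = {
                    "subarray": name,
                    "location": location,
                    "part": part,
                }
                break
    return pmshape, partitions


# Assumed layout: A and B sit side by side over rows 0-1, C spans row 2.
subarrays = {
    "A": ((0, 2), (0, 2)),
    "B": ((0, 2), (2, 4)),
    "C": ((2, 3), (0, 4)),
}
pmshape, partitions = partition_matrix((3, 4), subarrays)
```

In this assumed layout the sub-array C is split between two partitions, each referring to a different part of it. Unlike the partition matrices in the figures, this sketch keeps a matrix dimension even when it holds only one partition.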
A guaranteed hyperrectangular decomposition renders easy any operations requiring knowledge of the partition locations within the master data array (such as element location or partition matrix transposition) and allows the master data array to be aggregated with other master data arrays.
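Element location illustrates why a regular decomposition helps: because the partition boundaries along each dimension are independent of the other dimensions, each dimension can be searched separately. The sketch below assumes the boundaries are available as sorted lists; the function name and the `edges` representation are hypothetical.

```python
from bisect import bisect_right


def locate(element, edges):
    """Return the partition matrix index holding a master array element.

    element -- index of the element in the master array, one int per dimension
    edges   -- hypothetical per-dimension sorted lists of partition boundaries,
               each starting at 0 and ending at the dimension size
    """
    return tuple(bisect_right(e, i) - 1 for i, e in zip(element, edges))


# A 3 x 4 master array cut after row 1 and after column 1 (assumed layout).
edges = [[0, 2, 3], [0, 2, 4]]
index = locate((2, 1), edges)
```

Each lookup is a binary search per dimension, which would not be possible if the number of sub-arrays varied across a dimension.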
Sub-arrays larger than the available memory may be handled effectively by keeping such a sub-array in a file and partitioning the master data array so that each partition is small enough to be realized in memory, if required.
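The idea can be illustrated with a 1-dimensional stand-in: a sub-array is kept in a flat binary file and only one partition's worth of values is read into memory at a time. The file format, the helper name `read_part` and the partition boundaries are assumptions made for this sketch only.

```python
import array
import os
import tempfile


def read_part(path, start, stop, typecode="d"):
    """Read elements [start, stop) of a flat binary file into memory."""
    values = array.array(typecode)
    with open(path, "rb") as f:
        f.seek(start * values.itemsize)
        values.fromfile(f, stop - start)
    return values


# Write a stand-in "large" sub-array of 10 values to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "subarray.bin")
with open(path, "wb") as f:
    array.array("d", range(10)).tofile(f)

# Partition boundaries chosen so that each partition fits in memory.
edges = [0, 4, 8, 10]
total = 0.0
for start, stop in zip(edges, edges[1:]):
    total += sum(read_part(path, start, stop))  # one partition at a time
```

At no point are more than four values held in memory, yet the reduction covers the whole sub-array.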
A subspace of a master data array may be created easily by discarding partitions which do not overlap the subspace and, for each remaining partition, adjusting the definition of the part of the sub-array which comprises the partition’s data array (see the part parameter).
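A rough sketch of this subspacing operation is given below. It assumes each partition records its location and part as half-open (start, stop) ranges, that the partition's dimensions are in the same order and direction as the master's, and that the subspace is a contiguous range along each dimension. The function name and dictionary layout are hypothetical.

```python
def subspace(partitions, ranges):
    """Restrict partitions to a subspace of the master array.

    partitions -- list of dicts with 'location' and 'part', each a tuple of
                  half-open (start, stop) pairs (a hypothetical representation)
    ranges     -- the subspace, one half-open (start, stop) pair per dimension
    """
    kept = []
    for p in partitions:
        location, part = [], []
        for (lo, hi), (plo, _), (rlo, rhi) in zip(p["location"], p["part"], ranges):
            new_lo, new_hi = max(lo, rlo), min(hi, rhi)
            if new_lo >= new_hi:
                break  # no overlap along this dimension: discard the partition
            # Location is re-expressed relative to the subspace origin.
            location.append((new_lo - rlo, new_hi - rlo))
            # The part is trimmed by the same amount as the location.
            part.append((plo + new_lo - lo, plo + new_hi - lo))
        else:
            kept.append({**p, "location": tuple(location), "part": tuple(part)})
    return kept


# Assumed 2 x 2 partition matrix over a 3 x 4 master array.
partitions = [
    {"subarray": "A", "location": ((0, 2), (0, 2)), "part": ((0, 2), (0, 2))},
    {"subarray": "B", "location": ((0, 2), (2, 4)), "part": ((0, 2), (0, 2))},
    {"subarray": "C", "location": ((2, 3), (0, 2)), "part": ((0, 1), (0, 2))},
    {"subarray": "C", "location": ((2, 3), (2, 4)), "part": ((0, 1), (2, 4))},
]
# Keep rows 1-2 and column 3 of the master array.
kept = subspace(partitions, ((1, 3), (3, 4)))
```

No sub-array data is read or copied: only the partition descriptions change.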
The 2-dimensional 2 x 7 master data array in figure 1a is composed from 3 sub-arrays. The most efficient way of partitioning the master data array into a hyperrectangular partition matrix such that each partition contains all or part of exactly one sub-array is shown in figure 1b. Note that, in this case:
Figure 1a. The 3 sub-arrays of the master data array.
Figure 1b. A 1-dimensional, 3 element partition matrix of the master data array. Each block of colour represents one of the 3 sub-arrays and each partition of the partition matrix is labelled Px. For example, the partition data array of P0 contains values 0 and 7.
Another, equally valid partitioning of the master data array is shown in figure 1c. Note that, in this case:
Figure 1c. A 2-dimensional, 2 x 4 element partition matrix of the master data array. Each block of colour represents one of the 3 sub-arrays and each partition of the partition matrix is labelled Pyx. For example, the partition data array of P11 contains values 8 and 9 and the partition data array of P12 contains value 10.
The 2-dimensional 8 x 7 master data array in figure 2a is composed from 10 sub-arrays. The most efficient way of partitioning the master data array into a hyperrectangular partition matrix such that each partition contains all or part of exactly one sub-array is shown in figure 2b. Note that, in this case:
Figure 2a. The 10 sub-arrays of the master data array.
Figure 2b. A 2-dimensional, 4 x 6 element partition matrix of the master data array. Each block of colour represents one of the 10 sub-arrays and each partition of the partition matrix is labelled Pyx. For example, the partition data array of P30 contains value 49; the partition data array of P31 contains values 50 and 51; and the partition data array of P32 contains value 52.
There are properties of a partition’s data array which are arbitrary in the sense that, whilst these properties may differ to their equivalents in the master data array, the partition’s data array may always be altered to conform with the master data array with no loss of information.
A partition’s data array inherits these properties, unchanged, from the sub-array which contains it.
The properties for which a partition’s data array may differ from its master data array are:
When a partition’s data array is required by the master data array, it needs to be conformed by doing any or none of:
It follows that a master data array and its partitions may be completely specified by a small number of parameters.
Note
When a partition has a parameter value equal to that of the master data array, there is some redundancy, which will be exploited when storing the array by its parameters with the CFA-netCDF conventions.
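The kind of redundancy in question might be exploited along the following lines: a partition parameter is stored only when it differs from the master data array's equivalent, and is restored from the master on reading. This is purely illustrative and does not describe the actual CFA-netCDF encoding; the pairing of parameter names and both function names are assumptions.

```python
# Hypothetical pairing of partition parameters with their master equivalents.
DEFAULTS = {"pdimensions": "dimensions", "punits": "units", "pcalendar": "calendar"}


def compact(master, partition):
    """Drop partition parameters that merely repeat the master's values."""
    return {
        key: value
        for key, value in partition.items()
        if key not in DEFAULTS or master.get(DEFAULTS[key]) != value
    }


def expand(master, stored):
    """Restore omitted parameters from the master data array."""
    full = {p: master[m] for p, m in DEFAULTS.items() if m in master}
    full.update(stored)
    return full


master = {"dimensions": ["time", "lat"], "units": "K"}
partition = {"pdimensions": ["time", "lat"], "punits": "degC", "subarray": "file0.nc"}
stored = compact(master, partition)
restored = expand(master, stored)
```

Here only punits and subarray need to be stored, because pdimensions matches the master's dimensions.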
The master data array comprises:
- dtype
- The data type of the master data array.
- units
- The units of the master data array.
- calendar (if required by units)
- The calendar of the master data array.
- dimensions
- An ordered list of the master data array’s dimensions.
- shape
- An ordered list of the master data array’s dimension sizes. The sizes correspond to the dimensions list.
- Partitions
- A matrix of the master data array’s partitions. Each partition is described by its partition parameters.
- pmdimensions
- An ordered list of the dimensions along which the master data array is partitioned. Each of these dimensions is one of those specified by the dimensions parameter.
- pmshape
- An ordered list containing the number of partitions along each partitioned dimension of the master data array. The sizes correspond to the pmdimensions list.
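The relationships between these parameters can be expressed as a small consistency check. The sketch below assumes the parameters are held in a dictionary keyed by the names above and that Partitions is stored as a flat list; both choices, and the function name, are illustrative only.

```python
def check_master(master):
    """Return a list of inconsistencies among master array parameters."""
    problems = []
    if len(master["dimensions"]) != len(master["shape"]):
        problems.append("shape must give one size per dimension")
    if not set(master["pmdimensions"]) <= set(master["dimensions"]):
        problems.append("pmdimensions must be drawn from dimensions")
    if len(master["pmdimensions"]) != len(master["pmshape"]):
        problems.append("pmshape must give one count per partitioned dimension")
    # The partition matrix holds one partition per matrix element.
    expected = 1
    for n in master["pmshape"]:
        expected *= n
    if len(master["Partitions"]) != expected:
        problems.append("Partitions must hold one entry per matrix element")
    return problems


# A 2 x 7 master array partitioned into 3 along one dimension
# (dimension names, dtype and units are assumed for illustration).
master = {
    "dtype": "float64",
    "units": "K",
    "dimensions": ["y", "x"],
    "shape": [2, 7],
    "pmdimensions": ["x"],
    "pmshape": [3],
    "Partitions": [{}, {}, {}],  # placeholders for partition parameters
}
problems = check_master(master)
```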
Each partition of the partition matrix comprises:
- pdimensions
- An ordered list of the partition’s data array dimensions.
- punits
- The units of the partition’s data array.
- pcalendar (if required by punits)
- The calendar of the partition’s data array.
- flip
- A collection of the partition’s data array dimensions which run in the opposite direction to those of the master data array.
- location
- An ordered list of ranges of indices, one for each dimension of the master data array, which describe the contiguous section of the master data array spanned by this partition. The ranges correspond to the dimensions list.
- part
- An ordered list of indices for each dimension of the partition’s data array which describe the part of the sub-array which is spanned by this partition. The indices correspond to the pdimensions list.
- subarray
- A reference to the sub-array which contains the partition’s data array.
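The pdimensions, flip and punits parameters suggest how a partition's data array might be conformed to its master data array: reverse any flipped dimensions, reorder the dimensions, and convert the units. The sketch below does this for a 2-dimensional array held as nested lists. It assumes pdimensions is a reordering of the master's dimensions and reduces unit conversion to a single hypothetical multiplicative factor; a real implementation would need a units library and would also handle calendars.

```python
def conform(data, pdimensions, flip, dimensions, factor=1.0):
    """Conform a 2-d partition data array (nested lists) to its master.

    factor is a hypothetical multiplicative conversion from punits to units;
    real unit (and calendar) conversion would need a units library.
    """
    # Reverse any dimension that runs opposite to the master's direction.
    if pdimensions[0] in flip:
        data = data[::-1]
    if pdimensions[1] in flip:
        data = [row[::-1] for row in data]
    # Transpose if the dimension order differs from the master's.
    if list(pdimensions) != list(dimensions):
        data = [list(col) for col in zip(*data)]
    # Convert units.
    return [[value * factor for value in row] for row in data]


# Stored as x by y in kilometres, with x reversed; the master is y by x in metres.
stored = [[1, 2], [3, 4], [5, 6]]
conformed = conform(stored, ["x", "y"], ["x"], ["y", "x"], factor=1000.0)
```

None of these steps loses information, which is why such properties may differ freely between a partition and its master data array.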
Footnotes
[1] The generalization of a rectangle for higher dimensions.