A conceptual framework for the in-memory storage of aggregated arrays
=====================================================================

An **aggregated data array** (or **master array**) is one which is
**partitioned** such that each **partition** is replaced by a
reference to an independent, non-aggregated array which contains the
data, called a **sub-array**.

Aggregated data arrays arise naturally from the `aggregation process `_,
but may be generated by other mechanisms (such as the
`Large Amounts of Massive Arrays (LAMA) `_ functionality of `cf-python `_).

The sub-array referred to by each **partition** exists either as an
actual array (in memory, for example) or as another reference to an
array contained within a file.

The master array may be partitioned along any arbitrary subset of its
dimensions. The choice of partition positions does not affect the
master array. The only constraint on partition positions is that the
matrix of partitions (the **partition matrix**) must be
hyperrectangular, i.e. it must not be ragged along any of the
partition dimensions.

To meet the requirement of a hyperrectangular partition matrix, it may
be necessary to add **virtual partitions** to the master array. This
will be the case if the master array's sub-array edges are not all
aligned. Virtual partitions allow the master array to view a sub-array
as two or more independent sub-arrays without turning the sub-array
itself into an aggregated data array. Each virtual partition is a
reference to a **part** of a sub-array. Virtual partitions are
positioned so as to ensure that all partition edges are aligned and
therefore that the partition matrix is hyperrectangular (see
:ref:`example 1 <example1>`).

.. _data_array:

The master array makes no distinction between partitions and virtual
partitions so, henceforth, both are referred to as **partitions**, and
a **partition's data array** is defined to be the part of the
sub-array which is used by the partition.
Note that:

* The partition's data array is always a subset of the partition's
  sub-array, and may be equal to it.

* The partition's data array may be constructed from non-contiguous
  sections of the sub-array (see the description of the :ref:`part `
  attribute for examples).

.. _example1:

----

**Example 1: Partitions of a 2-dimensional master array**

The 2-dimensional 8 x 7 master array in **figure 1a** is composed from
10 sub-arrays. These sub-arrays do not form a rectangular matrix (not
all of their edges are aligned), so virtual partitions are created,
resulting in 24 partitions arranged in a 4 x 6 matrix (**figure 1b**).

.. figure:: partitions1.png

   **Figure 1a**. The 10 sub-arrays of the master array.

.. figure:: partitions2.png

   **Figure 1b**. Each block of colour represents one of the 10
   sub-arrays and each of the master array's partitions is labelled
   P\ :sub:`yx`.

----

Accounting for arbitrary sub-array properties
---------------------------------------------

There are properties of a sub-array which are arbitrary in the sense
that, whilst these properties may differ from their equivalents in the
master array, the sub-array may always be altered to conform with the
master array with no loss of information. The properties for which a
sub-array may differ from its master array are:

* The order of dimensions.
* The number of size 1 dimensions.
* The sense in which dimensions run.
* The units of the data values.
* The missing data value.

When a partition's data array is required by the master array, it
needs to be **conformed** by doing any or none of:

* Reordering its dimensions to the same order as the master array.
* Removing size 1 dimensions which don't exist in the master array.
* Adding size 1 dimensions which exist in the master array but not in
  the sub-array.
* Reversing dimensions which run in the opposite direction to the
  master array.
* Converting the data values to have the same units as the master
  array.
* *Either* the missing data value is converted to that of the master
  array (accounting for conflicts with non-missing data values) *or*,
  if arrays are stored with ancillary missing data masks (as can be
  the case with `python numpy arrays `_), the sub-array's missing data
  value may be ignored.

Parameters required for specifying an aggregated data array
-----------------------------------------------------------

It follows that an aggregated data array and its partitions may be
completely specified by a small number of parameters. Note that when a
partition's parameter value equals that of the master array, there is
some redundancy which may be exploited when storing the array by its
parameters.

Master array parameters
~~~~~~~~~~~~~~~~~~~~~~~

The master array comprises:

.. _frame-dtype:

**dtype**
    The data type of the master array.

.. _frame-units:

**units**
    The units of the master array.

**calendar** (*if required by* :ref:`units <frame-units>`)
    The calendar of the master array.

.. _frame-dimensions:

**dimensions**
    An ordered list of the master array's dimensions.

**shape**
    An ordered list of the master array's dimension sizes. The sizes
    correspond to the :ref:`dimensions <frame-dimensions>` list.

**directions**
    An ordered list of the master array's dimension directions. The
    directions correspond to the :ref:`dimensions <frame-dimensions>`
    list.

.. _frame-pmdimensions:

**pmdimensions**
    An ordered list of the dimensions along which the master array is
    partitioned. Each of these dimensions is one of those specified by
    the :ref:`dimensions <frame-dimensions>` parameter.

.. _frame-pmshape:

**pmshape**
    An ordered list containing the number of partitions along each
    partitioned dimension of the master array. The sizes correspond to
    the :ref:`pmdimensions <frame-pmdimensions>` list.

.. _frame-Partitions:

**Partitions**
    A matrix of the master array's partitions. Each partition is
    described by its :ref:`partition parameters `.

Partition parameters
~~~~~~~~~~~~~~~~~~~~

Each partition of the partition matrix comprises:

.. _frame-pdimensions:

**pdimensions**
    An ordered list of the partition's data array's dimensions.

**pdirections**
    An ordered list of the partition's data array's dimension
    directions. The directions correspond to the
    :ref:`pdimensions <frame-pdimensions>` list.

.. _frame-punits:

**punits**
    The units of the partition's data array.

**pcalendar** (*if required by* :ref:`punits <frame-punits>`)
    The calendar of the partition's data array.

**location**
    An ordered list of the ranges of indices for each dimension of the
    master array which describe the section of the master array
    spanned by this partition. The ranges correspond to the
    :ref:`dimensions <frame-dimensions>` list.

**part**
    An ordered list of indices for each dimension of the partition's
    sub-array which describes the part of the sub-array which applies
    to this partition. The indices correspond to the
    :ref:`pdimensions <frame-pdimensions>` list.

.. _frame-data:

**data**
    A reference to the partition's sub-array.
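
As a rough illustration only, the master array and partition
parameters could be held in plain Python dictionaries and used to
rebuild the master array in memory. The sketch below is not part of
the framework and rests on several assumptions which the framework
does not prescribe: directions are booleans (``True`` for increasing),
each **location** entry is a half-open ``(start, stop)`` pair, each
**part** entry is a ``slice`` object, the partition matrix is stored
as a flat list, and every sub-array is an in-memory numpy array.
Conversion of units and missing data values is omitted, and the
function names ``conform`` and ``assemble`` are hypothetical.

.. code-block:: python

   import numpy as np


   def conform(sub_array, part, pdimensions, pdirections, dimensions, directions):
       """Conform a partition's data array to its master array (sketch only)."""
       # Select the part of the sub-array used by this partition.
       array = sub_array[tuple(part)]

       # Reverse dimensions which run in the opposite sense to the master array.
       for axis, dim in enumerate(pdimensions):
           if dim in dimensions and pdirections[axis] != directions[dimensions.index(dim)]:
               array = np.flip(array, axis=axis)

       # Remove size 1 dimensions which don't exist in the master array.
       extra = tuple(i for i, dim in enumerate(pdimensions) if dim not in dimensions)
       if extra:
           array = np.squeeze(array, axis=extra)
       pdims = [dim for dim in pdimensions if dim in dimensions]

       # Append size 1 dimensions which exist only in the master array.
       for dim in dimensions:
           if dim not in pdims:
               array = np.expand_dims(array, axis=-1)
               pdims.append(dim)

       # Reorder the dimensions to the master array's order.
       return np.transpose(array, [pdims.index(dim) for dim in dimensions])


   def assemble(master):
       """Build the full master array from its partitions."""
       out = np.empty(master["shape"], dtype=master["dtype"])
       for p in master["Partitions"]:
           section = tuple(slice(start, stop) for start, stop in p["location"])
           out[section] = conform(p["data"], p["part"], p["pdimensions"],
                                  p["pdirections"], master["dimensions"],
                                  master["directions"])
       return out


   master = {
       "dtype": "float64",
       "dimensions": ["y", "x"],
       "shape": [2, 4],
       "directions": [True, True],
       "pmdimensions": ["x"],
       "pmshape": [2],
       "Partitions": [
           {   # Sub-array stored in (x, y) order.
               "pdimensions": ["x", "y"],
               "pdirections": [True, True],
               "location": [(0, 2), (0, 2)],
               "part": [slice(None), slice(None)],
               "data": np.array([[0, 4], [1, 5]]),
           },
           {   # Extra size 1 dimension, reversed x, only part of the sub-array used.
               "pdimensions": ["t", "y", "x"],
               "pdirections": [True, True, False],
               "location": [(0, 2), (2, 4)],
               "part": [slice(None), slice(None), slice(0, 2)],
               "data": np.array([[[3, 2, 99], [7, 6, 99]]]),
           },
       ],
   }

   master_array = assemble(master)

Under these assumptions ``master_array`` is the 2 x 4 array whose rows
are 0, 1, 2, 3 and 4, 5, 6, 7. A **part** built from non-contiguous
sections would need index lists rather than slices, which this sketch
does not handle.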