On the storage of arbitrarily aggregated data arrays (v0.2.2)

David Hassell, NCAS

Abstract

This document proposes a netCDF file format convention for the efficient storage of a CF field that has been created by the CF aggregation rules and whose data array is distributed over one or more files, one of which may be the storage file itself. The storage is efficient because, wherever possible, data are replaced with references to the files that contain them. To fully accommodate such fields, the convention allows for:

  • Simultaneous aggregation across more than one dimension.
  • Storage of changes to arbitrary parts of the aggregated array.
  • Aggregation accounting for different but equivalent:
    • order of dimensions
    • number of size 1 dimensions
    • senses in which dimensions run
    • units of the data values
    • missing data values

The convention is derived from a framework for storing such aggregated arrays in memory, which is also described.
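
As an illustration only, and not the framework described in this document, the idea of an aggregated array built from partitions that differ in dimension order, direction, units and missing data value can be sketched in Python. The names Partition, conform and assemble, the (start, stop) location layout and the use of a callable in place of a file reference are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Array2D = List[List[Optional[float]]]


@dataclass
class Partition:
    """One piece of an aggregated 2-d (y, x) array (hypothetical structure)."""
    location: Tuple[Tuple[int, int], Tuple[int, int]]  # (start, stop) along y, then x
    read: Callable[[], list]            # stands in for a reference to a file
    dim_order: Tuple[str, str] = ("y", "x")  # order in which the piece is stored
    reversed_dims: frozenset = frozenset()   # dimensions that run the other way
    scale: float = 1.0                  # factor converting to the aggregate's units
    missing: Optional[float] = None     # the piece's own missing data value


def conform(p: Partition) -> Array2D:
    """Return the partition's data in the aggregate's order, sense and units."""
    data = p.read()
    if p.dim_order == ("x", "y"):
        # Transpose so that the data are ordered (y, x)
        data = [list(row) for row in zip(*data)]
    if "y" in p.reversed_dims:
        data = data[::-1]
    if "x" in p.reversed_dims:
        data = [row[::-1] for row in data]
    # Map the partition's missing value to None and convert the units
    return [[None if v == p.missing else v * p.scale for v in row] for row in data]


def assemble(shape: Tuple[int, int], partitions: List[Partition]) -> Array2D:
    """Build the full aggregated array from its partitions."""
    ny, nx = shape
    out: Array2D = [[None] * nx for _ in range(ny)]
    for p in partitions:
        (y0, y1), (x0, x1) = p.location
        block = conform(p)
        for j in range(y0, y1):
            out[j][x0:x1] = block[j - y0]
    return out


# Three partitions aggregated along both y and x into a 2 x 4 array
partitions = [
    Partition(location=((0, 2), (0, 2)),
              read=lambda: [[1.0, 2.0], [3.0, 4.0]]),
    Partition(location=((0, 1), (2, 4)),
              read=lambda: [[5.0], [6.0]],
              dim_order=("x", "y")),
    Partition(location=((1, 2), (2, 4)),
              read=lambda: [[-999.0, 0.5]],
              reversed_dims=frozenset({"x"}),
              scale=1000.0,
              missing=-999.0),
]

aggregated = assemble((2, 4), partitions)
print(aggregated)
```

In this sketch each partition is only read when the aggregated array is assembled, so the description of the aggregation stays small however large the underlying data are.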

Although they are not required for storing such aggregated fields, the framework and convention also allow for:

  • Storage of (not necessarily contiguous) subspaces of aggregated arrays.
  • Aggregation of arrays stored in a mixture of formats (in-memory, netCDF, PP, etc.).
  • Manipulation and subsequent storage of larger-than-memory arrays.
