A proposal for the efficient netCDF file storage of aggregated data arrays
Since the parameters which completely describe an aggregated data array may each be cast as one a set of particular basic types, then these parameters may be easily encoded in a JSON (JavaScript Object Notation) string for simple inclusion in a netCDF file.
JSON is a lightweight data-interchange format which is easy for humans to read and write and easy for machines to parse and generate. There are JSON encoders and decoders for every reasonable language. See the JSON Wikipedia article for some good examples of JSON strings.
These basic types recognised by JSON are (with JSON encoded examples):
With the exception of partition’s data parameter, it is clear that each of the values of the parameters which describe an aggregated data array is one of these basic types. For example, the pmshape parameter is a list of numbers.
The partition’s data parameter may also be described by these basic types, but is more complicated. It will be one of:
In case 1 there is no problem, as it will always be possible to encapsulate the reference with a collection of these basic types. For example, if the reference is to a netCDF file variable then all that is required are its filename (a string), the name of the netCDF variable containing the sub-array (a string) and its shape (a list of numbers) [1].
In case 2, the sub-array in memory gets stored in a netCDF file variable, and so case 1 applies. See the notes on private NCA variables for details.
A multidimensional aggregated data array may be stored in a scalar netCDF variable with the same data type as its master array, one of whose attributes is the JSON encoded string of the aggregated data array parameters. When read, this scalar array variable may then be converted to a multidimensional array variable after the parameters have been decoded. Note that the scalar netCDF variable normal netCDF datum is ignored and need not be set.
Such a variable is called an NCA variable (netCDF aggregate variable) and a file storing NCA variables is called an NCA file (netCDF aggregate file) and should include both ‘CF’ and ‘NCA’ in its global conventions attribute.
For partitions whose data parameter refers to a sub-array in memory, that array must be written to the NCA file as a private NCA variable. This is a normal netCDF variable marked as containing the sub-array for a partition of one or more NCA variables. As this sub-array now exists in a netCDF file (the NCA file itself), the relevant NCA variables may refer to this sub-array as for any netCDF variable, i.e. by specifying the netCDF file name [2], the netCDF variable name and its shape. See example 3.
NetCDF Markup Language (NcML) aggregation doesn’t allow:
Example 2: A simple NCA file:
netcdf temperature.nca {
dimensions:
time = 48 ;
lat = 64 ;
lon = 128 ;
variable:
double time(time) ;
time:long_name = "time" ;
time:units = "days since 0000-1-1" ;
double lat(lat) ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
double lon(lon) ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
float tas ;
tas:standard_name = "air_temperature" ;
tas:units = "K" ;
tas:nca_dimensions = "time lat lon" ;
tas:nca_array = "{directions': {'lat': false,
'time': true,
'lon': true
},
'pshape': [2],
'pdimensions': ['time'],
'Partitions': [{'index': [0],
'data': {'file': '/home/david/test1.nc',
'shape': [12, 64, 128],
'ncvar': 'tas'
},
'location': [[0, 12], [0, 64], [0, 128]],
'format': 'netCDF'
},
{'index': [1],
'data': {'file': '/home/david/test2.nc',
'shape': [36, 64, 128],
'ncvar': 'tas2'
},
'location': [[12, 48], [0, 64], [0, 128]],
'format': 'netCDF'
}
]
}" ;
// global attributes:
:Conventions = "CF-1.5 NCA" ;
data:
time = 164569, 164599.5, 164630.5, 164660, 164689.5, 164720, 164750.5,
// etcetera.
lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
// etcetera.
lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
// etcetera.
Points to note:
Example 3: storing an aggregated data array with an in-memory partition data array:
netcdf temperature2.nca {
dimensions:
time = 48 ;
lat = 64 ;
lon = 128 ;
nca12 = 12 ;
nca64 = 64 ;
nca128 = 128 ;
variable:
double time(time) ;
time:long_name = "time" ;
time:units = "days since 0000-1-1" ;
double lat(lat) ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
double lon(lon) ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
float tas ;
tas:standard_name = "air_temperature" ;
tas:units = "K" ;
tas:nca_dimensions = "time lat lon" ;
tas:nca_array = "{directions': {'lat': false,
'time': true,
'lon': true
},
'pshape': [2],
'pdimensions': ['time'],
'Partitions': [{'index': [0],
'units' : 'K @ 273.15',
'dimensions': ['lon', 'time', lat'],
'directions': {'time': false},
'data': {'shape': [128, 12, 64],
'ncvar': 'nca_45sdf83745'
},
'location': [[0, 12], [0, 64], [0, 128]],
'format': 'netCDF'
},
{'index': [1],
'data': {'file': '/home/david/test2.nc',
'shape': [36, 64, 128],
'ncvar': 'tas2'
},
'location': [[12, 48], [0, 64], [0, 128]],
'format': 'netCDF'
}
]
}" ;
float nca_45sdf83745(nca128, nca12, nca64) ;
nca_45sdf83745:nca_private = 1 ;
// global attributes:
:Conventions = "CF-1.5 NCA" ;
data:
time = 164569, 164599.5, 164630.5, 164660, 164689.5, 164720, 164750.5,
// etcetera.
lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
// etcetera.
lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
// etcetera.
nca_45sdf83745 = -4.5, 3.5, 23.6, -4.45, 13.5, 13.6,
// etcetera.
Points to note:
It is recommended, though not necessary to write the following types of variable as normal (non-NCA) netCDF variables:
Footnotes
[1] | The shape is required since the shape of a multi-character string array in memory may be different to the shape of the array stored in a netCDF file, which may be stored as a character array with an extra trailing dimension. |
[2] | In this case, though, the file name may ommitted, in which case the name of the NCA file is assumed. See the file attribute. |