A proposal for the efficient netCDF file storage of master data arrays
Since the parameters which completely describe a master data array may each be cast as one of a set of particular basic types, these parameters may be easily encoded in a JSON (JavaScript Object Notation) string for simple inclusion in a netCDF file.
JSON is a lightweight data-interchange format which is easy for humans to read and write and easy for machines to parse and generate. There are JSON encoders and decoders for every reasonable language. See the JSON Wikipedia article for some good examples of JSON strings.
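The basic types recognised by JSON are (with JSON encoded examples): number (1.5), string ("latitude"), boolean (true), array, i.e. an ordered list of values ([1, 2, 3]), object, i.e. a collection of key/value pairs ({"ncvar": "tas"}), and null (null).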
It is clear that, with the exception of the partition’s subarray parameter, each of the values of the parameters which describe a master data array is one or a combination of these basic types. For example, the pmshape parameter is a list of numbers.
The same is true for the partition’s subarray parameter, although it is not so immediately obvious. The issue is that it may be a reference to a section of a file or to a multidimensional array in memory. However, in the latter case the sub-array must be written to a file in order to be stored, so this reduces to the former case. A file reference is easily encapsulated by a collection of the basic types. For example, if the reference is to a netCDF file variable then all that is required are the file name (a string), the name of the netCDF variable containing the sub-array (a string) and its shape (a list of numbers) [1].
Thus, for storage purposes, all of the master data array parameters may be encoded in a single JSON string.
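As a rough illustration of this encoding, the following Python sketch builds a dictionary of master data array parameters and round-trips it through a JSON string using the standard library `json` module. The key names and values are copied from the first example file in this proposal; the variable name `master_array_parameters` is hypothetical, and this is not necessarily how any particular implementation performs the encoding.

```python
import json

# Parameters of a master data array with two partitions along the time
# axis. Key names and values are taken from this proposal's first example;
# the variable name is an illustrative assumption.
master_array_parameters = {
    "directions": {"lat": False, "time": True, "lon": True},
    "pmshape": [2],
    "pmdimensions": ["time"],
    "Partitions": [
        {
            "index": [0],
            "data": {
                "file": "/home/david/test1.nc",
                "shape": [12, 64, 128],
                "ncvar": "tas",
            },
            "location": [[0, 12], [0, 64], [0, 128]],
            "format": "netCDF",
        },
        {
            "index": [1],
            "data": {
                "file": "/home/david/test2.nc",
                "shape": [36, 64, 128],
                "ncvar": "tas2",
            },
            "location": [[12, 48], [0, 64], [0, 128]],
            "format": "netCDF",
        },
    ],
}

# Encode every parameter in a single JSON string, suitable for storing as
# the value of a netCDF string attribute.
encoded = json.dumps(master_array_parameters)

# Decoding recovers the original parameters exactly, because every value
# is one of (or a combination of) the JSON basic types.
decoded = json.loads(encoded)
```

Note that `json.dumps` writes Python's `True` and `False` as the JSON literals `true` and `false`, and always emits double-quoted strings.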
See the complete description of netCDF NCA attributes for details on how each master data array parameter is encoded.
A multidimensional master data array may be stored in a scalar netCDF variable with the same data type as its master array, one of whose attributes is the JSON encoded string of the master data array parameters. When read, this scalar array variable may then be converted to a multidimensional array variable after the parameters have been decoded. Note that the datum of this scalar netCDF variable is ignored and need not be set.
Such a variable is called an NCA variable (netCDF aggregate variable). A file storing NCA variables is called an NCA file (netCDF aggregate file) and should include both ‘CF’ and ‘NCA’ in its global Conventions attribute. An NCA file may contain a mixture of NCA variables and normal netCDF variables (or even no NCA variables at all).
For partitions whose subarray parameter refers to a sub-array in memory, that sub-array is written to the NCA file itself as an NCA private variable. This is a normal netCDF variable marked as containing the sub-array for a partition of one or more NCA variables and should not be interpreted otherwise according to the CF conventions. As this sub-array now exists in a netCDF file, the relevant NCA variables may refer to this sub-array as for any netCDF variable, i.e. by specifying the netCDF file name [2], the netCDF variable name and its shape. See example 4.
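The way a reader might resolve such a reference can be sketched in Python as below. The function name `resolve_subarray_reference` and its arguments are hypothetical; the `file`, `ncvar` and `shape` keys are those used in this proposal's examples, and the fallback to the NCA file's own name follows footnote [2]. This is a minimal sketch, not a definitive implementation.

```python
def resolve_subarray_reference(data, nca_filename):
    """Return (file name, netCDF variable name, shape) for a partition.

    `data` is the decoded 'data' entry of one partition. If it has no
    'file' key, the sub-array is assumed to be stored in the NCA file
    itself (for example as an NCA private variable).
    """
    filename = data.get("file", nca_filename)
    return filename, data["ncvar"], tuple(data["shape"])


# A partition whose sub-array was in memory and is now stored in the NCA
# file as a private variable (no file name given).
private = {"shape": [128, 12, 64], "ncvar": "nca_45sdf83745"}

# A partition whose sub-array lives in another netCDF file.
external = {"file": "/home/david/test2.nc", "shape": [36, 64, 128], "ncvar": "tas2"}

private_ref = resolve_subarray_reference(private, "temperature2.nca")
external_ref = resolve_subarray_reference(external, "temperature2.nca")
```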
It is recommended, though not necessary, to write the following types of variable as normal (non-NCA) netCDF variables:
A simple NCA file:
netcdf temperature.nca {
dimensions:
    time = 48 ;
    lat = 64 ;
    lon = 128 ;
variables:
    double time(time) ;
        time:long_name = "time" ;
        time:units = "days since 0000-1-1" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
    float tas ;
        tas:standard_name = "air_temperature" ;
        tas:units = "K" ;
        tas:cf_role = "nca_variable" ;
        tas:nca_dimensions = "time lat lon" ;
        tas:nca_array = "{'directions': {'lat': false,
                                         'time': true,
                                         'lon': true
                                        },
                          'pmshape': [2],
                          'pmdimensions': ['time'],
                          'Partitions': [{'index': [0],
                                          'data': {'file': '/home/david/test1.nc',
                                                   'shape': [12, 64, 128],
                                                   'ncvar': 'tas'
                                                  },
                                          'location': [[0, 12], [0, 64], [0, 128]],
                                          'format': 'netCDF'
                                         },
                                         {'index': [1],
                                          'data': {'file': '/home/david/test2.nc',
                                                   'shape': [36, 64, 128],
                                                   'ncvar': 'tas2'
                                                  },
                                          'location': [[12, 48], [0, 64], [0, 128]],
                                          'format': 'netCDF'
                                         }
                                        ]
                         }" ;

// global attributes:
        :Conventions = "CF-1.5 NCA" ;
data:
    time = 164569, 164599.5, 164630.5, 164660, 164689.5, 164720, 164750.5,
        // etcetera.
    lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
        // etcetera.
    lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
        // etcetera.
}
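The relationship between each partition's location and its sub-array shape in the file above can be checked with a short Python sketch. The function name `check_partitions` is hypothetical. The sketch assumes that each location entry is a half-open [start, stop) index range along the corresponding master array dimension (as the values in this example suggest) and that each sub-array is stored in the master array's dimension order, which need not hold when a partition specifies its own dimensions.

```python
from math import prod


def check_partitions(master_shape, partitions):
    """Check each partition's location against the master array shape.

    Raises ValueError if a location falls outside the master array or
    disagrees with the shape of the partition's sub-array. Returns True
    if the partitions together hold as many elements as the master array.
    """
    covered = 0
    for partition in partitions:
        location = partition["location"]
        shape = partition["data"]["shape"]
        for (start, stop), size, n in zip(location, master_shape, shape):
            if not 0 <= start < stop <= size:
                raise ValueError(f"location {location} lies outside {master_shape}")
            if stop - start != n:
                raise ValueError(f"location {location} does not match shape {shape}")
        covered += prod(shape)
    # Equal element counts are necessary, but not sufficient, for the
    # partitions to tile the master array exactly.
    return covered == prod(master_shape)


# Locations and shapes of the two partitions in the example file.
partitions = [
    {"location": [[0, 12], [0, 64], [0, 128]], "data": {"shape": [12, 64, 128]}},
    {"location": [[12, 48], [0, 64], [0, 128]], "data": {"shape": [36, 64, 128]}},
]

# The master array has dimensions time = 48, lat = 64, lon = 128.
complete = check_partitions((48, 64, 128), partitions)
```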
Points to note:
Storing a master data array with an in-memory partition data array:
netcdf temperature2.nca {
dimensions:
    time = 48 ;
    lat = 64 ;
    lon = 128 ;
    nca12 = 12 ;
    nca64 = 64 ;
    nca128 = 128 ;
variables:
    double time(time) ;
        time:long_name = "time" ;
        time:units = "days since 0000-1-1" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
    float tas ;
        tas:standard_name = "air_temperature" ;
        tas:units = "K" ;
        tas:cf_role = "nca_variable" ;
        tas:nca_dimensions = "time lat lon" ;
        tas:nca_array = "{'directions': {'lat': false,
                                         'time': true,
                                         'lon': true
                                        },
                          'pmshape': [2],
                          'pmdimensions': ['time'],
                          'Partitions': [{'index': [0],
                                          'units' : 'K @ 273.15',
                                          'dimensions': ['lon', 'time', 'lat'],
                                          'directions': {'time': false},
                                          'data': {'shape': [128, 12, 64],
                                                   'ncvar': 'nca_45sdf83745'
                                                  },
                                          'location': [[0, 12], [0, 64], [0, 128]],
                                          'format': 'netCDF'
                                         },
                                         {'index': [1],
                                          'data': {'file': '/home/david/test2.nc',
                                                   'shape': [36, 64, 128],
                                                   'ncvar': 'tas2'
                                                  },
                                          'location': [[12, 48], [0, 64], [0, 128]],
                                          'format': 'netCDF'
                                         }
                                        ]
                         }" ;
    float nca_45sdf83745(nca128, nca12, nca64) ;
        nca_45sdf83745:cf_role = "nca_private" ;

// global attributes:
        :Conventions = "CF-1.5 NCA" ;
data:
    time = 164569, 164599.5, 164630.5, 164660, 164689.5, 164720, 164750.5,
        // etcetera.
    lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
        // etcetera.
    lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
        // etcetera.
    nca_45sdf83745 = -4.5, 3.5, 23.6, -4.45, 13.5, 13.6,
        // etcetera.
}
Points to note:
Footnotes
[1] The shape is required since the shape of a multi-character string array in memory may be different to the shape of the array stored in a netCDF file, which may be stored as a character array with an extra trailing dimension.
[2] In this case, though, the file name may be omitted, in which case the name of the NCA file is assumed. See the file attribute.