3. The CFA-netCDF conventions

The CFA-netCDF conventions describe the efficient netCDF file storage of CF fields with master data arrays which have been aggregated within the CFA framework.

3.1. Storing the master data array parameters in a string

Since the parameters which completely describe a master data array may each be cast as one of a set of particular basic types, these parameters may be easily encoded in a JSON (JavaScript Object Notation) string for simple inclusion in a netCDF file.

JSON is a lightweight data-interchange format which is easy for humans to read and write and easy for machines to parse and generate. There are JSON encoders and decoders for every reasonable language. See the JSON Wikipedia article for some good examples of JSON strings.

The basic types recognised by JSON are (with JSON encoded examples):

  • string (double-quoted Unicode, with backslash escaping: "foo", "")
  • number (3.14159, 0)
  • boolean (true or false: true, false)
  • null (empty: null)
  • Array (an ordered, comma-separated sequence of values enclosed in square brackets; the values do not need to be of the same type: [1, "a", [2, null, {"x": false}]])
  • Object (an unordered, comma-separated collection of key:value pairs enclosed in curly braces, with the ‘:’ character separating the key and the value; the keys must be strings and should be distinct from each other: {"x": [4, [true, null]], "y": 2.71828})

It clear that, with the exception of the partition’s subarray parameter, each of the values of the parameters which describe a master data array is one or a combination of these basic types. For example, the pmshape parameter is a list of numbers.

The same is actually true for the partition’s subarray parameter, although it is not so immediately obvious. The issue is that it may be a reference to section of a file or to a multidimensional array in memory. However, in the latter case the sub-array must be written to a file in order to be stored, so this reduces to the former case. A file reference is easily encapsulated by a collection of the basic types. For example, if the reference is to a netCDF file variable then all that is required are the file name (a string), the name of the netCDF variable containing the sub-array (a string) and its shape (a list of numbers) [1].

Thus, for storage purposes, all of the master data array parameters may be encoded in a single JSON string.

See the complete description of CFA-netCDF attributes for details on how each master data array parameter is encoded.

3.2. Creating a netCDF variable for the master data array

A multidimensional master data array may be stored in a scalar netCDF variable with the same data type as its master array, whose attributes include the JSON encoded strings of the master data array parameters. When read, this scalar array variable may then be converted to a multidimensional array variable after the parameters have been decoded. Note that the datum of this scalar netCDF variable is ignored and need not be set.

Such a variable is called a CFA variable and a file storing CFA variables is called a CFA-netCDF file and should include both ‘CF’ and ‘CFA’ in its global conventions attribute. A CFA-netCDF file may contain a mixture of CFA variables and normal netCDF variables, or even no CFA variables at all. In the latter case, the CFA-netCDF file is also a CF-netCDF file.

For partitions whose subarray parameter refers to a sub-array in memory, that sub-array is written to the CFA-netCDF file itself as a CFA private variable. This is a normal netCDF variable marked as containing the sub-array for a partition of one or more CFA variables and should not be interpreted otherwise according to the CF conventions. As this sub-array now exists in a netCDF file, the relevant CFA variables may refer to this sub-array as for any netCDF variable, i.e. by specifying the netCDF file name [2], the netCDF variable name and its shape. See example 4.

3.3. Advantages of netCDF

File storage of an aggregated field stored does not need to use netCDF format (for example, plain text XML or CDL format files would be sufficient) but there some reasons why it is desirable:

  • Nearly all of the discovery metadata in a CFA-netCDF file is easily retrievable with standard, unmodified netCDF libraries. The only exception is the ordered list netCDF dimension names of a CFA variable, and to ameliorate any difficulties associated with this, these are stored in their own netCDF property for easy access (see the cfa_dimensions property).
  • A plain text CDL representation of a CFA-netCDF file is trivial to produce with the ncdump utility.
  • An important part of the CFA conventions is the ability to mix, within the same CFA-netCDF file, normal CF-netCDF variables (possibly with very large data arrays) with CFA variables (which may contain references to CFA private variables, possibly with very large data arrays, in the same file), so a binary format is desirable to keep the file size to a minimum.
  • A CFA variable may contain any CF property such as cell methods, flags, ancillary variables, coordinates, formula_terms, etc. By using a netCDF file, there is no need to invent new encodings for complex field properties which have already been standardized for netCDF variables by the CF conventions.

3.5. Examples

The following example files are described by their CDL representation.

3.5.1. Example 3

A simple CFA-netCDF file:

netcdf temperature.nca {
dimensions:
       time = 48 ;
       lat = 64 ;
       lon = 128 ;
variable:

    double time(time) ;
            time:long_name = "time" ;
            time:units = "days since 0000-1-1" ;
    double lat(lat) ;
            lat:units = "degrees_north" ;
            lat:standard_name = "latitude" ;
    double lon(lon) ;
            lon:units = "degrees_east" ;
            lon:standard_name = "longitude" ;
    float tas ;
            tas:standard_name = "air_temperature" ;
            tas:units = "K" ;
            tas:cf_role = "cfa_variable" ;
            tas:cfa_dimensions = "time lat lon" ;
            tas:cfa_array = '{"pmshape": [2],
                              "pmdimensions": ["time"],
                              "Partitions": [{"index": [0],
                                              "data": {"file": "/home/david/test1.nc",
                                                       "shape": [12, 64, 128],
                                                       "ncvar": "tas"
                                                      },
                                              "location": [[0, 12], [0, 64], [0, 128]],
                                              "format": "netCDF"
                                             },
                                             {"index": [1],
                                              "data": {"file": "/home/david/test2.nc",
                                                       "shape": [36, 64, 128],
                                                       "ncvar": "tas2"
                                                      },
                                              "location": [[12, 48], [0, 64], [0, 128]],
                                              "format": "netCDF"
                                             }
                                            ]
                             }' ;

// global attributes:
               :Conventions = "CF-1.5 CFA" ;
data:

 time = 164569, 164599.5, 164630.5, 164660, 164689.5, 164720, 164750.5,
       // etcetera.

 lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
       // etcetera.

 lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
       // etcetera.

Points to note:

  • The file specifies two conventions.
  • The file contains one CFA variable (tas) and three normal variables (time, lat and lon).
  • The CFA variable stores the master data array’s dimensions in a separate attribute (cfa_dimensions) to facilitate reconstruction of a multidimensional variable without having to decode the cfa_array string.
  • The cfa_array string has been split over many lines for enhanced readability. New lines are permitted in JSON strings.
  • The CFA variable defines its data type and units in the normal manner, so that these parameters of the master array may be omitted from the cfa_array attribute.
  • The CFA variable may have any CF-netCDF attributes, with no restrictions.
  • Partition parameters which are the same as their master array may be omitted.

3.5.2. Example 4

Storing a master data array with an in-memory partition data array:

netcdf temperature2.nca {
dimensions:
        time = 48 ;
        lat = 64 ;
        lon = 128 ;
        cfa12 = 12 ;
        cfa64 = 64 ;
        cfa128 = 128 ;
variable:
        double time(time) ;
                time:long_name = "time" ;
                time:units = "days since 0000-1-1" ;
        double lat(lat) ;
                lat:units = "degrees_north" ;
                lat:standard_name = "latitude" ;
        double lon(lon) ;
                lon:units = "degrees_east" ;
                lon:standard_name = "longitude" ;
        float tas ;
                tas:standard_name = "air_temperature" ;
                tas:units = "K" ;
                tas:cf_role = "cfa_variable" ;
                tas:cfa_dimensions = "time lat lon" ;
                tas:cfa_array = '{"pmshape": [2],
                                  "pmdimensions": ["time"],
                                  "Partitions": [{"index": [0],
                                                  "punits" : "K @ 273.15",
                                                  "pdimensions": ["lon", "time", lat"],
                                                  "flip": ["time"],
                                                  "data": {"shape": [128, 12, 64],
                                                           "ncvar": "cfa_45sdf83745"
                                                          },
                                                  "location": [[0, 12], [0, 64], [0, 128]],
                                                  "format": "netCDF"
                                                 },
                                                 {"index": [1],
                                                  "data": {"file": "/home/david/test2.nc",
                                                           "shape": [36, 64, 128],
                                                           "ncvar": "tas2"
                                                          },
                                                  "location": [[12, 48], [0, 64], [0, 128]],
                                                  "format": "netCDF"
                                                 }
                                                ]
                                 }' ;
        float cfa_45sdf83745(cfa128, cfa12, cfa64) ;
                cfa_45sdf83745:cf_role = "cfa_private" ;


// global attributes:
                :Conventions = "CF-1.5 CFA" ;
data:

 time = 164569, 164599.5, 164630.5, 164660, 164689.5, 164720, 164750.5,
       // etcetera.

 lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
       // etcetera.

 lon = 0, 2.8125, 5.625, 8.4375, 11.25, 14.0625, 16.875, 19.6875, 22.5,
       // etcetera.

 cfa_45sdf83745 = -4.5, 3.5, 23.6, -4.45, 13.5, 13.6,
       // etcetera.

Points to note:

  • The in-memory partition data array has been written to the file with an automatically generated variable name (cfa_45sdf83745), which has an attribute cfa_private to mark it as a private variable according to the CFA convention.
  • The in-memory array had different units and dimension order relative to the master array.
  • The time dimension of the in-memory array runs in the opposite direction to the time dimension of the master data array, but the other dimensions run in the same sense as the master array.
  • The CFA private variable has dimensions which are only used by it and other CFA private variables.

Footnotes

[1]The shape is required since the sub-array may omit (contain) size one dimensions which are (are not) present in the master data array. Also, a multi-character string array in memory may be different to the shape of the array stored in a netCDF file, which may be stored as a character array with an extra trailing dimension.
[2]In this case, though, the file name may be omitted, in which case the name of the CFA-netCDF file is assumed. See the file attribute.