Description of software

From initial clustering information provided by the user, a clustering algorithm is run to assign each ensemble member to a cluster. Members are merged successively: the correlation between every pair of members (or clusters already formed) is computed, the two closest, that is the two most highly correlated, are combined, and the process is repeated until the required number of clusters remains.
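The merging procedure described above is a form of agglomerative (hierarchical) clustering. The following minimal Python sketch illustrates it under stated assumptions; it is not the IDL implementation. In particular, it assumes that each cluster is represented by the mean of its members' fields, that "closest" means highest Pearson correlation, and that each field has been flattened into a list of gridpoint values. The function names `pearson`, `cluster_mean` and `agglomerate` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of gridpoint values.

    Assumes neither field is constant (zero variance).
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def cluster_mean(members, fields):
    """Gridpoint-by-gridpoint mean of the fields belonging to `members`."""
    return [sum(vals) / len(members) for vals in zip(*(fields[m] for m in members))]


def agglomerate(fields, n_clusters=1):
    """Merge the two most highly correlated clusters until n_clusters remain.

    fields: one flattened field (list of floats) per ensemble member.
    Returns (clusters, history). Each history entry is
    (cluster_a, cluster_b, correlation) for one joining, which is the
    information a dendrogram is drawn from.
    (Assumption: a cluster is represented by its mean field.)
    """
    clusters = [[m] for m in range(len(fields))]
    history = []
    while len(clusters) > n_clusters:
        means = [cluster_mean(c, fields) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                r = pearson(means[i], means[j])
                if best is None or r > best[0]:
                    best = (r, i, j)
        r, i, j = best
        history.append((clusters[i], clusters[j], r))
        merged = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so index i is unaffected
        clusters[i] = merged
    return clusters, history
```

For example, `agglomerate([[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]], n_clusters=2)` groups members 0 and 1 together and members 2 and 3 together, because each pair is strongly positively correlated while the two pairs are negatively correlated with each other.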

A file called '' (where FFFFF is the stash code of the field) is created containing the clustering dendrogram, and details of the cluster members are printed. Finally, the average value of the field in each cluster is plotted at each data time. Note that the clustering itself is performed using data values at a single, user-specified time.

The program allows up to 6 clusters; the actual number is determined by the differences in correlation between successive pairings of clusters. The cut-off is placed where the difference in correlation between two successive cluster joinings is greatest.
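The cut-off rule above can be sketched as follows, assuming the correlation of each joining has been recorded in merge order (joining k leaves `n_members - k - 1` clusters). A cut between joining k and joining k + 1 keeps the clusters that exist after joining k. The lower limit of 2 clusters and the function name `choose_n_clusters` are assumptions for illustration, not taken from the IDL code.

```python
def choose_n_clusters(merge_corrs, n_members, max_clusters=6, min_clusters=2):
    """Choose the number of clusters from the correlations of successive joinings.

    merge_corrs[k] is the correlation at which joining k (k = 0, 1, ...) was
    made; after joining k there are n_members - k - 1 clusters.
    The cut is placed between the two successive joinings whose correlations
    differ most, among cuts leaving min_clusters..max_clusters clusters
    (min_clusters is an assumed lower limit).
    """
    best_k, best_gap = None, None
    for k in range(len(merge_corrs) - 1):
        n_left = n_members - k - 1      # clusters remaining if we stop after joining k
        if not (min_clusters <= n_left <= max_clusters):
            continue
        gap = merge_corrs[k] - merge_corrs[k + 1]
        if best_gap is None or gap > best_gap:
            best_k, best_gap = k, gap
    if best_k is None:
        raise ValueError("no admissible cut-off for these settings")
    return n_members - best_k - 1
```

With 8 members and joining correlations `[0.95, 0.93, 0.90, 0.88, 0.60, 0.55, 0.40]`, the largest drop (0.88 to 0.60) follows the fourth joining, so 4 clusters are kept.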

Location of source code

The IDL source code can be found at /home/h05/frbr/idl_source_code/

The code uses library routines written by Andy Heaps (NCAS, University of Reading). These are accessed by adding the following line to your .profile file (on Unix, directories in IDL_PATH are separated by ':'):

export IDL_PATH=+/usr/local/itt:/data/nwp1/frbr/TIDLWorkspace/andy_lib/

Method of use - interactive usage

The user must supply information as follows:

  1. A normalisation type. 0=no normalisation, 1=normalisation (the default). Normalisation can be used to select overall patterns, rather than maps with similar field magnitudes.
  2. Specify whether to average the input data spatially over an area. Entering a positive integer (N) will average the input data over a square region of side 2*N+1 gridpoints. Close to the boundaries the data are averaged over a smaller box (down to a 1x1 box on the actual boundaries).
  3. For multi-level fields the user must select the data level for the clustering algorithm to use. For single-level fields enter a value 1.
  4. Enter a value for the time of the field to be used in the single-time clustering algorithm. The default value is 1, which selects the first time-point of the fields. For example, 36 would mean 36 * 5 = 180 minutes if the frame interval is 5 minutes and the first frame is at a time of 5 minutes.
  5. A valid field code for the data under investigation; one of:
  6. For multi-level fields the user must select the level to be used for the output. For single-level fields enter a value 1.
  7. The directory of the input data file.
  8. Substrings to create the data filename, assumed to be FFFFFXXXmmYYY, where FFFFF is the field code number, XXX may be of the form _ppn_qwq114.oper (the default), mm is the member number (00 to 23), and YYY may be of the form
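The spatial averaging in step 2 can be sketched as below. This is an interpretation of the description, not the IDL code: at each gridpoint the box half-width is reduced to the distance to the nearest boundary, so the box stays centred and shrinks to 1x1 on the boundary itself. Treating N <= 0 as "no averaging" is an assumption (the batch example uses -1). The function name `box_average` is hypothetical, and the field is assumed to be a list of equal-length rows.

```python
def box_average(field, n):
    """Average a 2-D field (list of rows) over a (2n+1) x (2n+1) box.

    Near the edges the half-width is reduced to the distance to the nearest
    boundary, so boundary points keep their own value (a 1x1 box).
    Assumption: n <= 0 means no averaging (an unchanged copy is returned).
    """
    if n <= 0:
        return [row[:] for row in field]
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            # Largest centred box that fits inside the domain, capped at n.
            h = min(n, i, j, nx - 1 - i, ny - 1 - j)
            total = 0.0
            for jj in range(j - h, j + h + 1):
                for ii in range(i - h, i + h + 1):
                    total += field[jj][ii]
            out[j][i] = total / (2 * h + 1) ** 2
    return out
```

For a 5x5 field that is zero except for a value of 9 at the centre, N = 1 gives 1.0 at the centre and at its eight neighbours, while the corner points keep their original value of 0.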

Currently, the clustering uses only the members of the ensemble at a single time. If clusters at more than one time are required, the program must be re-run by the user (probably in batch mode, using a unix script to generate the control file).
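Generating one control file per clustering time can be scripted. The sketch below does this in Python rather than as a unix shell script. It builds the control-file text for a given time index, using the record order and example values of the batch-mode control file. The trailing YYY record, its empty default, and the function name `control_file_text` are assumptions. How the IDL program is then launched is site-specific and is not shown.

```python
def control_file_text(time_index, norm=0, avg=-1, clus_level=1, field="04203",
                      out_level=1,
                      data_dir="/export/carrot/raid1/brugge/stefano/no_precip_run_fcst/",
                      xxx="_ppn_qwq114.oper", yyy=""):
    """Return clusterplot control-file text for one clustering time.

    Record order follows the batch-mode example; the final YYY record
    (empty by default) is an assumption.
    """
    records = [
        (norm, "normalisation flag (0=off, 1=on)"),
        (avg, "spatial averaging flag"),
        (clus_level, "model level used for the clustering"),
        (time_index, "N, where N'th time sample determines clustering"),
        (field, "field to be analysed"),
        (out_level, "model level to be used for output"),
        (data_dir, "data directory"),
        (xxx, "XXX"),
        (yyy, "YYY"),
    ]
    # Left-justify each value so the ';' comments line up, as in the example.
    return "".join(f"{value!s:<28} ; {comment}\n" for value, comment in records)
```

A driver would write `control_file_text(t)` to `clusterplot_controlfile` before each run, for each required time index `t`.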

Method of use - batch mode

  1. The user should make a copy of the IDL source code in his/her working directory.
  2. Next, create a text file called clusterplot_controlfile in the working directory. This file should contain the following:
    0                            ; normalisation flag (0=off, 1=on)
    -1                           ; spatial averaging flag
    1                            ; model level used for the clustering
    4                            ; N, where N'th time sample determines clustering
    04203                        ; field to be analysed
    1                            ; model level to be used for output
    /export/carrot/raid1/brugge/stefano/no_precip_run_fcst/   ; data directory
    _ppn_qwq114.oper             ; XXX ; where filename is FFFFFXXXmmYYY
                                 ; YYY ; and mm is the member number

where the records should be changed to reflect the input data and output requirements of the user. The text to the right of the ';' symbols can be kept as a reminder of the meaning of each record.
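Because everything after the first ';' on a line is only a reminder, a reader of the control file only needs the text before it. The following hedged Python sketch parses such a file and builds the member filenames of the form FFFFFXXXmmYYY for members 00 to 23. It is illustrative only and is not how the IDL program reads its input. The helper names `parse_control_file` and `member_filenames` are hypothetical, and the empty YYY value in the sample is an assumption.

```python
import os


def parse_control_file(text):
    """Return the record values of a control file, in file order.

    Anything from the first ';' on a line is treated as a comment;
    completely blank lines are skipped.
    """
    values = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values.append(line.split(";", 1)[0].strip())
    return values


def member_filenames(data_dir, field, xxx, yyy, n_members=24):
    """Build paths of the form FFFFFXXXmmYYY for members 00 .. n_members-1."""
    return [os.path.join(data_dir, f"{field}{xxx}{m:02d}{yyy}") for m in range(n_members)]


# Hypothetical sample based on the example control file (YYY left empty).
SAMPLE = """\
0                            ; normalisation flag (0=off, 1=on)
-1                           ; spatial averaging flag
1                            ; model level used for the clustering
4                            ; N, where N'th time sample determines clustering
04203                        ; field to be analysed
1                            ; model level to be used for output
/export/carrot/raid1/brugge/stefano/no_precip_run_fcst/   ; data directory
_ppn_qwq114.oper             ; XXX
                             ; YYY
"""

records = parse_control_file(SAMPLE)
data_dir, field, xxx, yyy = records[6], records[4], records[7], records[8]
paths = member_filenames(data_dir, field, xxx, yyy)
```

Keeping the field code as a string preserves its leading zero (04203), which would be lost if the record were converted to an integer before building the filename.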

The IDL code can now be run in a batch job or, if run interactively, will loop through the specified cycles.

Example output

The figures show (a) a dendrogram showing how the clusters are generated, (b) a cluster plot using normalised data, and (c) a cluster plot based on the same input data, but this time not normalised.

Current code developments