A SquareRoot Ensemble Kalman Filter Demonstration with the Lorenz model
The HTML version of this documentation is located here
and a PDF version is located here
This document contains the following:
Section

Description

1↓

A brief introduction to data assimilation.

2↓

A description of a data assimilation method called the ensemble squareroot Kalman Filter.

3↓

A description of the model that this assimilation method is applied to: the (nonlinear and chaotic) Lorenz 63 model.

4↓

Instructions for the demonstration web package.

5↓

Instructions for downloading the source code.

6↓

A worksheet of suggested experiments.

7↓

Appendix deriving the ensemble squareroot Kalman Filter

Revision: 21/02/18, Eq.
(9↓) corrected.
1 Data assimilation
Forecasts from models provide useful information to help predict the future state of a system. Inevitably though predictions will diverge from reality as time progresses. This is an inescapable property of dynamical systems. Data assimilation (DA) is a formal approach used to help correct forecasts by introducing information observed from the environment. Many DA methods are based on the ’Kalman Filter’ (KF) equations
[1]. The KF equations describe how information from forecasts and from observations should be combined in an optimal way which extracts maximum information from each source of information. The KF equations are usually solved in a cyclic fashion: a forecast starts from initial conditions at time
t_{0} and is run to time
t_{1}. This forecast accumulates errors over this period which are reduced by assimilating observations at
t_{1}. In the language of DA, the assimilated forecast/observation product is called the ’analysis state’. The analysis state at
t_{1} is then used as the initial conditions for another forecast to time
t_{2} which is then combined with the next batch of observations at that time.
The KF equations rely heavily on quantitative information about the respective uncertainties of the forecast and observational data which influence how the data are combined (for instance more accurate data are considered with greater importance than less accurate data). The KF equations describe not only the analysis state, but also its uncertainty (for instance the KF equation reflect the fact that analysis uncertainty will be less than the forecast uncertainty due to the information provided by the observations). They also describe how this uncertainty evolves along with the forecast to the time of the next batch of observations.
The KF equations provide a means of quantifying all of this information in an explicit way, but the use of the KF can become prohibitively expensive for systems requiring a large number of pieces of information to describe. Suppose a system requires n variables to describe its state (i.e. for a forecast or analysis). The KF then needs ~ O(n^{2}) pieces of information to describe the uncertainty of the state (uncertainties are described by error covariance matrices which represent Gaussianshaped probability density functions). In numerical weather prediction, the number of variables is n ~ O(10^{7}) (i.e. the number of pieces of information needed to describe meteorological fields on a grid covering all relevant parts of the Earth’s atmosphere). Although presentday technology can handle states with this number of variables, it cannot handle the error covariance matrices, which require n^{2} ~ O(10^{14}) pieces of information. This motivates the need for approximate (but efficient) ways of dealing with large n.
2 The Ensemble SquareRoot Kalman Filter
The impracticality of the KF equations in large systems has lead to the development of approximate forms. One important development lies with ensemble methods. Such methods solve a reduced form of the KF equations which deal with not one forecast/analysis state, but with an ensemble of N forecast/analysis states. Ensemble methods describe a state’s uncertainty by the spread in the ensemble at a given time instead of using error covariance matrices. These methods have proved to be practical when N≪n (e.g. N ~ O(10^{2}) for n ~ O(10^{7})), although they do introduce a range of new problems associated with the approximations.
There are many flavours of ensemble KFs. The ensemble squareroot Kalman Filter (EnSRKF) is one popular variant. This method of DA has the following characteristics:

The EnSRKF uses an ensemble of forecasts (output from a numerical model) to represent uncertainty in the model’s ability to capture reality. The quantification of forecast uncertainty is essential for DA as it informs how well information from the forecast should be trusted when confronting observational information. The filter works by constructing an ensemble of analyses which is designed to have a spread equivalent to that predicted by the ordinary KF.

The EnSRKF is a sequential method. This means that it chops up time into a number of intervals. At times where no observations are available, the method propagates the ensemble forward in time using the model equations (model error may be added during the propagation if model error is to be represented). At times where there are observations, the method combines the forecast ensemble with the observations to create a new ensemble which is consistent with all pieces of information.

The EnSRKF combines the forecast ensemble with the observation(s) using an update equation. This update equation is derived from the ordinary KF equations.

The “squareroot” part of the method’s name applies to the particular flavour of ensemble DA that is used here, which uses the forecast/analysis ensemble perturbations (from their mean) as a ’squareroot’ of the forecast/analysis error covariance matrices. The squareroot formulation (unlike nonsquareroot formulations) does not require the observations to be perturbed during the assimilation procedure.
The equations for the EnSRKF are derived in the appendix for interested readers, but are summarized here (shown for a general nonspecific system). At time
t, let the forecast state of the system be represented by an
nelement state vector,
x^{f}_{k}(t), and suppose that an ensemble of
N such model states exist (
1 ≤ k ≤ N), and let these state vectors comprise the columns of the matrix
A^{f} (an
n × N matrix) as follows:
As long as the ensemble is distributed correctly, this forecast ensemble will represent possible realizations of the real system. The spread (disagreement between) members represents the uncertainty of the forecasts.
2.1 Time steps where observations are available
Suppose that observations are available at time t, and let these be assembled in the pelement vector y(t) (for p observations). The EnSRKF may be used to update the members in such a way as to reduce the spread of the ensemble consistent with the uncertainties of the forecasts and the observations (like models, even observations are not perfect representations of reality).
Now let
x^{f}(t) be the
nelement vector which is the mean of the ensemble members contained in
A^{f}. The first stage of the EnSRKF is to find a new vector,
x^{a}(t), which represents a mean analysis state which is closer to the observations than the forecast state is (the superscript “a” stands for “analysis”). The formula for
x^{a}(t) is:
where the time labels have been removed for convenience. The new symbols that appear in
(2↑) are as follows.
A^{f’} is the
n × N matrix of forecast perturbations from the forecast mean state, ie
H is the
p × n observation operator matrix (
H acts on a model state and outputs the model’s version of those observations that are consistent with the input model state),
S is the
p × N matrix product
and
C is the
p × p matrix
In
(5↑) R is the
p × p observation error covariance matrix, which describes the uncertainty of the observational data. In
(5↑), the matrix
C has been decomposed into its eigenvectors,
Z, and eigenvalues,
Λ, which will be used below.
Equation
(2↑) says how the mean of the ensemble should be updated given the new observations. The following formula specifies how the whole ensemble should be updated. This is found from the
n × N matrix of analysis perturbations,
A^{a’}:
The new symbols that appear in
(7↑) are found as follows. First define the
p × N matrix
X:
The
N × N matrix
X^{T}X is the following, which is also given in its eigendecomposition, with eigenvectors,
V, and eigenvalues
(Σ^{T}Σ):
(the reason for writing the eigenvalues in a composite form
Σ^{T}Σ  rather than with a single symbol  is to allow the matrix
Σ to be identified as the matrix of singular values of
X; this will be important to some readers, but is not essential to the understanding of the method). This completes the definition of all of the symbols in
(7↑). It is useful to note that
(7↑) may be written in the following form
where
T = V(I − Σ^{T}Σ)^{1 ⁄ 2} V^{T}. This is useful because
T may be interpreted as the
N × N matrix of weighting coefficients (in the sense that the
kth analysis perturbation may be regarded as a linear combination of the
N forecast perturbations, whose weighting coefficients are specified in the
kth column of
T).
The final step is to construct the full ensemble from the perturbations and then to propagate this ensemble to the next time step (to give the forecast ensemble for the next time step):
where \strikeout off\uuline off\uwave off
(A^{a}, A^{a}, ⋯, A^{a}) is the
n × N matrix of identical mean analysis states in each column,
ℳ is the (potentially) nonlinear forecast model (acting on each column of its matrix argument separately), and
δA is an
n × N matrix of stochastic error perturbations (to simulate an imperfect model). The stochastic perturbations have specified error covariance
Q.
2.2 Time steps where observations are not available
In the absence of observational information, no data assimilation can be performed and so only the forecast is performed:
\strikeout off\uuline off\uwave off(note that in
(13↑) the initial conditions are the forecast states at time
t, rather than the analysis states in
(11↑)).
3 The model that the EnSRKF will be used with
The EnSRKF will be used with observations to update a run of the Lorenz 63 model (with stochastic forcing if specified). The Lorenz 63 model propagates a state of three variables (
n = 3,
x,
y and
z) according to the following coupled, nonlinear ordinary differential equations:
where
σ = 10,
ρ = 28 and
β = 8 ⁄ 3 are fixed parameters. In the demonstration package, the Lorenz 63 equations are solved using the fourthorder RungeKutta algorithm e.g.
[3] and stochastic model error terms are added if required.
Since n is small for this model, it is actually practical to apply the ordinary KF. We examine the EnSRKF here, not for reasons of constructing a reduced representation of this system, but instead just to demonstrate the workings of the method.
4 The EnSRKF/Lorenz 63 web demonstration
The web interface is located at the web address:
www.met.reading.ac.uk/~darc/training/lorenz_ensrkf/lorenz_ensrkf.html.
This interface allows the reader to run the EnSRKF without the need to be concerned with source code or plotting software. If preferred, the source code may be downloaded  see Sec. 4 (
5↓). The following parameters may be adjusted via the web interface:
Parameter(s)

Description

Default value(s)

Notes

δt

The time step of Lorenz model numerics.

0.01

Default is a good value to use.

N

The number of ensemble members.

6

2 ≤ N ≤ 25.

x^{truth}(0)

The initial conditions (x, y, z) of the true state.

3.0, 3.0, 12.0

Used as a reference run of the model to compare to the analyses and to generate synthetic observations.

B^{1 ⁄ 2}

The initial prior standard deviations of x, y, z.

1.0, 1.0, 1.0

To setup the starting points of the ensemble.

model error?

Switch to turn on/off model error.

off

Used to simulate an imperfect model by periodically adding stochastic noise with the error covariance Q.

Q^{1 ⁄ 2}

The model error standard deviations of x, y, z.

4.0, 4.0, 4.0

Relevant only if model error is switched on.

observe x, y, z?

Switches to turn on/off observations of variables.

all on

Must have at least one variable observed.

R^{1 ⁄ 2}

Observation error standard deviations.

1.0, 1.0, 1.0

R is used in the EnSRKF algorithm. It is also used to specify the covariance of the observation noise used to generate synthetic observations.

Δt^{assim}

The number of timesteps in the assimilation period.

200

Observations are spread throughout the first Δt^{assim} timesteps.

Δt^{fore}

The number of timesteps in forecast period.

400

The last Δt^{fore}timesteps are free forecasts.

N^{obsbatches}

The number of batches of observations (each batch observingx, y, z as specified above) spread uniformly throughout the Δt^{assim} time steps.

5

The code might adjust this parameter slightly if Δt^{assim} ⁄ N^{obsbatches} is noninteger.

seed

The random number seed.

123456

Change this to rerun with different random numbers.

Some notes on these parameters:

The model is run over Δt^{assim} + Δt^{fore} timesteps in total.

Over the first Δt^{assim} timesteps, the ensemble members are influenced by N^{obsbatches} batches of observations.

Over the next (and last) Δt^{fore} timesteps, the ensemble members are pure forecasts.

Observations are synthesized from the true trajectory by adding observation noise. This is the ’identical twin experiment’ method.

The initial ensemble points are synthesized from the true initial condition by adding background noise.

Random numbers are used to simulate errors in the observations, to simulate errors in the initial conditions of the ensemble members and to simulate model errors when propagating the state with the Lorenz model.
5 Downloading the source code for the EnSRKF
For readers who prefer to work with source code, there are two versions of the EnSRKF demo for the Lorenz 63 model.

The C++ source code may be downloaded from this location:
www.met.reading.ac.uk/~darc/training/lorenz_ensrkf/lorenz_ensrkf.cpp.
In order for this code to work without the need for additional mathematical libraries, this code includes an eigenroutine. Note that this routine has not the most efficient available and has not been optimized.

A similar (but not identical) version written for the IDL language, with its own documentation, may be downloaded from this location:
www.nceo.ac.uk/lorenz_enkf.php.
Please let us know if you download the code. We ask that the code should be used for nonprofit uses only.
6 Worksheet of suggested experiments with the EnSRKF

The first run is to familiarise you with the output from the web interface. Check that all settings are at their default values as specified in the table in Sec. 4↑ (if they are not, then click on the “Reset to default values” button on the web page).\begin_inset Separator latexpar\end_inset

Click on the “Run Ensemble Squareroot Kalman Filter” button to start the assimilation. A page of data will appear followed by four graphs.

The lefthand column of data is a summary of the settings used and a key to the colours used in the graphs. The righthand column shows the details of the observations that have been simulated. These observations have been derived from the truth run (shown alongside the observations) with added observation noise.

The first three graphs are plots of the x, y and z variables as a function of time. The blue line is the true trajectory from which the noisy observations and noisy background state are simulated. This is the trajectory that we would like the data assimilation method to follow. The red line is the ensemblemean forecast/analysis trajectory and the grey bars are its onestandarddeviation error bars (found from the ensemble spread). The green dots are the observations and are shown with their onestandarddeviation error bars. What do you notice about the error growth during the forecasts, and when a forecast is confronted by an observation?

The last plot is the truth (blue) and ensemble mean (red) trajectories in phase space.

Hint: To help compare different outputs, use the tabs feature that is available with most web browsers: e.g. for Microsoft and Linux, click with the right mouse button on “Return to settings page” button and choose “Open link in new tab”.

Return to the settings page. Repeat this run, but this time make each observation less accurate, e.g. increase each observation error standard deviation to three times larger than the default values. How does this change affect the ensemble mean trajectory and its error bars? How long are the forecasts of x, y and z (after the last observation is assimilated) useful representations of the truth?

Suppose that the default settings (as in 1) are used, except this time observations of x are absent. Would you expect the x trajectory to be unaffected by observations? Would you expect forecasts of x in this system to be in complete disagreement with the true x? Run the experiment and explain what you find. Try the same by returning to the settings in 1 but now removing y and then z observations.

Rerun the setting that you explored in 3, but this time double the number of measurement times to 10.

Return to the default settings (as in 1), but now use only 2 ensemble members. Does the assimilation work, given that the observations do not sample the signal adequately? Try increasing the number of measurement times to 6, 7, 8, 9, then 10.

The experiments so far have used a perfect model (that is the model used in the data assimilation is the same as that used to generate the truth run, even though the initial conditions differ in each case). Return to the default settings again, but now switch on model error (the Lorenz equations have no model error added for the truth run  only for the model used to propagate the ensemble members in the DA). How does the model error affect the quality of the assimilation? Repeat with error standard deviation 16 units for each variable: does increasing the number of observation times ever make up for the fact that the assimilation model is not perfect?
7 Appendix: derivation of the EnSRKF equations
The formulas shown in Sec.
2↑ for the ensemble squareroot Kalman Filter (EnSRKF) are derived for interested readers in this appendix.
7.1 The basic Kalman filter
The starting point is the statement of the ordinary Kalman Filter (KF) equations. These describe how a single, but imperfectly known state evolves in time. The state evolution is governed by a (linear and imperfect) model, which is partially corrected by the assimilation of (imperfect) observational information. The KF equations are:
The symbols have the following meanings (all symbols refer to time
t unless explicitly specified otherwise):

x^{a} is the analysis state at time t. This represents the best estimate of the system’s state after observations have been considered. It is sometimes called the posterior state. There are n pieces of information in x^{a}, so x^{a} is an nelement (column) vector.

x^{f} is the forecast state at time t. It is a forecast from a previous analysis and is an nelement vector. It is sometimes called the prior state (or background state) because it represents the estimate of the system’s state before observations have been considered.

y represents the observational information at time t. There are p pieces of information in y, so y is a pelement vector.

H is the (linear) observation operator matrix at time t. It acts on x^{f} and gives the model’s version of the observations. H is a p × n matrix.

K is the Kalman gain matrix at time t. It acts on the difference between the actual observations and the model’s observations (this difference is called an innovation vector) and gives the analysis increment vector which is added to the forecast to give the analysis. K is an n × p matrix and is made up of a string of other matrices.

P^{f} is the forecast error covariance matrix at time t. It describes the uncertainty of the forecast state. Suppose δx^{f} is one possible error in x^{f} at time t, then P^{f} = ⟨δx^{f}δx^{fT}⟩, where the angled bracket indicates average over all possible δx^{f} that are consistent with information available about x^{f}. P^{f} is an n × n matrix.

R is the observation error covariance matrix at time t. It describes the uncertainty of the observations. R is a p × p matrix.

ℳ is the forecast model (propagating its nelement argument at time t to an nelement argument at time t + Δt. ℳ can be nonlinear, but the KF equations are designed to be optimal when the forecast model is linear. \strikeout off\uuline off\uwave offδx is a stochastic random noise vector to represent the fact that the forecast model is imperfect.

P^{a} is the analysis error covariance matrix at time t. It describes the uncertainty of the analysis state. It has been written in two forms in (17↑). The second form, (I − KH)P^{f}, indicates that the analysis error covariances are necessarily smaller than the forecast error covariances (by the presence of the minus sign).

M is the Jacobian of ℳ. It is an n × n matrix and is found from M = ∂ℳ ⁄ ∂x evaluated on the model trajectory from the analysis state at time t to the next analysis time at t + Δt.
The KF works in a cycle. A forecast,
x^{f} (with error covariance
P^{f}) at time
t and observations,
y (with error covariance
R) are combined in
(17↑)to give the analysis,
x^{a}. In a similar way the forecast and observation error covariances (
P^{f} and
R respectively) are combined to give the analysis error covariance,
P^{a} in
(17↑). The analysis is propagated to the next observation time at
t + Δt in
(17↑) where it becomes the new forecast state, and the analysis error covariance is propagated in
(17↑) where it becomes the new forecast error covariance matrix. This forecast/update cycle is repeated. At no stage in this cycle is the state of the system known exactly: the forecast stage introduces inevitable forecast errors and the analysis step reduces them  but never to zero.
7.2 The ensemble squareroot Kalman filter
The KF equations are simple to use for systems with small n (e.g. n < O(10^{2})). The storage required to evaluate the KF equations however goes as ~ O(n^{2}) and the scaling of the number of floatingpoint operations required is even higher. Ensemble methods remove the need to represent large matrices explicitly and so are often a feasible way of approximating the KF equations. Since n is only 3 for the Lorenz 63 model, it is actually practical to apply the ordinary KF, but here we examine the EnSRKF just to demonstrate the workings of the method. The starting point for derivation of the EnSRKF equations is the KF equations.
Suppose that we have
N ensemble members, the KF update
(17↑)for
kth member is
All parts of this expression are known except for
P^{f}, but \strikeout off\uuline off\uwave off
P^{f} can be approximated from the ensemble as follows
This is an approximation because
P^{f} is derived from a statistical sample. It will be helpful to write this in the matrix notation used in
(3↑), where
A^{f’} is the
n × N matrix of ensemble member perturbations. Equation
(23↑) may then be written as
where the sum in
(23↑) is implied in the matrix algebra. This result applies equally well to the analysis error covariance matrix:
Equation
(22↑) may also be written in matrix form
where
\uuline default\uwave defaultY is the p × N matrix of identical columns comprising the observation vector y. This matrix equation is equivalent to the vector equation (22↑), but where in (26↑) each ensemble member corresponds to a given column. The ensemble mean of this is
(and noting that Y = Y).
Equation
(24↑) is now substituted into the second equality in
(17↑), with the Kalman update from
(17↑):
P^{a}
=
P^{f} − P^{f}H^{T}(HP^{f}H^{T} + R)^{ − 1} HP^{f},
=
(1)/(N − 1) A^{f’}A^{f’T} − (1)/(N − 1) A^{f’}A^{f’T}H^{T}⎧⎩H(1)/(N − 1) A^{f’}A^{f’T}H^{T} + R⎫⎭^{ − 1} H(1)/(N − 1) A^{f’}A^{f’T}.
In order to simplify the notation, we use matrices
S and
C as defined in
(4↑) and
(5↑) respectively. These make the above into
With
S and
C,
(27↑) is similarly
The idea of a squareroot scheme is to find an ensemble of analysis perturbations (in the matrix
A^{a’}) that have the covariance given by
(28↑) (think of the matrix
A^{a’} as the ’squareroot’ of the analysis error covariance matrix as in
(25↑)). Once such an ensemble matrix is found, it is added to the mean
A^{a} in
(29↑) to give the full ensemble. The next step is to find
A^{a’} that has the property of
(25↑). Firstly, since
C is a square and symmetric matrix, it may be written in its eigendecomposition as in
(5↑), where in
(5↑) Z is the
p × p matrix of eigenvectors (
ZZ^{T} = I) and
Λ is the
p × p matrix of eigenvalues. Using the additional definition of the
p × N matrix
X as in
(8↑) makes
(28↑) into
A^{a’}A^{a’T} = A^{f’}[I − X^{T}X]A^{f’T}.
Now decomposing
X^{T}X into its eigendecomposition as in
(9↑) where
V is the
N × N matrix of eigenvectors (
VV^{T} = I and
V^{T}V = I) and
Σ^{T}Σ is the
N × N matrix of eigenvalues gives:
A^{a’}A^{a’T}
=
A^{f’}[I − VΣ^{T}ΣV^{T}]A^{f’T},
=
A^{f’}V[I − Σ^{T}Σ]V^{T}A^{f’T}.
Matrix \strikeout off\uuline off\uwave off
I − Σ^{T}Σ is a diagonal square matrix, so finding a square root of this matrix is simple. One such squareroot matrix is
[I − Σ^{T}Σ]^{1 ⁄ 2} V^{T} (this matrix times its transpose gives
I − Σ^{T}Σ) and leads to
A^{a’}A^{a’T} = A^{f’}V[I − Σ^{T}Σ]^{1 ⁄ 2} V^{T}V[I − Σ^{T}Σ]^{T ⁄ 2} V^{T}A^{f’T},
which, after comparing to
(29↑) leads to this matrix of analysis ensemble perturbations:
A^{a’} = A^{f’}V[I − Σ^{T}Σ]^{1 ⁄ 2} V^{T},
(as in
(7↑)). The important point here is that the largest matrices we need to store have dimensions
n × N (
A^{f’} and
A^{a’}) and we need to perform an eigendecomposition on only a
p × p matrix (
C) and on an
N × N matrix (
X^{T}X). In real problems, usually
p≪n and
N≪n, which makes the EnSRKF an efficient approach to the DA problem.
Acknowledgements
This documentation and the web interface and the C++ code was written by R.N. Bannister. The IDL code written by S. Migliorini. The Lorenz 63 model was invented by E.N. Lorenz
[2]. An original source of the EnSRKF is given by J.S. Whitaker and T.M. Hamill (2002)
[4].
References
[1] Kalman R.E. (1960). A new approach to linear filtering and prediction problems, Journal of Basic Engineering 82, pp. 35–45.
[2] Lorenz E.N. (1963). Deterministic nonperiodic flow, J. Atmos. Sci. 42, pp.43371.
[3] Press W.H., Teukolsky S.A. ,Vetterling W.T., Flannery B.P., Numerical Recepies in Fortran, Cambridge University Press, 1992, Sec. 16.1.
[4] Whitaker J.S., Hamill T.M. (2002). Ensemble data assimilation without perturbed observations, Mon. Weather Rev. 130, pp. 191324.