Ross Bannister
Data Assimilation Research Centre, University of Reading, UK
Last updated 01/01/02
As we shall see in these notes, the chain rule can be applied to vector as well as scalar derivatives. We derive the expressions that are useful in the theory of variational data assimilation and inverse modelling. The results lead us to the concept of adjoint variables and adjoint operators. In section 1 we review the standard notation used in linear algebra.
As is usual notation, scalars and vectors are distinguished from each other by writing vectors in bold. A vector, $\mathbf{x}$, is by convention a column vector (here with $n$ elements),

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad (1.1)$$

and a vector derivative is a row operator,

$$\frac{\partial}{\partial \mathbf{x}} = \left( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_n} \right). \qquad (1.2)$$

The "nabla" version of the derivative, $\nabla$, contains the same elements as Eq. (1.2), but is a column vector by convention. The transpose symbol, $\mathrm{T}$, written as a superscript after an object, makes rows into columns and vice versa.
A matrix, $\mathbf{A}$, contains element $A_{ij}$ in row $i$ and column $j$. The transpose is thus,

$$\left( \mathbf{A}^{\mathrm{T}} \right)_{ij} = A_{ji}. \qquad (1.3)$$
Since a vector is a special case of a matrix with just one row or one column (depending on whether the vector is a row or column vector), the transpose instruction makes row vectors into column vectors, and vice versa. In particular, for vector derivatives,

$$\left( \frac{\partial}{\partial \mathbf{x}} \right)^{\mathrm{T}} = \nabla. \qquad (1.4)$$
The combination $\mathbf{x}^{\mathrm{T}} \mathbf{y}$ (an inner product) is a scalar. It is found by the summation ($\mathbf{x}$ and $\mathbf{y}$ must be vectors of the same number of elements, $n$),

$$\mathbf{x}^{\mathrm{T}} \mathbf{y} = \sum_{i=1}^{n} x_i y_i. \qquad (1.5)$$
The outer product is written $\mathbf{x} \mathbf{y}^{\mathrm{T}}$ and yields a matrix,

$$\left( \mathbf{x} \mathbf{y}^{\mathrm{T}} \right)_{ij} = x_i y_j. \qquad (1.6)$$

The number of elements of $\mathbf{x}$ and $\mathbf{y}$ need not be the same for the outer product. For $\mathbf{x}$ with $n$ elements and $\mathbf{y}$ with $m$ elements, the outer product as defined above will be an $n \times m$ matrix.
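The distinction between the two products can be checked numerically. The following is an illustrative sketch (the vectors and numbers are arbitrary examples, not taken from the notes):

```python
# A quick check of the inner product (1.5) and outer product (1.6).
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # n = 3 elements
y = np.array([4.0, 5.0])        # m = 2 elements

# Inner product x^T y requires equal lengths, so pair x with itself here.
inner = x @ x                   # scalar: sum_i x_i * x_i

# Outer product x y^T is defined for unequal lengths and gives an n x m matrix.
outer = np.outer(x, y)          # (outer)_{ij} = x_i * y_j

print(inner)                    # 14.0
print(outer.shape)              # (3, 2)
```

Note that `x @ x` returns a scalar while `np.outer` always returns a matrix, mirroring the row-times-column versus column-times-row distinction above.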
A matrix acts on one vector to give another vector. The following action,

$$\mathbf{y} = \mathbf{A} \mathbf{x}, \qquad (1.7)$$

is valid if the number of rows of $\mathbf{A}$ ($m$) is the same as the number of elements in $\mathbf{y}$, and the number of columns of $\mathbf{A}$ ($n$) is the same as the number of elements in $\mathbf{x}$. Equation (1.7) is shorthand for,

$$y_i = \sum_{j=1}^{n} A_{ij} x_j. \qquad (1.8)$$

This action is like performing many inner products, one for each row of $\mathbf{A}$. In this respect, the matrix operator is sometimes used as a transformation (or change of basis), where each row of $\mathbf{A}$ represents the row vector for a member of the new basis.
Generally, the matrix elements can be thought of as the partial derivatives,

$$A_{ij} = \frac{\partial y_i}{\partial x_j}, \qquad (1.9)$$

and the whole matrix can be written as,

$$\mathbf{A} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}}. \qquad (1.10)$$
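The identification of matrix elements with partial derivatives, Eq. (1.9), can be illustrated with a finite-difference sketch (the matrix below is an arbitrary example; for the linear action of Eq. (1.7) the estimate recovers $\mathbf{A}$ exactly, up to rounding):

```python
# Sketch: for y = A x (Eq. (1.7)), the elements of A are the partial
# derivatives dy_i/dx_j of Eq. (1.9).  Estimate them by finite differences.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # 2 x 3: y has 2 elements, x has 3

def y(x):
    return A @ x                    # Eq. (1.8): y_i = sum_j A_ij x_j

x0 = np.array([0.5, -1.0, 2.0])
eps = 1e-6
jac = np.column_stack([(y(x0 + eps * e) - y(x0)) / eps
                       for e in np.eye(3)])   # column j holds dy/dx_j

print(np.allclose(jac, A))          # True
```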
The inner and outer products and matrix operators applied with vectors and vector derivatives can be used in innovative ways to write compact multi-variable expressions. Such expressions are used in data assimilation.
Consider a scalar, $f$, that is a function of the elements of $\mathbf{x}$, $f = f(\mathbf{x})$. Its derivative with respect to the vector $\mathbf{x}$ is the vector,

$$\frac{\partial f}{\partial \mathbf{x}} = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right). \qquad (2.1)$$

An important question is: what is $\partial f / \partial \mathbf{x}$ in the case that the two sets of variables $\mathbf{x}$ and $\mathbf{x}'$ are related via the transformation,

$$\mathbf{x}' = \mathbf{A}' \mathbf{x}. \qquad (2.2)$$

$\mathbf{A}'$ is sometimes referred to as a Jacobian, and has matrix elements $A'_{ij} = \partial x'_i / \partial x_j$ (as Eq. (1.9)). Let us write an equation for the derivative of $f$ with respect to $x_i$, expressed explicitly via the chain rule,

$$\frac{\partial f}{\partial x_i} = \sum_{j} \frac{\partial f}{\partial x'_j} \frac{\partial x'_j}{\partial x_i}, \qquad (2.3)$$

$$= \sum_{j} A'_{ji} \frac{\partial f}{\partial x'_j}. \qquad (2.4)$$
Expressions for derivatives with respect to each component of $\mathbf{x}$ can be assembled into a vector. It can be checked that the following, when expanded using Eqs. (1.3), (1.7) and (1.8), is equivalent to the above,

$$\left( \frac{\partial f}{\partial \mathbf{x}} \right)^{\mathrm{T}} = \mathbf{A}'^{\mathrm{T}} \left( \frac{\partial f}{\partial \mathbf{x}'} \right)^{\mathrm{T}}. \qquad (2.5)$$
This is the generalised chain rule for vector derivatives. A column derivative with respect to a vector, such as $(\partial f / \partial \mathbf{x})^{\mathrm{T}}$, is often called an adjoint variable. The operator $\mathbf{A}'^{\mathrm{T}}$ (as distinct from the forward operator $\mathbf{A}'$, as defined in Eq. (2.2)) is similarly called the adjoint operator. It is important to note that the adjoint of an operator is not generally its inverse: while $\mathbf{A}'$ transmits information from $\mathbf{x}$ to $\mathbf{x}'$ (Eq. (2.2)), $\mathbf{A}'^{\mathrm{T}}$ transmits information in the reverse direction, but for adjoint variables.
Using Eq. (1.10), Eq. (2.5) can be written as,

$$\frac{\partial f}{\partial \mathbf{x}} = \frac{\partial f}{\partial \mathbf{x}'} \frac{\partial \mathbf{x}'}{\partial \mathbf{x}}. \qquad (2.6)$$

This has the same appearance as the chain rule for single-variable functions (now with vectors in the place of scalars) and is a convenient way of remembering the multi-variable result.
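The adjoint relation of Eq. (2.5) can be verified numerically. The sketch below uses an arbitrary smooth $f$ and an arbitrary $\mathbf{A}'$ (neither is from the notes), and checks the gradient identity by finite differences:

```python
# Numerical check of Eq. (2.5): for x' = A' x and a scalar f(x'), the gradient
# with respect to x equals the adjoint A'^T applied to the gradient w.r.t. x'.
import numpy as np

Ap = np.array([[2.0, 0.0, 1.0],
               [1.0, 3.0, 0.0]])     # A': maps x (3 elements) to x' (2 elements)

def f_of_xp(xp):                     # f as a function of the primed variables
    return np.sin(xp[0]) + xp[1] ** 2

def grad(fn, v, eps=1e-6):           # central-difference gradient (as a column)
    return np.array([(fn(v + eps * e) - fn(v - eps * e)) / (2 * eps)
                     for e in np.eye(len(v))])

x0 = np.array([0.3, -0.2, 0.7])
grad_x  = grad(lambda x: f_of_xp(Ap @ x), x0)   # df/dx, computed directly
grad_xp = grad(f_of_xp, Ap @ x0)                # df/dx'
print(np.allclose(grad_x, Ap.T @ grad_xp))      # True: Eq. (2.5)
```

This "gradient test" (comparing a direct gradient against the adjoint-propagated one) is a standard sanity check when coding adjoint operators for variational data assimilation.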
The second derivative with respect to the original variable, $\mathbf{x}$, can be written in matrix form as,

$$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \left( \frac{\partial}{\partial \mathbf{x}} \right)^{\mathrm{T}} \frac{\partial f}{\partial \mathbf{x}}, \qquad (3.1)$$

with elements $\partial^2 f / \partial x_i \partial x_j$. Although the right-hand side of Eq. (3.1) resembles an inner product (scalar), the 'row' property of derivative vectors (mentioned in section 1) means that this is actually an outer product: $(\partial / \partial \mathbf{x})^{\mathrm{T}}$ is a column operator and $\partial f / \partial \mathbf{x}$ is a row vector, so their product is a matrix.
Again, imposing the transformation Eq. (2.2), the result, Eq. (2.5), can be used to rewrite the second derivative matrix in terms of the new, primed variables,

$$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \mathbf{A}'^{\mathrm{T}} \frac{\partial^2 f}{\partial \mathbf{x}'^2} \mathbf{A}'. \qquad (3.5)$$
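The Hessian transformation of Eq. (3.5) can also be checked numerically. The quadratic $f$ and the matrices below are arbitrary examples chosen so that both Hessians are known exactly:

```python
# Numerical check of Eq. (3.5): with x' = A' x, the Hessian in the original
# variables is A'^T (d^2 f / dx'^2) A'.
import numpy as np

Ap = np.array([[2.0, 0.0, 1.0],
               [1.0, 3.0, 0.0]])          # A' maps x (3 elements) to x' (2)

B = np.array([[4.0, 1.0],
              [1.0, 2.0]])                # symmetric: Hessian of f in x' space

def f(x):                                 # f viewed as a function of x
    xp = Ap @ x
    return 0.5 * xp @ B @ xp              # f = (1/2) x'^T B x'

def hessian(fn, v, eps=1e-4):             # finite-difference second derivatives
    n = len(v)
    E = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (fn(v + eps * (E[i] + E[j])) - fn(v + eps * E[i])
                       - fn(v + eps * E[j]) + fn(v)) / eps ** 2
    return H

x0 = np.array([0.3, -0.2, 0.7])
print(np.allclose(hessian(f, x0), Ap.T @ B @ Ap, atol=1e-5))   # True: Eq. (3.5)
```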
If the function itself is a vector, $\mathbf{f}(\mathbf{x})$, then the derivative is a matrix,

$$\left( \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \right)_{ij} = \frac{\partial f_i}{\partial x_j}, \qquad (4.1)$$

where the number of components of $\mathbf{f}$ ($m$) is not necessarily the same as the number of components of $\mathbf{x}$ ($n$). Making the same transformation of the independent variable as in section 2, Eq. (2.2), and using the result of Eq. (2.5), allows one to write the derivative in terms of the primed variables as,

$$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}'} \mathbf{A}'. \qquad (4.2)$$
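As with the scalar case, Eq. (4.2) can be verified by finite differences. The vector function and matrix below are arbitrary illustrative choices:

```python
# Numerical check of Eq. (4.2): for a vector function f with x' = A' x,
# the Jacobian in the original variables is (df/dx') A'.
import numpy as np

Ap = np.array([[2.0, 0.0, 1.0],
               [1.0, 3.0, 0.0]])          # A' maps x (3 elements) to x' (2)

def f_of_xp(xp):                          # f as a function of the primed variables
    return np.array([np.sin(xp[0]) * xp[1], xp[0] + xp[1] ** 2])

def jacobian(fn, v, eps=1e-6):            # column j holds d(fn)/dv_j
    return np.column_stack([(fn(v + eps * e) - fn(v - eps * e)) / (2 * eps)
                            for e in np.eye(len(v))])

x0 = np.array([0.3, -0.2, 0.7])
J_x  = jacobian(lambda x: f_of_xp(Ap @ x), x0)   # df/dx, computed directly
J_xp = jacobian(f_of_xp, Ap @ x0)                # df/dx'
print(np.allclose(J_x, J_xp @ Ap, atol=1e-6))    # True: Eq. (4.2)
```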
All of the results, Eqs. (2.5), (3.5) and (4.2), follow from only one explicit use of the chain rule (in section 2).