Some vector algebra and the generalized chain rule

Ross Bannister

Data Assimilation Research Centre, University of Reading, UK

Last updated 01/01/02

1. Introduction and notation

As we shall see in these notes, the chain rule can be applied to vector as well as scalar derivatives. We derive the expressions that are useful in the theory of variational data assimilation and inverse modelling. The results lead us to the concepts of adjoint variables and adjoint operators. The remainder of this section reviews the standard notation used in linear algebra.

Vectors and vector derivatives

As is usual, scalars and vectors are distinguished from each other by writing vectors in bold. A vector $\mathbf{x}$ is, by convention, a column vector (here with $n$ elements),

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad (1.1)$$

and a vector derivative is a row operator,

$$\frac{\partial}{\partial \mathbf{x}} = \left( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_n} \right). \qquad (1.2)$$

The "nabla" version of the derivative, $\nabla$, contains the same elements as Eq. (1.2) but is a column vector by convention. The transpose symbol, $\mathrm{T}$, written as a superscript after an object, makes rows into columns and vice versa.

Matrices and the transpose instruction

A matrix, $\mathbf{A}$, contains element $A_{ij}$ in row $i$ and column $j$. The transpose is thus,

$$\left( \mathbf{A}^{\mathrm{T}} \right)_{ij} = A_{ji}. \qquad (1.3)$$

Since a vector is a special case of a matrix with either just one row or one column (depending on whether the vector is a row or column vector), the transpose instruction here makes row vectors into column vectors, and vice versa. In particular, for vector derivatives,

$$\left( \frac{\partial}{\partial \mathbf{x}} \right)^{\mathrm{T}} = \nabla, \qquad \nabla^{\mathrm{T}} = \frac{\partial}{\partial \mathbf{x}}. \qquad (1.4)$$

The inner product

The combination $\mathbf{x}^{\mathrm{T}}\mathbf{y}$ (an inner product) is a scalar. It is found by the summation ($\mathbf{x}$ and $\mathbf{y}$ must be vectors with the same number of elements, $n$),

$$\mathbf{x}^{\mathrm{T}}\mathbf{y} = \sum_{i=1}^{n} x_i y_i. \qquad (1.5)$$
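A minimal NumPy sketch (not part of the original notes; the vectors are arbitrary examples) confirming that the explicit summation of Eq. (1.5) agrees with the built-in dot product:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Inner product x^T y as an explicit summation over the n elements (Eq. 1.5) ...
inner_sum = sum(x_i * y_i for x_i, y_i in zip(x, y))

# ... and as the equivalent NumPy call.
inner_np = np.dot(x, y)

print(inner_sum, inner_np)  # 32.0 32.0
```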

The outer product

The outer product is written $\mathbf{x}\mathbf{y}^{\mathrm{T}}$ and yields a matrix,

$$\mathbf{x}\mathbf{y}^{\mathrm{T}} = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_m \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_m \\ \vdots & \vdots & \ddots & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_m \end{pmatrix}. \qquad (1.6)$$

The number of elements of $\mathbf{x}$ and $\mathbf{y}$ need not be the same for the outer product. For $\mathbf{x}$ with $n$ elements and $\mathbf{y}$ with $m$ elements, the outer product as defined above is an $n \times m$ matrix.
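A short sketch (illustrative values only, not part of the original notes) showing the $n \times m$ shape of the outer product for vectors of different lengths:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # n = 3 elements
y = np.array([10.0, 20.0])      # m = 2 elements

# Outer product x y^T: element (i, j) is x_i * y_j, giving an n x m matrix (Eq. 1.6).
outer = np.outer(x, y)

print(outer.shape)  # (3, 2)
print(outer)
# [[10. 20.]
#  [20. 40.]
#  [30. 60.]]
```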

Matrix operators

A matrix $\mathbf{A}$ acts on one vector to give another vector. The following action,

$$\mathbf{y} = \mathbf{A}\mathbf{x}, \qquad (1.7)$$

is valid if the number of rows of $\mathbf{A}$ ($m$) is the same as the number of elements in $\mathbf{y}$ and the number of columns of $\mathbf{A}$ ($n$) is the same as the number of elements in $\mathbf{x}$. Equation (1.7) is shorthand for,

$$y_i = \sum_{j=1}^{n} A_{ij} x_j, \qquad i = 1, \ldots, m. \qquad (1.8)$$

This action is like performing many inner products, one for each row of $\mathbf{A}$. In this respect, the matrix operator is sometimes used as a transformation (or change of basis), where each row of $\mathbf{A}$ represents the row vector for a member of the new basis.
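The "one inner product per row" view of Eq. (1.8) can be checked directly; the sketch below (illustrative values only, not part of the original notes) compares the matrix-vector product with explicit row-by-row inner products:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # m x n = 2 x 3
x = np.array([1.0, 0.5, -1.0])    # n = 3 elements

# y = A x, computed directly ...
y = A @ x

# ... and as one inner product per row of A (Eq. 1.8).
y_rows = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])

print(y)        # [-1.   0.5]
print(y_rows)   # [-1.   0.5]
```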

Generally, the matrix elements can be thought of as the partial derivatives,

$$A_{ij} = \frac{\partial y_i}{\partial x_j}, \qquad (1.9)$$

and the whole matrix can be written as,

$$\mathbf{A} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}}. \qquad (1.10)$$
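For the linear map of Eq. (1.7), finite differences recover the matrix elements as the partial derivatives of Eq. (1.9). A minimal sketch (arbitrary values, forward differences; not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([0.3, -0.7, 1.2])
eps = 1e-6

# For the linear map y = A x, the partial derivative dy_i/dx_j should
# recover the matrix element A_ij (Eq. 1.9); check by forward differences.
J = np.zeros_like(A)
for j in range(x.size):
    dx = np.zeros_like(x)
    dx[j] = eps
    J[:, j] = (A @ (x + dx) - A @ x) / eps

print(np.allclose(J, A))  # True (to within round-off)
```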

The inner and outer products and matrix operators, applied to vectors and vector derivatives, can be used in innovative ways to write compact multi-variable expressions. Such expressions are used in data assimilation.


2. Chain rule for scalar functions (first derivative)

Consider a scalar $f$ that is a function of the elements of $\mathbf{x}$, $f = f(\mathbf{x})$. Its derivative with respect to the vector $\mathbf{x}$ is the vector,

$$\frac{\partial f}{\partial \mathbf{x}} = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right). \qquad (2.1)$$

An important question is: what is the derivative of $f$ with respect to $\mathbf{x}'$ in the case that the two sets of variables $\mathbf{x}$ and $\mathbf{x}'$ are related via the transformation,

$$\mathbf{x} = \mathbf{A}\mathbf{x}'. \qquad (2.2)$$

$\mathbf{A}$ is sometimes referred to as a Jacobian, and has matrix elements $A_{ij} = \partial x_i / \partial x'_j$ (as Eq. (1.9)). Let us write an equation for the derivative of $f$ with respect to a component $x'_i$, expressed explicitly via the chain rule,

$$\frac{\partial f}{\partial x'_i} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j} \frac{\partial x_j}{\partial x'_i} \qquad (2.3)$$

$$= \sum_{j=1}^{n} A_{ji} \frac{\partial f}{\partial x_j}. \qquad (2.4)$$

Expressions for the derivatives with respect to each component of $\mathbf{x}'$ can be assembled into a vector. It can be checked that the following, when expanded using Eqs. (1.3), (1.7) and (1.8), is equivalent to the above,

$$\left( \frac{\partial f}{\partial \mathbf{x}'} \right)^{\mathrm{T}} = \mathbf{A}^{\mathrm{T}} \left( \frac{\partial f}{\partial \mathbf{x}} \right)^{\mathrm{T}}. \qquad (2.5)$$

This is the generalised chain rule for vector derivatives. A column derivative with respect to a vector, such as $\left( \partial f / \partial \mathbf{x} \right)^{\mathrm{T}}$, is often called an adjoint variable. The operator $\mathbf{A}^{\mathrm{T}}$ (as distinct from the forward operator $\mathbf{A}$ defined in Eq. (2.2)) is similarly called the adjoint operator. It is important to note that the adjoint of an operator is not generally its inverse: while $\mathbf{A}$ transmits information from $\mathbf{x}'$ to $\mathbf{x}$ (Eq. (2.2)), $\mathbf{A}^{\mathrm{T}}$ transmits information in the reverse direction, but for adjoint variables.

Taking the transpose of Eq. (2.5) and using Eq. (1.10), the result can be written as,

$$\frac{\partial f}{\partial \mathbf{x}'} = \frac{\partial f}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{x}'}. \qquad (2.6)$$

This has the same appearance as the chain rule for single variable functions (now with vectors in the place of scalars) and is a convenient way of remembering the multi-variable result.
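A numerical check of the generalised chain rule (a minimal sketch, not part of the original notes; the scalar function and transformation are arbitrary examples): the adjoint operator $\mathbf{A}^{\mathrm{T}}$ applied to the gradient with respect to $\mathbf{x}$ reproduces the gradient with respect to $\mathbf{x}'$ obtained by finite differences.

```python
import numpy as np

def f(x):
    """An arbitrary scalar function of x; its gradient w.r.t. x is 3 * x**2."""
    return np.sum(x**3)

A = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [2.0, 0.0]])          # maps x' (2 elements) to x (3 elements)
xp = np.array([0.4, -0.3])          # the primed variables x'
x = A @ xp                          # Eq. (2.2): x = A x'

# Generalised chain rule, Eq. (2.5): grad_{x'} f = A^T grad_x f.
grad_x = 3.0 * x**2                 # adjoint variable: gradient w.r.t. x
grad_xp = A.T @ grad_x              # adjoint operator A^T applied to it

# Check against forward finite differences of f(A x') with respect to x'.
eps = 1e-6
fd = np.array([(f(A @ (xp + eps * e)) - f(A @ xp)) / eps
               for e in np.eye(2)])

print(np.allclose(grad_xp, fd, atol=1e-4))  # True
```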


3. Chain rule for scalar functions (second derivative)

The second derivative of $f$ with respect to the original variable, $\mathbf{x}$, can be written in matrix form as,

$$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \left( \frac{\partial}{\partial \mathbf{x}} \right)^{\mathrm{T}} \frac{\partial f}{\partial \mathbf{x}} \qquad (3.1)$$

$$= \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix}. \qquad (3.2)$$

Although the right hand side of Eq. (3.1) resembles an inner product (scalar), the 'row' property of derivative vectors (mentioned in section 1) means that this is actually an outer product.

Again, imposing the transformation Eq. (2.2), the result, Eq. (2.5), can be used to rewrite the second derivative matrix in terms of the new, primed variables,

$$\frac{\partial^2 f}{\partial \mathbf{x}'^2} = \left( \frac{\partial}{\partial \mathbf{x}'} \right)^{\mathrm{T}} \frac{\partial f}{\partial \mathbf{x}'} \qquad (3.3)$$

$$= \mathbf{A}^{\mathrm{T}} \left( \frac{\partial}{\partial \mathbf{x}} \right)^{\mathrm{T}} \frac{\partial f}{\partial \mathbf{x}} \, \mathbf{A} \qquad (3.4)$$

$$= \mathbf{A}^{\mathrm{T}} \frac{\partial^2 f}{\partial \mathbf{x}^2} \mathbf{A}. \qquad (3.5)$$
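The Hessian transformation of Eq. (3.5) can be verified numerically; the sketch below (not part of the original notes) uses a quadratic test function chosen so that the Hessian with respect to $\mathbf{x}$ is known exactly, and compares a finite-difference Hessian in the primed variables with $\mathbf{A}^{\mathrm{T}} \left( \partial^2 f / \partial \mathbf{x}^2 \right) \mathbf{A}$.

```python
import numpy as np

# Quadratic test function f(x) = 0.5 * x^T B x with symmetric B, so that
# the Hessian with respect to x is simply B.
B = np.array([[2.0, 0.5, 0.0],
              [0.5, 3.0, 1.0],
              [0.0, 1.0, 1.5]])
f = lambda x: 0.5 * x @ B @ x

# Transformation x = A x' (Eq. 2.2).
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, -1.0]])
g = lambda xp: f(A @ xp)        # f viewed as a function of the primed variables

# Finite-difference Hessian of g at an arbitrary point x'.
xp = np.array([0.2, -0.4])
eps = 1e-4
n = xp.size
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei, ej = np.eye(n)[i], np.eye(n)[j]
        H[i, j] = (g(xp + eps*ei + eps*ej) - g(xp + eps*ei)
                   - g(xp + eps*ej) + g(xp)) / eps**2

# Eq. (3.5): the Hessian w.r.t. x' equals A^T (d^2 f / dx^2) A.
print(np.allclose(H, A.T @ B @ A, atol=1e-5))  # True
```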


4. Chain rule for vector functions (first derivative)

If the function itself is a vector, $\mathbf{f}(\mathbf{x})$, then the derivative is a matrix,

$$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}, \qquad (4.1)$$

where the number of components of $\mathbf{f}$ ($m$) is not necessarily the same as the number of components of $\mathbf{x}$ ($n$). Making the same transformation of the independent variable as in section 2, Eq. (2.2), and using the result of Eq. (2.5), allows one to write the derivative in terms of the primed variables as,

$$\frac{\partial \mathbf{f}}{\partial \mathbf{x}'} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{A} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{x}'}. \qquad (4.2)$$

All of the results, Eqs. (2.5), (3.5) and (4.2) follow from only one explicit use of the chain rule (in section 2).
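As with the earlier results, Eq. (4.2) can be checked numerically; the following sketch (not part of the original notes; an arbitrary vector function with a hand-coded Jacobian) compares $\left( \partial \mathbf{f} / \partial \mathbf{x} \right) \mathbf{A}$ with a finite-difference Jacobian in the primed variables.

```python
import numpy as np

def f(x):
    """An arbitrary vector function of x (m = 2 outputs, n = 3 inputs)."""
    return np.array([x[0] * x[1], np.sin(x[2])])

def jac_f(x):
    """Its Jacobian, with elements df_i/dx_j (Eq. 4.1), written by hand."""
    return np.array([[x[1], x[0], 0.0],
                     [0.0,  0.0,  np.cos(x[2])]])

# Transformation x = A x' (Eq. 2.2).
A = np.array([[1.0, 0.5],
              [2.0, -1.0],
              [0.0, 1.0]])
xp = np.array([0.3, 0.8])
x = A @ xp

# Eq. (4.2): the Jacobian w.r.t. x' is (df/dx) A; check by finite differences.
J_xp = jac_f(x) @ A
eps = 1e-6
fd = np.column_stack([(f(A @ (xp + eps * e)) - f(A @ xp)) / eps
                      for e in np.eye(2)])

print(np.allclose(J_xp, fd, atol=1e-4))  # True
```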