Adept: A fast automatic differentiation library for C++
Robin Hogan, May 2012
(r.j.hogan@reading.ac.uk)
Overview
Adept (Automatic Differentiation using Expression Templates) is a
software library that enables algorithms written in C and C++ to be
automatically differentiated. It uses an operator-overloading
approach, so very little code modification is required.
Differentiation can be performed in forward mode, in reverse mode (to
compute the adjoint), or the full Jacobian matrix can be computed.
Moreover, the use of expression templates and several other important
optimizations means that reverse-mode differentiation is 5-9 times
faster than the leading libraries that provide equivalent
functionality (ADOL-C and CppAD), and uses less memory. In fact,
Adept is only around 5-20% slower than an adjoint code you might
write by hand, yet immeasurably faster in terms of user time, since
hand-coding an adjoint is very time consuming and error-prone. For
further details, read the paper.
- Paper: Hogan, R. J., 2012: Fast reverse-mode automatic
differentiation using expression templates in C++. To be submitted
to Mathematics and Computers in Simulation.
- Web site: http://www.met.rdg.ac.uk/clouds/adept
for the latest version of the code.
- Contact: If you use the code and have any comments, queries,
requests or bug fixes then please contact Robin Hogan. I'm also
interested to hear of any uses of the code, so that I can keep you
updated on changes, bug fixes etc.
Functionality
Adept provides the following functionality:
- Full Jacobian matrix. Given the non-linear function y = f(x)
coded in C or C++, Adept will compute the matrix H = ∂y/∂x, where
the element at row i and column j of H is H_i,j = ∂y_i/∂x_j. This
matrix will be computed much more rapidly and accurately than if you
simply recomputed the function multiple times, perturbing each
element of x one by one. The Jacobian matrix is used in the
Gauss-Newton and Levenberg-Marquardt minimization algorithms.
- Reverse-mode differentiation. This is a key component in
optimization problems where a non-linear function needs to be
minimized but the state vector x is too large for it to make sense
to compute the full Jacobian matrix. Atmospheric data assimilation
is the canonical example in meteorology. Given a non-linear
function y = f(x) and a vector of adjoints ∂J/∂y (where J is the
scalar function to be minimized), Adept will compute the vector of
adjoints ∂J/∂x. Formally, ∂J/∂x is given by the matrix-vector
product Hᵀ ∂J/∂y, but it is computed here without computing the
full Jacobian matrix H. The adjoint may then be used in a
quasi-Newton minimization scheme.
- Forward-mode differentiation. Given the non-linear function
y = f(x) and a vector of perturbations δx, Adept will compute the
corresponding vector δy arising from a linearization of the
function f. Formally, δy is given by the matrix-vector product
Hδx, but it is computed here without computing the full Jacobian
matrix H. Note that Adept is optimized for the reverse case, so it
may not be as fast (and will certainly not be as economical with
memory) in forward mode as libraries written especially for that
purpose.
Note that at present Adept is missing some functionality that you
may require:
- Differentiation is first-order only: it cannot directly compute
higher-order derivatives such as the Hessian matrix.
- No support for complex numbers.
- Your code must only operate on variables individually: they can be
stored in C-style arrays or std::vector types, but Adept
does not work with containers that allow operations on entire arrays
such as the std::valarray type.
- C and C++ only; this library could not be written in Fortran since
the language provides no template capability.
It is hoped that future versions will remedy these limitations (and
it is hoped that a future version of Fortran will support templates).
License
Adept is released under
the GNU General
Public License (GPL). This means that you can use and modify the
library for any purpose, but that if you distribute a software
package that incorporates the library or a modified version of it,
you must release the source code for the entire software package
under the conditions of the GNU GPL. If you would like to use Adept
under other terms, please
contact Robin Hogan.
If you use Adept in a publication, please cite the Hogan (2012) paper. This is a
request and not a condition of the license.
How to use Adept to automatically differentiate your code
If you have
used ADOL-C
or CppAD then you will
already be familiar with what is involved in applying an
operator-overloading automatic differentiation package to your
code. The user interface to Adept differs from these two only in the
detail.
Step 1: Code preparation
Firstly, in all source files containing code to be differentiated,
you need to include the adept.h header file and import
the adouble type from the adept namespace. Assuming your code uses
double precision, you then search for and replace double with the
"active" equivalent adouble, but only for those variables whose
values depend on the independent input variables.
Consider the contrived example in the paper:
double algorithm(const double x[2]) {
  double y = 4.0;
  double s = 2.0*x[0] + 3.0*x[1]*x[1];
  y *= sin(s);
  return y;
}
The modified code would look like this, where the changes are the
added include and using lines, and the replacement of double
with adouble:
#include "adept.h"
using adept::adouble;

adouble algorithm(const adouble x[2]) {
  adouble y = 4.0;
  adouble s = 2.0*x[0] + 3.0*x[1]*x[1];
  y *= sin(s);
  return y;
}
Changes like this need to be made in all source files that form
part of an algorithm to be differentiated. Note that you may
use adouble as the template argument of a Standard Template Library
(STL) vector (i.e. std::vector<adouble>), but unfortunately not
for the STL complex and valarray types.
If you need to access the real number underlying
an adouble variable a, for example in order to use
it as an argument to the fprintf function, then
use a.value() or adept::value(a). Of course, any
mathematical operations performed on this real number will not be
differentiated.
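For example, the following short sketch (the helper function is
illustrative, not part of Adept) prints the value of an active
variable; since it operates only on the underlying double, nothing
here is recorded for differentiation:

#include <cstdio>
#include "adept.h"

// Illustrative helper: print an active variable using its
// underlying double value
void print_adouble(const adept::adouble& a) {
  std::fprintf(stdout, "a = %g\n", a.value());
}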
Step 2: Interface for reverse-mode differentiation
Suppose you wanted to create a wrapper around algorithm that
returned not only the result but also computed the adjoint; it
would look like this:
#include "adept.h"

double algorithm_ad(const double x_val[2], // Input values
                    double* Y_ad,          // Input-output adjoint
                    double x_ad[2]) {      // Output adjoint
  using namespace adept;                 // Import Stack and adouble from adept
  static Stack stack;                    // Where the derivative information is stored
  stack.start();                         // Start recording
  adouble x[2] = {x_val[0], x_val[1]};   // Initialize adouble inputs
  adouble Y = algorithm(x);              // Call version overloaded for adouble args
  Y.set_gradient(*Y_ad);                 // Load the input-output adjoint
  stack.reverse();                       // Run the adjoint algorithm
  x_ad[0] = x[0].get_gradient();         // Extract the output adjoint for x[0]
  x_ad[1] = x[1].get_gradient();         // ...and x[1]
  *Y_ad = Y.get_gradient();              // Input-output adjoint will have changed too
  return Y.value();                      // Return the result of the simple computation
}
The component parts of this function deserve some comment:
- The Stack object is where the differential version of the
algorithm will be stored. It may optionally be declared static,
which means that memory allocated in its first usage will be reused
if algorithm_ad is called multiple times, saving the overhead of
repeated memory allocation. But don't use static in multi-threaded
code.
- The start() member function of the Stack object
makes the stack accessible to subsequent expressions via a global
variable, but using thread-local storage to ensure thread
safety. Note that the stack must be started before the
first adouble declaration, since such a declaration
registers the adouble object with the currently active
stack. Otherwise the code will crash with a memory fault. If you
want to deactivate a particular stack in order to use another one,
you can call its detach() member function.
- The next line simply copies the input values to the algorithm
into adouble variables. These are the independent variables, but
note that there is no obligation for these to be stored as one array
(as in CppAD), and for forward- and reverse-mode automatic
differentiation you do not even need to declare which variables are
the independent ones. This line could be made slightly more
efficient: assigning the initial real values to x leads to
information being stored on the stack that is not strictly needed
for the differentiation. Instead we could
use x[0].set_value(x_val[0]) and similarly for x[1], or to set the
two values simultaneously we could
use adept::set_values(x, 2, x_val).
- The algorithm is called, and behind the scenes the equivalent
derivative statement for every mathematical statement is stored on
the stack. The result of the forward calculation is stored in Y,
known as a dependent variable. The example above has one dependent
variable, but any number is allowed, and they could be returned in
another way, e.g. by passing a non-constant array to algorithm that
is filled with the final values when the function returns.
- The set_gradient member function of Y sets the
gradient (or adjoint) of Y. Multiple adjoints may be set at
this stage by repeated calls to set_gradient, and
if Y and Y_ad were arrays of length n
then they could be assigned together
using adept::set_gradients(Y, n, Y_ad).
- The reverse() member function of stack performs
the adjoint calculation, sweeping in reverse through the derivative
statements stored on the stack.
- The next two lines simply extract the gradients for the two
independent variables. Alternatively we could have extracted these
simultaneously using adept::get_gradients(x, 2, x_ad).
To do forward-mode differentiation in this example would involve
setting the initial gradients of x instead of Y, calling the member
function forward() instead of reverse(), and extracting the final
gradients from Y instead of x, as in the sketch below.
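The following is a minimal sketch of such a tangent-linear wrapper;
the function name algorithm_tl and the argument names are
illustrative, and it assumes the interface functions behave as
described above:

#include "adept.h"

adept::adouble algorithm(const adept::adouble x[2]); // As in Step 1

double algorithm_tl(const double x_val[2],  // Input values
                    const double dx_val[2], // Input perturbations
                    double* dY) {           // Output perturbation
  using namespace adept;
  static Stack stack;                   // As before; avoid static if multi-threaded
  stack.start();                        // Start recording
  adouble x[2] = {x_val[0], x_val[1]};  // Initialize adouble inputs
  adouble Y = algorithm(x);             // Record the algorithm
  set_gradients(x, 2, dx_val);          // Load the input perturbations
  stack.forward();                      // Run the tangent-linear algorithm
  *dY = Y.get_gradient();               // Extract the output perturbation
  return Y.value();                     // Return the result itself
}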
Step 3: Compiling
Under a Unix-like environment, compile the code ensuring
that adept.h can be found (e.g. put it
in /home/me/include and compile
with -I/home/me/include) and link the code to the compiled
Adept library file libadept.a (e.g. put it
in /home/me/lib and compile with -L/home/me/lib
-ladept).
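For example, with the GNU compiler and the directory layout above,
the compile command might look something like the following, where
myprog.cpp stands in for your own source file:

g++ -O3 -I/home/me/include myprog.cpp -o myprog -L/home/me/lib -ladept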
Calculating the full Jacobian matrix
If the full Jacobian matrix is required then the interface code can
be a little simpler:
#include "adept.h"

double algorithm_jac(const double x_val[2], // Input values
                     double jac[2]) {       // Output Jacobian matrix
  using namespace adept;                 // Import Stack and adouble from adept
  static Stack stack;                    // Where the derivative information is stored
  stack.start();                         // Start recording
  adouble x[2] = {x_val[0], x_val[1]};   // Initialize adouble inputs
  adouble Y = algorithm(x);              // Call version overloaded for adouble args
  stack.independent(x, 2);               // Identify independent variables
  stack.dependent(Y);                    // Identify dependent variable(s)
  stack.jacobian(jac);                   // Compute the Jacobian and store in jac
  return Y.value();                      // Return the result of the simple computation
}
The changes from the previous example are as follows:
- The memory provided to store the Jacobian must be a
one-dimensional array of size m×n, where m is the number of
dependent variables and n is the number of independent variables.
The resulting matrix is stored with the index representing the
dependent variables varying fastest (see the indexing sketch after
this list).
- The independent member function of stack is used to identify the
independent variables. In this case two independent variables are
identified at once, but calling it with a single adouble argument
and no second argument identifies just one independent variable.
Multiple calls may be made to register further independent
variables.
- The dependent member function of stack
identifies the dependent variables, and its usage is identical
to independent.
- The jacobian member function places the result in its argument.
Internally, the calculation is performed by multiple forward or
reverse passes, whichever would be faster (depending on the numbers
of independent and dependent variables).
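Given the storage convention above, element (i,j) of the Jacobian,
i.e. the derivative of dependent variable i with respect to
independent variable j, can be extracted from the flat array as in
this small sketch (the helper function is illustrative, not part of
Adept):

// Extract H_i,j from a Jacobian with m dependent variables, stored
// with the dependent index varying fastest
double jacobian_element(const double* jac, int m, int i, int j) {
  return jac[i + j*m];
}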
Tips for the best performance
- If you are working with single-threaded code, or with a
multi-threaded program in which only one thread will be using
a Stack object, then you can get slightly faster code by compiling
with -DADEPT_STACK_THREAD_UNSAFE. This uses an ordinary
(i.e. non-thread-local) global variable to point to the currently
active stack object, which is slightly faster to access.
- If you compile with the -g option to store debugging symbols,
your object files and executable will be much larger, because every
mathematical statement in the file will have the name of its
associated templated type stored in the file, and these names can
be long. Once you have debugged your use of Adept, you may wish to
omit debugging symbols from production versions of the executable.
- With the GNU C++ compiler, the -O3 --param
max-inline-insns-single=2000 optimizations are recommended, the
latter of which enables the compiler to inline the function calls
associated with expressions more aggressively.
- By default the Jacobian functions are compiled to process two rows
or columns of the Jacobian matrix at once. This is optimal for
modern Intel processors that use SSE2 instructions to perform two
double-precision operations at once. On other platforms you may
wish to change this: to make the Jacobian functions process n rows
or columns at once, recompile the library with
-DADEPT_MULTIPASS_SIZE=n.
- As in ADOL-C, the optimum use of memory is achieved by creating
and destroying adouble objects in a last-in-first-out (LIFO)
fashion. This is most elegantly achieved by avoiding the use
of new and delete for variables whose scope is limited to a single
function or a single loop, and instead
using std::vector<adouble> containers (see the sketch after this
list). This way the memory is automatically freed when the
container goes out of scope, with the individual adouble objects
destructed in the reverse order to that in which they were
constructed, thereby satisfying Adept's LIFO requirement.
- To investigate the memory used by Adept, you may simply send a
Stack object to a stream, e.g. "std::cout <<
stack;". You may also use the memory() member function,
which returns the total number of bytes used. For more detail, after
running the algorithm, print the entire list of derivative
statements to standard output with stack.print().
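Here is the sketch promised in the list above: a hypothetical
function (not part of Adept) whose local active variables live in
a std::vector, so that their memory is freed, LIFO-fashion, as soon
as the function returns, with no explicit new or delete:

#include <cstddef>
#include <vector>
#include "adept.h"

// Illustrative function: sum of squares of the active inputs, with
// local adouble storage held in a std::vector
adept::adouble sum_of_squares(const std::vector<adept::adouble>& x) {
  std::vector<adept::adouble> work(x.size()); // Freed on scope exit
  adept::adouble total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    work[i] = x[i] * x[i]; // Each statement is recorded on the stack
    total += work[i];
  }
  return total;
}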
See also