
# Operators

!!! tip
    If there are some concepts you do not understand, take a look at the book *The Elements of Differentiable Programming* (Blondel and Roulet, 2024).

## List of operators

Given a function `f(x) = y`, there are several differentiation operators available. The terminology depends on:

- the type and shape of the input `x`
- the type and shape of the output `y`
- the order of differentiation

Below we list and describe all the operators we support.

!!! warning
    The package is thoroughly tested with inputs and outputs of the following types: `Float64`, `Vector{Float64}` and `Matrix{Float64}`. We also expect it to work on most kinds of `Number` and `AbstractArray` variables. Beyond that, you are in uncharted territory. We voluntarily keep the type annotations minimal, so that passing more complex objects or custom structs might work in some cases, but we make no guarantees about that yet.

### High-level operators

These operators are computed using only the input `x`.

| operator | order | input `x` | output `y` | operator result type | operator result shape |
|:---|:---:|:---|:---|:---|:---|
| `derivative` | 1 | `Number` | `Any` | similar to `y` | `size(y)` |
| `second_derivative` | 2 | `Number` | `Any` | similar to `y` | `size(y)` |
| `gradient` | 1 | `Any` | `Number` | similar to `x` | `size(x)` |
| `jacobian` | 1 | `AbstractArray` | `AbstractArray` | `AbstractMatrix` | `(length(y), length(x))` |
| `hessian` | 2 | `AbstractArray` | `Number` | `AbstractMatrix` | `(length(x), length(x))` |
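
For illustration, here is a minimal sketch assuming ForwardDiff.jl is installed; the `AutoForwardDiff()` backend object comes from ADTypes.jl, and the functions `f` and `g` are just examples:

```julia
using DifferentiationInterface
import ForwardDiff  # needed for the AutoForwardDiff backend to work

backend = AutoForwardDiff()

f(x) = sum(abs2, x)                    # Vector -> Number
gradient(f, backend, [1.0, 2.0, 3.0])  # similar to x: Vector of size (3,)

g(x) = x .^ 2                          # Vector -> Vector
jacobian(g, backend, [1.0, 2.0, 3.0])  # AbstractMatrix of size (3, 3)
```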

### Low-level operators

These operators are computed using the input `x` and another argument `t` of type `NTuple`, which contains one or more tangents. You can think of tangents as perturbations propagated through the function; they live either in the same space as `x` or in the same space as `y`.

| operator | order | input `x` | output `y` | element type of `t` | operator result type | operator result shape |
|:---|:---:|:---|:---|:---|:---|:---|
| `pushforward` (JVP) | 1 | `Any` | `Any` | similar to `x` | similar to `y` | `size(y)` |
| `pullback` (VJP) | 1 | `Any` | `Any` | similar to `y` | similar to `x` | `size(x)` |
| `hvp` | 2 | `Any` | `Number` | similar to `x` | similar to `x` | `size(x)` |
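
Continuing the sketch above (same assumed `backend`), note that tangents go in and come out wrapped in a `Tuple`:

```julia
f(x) = sum(abs2, x)
x = [1.0, 2.0]

pushforward(f, backend, x, ([1.0, 0.0],))  # tangent in the space of x, result in the space of y
pullback(f, backend, x, (1.0,))            # tangent in the space of y, result in the space of x
hvp(f, backend, x, ([1.0, 0.0],))          # tangent and result both in the space of x
```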

## Variants

Several variants of each operator are defined:

- out-of-place operators return a new derivative object
- in-place operators mutate the provided derivative object

| out-of-place | in-place | out-of-place + primal | in-place + primal |
|:---|:---|:---|:---|
| `derivative` | `derivative!` | `value_and_derivative` | `value_and_derivative!` |
| `second_derivative` | `second_derivative!` | `value_derivative_and_second_derivative` | `value_derivative_and_second_derivative!` |
| `gradient` | `gradient!` | `value_and_gradient` | `value_and_gradient!` |
| `hessian` | `hessian!` | `value_gradient_and_hessian` | `value_gradient_and_hessian!` |
| `jacobian` | `jacobian!` | `value_and_jacobian` | `value_and_jacobian!` |
| `pushforward` | `pushforward!` | `value_and_pushforward` | `value_and_pushforward!` |
| `pullback` | `pullback!` | `value_and_pullback` | `value_and_pullback!` |
| `hvp` | `hvp!` | `gradient_and_hvp` | `gradient_and_hvp!` |
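
As a sketch with the same assumed setup, the in-place variant writes into a pre-allocated result, and the `value_and_` variant also returns the primal output:

```julia
f(x) = sum(abs2, x)
x = [1.0, 2.0]
grad = similar(x)

gradient!(f, grad, backend, x)               # mutates grad and returns it
y, grad = value_and_gradient(f, backend, x)  # primal value and gradient in one call
```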

## Mutation and signatures

Two kinds of functions are supported:

- out-of-place functions `f(x) = y`
- in-place functions `f!(y, x) = nothing`

!!! warning
    In-place functions only work with `pushforward`, `pullback`, `derivative` and `jacobian`. The other operators `hvp`, `gradient` and `hessian` require scalar outputs, so it makes no sense to mutate the number `y`.

This results in various operator signatures (the necessary arguments and their order):

| function signature | out-of-place operator (returns result) | in-place operator (mutates result) |
|:---|:---|:---|
| out-of-place function `f` | `op(f, backend, x, [t])` | `op!(f, result, backend, x, [t])` |
| in-place function `f!` | `op(f!, y, backend, x, [t])` | `op!(f!, y, result, backend, x, [t])` |

!!! warning
    The positional arguments between `f`/`f!` and `backend` are always mutated, regardless of the bang `!` in the operator name. In particular, for in-place functions `f!(y, x)`, every variant of every operator will mutate `y`.
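
Here is a minimal sketch with a hypothetical in-place function `f!`, under the same assumed backend:

```julia
f!(y, x) = (y .= x .^ 2; nothing)  # in-place function: writes the output into y

x = [1.0, 2.0, 3.0]
y = zeros(3)

J = jacobian(f!, y, backend, x)  # y now holds f(x), J is the 3×3 Jacobian
```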

## Preparation

### Principle

In many cases, AD can be accelerated if the function has been called at least once (e.g. to record a tape) or if some cache objects are pre-allocated. This preparation procedure is backend-specific, but we expose a common syntax to achieve it.

| operator | preparation (different point) | preparation (same point) |
|:---|:---|:---|
| `derivative` | `prepare_derivative` | - |
| `gradient` | `prepare_gradient` | - |
| `jacobian` | `prepare_jacobian` | - |
| `second_derivative` | `prepare_second_derivative` | - |
| `hessian` | `prepare_hessian` | - |
| `pushforward` | `prepare_pushforward` | `prepare_pushforward_same_point` |
| `pullback` | `prepare_pullback` | `prepare_pullback_same_point` |
| `hvp` | `prepare_hvp` | `prepare_hvp_same_point` |

In addition, the preparation syntax depends on the number of arguments accepted by the function.

| function signature | preparation signature |
|:---|:---|
| out-of-place function | `prepare_op(f, backend, x, [t])` |
| in-place function | `prepare_op(f!, y, backend, x, [t])` |

Preparation creates an object called `prep` which contains the necessary information to speed up an operator and its variants. The idea is that you prepare only once, which can be costly, but then call the operator several times while reusing the same `prep`.

```julia
op(f, backend, x, [t])        # slow because it includes preparation
op(f, prep, backend, x, [t])  # fast because it skips preparation
```
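
Concretely, reusing the hypothetical gradient setup from earlier:

```julia
f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

prep = prepare_gradient(f, backend, x)  # possibly costly, done only once
gradient(f, prep, backend, x)           # cheap, reuses prep
gradient(f, prep, backend, 2 .* x)      # also fine: same type and size as x
```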

!!! warning
    The `prep` object is the last argument before `backend` and it is always mutated, regardless of the bang `!` in the operator name. As a consequence, preparation is not thread-safe and sharing `prep` objects between threads may lead to unexpected behavior. If you need to run differentiation concurrently, prepare separate `prep` objects for each thread.

### Reusing preparation

It is not always safe to reuse the results of preparation. For different-point preparation, the output `prep` of

```julia
prepare_op(f, [y], backend, x, [t, contexts...])
```

can be reused in subsequent calls to

```julia
op(f, prep, [other_y], backend, other_x, [other_t, other_contexts...])
```

provided that the following conditions all hold:

- `f` and `backend` remain the same
- `other_x` has the same type and size as `x`
- `other_y` has the same type and size as `y`
- `other_t` has the same type and size as `t`
- all the elements of `other_contexts` have the same type and size as the corresponding elements of `contexts`

For same-point preparation, the same rules hold with two modifications:

- `other_x` must be equal to `x`
- any element of `other_contexts` with type `Constant` must be equal to the corresponding element of `contexts`
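
To make these rules concrete, here is a sketch with `gradient` (with `f` and `backend` as in the earlier examples):

```julia
prep = prepare_gradient(f, backend, rand(10))

gradient(f, prep, backend, rand(10))            # allowed: same type and size
# gradient(f, prep, backend, rand(5))           # not allowed: different size
# gradient(f, prep, backend, rand(Float32, 10)) # not allowed: different type
```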

!!! danger
    Reusing preparation with different types or sizes may work with some backends and error with others, so it is not allowed by the API of DifferentiationInterface.

!!! warning
    These rules hold for the majority of backends, but there are some exceptions. The most important exception is ReverseDiff and its taping mechanism, which is sensitive to control flow inside the function.