Merged
13 changes: 12 additions & 1 deletion DifferentiationInterface/CHANGELOG.md
@@ -5,7 +5,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.11...main)
## [Unreleased](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.12...main)

## [0.7.12](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.11...DifferentiationInterface-v0.7.12)

### Added

- Better documentation on argument assumptions ([#917](https://github.com/JuliaDiff/DifferentiationInterface.jl/pull/917))

### Fixed

- Speed up Mooncake in forward mode by preallocating tangents ([#915](https://github.com/JuliaDiff/DifferentiationInterface.jl/pull/915))
- Speed up Mooncake reverse mode with selective zeroing ([#916](https://github.com/JuliaDiff/DifferentiationInterface.jl/pull/916))

## [0.7.11](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.10...DifferentiationInterface-v0.7.11)

4 changes: 2 additions & 2 deletions DifferentiationInterface/Project.toml
@@ -1,7 +1,7 @@
name = "DifferentiationInterface"
uuid = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
authors = ["Guillaume Dalle", "Adrian Hill"]
version = "0.7.11"
version = "0.7.12"

[deps]
ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
@@ -74,7 +74,7 @@ PolyesterForwardDiff = "0.1.2"
ReverseDiff = "1.15.1"
SparseArrays = "1"
SparseConnectivityTracer = "0.6.14, 1"
SparseMatrixColorings = "0.4.9"
SparseMatrixColorings = "0.4.23"
StaticArrays = "1.9.7"
Symbolics = "5.27.1, 6, 7"
Tracker = "0.2.33"
3 changes: 2 additions & 1 deletion DifferentiationInterface/docs/make.jl
@@ -27,13 +27,14 @@ makedocs(;
pages = [
"Home" => "index.md",
"Tutorials" => ["tutorials/basic.md", "tutorials/advanced.md"],
"api.md",
"Explanation" => [
"explanation/arguments.md",
"explanation/operators.md",
"explanation/backends.md",
"explanation/advanced.md",
],
"FAQ" => ["faq/limitations.md", "faq/differentiability.md"],
"api.md",
"Development" => [
"dev/internals.md",
"dev/math.md",
53 changes: 12 additions & 41 deletions DifferentiationInterface/docs/src/explanation/advanced.md
@@ -1,44 +1,12 @@
# Advanced features

## Contexts

### Additional arguments

For all operators provided by DifferentiationInterface, there can be only one differentiated (or "active") argument, which we call `x`.
However, the release v0.6 introduced the possibility of additional "context" arguments, which are not differentiated but still passed to the function after `x`.

Contexts can be useful if you have a function `y = f(x, a, b, c, ...)` or `f!(y, x, a, b, c, ...)` and you want derivatives of `y` with respect to `x` only.
Another option would be creating a closure, but that is sometimes undesirable.

### Types of contexts

Every context argument must be wrapped in a subtype of [`Context`](@ref) and come after the differentiated input `x`.
Right now, there are two kinds of context: [`Constant`](@ref) and [`Cache`](@ref).

!!! warning

Not every backend supports every type of context. See the documentation on [Backends](@ref) for more details.

Semantically, both of these calls compute the partial gradient of `f(x, c)` with respect to `x`, but they consider `c` differently:

```julia
gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))
```

In the first call, `c` is kept unchanged throughout the function evaluation.
In the second call, `c` can be mutated with values computed during the function.

Importantly, one can prepare an operator with an arbitrary value `c'` of the `Constant` (subject to the usual restrictions on preparation).
The values in a provided `Cache` never matter anyway.

## Sparsity

When faced with sparse Jacobian or Hessian matrices, one can take advantage of their sparsity pattern to speed up the computation.
DifferentiationInterface does this automatically if you pass a backend of type [`AutoSparse`](@extref ADTypes.AutoSparse).

!!! tip

To know more about sparse AD, read the survey [_What Color Is Your Jacobian? Graph Coloring for Computing Derivatives_](https://epubs.siam.org/doi/10.1137/S0036144504444711) (Gebremedhin et al., 2005).

### `AutoSparse` object
@@ -48,29 +16,32 @@ An `AutoSparse` backend must be constructed from three ingredients:

1. An underlying (dense) backend, which can be [`SecondOrder`](@ref) or anything from [ADTypes.jl](https://github.com/SciML/ADTypes.jl)

2. A sparsity pattern detector like:
2. A sparsity pattern detector following the [`ADTypes.AbstractSparsityDetector`](@extref ADTypes.AbstractSparsityDetector) interface, such as:

+ [`TracerSparsityDetector`](@extref SparseConnectivityTracer.TracerSparsityDetector) from [SparseConnectivityTracer.jl](https://github.com/adrhill/SparseConnectivityTracer.jl)
+ [`SymbolicsSparsityDetector`](@extref Symbolics.SymbolicsSparsityDetector) from [Symbolics.jl](https://github.com/JuliaSymbolics/Symbolics.jl)
+ [`DenseSparsityDetector`](@ref) from DifferentiationInterface.jl (beware that this detector only gives a locally valid pattern)
+ [`KnownJacobianSparsityDetector`](@extref ADTypes.KnownJacobianSparsityDetector) or [`KnownHessianSparsityDetector`](@extref ADTypes.KnownHessianSparsityDetector) from [ADTypes.jl](https://github.com/SciML/ADTypes.jl) (if you already know the pattern)
3. A coloring algorithm from [SparseMatrixColorings.jl](https://github.com/gdalle/SparseMatrixColorings.jl), such as:

+ [`GreedyColoringAlgorithm`](@extref SparseMatrixColorings.GreedyColoringAlgorithm) (our generic recommendation)

3. A coloring algorithm following the [`ADTypes.AbstractColoringAlgorithm`](@extref ADTypes.AbstractColoringAlgorithm) interface, such as those from [SparseMatrixColorings.jl](https://github.com/gdalle/SparseMatrixColorings.jl):

+ [`GreedyColoringAlgorithm`](@extref SparseMatrixColorings.GreedyColoringAlgorithm) (our generic recommendation, don't forget to tune the `order` parameter)
+ [`ConstantColoringAlgorithm`](@extref SparseMatrixColorings.ConstantColoringAlgorithm) (if you have already computed the optimal coloring and always want to return it)
+ [`OptimalColoringAlgorithm`](@extref SparseMatrixColorings.OptimalColoringAlgorithm) (if you have a low-dimensional matrix for which you want to know the best possible coloring)

!!! note

Symbolic backends have built-in sparsity handling, so `AutoSparse(AutoSymbolics())` and `AutoSparse(AutoFastDifferentiation())` do not need additional configuration for pattern detection or coloring.
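
Putting the three ingredients together might look like this: a minimal sketch, assuming SparseConnectivityTracer.jl and SparseMatrixColorings.jl are installed, with a hypothetical banded function `f`:

```julia
using DifferentiationInterface
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm
import ForwardDiff

sparse_backend = AutoSparse(
    AutoForwardDiff();                              # 1. dense underlying backend
    sparsity_detector = TracerSparsityDetector(),   # 2. sparsity pattern detector
    coloring_algorithm = GreedyColoringAlgorithm(), # 3. coloring algorithm
)

f(x) = diff(x .^ 2)  # banded Jacobian: a good fit for sparse AD
jacobian(f, sparse_backend, rand(5))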

### Cost of sparse preparation
### Reusing sparse preparation

The preparation step of `jacobian` or `hessian` with an `AutoSparse` backend can take a long time, because it needs to detect the sparsity pattern and perform a matrix coloring.
But after preparation, the more zeros are present in the matrix, the greater the speedup will be compared to dense differentiation.

!!! danger

The result of preparation for an `AutoSparse` backend cannot be reused if the sparsity pattern changes.
In particular, during preparation, make sure to pick input and context values that do not give rise to exceptional patterns (e.g. a pattern with too many zeros because the function multiplies by a constant `c = 0` that may become nonzero later on). Random values are usually a better choice during sparse preparation.
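
Continuing the sketch above, the expensive detection and coloring work is paid once and then amortized, assuming the sparsity pattern stays the same across inputs:

```julia
# Detection + coloring happen once, on a random input...
prep = prepare_jacobian(f, sparse_backend, rand(5))

# ...and the result is reused for every subsequent input.
for _ in 1:100
    jacobian(f, prep, sparse_backend, rand(5))
end
```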

### Tuning the coloring algorithm

71 changes: 71 additions & 0 deletions DifferentiationInterface/docs/src/explanation/arguments.md
@@ -0,0 +1,71 @@
# Arguments

## General guidelines

### Function form

DifferentiationInterface only computes derivatives for functions with one of two specific forms:

```julia
y = f(x, contexts...) # out of place, returns `y`
f!(y, x, contexts...) # in place, returns `nothing`
```

In this notation:

- `f` (or `f!`) is the differentiated function
- `y` is the output
- `x` is the input, the only "active" argument, which always comes first
- `contexts` may contain additional, inactive arguments

The quantities returned by the various [operators](@ref "Operators") always correspond to (partial) derivatives of `y` with respect to `x`.
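
For concreteness, here is a minimal sketch of both forms for the same computation, using hypothetical functions `f` and `f!` with a ForwardDiff.jl backend:

```julia
using DifferentiationInterface
import ForwardDiff  # loading it makes AutoForwardDiff() usable

f(x) = abs2.(x)                       # out of place: returns y

function f!(y, x)                     # in place: overwrites y entirely
    y .= abs2.(x)
    return nothing
end

backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]

jacobian(f, backend, x)               # out-of-place variant
jacobian(f!, similar(x), backend, x)  # in-place variant: y comes first
```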

### Assumptions

The package makes one central assumption on the behavior and implementation of `f` (or `f!`):

!!! danger "Mutation rule"
Either an argument's provided value matters, or it can be mutated during the function call, but never both.

This rule applies as follows:

- The provided value of `x` matters because we evaluate and differentiate `f` at point `x`. Therefore, `x` cannot be mutated by the function.
- For in-place functions `f!`, the output `y` is meant to be overwritten. Hence, its provided (initial) value cannot matter, and it must be entirely overwritten.

!!! warning
Whether or not the function object itself can be mutated is a tricky question, and support for this varies between backends.
When in doubt, try to avoid mutating functions and pass contexts instead.
In any case, DifferentiationInterface will assume that the recursive components (fields, subfields, etc.) of `f` or `f!` individually satisfy the same mutation rule: whenever the initial value matters, no mutation is allowed.

## Contexts

### Motivation

As stated, there can be only one active argument, which we call `x`.
However, version 0.6 of the package introduced the possibility of additional "context" arguments, whose derivatives we don't need to compute.
Contexts can be useful if you have a function `y = f(x, a, b, c, ...)` or `f!(y, x, a, b, c, ...)` and you only want the derivative of `y` with respect to `x`.
Another option would be creating a closure, but that is sometimes undesirable for performance reasons.

Every context argument must be wrapped in a subtype of [`Context`](@ref) and come after the active argument `x`.

### Context types

There are three kinds of context: [`Constant`](@ref), [`Cache`](@ref) and the hybrid [`ConstantOrCache`](@ref).
They are distinguished by how they interact with the mutation rule:

- [`Constant`](@ref) contexts wrap data that influences the output of the function. Hence they cannot be mutated.
- [`Cache`](@ref) contexts correspond to scratch spaces that can be mutated at will. Hence their provided value is arbitrary.
- [`ConstantOrCache`](@ref) is a hybrid, whose recursive components (fields, subfields, etc.) must individually satisfy the assumptions of either `Constant` or `Cache`.

Semantically, both of these calls compute the partial gradient of `f(x, c)` with respect to `x`, but they consider `c` differently:

```julia
gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))
```

In the first call, `c` must be kept unchanged throughout the function evaluation.
In the second call, `c` may be mutated with values computed during the function.
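
As an illustration, here is a minimal sketch contrasting the two. The functions are hypothetical, and we assume a backend that supports both context types:

```julia
using DifferentiationInterface
import ForwardDiff

f_scale(x, a) = a * sum(abs2, x)  # the value of `a` matters: Constant

function f_buffered(x, buf)       # `buf` is scratch space: Cache
    buf .= abs2.(x)
    return sum(buf)
end

backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]

gradient(f_scale, backend, x, Constant(2.0))         # 2a .* x
gradient(f_buffered, backend, x, Cache(similar(x)))  # 2 .* x
```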

!!! warning
Not every backend supports every type of context. See the documentation on [backends](@ref "Backends") for more details.
49 changes: 25 additions & 24 deletions DifferentiationInterface/docs/src/explanation/backends.md
@@ -4,33 +4,33 @@

We support the following dense backend choices from [ADTypes.jl](https://github.com/SciML/ADTypes.jl):

- [`AutoChainRules`](@extref ADTypes.AutoChainRules)
- [`AutoDiffractor`](@extref ADTypes.AutoDiffractor)
- [`AutoEnzyme`](@extref ADTypes.AutoEnzyme)
- [`AutoFastDifferentiation`](@extref ADTypes.AutoFastDifferentiation)
- [`AutoFiniteDiff`](@extref ADTypes.AutoFiniteDiff)
- [`AutoFiniteDifferences`](@extref ADTypes.AutoFiniteDifferences)
- [`AutoForwardDiff`](@extref ADTypes.AutoForwardDiff)
- [`AutoGTPSA`](@extref ADTypes.AutoGTPSA)
- [`AutoMooncake`](@extref ADTypes.AutoMooncake) and [`AutoMooncakeForward`](@extref ADTypes.AutoMooncake) (the latter is experimental)
- [`AutoPolyesterForwardDiff`](@extref ADTypes.AutoPolyesterForwardDiff)
- [`AutoReverseDiff`](@extref ADTypes.AutoReverseDiff)
- [`AutoSymbolics`](@extref ADTypes.AutoSymbolics)
- [`AutoTracker`](@extref ADTypes.AutoTracker)
- [`AutoZygote`](@extref ADTypes.AutoZygote)
- [`AutoChainRules`](@extref ADTypes.AutoChainRules)
- [`AutoDiffractor`](@extref ADTypes.AutoDiffractor)
- [`AutoEnzyme`](@extref ADTypes.AutoEnzyme)
- [`AutoFastDifferentiation`](@extref ADTypes.AutoFastDifferentiation)
- [`AutoFiniteDiff`](@extref ADTypes.AutoFiniteDiff)
- [`AutoFiniteDifferences`](@extref ADTypes.AutoFiniteDifferences)
- [`AutoForwardDiff`](@extref ADTypes.AutoForwardDiff)
- [`AutoGTPSA`](@extref ADTypes.AutoGTPSA)
- [`AutoMooncake`](@extref ADTypes.AutoMooncake) and [`AutoMooncakeForward`](@extref ADTypes.AutoMooncake) (the latter is experimental)
- [`AutoPolyesterForwardDiff`](@extref ADTypes.AutoPolyesterForwardDiff)
- [`AutoReverseDiff`](@extref ADTypes.AutoReverseDiff)
- [`AutoSymbolics`](@extref ADTypes.AutoSymbolics)
- [`AutoTracker`](@extref ADTypes.AutoTracker)
- [`AutoZygote`](@extref ADTypes.AutoZygote)

## Features

Given a backend object, you can use:

- [`check_available`](@ref) to know whether the required AD package is loaded
- [`check_inplace`](@ref) to know whether the backend supports in-place functions (all backends support out-of-place functions)
- [`check_available`](@ref) to know whether the required AD package is loaded
- [`check_inplace`](@ref) to know whether the backend supports in-place functions (all backends support out-of-place functions)
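
For instance, a quick sketch with Zygote.jl:

```julia
using DifferentiationInterface
import Zygote

backend = AutoZygote()
check_available(backend)  # true once Zygote.jl is loaded
check_inplace(backend)    # false: Zygote.jl cannot handle mutation
```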

In theory, all we need from each backend is either a `pushforward` or a `pullback`: we can deduce every other operator from these two.
In practice, many AD backends have custom implementations for high-level operators like `gradient` or `jacobian`, which we reuse whenever possible.

!!! details

In the rough summary table below,

- ✅ means that we reuse the custom implementation from the backend;
@@ -90,7 +90,7 @@ The inner backend will be called first, and the outer backend will differentiate
In general, using a forward outer backend over a reverse inner backend will yield the best performance.
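
For example, a forward-over-reverse Hessian might be set up as follows (a sketch, assuming both backend packages are installed):

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

# Outer ForwardDiff (forward mode) over inner Zygote (reverse mode)
backend = SecondOrder(AutoForwardDiff(), AutoZygote())

f(x) = sum(abs2, x)
hessian(f, backend, [1.0, 2.0])  # ≈ [2 0; 0 2]
```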

!!! danger

Second-order AD is tricky, and many backend combinations will fail (even if you combine a backend with itself).
Be ready to experiment and open issues if necessary.

@@ -99,6 +99,7 @@ In general, using a forward outer backend over a reverse inner backend will yiel
The wrapper [`DifferentiateWith`](@ref) allows you to switch between backends.
It takes a function `f` and specifies that `f` should be differentiated with the substitute backend of your choice, instead of whatever true backend the surrounding code is trying to use.
In other words, when someone tries to differentiate `dw = DifferentiateWith(f, substitute_backend)` with `true_backend`, then `substitute_backend` steps in and `true_backend` does not dive into the function `f` itself.

At the moment, `DifferentiateWith` only works when `true_backend` is either [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl), reverse-mode [Mooncake.jl](https://github.com/chalk-lab/Mooncake.jl), or a [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl)-compatible backend (e.g., [Zygote.jl](https://github.com/FluxML/Zygote.jl)).
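
A sketch of the wrapper in action, with Zygote.jl as the true backend and ForwardDiff.jl as the substitute:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)  # pretend Zygote.jl struggles with this function
dw = DifferentiateWith(f, AutoForwardDiff())

# Zygote.jl drives the outer call, but ForwardDiff.jl handles `f` itself.
gradient(dw, AutoZygote(), [1.0, 2.0])
```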

## Implementations
@@ -117,7 +118,7 @@ Same-point preparation runs the forward sweep and returns the pullback closure.
We only implement `pushforward`.

!!! danger

The latest releases of Diffractor [broke DifferentiationInterface](https://github.com/JuliaDiff/Diffractor.jl/issues/290).

### Enzyme
@@ -126,7 +127,7 @@ Depending on the `mode` attribute inside [`AutoEnzyme`](@extref ADTypes.AutoEnzy
When necessary, preparation chooses a number of chunks (for `gradient` and `jacobian` in forward mode, for `jacobian` only in reverse mode).

!!! warning

Enzyme.jl's handling of activities and multiple arguments is not fully supported here, which can cause slowdowns or errors.
If differentiation fails or takes too long, consider using Enzyme.jl through its [native API](https://enzymead.github.io/Enzyme.jl/stable/) instead.

@@ -135,7 +136,7 @@ When necessary, preparation chooses a number of chunks (for `gradient` and `jaco
For every operator, preparation generates an [executable function](https://brianguenter.github.io/FastDifferentiation.jl/stable/makefunction/) from the symbolic expression of the differentiated function.

!!! warning

Preparation can be very slow for symbolic AD.

### FiniteDiff
@@ -159,7 +160,7 @@ For all operators, preparation preallocates the input [`TPS`s](https://bmad-sim.
If a GTPSA [`Descriptor`](https://bmad-sim.github.io/GTPSA.jl/stable/man/b_descriptor/) is not provided to `AutoGTPSA`, then a `Descriptor` will be generated in preparation based on the context.

!!! danger

When providing a custom GTPSA `Descriptor` to `AutoGTPSA`, it is the responsibility of the user to ensure that the number of [GTPSA "variables"](https://bmad-sim.github.io/GTPSA.jl/stable/quickstart/#Calculating-a-Truncated-Power-Series) specified in the `Descriptor` is consistent with the number of inputs of the provided function. Undefined behavior and crashes may occur if this is not the case.

### PolyesterForwardDiff
@@ -175,7 +176,7 @@ This tape is computed from the input `x` provided at preparation time.
It is control-flow dependent, so only one branch is recorded at each `if` statement.

!!! danger

If your function has value-specific control flow (like `if x[1] > 0` or `if c == 1`), you may get silently wrong results whenever it takes new branches that were not taken during preparation.
You must make sure to run preparation with an input and contexts whose values trigger the correct control flow for future executions.
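
A sketch of the pitfall, with a hypothetical branching function and a taped ReverseDiff.jl backend:

```julia
using DifferentiationInterface
import ReverseDiff

f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)  # value-dependent control flow

backend = AutoReverseDiff(; compile=true)
prep = prepare_gradient(f, backend, [1.0, 2.0])  # tape records the `x[1] > 0` branch

gradient(f, prep, backend, [2.0, 3.0])   # fine: same branch as preparation
gradient(f, prep, backend, [-1.0, 3.0])  # silently wrong: other branch never taped
```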

@@ -186,7 +187,7 @@ Whenever contexts are provided, tape recording is deactivated in all cases, beca
For all operators, preparation generates an [executable function](https://docs.sciml.ai/Symbolics/stable/manual/build_function/) from the symbolic expression of the differentiated function.

!!! warning

Preparation can be very slow for symbolic AD.

### Mooncake