- Preparation mechanism (e.g. to create a config or tape)
- Thorough validation on standard inputs and outputs (numbers, vectors, matrices)
- Testing and benchmarking utilities accessible to users with [DifferentiationInterfaceTest](https://github.com/gdalle/DifferentiationInterface.jl/tree/main/DifferentiationInterfaceTest)

In order to ensure symmetry between one-argument functions `f(x) = y` and two-argument functions `f!(y, x) = nothing`, we define the same operators for both cases.
| function             | `operator`                          | `operator!`                               |
|:---------------------|:------------------------------------|:------------------------------------------|
| `f(x) = y`           | `operator(f, backend, x, ...)`      | `operator!(f, res, backend, x, ...)`      |
| `f!(y, x) = nothing` | `operator(f!, y, backend, x, ...)`  | `operator!(f!, y, res, backend, x, ...)`  |

!!! warning
    Every variant of the operator will mutate `y` when applied to a two-argument function `f!(y, x) = nothing`, even if it does not have a `!` in its name.
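For example, here is a minimal sketch with a two-argument function; the choice of `AutoForwardDiff()` from ADTypes.jl is only illustrative, and any supported backend works the same way:

```julia
using DifferentiationInterface, ADTypes
import ForwardDiff

f!(y, x) = (y .= x .^ 2; nothing)  # two-argument form: the result is written into y

x = [1.0, 2.0, 3.0]
y = zeros(3)
J = jacobian(f!, y, AutoForwardDiff(), x)  # mutates y as a side effect, despite no `!` in the name
```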
## Preparation

In many cases, AD can be accelerated if the function has been run at least once (e.g. to create a config or record a tape) and if some cache objects are provided.
This is a backend-specific procedure, but we expose a common syntax to achieve it.

| operator            | preparation function                 |
|:--------------------|:-------------------------------------|
| `pushforward`       | [`prepare_pushforward`](@ref)        |
| `pullback`          | [`prepare_pullback`](@ref)           |
| `derivative`        | [`prepare_derivative`](@ref)         |
| `gradient`          | [`prepare_gradient`](@ref)           |
| `jacobian`          | [`prepare_jacobian`](@ref)           |
| `second_derivative` | [`prepare_second_derivative`](@ref)  |
| `hvp`               | [`prepare_hvp`](@ref)                |
| `hessian`           | [`prepare_hessian`](@ref)            |

If you run `prepare_operator(f, backend, x, [seed])`, it will create an object called `extras` containing the necessary information to speed up `operator` and its variants.
This information is specific to `backend` and `f`, as well as the _type and size_ of the input `x` and the _control flow_ within the function, but it should work with different _values_ of `x`.
You can then call `operator(f, backend, x2, extras)`, which should be faster than `operator(f, backend, x2)`.
This is especially worth it if you plan to call `operator` several times in similar settings: you can think of it as a warm up.

!!! warning
    The `extras` object is nearly always mutated, even if the operator does not have a `!` in its name.
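To illustrate, here is a sketch of preparation for the gradient; the function and backend are arbitrary choices:

```julia
using DifferentiationInterface, ADTypes
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()
x = rand(10)

extras = prepare_gradient(f, backend, x)        # one-time warm up
grad = gradient(f, backend, x, extras)          # faster than gradient(f, backend, x)
grad2 = gradient(f, backend, rand(10), extras)  # reusable for new values with the same type and size
```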

### Second order

We offer two ways to perform second-order differentiation (for [`second_derivative`](@ref), [`hvp`](@ref) and [`hessian`](@ref)):
- pick a single backend to do all the work
- combine an "outer" and "inner" backend within the [`SecondOrder`](@ref) struct: the inner backend will be called first, and the outer backend will differentiate the generated code (see the sketch below)

!!! warning
    There are many possible backend combinations, a lot of which will fail.
    At the moment, trial and error is your best friend.
    Usually, the most efficient approach for Hessians is forward-over-reverse, i.e. a forward-mode outer backend and a reverse-mode inner backend.
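For instance, a forward-over-reverse Hessian could be sketched as follows, pairing ForwardDiff.jl (forward mode) outside with Zygote.jl (reverse mode) inside; this is one plausible combination among many:

```julia
using DifferentiationInterface, ADTypes
import ForwardDiff, Zygote

f(x) = sum(abs2, x)
backend = SecondOrder(AutoForwardDiff(), AutoZygote())  # SecondOrder(outer, inner)
H = hessian(f, backend, rand(3))
```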

## Experimental

!!! danger
    Everything in this section is still experimental, use it at your own risk.

### Sparsity

[ADTypes.jl](https://github.com/SciML/ADTypes.jl) provides [sparse versions](@ref Sparse) of many common AD backends.
They can accelerate the computation of sparse Jacobians and Hessians:

- for sparse Jacobians, just select one of them as your first-order backend.
- for sparse Hessians, select one of them as the _outer part_ of a [`SecondOrder`](@ref) backend (in that case, the Hessian is obtained as the sparse Jacobian of the gradient).

The sparsity pattern is computed automatically with [Symbolics.jl](https://github.com/JuliaSymbolics/Symbolics.jl) during the preparation step.
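As a sketch, computing a sparse Jacobian might look like this; the backend name `AutoSparseForwardDiff` is an assumption based on the sparse backends in ADTypes.jl, and Symbolics.jl must be loaded for the pattern detection:

```julia
using DifferentiationInterface, ADTypes
import ForwardDiff, Symbolics

f(x) = x .^ 2  # elementwise, so the Jacobian is diagonal
backend = AutoSparseForwardDiff()  # assumed sparse backend name from ADTypes.jl
x = rand(5)

extras = prepare_jacobian(f, backend, x)  # the sparsity pattern is detected here
J = jacobian(f, backend, x, extras)       # only the structural nonzeros are computed
```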

!!! info "Planned feature"
    Modular sparsity pattern computation, with other algorithms beyond those from Symbolics.jl.

### Split reverse mode
Some reverse mode AD backends expose a "split" option, which runs only the forward sweep, and encapsulates the reverse sweep in a closure.
We make this available for all backends through dedicated split operators, as in the sketch below.
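As an illustration only: the operator name `value_and_pullback_split` and its return convention below are assumptions made for the sake of the sketch, not a confirmed part of the API:

```julia
using DifferentiationInterface, ADTypes
import Zygote

f(x) = sum(abs2, x)
x = rand(5)

# hypothetical split operator: forward sweep now, reverse sweep deferred to a closure
y, pullbackfunc = value_and_pullback_split(f, AutoZygote(), x)
dx1 = pullbackfunc(1.0)  # reverse sweep with one seed
dx2 = pullbackfunc(2.0)  # reuse the stored forward sweep with another seed
```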
## Not supported

### Batched evaluation

!!! info "Planned feature"
    Interface for providing several pushforward / pullback seeds at once, similar to the chunking in ForwardDiff.jl or the batches in Enzyme.jl.

### Non-standard types

The package is thoroughly tested with inputs and outputs of the following types: `Float64`, `Vector{Float64}` and `Matrix{Float64}`.
We also expect it to work on all kinds of `Number` and `AbstractArray` variables.
Beyond that, you are in uncharted territory.
We voluntarily keep the type annotations minimal, so that passing more complex objects or custom structs _might work with some backends_, but we make no guarantees about that.

### Multiple inputs/outputs

Restricting the API to one input and one output has many coding advantages, but it is not very flexible.
If you need more than that, use [ComponentArrays.jl](https://github.com/jonniedie/ComponentArrays.jl) to wrap several objects inside a single `ComponentVector`.
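For example, here is a sketch with two inputs bundled into one `ComponentVector` (the field names `a` and `b` are arbitrary):

```julia
using DifferentiationInterface, ADTypes, ComponentArrays
import ForwardDiff

f(v) = sum(abs2, v.a) + sum(v.b)       # reads two logical inputs from a single vector
v = ComponentVector(a = rand(3), b = rand(2))
g = gradient(f, AutoForwardDiff(), v)  # the gradient mirrors the component structure
```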