Commit 3d452e0

Improve docs (#234)
1 parent 2cac72b commit 3d452e0

4 files changed: 60 additions & 64 deletions

DifferentiationInterface/README.md

Lines changed: 5 additions & 5 deletions
@@ -72,15 +72,15 @@ julia> Pkg.add(

 ```julia
 using DifferentiationInterface
-import ForwardDiff, Enzyme, Zygote # import automatic differentiation backends you want to use
+import ForwardDiff, Enzyme, Zygote # AD backends you want to use

 f(x) = sum(abs2, x)

-x = [1.0, 2.0, 3.0]
+x = [1.0, 2.0]

-value_and_gradient(f, AutoForwardDiff(), x) # returns (14.0, [2.0, 4.0, 6.0]) using ForwardDiff.jl
-value_and_gradient(f, AutoEnzyme(), x) # returns (14.0, [2.0, 4.0, 6.0]) using Enzyme.jl
-value_and_gradient(f, AutoZygote(), x) # returns (14.0, [2.0, 4.0, 6.0]) using Zygote.jl
+value_and_gradient(f, AutoForwardDiff(), x) # returns (5.0, [2.0, 4.0]) with ForwardDiff.jl
+value_and_gradient(f, AutoEnzyme(), x) # returns (5.0, [2.0, 4.0]) with Enzyme.jl
+value_and_gradient(f, AutoZygote(), x) # returns (5.0, [2.0, 4.0]) with Zygote.jl
 ```

 For more performance, take a look at the [DifferentiationInterface tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface/stable/tutorial/).

DifferentiationInterface/docs/src/backends.md

Lines changed: 9 additions & 7 deletions
@@ -47,34 +47,36 @@ backend_table = Markdown.parse(String(take!(io)))

 ## Types

-We support all dense backend choices from [ADTypes.jl](https://github.com/SciML/ADTypes.jl), as well as their sparse wrapper `AutoSparse`.
+We support all dense backend choices from [ADTypes.jl](https://github.com/SciML/ADTypes.jl), as well as their sparse wrapper [`AutoSparse`](@ref).

 For sparse backends, only the Jacobian and Hessian operators are implemented differently, the other operators behave the same as for the corresponding dense backend.

 ```@example backends
 backend_table #hide
 ```

-## Availability
+## Checks
+
+### Availability

 You can use [`check_available`](@ref) to verify whether a given backend is loaded.

-## Support for two-argument functions
+### Support for two-argument functions

 All backends are compatible with one-argument functions `f(x) = y`.
 Only some are compatible with two-argument functions `f!(y, x) = nothing`.
 You can check this compatibility using [`check_twoarg`](@ref).

-## Hessian support
+### Support for Hessian

 Only some backends are able to compute Hessians.
-You can use [`check_hessian`](@ref) to check this feature.
+You can use [`check_hessian`](@ref) to check this feature (beware that it will try to compute a small Hessian, so it is not instantaneous).

 ## API reference

 !!! warning
-    The following documentation has been re-exported from [ADTypes.jl](https://github.com/SciML/ADTypes.jl).
-    Refer to the ADTypes documentation for more information.
+    The following documentation has been borrowed from ADTypes.jl.
+    Refer to the [ADTypes documentation](https://sciml.github.io/ADTypes.jl/stable/) for more information.

 ```@docs
 ADTypes
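The three checks renamed above fit together as a quick backend audit. A minimal sketch, assuming ForwardDiff.jl is installed:

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()
check_available(backend)  # true once ForwardDiff is loaded
check_twoarg(backend)     # does it support two-argument functions f!(y, x)?
check_hessian(backend)    # computes a small test Hessian, hence not instantaneous
```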

DifferentiationInterface/docs/src/overview.md

Lines changed: 20 additions & 17 deletions
@@ -11,16 +11,16 @@ We provide the following high-level operators:
 | [`derivative`](@ref) | 1 | `Number` | `Number` or `AbstractArray` | same as `y` | `size(y)` |
 | [`second_derivative`](@ref) | 2 | `Number` | `Number` or `AbstractArray` | same as `y` | `size(y)` |
 | [`gradient`](@ref) | 1 | `AbstractArray` | `Number` | same as `x` | `size(x)` |
-| [`hvp`](@ref) | 2 | `AbstractArray` | `Number` | same as `x` | `size(x)` |
 | [`hessian`](@ref) | 2 | `AbstractArray` | `Number` | `AbstractMatrix` | `(length(x), length(x))` |
 | [`jacobian`](@ref) | 1 | `AbstractArray` | `AbstractArray` | `AbstractMatrix` | `(length(y), length(x))` |

-They can all be derived from two low-level operators:
+They can be derived from lower-level operators:

-| operator | order | input `x` | output `y` | result type | result shape |
-| :----------------------------- | :---- | :--------- | :----------- | :---------- | :----------- |
-| [`pushforward`](@ref) (or JVP) | 1 | `Any` | `Any` | same as `y` | `size(y)` |
-| [`pullback`](@ref) (or VJP) | 1 | `Any` | `Any` | same as `x` | `size(x)` |
+| operator | order | input `x` | output `y` | seed `v` | result type | result shape |
+| :----------------------------- | :---- | :-------------- | :----------- | :------- | :---------- | :----------- |
+| [`pushforward`](@ref) (or JVP) | 1 | `Any` | `Any` | `dx` | same as `y` | `size(y)` |
+| [`pullback`](@ref) (or VJP) | 1 | `Any` | `Any` | `dy` | same as `x` | `size(x)` |
+| [`hvp`](@ref) | 2 | `AbstractArray` | `Number` | `dx` | same as `x` | `size(x)` |

 Luckily, most backends have custom implementations, which we reuse if possible instead of relying on fallbacks.
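The seed column `v` added to this table is easiest to grasp with a small example. A minimal sketch, assuming ForwardDiff.jl is installed and following the `operator(f, backend, x, [v])` signature documented below:

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
x = [1.0, 2.0]

# JVP: derivative of f along the tangent dx
pushforward(f, AutoForwardDiff(), x, [1.0, 0.0])  # 2.0

# VJP: gradient of f scaled by the cotangent dy
pullback(f, AutoForwardDiff(), x, 1.0)            # [2.0, 4.0]

# HVP: Hessian of f applied to the tangent dx
hvp(f, AutoForwardDiff(), x, [1.0, 0.0])          # [2.0, 0.0]
```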
@@ -36,26 +36,25 @@ Several variants of each operator are defined:
 | [`derivative`](@ref) | [`derivative!`](@ref) | [`value_and_derivative`](@ref) | [`value_and_derivative!`](@ref) |
 | [`second_derivative`](@ref) | [`second_derivative!`](@ref) | NA | NA |
 | [`gradient`](@ref) | [`gradient!`](@ref) | [`value_and_gradient`](@ref) | [`value_and_gradient!`](@ref) |
-| [`hvp`](@ref) | [`hvp!`](@ref) | NA | NA |
 | [`hessian`](@ref) | [`hessian!`](@ref) | NA | NA |
 | [`jacobian`](@ref) | [`jacobian!`](@ref) | [`value_and_jacobian`](@ref) | [`value_and_jacobian!`](@ref) |
 | [`pushforward`](@ref) | [`pushforward!`](@ref) | [`value_and_pushforward`](@ref) | [`value_and_pushforward!`](@ref) |
 | [`pullback`](@ref) | [`pullback!`](@ref) | [`value_and_pullback`](@ref) | [`value_and_pullback!`](@ref) |
+| [`hvp`](@ref) | [`hvp!`](@ref) | NA | NA |

 ## Mutation and signatures

 In order to ensure symmetry between one-argument functions `f(x) = y` and two-argument functions `f!(y, x) = nothing`, we define the same operators for both cases.
 However they have different signatures:

-| signature | out-of-place | in-place |
-| :--------- | :--------------------------------- | :------------------------------------------ |
-| `f(x)` | `operator(f, backend, x, ...)` | `operator!(f, result, backend, x, ...)` |
-| `f!(y, x)` | `operator(f!, y, backend, x, ...)` | `operator!(f!, y, result, backend, x, ...)` |
+| signature | out-of-place | in-place |
+| :--------- | :------------------------------------------- | :---------------------------------------------------- |
+| `f(x)` | `operator(f, backend, x, [v], [extras])` | `operator!(f, result, backend, x, [v], [extras])` |
+| `f!(y, x)` | `operator(f!, y, backend, x, [v], [extras])` | `operator!(f!, y, result, backend, x, [v], [extras])` |

 !!! warning
     Our mutation convention is that all positional arguments between `f`/`f!` and `backend` are mutated (the `extras` as well, see below).
     This convention holds regardless of the bang `!` in the operator name, because we assume that a user passing a two-argument function `f!(y, x)` anticipates mutation anyway.
-
     Still, better be careful with two-argument functions, because every variant of the operator will mutate `y`... even if it does not have a `!` in its name (see the bottom left cell in the table).

 ## Preparation
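The two-argument signatures in the table above can be made concrete with a small example. A minimal sketch with a hypothetical in-place function `f!`, assuming ForwardDiff.jl is installed:

```julia
using DifferentiationInterface
import ForwardDiff

# hypothetical two-argument function: writes the output into y
f!(y, x) = (y .= 2 .* x; nothing)

x = [1.0, 2.0]
y = similar(x)          # mutated by every operator variant
jac = similar(x, 2, 2)  # extra `result` argument for the in-place variant

jacobian(f!, y, AutoForwardDiff(), x)        # out-of-place: operator(f!, y, backend, x)
jacobian!(f!, y, jac, AutoForwardDiff(), x)  # in-place: operator!(f!, y, result, backend, x)
```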
@@ -78,8 +77,8 @@ Unsurprisingly, preparation syntax depends on the number of arguments:

 | signature | preparation signature |
 | :--------- | :----------------------------------------- |
-| `f(x)` | `prepare_operator(f, backend, x, ...)` |
-| `f!(y, x)` | `prepare_operator(f!, y, backend, x, ...)` |
+| `f(x)` | `prepare_operator(f, backend, x, [v])` |
+| `f!(y, x)` | `prepare_operator(f!, y, backend, x, [v])` |

 The preparation `prepare_operator(f, backend, x)` will create an object called `extras` containing the necessary information to speed up `operator` and its variants.
 This information is specific to `backend` and `f`, as well as the _type and size_ of the input `x` and the _control flow_ within the function, but it should work with different _values_ of `x`.
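A minimal sketch of this preparation workflow, assuming ForwardDiff.jl is installed:

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()
x = [1.0, 2.0]

extras = prepare_gradient(f, backend, x)  # setup cost paid once
grad = similar(x)
gradient!(f, grad, backend, x, extras)    # reuses extras; works for new values of x
```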
@@ -102,6 +101,9 @@ We offer two ways to perform second-order differentiation (for [`second_derivati
 At the moment, trial and error is your best friend.
 Usually, the most efficient approach for Hessians is forward-over-reverse, i.e. a forward-mode outer backend and a reverse-mode inner backend.

+!!! warning
+    Preparation does not yet work for the inner differentiation step of a `SecondOrder`, only the outer differentiation is prepared.
+
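A minimal sketch of the forward-over-reverse combination mentioned above, assuming the `SecondOrder(outer, inner)` argument order and that ForwardDiff.jl and Zygote.jl are installed:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)
x = [1.0, 2.0]

# forward-mode ForwardDiff outside, reverse-mode Zygote inside
backend = SecondOrder(AutoForwardDiff(), AutoZygote())
hessian(f, backend, x)  # [2.0 0.0; 0.0 2.0]
```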
 ## Experimental

 !!! danger
@@ -125,9 +127,10 @@ We make this available for all backends with the following operators:

 ### Translation

-The wrapper [`DifferentiateWith`](@ref) allows you to take a function and specify that it should be differentiated with the backend of your choice.
-In other words, when you try to differentiate `dw = DifferentiateWith(f, backend1)` with `backend2`, then `backend1` steps in and `backend2` does nothing.
-At the moment it only works when `backend2` supports [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl).
+The wrapper [`DifferentiateWith`](@ref) allows you to translate between AD backends.
+It takes a function `f` and specifies that `f` should be differentiated with the backend of your choice, instead of whatever other backend the code is trying to use.
+In other words, when someone tries to differentiate `dw = DifferentiateWith(f, backend1)` with `backend2`, then `backend1` steps in and `backend2` does nothing.
+At the moment, `DifferentiateWith` only works when `backend2` supports [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl).
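A minimal sketch of this translation mechanism, assuming ForwardDiff.jl plays the role of `backend1` and Zygote.jl (which supports ChainRules.jl) plays `backend2`:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)

# force ForwardDiff for f, whatever backend tries to differentiate it
dw = DifferentiateWith(f, AutoForwardDiff())

gradient(dw, AutoZygote(), [1.0, 2.0])  # Zygote defers to ForwardDiff, returns [2.0, 4.0]
```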

 ## Going further

DifferentiationInterface/docs/src/tutorial.md

Lines changed: 26 additions & 35 deletions
@@ -8,15 +8,8 @@ We present a typical workflow with DifferentiationInterface.jl and showcase its

 ```@example tuto
 using DifferentiationInterface
-
-import ForwardDiff, Enzyme # ⚠️ import the backends you want to use ⚠️
 ```

-!!! tip
-    Importing backends with `import` instead of `using` avoids name conflicts and makes sure you are using operators from DifferentiationInterface.jl.
-    This is useful since most backends also export operators like `gradient` and `jacobian`.
-
-
 ## Computing a gradient

 A common use case of automatic differentiation (AD) is optimizing real-valued functions with first- or second-order methods.
@@ -25,21 +18,26 @@ Let's define a simple objective and a random input vector
 ```@example tuto
 f(x) = sum(abs2, x)

-x = [1.0, 2.0, 3.0]
-nothing # hide
+x = collect(1.0:5.0)
 ```

-To compute its gradient, we need to choose a "backend", i.e. an AD package that DifferentiationInterface.jl will call under the hood.
+To compute its gradient, we need to choose a "backend", i.e. an AD package to call under the hood.
 Most backend types are defined by [ADTypes.jl](https://github.com/SciML/ADTypes.jl) and re-exported by DifferentiationInterface.jl.

 [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) is very generic and efficient for low-dimensional inputs, so it's a good starting point:

 ```@example tuto
+import ForwardDiff
+
 backend = AutoForwardDiff()
 nothing # hide
 ```

-Now you can use DifferentiationInterface.jl to get the gradient:
+!!! tip
+    To avoid name conflicts, load AD packages with `import` instead of `using`.
+    Indeed, most AD packages also export operators like `gradient` and `jacobian`, but you only want to use the ones from DifferentiationInterface.jl.
+
+Now you can use the following syntax to compute the gradient:

 ```@example tuto
 gradient(f, backend, x)
@@ -48,15 +46,10 @@ gradient(f, backend, x)
 Was that fast?
 [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) helps you answer that question.

-```@repl tuto
+```@example tuto
 using BenchmarkTools
-@btime gradient($f, $backend, $x);
-```
-
-More or less what you would get if you just used the API from ForwardDiff.jl:
-
-```@repl tuto
-@btime ForwardDiff.gradient($f, $x);
+
+@benchmark gradient($f, $backend, $x)
 ```

 Not bad, but you can do better.
@@ -69,19 +62,18 @@ Some backends get a speed boost from this trick.
 ```@example tuto
 grad = similar(x)
 gradient!(f, grad, backend, x)
-
 grad # has been mutated
 ```

 The bang indicates that one of the arguments of `gradient!` might be mutated.
 More precisely, our convention is that _every positional argument between the function and the backend is mutated (and the `extras` too, see below)_.

-```@repl tuto
-@btime gradient!($f, _grad, $backend, $x) evals=1 setup=(_grad=similar($x));
+```@example tuto
+@benchmark gradient!($f, _grad, $backend, $x) evals=1 setup=(_grad=similar($x))
 ```

 For some reason the in-place version is not much better than your first attempt.
-However, it has one less allocation, which corresponds to the gradient vector you provided.
+However, it makes fewer allocations, thanks to the gradient vector you provided.
 Don't worry, you can get even more performance.

 ## Preparing for multiple gradients
@@ -100,31 +92,31 @@ You don't need to know what this object is, you just need to pass it to the grad
 ```@example tuto
 grad = similar(x)
 gradient!(f, grad, backend, x, extras)
-
 grad # has been mutated
 ```

 Preparation makes the gradient computation much faster, and (in this case) allocation-free.

-```@repl tuto
-@btime gradient!($f, _grad, $backend, $x, _extras) evals=1 setup=(
+```@example tuto
+@benchmark gradient!($f, _grad, $backend, $x, _extras) evals=1 setup=(
     _grad=similar($x);
     _extras=prepare_gradient($f, $backend, $x)
-);
+)
 ```

 Beware that the `extras` object is nearly always mutated by differentiation operators, even though it is given as the last positional argument.

 ## Switching backends

 The whole point of DifferentiationInterface.jl is that you can easily experiment with different AD solutions.
-Typically, for gradients, reverse mode AD might be a better fit.
-So let's try the state-of-the-art [Enzyme.jl](https://github.com/EnzymeAD/Enzyme.jl)!
+Typically, for gradients, reverse mode AD might be a better fit, so let's try [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl)!

-For this one, the backend definition is slightly more involved, because you need to feed the "mode" to the object from ADTypes.jl:
+For this one, the backend definition is slightly more involved, because you can specify whether the tape needs to be compiled:

 ```@example tuto
-backend2 = AutoEnzyme(; mode=Enzyme.Reverse)
+import ReverseDiff
+
+backend2 = AutoReverseDiff(; compile=true)
 nothing # hide
 ```

@@ -134,16 +126,15 @@ But once it is done, things run smoothly with exactly the same syntax:
 gradient(f, backend2, x)
 ```

-And you can run the same benchmarks:
+And you can run the same benchmarks to see what you gained (although such a small input may not be realistic):

-```@repl tuto
-@btime gradient!($f, _grad, $backend2, $x, _extras) evals=1 setup=(
+```@example tuto
+@benchmark gradient!($f, _grad, $backend2, $x, _extras) evals=1 setup=(
     _grad=similar($x);
     _extras=prepare_gradient($f, $backend2, $x)
-);
+)
 ```

-Not only is it blazingly fast, you achieved this speedup without looking at the docs of either ForwardDiff.jl or Enzyme.jl!
 In short, DifferentiationInterface.jl allows for easy testing and comparison of AD backends.
 If you want to go further, check out the [DifferentiationInterfaceTest.jl tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterfaceTest/dev/tutorial/).
 It provides benchmarking utilities to compare backends and help you select the one that is best suited for your problem.
