
Commit 84d98a3

Renounce BangBang, enforce mutation (#169)
1 parent cf19040 · commit 84d98a3

71 files changed · 3295 additions & 1814 deletions

Large commit: only a subset of the 71 changed files is shown below.

DifferentiationInterface/Project.toml

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,7 +1,7 @@
 name = "DifferentiationInterface"
 uuid = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
 authors = ["Guillaume Dalle", "Adrian Hill"]
-version = "0.1.0"
+version = "0.2.0"

 [deps]
 ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
````

DifferentiationInterface/README.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -15,15 +15,15 @@ An interface to various automatic differentiation (AD) backends in Julia.

 This package provides a backend-agnostic syntax to differentiate functions of the following types:

-- _allocating_: `f(x) = y`
-- _mutating_: `f!(y, x) = nothing`
+- _one-argument functions_ (allocating): `f(x) = y`
+- _two-argument functions_ (mutating): `f!(y, x) = nothing`

 ## Features

-- First and second order operators
+- First- and second-order operators
 - In-place and out-of-place differentiation
 - Preparation mechanism (e.g. to create a config or tape)
-- Thorough validation on standard inputs and outputs (scalars, vectors, matrices)
+- Thorough validation on standard inputs and outputs (numbers, vectors, matrices)
 - Testing and benchmarking utilities accessible to users with [DifferentiationInterfaceTest](https://github.com/gdalle/DifferentiationInterface.jl/tree/main/DifferentiationInterfaceTest)

 ## Compatibility
````
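
A minimal usage sketch of the two function types named above (not taken from the diff; it assumes ForwardDiff.jl is installed so that the `AutoForwardDiff` backend from ADTypes.jl is functional):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff  # loads the ForwardDiff backend implementation

backend = AutoForwardDiff()

# one-argument (allocating) function: f(x) = y
f(x) = sum(abs2, x)
gradient(f, backend, [1.0, 2.0, 3.0])  # ≈ [2.0, 4.0, 6.0]

# two-argument (mutating) function: f!(y, x) = nothing
f!(y, x) = (y .= x .^ 2; nothing)
y = zeros(3)
jacobian(f!, y, backend, [1.0, 2.0, 3.0])  # 3×3 Jacobian; y is filled with x .^ 2
```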

DifferentiationInterface/docs/src/api.md

Lines changed: 58 additions & 24 deletions
````diff
@@ -11,50 +11,84 @@ DifferentiationInterface

 ## Derivative

-```@autodocs
-Modules = [DifferentiationInterface]
-Pages = ["src/derivative.jl"]
-Private = false
+```@docs
+prepare_derivative
+derivative
+derivative!
+value_and_derivative
+value_and_derivative!
 ```

 ## Gradient

-```@autodocs
-Modules = [DifferentiationInterface]
-Pages = ["gradient.jl"]
-Private = false
+```@docs
+prepare_gradient
+gradient
+gradient!
+value_and_gradient
+value_and_gradient!
 ```

 ## Jacobian

-```@autodocs
-Modules = [DifferentiationInterface]
-Pages = ["jacobian.jl"]
-Private = false
+```@docs
+prepare_jacobian
+jacobian
+jacobian!
+value_and_jacobian
+value_and_jacobian!
 ```

 ## Second order

-```@autodocs
-Modules = [DifferentiationInterface]
-Pages = ["second_order.jl", "second_derivative.jl", "hessian.jl", "hvp.jl"]
-Private = false
+```@docs
+SecondOrder
+```
+
+```@docs
+prepare_second_derivative
+second_derivative
+second_derivative!
+```
+
+```@docs
+prepare_hvp
+hvp
+hvp!
+```
+
+```@docs
+prepare_hessian
+hessian
+hessian!
 ```

 ## Primitives

-```@autodocs
-Modules = [DifferentiationInterface]
-Pages = ["pushforward.jl", "pullback.jl"]
-Private = false
+```@docs
+prepare_pushforward
+pushforward
+pushforward!
+value_and_pushforward
+value_and_pushforward!
+```
+
+```@docs
+prepare_pullback
+pullback
+pullback!
+value_and_pullback
+value_and_pullback!
+value_and_pullback_split
+value_and_pullback!_split
+```

 ## Backend queries

-```@autodocs
-Modules = [DifferentiationInterface]
-Pages = ["backends.jl"]
-Private = false
+```@docs
+check_available
+check_mutation
+check_hessian
 ```

 ## Internals
````
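
To illustrate the naming scheme in this list, here is a hedged sketch of the derivative operators (argument order follows the signatures documented in the overview below):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

backend = AutoForwardDiff()
f(x) = x^2 + sin(x)

der     = derivative(f, backend, 1.0)            # plain operator
y, der2 = value_and_derivative(f, backend, 1.0)  # primal and derivative in one call
```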

DifferentiationInterface/docs/src/backends.md

Lines changed: 2 additions & 7 deletions
````diff
@@ -88,8 +88,8 @@ Markdown.parse(join(vcat(header, subheader, rows...), "\n")) # hide

 ## Mutation support

-All backends are compatible with allocating functions `f(x) = y`.
-Only some are compatible with mutating functions `f!(y, x) = nothing`.
+All backends are compatible with one-argument functions `f(x) = y`.
+Only some are compatible with two-argument functions `f!(y, x) = nothing`.
 You can use [`check_mutation`](@ref) to check that feature, like we did below:

 ```@example backends
@@ -114,8 +114,3 @@ rows = map(all_backends()) do backend # hide
 end # hide
 Markdown.parse(join(vcat(header, subheader, rows...), "\n")) # hide
 ```
-
-!!! warning
-    Second-order operators can also be used with a combination of backends inside the [`SecondOrder`](@ref) struct.
-    There are many possible combinations, a lot of which will fail.
-    Due to compilation overhead, we do not currently test them all to display the working ones in the documentation, but we might if users deem it relevant.
````
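
A sketch of the mutation-support query mentioned above, assuming `check_mutation` takes only the backend and returns a `Bool`:

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

backend = AutoForwardDiff()
if check_mutation(backend)  # true if two-argument functions f!(y, x) are supported
    @info "mutation supported" backend
end
```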

DifferentiationInterface/docs/src/overview.md

Lines changed: 84 additions & 77 deletions
````diff
@@ -3,20 +3,26 @@
 ## Operators

 Depending on the type of input and output, differentiation operators can have various names.
-Most backends have custom implementations, which we reuse if possible.

-We choose the following terminology for the high-level operators we provide:
+We provide the following high-level operators:

-| operator             | input `x`       | output `y`                  | result type      | result shape             |
-| :------------------- | :-------------- | :-------------------------- | :--------------- | :----------------------- |
-| [`derivative`](@ref) | `Number`        | `Number` or `AbstractArray` | same as `y`      | `size(y)`                |
-| [`gradient`](@ref)   | `AbstractArray` | `Number`                    | same as `x`      | `size(x)`                |
-| [`jacobian`](@ref)   | `AbstractArray` | `AbstractArray`             | `AbstractMatrix` | `(length(y), length(x))` |
+| operator                    | order | input `x`       | output `y`                  | result type      | result shape             |
+| :-------------------------- | :---- | :-------------- | :-------------------------- | :--------------- | :----------------------- |
+| [`derivative`](@ref)        | 1     | `Number`        | `Number` or `AbstractArray` | same as `y`      | `size(y)`                |
+| [`second_derivative`](@ref) | 2     | `Number`        | `Number` or `AbstractArray` | same as `y`      | `size(y)`                |
+| [`gradient`](@ref)          | 1     | `AbstractArray` | `Number`                    | same as `x`      | `size(x)`                |
+| [`hvp`](@ref)               | 2     | `AbstractArray` | `Number`                    | same as `x`      | `size(x)`                |
+| [`hessian`](@ref)           | 2     | `AbstractArray` | `Number`                    | `AbstractMatrix` | `(length(x), length(x))` |
+| [`jacobian`](@ref)          | 1     | `AbstractArray` | `AbstractArray`             | `AbstractMatrix` | `(length(y), length(x))` |

-They are all based on the following low-level operators:
+They can all be derived from two low-level operators:

-- [`pushforward`](@ref) (or JVP), to propagate input tangents
-- [`pullback`](@ref) (or VJP), to backpropagate output cotangents
+| operator                       | order | input `x` | output `y` | result type | result shape |
+| :----------------------------- | :---- | :-------- | :--------- | :---------- | :----------- |
+| [`pushforward`](@ref) (or JVP) | 1     | `Any`     | `Any`      | same as `y` | `size(y)`    |
+| [`pullback`](@ref) (or VJP)    | 1     | `Any`     | `Any`      | same as `x` | `size(x)`    |
+
+Luckily, most backends have custom implementations, which we reuse if possible instead of relying on fallbacks.

 !!! tip
     See the book [The Elements of Differentiable Programming](https://arxiv.org/abs/2403.14606) for details on these concepts.
````
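
To make the two tables concrete, a hypothetical tour of the operators (assuming ForwardDiff.jl is loaded; the trailing seed arguments for `pushforward` and `pullback` are an assumption consistent with the `operator(f, backend, x, ...)` signature given further down):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

backend = AutoForwardDiff()
x = [1.0, 2.0]

gradient(x -> sum(abs2, x), backend, x)               # result has size(x)
jacobian(x -> [x[1] + x[2], x[1] * x[2]], backend, x) # (length(y), length(x)) matrix
pushforward(x -> x .^ 2, backend, x, [1.0, 0.0])      # JVP: tangent pushed to outputs
pullback(x -> x .^ 2, backend, x, [1.0, 0.0])         # VJP: cotangent pulled to inputs
```
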
````diff
@@ -25,59 +31,33 @@ They are all based on the following low-level operators:

 Several variants of each operator are defined:

-| out-of-place          | in-place (or not)       | out-of-place + primal           | in-place (or not) + primal        |
-| :-------------------- | :---------------------- | :------------------------------ | :-------------------------------- |
-| [`derivative`](@ref)  | [`derivative!!`](@ref)  | [`value_and_derivative`](@ref)  | [`value_and_derivative!!`](@ref)  |
-| [`gradient`](@ref)    | [`gradient!!`](@ref)    | [`value_and_gradient`](@ref)    | [`value_and_gradient!!`](@ref)    |
-| [`jacobian`](@ref)    | [`jacobian!!`](@ref)    | [`value_and_jacobian`](@ref)    | [`value_and_jacobian!!`](@ref)    |
-| [`pushforward`](@ref) | [`pushforward!!`](@ref) | [`value_and_pushforward`](@ref) | [`value_and_pushforward!!`](@ref) |
-| [`pullback`](@ref)    | [`pullback!!`](@ref)    | [`value_and_pullback`](@ref)    | [`value_and_pullback!!`](@ref)    |
+| out-of-place                | in-place                     | out-of-place + primal           | in-place + primal                |
+| :-------------------------- | :--------------------------- | :------------------------------ | :------------------------------- |
+| [`derivative`](@ref)        | [`derivative!`](@ref)        | [`value_and_derivative`](@ref)  | [`value_and_derivative!`](@ref)  |
+| [`second_derivative`](@ref) | [`second_derivative!`](@ref) | NA                              | NA                               |
+| [`gradient`](@ref)          | [`gradient!`](@ref)          | [`value_and_gradient`](@ref)    | [`value_and_gradient!`](@ref)    |
+| [`hvp`](@ref)               | [`hvp!`](@ref)               | NA                              | NA                               |
+| [`hessian`](@ref)           | [`hessian!`](@ref)           | NA                              | NA                               |
+| [`jacobian`](@ref)          | [`jacobian!`](@ref)          | [`value_and_jacobian`](@ref)    | [`value_and_jacobian!`](@ref)    |
+| [`pushforward`](@ref)       | [`pushforward!`](@ref)       | [`value_and_pushforward`](@ref) | [`value_and_pushforward!`](@ref) |
+| [`pullback`](@ref)          | [`pullback!`](@ref)          | [`value_and_pullback`](@ref)    | [`value_and_pullback!`](@ref)    |
````
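
For instance, the four `gradient` variants would be called as follows (a sketch, using the one-argument signatures introduced below):

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

backend = AutoForwardDiff()
f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]
grad = similar(x)

gradient(f, backend, x)                   # out-of-place
gradient!(f, grad, backend, x)            # in-place: grad is overwritten
value_and_gradient(f, backend, x)         # out-of-place + primal
value_and_gradient!(f, grad, backend, x)  # in-place + primal
```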

````diff
-!!! warning
-    We use the syntactic convention `!!` to signal that some of the arguments _can_ be mutated, but they do not _have to be_.
-    Such arguments will always be part of the return, so that one can simply reuse the operator's output and forget its input.
-    In other words, this is good:
-    ```julia
-    # work with grad_in
-    grad_out = gradient!!(f, grad_in, backend, x)
-    # work with grad_out: OK
-    ```
-    On the other hand, this is bad, because if `grad_in` has not been mutated, you will forget the results:
-    ```julia
-    # work with grad_in
-    gradient!!(f, grad_in, backend, x)
-    # mistakenly keep working with grad_in: NOT OK
-    ```
-    Note that we don't guarantee `grad_out` will have the same type as `grad_in`.
-    Its type can even depend on the choice of backend.
-
-## Second order
-
-Second-order differentiation is also supported.
-You can either pick a single backend to do all the work, or combine an "outer" backend with an "inner" backend using the [`SecondOrder`](@ref) struct, like so: `SecondOrder(outer, inner)`.
-
-The available operators are similar to first-order ones:
-
-| operator                    | input `x`       | output `y`                  | result type      | result shape             |
-| :-------------------------- | :-------------- | :-------------------------- | :--------------- | :----------------------- |
-| [`second_derivative`](@ref) | `Number`        | `Number` or `AbstractArray` | same as `y`      | `size(y)`                |
-| [`hvp`](@ref)               | `AbstractArray` | `Number`                    | same as `x`      | `size(x)`                |
-| [`hessian`](@ref)           | `AbstractArray` | `Number`                    | `AbstractMatrix` | `(length(x), length(x))` |
-
-We only define two variants for now:
-
-| out-of-place                | in-place (or not)             |
-| :-------------------------- | :---------------------------- |
-| [`second_derivative`](@ref) | [`second_derivative!!`](@ref) |
-| [`hvp`](@ref)               | [`hvp!!`](@ref)               |
-| [`hessian`](@ref)           | [`hessian!!`](@ref)           |
+## Mutation and signatures

-!!! danger
-    Second-order differentiation is still experimental, use at your own risk.
+In order to ensure symmetry between one-argument functions `f(x) = y` and two-argument functions `f!(y, x) = nothing`, we define the same operators for both cases.
+However they have different signatures:
+
+| signature  | out-of-place                       | in-place                                  |
+| :--------- | :--------------------------------- | :---------------------------------------- |
+| `f(x)`     | `operator(f, backend, x, ...)`     | `operator!(f, res, backend, x, ...)`      |
+| `f!(y, x)` | `operator(f!, y, backend, x, ...)` | `operator!(f!, y, res, backend, x, ...)`  |
+
+!!! warning
+    Every variant of the operator will mutate `y` when applied to a two-argument function `f!(y, x) = nothing`, even if it does not have a `!` in its name.
````
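
A sketch of both signatures for `jacobian`, illustrating the warning above: the output buffer `y` is mutated either way.

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

backend = AutoForwardDiff()
f!(y, x) = (y .= 2 .* x; nothing)

x = [1.0, 2.0]
y = zeros(2)
jac = zeros(2, 2)

jacobian(f!, y, backend, x)        # no `!` in the name, but y is mutated anyway
jacobian!(f!, y, jac, backend, x)  # fills both y and jac in place
```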

````diff
 ## Preparation

-In many cases, AD can be accelerated if the function has been run at least once (e.g. to record a tape) and if some cache objects are provided.
+In many cases, AD can be accelerated if the function has been run at least once (e.g. to create a config or record a tape) and if some cache objects are provided.
 This is a backend-specific procedure, but we expose a common syntax to achieve it.

 | operator   | preparation function       |
@@ -91,42 +71,69 @@ This is a backend-specific procedure, but we expose a common syntax to achieve it.
 | `pullback` | [`prepare_pullback`](@ref) |
 | `hvp`      | [`prepare_hvp`](@ref)      |

-If you run `prepare_operator(backend, f, x)`, it will create an object called `extras` containing the necessary information to speed up `operator` and its variants.
-This information is specific to `backend` and `f`, as well as the _type and size_ of the input `x`, but it should work with different _values_ of `x`.
+If you run `prepare_operator(backend, f, x, [seed])`, it will create an object called `extras` containing the necessary information to speed up `operator` and its variants.
+This information is specific to `backend` and `f`, as well as the _type and size_ of the input `x` and the _control flow_ within the function, but it should work with different _values_ of `x`.

 You can then call `operator(f, backend, x2, extras)`, which should be faster than `operator(f, backend, x2)`.
 This is especially worth it if you plan to call `operator` several times in similar settings: you can think of it as a warm up.

 !!! warning
-    For `SecondOrder` backends, the inner differentiation cannot be prepared at the moment, only the outer one is.
+    The `extras` object is nearly always mutated, even if the operator does not have a `!` in its name.
````
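
A preparation sketch that follows the argument order quoted above (`prepare_*` with the backend first, extras as a trailing argument); this convention is taken from this page and may differ in other versions:

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

backend = AutoForwardDiff()
f(x) = sum(abs2, x)
x = rand(10)

extras = prepare_gradient(backend, f, x)  # one-time warm-up
gradient(f, backend, x, extras)           # reuses the prepared extras
gradient(f, backend, rand(10), extras)    # same type and size, new values
```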

````diff
-## FAQ
+### Second order

-### Multiple inputs/outputs
+We offer two ways to perform second-order differentiation (for [`second_derivative`](@ref), [`hvp`](@ref) and [`hessian`](@ref)):

-Restricting the API to one input and one output has many coding advantages, but it is not very flexible.
-If you need more than that, use [ComponentArrays.jl](https://github.com/jonniedie/ComponentArrays.jl) to wrap several objects inside a single `ComponentVector`.
+- pick a single backend to do all the work
+- combine an "outer" and "inner" backend within the [`SecondOrder`](@ref) struct: the inner backend will be called first, and the outer backend will differentiate the generated code
+
+!!! warning
+    There are many possible backend combinations, a lot of which will fail.
+    At the moment, trial and error is your best friend.
+    Usually, the most efficient approach for Hessians is forward-over-reverse, i.e. a forward-mode outer backend and a reverse-mode inner backend.
````
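
A hypothetical forward-over-reverse combination, with ForwardDiff.jl outside and Zygote.jl inside; the `SecondOrder(outer, inner)` argument order comes from the text removed earlier in this diff:

```julia
using DifferentiationInterface
using ADTypes: AutoForwardDiff, AutoZygote
import ForwardDiff, Zygote

backend = SecondOrder(AutoForwardDiff(), AutoZygote())  # outer, inner

f(x) = sum(abs2, x)
x = [1.0, 2.0]

hessian(f, backend, x)          # 2×2 matrix, here ≈ 2I
hvp(f, backend, x, [1.0, 0.0])  # Hessian-vector product, same shape as x
```
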
````diff
+## Experimental
+
+!!! danger
+    Everything in this section is still experimental, use it at your own risk.

 ### Sparsity

-If you need to work with sparse Jacobians, you can pick one of the [sparse backends](@ref Sparse) from [ADTypes.jl](https://github.com/SciML/ADTypes.jl).
-The sparsity pattern is computed automatically with [Symbolics.jl](https://github.com/JuliaSymbolics/Symbolics.jl) during the preparation step.
+[ADTypes.jl](https://github.com/SciML/ADTypes.jl) provides [sparse versions](@ref Sparse) of many common AD backends.
+They can accelerate the computation of sparse Jacobians and Hessians:

-If you need to work with sparse Hessians, you can use a sparse backend as the _outer_ backend of a `SecondOrder`.
-This means the Hessian is obtained as the sparse Jacobian of the gradient.
+- for sparse Jacobians, just select one of them as your first-order backend.
+- for sparse Hessians, select one of them as the _outer part_ of a [`SecondOrder`](@ref) backend (in that case, the Hessian is obtained as the sparse Jacobian of the gradient).

-!!! danger
-    Sparsity support is still experimental, use at your own risk.
+The sparsity pattern is computed automatically with [Symbolics.jl](https://github.com/JuliaSymbolics/Symbolics.jl) during the preparation step.
+
+!!! info "Planned feature"
+    Modular sparsity pattern computation, with other algorithms beyond those from Symbolics.jl.
````
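
A sparsity sketch based on the two bullets above; `AutoSparseForwardDiff` is assumed to be provided by the ADTypes.jl version this commit targets:

```julia
using DifferentiationInterface
using ADTypes: AutoSparseForwardDiff, AutoZygote
import ForwardDiff, Zygote

f(x) = x .^ 2                                  # diagonal Jacobian
jacobian(f, AutoSparseForwardDiff(), rand(3))  # sparse first-order backend

g(x) = sum(abs2, x)                            # sparse Hessian: sparse outer backend
hessian(g, SecondOrder(AutoSparseForwardDiff(), AutoZygote()), rand(3))
```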

````diff
 ### Split reverse mode

 Some reverse mode AD backends expose a "split" option, which runs only the forward sweep, and encapsulates the reverse sweep in a closure.
 We make this available for all backends with the following operators:

-|                      | out-of-place                       | in-place (or not)                      |
-| :------------------- | :--------------------------------- | :------------------------------------- |
-| allocating functions | [`value_and_pullback_split`](@ref) | [`value_and_pullback!!_split`](@ref)   |
-| mutating functions   | -                                  | [`value_and_pullback!!_split!!`](@ref) |
+| out-of-place                       | in-place                            |
+| :--------------------------------- | :---------------------------------- |
+| [`value_and_pullback_split`](@ref) | [`value_and_pullback!_split`](@ref) |
````
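
A sketch of split reverse mode with Zygote.jl; the calling convention of the returned closure is an assumption based on the description above:

```julia
using DifferentiationInterface
using ADTypes: AutoZygote
import Zygote

f(x) = x .^ 2
x = [1.0, 2.0]

y, pullbackfunc = value_and_pullback_split(f, AutoZygote(), x)  # forward sweep runs once
pullbackfunc([1.0, 0.0])  # reverse sweep with a first cotangent
pullbackfunc([0.0, 1.0])  # reverse sweep reused with another cotangent
```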

````diff
-!!! danger
-    Split reverse mode is still experimental, use at your own risk.
+## Not supported
+
+### Batched evaluation
+
+!!! info "Planned feature"
+    Interface for providing several pushforward / pullback seeds at once, similar to the chunking in ForwardDiff.jl or the batches in Enzyme.jl.
+
+### Non-standard types
+
+The package is thoroughly tested with inputs and outputs of the following types: `Float64`, `Vector{Float64}` and `Matrix{Float64}`.
+We also expect it to work on all kinds of `Number` and `AbstractArray` variables.
+Beyond that, you are in uncharted territory.
+We voluntarily keep the type annotations minimal, so that passing more complex objects or custom structs _might work with some backends_, but we make no guarantees about that.
+
+### Multiple inputs/outputs
+
+Restricting the API to one input and one output has many coding advantages, but it is not very flexible.
+If you need more than that, use [ComponentArrays.jl](https://github.com/jonniedie/ComponentArrays.jl) to wrap several objects inside a single `ComponentVector`.
````
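
A sketch of the `ComponentVector` workaround, assuming ComponentArrays.jl and a backend (here Zygote.jl) that can differentiate through this wrapper:

```julia
using DifferentiationInterface
using ADTypes: AutoZygote
import Zygote
using ComponentArrays: ComponentVector

loss(v) = sum(abs2, v.weights) + v.bias^2  # "two inputs" packed into one vector
v = ComponentVector(weights = [1.0, 2.0], bias = 3.0)

gradient(loss, AutoZygote(), v)            # result keeps the labeled structure
```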
