Merged
13 changes: 12 additions & 1 deletion DifferentiationInterface/CHANGELOG.md
@@ -5,7 +5,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.11...main)
## [Unreleased](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.12...main)

## [0.7.12](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.11...DifferentiationInterface-v0.7.12)

### Added

- Better documentation on argument assumptions ([#917](https://github.com/JuliaDiff/DifferentiationInterface.jl/pull/917))

### Fixed

- Speed up Mooncake in forward mode by preallocating tangents ([#915](https://github.com/JuliaDiff/DifferentiationInterface.jl/pull/915))
- Speed up Mooncake reverse mode with selective zeroing ([#916](https://github.com/JuliaDiff/DifferentiationInterface.jl/pull/916))

## [0.7.11](https://github.com/JuliaDiff/DifferentiationInterface.jl/compare/DifferentiationInterface-v0.7.10...DifferentiationInterface-v0.7.11)

4 changes: 2 additions & 2 deletions DifferentiationInterface/Project.toml
@@ -1,7 +1,7 @@
name = "DifferentiationInterface"
uuid = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
authors = ["Guillaume Dalle", "Adrian Hill"]
version = "0.7.11"
version = "0.7.12"

[deps]
ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
@@ -74,7 +74,7 @@ PolyesterForwardDiff = "0.1.2"
ReverseDiff = "1.15.1"
SparseArrays = "1"
SparseConnectivityTracer = "0.6.14, 1"
SparseMatrixColorings = "0.4.9"
SparseMatrixColorings = "0.4.23"
StaticArrays = "1.9.7"
Symbolics = "5.27.1, 6, 7"
Tracker = "0.2.33"
3 changes: 2 additions & 1 deletion DifferentiationInterface/docs/make.jl
@@ -27,13 +27,14 @@ makedocs(;
pages = [
"Home" => "index.md",
"Tutorials" => ["tutorials/basic.md", "tutorials/advanced.md"],
"api.md",
"Explanation" => [
"explanation/arguments.md",
"explanation/operators.md",
"explanation/backends.md",
"explanation/advanced.md",
],
"FAQ" => ["faq/limitations.md", "faq/differentiability.md"],
"api.md",
"Development" => [
"dev/internals.md",
"dev/math.md",
53 changes: 12 additions & 41 deletions DifferentiationInterface/docs/src/explanation/advanced.md
@@ -1,44 +1,12 @@
# Advanced features

## Contexts

### Additional arguments

For all operators provided by DifferentiationInterface, there can be only one differentiated (or "active") argument, which we call `x`.
However, the release v0.6 introduced the possibility of additional "context" arguments, which are not differentiated but still passed to the function after `x`.

Contexts can be useful if you have a function `y = f(x, a, b, c, ...)` or `f!(y, x, a, b, c, ...)` and you want derivatives of `y` with respect to `x` only.
Another option would be creating a closure, but that is sometimes undesirable.

### Types of contexts

Every context argument must be wrapped in a subtype of [`Context`](@ref) and come after the differentiated input `x`.
Right now, there are two kinds of context: [`Constant`](@ref) and [`Cache`](@ref).

!!! warning

Not every backend supports every type of context. See the documentation on [Backends](@ref) for more details.

Semantically, both of these calls compute the partial gradient of `f(x, c)` with respect to `x`, but they consider `c` differently:

```julia
gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))
```

In the first call, `c` is kept unchanged throughout the function evaluation.
In the second call, `c` can be mutated with values computed during the function.

Importantly, one can prepare an operator with an arbitrary value `c'` of the `Constant` (subject to the usual restrictions on preparation).
The values in a provided `Cache` never matter anyway.

## Sparsity

When faced with sparse Jacobian or Hessian matrices, one can take advantage of their sparsity pattern to speed up the computation.
DifferentiationInterface does this automatically if you pass a backend of type [`AutoSparse`](@extref ADTypes.AutoSparse).

!!! tip

To know more about sparse AD, read the survey [_What Color Is Your Jacobian? Graph Coloring for Computing Derivatives_](https://epubs.siam.org/doi/10.1137/S0036144504444711) (Gebremedhin et al., 2005).

### `AutoSparse` object
@@ -48,29 +16,32 @@ An `AutoSparse` backend must be constructed from three ingredients:

1. An underlying (dense) backend, which can be [`SecondOrder`](@ref) or anything from [ADTypes.jl](https://github.com/SciML/ADTypes.jl)

2. A sparsity pattern detector like:
2. A sparsity pattern detector following the [`ADTypes.AbstractSparsityDetector`](@extref ADTypes.AbstractSparsityDetector) interface, such as:

+ [`TracerSparsityDetector`](@extref SparseConnectivityTracer.TracerSparsityDetector) from [SparseConnectivityTracer.jl](https://github.com/adrhill/SparseConnectivityTracer.jl)
+ [`SymbolicsSparsityDetector`](@extref Symbolics.SymbolicsSparsityDetector) from [Symbolics.jl](https://github.com/JuliaSymbolics/Symbolics.jl)
+ [`DenseSparsityDetector`](@ref) from DifferentiationInterface.jl (beware that this detector only gives a locally valid pattern)
+ [`KnownJacobianSparsityDetector`](@extref ADTypes.KnownJacobianSparsityDetector) or [`KnownHessianSparsityDetector`](@extref ADTypes.KnownHessianSparsityDetector) from [ADTypes.jl](https://github.com/SciML/ADTypes.jl) (if you already know the pattern)
3. A coloring algorithm from [SparseMatrixColorings.jl](https://github.com/gdalle/SparseMatrixColorings.jl), such as:

+ [`GreedyColoringAlgorithm`](@extref SparseMatrixColorings.GreedyColoringAlgorithm) (our generic recommendation)

3. A coloring algorithm following the [`ADTypes.AbstractColoringAlgorithm`](@extref ADTypes.AbstractColoringAlgorithm) interface, such as those from [SparseMatrixColorings.jl](https://github.com/gdalle/SparseMatrixColorings.jl):

+ [`GreedyColoringAlgorithm`](@extref SparseMatrixColorings.GreedyColoringAlgorithm) (our generic recommendation, don't forget to tune the `order` parameter)
+ [`ConstantColoringAlgorithm`](@extref SparseMatrixColorings.ConstantColoringAlgorithm) (if you have already computed the optimal coloring and always want to return it)
+ [`OptimalColoringAlgorithm`](@extref SparseMatrixColorings.OptimalColoringAlgorithm) (if you have a low-dimensional matrix for which you want to know the best possible coloring)

!!! note

Symbolic backends have built-in sparsity handling, so `AutoSparse(AutoSymbolics())` and `AutoSparse(AutoFastDifferentiation())` do not need additional configuration for pattern detection or coloring.
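
Putting the three ingredients together might look like this: a minimal sketch, assuming SparseConnectivityTracer.jl and SparseMatrixColorings.jl are installed, with a hypothetical banded function `f`:

```julia
using DifferentiationInterface
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm
import ForwardDiff

sparse_backend = AutoSparse(
    AutoForwardDiff();                              # 1. dense underlying backend
    sparsity_detector = TracerSparsityDetector(),   # 2. sparsity pattern detector
    coloring_algorithm = GreedyColoringAlgorithm(), # 3. coloring algorithm
)

f(x) = diff(x .^ 2)  # banded Jacobian: a good fit for sparse AD
jacobian(f, sparse_backend, rand(5))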

### Cost of sparse preparation
### Reusing sparse preparation

The preparation step of `jacobian` or `hessian` with an `AutoSparse` backend can take a long time, because it needs to detect the sparsity pattern and perform a matrix coloring.
But after preparation, the more zeros are present in the matrix, the greater the speedup will be compared to dense differentiation.

!!! danger

The result of preparation for an `AutoSparse` backend cannot be reused if the sparsity pattern changes.
In particular, during preparation, make sure to pick input and context values that do not give rise to exceptional patterns (e.g. a pattern with too many zeros because the function multiplies by a constant `c = 0` that may become nonzero later on). Random values are usually a better choice during sparse preparation.
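
Continuing the sketch above, the expensive detection and coloring work is paid once and then amortized, assuming the sparsity pattern stays the same across inputs:

```julia
# Detection + coloring happen once, on a random input...
prep = prepare_jacobian(f, sparse_backend, rand(5))

# ...and the result is reused for every subsequent input.
for _ in 1:100
    jacobian(f, prep, sparse_backend, rand(5))
end
```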

### Tuning the coloring algorithm

71 changes: 71 additions & 0 deletions DifferentiationInterface/docs/src/explanation/arguments.md
@@ -0,0 +1,71 @@
# Arguments

## General guidelines

### Function form

DifferentiationInterface only computes derivatives for functions with one of two specific forms:

```julia
y = f(x, contexts...) # out of place, returns `y`
f!(y, x, contexts...) # in place, returns `nothing`
```

In this notation:

- `f` (or `f!`) is the differentiated function
- `y` is the output
- `x` is the input, the only "active" argument, which always comes first
- `contexts` may contain additional, inactive arguments

The quantities returned by the various [operators](@ref "Operators") always correspond to (partial) derivatives of `y` with respect to `x`.
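
For concreteness, here is a minimal sketch of both forms for the same computation, using hypothetical functions `f` and `f!` with a ForwardDiff.jl backend:

```julia
using DifferentiationInterface
import ForwardDiff  # loading it makes AutoForwardDiff() usable

f(x) = abs2.(x)                       # out of place: returns y

function f!(y, x)                     # in place: overwrites y entirely
    y .= abs2.(x)
    return nothing
end

backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]

jacobian(f, backend, x)               # out-of-place variant
jacobian(f!, similar(x), backend, x)  # in-place variant: y comes first
```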

### Assumptions

The package makes one central assumption on the behavior and implementation of `f` (or `f!`):

!!! danger "Mutation rule"
Either an argument's provided value matters, or it can be mutated during the function call, but never both.

This rule applies as follows:

- The provided value of `x` matters because we evaluate and differentiate `f` at point `x`. Therefore, `x` cannot be mutated by the function.
- For in-place functions `f!`, the output `y` is meant to be overwritten. Hence, its provided (initial) value cannot matter, and it must be entirely overwritten.

!!! warning
Whether or not the function object itself can be mutated is a tricky question, and support for this varies between backends.
When in doubt, try to avoid mutating functions and pass contexts instead.
In any case, DifferentiationInterface will assume that the recursive components (fields, subfields, etc.) of `f` or `f!` individually satisfy the same mutation rule: whenever the initial value matters, no mutation is allowed.

## Contexts

### Motivation

As stated, there can be only one active argument, which we call `x`.
However, version 0.6 of the package introduced the possibility of additional "context" arguments, whose derivatives we don't need to compute.
Contexts can be useful if you have a function `y = f(x, a, b, c, ...)` or `f!(y, x, a, b, c, ...)` and you only want the derivative of `y` with respect to `x`.
Another option would be creating a closure, but that is sometimes undesirable for performance reasons.

Every context argument must be wrapped in a subtype of [`Context`](@ref) and come after the active argument `x`.

### Context types

There are three kinds of context: [`Constant`](@ref), [`Cache`](@ref) and the hybrid [`ConstantOrCache`](@ref).
They are distinguished by how they interact with the mutation rule:

- [`Constant`](@ref) contexts wrap data that influences the output of the function. Hence they cannot be mutated.
- [`Cache`](@ref) contexts correspond to scratch spaces that can be mutated at will. Hence their provided value is arbitrary.
- [`ConstantOrCache`](@ref) is a hybrid, whose recursive components (fields, subfields, etc.) must individually satisfy the assumptions of either `Constant` or `Cache`.

Semantically, both of these calls compute the partial gradient of `f(x, c)` with respect to `x`, but they consider `c` differently:

```julia
gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))
```

In the first call, `c` must be kept unchanged throughout the function evaluation.
In the second call, `c` may be mutated with values computed during the function.
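
As an illustration, here is a minimal sketch contrasting the two. The functions are hypothetical, and we assume a backend that supports both context types:

```julia
using DifferentiationInterface
import ForwardDiff

f_scale(x, a) = a * sum(abs2, x)  # the value of `a` matters: Constant

function f_buffered(x, buf)       # `buf` is scratch space: Cache
    buf .= abs2.(x)
    return sum(buf)
end

backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]

gradient(f_scale, backend, x, Constant(2.0))         # 2a .* x
gradient(f_buffered, backend, x, Cache(similar(x)))  # 2 .* x
```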

!!! warning
Not every backend supports every type of context. See the documentation on [backends](@ref "Backends") for more details.
49 changes: 25 additions & 24 deletions DifferentiationInterface/docs/src/explanation/backends.md
@@ -4,33 +4,33 @@

We support the following dense backend choices from [ADTypes.jl](https://github.com/SciML/ADTypes.jl):

- [`AutoChainRules`](@extref ADTypes.AutoChainRules)
- [`AutoDiffractor`](@extref ADTypes.AutoDiffractor)
- [`AutoEnzyme`](@extref ADTypes.AutoEnzyme)
- [`AutoFastDifferentiation`](@extref ADTypes.AutoFastDifferentiation)
- [`AutoFiniteDiff`](@extref ADTypes.AutoFiniteDiff)
- [`AutoFiniteDifferences`](@extref ADTypes.AutoFiniteDifferences)
- [`AutoForwardDiff`](@extref ADTypes.AutoForwardDiff)
- [`AutoGTPSA`](@extref ADTypes.AutoGTPSA)
- [`AutoMooncake`](@extref ADTypes.AutoMooncake) and [`AutoMooncakeForward`](@extref ADTypes.AutoMooncake) (the latter is experimental)
- [`AutoPolyesterForwardDiff`](@extref ADTypes.AutoPolyesterForwardDiff)
- [`AutoReverseDiff`](@extref ADTypes.AutoReverseDiff)
- [`AutoSymbolics`](@extref ADTypes.AutoSymbolics)
- [`AutoTracker`](@extref ADTypes.AutoTracker)
- [`AutoZygote`](@extref ADTypes.AutoZygote)
- [`AutoChainRules`](@extref ADTypes.AutoChainRules)
- [`AutoDiffractor`](@extref ADTypes.AutoDiffractor)
- [`AutoEnzyme`](@extref ADTypes.AutoEnzyme)
- [`AutoFastDifferentiation`](@extref ADTypes.AutoFastDifferentiation)
- [`AutoFiniteDiff`](@extref ADTypes.AutoFiniteDiff)
- [`AutoFiniteDifferences`](@extref ADTypes.AutoFiniteDifferences)
- [`AutoForwardDiff`](@extref ADTypes.AutoForwardDiff)
- [`AutoGTPSA`](@extref ADTypes.AutoGTPSA)
- [`AutoMooncake`](@extref ADTypes.AutoMooncake) and [`AutoMooncakeForward`](@extref ADTypes.AutoMooncake) (the latter is experimental)
- [`AutoPolyesterForwardDiff`](@extref ADTypes.AutoPolyesterForwardDiff)
- [`AutoReverseDiff`](@extref ADTypes.AutoReverseDiff)
- [`AutoSymbolics`](@extref ADTypes.AutoSymbolics)
- [`AutoTracker`](@extref ADTypes.AutoTracker)
- [`AutoZygote`](@extref ADTypes.AutoZygote)

## Features

Given a backend object, you can use:

- [`check_available`](@ref) to know whether the required AD package is loaded
- [`check_inplace`](@ref) to know whether the backend supports in-place functions (all backends support out-of-place functions)
- [`check_available`](@ref) to know whether the required AD package is loaded
- [`check_inplace`](@ref) to know whether the backend supports in-place functions (all backends support out-of-place functions)
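
For instance, a quick sketch with Zygote.jl:

```julia
using DifferentiationInterface
import Zygote

backend = AutoZygote()
check_available(backend)  # true once Zygote.jl is loaded
check_inplace(backend)    # false: Zygote.jl cannot handle mutation
```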

In theory, all we need from each backend is either a `pushforward` or a `pullback`: we can deduce every other operator from these two.
In practice, many AD backends have custom implementations for high-level operators like `gradient` or `jacobian`, which we reuse whenever possible.

!!! details

In the rough summary table below,

- ✅ means that we reuse the custom implementation from the backend;
@@ -90,7 +90,7 @@ The inner backend will be called first, and the outer backend will differentiate
In general, using a forward outer backend over a reverse inner backend will yield the best performance.
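
For example, a forward-over-reverse Hessian might be set up as follows (a sketch, assuming both backend packages are installed):

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

# Outer ForwardDiff (forward mode) over inner Zygote (reverse mode)
backend = SecondOrder(AutoForwardDiff(), AutoZygote())

f(x) = sum(abs2, x)
hessian(f, backend, [1.0, 2.0])  # ≈ [2 0; 0 2]
```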

!!! danger

Second-order AD is tricky, and many backend combinations will fail (even if you combine a backend with itself).
Be ready to experiment and open issues if necessary.

@@ -99,6 +99,7 @@ In general, using a forward outer backend over a reverse inner backend will yiel
The wrapper [`DifferentiateWith`](@ref) allows you to switch between backends.
It takes a function `f` and specifies that `f` should be differentiated with the substitute backend of your choice, instead of whatever true backend the surrounding code is trying to use.
In other words, when someone tries to differentiate `dw = DifferentiateWith(f, substitute_backend)` with `true_backend`, then `substitute_backend` steps in and `true_backend` does not dive into the function `f` itself.

At the moment, `DifferentiateWith` only works when `true_backend` is either [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl), reverse-mode [Mooncake.jl](https://github.com/chalk-lab/Mooncake.jl), or a [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl)-compatible backend (e.g., [Zygote.jl](https://github.com/FluxML/Zygote.jl)).
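
A sketch of the wrapper in action, with Zygote.jl as the true backend and ForwardDiff.jl as the substitute:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)  # pretend Zygote.jl struggles with this function
dw = DifferentiateWith(f, AutoForwardDiff())

# Zygote.jl drives the outer call, but ForwardDiff.jl handles `f` itself.
gradient(dw, AutoZygote(), [1.0, 2.0])
```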

## Implementations
@@ -117,7 +118,7 @@ Same-point preparation runs the forward sweep and returns the pullback closure.
We only implement `pushforward`.

!!! danger

The latest releases of Diffractor [broke DifferentiationInterface](https://github.com/JuliaDiff/Diffractor.jl/issues/290).

### Enzyme
@@ -126,7 +127,7 @@ Depending on the `mode` attribute inside [`AutoEnzyme`](@extref ADTypes.AutoEnzy
When necessary, preparation chooses a number of chunks (for `gradient` and `jacobian` in forward mode, for `jacobian` only in reverse mode).

!!! warning

Enzyme.jl's handling of activities and multiple arguments is not fully supported here, which can cause slowdowns or errors.
If differentiation fails or takes too long, consider using Enzyme.jl through its [native API](https://enzymead.github.io/Enzyme.jl/stable/) instead.

@@ -135,7 +136,7 @@ When necessary, preparation chooses a number of chunks (for `gradient` and `jaco
For every operator, preparation generates an [executable function](https://brianguenter.github.io/FastDifferentiation.jl/stable/makefunction/) from the symbolic expression of the differentiated function.

!!! warning

Preparation can be very slow for symbolic AD.

### FiniteDiff
@@ -159,7 +160,7 @@ For all operators, preparation preallocates the input [`TPS`s](https://bmad-sim.
If a GTPSA [`Descriptor`](https://bmad-sim.github.io/GTPSA.jl/stable/man/b_descriptor/) is not provided to `AutoGTPSA`, then a `Descriptor` will be generated in preparation based on the context.

!!! danger

When providing a custom GTPSA `Descriptor` to `AutoGTPSA`, it is the responsibility of the user to ensure that the number of [GTPSA "variables"](https://bmad-sim.github.io/GTPSA.jl/stable/quickstart/#Calculating-a-Truncated-Power-Series) specified in the `Descriptor` is consistent with the number of inputs of the provided function. Undefined behavior and crashes may occur if this is not the case.

### PolyesterForwardDiff
@@ -175,7 +176,7 @@ This tape is computed from the input `x` provided at preparation time.
It is control-flow dependent, so only one branch is recorded at each `if` statement.

!!! danger

If your function has value-specific control flow (like `if x[1] > 0` or `if c == 1`), you may get silently wrong results whenever it takes new branches that were not taken during preparation.
You must make sure to run preparation with an input and contexts whose values trigger the correct control flow for future executions.
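
A sketch of the pitfall, with a hypothetical branching function and a taped ReverseDiff.jl backend:

```julia
using DifferentiationInterface
import ReverseDiff

f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)  # value-dependent control flow

backend = AutoReverseDiff(; compile=true)
prep = prepare_gradient(f, backend, [1.0, 2.0])  # tape records the `x[1] > 0` branch

gradient(f, prep, backend, [2.0, 3.0])   # fine: same branch as preparation
gradient(f, prep, backend, [-1.0, 3.0])  # silently wrong: other branch never taped
```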

@@ -186,7 +187,7 @@ Whenever contexts are provided, tape recording is deactivated in all cases, beca
For all operators, preparation generates an [executable function](https://docs.sciml.ai/Symbolics/stable/manual/build_function/) from the symbolic expression of the differentiated function.

!!! warning

Preparation can be very slow for symbolic AD.

### Mooncake