Commit f8deaaf

docs: switch second_order_adjoints Phase 1 and brusselator to Mooncake
- second_order_adjoints.md: Phase 1 (Adam) now uses AutoMooncake; Phase 2 (NewtonTrustRegion) stays on AutoZygote (Hessian via SecondOrder(ForwardDiff, Zygote)) pending forward-over-Mooncake support (chalk-lab/Mooncake.jl#1142). Split into two OptimizationFunctions to avoid applying the wrong backend to Phase 2.
- brusselator.md: switch AutoZygote → AutoMooncake with friendly_tangents. Tested locally with N_GRID=8 and shortened tspan; the Mooncake gradient chain works end-to-end (loss decreasing from 0.131 to 0.059 in 3 Adam steps).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
1 parent 23770d7 commit f8deaaf

2 files changed

Lines changed: 29 additions & 16 deletions

docs/src/examples/ode/second_order_adjoints.md

Lines changed: 25 additions & 13 deletions
@@ -15,14 +15,18 @@ optimizations.

 !!! note

-    This example still uses Zygote because `NewtonTrustRegion` needs a true
-    Hessian, and Mooncake does not yet have a forward-over-Mooncake path that
-    Optimization.jl can use to assemble one (the auto-fallback to
-    `SecondOrder(AutoMooncake(), AutoMooncake())` raises `ArgumentError`).
-    The Adam phase below works fine with `OPT.AutoMooncake(; config = Mooncake.Config(; friendly_tangents = true))`
-    if you only want first-order training; the full Newton/NewtonTrustRegion
-    pipeline will become Mooncake-compatible once forward-mode Mooncake
-    matures.
+    The Adam (first-order) phase below uses Mooncake. The
+    `NewtonTrustRegion` (second-order) phase still uses Zygote because
+    Mooncake currently has no working forward-over-reverse path through
+    `SciMLSensitivity` + `OrdinaryDiffEq`: `SecondOrder(AutoMooncake(),
+    AutoMooncake())` raises a "reverse-over-reverse not supported" error
+    and `SecondOrder(AutoForwardDiff(), AutoMooncake())` is blocked on
+    Mooncake's `IEEEFloat`-only gradient interface (it rejects
+    `ForwardDiff.Dual` as the primal type). Tracking issue:
+    [chalk-lab/Mooncake.jl#1142](https://github.com/chalk-lab/Mooncake.jl/pull/1142)
+    is the first step in unblocking this. Once forward-over-Mooncake is
+    available end-to-end, this tutorial can be switched to Mooncake for
+    both phases.

 ```@example secondorderadjoints
 import SciMLSensitivity as SMS
@@ -34,6 +38,7 @@ import OrdinaryDiffEq as ODE
 import Plots
 import Random
 import OptimizationOptimJL as OOJ
+import Mooncake

 u0 = Float32[2.0; 0.0]
 datasize = 30
@@ -94,13 +99,20 @@ callback = function (state, l; doplot = false)
     return l < 0.01
 end

-adtype = OPT.AutoZygote()
-optf = OPT.OptimizationFunction((x, p) -> loss_neuralode(x), adtype)
-
-optprob1 = OPT.OptimizationProblem(optf, ps)
+# First-order training: Mooncake gives the gradient through the
+# `SciMLSensitivity` adjoint chain.
+adtype1 = OPT.AutoMooncake(; config = Mooncake.Config(; friendly_tangents = true))
+optf1 = OPT.OptimizationFunction((x, p) -> loss_neuralode(x), adtype1)
+optprob1 = OPT.OptimizationProblem(optf1, ps)
 pstart = OPT.solve(optprob1, OPO.Adam(0.01); callback, maxiters = 100).u

-optprob2 = OPT.OptimizationProblem(optf, pstart)
+# Second-order training: NewtonTrustRegion needs a true Hessian, which
+# `OptimizationBase` assembles via `SecondOrder(AutoForwardDiff(),
+# AutoZygote())`. Mooncake cannot fill that role yet (see the note above),
+# so the Hessian phase keeps the Zygote VJP.
+adtype2 = OPT.AutoZygote()
+optf2 = OPT.OptimizationFunction((x, p) -> loss_neuralode(x), adtype2)
+optprob2 = OPT.OptimizationProblem(optf2, pstart)
 pmin = OPT.solve(optprob2, OOJ.NewtonTrustRegion(); callback, maxiters = 200)
 ```
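The note and comments above describe assembling the Hessian "forward-over-reverse". As a standalone illustration, not part of this commit or the tutorial, the same construction can be written directly against DifferentiationInterface on a toy function; the function `f` and the explicit `DI.hessian` call are assumptions for demonstration only, since the tutorial lets `OptimizationBase` build the Hessian internally:

```julia
# Sketch only: a forward-over-reverse Hessian via DifferentiationInterface.
# `f` is a stand-in objective, not the tutorial's neural-ODE loss.
import DifferentiationInterface as DI
import ADTypes: AutoForwardDiff, AutoZygote
import ForwardDiff, Zygote

f(x) = sum(abs2, x) + x[1] * x[2]

# The outer forward-mode backend (ForwardDiff) differentiates the gradient
# produced by the inner reverse-mode backend (Zygote): the
# SecondOrder(forward, reverse) pairing referred to in the note above.
hess_backend = DI.SecondOrder(AutoForwardDiff(), AutoZygote())
H = DI.hessian(f, hess_backend, [1.0, 2.0, 3.0])
```

Once forward-over-Mooncake is available, `AutoForwardDiff()` over `AutoMooncake(...)` would slot into the same `SecondOrder` position.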

docs/src/examples/pde/brusselator.md

Lines changed: 4 additions & 3 deletions
@@ -156,7 +156,7 @@ First, we have to define and configure the neural network that has to be used fo

 ```@example bruss
 import Lux, Random, Optimization as OPT, OptimizationOptimJL as OOJ,
-    SciMLSensitivity as SMS, Zygote
+    SciMLSensitivity as SMS, Mooncake

 model = Lux.Chain(Lux.Dense(2 => 16, tanh), Lux.Dense(16 => 1))
 rng = Random.default_rng()
@@ -223,12 +223,13 @@ function loss_fn(ps, _)
 end
 ```

-Once the loss function is defined, we use the ADAM optimizer to train the neural network. The optimization problem is defined using SciML's `Optimization.jl` tools, and gradients are computed via automatic differentiation using `AutoZygote()` from `SciMLSensitivity`:
+Once the loss function is defined, we use the ADAM optimizer to train the neural network. The optimization problem is defined using SciML's `Optimization.jl` tools, and gradients are computed via automatic differentiation using Mooncake through the `SciMLSensitivity` adjoint chain:

 ```@example bruss
 println("[Training] Starting optimization...")
 import OptimizationOptimisers as OPO
-optf = OPT.OptimizationFunction(loss_fn, SMS.AutoZygote())
+adtype = OPT.AutoMooncake(; config = Mooncake.Config(; friendly_tangents = true))
+optf = OPT.OptimizationFunction(loss_fn, adtype)
 optprob = OPT.OptimizationProblem(optf, ps_init)
 loss_history = Float32[]
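Since the commit message reports the Mooncake gradient chain working end-to-end on a reduced grid, a one-off check of that gradient outside the optimizer loop can be useful. This is a sketch, not part of the commit: it assumes the tutorial's `loss_fn` and `ps_init` are in scope, that `ps_init` is a flat parameter vector (e.g. a `ComponentArray`), and it calls DifferentiationInterface directly rather than going through `Optimization.jl`:

```julia
# Sketch only: evaluate the Mooncake gradient of the tutorial's loss once,
# using the same adtype as the training code above.
import DifferentiationInterface as DI
import Mooncake
import Optimization as OPT

adtype = OPT.AutoMooncake(; config = Mooncake.Config(; friendly_tangents = true))
g = DI.gradient(p -> loss_fn(p, nothing), adtype, ps_init)
@assert size(g) == size(ps_init)   # gradient matches the parameter layout
```

If this call errors, the failure lies in the Mooncake/`SciMLSensitivity` adjoint chain itself rather than in the optimizer setup.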
