diff --git a/DifferentiationInterface/docs/src/dev/math.md b/DifferentiationInterface/docs/src/dev/math.md
index 88948235e..f31e91b79 100644
--- a/DifferentiationInterface/docs/src/dev/math.md
+++ b/DifferentiationInterface/docs/src/dev/math.md
@@ -10,34 +10,34 @@ It is inspired by
 
 Consider a mathematical function $f(x, c, s) = y$ where
 
-- $x \in \mathcal{X}$ is the active argument (the one being differentiated)
-- $c \in \mathcal{C}$ is a constant argument (corresponds to [`Constant`](@ref) contexts)
-- $s \in \mathcal{S}$ is a scratch argument (corresponds to [`Cache`](@ref) contexts)
-- $y \in \mathcal{Y}$ is the output
+-  $x \in \mathcal{X}$ is the active argument (the one being differentiated)
+-  $c \in \mathcal{C}$ is a constant argument (corresponds to [`Constant`](@ref) contexts)
+-  $s \in \mathcal{S}$ is a scratch argument (corresponds to [`Cache`](@ref) contexts)
+-  $y \in \mathcal{Y}$ is the output
 
 In Julia code, some of the input arguments might be mutated, while the output may be written to as well.
 Therefore, the proper model is a function $\phi(x_0, c_0, s_0, y_0) = (x_1, c_1, s_1, y_1)$ where $a_0$ is the state of argument $a$ before $f$ is run, while $a_1$ is its state after $a$ is run.
 
 DI makes the following hypotheses on the implementation of $f$ (aka the behavior of $\phi$):
 
-1. The active argument $x$ is not mutated, so $x_1 = x_0$
-2. The constant argument $c$ is not mutated, so $c_1 = c_0$
-3. The initial value of the scratch argument $s_0$ does not matter
-4. The initial value of the output $y_0$ does not matter
+1. The active argument $x$ is not mutated, so $x_1 = x_0$.
+2. The constant argument $c$ is not mutated, so $c_1 = c_0$.
+3. The initial value of the scratch argument $s_0$ does not matter. It does not affect any of the states $x_1$, $c_1$, $s_1$, $y_1$.
+4. The initial value of the output $y_0$ does not matter. It does not affect any of the states $x_1$, $c_1$, $s_1$, $y_1$.
 
 ## Forward mode
 
 We want to compute a Jacobian-Vector Product (JVP) $\dot{y} = \left(\frac{\partial f}{\partial x}\right) \dot{x}$ where $\dot{x} \in \mathcal{X}$ is an input tangent.
 
 To do that, we run our AD backend on $\phi$ with input tangents $(\dot{x}_0, \dot{c}_0, \dot{s}_0, \dot{y}_0)$ and obtain $(\dot{x}_1, \dot{c}_1, \dot{s}_1, \dot{y}_1)$.
-The interesting value is
+The value of interest is
 $$\dot{y}_1 = \frac{\partial y_1}{\partial x_0} \dot{x}_0 + \frac{\partial y_1}{\partial c_0} \dot{c}_0 + \frac{\partial y_1}{\partial s_0} \dot{s}_0 + \frac{\partial y_1}{\partial y_0} \dot{y}_0$$
 
 Thanks to our hypotheses 3 and 4 on the function's implementation, $\frac{\partial y_1}{\partial s_0} = 0$ and $\frac{\partial y_1}{\partial y_0} = 0$, so we are left with:
 $$\dot{y}_1 = \frac{\partial y_1}{\partial x_0} \dot{x_0} + \frac{\partial y_1}{\partial c_0} \dot{c_0}$$
 
-Thus, as long as $\dot{c}_0 = 0$, the output tangent $\dot{y}_1$ contains the correct JVP.
-Let us now look at $\dot{s}_1$ with the help of hypothesis 2:
+Thus, as long as we set $\dot{c}_0 = 0$, the output tangent $\dot{y}_1$ contains the correct JVP.
+Let us now look at $\dot{c}_1$ with the help of hypothesis 2:
 $$\dot{c}_1 = \frac{\partial c_1}{\partial x_0} \dot{x}_0 + \frac{\partial c_1}{\partial c_0} \dot{c}_0 + \frac{\partial c_1}{\partial s_0} \dot{s}_0 + \frac{\partial c_1}{\partial y_0} \dot{y}_0 = \dot{c}_0$$
 
 The tangent of $c$ will always be preserved by differentiation.
@@ -47,14 +47,14 @@ The tangent of $c$ will always be preserved by differentiation.
 We want to compute a Vector-Jacobian Product (VJP) $\bar{x} = \left(\frac{\partial f}{\partial x}\right)^* \bar{y}$ where $\bar{y} \in \mathcal{Y}$ is an output sensivity.
 
 To do that, we run our AD backend on $\phi$ with output sensitivities $(\bar{x}_1, \bar{c}_1, \bar{s}_1, \bar{y}_1)$ and obtain $(\bar{x}_0, \bar{c}_0, \bar{s}_0, \bar{y}_0)$.
-The interesting value is
+The value of interest is
 $$\bar{x}_0 = \left(\frac{\partial x_1}{\partial x_0}\right)^* \bar{x}_1 + \left(\frac{\partial c_1}{\partial x_0}\right)^* \bar{c}_1 + \left(\frac{\partial s_1}{\partial x_0}\right)^* \bar{s}_1 + \left(\frac{\partial y_1}{\partial x_0}\right)^* \bar{y}_1$$
 
-Thanks to our hypotheses 1 and 2 on the function's implementation, $\frac{\partial x_1}{\partial x_0} = I$ and $\frac{\partial c_1}{\partial x_0} = 0$, so we are left with:
+Thanks to our hypotheses 1 and 2 on the function's implementation, $\frac{\partial x_1}{\partial x_0} = I$ and $\frac{\partial c_1}{\partial x_0} = \frac{\partial c_0}{\partial x_0} =0$, so we are left with:
 $$\bar{x}_0 = \bar{x}_1 + \left(\frac{\partial s_1}{\partial x_0}\right)^* \bar{s}_1 + \left(\frac{\partial y_1}{\partial x_0}\right)^* \bar{y}_1$$
 
-Thus, as long as $\bar{x}_1 = 0$ and $\bar{s}_1 = 0$, the input sensitivity $\bar{x}_0$ contains the correct VJP.
-Let us now look at $\bar{s}_0$ with the help of hypothesis 3:
+Thus, as long as we set $\bar{x}_1 = 0$ and $\bar{s}_1 = 0$, the input sensitivity $\bar{x}_0$ contains the correct VJP.
+Let us now look at $\bar{s}_0$ with the help of hypothesis 3, which tells us that $\frac{\partial x_1}{\partial s_0} = 0$, $\frac{\partial c_1}{\partial s_0} = 0$, $\frac{\partial s_1}{\partial s_0} = 0$, and $\frac{\partial y_1}{\partial s_0} = 0$:
 
 $$\bar{s}_0 = \left(\frac{\partial x_1}{\partial s_0}\right)^* \bar{x}_1 + \left(\frac{\partial c_1}{\partial s_0}\right)^* \bar{c}_1 + \left(\frac{\partial s_1}{\partial s_0}\right)^* \bar{s}_1 + \left(\frac{\partial y_1}{\partial s_0}\right)^* \bar{y}_1 = 0$$