Merged
30 changes: 15 additions & 15 deletions DifferentiationInterface/docs/src/dev/math.md
@@ -10,34 +10,34 @@ It is inspired by

Consider a mathematical function $f(x, c, s) = y$ where

- $x \in \mathcal{X}$ is the active argument (the one being differentiated)
- $c \in \mathcal{C}$ is a constant argument (corresponds to [`Constant`](@ref) contexts)
- $s \in \mathcal{S}$ is a scratch argument (corresponds to [`Cache`](@ref) contexts)
- $y \in \mathcal{Y}$ is the output

In Julia code, some of the input arguments might be mutated, while the output may be written to as well.
Therefore, the proper model is a function $\phi(x_0, c_0, s_0, y_0) = (x_1, c_1, s_1, y_1)$ where $a_0$ is the state of argument $a$ before $f$ is run, while $a_1$ is its state after $f$ is run.

DI makes the following hypotheses on the implementation of $f$ (aka the behavior of $\phi$):

1. The active argument $x$ is not mutated, so $x_1 = x_0$
2. The constant argument $c$ is not mutated, so $c_1 = c_0$
3. The initial value of the scratch argument $s_0$ does not matter
4. The initial value of the output $y_0$ does not matter
1. The active argument $x$ is not mutated, so $x_1 = x_0$.
2. The constant argument $c$ is not mutated, so $c_1 = c_0$.
3. The initial value of the scratch argument $s_0$ does not matter. It does not affect any of the states $x_1$, $c_1$, $s_1$, $y_1$.
4. The initial value of the output $y_0$ does not matter. It does not affect any of the states $x_1$, $c_1$, $s_1$, $y_1$.
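
The four hypotheses can be checked on a toy example. The following is a minimal sketch in Python, not DI code: `f_inplace` and `phi` are hypothetical names, and the function body ($s \leftarrow c \, x$, then $y \leftarrow s_1 s_2$) is an assumption chosen only for illustration.

```python
def f_inplace(y, x, c, s):
    """Hypothetical in-place function: writes y and s, only reads x and c."""
    for i in range(len(x)):
        s[i] = c[0] * x[i]  # scratch is fully overwritten, so s_0 is irrelevant
    y[0] = s[0] * s[1]      # output is fully overwritten, so y_0 is irrelevant


def phi(x0, c0, s0, y0):
    """State-transition model: maps the states before the call to the states after."""
    x, c, s, y = list(x0), list(c0), list(s0), list(y0)  # copies, inputs stay intact
    f_inplace(y, x, c, s)
    return x, c, s, y


x0, c0 = [2.0, 3.0], [0.5]
run_a = phi(x0, c0, [0.0, 0.0], [0.0])
run_b = phi(x0, c0, [7.0, -1.0], [42.0])  # arbitrary initial values in s_0 and y_0

# Hypotheses 1 and 2: x_1 == x_0 and c_1 == c_0.
# Hypotheses 3 and 4: both runs yield identical final states.
print(run_a == run_b, run_a[0] == x0, run_a[1] == c0)
```

Both runs return the same final states even though they start from different scratch and output contents, which is what hypotheses 3 and 4 require.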

## Forward mode

We want to compute a Jacobian-Vector Product (JVP) $\dot{y} = \left(\frac{\partial f}{\partial x}\right) \dot{x}$ where $\dot{x} \in \mathcal{X}$ is an input tangent.

To do that, we run our AD backend on $\phi$ with input tangents $(\dot{x}_0, \dot{c}_0, \dot{s}_0, \dot{y}_0)$ and obtain $(\dot{x}_1, \dot{c}_1, \dot{s}_1, \dot{y}_1)$.
The interesting value is
The value of interest is
$$\dot{y}_1 = \frac{\partial y_1}{\partial x_0} \dot{x}_0 + \frac{\partial y_1}{\partial c_0} \dot{c}_0 + \frac{\partial y_1}{\partial s_0} \dot{s}_0 + \frac{\partial y_1}{\partial y_0} \dot{y}_0$$

Thanks to our hypotheses 3 and 4 on the function's implementation, $\frac{\partial y_1}{\partial s_0} = 0$ and $\frac{\partial y_1}{\partial y_0} = 0$, so we are left with:
$$\dot{y}_1 = \frac{\partial y_1}{\partial x_0} \dot{x}_0 + \frac{\partial y_1}{\partial c_0} \dot{c}_0$$

Thus, as long as $\dot{c}_0 = 0$, the output tangent $\dot{y}_1$ contains the correct JVP.
Let us now look at $\dot{s}_1$ with the help of hypothesis 2:
Thus, as long as we set $\dot{c}_0 = 0$, the output tangent $\dot{y}_1$ contains the correct JVP.
Let us now look at $\dot{c}_1$ with the help of hypothesis 2:
$$\dot{c}_1 = \frac{\partial c_1}{\partial x_0} \dot{x}_0 + \frac{\partial c_1}{\partial c_0} \dot{c}_0 + \frac{\partial c_1}{\partial s_0} \dot{s}_0 + \frac{\partial c_1}{\partial y_0} \dot{y}_0 = \dot{c}_0$$

The tangent of $c$ will always be preserved by differentiation.
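
The forward-mode argument can be illustrated with a hand-written tangent rule. This is a sketch under stated assumptions, not how any AD backend is implemented: `phi_forward` is a hypothetical name, and the primal function ($s \leftarrow c \, x$, then $y \leftarrow s_1 s_2$) is an assumed toy example for which $\frac{\partial f}{\partial x} = c^2 (x_2, x_1)$.

```python
def phi_forward(x0, c0, dx0, dc0, ds0, dy0):
    """Hand-written forward rule for the assumed toy phi.

    Returns the output tangents (dx1, dc1, ds1, dy1).
    ds0 and dy0 are accepted but never read, because s and y are fully
    overwritten (hypotheses 3 and 4).
    """
    s1 = [c0[0] * xi for xi in x0]
    ds1 = [dc0[0] * xi + c0[0] * dxi for xi, dxi in zip(x0, dx0)]
    dy1 = [ds1[0] * s1[1] + s1[0] * ds1[1]]
    # x and c are not mutated, so their tangents pass through unchanged.
    return list(dx0), list(dc0), ds1, dy1


x0, c0 = [2.0, 3.0], [0.5]
dx0 = [1.0, 0.0]

# Analytical JVP for this toy function: c^2 * (x_2 * dx_1 + x_1 * dx_2).
jvp_expected = c0[0] ** 2 * (x0[1] * dx0[0] + x0[0] * dx0[1])

# Zero tangent on c: the output tangent is the JVP.
_, dc1, _, dy_clean = phi_forward(x0, c0, dx0, [0.0], [0.0, 0.0], [0.0])
# Arbitrary tangents on s_0 and y_0 do not change the result.
_, _, _, dy_garbage = phi_forward(x0, c0, dx0, [0.0], [9.0, -4.0], [123.0])
# A nonzero tangent on c contaminates the result.
_, _, _, dy_bad = phi_forward(x0, c0, dx0, [1.0], [0.0, 0.0], [0.0])

print(dy_clean[0] == jvp_expected, dy_garbage == dy_clean, dy_bad != dy_clean)
```

In this sketch the only input tangent that has to be zeroed is $\dot{c}_0$, and it comes back unchanged as $\dot{c}_1$.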
@@ -47,14 +47,14 @@ The tangent of $c$ will always be preserved by differentiation.
We want to compute a Vector-Jacobian Product (VJP) $\bar{x} = \left(\frac{\partial f}{\partial x}\right)^* \bar{y}$ where $\bar{y} \in \mathcal{Y}$ is an output sensitivity.

To do that, we run our AD backend on $\phi$ with output sensitivities $(\bar{x}_1, \bar{c}_1, \bar{s}_1, \bar{y}_1)$ and obtain $(\bar{x}_0, \bar{c}_0, \bar{s}_0, \bar{y}_0)$.
The interesting value is
The value of interest is
$$\bar{x}_0 = \left(\frac{\partial x_1}{\partial x_0}\right)^* \bar{x}_1 + \left(\frac{\partial c_1}{\partial x_0}\right)^* \bar{c}_1 + \left(\frac{\partial s_1}{\partial x_0}\right)^* \bar{s}_1 + \left(\frac{\partial y_1}{\partial x_0}\right)^* \bar{y}_1$$

Thanks to our hypotheses 1 and 2 on the function's implementation, $\frac{\partial x_1}{\partial x_0} = I$ and $\frac{\partial c_1}{\partial x_0} = 0$, so we are left with:
Thanks to our hypotheses 1 and 2 on the function's implementation, $\frac{\partial x_1}{\partial x_0} = I$ and $\frac{\partial c_1}{\partial x_0} = \frac{\partial c_0}{\partial x_0} = 0$, so we are left with:
$$\bar{x}_0 = \bar{x}_1 + \left(\frac{\partial s_1}{\partial x_0}\right)^* \bar{s}_1 + \left(\frac{\partial y_1}{\partial x_0}\right)^* \bar{y}_1$$

Thus, as long as $\bar{x}_1 = 0$ and $\bar{s}_1 = 0$, the input sensitivity $\bar{x}_0$ contains the correct VJP.
Let us now look at $\bar{s}_0$ with the help of hypothesis 3:
Thus, as long as we set $\bar{x}_1 = 0$ and $\bar{s}_1 = 0$, the input sensitivity $\bar{x}_0$ contains the correct VJP.
Let us now look at $\bar{s}_0$ with the help of hypothesis 3, which tells us that $\frac{\partial x_1}{\partial s_0} = 0$, $\frac{\partial c_1}{\partial s_0} = 0$, $\frac{\partial s_1}{\partial s_0} = 0$, and $\frac{\partial y_1}{\partial s_0} = 0$:

$$\bar{s}_0 = \left(\frac{\partial x_1}{\partial s_0}\right)^* \bar{x}_1 + \left(\frac{\partial c_1}{\partial s_0}\right)^* \bar{c}_1 + \left(\frac{\partial s_1}{\partial s_0}\right)^* \bar{s}_1 + \left(\frac{\partial y_1}{\partial s_0}\right)^* \bar{y}_1 = 0$$
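
The reverse-mode argument can be illustrated the same way with a hand-written pullback. Again this is only a sketch under stated assumptions: `phi_reverse` is a hypothetical name, and the primal function ($s \leftarrow c \, x$, then $y \leftarrow s_1 s_2$) is an assumed toy example, not something taken from DI or from an AD backend.

```python
def phi_reverse(x0, c0, bx1, bc1, bs1, by1):
    """Hand-written reverse rule for the assumed toy phi.

    Takes the output sensitivities (bx1, bc1, bs1, by1) and returns the
    input sensitivities (bx0, bc0, bs0, by0).
    """
    s1 = [c0[0] * xi for xi in x0]  # recompute the forward pass
    # Reverse of y1[0] = s1[0] * s1[1]; y is overwritten, so by0 is zero.
    bs = [bs1[0] + by1[0] * s1[1], bs1[1] + by1[0] * s1[0]]
    by0 = [0.0]
    # Reverse of s1[i] = c0[0] * x0[i]; s is overwritten, so bs0 is zero.
    bx0 = [bx1[i] + c0[0] * bs[i] for i in range(len(x0))]
    bc0 = [bc1[0] + sum(x0[i] * bs[i] for i in range(len(x0)))]
    bs0 = [0.0 for _ in x0]
    return bx0, bc0, bs0, by0


x0, c0 = [2.0, 3.0], [0.5]
by1 = [1.0]

# Analytical VJP for this toy function: c^2 * (x_2, x_1) * by.
vjp_expected = [c0[0] ** 2 * x0[1] * by1[0], c0[0] ** 2 * x0[0] * by1[0]]

# Zero sensitivities on x_1 and s_1: the input sensitivity is the VJP.
bx_clean, _, bs0, _ = phi_reverse(x0, c0, [0.0, 0.0], [0.0], [0.0, 0.0], by1)
# A nonzero sensitivity on s_1 contaminates the result.
bx_bad, _, _, _ = phi_reverse(x0, c0, [0.0, 0.0], [0.0], [1.0, 0.0], by1)

print(bx_clean == vjp_expected, bs0 == [0.0, 0.0], bx_bad != bx_clean)
```

Here $\bar{s}_0$ comes back as zero whatever $\bar{s}_1$ was, while a nonzero $\bar{s}_1$ changes $\bar{x}_0$, matching the two equations above.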
