
Commit 9a42ac2

arxiv.org links
1 parent 1159ecf commit 9a42ac2

238 files changed

Lines changed: 354 additions & 353 deletions


docs/activations/fta/index.html

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -73,7 +73,7 @@
73 73    </div>
74 74    <h1>Fuzzy Tiling Activations (FTA)</h1>
75 75    <p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a></p>
76    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/1911.08068">Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online</a>.</p>
   76 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://arxiv.org/abs/1911.08068">Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online</a>.</p>
77 77    <p>Fuzzy tiling activations are a form of sparse activations based on binning.</p>
78 78    <p>Binning is the classification of a scalar value into a bin based on intervals. One problem with binning is that it gives zero gradients for most values (except at the boundaries of bins). The other is that binning loses precision if the bin intervals are large.</p>
79 79    <p>FTA overcomes these disadvantages. Instead of hard boundaries as in tiling activations, FTA uses soft boundaries between bins. This gives non-zero gradients for all or a wide range of values. It also doesn&#x27;t lose precision, since the value is captured in partial activations.</p>
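The soft binning described above can be sketched in a few lines. This is an illustrative version, not the repo's exact API: the function name, default range, and bin parameters are assumptions.

```python
import torch

def fta(z: torch.Tensor, lower=-1., upper=1., delta=0.2, eta=0.01) -> torch.Tensor:
    """Fuzzy Tiling Activation: expand each scalar into sparse per-bin values."""
    n_bins = int(round((upper - lower) / delta))
    c = lower + delta * torch.arange(n_bins)      # lower boundaries of the bins
    z = z.unsqueeze(-1)                           # broadcast each input over all bins
    # Distance of z from the bin [c, c + delta]; zero when z falls inside the bin
    d = torch.clamp(c - z, min=0.) + torch.clamp(z - delta - c, min=0.)
    # Fuzzy indicator: keep the raw distance within eta of a boundary, saturate to 1 beyond
    return 1. - torch.where(d <= eta, d, torch.ones_like(d))
```

With `eta = 0` this reduces to hard tiling (one-hot binning); a larger `eta` widens the fringe near bin boundaries where gradients are non-zero.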

docs/adaptive_computation/parity.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1>Parity Task</h1>
74    -  <p>This generates data for the parity task from the paper <a href="https://papers.labml.ai/paper/1603.08983">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>
   74 +  <p>This generates data for the parity task from the paper <a href="https://arxiv.org/abs/1603.08983">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>
75 75    <p>The input of the parity task is a vector of 0&#x27;s, 1&#x27;s and -1&#x27;s. The output is the parity of the 1&#x27;s: one if there is an odd number of 1&#x27;s and zero otherwise. The input is generated by setting a random number of elements in the vector to either 1 or -1.</p>
76 76
77 77    </div>
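The generation procedure above can be sketched in plain Python. The function name and vector size are illustrative, not the repo's API:

```python
import random

def parity_sample(n_elems: int = 8):
    """One parity-task example: a vector of 0's with a random number of
    positions set to +1 or -1, and the parity of the 1's as the target."""
    x = [0] * n_elems
    n_set = random.randint(1, n_elems)              # how many positions are non-zero
    for i in random.sample(range(n_elems), n_set):
        x[i] = random.choice([1, -1])
    y = x.count(1) % 2                              # 1 if an odd number of 1's, else 0
    return x, y
```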

docs/adaptive_computation/ponder_net/index.html

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
72 72    <a href='#section-0'>#</a>
73 73    </div>
74 74    <h1>PonderNet: Learning to Ponder</h1>
75    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2107.05407">PonderNet: Learning to Ponder</a>.</p>
   75 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2107.05407">PonderNet: Learning to Ponder</a>.</p>
76 76    <p>PonderNet adapts its computation to the input: it changes the number of steps taken by a recurrent network based on the input, and learns this with end-to-end gradient descent.</p>
77 77    <p>PonderNet has a step function of the form</p>
78 78    <p>$$\hat{y}^n, h^{n+1}, \lambda^n = s(x, h^n)$$</p>
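A minimal sketch of such a step function $s(x, h^n)$. The GRU cell, layer sizes, and sigmoid halting head here are assumptions for illustration, not the repo's exact implementation:

```python
import torch
from torch import nn

class PonderStep(nn.Module):
    """One PonderNet step: (x, h_n) -> (y_hat_n, h_{n+1}, lambda_n)."""
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.cell = nn.GRUCell(d_in, d_hidden)   # recurrent state update
        self.out = nn.Linear(d_hidden, 1)        # prediction head, y_hat
        self.halt = nn.Linear(d_hidden, 1)       # halting-probability head

    def forward(self, x: torch.Tensor, h: torch.Tensor):
        h_next = self.cell(x, h)
        y_hat = self.out(h_next)
        # lambda_n in (0, 1): probability of halting at this step,
        # conditioned on not having halted at an earlier step
        lam = torch.sigmoid(self.halt(h_next)).squeeze(-1)
        return y_hat, h_next, lam
```

The network runs this step repeatedly; the unconditional probability of halting at step $n$ is $\lambda^n \prod_{j<n}(1 - \lambda^j)$.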

docs/adaptive_computation/ponder_net/readme.html

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
72 72    <a href='#section-0'>#</a>
73 73    </div>
74 74    <h1><a href="https://nn.labml.ai/adaptive_computation/ponder_net/index.html">PonderNet: Learning to Ponder</a></h1>
75    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2107.05407">PonderNet: Learning to Ponder</a>.</p>
   75 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2107.05407">PonderNet: Learning to Ponder</a>.</p>
76 76    <p>PonderNet adapts its computation to the input: it changes the number of steps taken by a recurrent network based on the input, and learns this with end-to-end gradient descent.</p>
77 77
78 78    </div>

docs/capsule_networks/index.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1>Capsule Networks</h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/1710.09829">Dynamic Routing Between Capsules</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://arxiv.org/abs/1710.09829">Dynamic Routing Between Capsules</a>.</p>
75 75    <p>A capsule network is a neural network architecture that embeds features as capsules and routes them to the next layer of capsules with a voting mechanism.</p>
76 76    <p>Unlike in our other model implementations, we&#x27;ve included a sample here, because some concepts are difficult to understand from the modules alone. <a href="mnist.html">This is the annotated code for a model that uses capsules to classify the MNIST dataset</a>.</p>
77 77    <p>This file holds the implementations of the core modules of Capsule Networks.</p>
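One concrete module from the paper is the squashing non-linearity, which keeps a capsule's direction while mapping its length into [0, 1). A sketch (the `eps` term for numerical safety is our addition):

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Squash: v = (|s|^2 / (1 + |s|^2)) * s / |s|, applied per capsule vector."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1. + sq_norm)               # length shrinks into [0, 1)
    return scale * s / torch.sqrt(sq_norm + eps)   # direction is preserved
```

The resulting vector length acts as the probability that the entity the capsule represents is present.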

docs/capsule_networks/mnist.html

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
72 72    </div>
73 73    <h1>Classify MNIST digits with Capsule Networks</h1>
74 74    <p>This is annotated PyTorch code to classify MNIST digits.</p>
75    -  <p>It implements the experiment described in the paper <a href="https://papers.labml.ai/paper/1710.09829">Dynamic Routing Between Capsules</a>.</p>
   75 +  <p>It implements the experiment described in the paper <a href="https://arxiv.org/abs/1710.09829">Dynamic Routing Between Capsules</a>.</p>
76 76
77 77    </div>
78 78    <div class='code'>

docs/capsule_networks/readme.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1><a href="https://nn.labml.ai/capsule_networks/index.html">Capsule Networks</a></h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/1710.09829">Dynamic Routing Between Capsules</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://arxiv.org/abs/1710.09829">Dynamic Routing Between Capsules</a>.</p>
75 75    <p>A capsule network is a neural network architecture that embeds features as capsules and routes them to the next layer of capsules with a voting mechanism.</p>
76 76    <p>Unlike in our other model implementations, we&#x27;ve included a sample here, because some concepts are difficult to understand from the modules alone. <a href="mnist.html">This is the annotated code for a model that uses capsules to classify the MNIST dataset</a>.</p>
77 77    <p>This file holds the implementations of the core modules of Capsule Networks.</p>

docs/cfr/kuhn/index.html

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ <h2><a href="../index.html#InfoSet">Information set</a></h2>
221 221    <div class="highlight"><pre><span class="lineno">74</span> <span class="n">total</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="o">.</span><span class="n">values</span><span class="p">())</span>
222 222    <span class="lineno">75</span> <span class="n">total</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">total</span><span class="p">,</span> <span class="mf">1e-6</span><span class="p">)</span>
223 223    <span class="lineno">76</span> <span class="n">bet</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="p">[</span><span class="n">cast</span><span class="p">(</span><span class="n">Action</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">)]</span> <span class="o">/</span> <span class="n">total</span>
224     -  <span class="lineno">77</span> <span class="k">return</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">bet</span> <span class="o">*</span> <span class="mi">100</span><span class="si">:</span><span class="s1"> .1f</span><span class="si">}</span><span class="s1">%&#39;</span></pre></div>
    224 +  <span class="lineno">77</span> <span class="k">return</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">bet</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="si">:</span><span class="s1"> .1f</span><span class="si">}</span><span class="s1">%&#39;</span></pre></div>
225 225    </div>
226 226    </div>
227 227    <div class='section' id='section-11'>

docs/conv_mixer/index.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1>Patches Are All You Need? (ConvMixer)</h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2201.09792">Patches Are All You Need?</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2201.09792">Patches Are All You Need?</a>.</p>
75 75    <p><img alt="ConvMixer diagram from the paper" src="conv_mixer.png"></p>
76 76    <p>ConvMixer is similar to <a href="../transformers/mlp_mixer/index.html">MLP-Mixer</a>. MLP-Mixer separates the mixing of spatial and channel dimensions by applying an MLP across the spatial dimension and then an MLP across the channel dimension (the spatial MLP replaces the <a href="../transformers/vit/index.html">ViT</a> attention, and the channel MLP is the <a href="../transformers/feed_forward.html">FFN</a> of ViT).</p>
77 77    <p>ConvMixer uses a 1&times;1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby patches, in contrast to ViT or MLP-Mixer. Also, MLP-Mixer uses two-layer MLPs for each mixing while ConvMixer uses a single layer for each mixing.</p>
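The mixing structure described above can be sketched as one ConvMixer block. Per the paper, a residual connection spans only the depth-wise spatial mixing; the helper names and default kernel size here are ours:

```python
import torch
from torch import nn

class Residual(nn.Module):
    """Wrap a module with a skip connection."""
    def __init__(self, fn: nn.Module):
        super().__init__()
        self.fn = fn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x) + x

def conv_mixer_block(dim: int, kernel_size: int = 9) -> nn.Module:
    """One ConvMixer block: depth-wise conv (spatial mixing, with residual)
    followed by a 1x1 point-wise conv (channel mixing, no residual)."""
    return nn.Sequential(
        Residual(nn.Sequential(
            # groups=dim makes this depth-wise: each channel is convolved separately
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )),
        nn.Conv2d(dim, dim, kernel_size=1),   # point-wise: channel mixing
        nn.GELU(),
        nn.BatchNorm2d(dim),
    )
```

A full ConvMixer stacks a patch-embedding convolution, several such blocks, and a pooling-plus-linear classification head.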

docs/conv_mixer/readme.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1><a href="https://nn.labml.ai/conv_mixer/index.html">Patches Are All You Need?</a></h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2201.09792">Patches Are All You Need?</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2201.09792">Patches Are All You Need?</a>.</p>
75 75    <p>ConvMixer is similar to <a href="https://nn.labml.ai/transformers/mlp_mixer/index.html">MLP-Mixer</a>. MLP-Mixer separates the mixing of spatial and channel dimensions by applying an MLP across the spatial dimension and then an MLP across the channel dimension (the spatial MLP replaces the <a href="https://nn.labml.ai/transformers/vit/index.html">ViT</a> attention, and the channel MLP is the <a href="https://nn.labml.ai/transformers/feed_forward.html">FFN</a> of ViT).</p>
76 76    <p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby patches, in contrast to ViT or MLP-Mixer. Also, MLP-Mixer uses two-layer MLPs for each mixing while ConvMixer uses a single layer for each mixing.</p>
77 77    <p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href="https://nn.labml.ai/normalization/batch_norm/index.html">Batch normalization</a> instead of <a href="../normalization/layer_norm/index.html">Layer normalization</a>.</p>
