
Commit 9a42ac2

arxiv.org links
1 parent 1159ecf commit 9a42ac2

238 files changed

Lines changed: 354 additions & 353 deletions


docs/activations/fta/index.html

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -73,7 +73,7 @@
73 73    </div>
74 74    <h1>Fuzzy Tiling Activations (FTA)</h1>
75 75    <p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a></p>
76    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/1911.08068">Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online</a>.</p>
   76 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://arxiv.org/abs/1911.08068">Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online</a>.</p>
77 77    <p>Fuzzy tiling activations are a form of sparse activations based on binning.</p>
78 78    <p>Binning is the classification of a scalar value into a bin based on intervals. One problem with binning is that it gives zero gradients for most values (except at the boundaries of bins). The other is that binning loses precision if the bin intervals are large.</p>
79 79    <p>FTA overcomes these disadvantages. Instead of hard boundaries as in tiling activations, FTA uses soft boundaries between bins. This gives non-zero gradients for all or a wide range of values. It also doesn&#x27;t lose precision, since the value is captured in partial activations.</p>
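The soft binning described above can be sketched in a few lines. This is an illustrative version, not the repo's exact API: the function name, default range, and bin parameters are assumptions.

```python
import torch

def fta(z: torch.Tensor, lower=-1., upper=1., delta=0.2, eta=0.01) -> torch.Tensor:
    """Fuzzy Tiling Activation: expand each scalar into sparse per-bin values."""
    n_bins = int(round((upper - lower) / delta))
    c = lower + delta * torch.arange(n_bins)      # lower boundaries of the bins
    z = z.unsqueeze(-1)                           # broadcast each input over all bins
    # Distance of z from the bin [c, c + delta]; zero when z falls inside the bin
    d = torch.clamp(c - z, min=0.) + torch.clamp(z - delta - c, min=0.)
    # Fuzzy indicator: keep the raw distance within eta of a boundary, saturate to 1 beyond
    return 1. - torch.where(d <= eta, d, torch.ones_like(d))
```

With `eta = 0` this reduces to hard tiling (one-hot binning); a larger `eta` widens the fringe near bin boundaries where gradients are non-zero.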

docs/adaptive_computation/parity.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1>Parity Task</h1>
74    -  <p>This generates data for the parity task from the paper <a href="https://papers.labml.ai/paper/1603.08983">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>
   74 +  <p>This generates data for the parity task from the paper <a href="https://arxiv.org/abs/1603.08983">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>
75 75    <p>The input of the parity task is a vector of 0&#x27;s, 1&#x27;s and -1&#x27;s. The output is the parity of the 1&#x27;s: one if there is an odd number of 1&#x27;s and zero otherwise. The input is generated by setting a random number of elements in the vector to either 1 or -1.</p>
76 76
77 77    </div>
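The generation procedure above can be sketched in plain Python. The function name and vector size are illustrative, not the repo's API:

```python
import random

def parity_sample(n_elems: int = 8):
    """One parity-task example: a vector of 0's with a random number of
    positions set to +1 or -1, and the parity of the 1's as the target."""
    x = [0] * n_elems
    n_set = random.randint(1, n_elems)              # how many positions are non-zero
    for i in random.sample(range(n_elems), n_set):
        x[i] = random.choice([1, -1])
    y = x.count(1) % 2                              # 1 if an odd number of 1's, else 0
    return x, y
```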

docs/adaptive_computation/ponder_net/index.html

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
72 72    <a href='#section-0'>#</a>
73 73    </div>
74 74    <h1>PonderNet: Learning to Ponder</h1>
75    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2107.05407">PonderNet: Learning to Ponder</a>.</p>
   75 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2107.05407">PonderNet: Learning to Ponder</a>.</p>
76 76    <p>PonderNet adapts its computation to the input: it changes the number of steps taken by a recurrent network based on the input, and learns this with end-to-end gradient descent.</p>
77 77    <p>PonderNet has a step function of the form</p>
78 78    <p>$$\hat{y}^n, h^{n+1}, \lambda^n = s(x, h^n)$$</p>
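A minimal sketch of such a step function $s(x, h^n)$. The GRU cell, layer sizes, and sigmoid halting head here are assumptions for illustration, not the repo's exact implementation:

```python
import torch
from torch import nn

class PonderStep(nn.Module):
    """One PonderNet step: (x, h_n) -> (y_hat_n, h_{n+1}, lambda_n)."""
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.cell = nn.GRUCell(d_in, d_hidden)   # recurrent state update
        self.out = nn.Linear(d_hidden, 1)        # prediction head, y_hat
        self.halt = nn.Linear(d_hidden, 1)       # halting-probability head

    def forward(self, x: torch.Tensor, h: torch.Tensor):
        h_next = self.cell(x, h)
        y_hat = self.out(h_next)
        # lambda_n in (0, 1): probability of halting at this step,
        # conditioned on not having halted at an earlier step
        lam = torch.sigmoid(self.halt(h_next)).squeeze(-1)
        return y_hat, h_next, lam
```

The network runs this step repeatedly; the unconditional probability of halting at step $n$ is $\lambda^n \prod_{j<n}(1 - \lambda^j)$.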

docs/adaptive_computation/ponder_net/readme.html

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
72 72    <a href='#section-0'>#</a>
73 73    </div>
74 74    <h1><a href="https://nn.labml.ai/adaptive_computation/ponder_net/index.html">PonderNet: Learning to Ponder</a></h1>
75    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2107.05407">PonderNet: Learning to Ponder</a>.</p>
   75 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2107.05407">PonderNet: Learning to Ponder</a>.</p>
76 76    <p>PonderNet adapts its computation to the input: it changes the number of steps taken by a recurrent network based on the input, and learns this with end-to-end gradient descent.</p>
77 77
78 78    </div>

docs/capsule_networks/index.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1>Capsule Networks</h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/1710.09829">Dynamic Routing Between Capsules</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://arxiv.org/abs/1710.09829">Dynamic Routing Between Capsules</a>.</p>
75 75    <p>A capsule network is a neural network architecture that embeds features as capsules and routes them to the next layer of capsules with a voting mechanism.</p>
76 76    <p>Unlike in our other model implementations, we&#x27;ve included a sample here, because some concepts are difficult to understand from the modules alone. <a href="mnist.html">This is the annotated code for a model that uses capsules to classify the MNIST dataset</a>.</p>
77 77    <p>This file holds the implementations of the core modules of Capsule Networks.</p>
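One concrete module from the paper is the squashing non-linearity, which keeps a capsule's direction while mapping its length into [0, 1). A sketch (the `eps` term for numerical safety is our addition):

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Squash: v = (|s|^2 / (1 + |s|^2)) * s / |s|, applied per capsule vector."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1. + sq_norm)               # length shrinks into [0, 1)
    return scale * s / torch.sqrt(sq_norm + eps)   # direction is preserved
```

The resulting vector length acts as the probability that the entity the capsule represents is present.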

docs/capsule_networks/mnist.html

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@
72 72    </div>
73 73    <h1>Classify MNIST digits with Capsule Networks</h1>
74 74    <p>This is annotated PyTorch code to classify MNIST digits.</p>
75    -  <p>It implements the experiment described in the paper <a href="https://papers.labml.ai/paper/1710.09829">Dynamic Routing Between Capsules</a>.</p>
   75 +  <p>It implements the experiment described in the paper <a href="https://arxiv.org/abs/1710.09829">Dynamic Routing Between Capsules</a>.</p>
76 76
77 77    </div>
78 78    <div class='code'>

docs/capsule_networks/readme.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1><a href="https://nn.labml.ai/capsule_networks/index.html">Capsule Networks</a></h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://papers.labml.ai/paper/1710.09829">Dynamic Routing Between Capsules</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of <a href="https://arxiv.org/abs/1710.09829">Dynamic Routing Between Capsules</a>.</p>
75 75    <p>A capsule network is a neural network architecture that embeds features as capsules and routes them to the next layer of capsules with a voting mechanism.</p>
76 76    <p>Unlike in our other model implementations, we&#x27;ve included a sample here, because some concepts are difficult to understand from the modules alone. <a href="mnist.html">This is the annotated code for a model that uses capsules to classify the MNIST dataset</a>.</p>
77 77    <p>This file holds the implementations of the core modules of Capsule Networks.</p>

docs/cfr/kuhn/index.html

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ <h2><a href="../index.html#InfoSet">Information set</a></h2>
221 221    <div class="highlight"><pre><span class="lineno">74</span> <span class="n">total</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="o">.</span><span class="n">values</span><span class="p">())</span>
222 222    <span class="lineno">75</span> <span class="n">total</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">total</span><span class="p">,</span> <span class="mf">1e-6</span><span class="p">)</span>
223 223    <span class="lineno">76</span> <span class="n">bet</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="p">[</span><span class="n">cast</span><span class="p">(</span><span class="n">Action</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">)]</span> <span class="o">/</span> <span class="n">total</span>
224     -  <span class="lineno">77</span> <span class="k">return</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">bet</span> <span class="o">*</span> <span class="mi">100</span><span class="si">:</span><span class="s1"> .1f</span><span class="si">}</span><span class="s1">%&#39;</span></pre></div>
    224 +  <span class="lineno">77</span> <span class="k">return</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">bet</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="si">:</span><span class="s1"> .1f</span><span class="si">}</span><span class="s1">%&#39;</span></pre></div>
225 225    </div>
226 226    </div>
227 227    <div class='section' id='section-11'>

docs/conv_mixer/index.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1>Patches Are All You Need? (ConvMixer)</h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2201.09792">Patches Are All You Need?</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2201.09792">Patches Are All You Need?</a>.</p>
75 75    <p><img alt="ConvMixer diagram from the paper" src="conv_mixer.png"></p>
76 76    <p>ConvMixer is similar to <a href="../transformers/mlp_mixer/index.html">MLP-Mixer</a>. MLP-Mixer separates the mixing of spatial and channel dimensions by applying an MLP across the spatial dimension and then an MLP across the channel dimension (the spatial MLP replaces the <a href="../transformers/vit/index.html">ViT</a> attention, and the channel MLP is the <a href="../transformers/feed_forward.html">FFN</a> of ViT).</p>
77 77    <p>ConvMixer uses a 1&times;1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby patches, in contrast to ViT or MLP-Mixer. Also, MLP-Mixer uses two-layer MLPs for each mixing while ConvMixer uses a single layer for each mixing.</p>
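The mixing structure described above can be sketched as one ConvMixer block. Per the paper, a residual connection spans only the depth-wise spatial mixing; the helper names and default kernel size here are ours:

```python
import torch
from torch import nn

class Residual(nn.Module):
    """Wrap a module with a skip connection."""
    def __init__(self, fn: nn.Module):
        super().__init__()
        self.fn = fn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x) + x

def conv_mixer_block(dim: int, kernel_size: int = 9) -> nn.Module:
    """One ConvMixer block: depth-wise conv (spatial mixing, with residual)
    followed by a 1x1 point-wise conv (channel mixing, no residual)."""
    return nn.Sequential(
        Residual(nn.Sequential(
            # groups=dim makes this depth-wise: each channel is convolved separately
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )),
        nn.Conv2d(dim, dim, kernel_size=1),   # point-wise: channel mixing
        nn.GELU(),
        nn.BatchNorm2d(dim),
    )
```

A full ConvMixer stacks a patch-embedding convolution, several such blocks, and a pooling-plus-linear classification head.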

docs/conv_mixer/readme.html

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@
71 71    <a href='#section-0'>#</a>
72 72    </div>
73 73    <h1><a href="https://nn.labml.ai/conv_mixer/index.html">Patches Are All You Need?</a></h1>
74    -  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://papers.labml.ai/paper/2201.09792">Patches Are All You Need?</a>.</p>
   74 +  <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper <a href="https://arxiv.org/abs/2201.09792">Patches Are All You Need?</a>.</p>
75 75    <p>ConvMixer is similar to <a href="https://nn.labml.ai/transformers/mlp_mixer/index.html">MLP-Mixer</a>. MLP-Mixer separates the mixing of spatial and channel dimensions by applying an MLP across the spatial dimension and then an MLP across the channel dimension (the spatial MLP replaces the <a href="https://nn.labml.ai/transformers/vit/index.html">ViT</a> attention, and the channel MLP is the <a href="https://nn.labml.ai/transformers/feed_forward.html">FFN</a> of ViT).</p>
76 76    <p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby patches, in contrast to ViT or MLP-Mixer. Also, MLP-Mixer uses two-layer MLPs for each mixing while ConvMixer uses a single layer for each mixing.</p>
77 77    <p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href="https://nn.labml.ai/normalization/batch_norm/index.html">Batch normalization</a> instead of <a href="../normalization/layer_norm/index.html">Layer normalization</a>.</p>
