
Commit c5685c9

remove app.labml.ai links

1 parent 97e53c0

154 files changed: 4,116 additions & 4,285 deletions

File tree


docs/adaptive_computation/ponder_net/index.html

Lines changed: 69 additions & 70 deletions
Large diffs are not rendered by default.

docs/capsule_networks/index.html

Lines changed: 49 additions & 49 deletions
Large diffs are not rendered by default.

docs/capsule_networks/readme.html

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ <h1><a href="https://nn.labml.ai/capsule_networks/index.html">Capsule Networks</
 <p>This file holds the implementations of the core modules of Capsule Networks.</p>
 <p>I used <a href="https://github.com/jindongwang/Pytorch-CapsuleNet">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>
 <p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>
-<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://app.labml.ai/run/e7c08e08586711ebb3e30242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a> </p>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> </p>
 
 </div>
 <div class='code'>

docs/cfr/kuhn/index.html

Lines changed: 90 additions & 90 deletions
Large diffs are not rendered by default.

docs/conv_mixer/experiment.html

Lines changed: 31 additions & 32 deletions
Large diffs are not rendered by default.

docs/conv_mixer/index.html

Lines changed: 62 additions & 63 deletions
Large diffs are not rendered by default.

docs/conv_mixer/readme.html

Lines changed: 1 addition & 2 deletions
@@ -75,8 +75,7 @@ <h1><a href="https://nn.labml.ai/conv_mixer/index.html">Patches Are All You Need
 <p>ConvMixer is Similar to <a href="https://nn.labml.ai/transformers/mlp_mixer/index.html">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href="https://nn.labml.ai/transformers/vit/index.html">ViT</a> attention and channel MLP is the <a href="https://nn.labml.ai/transformers/feed_forward.html">FFN</a> of ViT).</p>
 <p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>
 <p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href="https://nn.labml.ai/normalization/batch_norm/index.html">Batch normalization</a> instead of <a href="../normalization/layer_norm/index.html">Layer normalization</a>.</p>
-<p>Here&#x27;s <a href="https://nn.labml.ai/conv_mixer/experiment.html">an experiment</a> that trains ConvMixer on CIFAR-10.</p>
-<p><a href="https://app.labml.ai/run/0fc344da2cd011ecb0bc3fdb2e774a3d"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a></p>
+<p>Here&#x27;s <a href="https://nn.labml.ai/conv_mixer/experiment.html">an experiment</a> that trains ConvMixer on CIFAR-10. </p>
 
 </div>
 <div class='code'>
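The mixing scheme described in the diff above (depth-wise convolution for spatial mixing with a residual connection, 1x1 point-wise convolution for channel mixing without one, BatchNorm instead of LayerNorm) can be sketched in PyTorch. This is a minimal illustration, not the repo's own code; the class and hyperparameter names (`ConvMixerLayer`, `dim`, `kernel_size`) are assumptions.

```python
import torch
import torch.nn as nn


class ConvMixerLayer(nn.Module):
    """A sketch of one ConvMixer layer, following the description above:
    spatial mixing via a depth-wise convolution (with residual connection),
    then channel mixing via a 1x1 point-wise convolution (no residual)."""

    def __init__(self, dim: int, kernel_size: int = 9):
        super().__init__()
        # Spatial mixing: groups=dim makes the convolution depth-wise,
        # so each channel is mixed only with its spatial neighbors.
        self.spatial_mixing = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding='same'),
            nn.GELU(),
            nn.BatchNorm2d(dim),  # BatchNorm rather than LayerNorm, per the paper
        )
        # Channel mixing: a 1x1 convolution mixes channels at each position.
        self.channel_mixing = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection only over the spatial (depth-wise) mixing.
        x = x + self.spatial_mixing(x)
        # No residual connection over the channel (point-wise) mixing.
        return self.channel_mixing(x)
```

A full ConvMixer would stack several of these layers after a patch-embedding convolution; the single-layer sketch is just to show where the residual connection sits.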

docs/distillation/index.html

Lines changed: 75 additions & 76 deletions
Large diffs are not rendered by default.

docs/distillation/large.html

Lines changed: 36 additions & 37 deletions
Large diffs are not rendered by default.

docs/distillation/readme.html

Lines changed: 1 addition & 2 deletions
@@ -74,8 +74,7 @@ <h1><a href="https://nn.labml.ai/distillation/index.html">Distilling the Knowled
 <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of the paper <a href="https://papers.labml.ai/paper/1503.02531">Distilling the Knowledge in a Neural Network</a>.</p>
 <p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e. distilling the knowledge from the large network.</p>
 <p>A large model with regularization or an ensemble of models (using dropout) generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with help of a large model. Smaller models are better in production: faster, less compute, less memory.</p>
-<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better.</p>
-<p><a href="https://app.labml.ai/run/d6182e2adaf011eb927c91a2a1710932"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a> </p>
+<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. </p>
 
 </div>
 <div class='code'>
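The soft-label idea in the diff above (training the small model on the teacher's full output distribution, not just the hard labels) is usually implemented as a weighted sum of a softened KL-divergence term and ordinary cross-entropy. A minimal sketch, not the repo's implementation; the function name and the `T` and `alpha` defaults are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Sketch of the distillation objective (Hinton et al., 2015).

    Dividing logits by a temperature T > 1 before the softmax exposes the
    small but non-zero probabilities the teacher assigns to incorrect
    classes -- the extra information the paragraph above describes.
    """
    # Soft-target term: KL divergence between temperature-softened
    # student and teacher distributions. kl_div expects the first
    # argument in log-space. The T^2 factor keeps gradient magnitudes
    # comparable across temperatures, as recommended in the paper.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction='batchmean',
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In training, the teacher's logits are computed with `torch.no_grad()` and only the student's parameters are updated.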

0 commit comments