
Commit 8b1e432 (parent: ab21d6c)

Small tweaks: make the default weight decay 0.1, as is often cited, and remove the spurious init of LayerNorm, which is already initialized to weight=1, bias=0.

2 files changed: 1 addition & 5 deletions

model.py (0 additions & 4 deletions)
@@ -173,10 +173,6 @@ def _init_weights(self, module):
                 torch.nn.init.zeros_(module.bias)
         elif isinstance(module, nn.Embedding):
             torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
-        elif isinstance(module, (LayerNorm, nn.LayerNorm)):
-            torch.nn.init.ones_(module.weight)
-            if module.bias is not None:
-                torch.nn.init.zeros_(module.bias)
 
     def forward(self, idx, targets=None):
         device = idx.device
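
For context, PyTorch's nn.LayerNorm already initializes its affine parameters to weight=1 and bias=0 in its constructor, which is why the removed branch was a no-op. A quick standalone check (not part of the repo):

    import torch.nn as nn

    ln = nn.LayerNorm(4)
    print(ln.weight)  # Parameter of ones, set by nn.LayerNorm's own reset_parameters
    print(ln.bias)    # Parameter of zeros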

train.py (1 addition & 1 deletion)

@@ -57,7 +57,7 @@
 # adamw optimizer
 learning_rate = 6e-4 # max learning rate
 max_iters = 600000 # total number of training iterations
-weight_decay = 1e-2
+weight_decay = 1e-1
 beta1 = 0.9
 beta2 = 0.95
 grad_clip = 1.0 # clip gradients at this value, or disable if == 0.0
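
These config values feed into a standard torch.optim.AdamW setup. A minimal sketch of the resulting optimizer configuration, using a hypothetical nn.Linear as a stand-in for the GPT model (the training script itself wires these values into its own optimizer-construction helper):

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 8)  # stand-in model for illustration only

    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=6e-4,            # learning_rate
        betas=(0.9, 0.95),  # beta1, beta2
        weight_decay=1e-1,  # the new default, up from 1e-2
    )

    # grad_clip = 1.0 corresponds to clipping gradients before each optimizer step:
    # torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)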
