Commit dc81cb3

fix: ensure non-zero learning rate during warmup at iteration 0
The warmup learning rate calculation has been modified to use (it + 1)/(warmup_iters + 1) instead of it/warmup_iters. This ensures a non-zero learning rate at iteration 0 while maintaining the same linear warmup behavior. Fixes karpathy#443
1 parent 9755682 commit dc81cb3

1 file changed: train.py (1 addition, 1 deletion)
@@ -231,7 +231,7 @@ def estimate_loss():
 def get_lr(it):
     # 1) linear warmup for warmup_iters steps
     if it < warmup_iters:
-        return learning_rate * it / warmup_iters
+        return learning_rate * (it + 1) / (warmup_iters + 1)
     # 2) if it > lr_decay_iters, return min learning rate
     if it > lr_decay_iters:
         return min_lr
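The effect of the one-line change can be sketched as below. The `learning_rate` and `warmup_iters` values here are illustrative placeholders, not taken from the commit; the point is that the old formula yields exactly zero at iteration 0, while the new one yields a small non-zero value and still ramps linearly toward the full rate:

```python
# Hypothetical values for illustration only; not from the commit.
learning_rate = 6e-4
warmup_iters = 2000

def get_lr_old(it):
    # Old warmup formula: at it == 0 this returns 0.0,
    # so the very first optimizer step is a no-op.
    return learning_rate * it / warmup_iters

def get_lr_new(it):
    # New warmup formula: non-zero at it == 0, still linear,
    # and reaches learning_rate by the end of warmup.
    return learning_rate * (it + 1) / (warmup_iters + 1)

print(get_lr_old(0))  # 0.0
print(get_lr_new(0))  # small but non-zero (learning_rate / (warmup_iters + 1))
```

Note that the `+ 1` in the denominator keeps the warmup endpoint aligned: at `it == warmup_iters` the new expression evaluates to exactly `learning_rate`, the same ceiling the old formula had.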
