
Commit ef44617

adding figures and fixing minor issues with README_ROUTING
1 parent 17125d4

4 files changed

Lines changed: 3 additions & 10 deletions


README_ROUTING.md

Lines changed: 3 additions & 10 deletions
@@ -169,16 +169,16 @@ The training progress is logged and visualized through:
 #### Individual Model Curves

 **Routing Model Training Curves:**
-![Routing Training Curves](training_plots/routing_curves.png)
+![Routing Training Curves](assets/routing_curves.png)

 **Baseline Model Training Curves:**
-![Baseline Training Curves](training_plots/baseline_curves.png)
+![Baseline Training Curves](assets/baseline_curves.png)

 #### Model Comparison

 The comparison shows the routing model achieving better validation BPC throughout training:

-![Model Comparison](training_plots/model_comparison.png)
+![Model Comparison](assets/model_comparison.png)

 ### Training Results

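The README compares the two models by validation BPC (bits per character). This sketch relates that metric to the training loss and to a percentage improvement. It assumes the logged loss is a mean per-character cross-entropy in nats, which is what PyTorch's `F.cross_entropy` produces. The function names and the loss values are hypothetical placeholders, not results from this repository.

```python
import math

def loss_to_bpc(loss_nats: float) -> float:
    """Convert mean per-character cross-entropy (in nats) to bits per character."""
    return loss_nats / math.log(2)

def relative_improvement(baseline: float, candidate: float) -> float:
    """Percentage reduction of `candidate` relative to `baseline` (lower is better)."""
    return 100.0 * (baseline - candidate) / baseline

# Placeholder validation losses in nats; NOT measured values from this repo.
baseline_loss = 0.80
routing_loss = 0.77

baseline_bpc = loss_to_bpc(baseline_loss)
routing_bpc = loss_to_bpc(routing_loss)
print(f"baseline BPC: {baseline_bpc:.4f}")
print(f"routing  BPC: {routing_bpc:.4f}")
print(f"improvement:  {relative_improvement(baseline_bpc, routing_bpc):.2f}%")
```

BPC is a constant rescaling of the loss. A relative improvement is therefore the same whether it is computed on BPC or on the raw loss in nats.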
@@ -296,10 +296,3 @@ python sample_both_models.py --start "The future of AI" --temperature 1.2 --top_
 ## Conclusion

 The learnable attention head routing extension demonstrates promising results with a 3.36% improvement in BPC while maintaining computational efficiency through top-k head selection. The mechanism successfully encourages specialization among attention heads while preserving the causal structure of the transformer architecture.
-
-python train_novel.py --config=config/train_enwik8_routing.py
-Traceback (most recent call last):
-File "/gpfs/work4/0/tdse0635/nanoGPT/train_novel.py", line 70, in <module>
-exec(open('configurator.py').read()) # overrides from command line or config file
-File "<string>", line 47, in <module>
-ValueError: Unknown config key: config
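The traceback removed by this commit comes from passing the config file as `--config=...`. Upstream nanoGPT's `configurator.py` handles arguments in two ways:

- An argument without `=` is treated as a config-file path.
- A `--key=value` argument overrides an existing global. If no global with that name exists, it raises `ValueError`.

Line 70 of `train_novel.py` execs `configurator.py`, as nanoGPT's `train.py` does. Assuming it is the same configurator, the config path should be passed positionally: `python train_novel.py config/train_enwik8_routing.py`. The simplified sketch below imitates that parsing logic to show where the error originates. `parse_args` and the sample defaults are hypothetical, and config files are only collected here, not executed.

```python
from ast import literal_eval

def parse_args(argv, config):
    """Simplified imitation of nanoGPT's configurator.py argument handling.

    `config` stands in for the training script's globals(); config files are
    only collected here, not executed.
    """
    config_files = []
    for arg in argv:
        if '=' not in arg:
            # Positional argument: treated as a config file path.
            assert not arg.startswith('--')
            config_files.append(arg)
        else:
            # --key=value override: the key must already exist.
            assert arg.startswith('--')
            key, val = arg.split('=', 1)
            key = key[2:]
            if key not in config:
                raise ValueError(f"Unknown config key: {key}")
            try:
                value = literal_eval(val)  # numbers, bools, etc.
            except (SyntaxError, ValueError):
                value = val  # fall back to the raw string
            assert type(value) is type(config[key])  # types must match
            config[key] = value
    return config_files

# Hypothetical defaults standing in for the training script's globals.
defaults = {'batch_size': 64, 'dataset': 'enwik8'}

print(parse_args(['config/train_enwik8_routing.py'], dict(defaults)))
# prints ['config/train_enwik8_routing.py']
try:
    parse_args(['--config=config/train_enwik8_routing.py'], dict(defaults))
except ValueError as err:
    print(err)  # prints: Unknown config key: config
```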

assets/baseline_curves.png (362 KB)
assets/model_comparison.png (127 KB)
assets/routing_curves.png (346 KB)
