Commit bcccbee

Update distillation report. (#4472)
1 parent 5ae3255 commit bcccbee

3 files changed

Lines changed: 63 additions & 48 deletions

docs/vertex-training-cluster/appendix.md

Lines changed: 10 additions & 7 deletions
@@ -1,3 +1,6 @@
+<script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML"> </script>
+
+
 ## Appendix
 
 ### Prompts Used in Evaluation
@@ -46,7 +49,7 @@ Only output the final answer in valid JSON format wrapped in a markdown code blo
 {question}
 ````
 
-**$\tau^2$-bench**
+**<span>$$\tau^2$$</span>-bench**
 
 ```
 {question}
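This hunk, and later two headings and the Table 2 paragraph, rewrite the inline `$\tau^2$` as `<span>$$\tau^2$$</span>`. The commit does not say how the edit was made; a minimal Python sketch of the same substitution, assuming the literal `$\tau^2$` token is the only math being converted (`TAU2_INLINE` and `wrap_tau2` are hypothetical names):

```python
import re

# Matches the single-dollar token $\tau^2$; the lookarounds skip "$$\tau^2$$",
# so text that is already converted is left alone.
TAU2_INLINE = re.compile(r"(?<!\$)\$\\tau\^2\$(?!\$)")


def wrap_tau2(text: str) -> str:
    """Wrap each single-dollar tau^2 token in <span>$$...$$</span>."""
    # In a re.sub replacement string, "\\" yields one literal backslash
    # (a bare "\t" would be turned into a tab character).
    return TAU2_INLINE.sub(r"<span>$$\\tau^2$$</span>", text)


print(wrap_tau2(r"**$\tau^2$-bench**"))  # → **<span>$$\tau^2$$</span>-bench**
```

Because of the `$` lookbehind and lookahead, running the helper a second time leaves already-converted text unchanged.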
@@ -182,7 +185,7 @@ This section provides supplementary results on individual math benchmarks.
 
 </figure>
 
-### $\tau^2$-bench Individual Domain Results
+### <span>$$\tau^2$$</span>-bench Individual Domain Results
 
 **[Choosing Teacher Models](./model_distillation_best_practices.md#choosing-teacher-models)**
 
@@ -270,7 +273,7 @@ This section provides supplementary results on individual math benchmarks.
 
 ### Acceptance Rates for Rejection Sampling
 
-[Table 2](#tab-accept-rates) presents the acceptance rates across four different tasks using six different teacher models. For Qwen3-235B-A22B-Thinking-2507 and GLM 4.7, these models were specifically utilized for generating responses for RE-ARC and $\tau^2$-bench synthetic data, respectively. Notably, a significant performance jump was observed in RE-ARC when using the more recent 2507 version.
+[Table 2](#tab-accept-rates) presents the acceptance rates across four different tasks using six different teacher models. For Qwen3-235B-A22B-Thinking-2507 and GLM 4.7, these models were specifically utilized for generating responses for RE-ARC and <span>$$\tau^2$$</span>-bench synthetic data, respectively. Notably, a significant performance jump was observed in RE-ARC when using the more recent 2507 version.
 
 <table id="tab-accept-rates" style="margin-left:auto; margin-right:auto;">
 <thead>
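The Table 2 paragraph reports acceptance rates for rejection sampling per teacher model and task. Assuming a rate is the fraction of generated teacher responses that were kept (this hunk does not define it), a minimal Python sketch of the tally follows; the `(teacher, task, accepted)` record layout, the `acceptance_rates` name, and the toy records are hypothetical and are not taken from Table 2:

```python
from collections import defaultdict


def acceptance_rates(samples):
    """Return {(teacher, task): accepted / generated}.

    `samples` is an iterable of (teacher, task, accepted) tuples, where
    `accepted` is True if the response was kept by rejection sampling.
    This record layout is a hypothetical illustration.
    """
    generated = defaultdict(int)
    accepted = defaultdict(int)
    for teacher, task, ok in samples:
        key = (teacher, task)
        generated[key] += 1
        if ok:
            accepted[key] += 1
    # Every key in `generated` has a count of at least 1, so no division by zero.
    return {key: accepted[key] / generated[key] for key in generated}


# Toy records, made up for illustration only.
records = [
    ("teacher-A", "task-1", True),
    ("teacher-A", "task-1", False),
    ("teacher-A", "task-1", True),
    ("teacher-A", "task-1", True),
    ("teacher-B", "task-1", False),
    ("teacher-B", "task-1", True),
]
rates = acceptance_rates(records)
print(rates[("teacher-A", "task-1")])  # → 0.75
print(rates[("teacher-B", "task-1")])  # → 0.5
```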
@@ -329,11 +332,11 @@ This section provides supplementary results on individual math benchmarks.
 <caption style="text-align: left;"><b>Table 2:</b> Acceptance rates for rejection sampling across various tasks using different teacher models.</caption>
 </table>
 
-### $\tau^2$-bench Synthetic Example
+### <span>$$\tau^2$$</span>-bench Synthetic Example
 
 Below, we attach one synthetic sample generated following the process described in Section [Agentic Task (Tool Utilization)](./model_distillation_best_practices.md#agentic-task-tool-utilization).
 
-<details>
+<details markdown="block">
 
 <summary>Task (Click to show/hide)</summary>
 
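This hunk and the next two add `markdown="block"` to each bare `<details>` opener. Assuming the site is built with kramdown (for example through Jekyll), that attribute tells the parser to treat the element's content as block-level Markdown instead of raw HTML. A minimal Python sketch of the same rewrite (`BARE_DETAILS` and `mark_details_as_block` are hypothetical names) that leaves `</details>` and already-annotated openers untouched:

```python
import re

# Matches a bare "<details>" opener at the start of a line (optionally indented).
# "</details>" and "<details ...>" with attributes do not match.
BARE_DETAILS = re.compile(r"^([ \t]*)<details>", re.MULTILINE)


def mark_details_as_block(text: str) -> str:
    """Add markdown="block" to every bare <details> opener, keeping indentation."""
    return BARE_DETAILS.sub(r'\1<details markdown="block">', text)


sample = "<details>\n\n<summary>Database (Click to show/hide)</summary>\n\n</details>\n"
print(mark_details_as_block(sample).splitlines()[0])  # → <details markdown="block">
```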
@@ -453,7 +456,7 @@ Below, we attach one synthetic sample generated following the process described
 
 </details>
 
-<details>
+<details markdown="block">
 
 <summary>Database (Click to show/hide)</summary>
 
@@ -529,7 +532,7 @@ activation_date = "2023-11-01T10:00:00"
 
 </details>
 
-<details>
+<details markdown="block">
 
 <summary>Trajectory (Click to show/hide)</summary>
 