|
| 1 | +<script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML"> </script> |
| 2 | + |
| 3 | + |
1 | 4 | ## Appendix |
2 | 5 |
|
3 | 6 | ### Prompts Used in Evaluation |
@@ -46,7 +49,7 @@ Only output the final answer in valid JSON format wrapped in a markdown code blo |
46 | 49 | {question} |
47 | 50 | ```` |
48 | 51 |
|
49 | | -**$\tau^2$-bench** |
| 52 | +**<span>$$\tau^2$$</span>-bench** |
50 | 53 |
|
51 | 54 | ``` |
52 | 55 | {question} |
@@ -182,7 +185,7 @@ This section provides supplementary results on individual math benchmarks. |
182 | 185 |
|
183 | 186 | </figure> |
184 | 187 |
|
185 | | -### $\tau^2$-bench Individual Domain Results |
| 188 | +### <span>$$\tau^2$$</span>-bench Individual Domain Results |
186 | 189 |
|
187 | 190 | **[Choosing Teacher Models](./model_distillation_best_practices.md#choosing-teacher-models)** |
188 | 191 |
|
@@ -270,7 +273,7 @@ This section provides supplementary results on individual math benchmarks. |
270 | 273 |
|
271 | 274 | ### Acceptance Rates for Rejection Sampling |
272 | 275 |
|
273 | | -[Table 2](#tab-accept-rates) presents the acceptance rates across four different tasks using six different teacher models. For Qwen3-235B-A22B-Thinking-2507 and GLM 4.7, these models were specifically utilized for generating responses for RE-ARC and $\tau^2$-bench synthetic data, respectively. Notably, a significant performance jump was observed in RE-ARC when using the more recent 2507 version. |
| 276 | +[Table 2](#tab-accept-rates) presents the acceptance rates across four different tasks using six different teacher models. For Qwen3-235B-A22B-Thinking-2507 and GLM 4.7, these models were specifically utilized for generating responses for RE-ARC and <span>$$\tau^2$$</span>-bench synthetic data, respectively. Notably, a significant performance jump was observed in RE-ARC when using the more recent 2507 version. |
274 | 277 |
|
275 | 278 | <table id="tab-accept-rates" style="margin-left:auto; margin-right:auto;"> |
276 | 279 | <thead> |
@@ -329,11 +332,11 @@ This section provides supplementary results on individual math benchmarks. |
329 | 332 | <caption style="text-align: left;"><b>Table 2:</b> Acceptance rates for rejection sampling across various tasks using different teacher models.</caption> |
330 | 333 | </table> |
331 | 334 |
|
332 | | -### $\tau^2$-bench Synthetic Example |
| 335 | +### <span>$$\tau^2$$</span>-bench Synthetic Example |
333 | 336 |
|
334 | 337 | Below, we attach one synthetic sample generated following the process described in Section [Agentic Task (Tool Utilization)](./model_distillation_best_practices.md#agentic-task-tool-utilization). |
335 | 338 |
|
336 | | -<details> |
| 339 | +<details markdown="block"> |
337 | 340 |
|
338 | 341 | <summary>Task (Click to show/hide)</summary> |
339 | 342 |
|
@@ -453,7 +456,7 @@ Below, we attach one synthetic sample generated following the process described |
453 | 456 |
|
454 | 457 | </details> |
455 | 458 |
|
456 | | -<details> |
| 459 | +<details markdown="block"> |
457 | 460 |
|
458 | 461 | <summary>Database (Click to show/hide)</summary> |
459 | 462 |
|
@@ -529,7 +532,7 @@ activation_date = "2023-11-01T10:00:00" |
529 | 532 |
|
530 | 533 | </details> |
531 | 534 |
|
532 | | -<details> |
| 535 | +<details markdown="block"> |
533 | 536 |
|
534 | 537 | <summary>Trajectory (Click to show/hide)</summary> |
535 | 538 |
|
|