Problem
The agent-lab powered binary follow-up found that Kimi author/reproducer calls can be normalized in the lab runner, but worker-side helper calls inside generated/runtime strategies still go through a separate chat path that sends a temperature below 1, which the provider rejects.
Observed in agent-lab run runs/2026-06-18/powered-binary-followup/kimi-k27-binary-n12p3-invalid.json:
- model: kimi-k2.7-code
- cell: kimi-k27-binary-n12p3
- status: stopped invalid after 12/12 gen0 rows and 3/12 gen1 rows
- trace: runs/2026-06-18/powered-binary-followup/kimi-k27-binary-n12p3-trace/events.jsonl
- error count: 6 "invalid temperature: only 1 is allowed for this model" events
Example trace event shape:
LLM call 400: {"error":{"message":"invalid temperature: only 1 is allowed for this model" ...}}
Why it matters
This makes Kimi unusable for certifiable strategy-evolution experiments even when the runner config sets WORKER_TEMPERATURE=1 and wraps the author ChatClient. The failure appears in worker-side helper calls such as critique/analyst paths used by generated strategies, not just top-level worker calls.
Expected fix
Normalize or retry temperature-rejected Kimi requests wherever runtime worker/helper calls bind agent-eval chat clients, matching the existing router-client behavior that retries temperature errors at temperature=1.
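One possible shape for the retry is sketched below, assuming the helper chat path can be wrapped at the point where it is bound. Every name here (ChatCallError, is_temperature_rejection, with_temperature_retry, and the chat(messages, temperature) call signature) is hypothetical and is not taken from agent-eval or the router client. Only the 400 status and the error message come from the trace above.

```python
from typing import Any, Callable, Dict, List

Messages = List[Dict[str, str]]
ChatFn = Callable[[Messages, float], Any]


class ChatCallError(Exception):
    """Hypothetical error type carrying the provider's HTTP status and message."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"LLM call {status}: {message}")
        self.status = status
        self.message = message


def is_temperature_rejection(err: ChatCallError) -> bool:
    # Matches the message seen in the trace:
    # "invalid temperature: only 1 is allowed for this model"
    return err.status == 400 and "invalid temperature" in err.message.lower()


def with_temperature_retry(chat: ChatFn) -> ChatFn:
    """Wrap a chat function so a temperature rejection is retried once at 1."""

    def wrapped(messages: Messages, temperature: float) -> Any:
        try:
            return chat(messages, temperature)
        except ChatCallError as err:
            # Re-raise anything that is not a temperature rejection, and
            # avoid looping if the request was already sent at 1.
            if not is_temperature_rejection(err) or temperature == 1:
                raise
            return chat(messages, 1)

    return wrapped
```

Wrapping at bind time rather than at each call site would mean critique/analyst helpers inherit the behavior without having to know about the provider constraint.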
Guardrail
Do not hide these as scored model failures. Either retry with the provider-required temperature or surface a structured infra/model-adapter error so evals can mark the cell invalid instead of treating it as a bad policy.
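As a rough illustration of the second option, the sketch below keeps an adapter-level failure out of the scored results and marks the cell invalid instead. The names ModelAdapterError, record_row, and cell_status, and the record fields, are all assumptions made for this example and do not reflect the actual agent-eval result schema.

```python
from typing import Any, Callable, Dict, List


class ModelAdapterError(Exception):
    """Hypothetical structured error for provider/adapter failures."""

    def __init__(self, model: str, kind: str, provider_message: str) -> None:
        super().__init__(f"{kind} for {model}: {provider_message}")
        self.model = model
        self.kind = kind
        self.provider_message = provider_message


def record_row(run_row: Callable[[], float]) -> Dict[str, Any]:
    """Run one eval row and record it as scored or as an infra failure."""
    try:
        score = run_row()
    except ModelAdapterError as err:
        # Infra failure: no score is recorded, so it cannot count
        # against the policy being evaluated.
        return {
            "status": "invalid",
            "reason": err.kind,
            "model": err.model,
            "score": None,
        }
    return {"status": "scored", "score": score}


def cell_status(rows: List[Dict[str, Any]]) -> str:
    """A cell is invalid if any of its rows hit an infra failure."""
    if any(row["status"] == "invalid" for row in rows):
        return "invalid"
    return "valid"
```

Keeping the score as None rather than 0 is the design point: a rejected request says nothing about policy quality, so it should not enter any aggregate.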