feat(planner): ground EV math in observed usage + persist EV verdict as shadow telemetry by aminsamir45 · Pull Request #577 · workweave/router

aminsamir45 · 2026-07-02T20:20:08Z

Tier 0 of the inter-turn routing optimization plan: make the planner's
inputs and outputs trustworthy before any switching policy is armed.

1. Observed-token grounding: the EV math priced the prefix with
   feats.Tokens, a text-only char/4 estimate that misses tool schemas,
   tool results, and images, badly undercounting long agent sessions.
   planner.Inputs gains ObservedInputTokens (the previous turn's billed
   input + cache-read + cache-creation tokens from the pin's usage
   writeback) and Decide prices tokens as max(estimate, observed): a
   monotonically growing agent prefix makes the prior turn's total a
   hard floor for this turn.
2. Planner EV shadow telemetry: the STAY/SWITCH verdict, reason,
   expected savings / eviction cost / threshold (USD), warmth
   assumption, and pinned from-model were only on OTel spans, which age
   out and don't join to session_pins or billed cost. Migration 0032
   adds nullable planner_* columns to model_router_request_telemetry
   (and recreates the production view), so every stay/switch
   counterfactual is replayable offline before policy changes ship.
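The max(estimate, observed) rule can be sketched in a few lines of Go. This is a minimal illustration, not the PR's code: the `Inputs` struct here is a hypothetical stand-in for planner.Inputs, and `EstimatedTokens` and `groundedTokens` are names invented for the example.

```go
package main

import "fmt"

// Inputs is a hypothetical stand-in for planner.Inputs; only the two
// fields relevant to token grounding are modeled.
type Inputs struct {
	EstimatedTokens     int // text-only char/4 estimate (feats.Tokens in the PR)
	ObservedInputTokens int // previous turn's billed prompt size; 0 when unknown
}

// groundedTokens returns the token count the EV math would price:
// the larger of the estimate and the observed value, so the prior
// turn's billed total acts as a floor.
func groundedTokens(in Inputs) int {
	if in.ObservedInputTokens > in.EstimatedTokens {
		return in.ObservedInputTokens
	}
	return in.EstimatedTokens
}

func main() {
	// Long agent session: the text-only estimate misses tool schemas and results.
	long := Inputs{EstimatedTokens: 12000, ObservedInputTokens: 85000}
	fmt.Println(groundedTokens(long)) // 85000: the observed floor wins

	// First turn: nothing observed yet, so the estimate is used as-is.
	fmt.Println(groundedTokens(Inputs{EstimatedTokens: 3000})) // 3000
}
```

Because an unknown observation is modeled as 0, the floor is a no-op on the first turn and never lowers the estimate.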

No routing behavior changes: token grounding only feeds the existing EV
comparison, and the telemetry is write-only.

greptile-apps · 2026-07-02T20:27:31Z

T-Rex Logs

What T-Rex did

Ran the token-grounding tests on both the base and the head, verified exit codes and the behavior changes in the observed-savings and undercount calculations, and checked the source to confirm how ObservedInputTokens is computed.
Inspected the planner telemetry schema changes: before, the table had zero planner_* columns and an insert would fail on the missing column; after, the planner_* columns exist as nullable and a planner row could be read back.
Executed the planner telemetry tests: before, no applicable tests existed; after, TestApplyPlannerTelemetry_SkippedLeavesFieldsNull, TestApplyPlannerTelemetry_StayRecordsEVBreakdown, and TestApplyPlannerTelemetry_SwitchPreservesFromModel all passed with exit code 0. Also validated the STAY and SWITCH value mappings.

Ran code and verified through T-Rex

Reviews (1): Last reviewed commit: "feat(planner): ground EV math in observe..."

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 69f31b0.

Fixed:

- ObservedInputTokens now respects provider usage semantics (new observedPromptTokens helper, mirroring catalog.EffectiveInputCost): Anthropic input_tokens is fresh-only, so cache tokens are added; OpenAI-shape and Gemini prompt counts already include cached tokens, so summing them double-counted the prefix in the EV math.
- applyPlannerTelemetry persists the USD/warmth columns only when the EV math actually ran (new planner.Decision.EVComputed flag); early-return reasons (same_model, no_prior_usage, ...) keep outcome/reason/pin_model but leave the EV columns NULL instead of measured-looking zeros.
- 0032 down migration recreates production_request_telemetry with the explicit 0028-era column list instead of SELECT *, which froze in api_key_id (0031) and broke 0031's down (CI roll-down test).

Verified locally with a full migrate up / down -all / up roundtrip.
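The provider-specific counting described in the first fix might look roughly like the sketch below. The helper name observedPromptTokens comes from the commit message, but its signature, the `Usage` struct, and the `Provider` constants are assumptions made for illustration.

```go
package main

import "fmt"

// Usage is a hypothetical, provider-neutral view of one turn's billed usage.
type Usage struct {
	InputTokens         int // provider-reported prompt/input token count
	CacheReadTokens     int
	CacheCreationTokens int
}

// Provider values are illustrative, not the router's real identifiers.
type Provider string

const (
	ProviderAnthropic Provider = "anthropic"
	ProviderOpenAI    Provider = "openai"
	ProviderGemini    Provider = "gemini"
)

// observedPromptTokens returns the full prompt size for a turn.
// Anthropic reports input_tokens as fresh-only, so cache tokens are added.
// OpenAI-shape and Gemini prompt counts already include cached tokens, so
// adding them again would double-count the prefix.
func observedPromptTokens(p Provider, u Usage) int {
	if p == ProviderAnthropic {
		return u.InputTokens + u.CacheReadTokens + u.CacheCreationTokens
	}
	return u.InputTokens
}

func main() {
	anthropic := Usage{InputTokens: 2000, CacheReadTokens: 80000, CacheCreationTokens: 3000}
	fmt.Println(observedPromptTokens(ProviderAnthropic, anthropic)) // 85000

	openai := Usage{InputTokens: 85000, CacheReadTokens: 80000}
	fmt.Println(observedPromptTokens(ProviderOpenAI, openai)) // 85000, not 165000
}
```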

res.PinModel can carry the model of a pin a turn-loop guard dropped after the lookup (maxed-out output, context overflow, provider eligibility, image constraint), so a no_pin verdict would persist a pin the planner never weighed. Gate the column on the reason and document the NULL semantics on the migration comment.
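A rough sketch of the resulting NULL semantics follows, using Go pointers to stand in for nullable columns. The struct fields, the function signature, and the model name are hypothetical; only the EVComputed flag, the reason strings (same_model, no_pin), and the applyPlannerTelemetry name appear in the commit messages, and pairing same_model with a STAY outcome is an assumption.

```go
package main

import "fmt"

// Decision is a hypothetical subset of planner.Decision.
type Decision struct {
	Outcome            string // "STAY" or "SWITCH"
	Reason             string // e.g. "same_model", "no_prior_usage", "no_pin"
	EVComputed         bool   // true only when the EV comparison actually ran
	ExpectedSavingsUSD float64
	EvictionCostUSD    float64
	ThresholdUSD       float64
}

// TelemetryRow models the nullable planner_* columns; nil means NULL.
// Field names are illustrative, not the migration's column names.
type TelemetryRow struct {
	PlannerOutcome            *string
	PlannerReason             *string
	PlannerPinModel           *string
	PlannerExpectedSavingsUSD *float64
	PlannerEvictionCostUSD    *float64
	PlannerThresholdUSD       *float64
}

// applyPlannerTelemetry always records outcome and reason, records the pin
// model only when the planner actually weighed a pin, and records the USD
// columns only when the EV math ran.
func applyPlannerTelemetry(row *TelemetryRow, d Decision, pinModel string) {
	row.PlannerOutcome = &d.Outcome
	row.PlannerReason = &d.Reason
	if d.Reason != "no_pin" && pinModel != "" {
		row.PlannerPinModel = &pinModel
	}
	if d.EVComputed {
		row.PlannerExpectedSavingsUSD = &d.ExpectedSavingsUSD
		row.PlannerEvictionCostUSD = &d.EvictionCostUSD
		row.PlannerThresholdUSD = &d.ThresholdUSD
	}
}

func main() {
	var early TelemetryRow
	applyPlannerTelemetry(&early, Decision{Outcome: "STAY", Reason: "same_model"}, "model-a")
	fmt.Println(early.PlannerExpectedSavingsUSD == nil) // true: EV columns stay NULL

	var dropped TelemetryRow
	applyPlannerTelemetry(&dropped, Decision{Outcome: "STAY", Reason: "no_pin"}, "model-a")
	fmt.Println(dropped.PlannerPinModel == nil) // true: the dropped pin is not persisted
}
```

Leaving the EV columns nil rather than zero keeps offline replays from mistaking an early return for a measured zero-savings decision.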

Main landed 0032_autopay-config while this was in review; golang-migrate rejects duplicate sequence numbers. Autopay does not touch the telemetry table or the production view, so the explicit pre-0031 column list in the down migration is unchanged.

greptile-apps · 2026-07-02T22:48:46Z

T-Rex Logs

What T-Rex did

The baseline token grounding was captured before the change from trex-artifacts/observed-token-grounding-01-before.txt.
The after token grounding state was captured after the change from trex-artifacts/observed-token-grounding-02-after.txt, including updated EV, token counts, and EVComputed status.
A proxy test verified Anthropic and OpenAI/OpenRouter token counts after the change, including observed_prompt_tokens and transcript counts.
Evidence artifacts confirmed the attempted before run and the after/tooling blocker check, and noted that no contract mismatch occurred because the changed schema path was not executable in this environment.
Planner telemetry tests ran, including the before/after telemetry writeback artifact and null semantics checks for multiple fields.

Ran code and verified through T-Rex

Reviews (2): Last reviewed commit: "chore: retrigger CI (Test/Check Migratio..."

Compaction turns are hard-pinned and skip usage writeback, so after a client compaction the pin still carries the pre-compaction billed size. Flooring the EV tokens with that stale value would price a collapsed transcript as a large switch opportunity. A transcript below compactionMinHistoryMessages is either fresh or just-compacted, so the floor only applies at or above it.
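The compaction guard could be sketched as a gate in front of the observed floor. The constant name compactionMinHistoryMessages is taken from the commit message; its value here, the `observedFloor` helper, and its parameters are assumptions for illustration only.

```go
package main

import "fmt"

// compactionMinHistoryMessages is named in the commit message; the value 6
// is a placeholder, not the router's real threshold.
const compactionMinHistoryMessages = 6

// observedFloor returns the floor to apply to the EV token estimate.
// A transcript shorter than the threshold is either fresh or just compacted,
// so the pin's (possibly stale) billed size is ignored and no floor applies.
func observedFloor(historyMessages, observedTokens int) int {
	if historyMessages < compactionMinHistoryMessages {
		return 0
	}
	return observedTokens
}

func main() {
	// Just-compacted session: the pin still carries the pre-compaction size.
	fmt.Println(observedFloor(2, 85000)) // 0: stale value is not used as a floor

	// Long-running session at or above the threshold: the floor applies.
	fmt.Println(observedFloor(6, 85000)) // 85000
}
```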

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jul 2, 2026

Comment thread internal/proxy/turnloop.go Outdated

Comment thread internal/proxy/service.go Outdated

aminsamir45 mentioned this pull request Jul 2, 2026

Treat client history trims as free-switch windows and add cold-pin follow-fresh lever #578

Merged

3 tasks

cursor Bot reviewed Jul 2, 2026

Comment thread internal/proxy/service.go Outdated

aminsamir45 force-pushed the tier0-planner-observed-ev-telemetry branch from 84bd994 to 195f4f4 Compare July 2, 2026 21:03

cursor Bot reviewed Jul 2, 2026

Comment thread internal/proxy/turnloop.go

aminsamir45 closed this Jul 2, 2026

aminsamir45 reopened this Jul 2, 2026

aminsamir45 added 5 commits July 2, 2026 15:48

chore: regenerate sqlc after 0032 column-comment update

4f5df42

aminsamir45 and others added 3 commits July 2, 2026 15:48

style: trim verbose comments to essentials

59432f9

chore: retrigger CI (Test/Check Migrations did not start on prior push)

ab347ab

Co-authored-by: Cursor <cursoragent@cursor.com>

aminsamir45 force-pushed the tier0-planner-observed-ev-telemetry branch from 6c570fd to ab347ab Compare July 2, 2026 22:50

aminsamir45 wants to merge 8 commits into main from tier0-planner-observed-ev-telemetry
