feat(planner): ground EV math in observed usage + persist EV verdict as shadow telemetry #577
Open
aminsamir45 wants to merge 8 commits into
This stack of pull requests is managed by Graphite.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 69f31b0.
…as shadow telemetry

Tier 0 of the inter-turn routing optimization plan: make the planner's inputs and outputs trustworthy before any switching policy is armed.

1. Observed-token grounding: the EV math priced the prefix with feats.Tokens, a text-only char/4 estimate that misses tool schemas, tool results, and images, badly undercounting long agent sessions. planner.Inputs gains ObservedInputTokens (the previous turn's billed input + cache-read + cache-creation tokens from the pin's usage writeback), and Decide prices tokens as max(estimate, observed): because an agent prefix grows monotonically, the prior turn's total is a hard floor for this turn.

2. Planner EV shadow telemetry: the STAY/SWITCH verdict, reason, expected savings / eviction cost / threshold (USD), warmth assumption, and pinned from-model were recorded only on OTel spans, which age out and don't join to session_pins or billed cost. Migration 0032 adds nullable planner_* columns to model_router_request_telemetry (and recreates the production view), so every stay/switch counterfactual is replayable offline before policy changes ship.

No routing behavior changes: token grounding only feeds the existing EV comparison, and the telemetry is write-only.
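A minimal sketch of the max(estimate, observed) floor, assuming a trimmed-down stand-in for planner.Inputs. The `Inputs` struct, its `EstimatedTokens` field, and `pricedTokens` are hypothetical names for illustration, not the PR's actual code.

```go
package main

import "fmt"

// Inputs is a hypothetical, trimmed-down stand-in for planner.Inputs.
// Only the two token counts that feed EV pricing are modeled here.
type Inputs struct {
	EstimatedTokens     int // text-only char/4 estimate (what feats.Tokens provides)
	ObservedInputTokens int // prior turn's billed prompt size from usage writeback; 0 if none
}

// pricedTokens returns the prefix size the EV math should price.
// Because an agent prefix only grows between turns, the prior turn's
// billed total is a lower bound, so the larger of the two values wins.
func pricedTokens(in Inputs) int {
	if in.ObservedInputTokens > in.EstimatedTokens {
		return in.ObservedInputTokens
	}
	return in.EstimatedTokens
}

func main() {
	// Long agent session: tool schemas and results make the char/4
	// estimate far smaller than what the provider actually billed.
	fmt.Println(pricedTokens(Inputs{EstimatedTokens: 1200, ObservedInputTokens: 48000})) // 48000
	// No prior usage yet: fall back to the estimate.
	fmt.Println(pricedTokens(Inputs{EstimatedTokens: 900})) // 900
}
```

Taking the maximum rather than replacing the estimate outright keeps the first turn of a session (no usage writeback yet) priced by the estimate alone.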
Fixed:
- ObservedInputTokens now respects provider usage semantics (new observedPromptTokens helper, mirroring catalog.EffectiveInputCost): Anthropic input_tokens counts only fresh tokens, so cache tokens are added; OpenAI-shape and Gemini prompt counts already include cached tokens, so summing them double-counted the prefix in the EV math.
- applyPlannerTelemetry persists the USD/warmth columns only when the EV math actually ran (new planner.Decision.EVComputed flag); early-return reasons (same_model, no_prior_usage, ...) keep outcome/reason/pin_model but leave the EV columns NULL instead of writing zeros that look like measurements.
- The 0032 down migration recreates production_request_telemetry with the explicit 0028-era column list instead of SELECT *, which had frozen api_key_id (added in 0031) into the view and broke 0031's down migration (caught by the CI roll-down test). Verified locally with a full migrate up / down -all / up round trip.
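The provider distinction behind observedPromptTokens can be sketched as follows. `Usage`, its fields, the provider strings, and this function body are hypothetical placeholders; the real helper (which mirrors catalog.EffectiveInputCost) may be shaped differently.

```go
package main

import "fmt"

// Usage is a hypothetical normalized view of a provider usage payload.
type Usage struct {
	Provider            string // e.g. "anthropic", "openai", "gemini"
	InputTokens         int    // provider's reported prompt/input count
	CacheReadTokens     int
	CacheCreationTokens int
}

// observedPromptTokens returns the full prompt size for the turn.
// Anthropic's input_tokens counts only fresh (uncached) tokens, so cache
// reads and cache writes are added back. OpenAI-shape and Gemini prompt
// counts already include cached tokens, so adding the cache fields again
// would double-count the prefix.
func observedPromptTokens(u Usage) int {
	if u.Provider == "anthropic" {
		return u.InputTokens + u.CacheReadTokens + u.CacheCreationTokens
	}
	return u.InputTokens
}

func main() {
	a := Usage{Provider: "anthropic", InputTokens: 500, CacheReadTokens: 40000, CacheCreationTokens: 2000}
	o := Usage{Provider: "openai", InputTokens: 42500, CacheReadTokens: 40000}
	fmt.Println(observedPromptTokens(a)) // 42500
	fmt.Println(observedPromptTokens(o)) // 42500, not 82500
}
```

Both calls describe the same 42,500-token prompt; only the Anthropic branch needs the cache fields summed in.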
res.PinModel can carry the model of a pin that a turn-loop guard dropped after the lookup (maxed-out output, context overflow, provider eligibility, image constraint), so a no_pin verdict would persist a pin the planner never weighed. Gate the column on the reason, and document the NULL semantics in the migration comment.
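The NULL rules for the planner_* columns can be sketched with database/sql nullable types. `Decision`, `PlannerColumns`, and `fillPlannerColumns` are hypothetical stand-ins (not planner.Decision or applyPlannerTelemetry themselves), and the sketch assumes no_pin is the only reason that needs the pin-model gate.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Decision is a hypothetical stand-in for planner.Decision.
type Decision struct {
	Outcome         string  // "stay" or "switch"
	Reason          string  // e.g. "same_model", "no_prior_usage", "no_pin"
	EVComputed      bool    // true only when the EV comparison actually ran
	ExpectedSavings float64 // USD
	EvictionCost    float64 // USD
	Threshold       float64 // USD
	AssumedWarm     bool
}

// PlannerColumns mirrors nullable planner_* columns; field names are
// illustrative, not the migration's real column names.
type PlannerColumns struct {
	Outcome         sql.NullString
	Reason          sql.NullString
	PinModel        sql.NullString
	ExpectedSavings sql.NullFloat64
	EvictionCost    sql.NullFloat64
	Threshold       sql.NullFloat64
	AssumedWarm     sql.NullBool
}

// fillPlannerColumns is a hypothetical helper showing which columns stay NULL.
func fillPlannerColumns(d Decision, pinModel string) PlannerColumns {
	cols := PlannerColumns{
		Outcome: sql.NullString{String: d.Outcome, Valid: true},
		Reason:  sql.NullString{String: d.Reason, Valid: true},
	}
	// On a no_pin verdict the pin may have been dropped by a turn-loop
	// guard after lookup, so the planner never weighed it: keep NULL.
	if d.Reason != "no_pin" && pinModel != "" {
		cols.PinModel = sql.NullString{String: pinModel, Valid: true}
	}
	// Early-return reasons skip the EV math: NULL instead of zeros.
	if d.EVComputed {
		cols.ExpectedSavings = sql.NullFloat64{Float64: d.ExpectedSavings, Valid: true}
		cols.EvictionCost = sql.NullFloat64{Float64: d.EvictionCost, Valid: true}
		cols.Threshold = sql.NullFloat64{Float64: d.Threshold, Valid: true}
		cols.AssumedWarm = sql.NullBool{Bool: d.AssumedWarm, Valid: true}
	}
	return cols
}

func main() {
	early := fillPlannerColumns(Decision{Outcome: "stay", Reason: "same_model"}, "model-a")
	fmt.Println(early.PinModel.Valid, early.ExpectedSavings.Valid) // true false

	noPin := fillPlannerColumns(Decision{Outcome: "stay", Reason: "no_pin"}, "model-a")
	fmt.Println(noPin.PinModel.Valid) // false
}
```

Using `Valid: false` rather than zero values keeps "the EV math did not run" distinguishable from "the EV math ran and computed zero" in offline replays.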
Main landed 0032_autopay-config while this was in review; golang-migrate rejects duplicate sequence numbers. Autopay does not touch the telemetry table or the production view, so the explicit pre-0031 column list in the down migration is unchanged.
Compaction turns are hard-pinned and skip usage writeback, so after a client compaction the pin still carries the pre-compaction billed size. Flooring the EV tokens with that stale value would price a collapsed transcript as a large switch opportunity. A transcript with fewer than compactionMinHistoryMessages messages is either fresh or just compacted, so the floor applies only at or above that threshold.
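A sketch of the compaction guard on top of the token floor. The constant name compactionMinHistoryMessages comes from the commit message, but the value 40 is a made-up placeholder, and `floorTokens` is a hypothetical helper.

```go
package main

import "fmt"

// compactionMinHistoryMessages: name taken from the commit message;
// the value here is a placeholder, not the real threshold.
const compactionMinHistoryMessages = 40

// floorTokens applies the observed-usage floor only when the transcript
// is long enough that it cannot be fresh or just compacted.
func floorTokens(estimate, observed, historyMessages int) int {
	// Below the threshold the pin's observed size may predate a
	// compaction, so trust the estimate alone.
	if historyMessages < compactionMinHistoryMessages {
		return estimate
	}
	if observed > estimate {
		return observed
	}
	return estimate
}

func main() {
	// Just-compacted transcript: the stale 180k billed size is ignored.
	fmt.Println(floorTokens(3000, 180000, 4)) // 3000
	// Long, uncompacted transcript: the observed floor applies.
	fmt.Println(floorTokens(30000, 180000, 60)) // 180000
}
```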
Co-authored-by: Cursor <cursoragent@cursor.com>