feat(planner): ground EV math in observed usage + persist EV verdict as shadow telemetry#577

Open
aminsamir45 wants to merge 8 commits into main from tier0-planner-observed-ev-telemetry

@aminsamir45

Tier 0 of the inter-turn routing optimization plan: make the planner's
inputs and outputs trustworthy before any switching policy is armed.

  1. Observed-token grounding: the EV math priced the prefix with
    feats.Tokens, a text-only char/4 estimate that misses tool schemas,
    tool results, and images — badly undercounting long agent sessions.
    planner.Inputs gains ObservedInputTokens (the previous turn's billed
    input + cache-read + cache-creation tokens from the pin's usage
    writeback) and Decide prices tokens as max(estimate, observed): a
    monotonically growing agent prefix makes the prior turn's total a
    hard floor for this turn.

  2. Planner EV shadow telemetry: the STAY/SWITCH verdict, reason,
    expected savings / eviction cost / threshold (USD), warmth
    assumption, and pinned from-model were only on OTel spans, which age
    out and don't join to session_pins or billed cost. Migration 0032
    adds nullable planner_* columns to model_router_request_telemetry
    (and recreates the production view), so every stay/switch
    counterfactual is replayable offline before policy changes ship.

No routing behavior changes: token grounding only feeds the existing EV
comparison, and the telemetry is write-only.
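
A minimal sketch of the grounding rule from item 1, for illustration only. The `Inputs` struct below is a trimmed, hypothetical stand-in for `planner.Inputs`, and `groundedTokens` is an assumed helper name; the real `Decide` may structure this differently.

```go
package main

import "fmt"

// Inputs is a hypothetical, trimmed stand-in for planner.Inputs.
type Inputs struct {
	EstimatedTokens     int // text-only char/4 estimate (feats.Tokens in the PR)
	ObservedInputTokens int // previous turn's billed prompt size; 0 when unknown
}

// groundedTokens returns the token count the EV math prices:
// max(estimate, observed). Because an agent prefix only grows, the
// prior turn's billed total acts as a floor for the current turn.
func groundedTokens(in Inputs) int {
	if in.ObservedInputTokens > in.EstimatedTokens {
		return in.ObservedInputTokens
	}
	return in.EstimatedTokens
}

func main() {
	// The text-only estimate undercounts a tool-heavy session.
	in := Inputs{EstimatedTokens: 12000, ObservedInputTokens: 48500}
	fmt.Println(groundedTokens(in)) // 48500

	// With no prior usage, the estimate is used unchanged.
	fmt.Println(groundedTokens(Inputs{EstimatedTokens: 9000})) // 9000
}
```

Taking the maximum rather than replacing the estimate means a first turn with no usage writeback still prices from the estimate.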

aminsamir45 commented Jul 2, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

Comment thread internal/proxy/turnloop.go Outdated
Comment thread internal/proxy/service.go Outdated
@greptile-apps

greptile-apps Bot commented Jul 2, 2026

What T-Rex did

  • Ran the token-grounding tests on both the base and head commits, verified exit codes and the behavior changes in the observed-savings and undercount calculations, and checked the source to confirm how ObservedInputTokens is computed.
  • Inspected the planner telemetry schema changes: before the change there were no planner_* columns, so an insert would fail on the missing column; after it, the planner_* columns exist as nullable and a planner row could be read back.
  • Executed the planner telemetry tests: no applicable tests existed before the change; after it, TestApplyPlannerTelemetry_SkippedLeavesFieldsNull, TestApplyPlannerTelemetry_StayRecordsEVBreakdown, and TestApplyPlannerTelemetry_SwitchPreservesFromModel all passed with exit code 0. Also validated the STAY and SWITCH value mappings.

Reviews (1): Last reviewed commit: "feat(planner): ground EV math in observe..."

Comment thread internal/proxy/service.go Outdated
@aminsamir45 aminsamir45 force-pushed the tier0-planner-observed-ev-telemetry branch from 84bd994 to 195f4f4 Compare July 2, 2026 21:03

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 69f31b0.

Comment thread internal/proxy/turnloop.go
@aminsamir45 aminsamir45 closed this Jul 2, 2026
@aminsamir45 aminsamir45 reopened this Jul 2, 2026
Fixed:
- ObservedInputTokens now respects provider usage semantics (new
  observedPromptTokens helper, mirroring catalog.EffectiveInputCost):
  Anthropic input_tokens is fresh-only so cache tokens are added; OpenAI-shape
  and Gemini prompt counts already include cached tokens, so summing them
  double-counted the prefix in the EV math.
- applyPlannerTelemetry persists the USD/warmth columns only when the EV math
  actually ran (new planner.Decision.EVComputed flag); early-return reasons
  (same_model, no_prior_usage, ...) keep outcome/reason/pin_model but leave
  the EV columns NULL instead of measured-looking zeros.
- 0032 down migration recreates production_request_telemetry with the
  explicit 0028-era column list instead of SELECT *, which froze in
  api_key_id (0031) and broke 0031's down (CI roll-down test). Verified
  locally with a full migrate up / down -all / up roundtrip.
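
The provider-specific counting in the first fix can be sketched as follows. This is an illustration under assumptions: the `Usage` struct, the provider strings, and the signature of `observedPromptTokens` are hypothetical; only the semantics (Anthropic reports fresh-only input, OpenAI-shape and Gemini prompt counts already include cached tokens) come from the commit message.

```go
package main

import "fmt"

// Usage is a hypothetical normalized view of a provider's usage payload.
type Usage struct {
	InputTokens         int // provider-reported input/prompt tokens
	CacheReadTokens     int // tokens served from the prompt cache
	CacheCreationTokens int // tokens written to the prompt cache
}

// observedPromptTokens returns the full prompt size for the previous turn.
// The signature is assumed; the real helper may key on a different type.
func observedPromptTokens(provider string, u Usage) int {
	switch provider {
	case "anthropic":
		// input_tokens is fresh-only, so cache tokens must be added.
		return u.InputTokens + u.CacheReadTokens + u.CacheCreationTokens
	default:
		// OpenAI-shape and Gemini prompt counts already include cached
		// tokens; adding them again would double-count the prefix.
		return u.InputTokens
	}
}

func main() {
	anthropic := Usage{InputTokens: 500, CacheReadTokens: 40000, CacheCreationTokens: 1500}
	openai := Usage{InputTokens: 42000, CacheReadTokens: 40000}

	fmt.Println(observedPromptTokens("anthropic", anthropic)) // 42000
	fmt.Println(observedPromptTokens("openai", openai))       // 42000
}
```

Both calls describe the same 42,000-token prompt, which is the point of the fix: summing the cache fields for the OpenAI-shape payload would have reported 82,000.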
res.PinModel can carry the model of a pin a turn-loop guard dropped after
the lookup (maxed-out output, context overflow, provider eligibility,
image constraint), so a no_pin verdict would persist a pin the planner
never weighed. Gate the pin_model column on the reason and document the
NULL semantics in the migration comment.
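
A sketch of the NULL semantics described in the last two commit messages, assuming nil pointers map to SQL NULL. The struct shapes, field names, and the exact signature of `applyPlannerTelemetry` are hypothetical; only the `EVComputed` flag and the `no_pin` gating come from the text above.

```go
package main

import "fmt"

// Decision is a hypothetical, trimmed stand-in for planner.Decision.
type Decision struct {
	Outcome            string // "STAY" or "SWITCH"
	Reason             string // e.g. "same_model", "no_prior_usage", "no_pin"
	EVComputed         bool   // true only when the EV comparison actually ran
	ExpectedSavingsUSD float64
	EvictionCostUSD    float64
	ThresholdUSD       float64
}

// TelemetryRow models a few nullable planner_* columns; nil means NULL.
// Field names are illustrative, not the migration's column names.
type TelemetryRow struct {
	PlannerOutcome            *string
	PlannerReason             *string
	PlannerPinModel           *string
	PlannerExpectedSavingsUSD *float64
	PlannerEvictionCostUSD    *float64
	PlannerThresholdUSD       *float64
}

// applyPlannerTelemetry copies a decision into the row (assumed signature).
func applyPlannerTelemetry(row *TelemetryRow, d Decision, pinModel string) {
	row.PlannerOutcome = &d.Outcome
	row.PlannerReason = &d.Reason

	// A no_pin verdict never weighed a pin, so do not persist one even if
	// the lookup result still carries a model name.
	if d.Reason != "no_pin" && pinModel != "" {
		row.PlannerPinModel = &pinModel
	}

	// Early-return reasons leave the USD columns NULL rather than writing
	// zeros that would look like measured values.
	if d.EVComputed {
		row.PlannerExpectedSavingsUSD = &d.ExpectedSavingsUSD
		row.PlannerEvictionCostUSD = &d.EvictionCostUSD
		row.PlannerThresholdUSD = &d.ThresholdUSD
	}
}

func main() {
	var row TelemetryRow
	// Illustrative values only.
	applyPlannerTelemetry(&row, Decision{Outcome: "STAY", Reason: "no_pin"}, "model-a")
	fmt.Println(row.PlannerPinModel == nil, row.PlannerExpectedSavingsUSD == nil) // true true
}
```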
Main landed 0032_autopay-config while this was in review; golang-migrate
rejects duplicate sequence numbers. Autopay does not touch the telemetry
table or the production view, so the explicit pre-0031 column list in
the down migration is unchanged.
@greptile-apps

greptile-apps Bot commented Jul 2, 2026

What T-Rex did

  • The baseline token grounding was captured before the change from trex-artifacts/observed-token-grounding-01-before.txt.
  • The after token grounding state was captured after the change from trex-artifacts/observed-token-grounding-02-after.txt, including updated EV, token counts, and EVComputed status.
  • A proxy test verified Anthropic and OpenAI/OpenRouter token counts after the change, including observed_prompt_tokens and transcript counts.
  • Evidence artifacts confirmed the attempted before run and the after/tooling blocker check, and noted that no contract mismatch occurred because the changed schema path was not executable in this environment.
  • Planner telemetry tests ran, including the before/after telemetry writeback artifact and null semantics checks for multiple fields.

Reviews (2): Last reviewed commit: "chore: retrigger CI (Test/Check Migratio..."

aminsamir45 and others added 3 commits July 2, 2026 15:48
Compaction turns are hard-pinned and skip usage writeback, so after a
client compaction the pin still carries the pre-compaction billed size.
Flooring the EV tokens with that stale value would price a collapsed
transcript as a large switch opportunity. A transcript below
compactionMinHistoryMessages is either fresh or just-compacted, so the
floor only applies at or above it.
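
The compaction guard can be sketched as below. The constant's value and the `evTokens` helper are assumptions for illustration; the commit message only states that the observed floor applies at or above `compactionMinHistoryMessages`.

```go
package main

import "fmt"

// compactionMinHistoryMessages is named in the commit message; the value
// here is an assumed placeholder, not the real threshold.
const compactionMinHistoryMessages = 6

// evTokens picks the token count to price (hypothetical helper).
// Short transcripts are either fresh or just-compacted, so a stale
// pre-compaction observed total must not be used as a floor for them.
func evTokens(estimate, observed, historyMessages int) int {
	if historyMessages < compactionMinHistoryMessages {
		return estimate
	}
	if observed > estimate {
		return observed
	}
	return estimate
}

func main() {
	// Just-compacted: 3 messages, but the pin still remembers 90k tokens.
	fmt.Println(evTokens(4000, 90000, 3)) // 4000

	// Long-running session: the observed floor applies.
	fmt.Println(evTokens(4000, 90000, 40)) // 90000
}
```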
@aminsamir45 aminsamir45 force-pushed the tier0-planner-observed-ev-telemetry branch from 6c570fd to ab347ab Compare July 2, 2026 22:50