feat: versioned snapshot refs with MCP auto-pinning by thymikee · Pull Request #1096 · callstack/agent-device

thymikee · 2026-07-04T15:24:44Z

Follow-up to #1093 (merged), closing the loop on #1076: the coarse snapshotRefsStale marker warns honestly but cannot say which tree a ref came from. This PR makes ref provenance a first-class, verifiable fact — without spending a single extra token on the artifact agents actually read.

Design

Token-economy constraint (non-negotiable): snapshots are the most token-expensive artifact agents consume, so the tree output stays byte-identical — plain e12 refs on every node, no per-node growth.

Generation counter, seeded per lifetime. Each daemon session carries a monotonically increasing snapshotGeneration, advanced wherever the stored tree is replaced: the setSessionSnapshot choke point (#1093's inventory of write sites: selector captures, verify-evidence captures, Android freshness refreshes, replay-heal recaptures, overlay captures) plus the snapshot/diff command path that bypasses it (buildNextSnapshotSession). The first bump of a session lifetime seeds at a random 6-digit base (crypto.randomInt) instead of 1. A reopened session restarts its counter, so per-lifetime counts starting at 1 would let a stale @e1~s1 pin from the previous lifetime silently read as current. With seeding, cross-lifetime collisions are ~1e-6; this protection is probabilistic (seeded), not identity-based (documented on the field). Within a lifetime the counter is strictly monotonic, so pinned-vs-current comparisons are exact. No persistence.
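A minimal sketch of the per-lifetime counter described above, assuming one object per session lifetime. The PR only names the field (snapshotGeneration) and the seeding call (crypto.randomInt); the class name SessionGeneration and its members are hypothetical.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical per-lifetime generation counter. The PR names the session field
// snapshotGeneration; this class and its members are illustrative only.
class SessionGeneration {
  private current: number | undefined = undefined;

  // Call wherever the stored tree is replaced (e.g. the setSessionSnapshot
  // choke point and the snapshot/diff path). Returns the new generation.
  bump(): number {
    if (this.current === undefined) {
      // First bump of this lifetime: random 6-digit seed in [100000, 999999]
      // (randomInt's upper bound is exclusive).
      this.current = randomInt(100_000, 1_000_000);
    } else {
      // Within a lifetime the counter is strictly monotonic.
      this.current += 1;
    }
    return this.current;
  }

  // Undefined until the first tree replacement of this lifetime.
  get value(): number | undefined {
    return this.current;
  }
}

// A reopened session is a new lifetime: a fresh instance and a fresh seed.
const lifetime = new SessionGeneration();
const first = lifetime.bump();
const second = lifetime.bump(); // always first + 1
```

Because nothing is persisted, reopening simply constructs a new counter; the random seed, not stored identity, is what keeps an old pin from matching.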
Generation rides ref-issuing responses ONCE. The same sites where #1093 clears the stale marker (the snapshot command response and find ref outputs) carry the additive refsGeneration: <n> field. One number per response; the digest response view preserves it.
Suffix as accepted input. Ref-consuming commands (press/click/fill/longpress/get/wait @ref) accept both forms:
- plain @e37: exactly today's behavior, including #1093's coarse warning;
- pinned @e37~s<n> — generation matches the stored tree → clean (the pin proves the ref's provenance, overruling the coarse marker); generation differs → precise warning: Ref @e37 was minted from snapshot s<minted> but the session tree is now s<current> — re-run snapshot -i.; malformed suffix → INVALID_ARGS with a grammar hint.
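The three outcomes for a ref-consuming command can be sketched as a single resolver. This is an illustration under assumptions: checkRef and RefCheck are hypothetical names, the coarse-warning text is a placeholder (the real #1093 wording is not quoted in this PR), and the stale-pin message only approximates the warning quoted above.

```typescript
// Hypothetical resolver for the rules above; names and shapes are illustrative.
type RefCheck =
  | { kind: "clean" }
  | { kind: "pinned-stale"; warning: string }
  | { kind: "coarse"; warning: string };

function checkRef(
  refBody: string, // e.g. "e37"
  mintedGeneration: number | undefined, // parsed from ~s<n>; undefined for plain refs
  currentGeneration: number, // generation of the stored session tree
  coarseStale: boolean, // #1093's snapshotRefsStale marker
): RefCheck {
  if (mintedGeneration === undefined) {
    // Plain ref: unchanged behavior, only the coarse marker applies.
    // Placeholder wording; not the real #1093 warning text.
    return coarseStale
      ? { kind: "coarse", warning: `Ref @${refBody} may predate the current snapshot.` }
      : { kind: "clean" };
  }
  if (mintedGeneration === currentGeneration) {
    // The pin proves provenance, so it overrules the coarse marker.
    return { kind: "clean" };
  }
  // Warn-only this release: the caller still executes the command.
  return {
    kind: "pinned-stale",
    warning:
      `Ref @${refBody} was minted from snapshot s${mintedGeneration} ` +
      `but the session tree is now s${currentGeneration}; re-run snapshot -i.`,
  };
}
```

Note that a matching pin returns clean even when coarseStale is true; that is the "pin overrules the coarse marker" rule.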
The suffix is split off at the daemon parse boundary, so runtimes, backends, fast paths, and recording only ever see plain refs. normalizeRef strips it kernel-wide as defense in depth (snapshot -s @ref scopes included).
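Splitting at the parse boundary might look roughly like the following. splitRefPin, ParsedRef, and InvalidArgsError are hypothetical names, and the exact grammar (a single `~` followed by `s` and one or more digits) is an assumption inferred from the `~s<n>` form described in this PR.

```typescript
// Hypothetical parse-boundary helper: split an optional ~s<n> pin off a ref
// token so everything downstream only ever sees the plain ref.
class InvalidArgsError extends Error {
  readonly code = "INVALID_ARGS";
}

interface ParsedRef {
  ref: string; // plain form, e.g. "@e37"
  generation?: number; // present only when the token carried a pin
}

const PIN_SUFFIX = /^s(\d+)$/;

function splitRefPin(token: string): ParsedRef {
  const tilde = token.indexOf("~");
  if (tilde === -1) return { ref: token }; // plain ref, unchanged
  const match = PIN_SUFFIX.exec(token.slice(tilde + 1));
  if (!match) {
    // Malformed suffix: fail with a grammar hint so experimentation self-corrects.
    throw new InvalidArgsError(
      `Malformed ref pin "${token}": expected @<ref>~s<generation>, e.g. @e37~s12`,
    );
  }
  return { ref: token.slice(0, tilde), generation: Number(match[1]) };
}
```

Only `ref` travels further; `generation` is consumed by the staleness check and never reaches runtimes, backends, or recording.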
MCP auto-pinning with PER-REF provenance (airtight at zero token cost). The MCP server layer sees snapshot/find responses before the model does and keeps a Map<pinScope, Map<refBody, generation>>, where the pin scope is state dir + session name. stateDir is a per-tool-call MCP config field, so one server process can serve daemons in different state dirs, and same-named sessions there must not cross-pollinate. The update rule is merge-only: every ref present in a ref-issuing response (snapshot: all node refs, digest refs list included; find: the returned ref) is recorded at that response's generation, and refs absent from the response keep their older pins. That is the point: after snapshot(s12) → find(s13), a plain @e37 from the pre-find snapshot still forwards as @e37~s12 and warns precisely, instead of being silently re-blessed at s13 (the find-blessing hole a single last-seen generation would recreate). Never-issued refs pass through unpinned (#1093's coarse warning is the floor); a ref-issuing response without refsGeneration clears the scope (never guess). Memory is bounded to the ~1000 most recently issued pins per scope; live refs are re-merged by every snapshot, so eviction only degrades precision back to the coarse floor. The model never sees or types suffixes (tool text renders from the unpinned input).
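A sketch of the pin store under stated assumptions: RefPinStore, recordIssued, pinArgument, and the NUL-separated scope key are hypothetical, and refs are assumed to arrive as bare bodies like "e37". It shows the merge-only update, clearing on a generation-less response, insertion-order eviction at ~1000 pins, and the rewrite of plain @ref tool arguments.

```typescript
// Hypothetical sketch of the MCP-side pin store described above.
type PinScope = string;

const MAX_PINS_PER_SCOPE = 1000;

class RefPinStore {
  private scopes = new Map<PinScope, Map<string, number>>();

  // Scope = state dir + session name (separator choice is an assumption).
  static scopeKey(stateDir: string, session: string): PinScope {
    return `${stateDir}\u0000${session}`;
  }

  // Record refs issued by a snapshot/find response. Merge-only.
  recordIssued(scope: PinScope, refs: string[], generation: number | undefined): void {
    if (generation === undefined) {
      this.scopes.delete(scope); // issuing response without refsGeneration: never guess
      return;
    }
    let pins = this.scopes.get(scope);
    if (!pins) {
      pins = new Map<string, number>();
      this.scopes.set(scope, pins);
    }
    for (const ref of refs) {
      pins.delete(ref); // re-issued refs move to the back of insertion order
      pins.set(ref, generation);
    }
    // Evict least recently issued pins; evicted refs fall back to unpinned.
    while (pins.size > MAX_PINS_PER_SCOPE) {
      const oldest = pins.keys().next().value as string;
      pins.delete(oldest);
    }
  }

  // Rewrite a plain "@e37" tool argument to "@e37~s<n>" when a pin exists.
  pinArgument(scope: PinScope, arg: string): string {
    if (!arg.startsWith("@") || arg.includes("~")) return arg; // non-@ or already pinned
    const generation = this.scopes.get(scope)?.get(arg.slice(1));
    return generation === undefined ? arg : `${arg}~s${generation}`;
  }
}
```

Because recordIssued never touches refs missing from a response, a find that issues only e5 cannot re-bless e37; that is the property the blessing test asserts.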
Replay/scripts ignore pins. Generations are meaningless outside the session that minted them, so replay parsing and session-script writing strip well-formed ~s<n> suffixes and ignore them (documented in stripRecordedRefGeneration).
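For recorded scripts, stripping could be a single replace over each line. The PR's helper is stripRecordedRefGeneration; this sketch uses the hypothetical name stripPinsFromScriptLine because the real signature is not shown here, and it assumes ref bodies are word characters. Malformed suffixes are left unchanged in this sketch; how the real replay path treats them is not specified in the PR.

```typescript
// Matches a well-formed ~s<n> pin directly after an @ref token, and only when
// the digits are not followed by more word characters or another "~".
const WELL_FORMED_PIN = /(@[A-Za-z0-9_]+)~s\d+(?![\w~])/g;

// Hypothetical: drop pins from one recorded script line, keeping the plain ref.
function stripPinsFromScriptLine(line: string): string {
  return line.replace(WELL_FORMED_PIN, "$1");
}
```

For example, `wait @e1~s5 then press @e2~s6` becomes `wait @e1 then press @e2`.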

Compat ladder

Warn-only this release: a stale pinned ref still executes, with the precise warning attached (the geometric offscreen/covered guards keep erroring on detectable drift, unchanged). Tightening stale pinned refs to errors comes in a later release, once auto-pinning clients are established — noted in code comments at the warning resolver.

Registry note: this strengthens runtime-ref/native-ref errorTaxonomy evidence but flips no guarantee cells; the registry is untouched. The STALE_REF hint text is unchanged (it does not reference behavior modified here).

Tests

The blessing scenario (both halves): MCP — snapshot(G1, refs incl. e37) → find(e5 at G2) → plain @e37 forwards pinned ~sG1, not G2; @e5 forwards ~sG2. Daemon (provider scenario) — snapshot(g1) → find click replaces the tree (g1+2) → a pre-find pin executes with the precise g1+1→g1+2 warning; a post-find pin is clean.
Reopen/reseed: two lifetimes of the same session get different seeds (probabilistic assertion, ~1e-6 residual), and a previous-lifetime pin warns at the handler level even though both lifetimes are one replacement deep.
Pin-scope keying: same session name under different stateDirs — no cross-pollination; the original scope keeps pinning.
Unit: ~s grammar (valid/malformed/legacy); generation seeding + monotonic bumps at the choke point and snapshot/diff path (seed-agnostic assertions); refsGeneration on snapshot/find responses and preserved by the digest view; pinned-current clean / pinned-stale precise / plain-coarse / malformed-INVALID_ARGS across press, fill, get; daemon target parsers split pins before anything downstream; replay parse + script writer strip pins.
MCP: digest-ref merging, never-issued refs unpinned, scope cleared when an issuing response stops carrying a generation, already-suffixed/non-@ refs untouched, tool text never shows suffixes.

Refs #1076

github-actions · 2026-07-04T15:25:15Z

Size Report

Metric	Base	Current	Diff
JS raw	1.5 MB	1.5 MB	+4.1 kB
JS gzip	489.8 kB	491.3 kB	+1.5 kB
npm tarball	588.9 kB	590.3 kB	+1.4 kB
npm unpacked	2.1 MB	2.1 MB	+4.1 kB

Startup median (7 runs, lower is better):

Scenario	Base	Current	Diff
CLI --version	29.3 ms	29.0 ms	-0.3 ms
CLI --help	52.7 ms	53.8 ms	+1.2 ms

Top changed chunks:

Chunk	Raw diff	Gzip diff
`dist/src/app-lifecycle.js`	+13.2 kB	+4.6 kB
`dist/src/selector-runtime.js`	+3.1 kB	+788 B
`dist/src/internal/daemon.js`	+607 B	+163 B
`dist/src/cli-help.js`	+226 B	+98 B
`dist/src/snapshot3.js`	+159 B	+86 B

thymikee · 2026-07-04T15:26:53Z

Reviewed by running (mcp/kernel/session-snapshot/provider suites re-run in the worktree: 50/50). The design constraints all held, and two implementation choices exceed the spec:

Exact-match staleness, not older-than: mintedGeneration === currentGeneration is stronger than the sketched design and correctly covers the session-reopen edge — a pinned ~s12 against a freshly reopened session at s1 mismatches and warns, where an older-than check would stay silent on the most confusing variant of staleness.
MCP map deletes on generation-less responses ("never guess") — the conservative failure mode for mixed-version daemon/client pairs.
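The exact-match point is easy to see with the reopen case from the first cut, where a fresh lifetime restarted at s1. The two predicates here are illustrative only, not code from the PR.

```typescript
// Two candidate staleness predicates for a pinned ref (illustrative only).
const warnsOlderThan = (minted: number, current: number): boolean => minted < current;
const warnsExactMatch = (minted: number, current: number): boolean => minted !== current;

// Reopen edge: a ~s12 pin from the previous lifetime, checked against a
// reopened session whose counter restarted at s1 (pre-seeding scheme).
const reopenMinted = 12;
const reopenCurrent = 1;
const olderThanWarns = warnsOlderThan(reopenMinted, reopenCurrent); // false: silent
const exactWarns = warnsExactMatch(reopenMinted, reopenCurrent); // true: warns
```

Both predicates agree in the ordinary forward case (12 vs 13); they diverge only when the current generation is lower than the pin, which is exactly the reopen variant.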

The parse-site audit's split between strip-at-boundary (target parsers) and defense-in-depth (normalizeRef) means downstream code — including the web provider's clickRef and recording — only ever sees plain refs; replay strips and ignores pins with the session-scoped rationale documented. Token constraint verified: tree output unchanged, refsGeneration rides once, the model never sees suffixes. Warn-only ladder commented at the decision site for the future tightening pass.

LGTM pending CI. This closes #1076's deep hole: the boolean stays the floor for unpinned CLI refs, MCP-driven agents get exact provenance at zero prompt-token cost.

thymikee · 2026-07-04T15:45:38Z

Readiness update: prior review found no actionable blockers, and GitHub checks are now green (20/20). Added ready-for-human.

Review sequencing note on #1097: these lines described #1096 behavior not yet on main. They move to #1096's branch so docs land with the implementation and the two PRs merge in any order.

thymikee · 2026-07-04T17:07:40Z

Discoverability audit (asked out-of-band: "how do agents know about ~s?") — answer per client, with one gap found and fixed:

MCP agents: never need to know. The layer records refsGeneration from ref-issuing responses and rewrites plain refs to pinned ones transparently; staleness arrives as the precise minted-vs-current warning with the recovery inline. Mixed daemon/client versions degrade to the coarse floor (no generation seen → never guess).
CLI-driving agents (the primary surface): the syntax was undiscoverable — refsGeneration appeared in snapshot responses with nothing explaining it, so plain refs + coarse warning would have been the permanent experience. Fixed in the latest commit: one line in the CLI help's agent-loop guidance teaches the pin form, what refsGeneration is, and the payoff (exact staleness warnings). Warnings deliberately do NOT teach syntax — they fire repeatedly, and teaching belongs in once-read surfaces.
Malformed pins fail with a grammar hint, so experimentation self-corrects.

* docs: refocus AGENTS.md on principles and gates; index ADRs; extend CONTEXT.md vocabulary
  AGENTS.md: replace the routing/command-family prose maps (already drifting from the code) with pointers to the self-describing, parity-tested registries; add the two sections agents actually cannot rediscover cheaply: Principles (one line per incident-backed lesson) and Enforcement gates (the classify-don't-suppress index); extend the module-size guidance from raw LOC caps to answer-one-question files, 1:1 test topology mirroring (removing the integration-aggregation exemption that produced 3,400-line test files), sibling fixture modules, claim collocation, and boundary-only barrels; record the dev-loop staleness triple (dist/daemon/adopted-runner), the tsgo typecheck, the Gatekeeper first-node-exec stall, the DEVICE_IN_USE signature, and the contention-flake protocol; append the two gate steps to the new-flag checklist.
  CONTEXT.md: vocabulary for the ADR 0011 domain (dispatch path, guarantee cell, owned waiver, parity table, coverage manifest, delegation-on-error, ref generation pin) and an architecture paragraph positioning ADR 0011 as ADR 0008's interaction-semantics counterpart.
  docs/adr: flip 0011 to Accepted (implemented through Layer 3) and add a read-this-when index that names the registries as the living source of truth over ADR prose.
* docs: defer versioned-ref references to the implementing PR
  Review sequencing note on #1097: these lines described #1096 behavior not yet on main. They move to #1096's branch so docs land with the implementation and the two PRs merge in any order.

@E12

Refs are positional indexes into the latest stored session tree; #1093's coarse snapshotRefsStale marker warns honestly but cannot say WHICH tree a ref came from. Give the session a monotonically increasing snapshotGeneration, advanced wherever the stored tree is replaced: the setSessionSnapshot choke point and the snapshot/diff command path that bypasses it. Token economy (non-negotiable): the snapshot tree output is unchanged, with plain e12 refs on every node. Ref-issuing responses (snapshot command, find ref outputs) carry the generation ONCE as the additive refsGeneration field.

Ref-consuming commands (press/click/fill/longpress/get/wait) accept both forms: plain @e12 keeps today's behavior including the coarse #1093 warning; pinned @e12~s3 is clean when the generation matches the stored tree, gets a precise warning naming both generations when it does not, and a malformed suffix is INVALID_ARGS with a grammar hint. Warn-only this release; tightening comes later per the compat ladder.

The MCP layer auto-pins at zero token cost: it sees snapshot/find responses before the model does, remembers the last refsGeneration per session name, and rewrites plain @ref tool arguments to the pinned form before forwarding. The model never sees or types suffixes; with no remembered generation, refs pass through unpinned (never guess). Replay parsing and script writing strip and IGNORE pins: generations are meaningless outside the session that minted them. Refs #1076

Moved from #1097 per the review sequencing note: the term lands with the behavior it describes.

MCP agents get pins transparently (auto-pinning), but CLI-driving agents only ever met the coarse warning — refsGeneration arrived in snapshot responses with nothing explaining it, making pins an undiscoverable feature on the primary agent surface. One help line in the agent loop guidance closes that; warnings stay short (they fire repeatedly, teaching belongs in once-read surfaces).

thymikee · 2026-07-04T18:28:06Z

Do not merge — revision in progress. External review found the core invariant broken, verified by reading the code:

The MCP pinning map stores one generation per session, so after snapshot(s12) → find(s13), a plain ref from the pre-find snapshot gets pinned to s13 and passes the staleness check clean — the find-blessing hole is recreated at the pinning layer, defeating the feature's primary justification.
Generations restart at 1 per session lifetime, so a reopened session's ~s1 collides silently with an old ~s1; the reopen protection holds only when counts happen to differ.
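Finding 1 can be reproduced in a few lines by running the same two responses through both bookkeeping schemes. This is an illustrative reduction; recordIssued and the map names are hypothetical, and scoping is simplified to the bare session name.

```typescript
// First cut, per finding 1: one remembered generation per session name.
const lastGeneration = new Map<string, number>();
// Revision: per-ref provenance with merge-only updates.
const perRefGeneration = new Map<string, Map<string, number>>();

function recordIssued(session: string, refs: string[], generation: number): void {
  lastGeneration.set(session, generation); // overwritten by every issuing response
  let pins = perRefGeneration.get(session);
  if (!pins) {
    pins = new Map<string, number>();
    perRefGeneration.set(session, pins);
  }
  for (const ref of refs) pins.set(ref, generation); // absent refs keep older pins
}

recordIssued("default", ["e2", "e37"], 12); // snapshot at s12
recordIssued("default", ["e5"], 13); // find at s13 issues only e5

// What a plain @e37 from the pre-find snapshot would be pinned to:
const firstCutPin = lastGeneration.get("default"); // 13: reads as current (the hole)
const revisedPin = perRefGeneration.get("default")?.get("e37"); // 12: daemon warns
```

The single-generation map forwards @e37~s13, which matches the stored tree and passes clean; the per-ref map forwards @e37~s12, which mismatches and triggers the precise warning.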

Revision underway: per-ref provenance in the MCP map (merge-only updates — refs absent from a response keep their older pins, which is exactly what makes the daemon warn), pin scope keyed beyond bare session name, a random per-lifetime generation seed for probabilistic reopen protection (documented as such), and — critically — the four semantic scenario tests whose absence let this ship: the blessing flow itself, the post-find fresh ref, the reopen collision, and scope isolation. The existing tests covered rewrite mechanics, not cross-issuance semantics; that gap is the lesson.

Review accountability note: my earlier LGTM praised the map's conservative deletion behavior without running the blessing scenario through it — the exact "mechanism exists ≠ semantics hold" failure mode this repo's ADR 0011 review already named once this week. The required tests above are the structural answer.

@e37

Review findings on the first cut:

1. The MCP layer kept ONE refsGeneration per session, so after snapshot(s12) -> find(s13) a plain @e37 from the pre-find snapshot got pinned ~s13 and read as current, recreating the find-blessing hole at the pinning layer. Replace it with per-ref provenance: Map<pinScope, Map<refBody, generation>>, scoped by state dir + session name (stateDir is a per-call MCP config field, so one server process can face multiple daemons). Merge-only updates: refs present in a ref-issuing response (snapshot nodes, digest refs, the find ref) move to its generation; absent refs KEEP their older pins, because an old pin on a replaced tree is what makes the daemon warn. Never-issued refs pass through unpinned; an issuing response without refsGeneration clears the scope; memory is bounded to the ~1000 most recently issued pins.
2. Generations were per-lifetime counters from 1, so a reopened session's ~s1 collided silently with the previous lifetime's. Seed the first bump at a random 6-digit base (crypto.randomInt): cross-lifetime collisions become ~1e-6. This is probabilistic (seeded), not identity-based, and documented on the field. Pin format unchanged; within-lifetime comparisons stay exact.

Tests: the MCP blessing scenario (pre-find ref stays pinned to ITS generation), the daemon half in the provider scenario (find must not bless a pre-find pin), reopen/reseed at unit + handler level, state-dir scope isolation, digest-ref merging; generation fixtures made seed-agnostic (relative bumps, echo the observed seed).

Refs #1076

thymikee · 2026-07-04T18:42:35Z

Revision landed in 99b256c addressing the external review:

Finding 1 (blocker) — MCP single-generation blessing hole. The pinning layer now keeps per-ref provenance: Map<pinScope, Map<refBody, generation>>, scope = state dir + session name (stateDir is a per-call MCP config field, so one server process can face multiple daemons). Merge-only updates: refs present in a ref-issuing response (snapshot nodes, digest refs list, the find ref) move to that response's generation; absent refs keep their older pins. After snapshot(s12) → find(s13), a plain @e37 forwards as @e37~s12 and warns precisely — asserted by the new MCP blessing test, and by the daemon half in the provider scenario (find must not bless a pre-find pin). Never-issued refs pass through unpinned; an issuing response without refsGeneration clears the scope; memory bounded to the ~1000 most recently issued pins per scope.

Finding 2 — cross-lifetime ~s1 collisions. The first generation bump of a session lifetime now seeds at a random 6-digit base (crypto.randomInt); within a lifetime the counter stays strictly monotonic. Cross-lifetime protection is probabilistic (~1e-6), not identity-based — documented on the field. Pin format unchanged; generation-asserting tests made seed-agnostic (relative bumps, echo the observed seed).

Still warn-only, guarantee registry untouched. PR body updated. Full gate chain green: format, typecheck, lint, fallow audit vs origin/main, 1329 unit / 129 integration tests.

thymikee · 2026-07-04T18:44:35Z

Re-reviewed the revision — blessing scenario FIRST this time, per the lesson:

The blessing test is the real thing and passes: snapshot issues e2/e37 at G1 → find issues only e5 at G2 → plain @e37 forwards pinned ~sG1, with the companion proving @e5 pins to G2, and the provider scenario proving the daemon end (pre-find pin → precise minted-vs-current warning through the real request path). The two-layer composition covers exactly the flow the previous review missed.
Merge-only semantics verified in code: absent refs keep their older pins (the property that makes warning possible), unknown refs pass through to the coarse floor, generation-less responses still clear the scope (never guess), and the insertion-order eviction degrades to the coarse floor rather than mispinning.
Seed (randomInt(100k,1M) first bump, +1 within lifetime) keeps within-lifetime comparisons exact while making cross-lifetime collisions ~1e-6 — and it's documented as probabilistic rather than claimed as identity, which was the honesty half of finding 2.
Scope key includes the per-call stateDir, with the rationale documented at the read site.
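A back-of-envelope check of the ~1e-6 figure, assuming each lifetime's seed is drawn uniformly and independently from the 900,000 values randomInt(100k, 1M) can return. Write S_A and S_B for the seeds of two lifetimes and k, j ≥ 0 for the number of bumps after seeding; a stale pin from lifetime A collides with lifetime B's current generation only when the two numbers coincide:

```latex
\begin{aligned}
S_A, S_B &\sim \mathrm{Uniform}\{100\,000, \dots, 999\,999\}, \qquad N = 900\,000 \\
\text{collision} &\iff S_A + k = S_B + j \iff S_B = S_A + k - j \\
P(\text{collision}) &\le \frac{1}{N} = \frac{1}{900\,000} \approx 1.1 \times 10^{-6}
\end{aligned}
```

The bound holds for any fixed k and j because S_B alone is uniform over N values and independent of S_A. It is a per-comparison probability, not a guarantee, which matches the PR's framing of the protection as probabilistic rather than identity-based.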

Suites re-run here: MCP + provider versioned-refs + session-snapshot, 41/41; the worker's full chained gate covered the rest (1316 unit + 129 integration). Both external-review findings are closed with the semantic scenarios as permanent guards. This now satisfies the middle option of the reviewer's decision frame — warn-only with true per-ref provenance — and my LGTM stands on that basis, with enforcement remaining a dated, measurement-gated decision on #1076.

github-actions · 2026-07-04T19:02:25Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 19:02 UTC

thymikee added the ready-for-human label (Valid work that needs human implementation, judgment, or maintainer merge) Jul 4, 2026

thymikee mentioned this pull request Jul 4, 2026

docs: refocus AGENTS.md on principles and enforcement gates #1097

Merged

thymikee added 3 commits July 4, 2026 19:15

docs: CONTEXT.md vocabulary for ref generation pins

68e6cba

Moved from #1097 per the review sequencing note: the term lands with the behavior it describes.

thymikee force-pushed the feat/versioned-refs branch from 2d32bbd to 21644af July 4, 2026 17:16

thymikee mentioned this pull request Jul 4, 2026

press/click/fill --settle: settled observation in the interaction response #1101

Open

thymikee mentioned this pull request Jul 4, 2026

Stale @refs silently resolve to the wrong node after the session tree changes #1076

Closed

thymikee merged commit 86736a5 into main Jul 4, 2026
20 checks passed

thymikee deleted the feat/versioned-refs branch July 4, 2026 19:02

thymikee mentioned this pull request Jul 4, 2026

feat: --settle returns the settled diff in the interaction response (#1101) #1106

Open

thymikee merged 4 commits into main from feat/versioned-refs

thymikee commented Jul 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 4, 2026 •

edited

Loading

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thymikee commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design

Compat ladder

Tests

Uh oh!

github-actions Bot commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Size Report

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thymikee commented Jul 4, 2026 •

edited

Loading

github-actions Bot commented Jul 4, 2026 •

edited

Loading