feat: versioned snapshot refs with MCP auto-pinning#1096
Conversation
Size Report
Startup median (7 runs, lower is better):
Top changed chunks:
|
|
Reviewed by running (mcp/kernel/session-snapshot/provider suites re-run in the worktree: 50/50). The design constraints all held, and two implementation choices exceed the spec:
The parse-site audit's split between strip-at-boundary (target parsers) and defense-in-depth ( LGTM pending CI. This closes #1076's deep hole: the boolean stays the floor for unpinned CLI refs, MCP-driven agents get exact provenance at zero prompt-token cost. |
|
Readiness update: prior review found no actionable blockers, and GitHub checks are now green (20/20). Added ready-for-human. |
|
Discoverability audit (asked out-of-band: "how do agents know about ~s?") — answer per client, with one gap found and fixed:
|
* docs: refocus AGENTS.md on principles and gates; index ADRs; extend CONTEXT.md vocabulary AGENTS.md: replace the routing/command-family prose maps (already drifting from the code) with pointers to the self-describing, parity-tested registries; add the two sections agents actually cannot rediscover cheaply — Principles (one line per incident-backed lesson) and Enforcement gates (the classify-don't-suppress index); extend the module-size guidance from raw LOC caps to answer-one-question files, 1:1 test topology mirroring (removing the integration-aggregation exemption that produced 3,400-line test files), sibling fixture modules, claim collocation, and boundary-only barrels; record the dev-loop staleness triple (dist/daemon/adopted-runner), the tsgo typecheck, the Gatekeeper first-node-exec stall, the DEVICE_IN_USE signature, and the contention-flake protocol; append the two gate steps to the new-flag checklist. CONTEXT.md: vocabulary for the ADR 0011 domain (dispatch path, guarantee cell, owned waiver, parity table, coverage manifest, delegation-on-error, ref generation pin) and an architecture paragraph positioning ADR 0011 as ADR 0008's interaction-semantics counterpart. docs/adr: flip 0011 to Accepted (implemented through Layer 3) and add a read-this-when index that names the registries as the living source of truth over ADR prose. * docs: defer versioned-ref references to the implementing PR Review sequencing note on #1097: these lines described #1096 behavior not yet on main. They move to #1096's branch so docs land with the implementation and the two PRs merge in any order.
Refs are positional indexes into the latest stored session tree; #1093's coarse snapshotRefsStale marker warns honestly but cannot say WHICH tree a ref came from. Give the session a monotonically increasing snapshotGeneration, advanced wherever the stored tree is replaced: the setSessionSnapshot choke point and the snapshot/diff command path that bypasses it. Token economy (non-negotiable): the snapshot tree output is unchanged — plain e12 refs on every node. Ref-issuing responses (snapshot command, find ref outputs) carry the generation ONCE as the additive refsGeneration field. Ref-consuming commands (press/click/fill/longpress/ get/wait) accept both forms: plain @E12 keeps today's behavior including the coarse #1093 warning; pinned @E12~s3 is clean when the generation matches the stored tree, gets a precise warning naming both generations when it does not, and a malformed suffix is INVALID_ARGS with a grammar hint. Warn-only this release — tightening comes later per the compat ladder. The MCP layer auto-pins at zero token cost: it sees snapshot/find responses before the model does, remembers the last refsGeneration per session name, and rewrites plain @ref tool arguments to the pinned form before forwarding. The model never sees or types suffixes; with no remembered generation, refs pass through unpinned (never guess). Replay parsing and script writing strip and IGNORE pins — generations are meaningless outside the session that minted them. Refs #1076
Moved from #1097 per the review sequencing note: the term lands with the behavior it describes.
MCP agents get pins transparently (auto-pinning), but CLI-driving agents only ever met the coarse warning — refsGeneration arrived in snapshot responses with nothing explaining it, making pins an undiscoverable feature on the primary agent surface. One help line in the agent loop guidance closes that; warnings stay short (they fire repeatedly, teaching belongs in once-read surfaces).
2d32bbd to
21644af
Compare
|
Do not merge — revision in progress. External review found the core invariant broken, verified by reading the code:
Revision underway: per-ref provenance in the MCP map (merge-only updates — refs absent from a response keep their older pins, which is exactly what makes the daemon warn), pin scope keyed beyond bare session name, a random per-lifetime generation seed for probabilistic reopen protection (documented as such), and — critically — the four semantic scenario tests whose absence let this ship: the blessing flow itself, the post-find fresh ref, the reopen collision, and scope isolation. The existing tests covered rewrite mechanics, not cross-issuance semantics; that gap is the lesson. Review accountability note: my earlier LGTM praised the map's conservative deletion behavior without running the blessing scenario through it — the exact "mechanism exists ≠ semantics hold" failure mode this repo's ADR 0011 review already named once this week. The required tests above are the structural answer. |
Review findings on the first cut: 1. The MCP layer kept ONE refsGeneration per session, so after snapshot(s12) -> find(s13) a plain @e37 from the pre-find snapshot got pinned ~s13 and read as current — recreating the find-blessing hole at the pinning layer. Replace it with per-ref provenance: Map<pinScope, Map<refBody, generation>>, scoped by state dir + session name (stateDir is a per-call MCP config field, so one server process can face multiple daemons). Merge-only updates: refs present in a ref-issuing response (snapshot nodes, digest refs, the find ref) move to its generation; absent refs KEEP their older pins — an old pin on a replaced tree is what makes the daemon warn. Never-issued refs pass through unpinned; an issuing response without refsGeneration clears the scope; memory bounded to the ~1000 most recently issued pins. 2. Generations were per-lifetime counters from 1, so a reopened session's ~s1 collided silently with the previous lifetime's. Seed the first bump at a random 6-digit base (crypto randomInt): cross-lifetime collisions become ~1e-6 — probabilistic (seeded), not identity-based, documented on the field. Pin format unchanged; within-lifetime comparisons stay exact. Tests: the MCP blessing scenario (pre-find ref stays pinned to ITS generation), the daemon half in the provider scenario (find must not bless a pre-find pin), reopen/reseed at unit + handler level, state-dir scope isolation, digest-ref merging; generation fixtures made seed-agnostic (relative bumps, echo the observed seed). Refs #1076
|
Revision landed in 99b256c addressing the external review: Finding 1 (blocker) — MCP single-generation blessing hole. The pinning layer now keeps per-ref provenance: Finding 2 — cross-lifetime Still warn-only, guarantee registry untouched. PR body updated. Full gate chain green: format, typecheck, lint, fallow audit vs origin/main, 1329 unit / 129 integration tests. |
|
Re-reviewed the revision — blessing scenario FIRST this time, per the lesson:
Suites re-run here: MCP + provider versioned-refs + session-snapshot, 41/41; the worker's full chained gate covered the rest (1316 unit + 129 integration). Both external-review findings are closed with the semantic scenarios as permanent guards. This now satisfies the middle option of the reviewer's decision frame — warn-only with true per-ref provenance — and my LGTM stands on that basis, with enforcement remaining a dated, measurement-gated decision on #1076. |
|
Follow-up to #1093 (merged), closing the loop on #1076: the coarse
snapshotRefsStalemarker warns honestly but cannot say which tree a ref came from. This PR makes ref provenance a first-class, verifiable fact — without spending a single extra token on the artifact agents actually read.Design
Token-economy constraint (non-negotiable): snapshots are the most token-expensive artifact agents consume, so the tree output stays byte-identical — plain
e12refs on every node, no per-node growth.Generation counter, seeded per lifetime. Each daemon session carries a monotonically increasing
snapshotGeneration, advanced wherever the stored tree is replaced: thesetSessionSnapshotchoke point (feat: warn when @refs outlive the session snapshot they came from #1093's inventory of write sites — selector captures, verify-evidence captures, Android freshness refreshes, replay-heal recaptures, overlay captures) plus the snapshot/diff command path that bypasses it (buildNextSnapshotSession). The first bump of a session lifetime seeds at a random 6-digit base (crypto.randomInt) instead of 1: a reopened session restarts its counter, so per-lifetime counts from 1 would let a stale@e1~s1pin from the previous lifetime silently read as current. With seeding, cross-lifetime collisions are ~1e-6 — this protection is probabilistic (seeded), not identity-based (documented on the field). Within a lifetime the counter is strictly monotonic, so pinned-vs-current comparisons are exact. No persistence.Generation rides ref-issuing responses ONCE. The same sites where feat: warn when @refs outlive the session snapshot they came from #1093 clears the stale marker — the snapshot command response and find ref outputs — carry the additive
refsGeneration: <n>field. One number per response; the digest response view preserves it.Suffix as accepted input. Ref-consuming commands (
press/click/fill/longpress/get/wait @ref) accept both forms:@e37— exactly today's behavior, including feat: warn when @refs outlive the session snapshot they came from #1093's coarse warning;@e37~s<n>— generation matches the stored tree → clean (the pin proves the ref's provenance, overruling the coarse marker); generation differs → precise warning:Ref @e37 was minted from snapshot s<minted> but the session tree is now s<current> — re-run snapshot -i.; malformed suffix →INVALID_ARGSwith a grammar hint.The suffix is split off at the daemon parse boundary, so runtimes, backends, fast paths, and recording only ever see plain refs.
normalizeRefstrips it kernel-wide as defense in depth (snapshot -s @refscopes included).MCP auto-pinning with PER-REF provenance (airtight at zero token cost). The MCP server layer sees snapshot/find responses before the model does and keeps
Map<pinScope, Map<refBody, generation>>, where the pin scope is state dir + session name (stateDir is a per-tool-call MCP config field, so one server process can serve daemons in different state dirs — same-named sessions there must not cross-pollinate). The update rule is merge-only: every ref present in a ref-issuing response (snapshot: all node refs, digest refs list included; find: the returned ref) is recorded at that response's generation; refs absent from the response keep their older pins — that is the point: after snapshot(s12) → find(s13), a plain@e37from the pre-find snapshot still forwards as@e37~s12and warns precisely, instead of being silently re-blessed at s13 (the find-blessing hole a single last-seen generation would recreate). Never-issued refs pass through unpinned (the coarse feat: warn when @refs outlive the session snapshot they came from #1093 warning is the floor); a ref-issuing response withoutrefsGenerationclears the scope (never guess). Memory is bounded: the ~1000 most recently issued pins per scope (live refs are re-merged by every snapshot, so eviction only degrades precision back to the coarse floor). The model never sees or types suffixes (tool text renders from the unpinned input).Replay/scripts ignore pins. Generations are meaningless outside the session that minted them, so replay parsing and session-script writing strip well-formed
~s<n>suffixes and ignore them (documented instripRecordedRefGeneration).Compat ladder
Warn-only this release: a stale pinned ref still executes, with the precise warning attached (the geometric offscreen/covered guards keep erroring on detectable drift, unchanged). Tightening stale pinned refs to errors comes in a later release, once auto-pinning clients are established — noted in code comments at the warning resolver.
Registry note: this strengthens runtime-ref/native-ref errorTaxonomy evidence but flips no guarantee cells; the registry is untouched. The
STALE_REFhint text is unchanged (it does not reference behavior modified here).Tests
@e37forwards pinned~sG1, not G2;@e5forwards~sG2. Daemon (provider scenario) — snapshot(g1) → find click replaces the tree (g1+2) → a pre-find pin executes with the precise g1+1→g1+2 warning; a post-find pin is clean.stateDirs — no cross-pollination; the original scope keeps pinning.~sgrammar (valid/malformed/legacy); generation seeding + monotonic bumps at the choke point and snapshot/diff path (seed-agnostic assertions);refsGenerationon snapshot/find responses and preserved by the digest view; pinned-current clean / pinned-stale precise / plain-coarse / malformed-INVALID_ARGS across press, fill, get; daemon target parsers split pins before anything downstream; replay parse + script writer strip pins.@refs untouched, tool text never shows suffixes.Refs #1076