
feat: versioned snapshot refs with MCP auto-pinning#1096

Merged
thymikee merged 4 commits into main from feat/versioned-refs
Jul 4, 2026

Conversation

@thymikee

@thymikee thymikee commented Jul 4, 2026

Member

Follow-up to #1093 (merged), closing the loop on #1076: the coarse snapshotRefsStale marker warns honestly but cannot say which tree a ref came from. This PR makes ref provenance a first-class, verifiable fact — without spending a single extra token on the artifact agents actually read.

Design

Token-economy constraint (non-negotiable): snapshots are the most token-expensive artifact agents consume, so the tree output stays byte-identical — plain e12 refs on every node, no per-node growth.

  • Generation counter, seeded per lifetime. Each daemon session carries a monotonically increasing snapshotGeneration, advanced wherever the stored tree is replaced: the setSessionSnapshot choke point (#1093's inventory of write sites: selector captures, verify-evidence captures, Android freshness refreshes, replay-heal recaptures, overlay captures) plus the snapshot/diff command path that bypasses it (buildNextSnapshotSession). The first bump of a session lifetime seeds at a random 6-digit base (crypto.randomInt) instead of 1: a reopened session restarts its counter, so per-lifetime counts from 1 would let a stale @e1~s1 pin from the previous lifetime silently read as current. With seeding, cross-lifetime collisions are ~1e-6; this protection is probabilistic (seeded), not identity-based (documented on the field). Within a lifetime the counter is strictly monotonic, so pinned-vs-current comparisons are exact. No persistence.

  • Generation rides ref-issuing responses ONCE. The same sites where #1093 clears the stale marker (the snapshot command response and find ref outputs) carry the additive refsGeneration: <n> field. One number per response; the digest response view preserves it.

  • Suffix as accepted input. Ref-consuming commands (press/click/fill/longpress/get/wait @ref) accept both forms:

    • plain @e37: exactly today's behavior, including #1093's coarse warning;
    • pinned @e37~s<n> — generation matches the stored tree → clean (the pin proves the ref's provenance, overruling the coarse marker); generation differs → precise warning: Ref @e37 was minted from snapshot s<minted> but the session tree is now s<current> — re-run snapshot -i.; malformed suffix → INVALID_ARGS with a grammar hint.

    The suffix is split off at the daemon parse boundary, so runtimes, backends, fast paths, and recording only ever see plain refs. normalizeRef strips it kernel-wide as defense in depth (snapshot -s @ref scopes included).

  • MCP auto-pinning with PER-REF provenance (airtight at zero token cost). The MCP server layer sees snapshot/find responses before the model does and keeps Map<pinScope, Map<refBody, generation>>, where the pin scope is state dir + session name (stateDir is a per-tool-call MCP config field, so one server process can serve daemons in different state dirs, and same-named sessions there must not cross-pollinate). The update rule is merge-only: every ref present in a ref-issuing response (snapshot: all node refs, digest refs list included; find: the returned ref) is recorded at that response's generation; refs absent from the response keep their older pins. That is the point: after snapshot(s12) → find(s13), a plain @e37 from the pre-find snapshot still forwards as @e37~s12 and warns precisely, instead of being silently re-blessed at s13 (the find-blessing hole a single last-seen generation would recreate). Never-issued refs pass through unpinned (the coarse #1093 warning is the floor); a ref-issuing response without refsGeneration clears the scope (never guess). Memory is bounded: the ~1000 most recently issued pins per scope (live refs are re-merged by every snapshot, so eviction only degrades precision back to the coarse floor). The model never sees or types suffixes (tool text renders from the unpinned input).

  • Replay/scripts ignore pins. Generations are meaningless outside the session that minted them, so replay parsing and session-script writing strip well-formed ~s<n> suffixes and ignore them (documented in stripRecordedRefGeneration).
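A minimal sketch of the suffix grammar described in the list above, for readers who want to see the three outcomes (plain, pinned, malformed) side by side. The names `ParsedRef`, `parseRefToken`, and `stripRefPin` are hypothetical and are not the identifiers used in this PR; the regular expression is an assumption inferred from the `@e37~s<n>` examples, not the daemon's actual grammar.

```typescript
// Hypothetical names: ParsedRef, parseRefToken and stripRefPin are illustrative only.
type ParsedRef =
  | { kind: "plain"; ref: string }
  | { kind: "pinned"; ref: string; generation: number }
  | { kind: "invalid"; reason: string };

// Assumed grammar: "@e<digits>", optionally followed by "~s<digits>".
const REF_PATTERN = /^@(e\d+)(?:~s(\d+))?$/;

function parseRefToken(token: string): ParsedRef {
  const match = REF_PATTERN.exec(token);
  if (match === null) {
    // Stands in for the INVALID_ARGS path with a grammar hint.
    return {
      kind: "invalid",
      reason: `expected @e<n> or @e<n>~s<generation>, got ${token}`,
    };
  }
  const ref = `@${match[1]}`;
  const generationText = match[2];
  if (generationText === undefined) {
    return { kind: "plain", ref };
  }
  // The pin is split off here, so callers downstream only see the plain ref.
  return { kind: "pinned", ref, generation: Number(generationText) };
}

// Replay parsing and script writing drop well-formed pins and keep the plain ref.
function stripRefPin(token: string): string {
  return token.replace(/^(@e\d+)~s\d+$/, "$1");
}
```

In this sketch a malformed suffix such as `@e37~sabc` is reported as invalid rather than silently treated as a plain ref, which mirrors the "experimentation self-corrects" intent.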

Compat ladder

Warn-only this release: a stale pinned ref still executes, with the precise warning attached (the geometric offscreen/covered guards keep erroring on detectable drift, unchanged). Tightening stale pinned refs to errors comes in a later release, once auto-pinning clients are established — noted in code comments at the warning resolver.

Registry note: this strengthens runtime-ref/native-ref errorTaxonomy evidence but flips no guarantee cells; the registry is untouched. The STALE_REF hint text is unchanged (it does not reference behavior modified here).

Tests

  • The blessing scenario (both halves): MCP — snapshot(G1, refs incl. e37) → find(e5 at G2) → plain @e37 forwards pinned ~sG1, not G2; @e5 forwards ~sG2. Daemon (provider scenario) — snapshot(g1) → find click replaces the tree (g1+2) → a pre-find pin executes with the precise g1+1→g1+2 warning; a post-find pin is clean.
  • Reopen/reseed: two lifetimes of the same session get different seeds (probabilistic assertion, ~1e-6 residual), and a previous-lifetime pin warns at the handler level even though both lifetimes are one replacement deep.
  • Pin-scope keying: same session name under different stateDirs — no cross-pollination; the original scope keeps pinning.
  • Unit: ~s grammar (valid/malformed/legacy); generation seeding + monotonic bumps at the choke point and snapshot/diff path (seed-agnostic assertions); refsGeneration on snapshot/find responses and preserved by the digest view; pinned-current clean / pinned-stale precise / plain-coarse / malformed-INVALID_ARGS across press, fill, get; daemon target parsers split pins before anything downstream; replay parse + script writer strip pins.
  • MCP: digest-ref merging, never-issued refs unpinned, scope cleared when an issuing response stops carrying a generation, already-suffixed/non-@ refs untouched, tool text never shows suffixes.

Refs #1076

@github-actions

github-actions Bot commented Jul 4, 2026


Size Report

Metric Base Current Diff
JS raw 1.5 MB 1.5 MB +4.1 kB
JS gzip 489.8 kB 491.3 kB +1.5 kB
npm tarball 588.9 kB 590.3 kB +1.4 kB
npm unpacked 2.1 MB 2.1 MB +4.1 kB

Startup median (7 runs, lower is better):

Scenario Base Current Diff
CLI --version 29.3 ms 29.0 ms -0.3 ms
CLI --help 52.7 ms 53.8 ms +1.2 ms

Top changed chunks:

Chunk Raw diff Gzip diff
dist/src/app-lifecycle.js +13.2 kB +4.6 kB
dist/src/selector-runtime.js +3.1 kB +788 B
dist/src/internal/daemon.js +607 B +163 B
dist/src/cli-help.js +226 B +98 B
dist/src/snapshot3.js +159 B +86 B

@thymikee

thymikee commented Jul 4, 2026

Member Author

Reviewed by running (mcp/kernel/session-snapshot/provider suites re-run in the worktree: 50/50). The design constraints all held, and two implementation choices exceed the spec:

  • Exact-match staleness, not older-than: mintedGeneration === currentGeneration is stronger than the sketched design and correctly covers the session-reopen edge — a pinned ~s12 against a freshly reopened session at s1 mismatches and warns, where an older-than check would stay silent on the most confusing variant of staleness.
  • MCP map deletes on generation-less responses ("never guess") — the conservative failure mode for mixed-version daemon/client pairs.
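To make the exact-match point concrete, here is a small sketch contrasting the two rules on the reopen case named above (a pinned ~s12 against a session now at s1). Both function names are hypothetical; only the comparison itself comes from the review text.

```typescript
// Hypothetical helpers; only the comparisons reflect the review comment.

// Exact-match rule: any difference between minted and current generation is stale.
function isStaleExact(minted: number, current: number): boolean {
  return minted !== current;
}

// Older-than rule, shown only for contrast: stale only if the pin is behind.
function isStaleOlderThan(minted: number, current: number): boolean {
  return minted < current;
}

// Reopen edge: the pin says s12, the reopened session is at s1.
const reopenedExact = isStaleExact(12, 1); // true: the pin warns
const reopenedOlderThan = isStaleOlderThan(12, 1); // false: the pin would pass silently
```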

The parse-site audit's split between strip-at-boundary (target parsers) and defense-in-depth (normalizeRef) means downstream code — including the web provider's clickRef and recording — only ever sees plain refs; replay strips and ignores pins with the session-scoped rationale documented. Token constraint verified: tree output unchanged, refsGeneration rides once, the model never sees suffixes. Warn-only ladder commented at the decision site for the future tightening pass.

LGTM pending CI. This closes #1076's deep hole: the boolean stays the floor for unpinned CLI refs, while MCP-driven agents get exact provenance at zero prompt-token cost.

@thymikee

thymikee commented Jul 4, 2026

Member Author

Readiness update: prior review found no actionable blockers, and GitHub checks are now green (20/20). Added ready-for-human.

@thymikee thymikee added the ready-for-human label Jul 4, 2026
thymikee added a commit that referenced this pull request Jul 4, 2026
Review sequencing note on #1097: these lines described #1096 behavior
not yet on main. They move to #1096's branch so docs land with the
implementation and the two PRs merge in any order.
@thymikee

thymikee commented Jul 4, 2026

Member Author

Discoverability audit (asked out-of-band: "how do agents know about ~s?") — answer per client, with one gap found and fixed:

  • MCP agents: never need to know. The layer records refsGeneration from ref-issuing responses and rewrites plain refs to pinned ones transparently; staleness arrives as the precise minted-vs-current warning with the recovery inline. Mixed daemon/client versions degrade to the coarse floor (no generation seen → never guess).
  • CLI-driving agents (the primary surface): the syntax was undiscoverable. refsGeneration appeared in snapshot responses with nothing explaining it, so plain refs + coarse warning would have been the permanent experience. Fixed in the latest commit: one line in the CLI help's agent-loop guidance teaches the pin form, what refsGeneration is, and the payoff (exact staleness warnings). Warnings deliberately do NOT teach syntax: they fire repeatedly, and teaching belongs in once-read surfaces.
  • Malformed pins fail with a grammar hint, so experimentation self-corrects.

thymikee added a commit that referenced this pull request Jul 4, 2026
* docs: refocus AGENTS.md on principles and gates; index ADRs; extend CONTEXT.md vocabulary

AGENTS.md: replace the routing/command-family prose maps (already
drifting from the code) with pointers to the self-describing,
parity-tested registries; add the two sections agents actually cannot
rediscover cheaply — Principles (one line per incident-backed lesson)
and Enforcement gates (the classify-don't-suppress index); extend the
module-size guidance from raw LOC caps to answer-one-question files,
1:1 test topology mirroring (removing the integration-aggregation
exemption that produced 3,400-line test files), sibling fixture
modules, claim collocation, and boundary-only barrels; record the
dev-loop staleness triple (dist/daemon/adopted-runner), the tsgo
typecheck, the Gatekeeper first-node-exec stall, the DEVICE_IN_USE
signature, and the contention-flake protocol; append the two gate
steps to the new-flag checklist.

CONTEXT.md: vocabulary for the ADR 0011 domain (dispatch path,
guarantee cell, owned waiver, parity table, coverage manifest,
delegation-on-error, ref generation pin) and an architecture paragraph
positioning ADR 0011 as ADR 0008's interaction-semantics counterpart.

docs/adr: flip 0011 to Accepted (implemented through Layer 3) and add
a read-this-when index that names the registries as the living source
of truth over ADR prose.

* docs: defer versioned-ref references to the implementing PR

Review sequencing note on #1097: these lines described #1096 behavior
not yet on main. They move to #1096's branch so docs land with the
implementation and the two PRs merge in any order.
thymikee added 3 commits July 4, 2026 19:15
Refs are positional indexes into the latest stored session tree; #1093's
coarse snapshotRefsStale marker warns honestly but cannot say WHICH tree
a ref came from. Give the session a monotonically increasing
snapshotGeneration, advanced wherever the stored tree is replaced: the
setSessionSnapshot choke point and the snapshot/diff command path that
bypasses it.

Token economy (non-negotiable): the snapshot tree output is unchanged —
plain e12 refs on every node. Ref-issuing responses (snapshot command,
find ref outputs) carry the generation ONCE as the additive
refsGeneration field. Ref-consuming commands (press/click/fill/longpress/
get/wait) accept both forms: plain @e12 keeps today's behavior including
the coarse #1093 warning; pinned @e12~s3 is clean when the generation
matches the stored tree, gets a precise warning naming both generations
when it does not, and a malformed suffix is INVALID_ARGS with a grammar
hint. Warn-only this release — tightening comes later per the compat
ladder.

The MCP layer auto-pins at zero token cost: it sees snapshot/find
responses before the model does, remembers the last refsGeneration per
session name, and rewrites plain @ref tool arguments to the pinned form
before forwarding. The model never sees or types suffixes; with no
remembered generation, refs pass through unpinned (never guess).

Replay parsing and script writing strip and IGNORE pins — generations
are meaningless outside the session that minted them.

Refs #1076
Moved from #1097 per the review sequencing note: the term lands with
the behavior it describes.
MCP agents get pins transparently (auto-pinning), but CLI-driving
agents only ever met the coarse warning — refsGeneration arrived in
snapshot responses with nothing explaining it, making pins an
undiscoverable feature on the primary agent surface. One help line in
the agent loop guidance closes that; warnings stay short (they fire
repeatedly, teaching belongs in once-read surfaces).
@thymikee thymikee force-pushed the feat/versioned-refs branch from 2d32bbd to 21644af Compare July 4, 2026 17:16
@thymikee

thymikee commented Jul 4, 2026

Member Author

Do not merge — revision in progress. External review found the core invariant broken, verified by reading the code:

  1. The MCP pinning map stores one generation per session, so after snapshot(s12) → find(s13), a plain ref from the pre-find snapshot gets pinned to s13 and passes the staleness check clean — the find-blessing hole is recreated at the pinning layer, defeating the feature's primary justification.
  2. Generations restart at 1 per session lifetime, so a reopened session's ~s1 collides silently with an old ~s1; the reopen protection holds only when counts happen to differ.

Revision underway: per-ref provenance in the MCP map (merge-only updates — refs absent from a response keep their older pins, which is exactly what makes the daemon warn), pin scope keyed beyond bare session name, a random per-lifetime generation seed for probabilistic reopen protection (documented as such), and — critically — the four semantic scenario tests whose absence let this ship: the blessing flow itself, the post-find fresh ref, the reopen collision, and scope isolation. The existing tests covered rewrite mechanics, not cross-issuance semantics; that gap is the lesson.

Review accountability note: my earlier LGTM praised the map's conservative deletion behavior without running the blessing scenario through it — the exact "mechanism exists ≠ semantics hold" failure mode this repo's ADR 0011 review already named once this week. The required tests above are the structural answer.

Review findings on the first cut:

1. The MCP layer kept ONE refsGeneration per session, so after
   snapshot(s12) -> find(s13) a plain @e37 from the pre-find snapshot got
   pinned ~s13 and read as current — recreating the find-blessing hole at
   the pinning layer. Replace it with per-ref provenance:
   Map<pinScope, Map<refBody, generation>>, scoped by state dir + session
   name (stateDir is a per-call MCP config field, so one server process
   can face multiple daemons). Merge-only updates: refs present in a
   ref-issuing response (snapshot nodes, digest refs, the find ref) move
   to its generation; absent refs KEEP their older pins — an old pin on a
   replaced tree is what makes the daemon warn. Never-issued refs pass
   through unpinned; an issuing response without refsGeneration clears
   the scope; memory bounded to the ~1000 most recently issued pins.

2. Generations were per-lifetime counters from 1, so a reopened
   session's ~s1 collided silently with the previous lifetime's. Seed
   the first bump at a random 6-digit base (crypto randomInt):
   cross-lifetime collisions become ~1e-6 — probabilistic (seeded), not
   identity-based, documented on the field. Pin format unchanged;
   within-lifetime comparisons stay exact.

Tests: the MCP blessing scenario (pre-find ref stays pinned to ITS
generation), the daemon half in the provider scenario (find must not
bless a pre-find pin), reopen/reseed at unit + handler level, state-dir
scope isolation, digest-ref merging; generation fixtures made
seed-agnostic (relative bumps, echo the observed seed).

Refs #1076
@thymikee

thymikee commented Jul 4, 2026

Member Author

Revision landed in 99b256c addressing the external review:

Finding 1 (blocker) — MCP single-generation blessing hole. The pinning layer now keeps per-ref provenance: Map<pinScope, Map<refBody, generation>>, scope = state dir + session name (stateDir is a per-call MCP config field, so one server process can face multiple daemons). Merge-only updates: refs present in a ref-issuing response (snapshot nodes, digest refs list, the find ref) move to that response's generation; absent refs keep their older pins. After snapshot(s12) → find(s13), a plain @e37 forwards as @e37~s12 and warns precisely — asserted by the new MCP blessing test, and by the daemon half in the provider scenario (find must not bless a pre-find pin). Never-issued refs pass through unpinned; an issuing response without refsGeneration clears the scope; memory bounded to the ~1000 most recently issued pins per scope.
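The merge-only rule is easier to check against the blessing flow when written out. The sketch below is an illustration under stated assumptions, not the PR's code: `PinStore`, `recordIssued`, and `pin` are hypothetical names, the scope-key separator is invented, and the cap of exactly 1000 stands in for the "~1000" mentioned above. Eviction here relies on `Map` preserving insertion order.

```typescript
// Hypothetical sketch: PinStore and its methods are not identifiers from this PR.
const MAX_PINS_PER_SCOPE = 1000; // the source says "~1000"; the exact cap is assumed

class PinStore {
  // pinScope -> (refBody -> generation)
  private readonly scopes = new Map<string, Map<string, number>>();

  // Scope = state dir + session name. The NUL separator is an assumption.
  private static scopeKey(stateDir: string, session: string): string {
    return `${stateDir}\u0000${session}`;
  }

  // Call once per ref-issuing response (snapshot or find).
  recordIssued(
    stateDir: string,
    session: string,
    refBodies: string[],
    refsGeneration: number | undefined,
  ): void {
    const key = PinStore.scopeKey(stateDir, session);
    if (refsGeneration === undefined) {
      this.scopes.delete(key); // never guess: drop every pin in this scope
      return;
    }
    const pins = this.scopes.get(key) ?? new Map<string, number>();
    for (const body of refBodies) {
      pins.delete(body); // re-insert so Map order tracks most recent issuance
      pins.set(body, refsGeneration);
    }
    // Merge-only: refs absent from this response keep their older pins.
    while (pins.size > MAX_PINS_PER_SCOPE) {
      const oldest = pins.keys().next().value;
      if (oldest === undefined) break;
      pins.delete(oldest); // evicted refs fall back to the coarse warning
    }
    this.scopes.set(key, pins);
  }

  // Rewrites a plain "@e37" argument to "@e37~s<generation>" when a pin is known.
  pin(stateDir: string, session: string, token: string): string {
    if (!/^@e\d+$/.test(token)) return token; // already suffixed, or not a ref
    const pins = this.scopes.get(PinStore.scopeKey(stateDir, session));
    const generation = pins?.get(token.slice(1));
    return generation === undefined ? token : `${token}~s${generation}`;
  }
}
```

Walking the blessing flow through it: recording `["e2", "e37"]` at 12 and then `["e5"]` at 13 leaves `@e37` pinned to 12, so the daemon sees a stale pin and can warn.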

Finding 2 — cross-lifetime ~s1 collisions. The first generation bump of a session lifetime now seeds at a random 6-digit base (crypto.randomInt); within a lifetime the counter stays strictly monotonic. Cross-lifetime protection is probabilistic (~1e-6), not identity-based — documented on the field. Pin format unchanged; generation-asserting tests made seed-agnostic (relative bumps, echo the observed seed).
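For reference, the seeding rule can be sketched in a few lines. `nextSnapshotGeneration` is a hypothetical helper (the PR keeps snapshotGeneration on the daemon session); the only real API used is Node's `crypto.randomInt`, whose upper bound is exclusive.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical helper; `undefined` stands for "no bump yet in this session lifetime".
function nextSnapshotGeneration(current: number | undefined): number {
  if (current === undefined) {
    // First bump of a lifetime: random 6-digit base in [100000, 999999].
    return randomInt(100_000, 1_000_000);
  }
  // Within a lifetime the counter is strictly monotonic.
  return current + 1;
}
```

With 900,000 possible seeds, two lifetimes landing on the same base is on the order of 1e-6, which matches the probabilistic framing above.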

Still warn-only, guarantee registry untouched. PR body updated. Full gate chain green: format, typecheck, lint, fallow audit vs origin/main, 1329 unit / 129 integration tests.

@thymikee

thymikee commented Jul 4, 2026

Member Author

Re-reviewed the revision — blessing scenario FIRST this time, per the lesson:

  • The blessing test is the real thing and passes: snapshot issues e2/e37 at G1 → find issues only e5 at G2 → plain @e37 forwards pinned ~sG1, with the companion proving @e5 pins to G2, and the provider scenario proving the daemon end (pre-find pin → precise minted-vs-current warning through the real request path). The two-layer composition covers exactly the flow the previous review missed.
  • Merge-only semantics verified in code: absent refs keep their older pins (the property that makes warning possible), unknown refs pass through to the coarse floor, generation-less responses still clear the scope (never guess), and the insertion-order eviction degrades to the coarse floor rather than mispinning.
  • Seed (randomInt(100k,1M) first bump, +1 within lifetime) keeps within-lifetime comparisons exact while making cross-lifetime collisions ~1e-6 — and it's documented as probabilistic rather than claimed as identity, which was the honesty half of finding 2.
  • Scope key includes the per-call stateDir, with the rationale documented at the read site.

Suites re-run here: MCP + provider versioned-refs + session-snapshot, 41/41; the worker's full chained gate covered the rest (1316 unit + 129 integration). Both external-review findings are closed with the semantic scenarios as permanent guards. This now satisfies the middle option of the reviewer's decision frame — warn-only with true per-ref provenance — and my LGTM stands on that basis, with enforcement remaining a dated, measurement-gated decision on #1076.

@thymikee thymikee merged commit 86736a5 into main Jul 4, 2026
20 checks passed
@thymikee thymikee deleted the feat/versioned-refs branch July 4, 2026 19:02
@github-actions

github-actions Bot commented Jul 4, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 19:02 UTC


Labels

ready-for-human Valid work that needs human implementation, judgment, or maintainer merge


1 participant