Release v1.0.0 — Live Sessions, Sharing & Skill Packs #154

Open
DIodide wants to merge 100 commits into main from staging

DIodide commented Jun 23, 2026

Owner

Promotes staging → main as the v1.0.0 major release. Covers the entire unreleased span since the last tagged release (v0.2.1 → v1.0.0, PRs #81–#153). Full notes in CHANGELOG.md (entry added in this PR).

⚠️ Do not merge yet — opened for review (Ibraheem will run the code-review skill first). See the pre-merge devops checklist below.

Release highlights

  • Live session following — in-flight agent output fans out to every viewer of a conversation (other tabs, /workspaces, shared read-only pages) via a Redis Streams bus; interactive prompts, owner-only infra signals, and cost are never relayed to followers.
  • Rewind & fork — rewind / rewind-and-fork under any user message, mid-message rewind at step "seams" (toggle in Settings → Display), compaction summaries + clone-from-summary.
  • Chat & harness sharing + collaboration — shareable links (read-only view + fork), viewer/editor roles, real-time editor collaboration that runs on the owner's harness/credentials, a shared-page side panel, harness share-by-link/email + Clone, and a Manage Sharing page.
  • Skill Packs — reusable skill bundles with optional AGENTS.md/CLAUDE.md sandbox context, a full-page catalog/editor, and one-click owner/repo import (GSAP, Anthropic, Superpowers, Vercel templates).
  • Per-workspace agent sandboxes — a harness's ACP agent runs inside the workspace's own (reused) sandbox; default workspace per account; drag-and-drop reordering; bounded sandbox lifetime with self-heal.
  • Workspace credentials — named secrets (e.g. GITHUB_TOKEN) injected as env vars into a workspace's sandboxes; write-only, AES-256-GCM, Manage Credentials view.
  • Usage — per-credential agent usage by agent, authoritative cost + cache-token accounting, real Claude session/weekly rate-limit surfacing, honest budget labels.
  • Claude Code config — pre-session Mode/Model/Effort, harness-level defaults, opus[1m], Bypass Permissions mode, effort slider + background-agents panel, live workflow/subagent observability.
  • Reliability & integrity — send-path/stream-lifecycle hardening, faithful saved content, resilient auth guards (no more sign-in redirect loop), stale-chunk auto-recovery, list/fork index-backfill tolerance.

Plus a Security section: strict redacted projections for shared content, credential plaintext that never leaves the server, reserved-name protection, host-pinned skill-repo import.

Versioning

  • Cut as git tag v1.0.0 on the main merge commit after this PR is approved + merged (consistent with v0.2.0 / v0.2.1). Both package.json files already read 1.0.0; the tag is the release marker.
  • GitHub release notes = the new CHANGELOG.md entry.

Pre-merge devops checklist (NOT in the changelog — infra only)

These set up the Redis Streams backend for live-following on prod. The bus is fail-soft: with REDIS_URL unset, live-following silently no-ops and single-instance streaming works exactly as today — so the release is safe to merge before this is done; Redis only gates whether live-following actually fans out on prod.

  • Provision a Redis reachable from the prod FastAPI host (ElastiCache or a Redis service on the prod instance); ensure the security group allows the FastAPI host.
  • Set REDIS_URL in the prod FastAPI env (/opt/harness-api/.env) and restart harness-api.
  • Verify /api/chat/follow fan-out works on prod (owner tab + second viewer).
  • Any other prod env parity vs. staging (sweep before cutting the tag).

Scope notes

  • Changelog is user-facing only by request — Redis/AWS prod setup, CI, deploy plumbing, internal refactors, and test-only changes are intentionally excluded (verified by an adversarial coverage/leakage pass).

DIodide and others added 30 commits June 19, 2026 19:12
In-flight tokens previously rendered only in the tab that started the turn
(local React state). Now every chat/agent turn ALSO tees its display events
into a per-conversation Redis Stream, and any passive viewer (the owner's other
tabs, a sharee watching, a late joiner) opens GET /api/chat/follow to replay the
current turn + tail it live — rendering through the same ChatMessages props.

- stream_bus.py: XADD display events (token/thinking/tool_call/tool_result/done/
  error/plan/status...) with MAXLEN trim; interactive events (permission/
  question) are NOT relayed. follow() replays from the live turn marker then
  BLOCK-tails. FAIL-SOFT: when REDIS_URL is unset every fn is a no-op and turns
  stream only to the initiator exactly as before (no regression, safe deploy).
- chat.py/agents.py: wrap the SSE generators with stream_bus.tee; new GET
  /api/chat/follow authorized like a shared read (owner JWT OR editor/viewer
  grant token, incl. anonymous-with-token via optional auth).
- web: useFollowStream hook (isolated reducer faithful to the provider) wired
  into chat/index (owner multi-tab + owner watching a sharee) and the share page
  (editor + read-only viewer). Initiator keeps its token-perfect local stream;
  never both. Solves owner<->sharee AND multi-tab same-account.

Shared Redis also makes fan-out work across multiple FastAPI workers/boxes.
Tests: FastAPI 236, web 181; types/biome clean.
MUST-FIX:
- M2 (crash): hoist useFollowStream above chat/index's early return — a hook
  after an early return crashed /chat on every load (rules-of-hooks).
- M1/S1/S2 (secret leak): drop mcp_error + sandbox_status + agent_usage from
  FOLLOW_EVENTS (owner MCP url / sandbox id / agent cost) and strip usage/cost
  from the relayed 'done' frame — followers see only the transcript they can
  already see persisted.
- M3 (handoff): prefer follow else local (covers the post-done window so the
  finished bubble never flickers); onStreamSynced now clears local state AND the
  follow bubble + drains the queue (was orphaning local state + stalling queued
  messages every turn).
- M4 (liveness): time-box each bus op (1s) with a per-turn breaker so a hung
  Redis can never stall the initiator's turn.
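
The M4 liveness fix can be pictured roughly as below. This is a minimal sketch, not the code in stream_bus.py: `TurnBreaker`, `guard`, and the `publish` callback are hypothetical names, and only the 1s time box and the per-turn breaker idea come from the commit message.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


class TurnBreaker:
    """Hypothetical per-turn breaker: after the first timeout or error, skip the bus for the rest of the turn."""

    def __init__(self, timeout_s: float = 1.0) -> None:
        self.timeout_s = timeout_s
        self.tripped = False

    async def guard(self, op: Callable[[], Awaitable[Any]]) -> bool:
        """Run one bus operation under a time box; return True only if it completed."""
        if self.tripped:
            return False
        try:
            await asyncio.wait_for(op(), timeout=self.timeout_s)
            return True
        except Exception:  # includes asyncio.TimeoutError; a failing bus must not fail the turn
            self.tripped = True
            return False


async def tee(
    events: AsyncIterator[Any],
    publish: Callable[[Any], Awaitable[Any]],
    breaker: TurnBreaker,
) -> AsyncIterator[Any]:
    """Yield every event to the initiator, mirroring each one to the bus on a best-effort basis."""
    async for event in events:
        await breaker.guard(lambda e=event: publish(e))
        yield event
```

With this shape a hung bus costs the initiator at most one time box per turn, after which every later event bypasses the bus.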

SHOULD-FIX:
- S3: /follow now 403s a fully anonymous caller with no token (don't reach the
  Convex dev fail-open).
- S4: follow() leads every replay with a synthetic turn_start so a reconnect
  after MAXLEN trim can't double-append.
- S5: max_connections=256 for concurrent follower XREADs. S6: 6h stream TTL so a
  long turn can't self-expire its key.

Tests: FastAPI 238, web 181; types/biome clean.
feat(stream): live token fan-out to all viewers via Redis Streams
Two operations under every user message, on both the normal Harness
(OpenRouter) and Claude Code (ACP), via one thin generic seam:

- Rewind (in place): truncate the thread to that user message — keep it,
  delete everything below. Destructive, so it confirms inline on first
  click; does NOT auto-stream (re-sending is the user's choice).
- Rewind & fork: branch a new conversation at that point, original intact.

~80% reuse:
- messages.removeAfter: new EXCLUSIVE truncation (keeps the target) vs the
  existing inclusive removeFrom used by regenerate. Same owner/editor-token
  auth. + tests.
- Rewind-and-fork reuses conversations.fork (already copies [0..target]).
- Claude Code session rewind reuses the existing forget-and-recreate path:
  resetAgentSessionForRewind() forgets the cached session so the next prompt
  opens a fresh one that re-seeds from the truncated history. No new gateway
  endpoint. No-op for the stateless OpenRouter loop. This one seam is where
  future per-agent rewind handling (Codex/Cursor) lands.
- MessageActions/ChatMessages gain onRewind/onRewindFork (every user message,
  owner-only for v1); both routes wire them.

v1 scope notes: owner-only (collaborator rewind via token = follow-up);
rewind is conversational-only (the Daytona sandbox filesystem is not rewound).
Adversarial review (1 critical, 2 major, 1 minor) — all fixed:

- CRITICAL: client-only forgetAgentSession did NOT rewind Claude Code — the
  gateway dedups sessions by (user, conversation, agent) and returns the warm
  session (non-empty transcript → seed-from-history skipped; ACP session in
  the sandbox still holds the rewound turns). So the agent kept acting on
  deleted turns. Fixed with a real server reset: AgentSessionManager
  .reset_conversation + POST /sessions/by-conversation/{id}/reset tears the
  session down (parking the runtime warm) so the next prompt opens a fresh
  session that re-seeds from the truncated history.
- MAJOR + MINOR (agent-key): the reset is keyed by CONVERSATION, not agent,
  so a session-only agent override can't be missed; the client also forgets
  ALL cached sessions for the conversation (forgetAllAgentSessions).
- MAJOR: removeAfter now deletes by POSITION in the canonical by_conversation
  order (findIndex + slice) instead of a `>` _creationTime compare, so
  same-millisecond siblings can't leak.
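
To illustrate why the position-based delete matters, here is a small Python sketch (the real mutation is a Convex function; `Msg` and both helper names are hypothetical). It assumes the list is already in the canonical conversation order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    id: str
    created_ms: int  # stand-in for _creationTime


def ids_removed_by_time(thread: list[Msg], target_id: str) -> list[str]:
    """Old approach: delete messages whose creation time is strictly greater than the target's."""
    target = next(m for m in thread if m.id == target_id)
    return [m.id for m in thread if m.created_ms > target.created_ms]


def ids_removed_by_position(thread: list[Msg], target_id: str) -> list[str]:
    """New approach: find the target's index in the ordered thread and drop everything after it."""
    idx = next(i for i, m in enumerate(thread) if m.id == target_id)
    return [m.id for m in thread[idx + 1:]]


# The assistant reply "a1" shares a millisecond with the user message "u1".
thread = [Msg("u1", 100), Msg("a1", 100), Msg("u2", 101)]
print(ids_removed_by_time(thread, "u1"))      # "a1" survives the rewind
print(ids_removed_by_position(thread, "u1"))  # "a1" is removed as intended
```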

resetAgentSessionForRewind is now async (server call + cache clear); both
routes await it with the Clerk token. Tests: reset_conversation (gateway),
position-based removeAfter (convex).
…ed into CI

Real-Redis integration tests (skip cleanly when no Redis; CI provides one):
- fan-out to two followers (byte-identical), live tail, late-joiner replay,
  cross-conversation isolation.
- the FOLLOW_EVENTS allowlist (owner mcp/sandbox/cost + interactive events never
  reach a follower) and done.usage stripping.
- tee() fail-soft: a RAISING bus still delivers the complete turn to the
  initiator; a HUNG bus is bounded by the 1s breaker (no per-event stall).
- MAXLEN-trim replay still leads with the synthetic turn_start reset.
- /api/chat/follow auth: anonymous-no-token + invalid-bearer → 403; an
  authorized viewer gets 200 text/event-stream with frames relayed.
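
The allowlist and done-frame stripping that these tests exercise might look roughly like this. It is a sketch under assumptions: FOLLOW_EVENTS is the name used in the commits, but its exact contents, the `(event_type, data)` frame shape, and the `relay_frame` helper are illustrative only.

```python
from typing import Any, Optional

# Display events named in the commit messages; the real set may contain more entries.
FOLLOW_EVENTS = frozenset(
    {"token", "thinking", "tool_call", "tool_result", "done", "error", "plan", "status"}
)
# Owner-only fields removed from the relayed "done" frame (assumed key names).
STRIPPED_DONE_KEYS = ("usage", "cost")


def relay_frame(event_type: str, data: dict[str, Any]) -> Optional[tuple[str, dict[str, Any]]]:
    """Return the frame a follower may see, or None if the event must not be relayed."""
    if event_type not in FOLLOW_EVENTS:
        # Interactive events (permission/question) and owner infra signals
        # (mcp_error, sandbox_status, agent_usage) fall through here.
        return None
    if event_type == "done":
        data = {k: v for k, v in data.items() if k not in STRIPPED_DONE_KEYS}
    return event_type, data
```

An allowlist, rather than a denylist, means a newly added event type stays private until someone deliberately exposes it.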

scripts/stream_smoke.py: a runnable fan-out demo (two watchers + a late joiner)
that prints tokens streaming down and asserts everyone reconstructs the message.

CI: the FastAPI job now runs a redis:6.2-alpine service (matches prod redis6.2)
with REDIS_URL_TEST so the integration tests run there.

Addresses the test-quality review (circuit-breaker + SSE-relay coverage gaps,
hardened the one timing-flaky test to an Event gate). FastAPI 252, 5x stable.
test(stream): Redis fan-out integration tests + smoke client + CI
The workspaces-first chat view (the primary UX) renders its own ChatMessages but
never subscribed to the fan-out, so a passive tab there showed only persisted
messages, not the in-flight stream. Mirror the /chat wiring: hoist useFollowStream
above the early return, prefer the follow feed (else local) for the active
conversation's stream state, and clear both on sync. All three chat-rendering
routes (chat, workspaces, share) now follow.
fix(stream): wire live-follow into the /workspaces route
…sponse (re-review)

Re-review (0 critical, 1 major, 2 minor fixed; 2 minor deferred):

- MAJOR: reset_conversation could tear down a session mid-turn. Now only
  resets IDLE sessions (status=="ready", turn_guard==0, lock free) — the same
  guard the reaper/_claim_parked use. The client already blocks rewind while
  streaming; this is the server-side backstop.
- MINOR: resetServerAgentSessions now checks response.ok (api() resolves on
  4xx/5xx), retries once, and returns whether the reset succeeded. A swallowed
  reset error would otherwise leave the stale session reusable with the rewound
  turns.
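
The idle-only guard from the MAJOR item can be sketched as follows. The three conditions (status == "ready", turn_guard == 0, lock free) come from the commit message; the `Session` shape, the one-session-per-conversation dict, and the boolean return value are simplifying assumptions, not the real AgentSessionManager.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for a gateway agent session."""
    status: str = "ready"
    turn_guard: int = 0
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)


def is_idle(session: Session) -> bool:
    """A session may be reset only when no turn is running or about to run."""
    return session.status == "ready" and session.turn_guard == 0 and not session.lock.locked()


def reset_conversation(sessions: dict[str, Session], conversation_id: str) -> bool:
    """Drop the conversation's session if it is idle; return False when the reset was refused."""
    session = sessions.get(conversation_id)
    if session is None:
        return True  # nothing cached, so there is nothing stale to reuse
    if not is_idle(session):
        return False  # mid-turn: leave the session alone
    del sessions[conversation_id]
    return True
```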

Deferred (documented, narrow): a sub-second send-during-rewind TOCTOU race
(needs composer-send gating); the replay preamble's harness-switch wording
shown for a rewind (cosmetic).
Rewind (in-place truncate) + rewind-and-fork (branch) for normal Harness + Claude Code, via a server-side conversation reset. Adversarially reviewed twice; all critical/major findings fixed.
…es parity

The rewind + fork-at-message handlers (handleRewind, fork-at-message, the
removeAfter/fork mutation bindings, the in-flight guard, the agent-session
reset) were duplicated verbatim in both routes. Extract them into
hooks/use-rewind.ts so the two routes share one implementation and stay at
parity. The hook reads the stream context + Clerk token itself; callers pass
only the active conversation id + a navigate callback. forkAtMessage now
backs both the assistant "Fork" and the user "Rewind & fork" (one handler).
No behavior change.
…ting (#118)

Owner-redirect, an undeletable Default workspace, and a fork workspace
picker — so shared chats open in the right place under workspaces mode.

A. Owner opening their OWN share link now respects workspacesMode: it
   lands in /workspaces?workspaceId&convoId (the conversation's own
   workspace) instead of the deprecated /chat?convoId. Legacy
   workspace-less conversations are adopted into the Default workspace
   (conversations.ensureInWorkspace, which also re-stamps the messages'
   workspaceId so workspace-scoped content search finds them). Falls back
   to /chat for users whose workspacesMode !== "workspaces".

B. Every account gets an undeletable "Default" workspace (isDefault flag).
   getOrCreateDefaultWorkspace returns the flagged default, else adopts an
   existing "Default"-named workspace, else creates a fresh one — it never
   force-flags an arbitrarily-named workspace, so existing custom
   workspaces stay deletable. workspaces.remove rejects the Default; the
   management UI hides its delete button. Its harness/sandbox stay editable.

C. Forking a shared chat opens a workspace picker; the sharee chooses
   which of THEIR workspaces to fork into (default = their Default).
   forkSharedConversation validates the chosen workspace belongs to the
   actor and stamps the fork + copied messages with it (owner usage/cost
   stripped). Landing is mode-aware (/workspaces vs /chat).
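
The three-step resolution described in B could be sketched like this in Python. The real getOrCreateDefaultWorkspace is a Convex function; the `Workspace` dataclass and the in-memory list are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workspace:
    name: str
    is_default: bool = False  # mirrors the isDefault flag


def get_or_create_default(workspaces: list[Workspace]) -> Workspace:
    """Return the flagged default, else adopt a workspace named "Default", else create one."""
    for ws in workspaces:
        if ws.is_default:
            return ws
    for ws in workspaces:
        if ws.name == "Default":
            ws.is_default = True  # adopt the existing, conventionally named workspace
            return ws
    created = Workspace(name="Default", is_default=True)
    workspaces.append(created)
    return created
```

Note that the function never flags a workspace with any other name, which is what keeps existing custom workspaces deletable.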

The /workspaces route gained a convoId deep-link param. The deep-linked
conversation is now mirrored into the URL (so refresh/bookmark reopens it)
and is no longer wiped by the workspace-init effect while the workspaces
list is still loading.

Includes Convex tests for the Default lifecycle, message re-stamping, and
fork workspace placement. Addresses findings from an adversarial review of
the diff.
Extract shared useRewind hook so /chat and /workspaces stay at rewind/fork parity. No behavior change; 185 tests pass.
… + /workspaces

Rewind / rewind-&-fork into the MIDDLE of an assistant message at part
boundaries (text / reasoning / tool_call), not just whole messages.

Backend:
- messages.truncatePart: keep the first N flat parts of an assistant
  message, recompute `content` from kept text parts (mirrors the gateway's
  "".join(text_parts)), clear legacy reasoning/toolCalls, delete every later
  message — patch + delete in one transaction.
- conversations.fork gains truncateLastPartCount: copy the boundary
  assistant message TRUNCATED. Non-destructive (original untouched, no live
  session to reset) — the safe primary for mid-message.
- Shared contentFromParts helper so the two paths can't drift.

Frontend:
- message-seams.ts (pure, unit-tested): seam geometry over the FLAT parts
  array, mirroring organizeParts' top-level numbering so a kept tool call
  keeps its whole subagent subtree (no orphans). summarizeDropped flags
  whether the agent's context actually changes (text dropped) vs view-only
  (only trailing reasoning/tool_call dropped).
- AssistantParts + Seam: hover-revealed seams in the gaps between rendered
  blocks; hovering dims the blocks below (preview); clicking opens an inline
  confirm with Rewind & fork (primary) / Rewind (destructive) / Cancel and
  honest consequence copy.
- useRewind gains handleRewindToPart (in-place, resets agent) and forkAtPart;
  wired through ChatMessages into both /chat and /workspaces.

Tests: 7 convex (truncatePart + fork-truncate), 9 frontend (seam geometry).
Adversarial review found 1 critical + 3 major + 4 minor. Fixes:

CRITICAL/MAJOR — degenerate seam (keep === parts.length, e.g. an interleaved
background subagent whose child is the last flat part) was rendered but threw
in-place (truncatePart out-of-range) and silently no-op'd on fork. Now gate
seams on hasRenderableAfter(parts, keep) — only show a seam that actually drops
a visible block; never the last block, a no-op cut, or an empty-parts-only cut.
Also wrap handleRewindToPart/forkAtPart in try/catch with an error toast.

MAJOR — "agent's context unchanged" copy could lie. (a) Seams now render ONLY
on the last message in the thread, so truncation never silently deletes later
turns. (b) contentChanges is now computed by COMPARISON (recomputed content !==
stored content) instead of "did a text part drop", so it is honest even on the
default OpenRouter path, which stores only the last agentic iteration's text
while parts[] holds one text part per iteration (chat.py). Added a frontend
contentFromParts mirror + a convex test for the divergent-content case.

MINOR — describeDropped pluralizes on != 1 ("0 blocks"); empty/non-rendering
trailing parts no longer offer a seam (hasRenderableAfter); inline confirm now
moves focus to its primary button on open and dismisses on Escape; open-confirm
identity + dim preview lifted into AssistantParts (one open at a time, dim pins
to the open seam). Doc comment in messageParts.ts clarifies the OpenRouter
divergence.
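
As a rough picture of the hasRenderableAfter gate, the following Python sketch works over a flat parts list. The part shape (`type`, `text`) and the renderability rule are assumptions, and the sketch ignores the subagent-subtree numbering that message-seams.ts handles.

```python
from typing import Any

Part = dict[str, Any]
RENDERABLE_TYPES = {"text", "reasoning", "tool_call"}


def is_renderable(part: Part) -> bool:
    """Assumed rule: known block types render, except text parts that are empty."""
    if part.get("type") not in RENDERABLE_TYPES:
        return False
    if part["type"] == "text":
        return bool(part.get("text", "").strip())
    return True


def has_renderable_after(parts: list[Part], keep: int) -> bool:
    """True only if cutting after the first `keep` parts would drop a visible block."""
    return any(is_renderable(p) for p in parts[keep:])


def seam_positions(parts: list[Part]) -> list[int]:
    """Offer a seam only where the cut is neither a no-op nor the end of the message."""
    return [keep for keep in range(1, len(parts)) if has_renderable_after(parts, keep)]
```

Under this rule the degenerate case keep == len(parts) never yields a seam, because nothing renderable remains after the cut.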

NOTE (follow-up, not in this PR): chat.py persists last-iteration-only content
for multi-iteration OpenRouter turns — a pre-existing fidelity gap independent
of rewind. Recommend making contentFromParts the single source of truth there.

Tests: convex 40 (+1), frontend 200 (+6 seam). tsc baseline, biome clean.
…change

Verification re-review found a regression from lifting open/hover state into
AssistantParts: clicking a seam unmounts the trigger button while the cursor is
over it, so its onMouseLeave never fires and hoverIdx stays pinned — after
onClose (which only cleared openIdx) the dim fell back to hoverIdx and stuck
after every Cancel/Fork/Rewind.

- Clear hoverIdx on both onOpen and onClose so the dim never sticks.
- Clamp activeKeep against topCount and reset openIdx/hoverIdx when parts[]
  change in place (background subagent append), so block-index state can't
  outlive the geometry it indexes.
Root cause of the deep-link bug fixed earlier: the /workspaces route had no
single owner for selection state. activeConvoId/activeWorkspaceId were two bare
useState cells written by ~6 independent effects + 2 referee refs, with the
precedence rule (URL deep-link > explicit selection > most-recent restore)
living only in prose comments and emergent effect ordering. That tangle is how
a load-order race shipped (a URL-seeded conversation got nulled while
workspaces.list was still loading). It was a regression-via-reimplementation —
the deprecated /chat route already had the safe pattern, but /workspaces
reimplemented the contract divergently.

Extract the tangle into cohesive, unit-tested units:

- hooks/use-workspace-selection.ts — owns activeWorkspaceId/activeConvoId,
  workspace resolution, and selectWorkspace, with the precedence made explicit
  and documented. (8 tests, incl. the original-bug regression: a URL-seeded
  conversation survives workspaces.list loading.)
- hooks/use-recent-chat-restore.ts — owns the restore arm/apply handshake.
  Adds cancelRestore() so an explicit "New chat" cancels an armed-but-unapplied
  restore (fixes a real latent bug: a just-dismissed chat could silently
  reopen), plus a defensive workspace-match guard. (6 tests.)
- lib/navigate-to-conversation.ts — openConversation() centralizes the
  mode-aware (/workspaces vs /chat) routing so convoId is ALWAYS carried; this
  rule was copy-pasted and a copy that dropped convoId was the second half of
  the original bug. (5 tests.)
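
The precedence rule itself is small once it has a single owner. A minimal Python sketch, assuming each source is either a conversation id or None (the function and parameter names are hypothetical; the real logic lives in the TypeScript hook):

```python
from typing import Optional


def resolve_active_conversation(
    url_convo_id: Optional[str],
    explicit_convo_id: Optional[str],
    restored_convo_id: Optional[str],
) -> Optional[str]:
    """URL deep-link > explicit selection > most-recent restore."""
    for candidate in (url_convo_id, explicit_convo_id, restored_convo_id):
        if candidate is not None:
            return candidate
    return None
```

Because the result depends only on the three inputs, a URL-seeded id cannot be nulled by the order in which other effects happen to run.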

workspaces/index.tsx shrinks ~110 lines; share/$token.tsx uses openConversation
at the owner-redirect and fork-confirm sites. Behavior-preserving otherwise
(verified by an adversarial diff review: 0 real regressions; every confirmed
finding was a positive equivalence check or a deliberate fix).

Follow-ups identified by the review, deferred to keep this PR scoped to the
selection root cause: extract useMcpHealthCheck, useMessageQueue, and a unified
fork flow (each duplicated near-verbatim in the deprecated /chat route).
Rewind / rewind-&-fork into the middle of an assistant message at part boundaries. Two adversarial review rounds; CI green.
…nRouter save paths

The default OpenRouter path stored a non-faithful assistant `content`: only the
LAST agentic iteration's text on a normal finish, and "" at max-iterations,
while parts[] held one text part per iteration. This broke the invariant the
mid-message-rewind feature relies on (content == contentFromParts(parts)) and
meant prior assistant turns were represented to the model — and indexed for
search — by only their last paragraph.

- Add content_from_parts(parts) helper mirroring the TS contentFromParts
  (convex/messageParts.ts) and the ACP gateway "".join(text_parts)
  (session_manager.py): text parts only, joined with no separator.
- Normal save persists content_from_parts(all_parts) (the full multi-iteration
  join) instead of last-iteration collected_content.
- _save_interrupted is now self-reconciling: it appends the in-flight text as a
  trailing part IFF it isn't already the last text part, then derives BOTH the
  persisted content and parts from that reconciled list — so content ==
  contentFromParts(parts) holds with no text lost or duplicated at every
  interrupted site (mid-stream exception, truncation abort, max-iterations).
- The done event now sends the SAME faithful_content. Required, not cosmetic:
  the frontend's convexHasMessage handshake clears the streaming bubble only
  when lastMsg.content === pendingDoneContent (chat-messages.tsx:743-746); if
  the persisted join and done.content diverged, a multi-iteration bubble would
  never clear.
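
A minimal sketch of the invariant and the reconcile step, assuming parts are dicts with `type` and `text` keys. `content_from_parts` is named in the commit; `reconcile_interrupted` is a hypothetical stand-in for the logic inside _save_interrupted.

```python
from typing import Any

Part = dict[str, Any]


def content_from_parts(parts: list[Part]) -> str:
    """Text parts only, joined with no separator."""
    return "".join(p.get("text", "") for p in parts if p.get("type") == "text")


def reconcile_interrupted(parts: list[Part], in_flight_text: str) -> tuple[str, list[Part]]:
    """Append the in-flight text unless it is already the last text part, then derive content."""
    reconciled = list(parts)
    last_text = next((p for p in reversed(reconciled) if p.get("type") == "text"), None)
    if in_flight_text and (last_text is None or last_text.get("text") != in_flight_text):
        reconciled.append({"type": "text", "text": in_flight_text})
    # Content and parts come from the same list, so content == content_from_parts(parts) holds.
    return content_from_parts(reconciled), reconciled
```

Deriving both values from one reconciled list is what prevents the persisted content and the done event from diverging.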

Unchanged (correct as-is): streaming delta events and the intra-turn
messages.append that build this turn's running OpenRouter message list.

Implemented + adversarially verified via workflow (8 lenses: multi-iteration,
single/tool-only, mid-stream in-progress-text, truncation/max-iter, the
content==parts invariant, done-event/client, consumers, tests/lockfile).

Follow-up (not in this PR): conversations/messages.saveInterruptedMessage trusts
client-supplied content without recomputing from parts — the one remaining path
not covered by the invariant.

Tests: +11 (content_from_parts ×7, _save_interrupted reconcile ×4). Full fastapi
suite 263 passed / 11 skipped; ruff clean; uv.lock untouched.
content_from_parts as single source of truth in chat.py; self-reconciling _save_interrupted; done-event handshake preserved. CI green.
…rruptedMessage

The last persistence path that trusted client-supplied content. When parts are
present, recompute content from them (contentFromParts) instead of storing the
raw arg — so the invariant the mid-message-rewind feature relies on holds even
if a caller sends mismatched content. Falls back to the raw content only when no
parts were captured.

Safe w.r.t. the convexHasMessage handshake (lastMsg.content === pendingDoneContent,
chat-messages.tsx:743): the streaming client keeps state.content in lockstep with
its text parts (onToken appends to both; onThinking/onToolCall never touch
content), so contentFromParts(parts) equals the content the client sent and the
bubble still clears.

Tests: +3 (recompute ignores divergent client content; fallback with no parts;
auth rejection). convex messages suite 24 passed.
…rruptedMessage

Closes the last persistence path not covered by the content/parts invariant. Safe vs the streaming handshake. CI green.
…om-summary

Gives developers observability into context compaction and agency over how to
continue: see each compaction summary inline, and on a compacted conversation
choose "continue full chat" (today's behavior) or "new session from summary"
(a fresh clone seeded with the summary instead of the bloated transcript).

Capture (verified live on claude-agent-acp@0.44.0 via a standalone ACP probe):
- emitRawSDKMessages already opted-in; add `compact_boundary` + `user` filters.
  compact_boundary carries metadata (trigger, pre/post tokens); the summary
  prose arrives as a synthetic user message (string content) that the adapter
  drops from session/update but forwards raw — detected by Claude Code's
  canonical "This session is being continued…" preamble.
- parse_sdk_compaction + _handle_sdk_compaction merge boundary+summary into one
  `compaction` SSE event, gated on a real boundary to avoid false positives.
- Persisted mid-turn (survives SSE disconnect) via save_compaction →
  compactions:record (internal mutation; owner derived server-side).

Data model: new append-only `compactions` table (kept out of messages.parts so
forks never copy it) + `seededFromCompactionId` on conversations.

Clone: compactions.cloneFromCompaction forks a conversation carrying the
harness/workspace forward (reusing fork lineage) and seeds one summary message;
_build_replay_preamble detects the single summary seed and replays it in full
with summary framing (not the 4000-char harness-switch truncation).

UI: CompactionPanel (query-driven, no schema part change) renders each
compaction as a collapsible card + the continue-vs-clone banner, in both the
chat and workspaces routes.

Tests: parse_sdk_compaction + replay-preamble branch (11 new). All suites
green: pytest 263, convex 161, frontend 218.
Self-review (the workflow review was down on API 529s):
- CompactionPanel is keyed by conversationId so its local 'dismissed' state
  resets when switching conversations (ChatMessages isn't remounted per convo).
- Clear session.pending_compaction at turn start so a boundary with no
  following summary can't be paired with a later turn's user message that
  merely echoes the compaction preamble.
Capture Claude Code compaction summaries over ACP (verified live), persist to a new compactions table, surface them inline, and offer continue-full-chat vs new-session-from-summary. Required CI green; adversarial self-review done (workflow review blocked by API 529 overload).
Adds a `rewindSeams` user setting (default on) controlling whether the
mid-message rewind seams render. Gated in chat-messages.tsx (`seamsEnabled`
prop) and wired from userSettings in both /chat and /workspaces. Toggle lives
in the settings dialog under Display. Gates ONLY the seams — whole-message
rewind/fork are unaffected.

- schema: userSettings.rewindSeams (optional bool; absent = on)
- userSettings get/update + DEFAULTS
- settings-dialog: "Mid-message rewind" checkbox
- chat-messages: seamsEnabled gate; both routes pass userSettings.rewindSeams

Tests: userSettings round-trip + updated default-shape assertions (10 pass).
Frontend 218 pass; tsc baseline; biome clean.
…indings)

Exhaustive audit of seams across 87 cases found 3 issues (all in the rewind
action handlers); the geometry, gating, and the new setting verified solid.

- MAJOR: destructive in-place Rewind silently swallowed agent-session-reset
  failure. resetServerAgentSessions returns false (never throws) on a genuine
  network/5xx failure, so the try/catch couldn't see it — Convex truncated but
  the warm ACP session kept the rewound turns (silent view↔agent desync).
  resetAgentSessionForRewind now returns that boolean; both in-place paths warn
  the user (and suggest fork) when the reset fails. No false alarm for the
  stateless OpenRouter case (200, 0 sessions → ok).
- MINOR: forkAtPart had no isBusy() guard — in the post-stream pendingDone
  window the seam targets the prior turn, so a fork could silently omit the
  just-finished turn. Added the guard (parity with handleRewindToPart).
- MINOR: in-place Rewind silently no-op'd when busy. Both paths now toast
  "Can't rewind/fork while the turn is finishing." instead of returning silently.

Frontend 218 pass; tsc baseline; biome clean.
…upted-turn hooks (#124)

Three blocks were copy-pasted (near-)byte-identically in both the /chat and
/workspaces routes. Extract them into shared, testable hooks so the two routes
stop drifting and ~270 lines of duplication per route are removed:

- hooks/use-mcp-health-check.ts — was byte-identical in both routes. Owns the
  mcpHealthStatuses state + runHealthCheck + refreshHealth + the URL-keyed
  effect; reads useAuth internally; returns { mcpHealthStatuses, refreshHealth }.
- hooks/use-persist-interrupted-turn.ts — the persistInterruptedTurn body. Reads
  the chat-stream context; takes a getFallbackModel callback so it stays
  harness-agnostic (model fallback = state.model ?? getFallbackModel()).
- hooks/use-message-queue.ts — the send-while-streaming queue (enqueue/dequeue,
  send-now interrupt+flush, drain-after-turn, post-sync processing, post-stream
  drain effect). The route keeps the route-specific sendQueuedMessage (passed in)
  and a residual effect that clears MCP-failure banners on convo switch (split
  out of the old combined clear effect). 7 unit tests cover the queue mechanics.

Behavior-preserving: identical logic relocated; only the convo-switch clear was
split (queue vs MCP banners) and handleStreamSynced moved after the hook call so
it can consume processQueuedAfterSync. Full frontend suite green (225); zero new
type errors. Header/SandboxSelector dedup and fork unification stay deferred
(real structural drift / different backends).
User setting to toggle seams (default on, gates only seams); 87-case audit + 3 action-handler fixes (agent-reset desync warning, busy-window guards). CI green.
DIodide and others added 25 commits June 21, 2026 16:43
Mirrors the chat-share architecture for harnesses, and adds a Manage Sharing
page (reached from the bottom-right sidebar rail) for all of a user's shares.

Share Harness:
- harnessShareGrants table (public link | email invite | bound user), mirrors
  shareGrants 1:1 (auth ALWAYS via an active grant; secrets never denormalized).
  Lock is a single `sharedLocked` flag on the harness. shares.ts helpers
  (isActiveGrant generic, clamp*, token-min, avatar hosts) exported + reused.
- harnessShares.ts: owner mgmt (ensure/rotate public link, invite-by-email,
  role, lock, revoke, unshare, listings); a chromeless public viewer query
  (getSharedHarness) behind a REDACTED projection (no authToken / mcp url /
  agentCredentialId / sandbox ids / ownerUserId — a test asserts the denylist);
  cloneSharedHarness (drops every secret); editSharedHarness (editor + unlocked
  only, non-secret fields); listIncomingSharedHarnesses for the recipient.
- Email "bind later": invite stores granteeEmail (an invite POINTER, never an
  auth key); FastAPI POST /api/harness-shares/claim resolves the caller's
  Clerk-VERIFIED emails (server-side) and binds via bindHarnessGrantsInternal.
- Frontend: /share-harness/$token chromeless viewer (redacted config + clone,
  signed-in/out), HarnessShareDialog (public link + email + lock + roles), a
  "Share" card action, a "Shared with you" section on /harnesses (clone + an
  editor-only edit dialog the lock gates), claim-on-mount.

Manage Sharing:
- /manage-sharing route + a "Sharing" MANAGE_TABS entry (auto-adds the header
  tab AND the bottom-right rail icon). shares.listMySharedConversations
  (new by_owner index, backfill-tolerant) lists shared chats with revoke /
  change-role / stop-sharing; shared harnesses listed too.

Live-run on the owner's account is DEFERRED (the plan showed agent-mode is
structurally incompatible with the current session-ownership model and
default-loop live-run needs its own focused PR); clone fully covers using a
shared harness today.

Tests: +11 Convex (redaction denylist, owner gating, clone secret-drop, lock/
role on editSharedHarness, email bind-later, listings), +3 FastAPI (claim).
Convex 206, FastAPI 326, web 235; biome clean, tsc 21/21 baseline.
Restructures the README screenshots per review: a single characteristic shot of
the chat app (an agent doing real work — MCP doc lookup, terminal build with an
"exit 0", file edit, a subagent, and a Workflow card, in a colored workspace)
sits at the top; everything else moves into a collapsed <details> gallery
(subagent + workflow card, Context7 MCP connected, harnesses grouped by status,
the harness editor, the share dialog). Drops the older chat-view.png.

Captured loginless via dev-auth against a real deployment with seeded data.
No secret-leak or cross-tenant holes — redaction + authz model held. Fixes:

- [MED] listIncomingSharedHarnesses dedup was role-blind: a user holding both a
  viewer and an editor grant on one harness could be shown the viewer card
  (Edit hidden) though resolveHarnessRole grants editor. Sort editor-first
  before dedup so the card reflects the strongest active grant.
- [MED] Owners couldn't change an email/bound recipient's role (only the public
  link had a toggle). Wire setHarnessShareRole on recipient rows in both the
  dialog and the Manage Sharing harness section. bindHarnessGrantsInternal now
  MERGES instead of duplicating when a user is invited twice (keeps the stronger
  role, one grant per (harness,user)).
- [LOW] sharedLocked persisted after unshareHarness → a later re-share silently
  started locked. Clear it on unshare.
- [LOW] Reconcile the clone-vs-public-view URL policy: clone keeps the MCP url
  (the recipient's own copy needs it); fix the viewer copy to claim only
  "Credentials stay private" (the anonymous view still withholds urls).
- [LOW] Clear the clone-resume intent before the attempt so a failed resumed
  clone can't silently re-fire on reload.
- [LOW] Remove a stale "don't invite yourself" comment (enforced at bind) and
  update "three manage screens" comments after the 4th (Sharing) tab.

+1 Convex test (viewer+editor → editor, merged to one grant). Convex 207,
FastAPI 326, web 235; biome clean, tsc 21/21.
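
The editor-first dedup described in the first [MED] fix can be sketched as below. This is an illustration in Python rather than the Convex code itself; `Grant`, `ROLE_RANK`, and `strongest_grant_per_harness` are hypothetical names, and the two-role ranking is an assumption based on the viewer/editor roles named above.

```python
from dataclasses import dataclass

# Hypothetical ranking: a higher number means a stronger role.
ROLE_RANK = {"viewer": 0, "editor": 1}


@dataclass(frozen=True)
class Grant:
    harness_id: str
    role: str  # "viewer" or "editor"


def strongest_grant_per_harness(grants: list[Grant]) -> list[Grant]:
    """Keep one grant per harness, preferring the strongest role."""
    # Sort editor-first so the first grant seen for a harness is the strongest.
    ordered = sorted(grants, key=lambda g: ROLE_RANK[g.role], reverse=True)
    seen: dict[str, Grant] = {}
    for grant in ordered:
        # setdefault keeps the first (strongest) grant and ignores later ones.
        seen.setdefault(grant.harness_id, grant)
    return list(seen.values())
```

Sorting before dedup keeps the dedup step itself role-agnostic, so the same "first one wins" loop works if more roles are added to the ranking later.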
The /harnesses email-claim relay keyed on a component-local useRef, so it
re-fired a Clerk lookup + bindHarnessGrantsInternal on every navigation back to
the page. Gate it on a sessionStorage key instead (cleared on failure so it
retries next visit) — once-per-session, matching the clone-intent pattern.
…files

A Skill Pack bundles a set of skills with optional AGENTS.md / CLAUDE.md
context. Attach multiple packs to a harness (instead of, or alongside, loose
skills). For agentic (ACP) harnesses the gateway writes the context to the
sandbox root and materializes each skill's SKILL.md so Claude Code loads it.

New 'Skill Packs' manage screen (sidebar icon + /skill-packs route) to
create/edit/delete packs; a pack picker in the harness create + edit flows.

Convex:
- skillPacks table + CRUD (skillPacks.ts); harnesses carry skillPackIds
  (create/update/duplicate/resolveForCollab); remove() detaches harnesses.
- internalQuery resolveForGateway: concatenates AGENTS.md/CLAUDE.md per pack,
  unions skills (de-duped), joins cached SKILL.md — owner-scoped.

FastAPI:
- HarnessConfig.skill_pack_ids; resolve_skill_pack_context (Convex client).
- session_manager._attach_skill_pack_context writes AGENTS.md (all agentic),
  CLAUDE.md + optional @AGENTS.md import (claude-code), and
  ~/.claude/skills/<slug>/SKILL.md (claude-code), on create AND revive.
  Path-slug sanitized; best-effort (never blocks a session).
- Re-provision PRUNES previously Harness-managed context (sentinel-marked)
  before writing, so removed skills / detached packs clear from persistent
  sandboxes without touching user-authored files.
- Default OpenRouter loop (chat.py) unions pack skills into its skill manifest.

Frontend:
- manage-tabs Skill Packs nav; /skill-packs route (list + editor reusing the
  skill picker + AGENTS.md/CLAUDE.md + @import checkbox); SkillPackPicker;
  harness-stream sends skill_pack_ids; onboarding + harness-edit attach packs.

Adversarially verified (multi-agent): cross-user access + path traversal are
defended; fixed the stale-context lifecycle bug found in review. Tests: Convex
203, FastAPI 341 (incl. new skillPacks/context-injection/prune tests), frontend
235; tsc/biome/ruff clean on changed files.
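
A minimal sketch of the sentinel-guarded prune, assuming managed files carry a marker on their first line. The marker text, the `*.md` scope, and the helper names `write_managed` / `prune_managed` are all hypothetical; the commit does not specify the real sentinel format.

```python
from pathlib import Path

# Hypothetical marker; the real sentinel format is not given in the commit.
SENTINEL = "<!-- managed-by-harness -->"


def write_managed(path: Path, body: str) -> None:
    """Write a file and stamp it so a later prune can recognize it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{SENTINEL}\n{body}", encoding="utf-8")


def prune_managed(root: Path) -> list[Path]:
    """Delete only files whose first line is the sentinel; return what was removed."""
    removed: list[Path] = []
    for path in sorted(root.rglob("*.md")):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                first_line = handle.readline().rstrip("\n")
        except (OSError, UnicodeDecodeError):
            continue  # best-effort: an unreadable file is left alone
        if first_line == SENTINEL:
            path.unlink()
            removed.append(path)
    return removed
```

Because the check is on file content rather than file name, a user-authored AGENTS.md without the marker survives a re-provision.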
Drove a live Claude Code agent (in a real Daytona sandbox) and captured genuine
output, replacing the previously-seeded transcript mockups:

- hero: a real session exploring a sandbox — actual tool calls, a terminal run
  with exit 0, and the real result (node v22.22.3, primes 2..29).
- background-agents panel with two real subagents running in parallel.
- a multi-step run showing the plan, edits, and a passing test suite.
- an approval card from a real run (the agent asking before a write).

Keeps the MCP / harnesses / harness-edit / share shots. Hero stays at the top;
the rest remain in the collapsed gallery.
… files

Adds Skill Packs: a creatable entity bundling skills + optional AGENTS.md/CLAUDE.md context, attachable to harnesses. For agentic harnesses the gateway writes the context to the sandbox root and materializes SKILL.md files (sentinel-guarded pruning clears removed skills/packs). New /skill-packs manage screen + harness-flow picker; default loop unions pack skills. Adversarially verified; cross-user access + path traversal defended.
The code review already re-runs on every push (synchronize), but there was no
way to request a fresh review WITHOUT pushing. Add a workflow_dispatch trigger
(pr_number input) so it can be re-run on demand from the Actions tab /
`gh workflow run claude-code-review.yml -f pr_number=<n>`; the job derives the
PR number and checks out the PR head for either event.
Cut the Features section to ~a third — name each feature, drop the specifics.
docs: launch-ready README + GPLv3 LICENSE
- [LOW] The claim-once sessionStorage gate was cleared only on a fetch reject
  (network error), but the endpoint returns 200 on soft-failures and 401 on the
  post-sign-in token race — so a failed claim was suppressed for the whole
  session. Now: _verified_emails returns None on a TRANSIENT Clerk failure (vs
  [] for genuine no-emails) → the endpoint reports {ok:false}; the client clears
  the session key on any non-success (res not ok OR body.ok false) so it retries
  next visit, while a real success (incl. ok:true/bound:0) stays claimed.
- [LOW] editSharedHarness skipped the systemPrompt 4000-char cap the owner's
  harnesses.update enforces, letting a less-trusted editor write unbounded data
  into the owner's doc. Export + apply assertSystemPromptLength, and clamp name.

+1 Convex test (oversized systemPrompt rejected) +1 FastAPI test (transient →
ok:false, no bind). Convex 208, FastAPI 327, web 235; biome clean, tsc 21/21.
feat(sharing): share harnesses + Manage Sharing page
…port, reliable materialization

UX: the skill-pack editor was a modal with a NESTED catalog modal. It's now a
full page (routes/skill-packs/new + $packId render a shared SkillPackEditor:
left = form, right = embedded catalog panel — no nested modals). The list page
is list-only and navigates to those routes.

Bulk import: convex/skills.ts importSkillRepo lists a repo's skills via the
GitHub trees API, fetches + caches each SKILL.md (+ the repo's AGENTS.md /
CLAUDE.md), indexes them, and returns them to drop into a pack. The editor adds
an 'import a GitHub repo' input (e.g. greensock/gsap-skills) and four pre-built
templates. Repo input is validated/host-pinned (no SSRF); './..' rejected.

Reliability (the deep dive): the ACP gateway used to SILENTLY SKIP skills whose
SKILL.md wasn't cached, with no fallback — so a freshly-added skill didn't reach
Claude Code until ensureSkillDetails happened to finish. Fix: extracted the
GitHub fetch into app/services/skill_content.py (fetch_skill_md), refactored
chat.py to share it, and the gateway now back-fills missing SKILL.md on demand —
bounded by an 8s budget + a 20-skill cap so it can never stall provisioning, and
authenticated via GITHUB_TOKEN (5000/hr vs 60/hr). Distinct skills sharing a
trailing id no longer collide on one ~/.claude/skills dir.

Adversarially verified (multi-agent): chat.py refactor is faithful; nested-modal
gone; gsap path confirmed against the live repo; fixed the two operational
majors (unbounded back-fill, unauthenticated fetch) the review found. Tests:
new test_skill_content.py + slug/cap tests; Convex 203, FastAPI 350, frontend
235; tsc clean (no new errors).
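
The bounded back-fill can be sketched as follows. The 8s budget and 20-skill cap come from the message above; everything else is an assumption: `backfill_missing` is a hypothetical name, the injected `fetch` callable stands in for `fetch_skill_md` (whose real signature is not shown here), and fetching sequentially is a simplification.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

BACKFILL_BUDGET_S = 8.0   # total wall-clock budget, per the commit message
BACKFILL_MAX_SKILLS = 20  # cap on skills back-filled per provision


async def backfill_missing(
    missing: list[str],
    fetch: Callable[[str], Awaitable[Optional[str]]],
    budget_s: float = BACKFILL_BUDGET_S,
    max_skills: int = BACKFILL_MAX_SKILLS,
) -> dict[str, str]:
    """Fetch SKILL.md for uncached skills without exceeding the budget or cap."""
    deadline = time.monotonic() + budget_s
    fetched: dict[str, str] = {}
    for skill_id in missing[:max_skills]:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Each fetch may use at most whatever is left of the shared budget.
            content = await asyncio.wait_for(fetch(skill_id), timeout=remaining)
        except asyncio.TimeoutError:
            break  # budget exhausted; stop rather than stall provisioning
        except Exception:
            continue  # best-effort: one bad skill must not fail the rest
        if content:
            fetched[skill_id] = content
    return fetched
```

Sharing one deadline across all fetches, rather than giving each fetch its own timeout, is what keeps the worst case at the budget instead of budget times skill count.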
…t, reliable materialization

Replaces the nested-modal skill-pack editor with full-page routes + embedded catalog; adds importSkillRepo (bulk import an owner/repo + its AGENTS.md/CLAUDE.md) and pre-built templates; and makes skills reliably reach Claude Code (gateway back-fills uncached SKILL.md from GitHub, bounded + GITHUB_TOKEN-authed, via a shared skill_content.py also used by the default loop). Adversarially verified.
…ct GitHub rate limit

Clicking a skill-pack template/import showed a cryptic '[CONVEX
A(skills:importSkillRepo)] Server Error' on staging. Root cause: repo skill
discovery uses the api.github.com git/trees endpoint, which is 60/hr per IP
when unauthenticated. With no GITHUB_TOKEN set on the deployment (shared IP),
that call is rate-limited -> discovery returns empty -> the action threw a
plain Error, which Convex masks as a generic 'Server Error'. (ensureSkillDetails
works token-less because it hits raw.githubusercontent.com first; only
discovery needs the rate-limited API.)

The real fix is operational — set GITHUB_TOKEN on the Convex deployment (the
code already reads process.env.GITHUB_TOKEN). This commit makes the failure
diagnosable instead of cryptic:
- listRepoSkillIds now reports rateLimited (403 w/ x-ratelimit-remaining=0, or
  429) vs notFound (404, or any other 403).
- importSkillRepo throws ConvexError (Convex surfaces these; plain Errors are
  masked) with actionable messages: invalid repo / rate-limited+set-GITHUB_TOKEN
  / repo-not-found / no-skills.
- the editor's import catch reads ConvexError.data so the real reason reaches
  the toast.

Verified (multi-agent): root cause confirmed; the with-token happy path has no
other throw and breaches no Convex limit (8 or 60 skills); ConvexError surfaces
correctly. convex tsc clean; no new web type errors.
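
The rate-limited vs not-found split can be sketched as a small classifier. It is written in Python for illustration only (the real code lives in the Convex action); `classify_github_failure` and the catch-all `"error"` result are hypothetical additions.

```python
from typing import Mapping


def classify_github_failure(status: int, headers: Mapping[str, str]) -> str:
    """Map a failed GitHub API response to 'rateLimited', 'notFound', or 'error'."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if status == 429:
        return "rateLimited"
    if status == 403 and lowered.get("x-ratelimit-remaining") == "0":
        return "rateLimited"
    if status in (403, 404):
        return "notFound"
    return "error"  # assumption: anything else is reported generically
```

Checking the rate-limit case before the generic 403 matters, since both arrive with the same status code and only the header tells them apart.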
…te limit

importSkillRepo's repo discovery hits the unauthenticated api.github.com rate limit when no GITHUB_TOKEN is set, and the plain Error was masked as a generic 'Server Error'. Now throws ConvexError with actionable messages (rate-limited -> set GITHUB_TOKEN / repo-not-found / no-skills); the editor surfaces ConvexError.data. The operational fix is to set GITHUB_TOKEN on the deployment.
feat(usage): real Claude subscription usage bars (5h + weekly)
Follow-up fixes from a recall-mode review of the harness-sharing feature
(#146). Correctness:

- editSharedHarness: a less-trusted editor could BLANK the owner's name/
  model by saving an empty field (only an upper bound existed). Empty/
  whitespace name/model are now ignored, and the Edit dialog disables Save
  when either is empty.
- harnesses.remove: cascade-delete the harness's harnessShareGrants. Orphaned
  grants were un-revokable (revoke re-asserts ownership via the now-deleted
  harness) and a stale public token kept resolving to a dangling id.
- publicHarnessProjection.hasAuth: derive from authType (!= "none") instead of
  Boolean(authToken) — oauth/tiger_junction servers DO require auth but keep
  their secret off the harness row, so the viewer wrongly showed "no auth".
- share-harness viewer: guard requestClone with an in-flight ref so the manual
  Clone button + auto-resume effect can't create two clones; clear the pending
  clone intent on the owner-redirect path so it doesn't linger for its TTL.
- /harnesses: wrap the synchronous sessionStorage access in the claim effect
  in try/catch (a sandboxed/partitioned context threw and broke the page); and
  don't flash EmptyState while the incoming-shares query is still in flight.

Cleanup:
- listMySharedHarnesses now sorts newest-first, matching the adjacent
  listMySharedConversations on the Manage Sharing page.
- editSharedHarness: drop the unused `grantId` arg (authz is resolved via the
  bound grant), which falsely implied the edit was grant-scoped.

Adds regression tests for the empty-field guard and the remove cascade.
Addresses findings from an xhigh review of #150:

- Scope the fetch to claude-code: `elif cred_id and session.agent_id ==
  "claude-code"`. It was firing for ANY agent with a linked credential —
  _fetch_subscription_usage hardcodes "claude-code", so a codex/cursor cred
  raised + logged a stack trace (caught) roughly once a minute while streaming.
- Treat an empty `{}` rate-limit snapshot like an absent one (`if rl:` instead
  of `is not None`): an empty dict no longer blocks the fetch or clobbers a
  good stored snapshot.
- Dedupe the buckets write: only persist when the fetched snapshot changed
  (mirrors the flat path) — no redundant Convex write every ~60s on a long
  session.
- Bound `_sub_usage_fetched_at`: prune entries past the TTL once it grows large,
  so the per-credential debounce map can't grow unbounded on a long-lived process.
- Key the account-limit bars on a stable window id, not the human label, so two
  windows that fall back to the generic "Claude account" label can't collide.

Skipped by design: the /v1/messages ping (the only way to read the unified
rate-limit headers — count_tokens omits them, /api/oauth/usage is scope-blocked)
and the wait-out-the-TTL-on-failure debounce (intentional, avoids hammering).
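
A sketch of the gating and the bounded debounce map, under stated assumptions: the 60s TTL is inferred from "roughly once a minute", the prune threshold is invented, and `should_fetch_subscription_usage` is a hypothetical helper that folds the `if rl:` / `elif` decision into one function.

```python
import time

SUB_USAGE_TTL_S = 60.0   # assumed debounce window
PRUNE_THRESHOLD = 1024   # assumed map size at which pruning kicks in


def should_fetch_subscription_usage(rl, cred_id, agent_id, fetched_at, now=None):
    """Decide whether to ping for subscription usage on this tick."""
    now = time.monotonic() if now is None else now
    if rl:
        # A non-empty snapshot already arrived; {} and None both fall through.
        return False
    if not cred_id or agent_id != "claude-code":
        return False
    last = fetched_at.get(cred_id)
    if last is not None and now - last < SUB_USAGE_TTL_S:
        return False  # debounced, including after a failed attempt
    if len(fetched_at) >= PRUNE_THRESHOLD:
        # Drop entries past the TTL so the map cannot grow without bound.
        stale = [key for key, at in fetched_at.items() if now - at >= SUB_USAGE_TTL_S]
        for key in stale:
            del fetched_at[key]
    fetched_at[cred_id] = now
    return True
```

Stamping the map when the decision is made, not when the fetch succeeds, is what gives the wait-out-the-TTL-on-failure behavior described above.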
fix(usage): harden subscription-usage fetch (review follow-ups to #150)
…wups

fix(sharing): address xhigh code-review findings on harness sharing
…153)

Persistent-sandbox unification (per-workspace boxes) created sandboxes with
auto_stop but no auto_delete, so Daytona stopped → archived → kept them
forever. Leaked session-owned boxes (teardown missed on a gateway restart)
and abandoned workspace boxes both piled up as archived sandboxes; archived
boxes take ~3 min to wake, so once enough accumulate the whole Daytona
account crawls and Claude Code sessions hang on cold start.

- Set auto_delete_interval on creation (both ACP + code-exec create paths).
  Scratch (session-owned) boxes hold nothing durable → reclaimed in 1 day
  (before they even archive); persistent workspace/code-exec boxes hold the
  user's files → 14-day grace. The "continuously stopped" clock spans the
  archived period, so archived boxes do get reclaimed. Both intervals are
  env-tunable; non-positive clamps to "disabled" (Daytona reads 0 as
  "delete immediately on stop", a data-loss footgun).
- Self-heal in _provision_once: a box Daytona auto-deletes still looks
  "owned" to verify_sandbox_owner (it checks Convex, not Daytona), so the
  next session would attach a ghost and error. On DaytonaNotFoundError for an
  attach, drop the stale Convex link and — for a workspace-unification box —
  create a fresh persistent one and relink. An explicit, user-chosen harness
  sandbox can't be fabricated, so in that case the error surfaces (the stale
  link is still cleared).
- Tests: interval selection + clamp, and the missing-attach heal decision.
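
The interval selection and clamp can be sketched as below. The 1-day and 14-day defaults come from the message above; the environment variable names, the minute unit, and the `-1` "disabled" sentinel are assumptions for illustration and should be checked against the Daytona SDK in use.

```python
import os

# Assumed sentinel for "never auto-delete". The commit only says non-positive
# input clamps to "disabled", because 0 would mean "delete immediately on stop".
AUTO_DELETE_DISABLED = -1

SCRATCH_DEFAULT_MIN = 1 * 24 * 60      # 1 day, in minutes (unit is an assumption)
PERSISTENT_DEFAULT_MIN = 14 * 24 * 60  # 14 days, in minutes


def auto_delete_minutes(persistent: bool, env=None) -> int:
    """Pick the auto-delete interval for a new sandbox, clamping unsafe values."""
    env = os.environ if env is None else env
    # Hypothetical variable names; the real ones are not given in the commit.
    name = (
        "SANDBOX_PERSISTENT_AUTO_DELETE_MIN"
        if persistent
        else "SANDBOX_SCRATCH_AUTO_DELETE_MIN"
    )
    default = PERSISTENT_DEFAULT_MIN if persistent else SCRATCH_DEFAULT_MIN
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # unparseable override: fall back to the safe default
    # Never pass 0 through: clamp every non-positive value to "disabled".
    return value if value > 0 else AUTO_DELETE_DISABLED
```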
Release notes for the 1.0.0 major release covering the full unreleased
span since v0.2.1 (PRs #81–#153): live session following, rewind/fork,
chat + harness sharing & collaboration, Skill Packs, per-workspace agent
sandboxes, workspace credentials, per-credential usage, Claude Code
config, and reliability/integrity hardening. Devops/infra setup
(Redis Streams prod provisioning, CI, deploy plumbing) intentionally
excluded — user-facing changes only.
@DIodide DIodide self-assigned this Jun 23, 2026
…ity)

From the xhigh code review of #154. User-facing fixes only:

- chat(collab): never inject the OWNER's GitHub token or workspace env
  credentials on an editor-collaborator turn — the agent runs in the
  owner's sandbox and a collaborator could echo them back via a sandbox
  command. Secrets stay server-side. (HIGH)
- credentials: extend the reserved-name denylist with the proxy / TLS-trust
  / package-registry families (HTTP(S)_PROXY, NO_PROXY, NODE_EXTRA_CA_CERTS,
  SSL_CERT_FILE, REQUESTS/CURL_CA_BUNDLE, PIP_INDEX_URL, …) and git-config
  injection (GIT_CONFIG_*/GIT_PROXY_COMMAND/GIT_SSH, NPM_CONFIG_*), so a
  credential can't MITM or hijack the sandbox's outbound traffic. (HIGH)
- usage: parse the unified rate-limit reset header robustly (int → RFC 3339
  fallback) so a format variance can't drop the reset (and defeat the
  stale-window self-heal). (HIGH)
- harness-share: clear sharedLocked when the LAST grant is revoked one-by-one
  (matching unshareHarness), so a later re-share doesn't start locked.
- conversations: bound the workspace re-stamp scan with .take(8192) like
  fork(), so adopting/moving a very long conversation can't blow the
  per-transaction read/write limit.
- chat-restore: pick the genuinely most-recent chat by lastMessageAt instead
  of the first match in the pinned-first list.
- stream-bus: give followers a separate Redis connection pool so a crowd of
  viewers can't starve the latency-critical producer/tee path.
- message-queue: flush the post-sync drain via an explicit signal instead of
  relying on the send-callback identity changing between turns.

Tests: credential denylist families, last-grant lock-clear, deterministic
post-sync drain. fastapi 361, convex 219, web 240 — all green.
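
A sketch of the reserved-name check using only the names listed above. The elided "…" entries are deliberately not filled in, the case-insensitive match is an assumption, and the function here is illustrative rather than the project's actual helper.

```python
# Exact names taken from the commit message; the list there ends with "…",
# so this set is knowingly incomplete.
RESERVED_EXACT = frozenset({
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
    "NODE_EXTRA_CA_CERTS", "SSL_CERT_FILE",
    "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE",
    "PIP_INDEX_URL",
    "GIT_PROXY_COMMAND", "GIT_SSH",
})
# Wildcard families (GIT_CONFIG_*, NPM_CONFIG_*) are matched by prefix.
RESERVED_PREFIXES = ("GIT_CONFIG_", "NPM_CONFIG_")


def is_reserved_env_name(name: str) -> bool:
    """Return True if a credential name would shadow a sensitive variable."""
    # Assumption: compare case-insensitively, since some tools also honor
    # lowercase spellings such as http_proxy.
    upper = name.strip().upper()
    return upper in RESERVED_EXACT or upper.startswith(RESERVED_PREFIXES)
```

For example, `is_reserved_env_name("GITHUB_TOKEN")` is false while `is_reserved_env_name("GIT_CONFIG_COUNT")` is true, so ordinary secrets pass and the injection vectors do not.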
Re-review (xhigh) of fa3177e surfaced gaps in the fixes themselves:

- credentials: enforce the reserved-name denylist at RESOLVE/injection time
  too (resolve_workspace_env), not just at creation — a row stored before
  the denylist was expanded (e.g. HTTP_PROXY, GIT_CONFIG_*) would otherwise
  still be decrypted + injected. Shared is_reserved_env_name() helper. (MED)
- usage: the RFC-3339 reset fallback parsed an offset-less timestamp as naive
  (local-tz), skewing resetsAt by the UTC offset; assume UTC when tzinfo is
  absent.
- message-queue: armPendingSend no longer clobbers an already-armed send —
  it requeues the colliding message at the front so neither is dropped
  (handleSendNow / processQueuedAfterSync lacked drainQueueAfterTurn's guard).
- harness-share: the last-grant lock-clear now keys on ACTIVE grants
  (isActiveGrant), like the rest of the module, so a future soft-revoke /
  expiry can't leave an inactive row that skips the unlock.
- conversations: extract MESSAGE_SCAN_CAP constant for the four .take(8192)
  sites and document the partial-restamp bound (a >cap-message conversation
  adopts/moves fine but its oldest messages stay out of workspace search).
- stream-bus tests: reset/close the new follower client globals in the
  fixtures (teardown symmetry after the producer/follower pool split).

Tests added: reserved-name skipped at resolve; no-drop on arm collision.
fastapi 362, convex 219, web 241 — all green.
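
The int-then-RFC-3339 reset parsing, with the UTC fix from this re-review, can be sketched as below. `parse_reset_header` is a hypothetical name, and treating the integer form as epoch seconds is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_reset_header(value: Optional[str]) -> Optional[float]:
    """Return the reset time as epoch seconds, or None if it cannot be parsed."""
    if not value:
        return None
    text = value.strip()
    try:
        return float(int(text))  # assumed: integer form is epoch seconds
    except ValueError:
        pass
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # An offset-less timestamp is treated as UTC, not local time.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()
```

Without the `tzinfo` check, `timestamp()` on a naive datetime uses the host's local zone, which is the skew the fix above describes.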
@DIodide DIodide deployed to staging June 23, 2026 14:15 — with GitHub Actions Active