59 changes: 20 additions & 39 deletions .claude/commands/automate.md
@@ -35,7 +35,7 @@ Phase 3: Parallel test writing (agents)
├── desktop-tester-1..N (desktop app tests)
└── mcp-tester (mcp server tests, if applicable)
Phase 4: Test reality check (lead, after all testers done)
Phase 5: Full test run (lead)
Phase 5: Scoped test run (new + affected) (lead)
Phase 6: CI verification (lead)
Phase 7: Summary (lead)
```
@@ -247,60 +247,40 @@ If issues are found, fix them directly.

---

## Phase 5: Full Test Run
## Phase 5: Scoped Test Run

After reality check passes, run ALL created tests to confirm everything passes together.
Verify the tests **this command just wrote** pass. Do NOT run the full suite — that is `/finalize`'s job, and running it here doubles the wait with no new signal.

### 5a. Desktop tests (all new test files)
### 5a. New test files together

Run new test files together first:
Run every test file created in Phase 3 in a single invocation:

```bash
cd apps/desktop && npx vitest run [space-separated list of all new test files]
```

### 5b. Desktop tests (full sharded run — match CI)
All new tests must pass. If any fail, fix in place and re-run only the failing files.

Run the full suite the same way CI does — sharded 8-way. Run all 8 shards in parallel:
### 5b. Affected existing tests

```bash
cd apps/desktop && npx vitest run --shard=1/8
cd apps/desktop && npx vitest run --shard=2/8
cd apps/desktop && npx vitest run --shard=3/8
cd apps/desktop && npx vitest run --shard=4/8
cd apps/desktop && npx vitest run --shard=5/8
cd apps/desktop && npx vitest run --shard=6/8
cd apps/desktop && npx vitest run --shard=7/8
cd apps/desktop && npx vitest run --shard=8/8
```

Or run a specific workspace project:

```bash
cd apps/desktop && npx vitest run --project unit-main
cd apps/desktop && npx vitest run --project unit-renderer
cd apps/desktop && npx vitest run --project unit-shared
```

### 5c. MCP server tests (if applicable)

```bash
cd apps/mcp-server && npm test
```

### 5d. Run affected existing tests

If code changes could break existing tests (e.g., changed a service function's signature), run those existing test files too:
If the branch's source changes could break existing tests (e.g., changed a service function's signature, renamed an exported type, altered shared contracts), run those existing test files — NOT the full suite:

```bash
cd apps/desktop && npx vitest run [affected existing test files]
```

Scope "affected" narrowly — direct importers of touched modules and their test siblings. Do not expand to "everything in the same feature folder."
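
The narrow scoping rule above can be sketched in TypeScript. This is only an illustration under stated assumptions: it supposes an import graph is already available as a plain map, and `ImportGraph`, `testSibling`, `affectedTestFiles`, and the `foo.ts` / `foo.test.ts` sibling convention are hypothetical names, not part of this repo's tooling.

```typescript
// Hypothetical helper types and functions: none of these names exist in the repo.
// Maps each source file to the module paths it imports directly.
type ImportGraph = Record<string, string[]>;

// ASSUMPTION: the test sibling of `foo.ts` is `foo.test.ts`.
function testSibling(file: string): string {
  return /\.test\.ts$/.test(file) ? file : file.replace(/\.ts$/, ".test.ts");
}

// Returns existing test files for the touched modules and their DIRECT importers.
// It deliberately does not follow transitive imports or widen to a feature folder.
function affectedTestFiles(
  graph: ImportGraph,
  touched: string[],
  existingTests: Set<string>
): string[] {
  const touchedSet = new Set<string>(touched);
  const candidates = new Set<string>(touched);

  // Add every file that imports a touched module directly.
  Object.keys(graph).forEach((file) => {
    if (graph[file].some((dep) => touchedSet.has(dep))) candidates.add(file);
  });

  // Keep only test siblings that actually exist, without duplicates.
  const tests: string[] = [];
  candidates.forEach((file) => {
    const sibling = testSibling(file);
    if (existingTests.has(sibling) && tests.indexOf(sibling) === -1) tests.push(sibling);
  });
  return tests.sort();
}
```

The returned list is what would be passed to `npx vitest run` in place of `[affected existing test files]`. A file that only imports an importer (two hops away) is left out on purpose.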

**If tests fail:**
- Check if it's a flaky test (retry once)
- If a specific test fails consistently, fix it and re-run only that file
- Do NOT re-run all tests — only the failed ones

### 5c. Not this command's job

- **Full sharded suite run:** `/finalize` runs all 8 shards (and `test-ade-cli`) the same way CI does. Skip it here.
- **Build / typecheck / lint:** also deferred to `/finalize`.

---

## Phase 6: CI Verification
@@ -354,9 +334,10 @@ Read `.github/workflows/ci.yml`. Verify:
### Test Files Created:
- [List each file with test count]

### Full Suite Run:
- Desktop: PASS (X tests)
- MCP Server: PASS (X tests)
### Scoped Test Run:
- New test files: PASS (X tests across Y files)
- Affected existing tests: PASS (X tests) or N/A
- NOTE: Full sharded suite run is deferred to `/finalize`.

### CI Coverage:
- vitest.workspace.ts: All new tests matched by include patterns
@@ -394,7 +375,7 @@ Mark as **"completed"** ONLY if ALL of the following are true:

1. ALL tests pass
2. All applicable test types were created per gap tracker
3. Full test run passed (Phase 5)
3. Scoped test run passed (Phase 5 — new + affected only; full suite deferred to /finalize)
4. CI covers all new test files (Phase 6)
5. No tests with silent null guards
6. No tests that mock the thing being tested
40 changes: 28 additions & 12 deletions .claude/commands/finalize.md
@@ -285,35 +285,51 @@ cd apps/web && npm run typecheck
cd apps/desktop && npm run lint
```

### 3e. Desktop tests (sharded — match CI exactly)
### 3e. Desktop tests — full suite, sharded 8-way, run in PARALLEL

Shard like CI (8 shards in parallel) to avoid timeout. The workspace has 3 projects (`unit-main`, `unit-renderer`, `unit-shared`) — sharding runs across all of them automatically:
`/finalize` is the gate that runs the whole test suite. Run **all 8 shards concurrently**, not sequentially. Running them serially takes roughly 8× longer and hides how CI actually behaves in wall-clock terms.

The command must match the one in `.github/workflows/ci.yml` (job `test-desktop`, matrix shard 1–8, step at line 139):

```yaml
- run: cd apps/desktop && npx vitest run --shard=${{ matrix.shard }}/8
```

Locally that maps to 8 parallel Bash invocations in a single tool-call round:

```bash
cd apps/desktop && npx vitest run --shard=1/8
cd apps/desktop && npx vitest run --shard=2/8
cd apps/desktop && npx vitest run --shard=3/8
cd apps/desktop && npx vitest run --shard=4/8
cd apps/desktop && npx vitest run --shard=5/8
cd apps/desktop && npx vitest run --shard=6/8
cd apps/desktop && npx vitest run --shard=7/8
cd apps/desktop && npx vitest run --shard=8/8
cd apps/desktop && npx vitest run --shard=1/8 # shard 1 of 8
cd apps/desktop && npx vitest run --shard=2/8 # shard 2 of 8
cd apps/desktop && npx vitest run --shard=3/8 # shard 3 of 8
cd apps/desktop && npx vitest run --shard=4/8 # shard 4 of 8
cd apps/desktop && npx vitest run --shard=5/8 # shard 5 of 8
cd apps/desktop && npx vitest run --shard=6/8 # shard 6 of 8
cd apps/desktop && npx vitest run --shard=7/8 # shard 7 of 8
cd apps/desktop && npx vitest run --shard=8/8 # shard 8 of 8
```

Or run specific projects when you only need a subset:
Issue these as 8 concurrent Bash tool calls in a single message (one call per shard). Do not chain them with `&&` or `;` or run them one at a time. The workspace has 3 projects (`unit-main`, `unit-renderer`, `unit-shared`) — sharding distributes across all three automatically.

If a shard fails, re-run **only that shard** (or, better, only the specific failing test file inside it). Never re-run all 8 shards to verify a one-file fix.

Workspace-project subsets exist for debugging only; they are NOT a substitute for the sharded run in `/finalize`:

```bash
cd apps/desktop && npx vitest run --project unit-main # ~150+ main-process tests
cd apps/desktop && npx vitest run --project unit-renderer # ~85+ renderer tests
cd apps/desktop && npx vitest run --project unit-shared # ~7 shared/preload tests
```

### 3f. ADE CLI tests
### 3f. ADE CLI tests — separate CI job, run alongside the 8 shards

CI runs `test-ade-cli` as its own parallel job (`.github/workflows/ci.yml:156`). Locally, include it in the same parallel tool-call round as the 8 desktop shards — it's effectively a 9th concurrent invocation, not something to run after:

```bash
cd apps/ade-cli && npm test
```
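
The nine concurrent invocations described in 3e and 3f (eight desktop shards plus the ADE CLI job) can be enumerated with a small TypeScript sketch. The helper names `finalizeTestCommands` and `rerunCommand` are hypothetical. The command strings themselves are copied from this file, and each one is meant to be issued as its own concurrent call rather than joined into one script.

```typescript
// Shard count taken from the CI matrix described in this section.
const SHARD_COUNT = 8;

// Hypothetical helper: builds the eight desktop shard commands plus the ADE CLI command.
// Each returned string is one independent invocation; they are not meant to be chained.
function finalizeTestCommands(shardCount: number = SHARD_COUNT): string[] {
  const commands: string[] = [];
  for (let i = 1; i <= shardCount; i++) {
    commands.push(`cd apps/desktop && npx vitest run --shard=${i}/${shardCount}`);
  }
  commands.push("cd apps/ade-cli && npm test");
  return commands;
}

// Hypothetical helper: after a failure, re-run only the shard that failed.
function rerunCommand(failedShard: number, shardCount: number = SHARD_COUNT): string {
  if (!Number.isInteger(failedShard) || failedShard < 1 || failedShard > shardCount) {
    throw new RangeError(`shard must be between 1 and ${shardCount}`);
  }
  return `cd apps/desktop && npx vitest run --shard=${failedShard}/${shardCount}`;
}
```

Generating the list from a single shard count keeps the local commands aligned with the CI matrix if that count ever changes.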

Do NOT run apps/mcp-server tests — the MCP server was removed; the agent-facing surface lives in `apps/ade-cli`.

### 3g. Build all apps

```bash
113 changes: 113 additions & 0 deletions .claude/commands/shipLane.md
@@ -0,0 +1,113 @@
---
name: shipLane
description: 'Autonomously drive a lane through CI + review until merged (automate → finalize → poll/fix loop, self-paced wake-ups, max 5 iterations)'
---

# Ship Lane Command

Drive the current lane from "work is ready" to "merged on main" without manual shepherding.

**Usage:**
- `/shipLane` — auto-detects state (existing PR on current branch, or needs initial push)
- `/shipLane <pr-number>` — operate on a specific PR (useful if you checked out a different branch mid-loop)

**Arguments:** $ARGUMENTS

---

## Source of truth

**Follow the playbook at `docs/playbooks/ship-lane.md`.** All phase logic, state schema, commands, decision rules, and bot-ping rules live there. This wrapper only defines how Claude Code's team + wake-up primitives map onto the playbook.

If you are re-invoked by a scheduled wake-up, read `.ade/shipLane/<sanitized-branch>.json` first. If `status == running`, skip Phase 0 and go straight to Phase 1.
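
The resume check above can be sketched as follows. Only the `status` field and the `.ade/shipLane/<sanitized-branch>.json` location come from this file. The sanitization rule shown here is an assumption made for illustration (the real rule and the rest of the state schema live in the playbook), and `stateFilePath` and `shouldSkipPhaseZero` are hypothetical names.

```typescript
// Only `status` is taken from the command text; the full schema is defined in the playbook.
interface ShipLaneState {
  status: string;
}

// ASSUMPTION: stand-in sanitization that replaces anything outside [A-Za-z0-9._-] with "-".
// The playbook's actual rule may differ.
function stateFilePath(branch: string): string {
  const sanitized = branch.replace(/[^A-Za-z0-9._-]/g, "-");
  return `.ade/shipLane/${sanitized}.json`;
}

// Encodes the single rule stated above: a state file with status "running" means
// Phase 0 is skipped and the loop resumes at Phase 1. A missing state file (null)
// does not skip Phase 0. Handling of other statuses is left to the playbook.
function shouldSkipPhaseZero(state: ShipLaneState | null): boolean {
  return state !== null && state.status === "running";
}
```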

---

## Execution mode: autonomous

This command runs end-to-end without user interaction. Do NOT:
- Ask the user to confirm, choose, or approve anything.
- Pause between phases to request direction.
- Stop on non-fatal warnings — log them and continue.
- Ask whether to apply a fix — apply, verify, commit.

The only user-visible output is the per-iteration summary and the final Phase 5 exit summary.

---

## Concurrency: TeamCreate is MANDATORY

Check the available tools. If `TeamCreate` is in scope, you MUST use it. Do not fall back to `Agent` calls when a team is available.

### Team composition

Create one team at the start of the invocation, reuse it across iterations.

```
ship-lane team
├── lead (this session's main agent)
├── poll-agent — runs every iteration, returns structured summary only
├── rebase-agent — spawned only when behindMain or conflicts exist
├── ci-fix-agent — spawned only when CI failures exist
├── review-fix-agent — spawned only when new valid comments exist
└── conflict-resolver — spawned by rebase-agent for >5-file conflicts
```

Initial team setup should also create:
- `automate-agent` — invoked once in Phase 0 (only when there is no existing PR)
- `finalize-agent` — invoked once in Phase 0 (only when there is no existing PR)

### Delegation rules

- The lead NEVER reads raw CI logs or full comment threads. It reads the poll-agent's structured summary (see playbook §1.3).
- Fix agents get minimum scope: failing test paths + error snippets, or comment bodies + file anchors.
- Fix agents edit files directly; they do not commit.
- The lead commits and pushes after verifying `git diff`.
- Rebase-agent runs alone when active — no concurrent file edits from other agents.

### Fallback (TeamCreate not available)

If `TeamCreate` is genuinely not in scope for this session:

- Use parallel `Agent` tool calls for independent work (poll, ci-fix + review-fix in the same iteration).
- Use serial `Agent` calls for rebase (must run alone) and Phase 0 setup (automate then finalize).
- Same delegation rules apply — keep the lead's context clean by summarizing sub-agent output aggressively.

---

## Scheduling wake-ups

Use `ScheduleWakeup` at the end of each iteration (playbook §5.3) with the same command re-invocation as the `prompt`:

```
ScheduleWakeup({
delaySeconds: <270 | 720 | 1800 per playbook>,
reason: "shipLane iter <N>: <CI running | waiting on review | just pushed>",
prompt: "/shipLane $ARGUMENTS"
})
```

Pass `$ARGUMENTS` through so a PR-number argument is preserved across wake-ups.

Do NOT schedule a wake if `status` is `done-clean`, `done-max`, or `blocked` — print the summary and stop.
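
The scheduling rule above can be sketched as a pure function that either produces the wake-up payload or declines to schedule. This is an illustration under assumptions: `buildWakeup`, `WakeupRequest`, and `TERMINAL_STATUSES` are hypothetical names, the payload is modeled as a plain object rather than a real tool call, and the choice of delay is passed in because the mapping from situation to delay is defined by the playbook.

```typescript
// The three delay values listed above; which one applies is decided by the playbook.
type WakeDelaySeconds = 270 | 720 | 1800;

// Statuses for which no wake-up is scheduled, as listed above.
const TERMINAL_STATUSES = ["done-clean", "done-max", "blocked"];

// Plain-object model of the ScheduleWakeup arguments shown above.
interface WakeupRequest {
  delaySeconds: WakeDelaySeconds;
  reason: string;
  prompt: string;
}

// Returns the wake-up payload, or null when the loop has reached a terminal status.
function buildWakeup(
  status: string,
  iteration: number,
  delaySeconds: WakeDelaySeconds,
  why: string,
  args: string
): WakeupRequest | null {
  if (TERMINAL_STATUSES.indexOf(status) !== -1) return null;
  // Pass the original arguments through so a PR number survives re-invocation.
  // Trimming an empty argument string is a choice made for this sketch.
  const trimmed = args.trim();
  const prompt = trimmed === "" ? "/shipLane" : `/shipLane ${trimmed}`;
  return { delaySeconds, reason: `shipLane iter ${iteration}: ${why}`, prompt };
}
```

Returning `null` for terminal statuses keeps the "print the summary and stop" path explicit at the call site.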

---

## Phase 0 safety rails (Claude Code specific)

Before running `automate-agent` and `finalize-agent` in Phase 0:

1. Confirm `$ARGUMENTS` is empty OR matches a PR number on the current branch. If the PR number is for a different branch, `git checkout` to that branch first.
2. Confirm `git status` shows no unexpected changes. If the working tree has staged changes, commit them with `ship: checkpoint before automate/finalize` so the automate/finalize pipeline runs against a known baseline.
3. Confirm `origin` is a GitHub remote (`git remote get-url origin`) — `gh pr create` needs it.

If any rail fails, exit `blocked` with a clear reason in the state file and stop.

---

## References

- `docs/playbooks/ship-lane.md` — full phase logic (source of truth).
- `.claude/commands/automate.md` — invoked by `automate-agent` in Phase 0.
- `.claude/commands/finalize.md` — invoked by `finalize-agent` in Phase 0.
- `.github/workflows/ci.yml` — CI job names and shard count (`8`) that the local fallback tests mirror.
2 changes: 2 additions & 0 deletions .gitignore
@@ -48,11 +48,13 @@ xcuserdata/
apps/ios/.dry-run-derived-data/
apps/ios/build/
ios-signing/
.asc/artifacts/

# Tool configs (personal)
.codex/
.pnpm-store/
/apps/desktop/.ade
/.ade/shipLane/
/.playwright-mcp
/.codex-derived-data
package-lock.json
4 changes: 4 additions & 0 deletions AGENTS.md
@@ -7,6 +7,10 @@
- The ADE CLI lives in `apps/ade-cli` and shares core services with the desktop app.
- State is primarily stored under `.ade/` inside the active project, with runtime metadata in SQLite and machine-local files under `.ade/secrets`, `.ade/cache`, and `.ade/artifacts`.

## Playbooks

- `docs/playbooks/ship-lane.md` — autonomous PR-to-merge driver (automate → finalize → poll-fix loop). Any agent CLI can follow it directly; Claude Code wraps it as `/shipLane`.

## Working norms

- Preserve existing desktop app patterns before introducing new abstractions.
17 changes: 16 additions & 1 deletion apps/ade-cli/src/adeRpcServer.test.ts
@@ -1823,7 +1823,12 @@ describe("adeRpcServer", () => {
cols: 120,
rows: 36,
tracked: true,
toolType: "claude-orchestrated"
toolType: "claude-orchestrated",
command: "claude",
args: expect.arrayContaining(["--model", "claude-sonnet-4-6", "--permission-mode", "default", "Implement API wiring"]),
env: expect.objectContaining({
ADE_DEFAULT_ROLE: "agent",
}),
})
);
expect(response.structuredContent.startupCommand).toContain("claude");
@@ -1853,6 +1858,16 @@ describe("adeRpcServer", () => {
expect(response.structuredContent.startupCommand).toContain("claude");
expect(response.structuredContent.startupCommand).toContain("ADE_RUN_ID=run-1");
expect(response.structuredContent.startupCommand).toContain("ADE_ATTEMPT_ID=attempt-workspace-roots");
expect(fixture.runtime.ptyService.create).toHaveBeenCalledWith(
expect.objectContaining({
command: "claude",
env: expect.objectContaining({
ADE_RUN_ID: "run-1",
ADE_ATTEMPT_ID: "attempt-workspace-roots",
ADE_DEFAULT_ROLE: "agent",
}),
})
);
});

it("rejects config-toml permission mode for Claude spawn_agent sessions", async () => {