Merged
552 changes: 502 additions & 50 deletions package-lock.json

Large diffs are not rendered by default.

10 changes: 7 additions & 3 deletions package.json
@@ -13,7 +13,7 @@
},
"type": "module",
"engines": {
"node": ">=20"
"node": ">=20.18.1"
},
"main": "./dist/index.js",
"types": "./dist/index.d.ts",
@@ -59,13 +59,17 @@
"@agent-relay/fleet": "^9.0.2",
"@agent-relay/harness-driver": "^9.0.2",
"@agent-relay/integration-prompts": "^9.0.2",
"@relayfile/relay-helpers": "^0.4.2",
"@relayfile/sdk": "^0.10.9",
"@relayflows/core": "^1.0.3",
"agent-relay": "^9.0.1",
"zod": "^3.25.76"
},
"overrides": {
"@scalar/postman-to-openapi": "0.4.10",
"listr2": "9.0.5"
},
"devDependencies": {
"@relayflows/cli": "^1.0.3",
"@relayflows/core": "^1.0.3",
"@types/node": "^22.10.2",
"esbuild": "^0.24.2",
"tsc-alias": "^1.8.17",
29 changes: 28 additions & 1 deletion planning/factory-cloud-watches-local-node-linear-issue.md
@@ -1,4 +1,31 @@
Title: [factory] EPIC — cloud watches → local-node execution; label-driven single agent / workflow / team
# ⚠️ HISTORICAL — v1 epic, superseded 2026-06-18

This document is the **original (v1) factory extraction epic** written 2026-06-15. It captures the initial framing — a local-vs-cloud execution split, with the factory brain on the operator's laptop and a "minimal slice" of fleet placement.

**That framing was superseded mid-extraction** by Will Washburn's `relay/specs/fleet-delivery.md` RFC (2026-06-06) and the conversation it triggered. The unified-node model that shipped is meaningfully different from what's described below:

- **There is no local-vs-cloud execution split.** Daytona sandboxes, laptops, mac minis, EC2 boxes are all the same primitive: nodes. Each advertises capabilities; placement picks an eligible node.
- **The factory is a spec-emitter**, not an executor. It emits `spawn { capability, persona, … }` invocations into the relay fleet; placement is the fleet's job.
- **`single | workflow | team` are recipes** — patterns over one spawn primitive — not three distinct code paths.

**Current architecture (read these, not this doc):**

- `relay/specs/fleet-delivery.md` — the load-bearing RFC. Two planes (messaging fabric + compute layer), spawn-as-action, idempotency + at-least-once + reconcile.
- `cloud/packages/web/lib/proactive-runtime/factory-cloud-orchestrator.ts` — the cloud-hosted factory brain (Phase 2 / the v1 doc's "cloud lift", now real).
- `cloud/packages/web/lib/proactive-runtime/factory-fleet-emitter.ts` — emits fleet spawns from the orchestrator.
- `cloud/packages/web/lib/proactive-runtime/team-launch-n1.ts` — proactive runtime, **explicitly cut over from the legacy Daytona `launchMember` path to fleet** (see the cutover comment around line 503).
- `factory/` repo (`AgentWorkforce/factory`, published as `@agent-relay/factory` on npm) — the extracted package.
- `factory/src/fleet/relay-fleet-client.ts` — `RelayFleetClient`, the thin client over the fleet protocol.

**Phases that actually shipped under the unified model:** P1 StateStore port (pear#371), P2 config split (pear#372), P3 publish-prep (pear#370), P4 extraction (factory `5f32a5a` + npm `@agent-relay/factory@0.1.1`), P5 Pear teardown (pear#373), p7 recipe scoping (factory#1), p10 RelayFleetClient (factory#2), p13 node-definition (factory#3), plus the proactive cutover (cloud `factory-cloud-orchestrator.ts` + `factory-fleet-emitter.ts` + `team-launch-n1.ts`).

**Why this doc is preserved:** §4 (package extraction details), §5 (config split), and §8 (phased plan structure) were correct in v1 and were used as-is during execution. §2 (architecture diagram), §3 (label → shape), §6 ("minimal slice" of #1056), and §10 (non-goals) are wrong under the shipped architecture — they describe a local-cloud split that does not exist in the code.

A proper v2 rewrite was planned (see the v2 handoff drafted at `pear/handoff-factory-unified-node-architecture.md`, Deliverable 1) but was never produced because the unified-node architecture shipped directly as code without an intermediate v2 document. This stamp serves in its place.

---

Title: [factory] EPIC — cloud watches → local-node execution; label-driven single agent / workflow / team (HISTORICAL — see stamp above)

Team: AR
Suggested status: Design / Epic
4 changes: 2 additions & 2 deletions planning/factory-unified-node-architecture-linear-issue.md
@@ -82,7 +82,7 @@ The recipe is one knob with three expansions — not three execution paths. The
| Recipe (label) | Spawn-set emitted | Capability | Persona / workflow source | Roster from repo labels |
|---|---|---|---|---|
| `agent:single` | **1** spawn `{ capability: 'spawn:claude' (or per-persona harness), persona: <X>, node?: <repo-label placement>, session_ref? }` | `spawn:claude` / `spawn:codex` | `agents/<persona>/persona.ts` | ignored for count (always 1); repo label still informs placement (which checkout/node) |
| `agent:workflow` | **1** spawn `{ capability: 'workflow:run', workflow: '<path>.{yaml,ts,py}', inputs }` — the node runs `relayflows run <workflow>`, which may emit further **child spawns** | `workflow:run` | workflow file defines its own roster; personas referenced by its steps resolve from `agents/` | ignored for roster; repo labels become workflow inputs |
| `agent:workflow` | **1** spawn `{ capability: 'workflow:run', workflow: '<path>.{yaml,ts,py}', inputs }` — the node invokes the Relayflows SDK in-process, which may emit further **child spawns** | `workflow:run` | workflow file defines its own roster; personas referenced by its steps resolve from `agents/` | ignored for roster; repo labels become workflow inputs |
| `agent:team` | **N** implementer spawns + **1** reviewer spawn + roster metadata. This is the logic that today lives in `cloud/.../teams/spawn-team.ts`, reconstructed as a recipe over the spawn primitive | `spawn:claude` / `spawn:codex` per member | `agents/cloud-team-implementer/persona.ts`, `agents/cloud-team-reviewer/persona.ts` | **one implementer per repo label** (capped at 4 per AR-272); reviewer naming unchanged |

Concrete example (today's AR-267 team): labels `cloud`, `relayfile`, `agent:team` → emit `spawn{spawn:claude, persona: cloud-team-implementer, node-target via cloud checkout}`, `spawn{… relayfile checkout}`, and `spawn{spawn:claude, persona: cloud-team-reviewer}`. Placement, execution, and completion are all fleet-side.
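The recipe table and the AR-267 example can be sketched as a single expansion function. This is a minimal illustration, not the factory's actual code: `SpawnSpec`, `ExpandOptions`, `expandRecipe`, and the `repo` field (standing in for the repo-label placement hint) are hypothetical names, and `spawn:claude` is assumed as the harness for every member.

```typescript
// Hypothetical types: field names follow the table above, but the real
// spawn shape in the factory package may differ.
type Capability = 'spawn:claude' | 'spawn:codex' | 'workflow:run'
type Recipe = 'agent:single' | 'agent:workflow' | 'agent:team'

interface SpawnSpec {
  capability: Capability
  persona?: string
  workflow?: string
  inputs?: Record<string, unknown>
  repo?: string // assumption: stands in for the repo-label placement hint
}

interface ExpandOptions {
  persona?: string
  workflow?: string
}

const MAX_IMPLEMENTERS = 4 // cap per AR-272

// One knob, three expansions: every recipe reduces to a list of spawns.
function expandRecipe(
  recipe: Recipe,
  repoLabels: string[],
  opts: ExpandOptions = {},
): SpawnSpec[] {
  switch (recipe) {
    case 'agent:single':
      // Always exactly one spawn; the repo label only informs placement.
      return [{ capability: 'spawn:claude', persona: opts.persona, repo: repoLabels[0] }]
    case 'agent:workflow':
      // One spawn; repo labels become workflow inputs, not a roster.
      return [{ capability: 'workflow:run', workflow: opts.workflow, inputs: { repos: repoLabels } }]
    case 'agent:team': {
      // One implementer per repo label (capped), plus one reviewer.
      const implementers = repoLabels.slice(0, MAX_IMPLEMENTERS).map(
        (repo): SpawnSpec => ({
          capability: 'spawn:claude',
          persona: 'cloud-team-implementer',
          repo,
        }),
      )
      return [...implementers, { capability: 'spawn:claude', persona: 'cloud-team-reviewer' }]
    }
  }
}
```

Under these assumptions, `expandRecipe('agent:team', ['cloud', 'relayfile'])` yields two implementer spawns and one reviewer spawn, matching the AR-267 example. Placement stays out of the function entirely, which is the point of the spec-emitter framing.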
@@ -155,7 +155,7 @@ v1 called relay#1056 "a minimal slice the factory needs." Under unified-node, **

## 8. Open questions (surface to operator)

1. **`workflow:run` capability handler shape.** When a node picks up `{capability:'workflow:run'}`, does it (a) shell out to the `relayflows` CLI, (b) embed the runtime in-process, or (c) call relayflows as a service? Proposed in Phase 3: shell out to `relayflows run <workflow>` on the node (simplest; the node already has the harness + repo checkout; child spawns ride the same fleet). Confirm.
1. **`workflow:run` capability handler shape.** Decided: the node embeds the Relayflows runtime through `@relayflows/core`; it does not depend on a globally installed CLI. The node already has the harness + repo checkout, and child spawns ride the same fleet.
2. **Single-recipe in cloud.** Cloud has no single-agent path today (only team via `spawn-team.ts`, proactive via `team-launch-n1`). `agent:single` is just a 1-spawn recipe — confirm it needs nothing beyond team-recipe at N=1.
3. **Multi-node placement preference.** Laptop + mac-mini both advertise `spawn:claude` — RFC §6 says least-loaded. Good enough for v1? (Assumed yes.)
4. **Persona discovery single source.** Both cloud (team-recipe construction) and the node-side workflow runtime must read `AgentWorkforce/agents/`. Confirm both point at the same registry.
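For question 3, a least-loaded choice among nodes advertising the same capability might look like the sketch below. Everything here is illustrative: `NodeStatus` and `pickLeastLoaded` are hypothetical names, and measuring load as `activeAgents / maxAgents` is an assumption rather than something RFC §6 is known to specify.

```typescript
// Hypothetical roster entry; the real fleet roster shape is not shown here.
interface NodeStatus {
  name: string
  capabilities: string[]
  activeAgents: number
  maxAgents: number
}

// Pick the eligible node with the lowest load ratio, or undefined if none
// can take another agent. Ties keep the earlier node in the list.
function pickLeastLoaded(nodes: NodeStatus[], capability: string): NodeStatus | undefined {
  const eligible = nodes.filter(
    (n) => n.capabilities.includes(capability) && n.activeAgents < n.maxAgents,
  )
  if (eligible.length === 0) return undefined
  return eligible.reduce((best, n) =>
    n.activeAgents / n.maxAgents < best.activeAgents / best.maxAgents ? n : best,
  )
}
```

In the shipped model this decision belongs to the fleet, not the factory; the sketch only makes the "least-loaded" rule concrete for discussion.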
4 changes: 2 additions & 2 deletions planning/linear-issue-factory-phase-3-fleet-client.md
@@ -28,7 +28,7 @@ Implement `RelayFleetClient implements FleetClient` (the port at `pear/packages/

- **`SpawnInput.capability`** is `'spawn:claude' | 'spawn:codex'` today (ports/fleet.ts:4–15); extend the type to include **`'workflow:run'`** for the workflow recipe.
- **invocationId lifecycle (RFC §7):** the client supplies an `invocationId` (idempotency key); observes `pending → dispatched(node) → completed(agent_id)`; relies on the fleet for dedup, reschedule-on-node-loss, and reconcile (first-to-`completed` wins). The factory does NOT implement placement, scheduling, or reconcile — it observes the lifecycle and reports completion upward.
- **`workflow:run` capability handler (open question 1 — proposed):** the node-side handler for `{capability:'workflow:run', workflow:<path>}` **shells out to `relayflows run <workflow>`** in the node's repo checkout. Rationale: the node already has the harness + checkout; child spawns the workflow emits ride the same fleet; no embedded runtime or service dependency. The `relayflows` CLI is a dependency of the node's harness definition (Phase 4). Confirm with operator before building.
- **`workflow:run` capability handler:** the node-side handler for `{capability:'workflow:run', workflow:<path>}` invokes `@relayflows/core` in the node's repo checkout. The node already has the harness + checkout; child spawns the workflow emits ride the same fleet. No globally installed Relayflows CLI is required.
- **No reuse of `InternalFleetClient`'s broker-direct path** beyond reference — `RelayFleetClient` talks the fleet protocol, not the local `HarnessDriverClient`.
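The `invocationId` lifecycle described above can be sketched as a small client-side tracker. This is an assumption-laden illustration, not `RelayFleetClient` itself: `InvocationTracker`, `FleetEvent`, and the event field names are hypothetical, and the real fleet protocol messages may be shaped differently.

```typescript
// Client-side view of one invocation: pending → dispatched(node) → completed(agent_id).
type InvocationState =
  | { status: 'pending' }
  | { status: 'dispatched'; node: string }
  | { status: 'completed'; agentId: string }

// Hypothetical events the client observes from the fleet.
type FleetEvent =
  | { type: 'dispatched'; invocationId: string; node: string }
  | { type: 'completed'; invocationId: string; agentId: string }

class InvocationTracker {
  private readonly states = new Map<string, InvocationState>()

  // Submitting the same idempotency key twice returns the existing state
  // instead of creating a second invocation.
  submit(invocationId: string): InvocationState {
    const existing = this.states.get(invocationId)
    if (existing) return existing
    const fresh: InvocationState = { status: 'pending' }
    this.states.set(invocationId, fresh)
    return fresh
  }

  // The tracker only observes; it never places, schedules, or reconciles.
  observe(event: FleetEvent): InvocationState | undefined {
    const current = this.states.get(event.invocationId)
    if (!current) return undefined
    // First-to-completed wins: later events cannot overwrite a completion.
    if (current.status === 'completed') return current
    const next: InvocationState =
      event.type === 'dispatched'
        ? { status: 'dispatched', node: event.node } // a reschedule replaces the node
        : { status: 'completed', agentId: event.agentId }
    this.states.set(event.invocationId, next)
    return next
  }
}
```

A second `dispatched` event for the same key models reschedule-on-node-loss: the node changes, the key does not, so no second spawn is recorded on the client side.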

## End-to-end verification (captured artifact required)
@@ -46,7 +46,7 @@ Implement `RelayFleetClient implements FleetClient` (the port at `pear/packages/
3. An `agent:single` spawn round-trips the fleet and lands on whichever eligible node is live (factory targeted none).
4. Completion observed via the `invocationId` lifecycle; Linear writeback fires.
5. Node-loss mid-spawn reschedules the same `invocationId` with no double-spawn (captured).
6. The `workflow:run` handler decision (shell-out to `relayflows run`) is documented + implemented or explicitly deferred with the chosen alternative recorded.
6. The `workflow:run` handler decision (embedded Relayflows SDK) is documented + implemented.

## Out of scope

6 changes: 3 additions & 3 deletions planning/linear-issue-factory-phase-4-node-registration.md
@@ -20,7 +20,7 @@ Today the operator runs `pear factory start` — a daemon that owns orchestratio

- **Reads `NodeConfig`** (p2): `workspaceId`, `capabilities` (`spawn:claude` / `spawn:codex` / `workflow:run`), and repo checkout paths (`cloneRoot` / `clonePaths`, pear#369 compact form).
- **Registers + advertises** per RFC §9 control surface: `node.register` (name, capabilities, version, max_agents, tags, resume cursor), `node.heartbeat` (~10–15s: load, active_agents), `node.deregister` on shutdown, `inventory.sync` (re-announce live agents on reconnect: agent_id, name, invocationId, session_ref).
- **Handles `action.invoke`** from Relaycast (RFC §9 Relaycast→Broker): for `spawn:claude`/`spawn:codex`, spawn the harness in the mapped checkout (the existing local PTY path — `InternalFleetClient`'s `spawnPty` is the reference impl); for `workflow:run`, **shell out to `relayflows run <workflow>`** in the checkout (per Phase 3's contract). Emits `agent.register` / `action.result` / `delivery.ack` back.
- **Handles `action.invoke`** from Relaycast (RFC §9 Relaycast→Broker): for `spawn:claude`/`spawn:codex`, spawn the harness in the mapped checkout (the existing local PTY path — `InternalFleetClient`'s `spawnPty` is the reference impl); for `workflow:run`, invoke the Relayflows SDK in the checkout. Emits `agent.register` / `action.result` / `delivery.ack` back.
- **No orchestration logic.** No triage, no merge-gate, no batch state — all cloud (Phase 2). The node is dumb compute that advertises what it can run.
- **The broker already auto-starts:** `agent-relay fleet serve` calls `startBrokerWithPortFallback` (`relay/packages/cli/src/cli/commands/fleet.ts:144`) before serving — one command boots the broker + registers the node, no separate `agent-relay up`.
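The register and heartbeat fields listed above can be sketched as payload builders. The message names and field names come from the bullets; everything else is assumed for illustration, including the `type` discriminator, the `resume_cursor` spelling, deriving `tags` from the clone-path keys, and defining `load` as `active_agents / max_agents`. The actual RFC §9 wire format is not reproduced here.

```typescript
type Capability = 'spawn:claude' | 'spawn:codex' | 'workflow:run'

// Hypothetical subset of NodeConfig, limited to the fields named above.
interface NodeConfigSketch {
  workspaceId: string
  capabilities: Capability[]
  clonePaths: Record<string, string>
}

interface NodeRegister {
  type: 'node.register'
  name: string
  capabilities: Capability[]
  version: string
  max_agents: number
  tags: string[]
  resume_cursor?: string
}

interface NodeHeartbeat {
  type: 'node.heartbeat'
  load: number
  active_agents: number
}

function buildRegister(
  name: string,
  config: NodeConfigSketch,
  version: string,
  maxAgents: number,
  resumeCursor?: string,
): NodeRegister {
  return {
    type: 'node.register',
    name,
    capabilities: [...config.capabilities],
    version,
    max_agents: maxAgents,
    tags: Object.keys(config.clonePaths), // assumption: repo names double as tags
    ...(resumeCursor !== undefined ? { resume_cursor: resumeCursor } : {}),
  }
}

function buildHeartbeat(activeAgents: number, maxAgents: number): NodeHeartbeat {
  // Assumption: load is the fraction of agent slots in use, clamped to 1.
  const load = maxAgents > 0 ? Math.min(1, activeAgents / maxAgents) : 1
  return { type: 'node.heartbeat', load, active_agents: activeAgents }
}
```

Neither builder contains orchestration logic, which is consistent with the node advertising what it can run and nothing more.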

@@ -31,15 +31,15 @@ A laptop, mac mini, EC2 box, or autospawned Daytona sandbox all run this same re
1. From a machine with NO running broker, run `agent-relay local factory` with only a `NodeConfig`.
2. Capture: the broker auto-starts, the node appears in the fleet roster (`agent-relay fleet nodes`) with the advertised capabilities + a live heartbeat.
3. From cloud factory triage (Phase 2), an `agent:single` spawn placed by Relaycast onto this node executes in the correct local checkout; capture the agent running + `action.result` completion.
4. Capture an `agent:workflow` spawn: the node runs `relayflows run <workflow>` and any child spawns ride the fleet.
4. Capture an `agent:workflow` spawn: the node invokes the Relayflows SDK and any child spawns ride the fleet.
5. Capture reconnect reconcile: drop the node's network < TTL, restore; `inventory.sync` re-announces live agents; no duplicate spawns.

## Acceptance criteria

1. `agent-relay local factory` registers the machine as a node from `NodeConfig` alone; broker auto-starts (cold-broker proof).
2. Node advertises `capabilities` + `clonePaths`; appears in the roster with heartbeat.
3. A cloud-placed `spawn:claude` / `spawn:codex` executes in the mapped checkout; `action.result` reports completion.
4. A cloud-placed `workflow:run` runs `relayflows run <workflow>` in the checkout.
4. A cloud-placed `workflow:run` invokes `@relayflows/core` in the checkout.
5. Reconnect inventory-sync reconciles without duplicate spawns (captured).
6. The command contains zero orchestration logic (no triage/merge/state).

6 changes: 6 additions & 0 deletions src/cli/fleet.test.ts
@@ -584,6 +584,9 @@ describe('fleet CLI runtime', () => {
expect(ensureLocalMount).toHaveBeenCalledWith('rw_7ccfea89', process.cwd(), {
acceptableWorkspaceIds: ['50587328-441d-4acb-b8f3-dbe1b3c5de99'],
})
expect(ensureLocalMount).toHaveBeenCalledWith('rw_7ccfea89', '/work/pear', {
acceptableWorkspaceIds: ['50587328-441d-4acb-b8f3-dbe1b3c5de99'],
})
} finally {
await rm(root, { recursive: true, force: true })
}
@@ -669,6 +672,9 @@ describe('fleet CLI runtime', () => {
expect(ensureLocalMount).toHaveBeenCalledWith('factory-cli-test', process.cwd(), {
acceptableWorkspaceIds: undefined,
})
expect(ensureLocalMount).toHaveBeenCalledWith('factory-cli-test', '/work/pear', {
acceptableWorkspaceIds: undefined,
})
expect(createFactory).toHaveBeenCalledTimes(1)
expect(factory.start).toHaveBeenCalledWith({ mode: 'live' })
expect(factory.runLoop).not.toHaveBeenCalled()
32 changes: 32 additions & 0 deletions src/cli/fleet.ts
@@ -283,6 +283,7 @@ async function runFactoryCommand(
await (deps.ensureLocalMount ?? ensureLocalMount)(workspaceId, process.cwd(), {
acceptableWorkspaceIds: acceptableMountIds,
})
await ensureClonePathMounts(deps, workspaceId, config, acceptableMountIds)
const waiter = createStopSignalWaiter()
let stoppedBySignal = false
const flushAndExit = async (code: number): Promise<void> => {
@@ -313,6 +314,7 @@ async function runFactoryCommand(
}
}
if (command.action === 'run-once') {
await ensureClonePathMounts(deps, workspaceId, config, acceptableMountIds)
writeJson(out, await factory.runOnce({ dryRun: globals.dryRun }))
return 0
}
@@ -344,6 +346,7 @@ async function runFactoryCommand(
return 0
}

await ensureClonePathMounts(deps, workspaceId, config, acceptableMountIds)
const removeSignalHandlers = installFactoryStopSignalHandlers(factory, {
processLike: deps.stopSignalProcessLike,
})
@@ -375,6 +378,35 @@ async function runFactoryCommand(
return 0
}

/**
* Ensures the relayfile mount is running at each configured clone path so
* spawned agents can resolve `.integrations` relative to their working
* directory (the checkout path). The mount daemon started at the daemon CWD
* is not automatically accessible from a different directory, and agents need
* these paths for integration writebacks (Slack, GitHub, etc.).
*/
async function ensureClonePathMounts(
deps: FleetCliDeps,
workspaceId: string,
config: FactoryConfig,
acceptableMountIds?: readonly string[],
): Promise<void> {
const mountFn = deps.ensureLocalMount ?? ensureLocalMount
const mountOpts = { acceptableWorkspaceIds: acceptableMountIds }
const daemonCwd = resolve(process.cwd())
for (const clonePath of new Set(Object.values(config.clonePaths ?? {}))) {
const resolved = resolve(clonePath)
if (resolved !== daemonCwd) {
try {
await mountFn(workspaceId, resolved, mountOpts)
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
process.stderr.write(`[factory] warning: could not start relayfile mount at ${resolved}: ${message}\n`)
}
}
}
}

function parseFactoryCommand(args: string[]): ParsedCommand {
const [action, issueOrPr, ...flags] = args
if (action === 'start') {