Triage gate: classify complexity before dispatch, reject or sharpen specs instead of failing on ambiguous issues

## Why now

Two independent prospect calls (Nicole Turnage @ Apricot, 2026-06-23; Nango / Marcin, 2026-06-25) hit the same failure mode with factory-shape products:

**Nicole:** zero-touch Linear → PR pipeline produced *"a giant plate of spaghetti"* on an under-specified ticket. The agent made assumptions the spec didn't cover, didn't ask for clarification, shipped a PR that missed three branches of a five-step flow.

**Marcin (verbatim):**
> "We started from, can we go from linear to a PR? Which is, like, low hanging fruit. But it is so complicated. A lot of the stuff is so complicated that eventually an engineer has to look. The prompting of an issue is the easy part. It takes 10% of the time, then the rest is 90%."

He then proposed the fix verbatim:
> "It would be useful where at the point of creating linear issues, it could figure out how complicated a solution to this is. And suggest that I can just do this now. Like, if it just basically finds out that hey, this is pretty easy, then it triggers a run to give you a PR. Unless there are some things here that we might not be happy about, so let's not do it."

This is two-of-two confirmation that **bare factory has a complexity ceiling on complex codebases**. Triage today decides *how* to do work; it doesn't decide whether to try.

## What we already have

- `src/triage/heuristic.ts` — routing + scope (single / workflow / team) + thin-description detection (140-char threshold)
- `src/triage/llm.ts` — LLM-backed triage decisions
- `src/triage/tiered.ts` — tiered escalation
- `ClarificationRecord` in `src/state/store.ts` — "wait for human in Slack thread" state
- Slack mid-task clarification: when an agent gets stuck, it can ask in the thread and pause until answered

What that flow gets right: the agent CAN ask. What it gets wrong for the cases above: **clarification happens too late, after dispatch and deep into the work, when half a PR is already wrong and the agent doesn't know enough to ask the right questions.** Nicole's engineer let the agent run past every gate. Marcin's bare "linear → PR" run dispatched issues that no amount of mid-task clarification could rescue.

## What to build

A **pre-dispatch triage gate** that runs before any coding agent is spawned. Reads the issue, the linked design/spec/screenshots, and a quick scan of the affected files. Outputs one of:

- **ship** → dispatch to factory normally
- **needs-spec** → don't dispatch. Post specific questions to the Slack thread (or Linear comments) describing what's ambiguous. Wait for answers. Re-evaluate.
- **too-complex** → don't dispatch. Post a structured "why this isn't suitable for autonomy" message. Suggest splitting, scope reduction, or human pickup.

The decision must be visible — *why* an issue was classified the way it was, what specific signals tipped it.
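
One way to keep the decision visible is to make the reasons part of the result type, so a verdict cannot be produced without them. The sketch below is only an illustration: `SignalNote`, `GateResult`, and `explainVerdict` are hypothetical names, and only the three verdict values come from this issue.

```typescript
// Hypothetical shapes for illustration; only the three verdict names come from this issue.
type ComplexityVerdict = 'ship' | 'needs-spec' | 'too-complex';

interface SignalNote {
  signal: string;                  // e.g. "acceptance-criteria"
  observation: string;             // what the gate saw in the issue
  pushedToward: ComplexityVerdict; // which verdict this signal argued for
}

// Each verdict carries only the payload its follow-up action needs.
type GateResult =
  | { verdict: 'ship'; signals: SignalNote[] }
  | { verdict: 'needs-spec'; signals: SignalNote[]; questions: string[] }
  | { verdict: 'too-complex'; signals: SignalNote[]; rationale: string; suggestions: string[] };

// Render the decision so a human can see which signals tipped it.
function explainVerdict(result: GateResult): string {
  const lines = [`Verdict: ${result.verdict}`];
  for (const s of result.signals) {
    lines.push(`- ${s.signal}: ${s.observation} (points to ${s.pushedToward})`);
  }
  if (result.verdict === 'needs-spec') {
    lines.push('Questions:');
    for (const q of result.questions) lines.push(`- ${q}`);
  } else if (result.verdict === 'too-complex') {
    lines.push(`Why not autonomous: ${result.rationale}`);
    for (const s of result.suggestions) lines.push(`- Suggestion: ${s}`);
  }
  return lines.join('\n');
}
```

The rendered string could be posted as-is to a Slack thread or Linear comment, so the human-facing message and the stored decision never drift apart.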

## Scope — concrete work

### Phase 1: new `triage/complexity.ts` engine alongside existing engines

- [ ] New decision type `ComplexityVerdict = 'ship' | 'needs-spec' | 'too-complex'` extending `TriageDecision`
- [ ] LLM-backed classifier with structured-output schema (likely re-use the patterns in `llm.ts`)
- [ ] Signals to weigh:
  - Description quality (length is a weak proxy; semantic completeness is better)
  - Linked design/Figma/spec presence
  - Number of files likely touched (route-detection already exists in heuristic.ts)
  - Cross-surface changes (UI + backend + state = higher risk)
  - Edge case enumeration in the description ("what about X" / "if Y then Z" patterns = good sign)
  - Acceptance criteria specificity
  - Whether the codebase has tests covering the touched paths
- [ ] Output includes per-signal reasoning so failures are debuggable
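
As a rough sketch of what the structured output could look like, the block below validates raw classifier JSON into a typed result with one reasoning entry per signal. The signal identifiers mirror the list above, but their spelling, the `risk` scale, and every function name here are assumptions; the real schema should follow whatever patterns `llm.ts` already uses.

```typescript
// Hypothetical schema; signal names mirror the list above, everything else is assumed.
type ComplexityVerdict = 'ship' | 'needs-spec' | 'too-complex';
const VERDICTS: readonly ComplexityVerdict[] = ['ship', 'needs-spec', 'too-complex'];

const SIGNALS = [
  'description-quality', 'linked-design', 'files-touched', 'cross-surface',
  'edge-cases', 'acceptance-criteria', 'test-coverage',
] as const;
type SignalName = (typeof SIGNALS)[number];

type Risk = 'low' | 'medium' | 'high';
const RISKS: readonly Risk[] = ['low', 'medium', 'high'];

interface SignalAssessment { signal: SignalName; risk: Risk; reasoning: string; }
interface ClassifierOutput { verdict: ComplexityVerdict; signals: SignalAssessment[]; }

// Type guard: true only when `value` is one of the allowed string literals.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === 'string' && (allowed as readonly string[]).indexOf(value) !== -1;
}

// Validate raw model output before anything downstream trusts it.
function parseClassifierOutput(raw: string): ClassifierOutput {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error('classifier output is not valid JSON');
  }
  if (typeof data !== 'object' || data === null) {
    throw new Error('classifier output must be a JSON object');
  }
  const { verdict, signals } = data as { verdict?: unknown; signals?: unknown };
  if (!isOneOf(VERDICTS, verdict)) throw new Error('unknown verdict');
  if (!Array.isArray(signals)) throw new Error('signals must be an array');

  const parsed: SignalAssessment[] = signals.map((entry: unknown, i: number) => {
    const e = (entry ?? {}) as { signal?: unknown; risk?: unknown; reasoning?: unknown };
    if (!isOneOf(SIGNALS, e.signal)) throw new Error(`signals[${i}]: unknown signal`);
    if (!isOneOf(RISKS, e.risk)) throw new Error(`signals[${i}]: unknown risk`);
    if (typeof e.reasoning !== 'string' || e.reasoning.trim() === '') {
      throw new Error(`signals[${i}]: reasoning is required`);
    }
    return { signal: e.signal, risk: e.risk, reasoning: e.reasoning };
  });
  return { verdict, signals: parsed };
}
```

Rejecting entries with empty `reasoning` is a deliberate choice in this sketch: a verdict that cannot say why a signal mattered is not debuggable, which is the point of the last checklist item.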

### Phase 2: wire it into the dispatch pipeline as a gate

- [ ] In `orchestrator/factory.ts` (or wherever triage feeds dispatch): if verdict ≠ `ship`, don't dispatch
- [ ] For `needs-spec`: post structured Slack thread / Linear comment with specific questions; persist a `WaitingSpecRecord` (mirrors `ClarificationRecord`); re-trigger when the issue is updated
- [ ] For `too-complex`: post the rationale, label the issue (`needs-human` or similar), exit cleanly
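
The gating rule above can be sketched as a pure function that maps a verdict to the side effects the orchestrator should perform. This is a minimal illustration, not the actual wiring in `orchestrator/factory.ts`: the `GateAction` shapes and `planGateActions` are hypothetical, and only `WaitingSpecRecord` and the `needs-human` label are named in this issue.

```typescript
// Hypothetical types; only `WaitingSpecRecord` and the `needs-human` label come from this issue.
type GateResult =
  | { verdict: 'ship' }
  | { verdict: 'needs-spec'; questions: string[] }
  | { verdict: 'too-complex'; rationale: string };

type GateAction =
  | { kind: 'dispatch'; issueId: string }
  | { kind: 'post-questions'; issueId: string; questions: string[] }
  | { kind: 'persist-waiting-spec'; issueId: string } // would write a WaitingSpecRecord
  | { kind: 'post-rationale'; issueId: string; rationale: string }
  | { kind: 'label-issue'; issueId: string; label: string };

// Decide what should happen for an issue; executing the actions is left to the caller.
function planGateActions(issueId: string, result: GateResult): GateAction[] {
  switch (result.verdict) {
    case 'ship':
      return [{ kind: 'dispatch', issueId }];
    case 'needs-spec':
      return [
        { kind: 'post-questions', issueId, questions: result.questions },
        { kind: 'persist-waiting-spec', issueId },
      ];
    case 'too-complex':
      return [
        { kind: 'post-rationale', issueId, rationale: result.rationale },
        { kind: 'label-issue', issueId, label: 'needs-human' },
      ];
  }
}
```

Keeping the plan separate from its execution is one possible design: the "never dispatch unless the verdict is `ship`" invariant can then be unit-tested without touching Slack, Linear, or the agent runner.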

### Phase 3: feedback loop

- [ ] Track outcomes: for each verdict (`ship` / `needs-spec` / `too-complex`), what actually happened downstream? Merged clean / merged-with-fixes / closed-without-merge / human-took-over?
- [ ] Per-repo calibration: the threshold for "too complex" in a small startup repo ≠ enterprise monorepo
- [ ] Surface metrics: % of issues gated, distribution of verdicts, downstream merge rate by verdict
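
To make the metrics above concrete, here is a small sketch that computes them from a list of gate records. The `GateRecord` shape and `summarize` are hypothetical; the outcome names are taken from the first checklist item, and counting both "merged clean" and "merged-with-fixes" as merged is an assumption that may need revisiting.

```typescript
// Hypothetical record shape; outcome names follow the checklist above.
type ComplexityVerdict = 'ship' | 'needs-spec' | 'too-complex';
type Outcome = 'merged-clean' | 'merged-with-fixes' | 'closed-without-merge' | 'human-took-over';

interface GateRecord {
  repo: string;
  verdict: ComplexityVerdict; // the gate's first verdict for the issue
  outcome?: Outcome;          // absent until a PR outcome is known
}

interface GateMetrics {
  total: number;
  gatedPct: number; // share of issues whose first verdict was not `ship`
  verdictCounts: Record<ComplexityVerdict, number>;
  mergeRateByVerdict: Record<ComplexityVerdict, number | null>; // null = no outcomes yet
}

function summarize(records: GateRecord[]): GateMetrics {
  const verdictCounts: Record<ComplexityVerdict, number> = { ship: 0, 'needs-spec': 0, 'too-complex': 0 };
  const withOutcome: Record<ComplexityVerdict, number> = { ship: 0, 'needs-spec': 0, 'too-complex': 0 };
  const merged: Record<ComplexityVerdict, number> = { ship: 0, 'needs-spec': 0, 'too-complex': 0 };

  for (const r of records) {
    verdictCounts[r.verdict] += 1;
    if (r.outcome !== undefined) {
      withOutcome[r.verdict] += 1;
      // Assumption: both merge outcomes count as "merged" for the rate.
      if (r.outcome === 'merged-clean' || r.outcome === 'merged-with-fixes') merged[r.verdict] += 1;
    }
  }

  const total = records.length;
  const gated = total - verdictCounts.ship;
  const rate = (v: ComplexityVerdict): number | null =>
    withOutcome[v] === 0 ? null : merged[v] / withOutcome[v];

  return {
    total,
    gatedPct: total === 0 ? 0 : (gated / total) * 100,
    verdictCounts,
    mergeRateByVerdict: { ship: rate('ship'), 'needs-spec': rate('needs-spec'), 'too-complex': rate('too-complex') },
  };
}
```

Filtering `records` by `repo` before calling `summarize` would give the per-repo view needed for calibration.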

### Phase 4 (stretch): proactive issue-creation gate

Marcin's framing was *"at the point of creating linear issues."* So beyond gating dispatch, optionally run the same classifier when an issue is CREATED:

- [ ] Linear/GitHub webhook on issue creation
- [ ] Inline comment from factory: "this looks like a 2-hour autonomous task — want me to handle it?" or "this needs more scoping before we can dispatch — here's what's missing"
- [ ] Author can address the gaps and re-trigger evaluation

## Acceptance / success

Measured outcomes after rollout on a real backlog:
- **Merge-rate of dispatched PRs goes up** (because we stopped dispatching the doomed ones)
- **Time-to-clarification goes down** (questions asked at the right moment, not 30 minutes into a wrong implementation)
- **"Plate of spaghetti" rate goes to ~0** (PRs that miss obvious requirements should be caught at the gate)

Demonstrable to a prospect (Marcin specifically): show a complex issue → gate refuses → posts specific questions → after a human edit, gate accepts → factory dispatches → clean PR. The contrast vs bare factory is the demo.

## Non-goals

- Not replacing the existing heuristic/LLM triage engines — this is a NEW gate that runs after current triage classifies scope/routes
- Not making the gate so conservative it refuses everything (defeats the purpose)
- Not requiring perfect classification on day 1 — calibrate against real outcomes, learn

## Open questions

1. **Where does the gate live in the pipeline?** Most natural is after current triage (we know scope/routes) but before agent dispatch. Confirm with the orchestrator flow.
2. **What signals matter most?** Description length is weak. Specific signals likely matter more (acceptance criteria, edge-case enumeration, linked design). Need real-data calibration.
3. **Does the classifier need codebase context?** Routes already point at files — should the classifier read sample files via relayfile to assess complexity? Probably yes for v2, no for v1.
4. **Slack vs Linear for clarification questions?** Slack is current pattern but Linear-native comments are where issue authors live. Probably both, configurable per repo.
5. **How does this interact with the existing `ClarificationRecord` mid-task flow?** Two layers (pre-dispatch + mid-task) both legitimate, but worth thinking about whether mid-task clarification effectively becomes the failure case ("gate said ship, agent still hit ambiguity").

## Related

- Customer signal: `sales/nicole-turnage/transcript-06-23-26.txt` (Nicole / Apricot — zero-touch failure)
- Customer signal: `sales/nango/transcript-06-25-26.txt` (Marcin — verbatim spec for this feature)
- Cross-portfolio Push-vs-Pull doc: Notion "Push vs. Pull" page (gap labeled "Triage gate")
- Existing triage: `src/triage/heuristic.ts`, `src/triage/llm.ts`, `src/triage/tiered.ts`
- Existing clarification primitive: `ClarificationRecord` in `src/state/store.ts`
