Why now
Two independent prospect calls (Nicole Turnage @ Apricot, 2026-06-23; Nango / Marcin, 2026-06-25) hit the same failure mode with factory-shape products:
Nicole: the zero-touch Linear → PR pipeline produced "a giant plate of spaghetti" on an under-specified ticket. The agent made assumptions the spec didn't cover, didn't ask for clarification, and shipped a PR that missed three branches of a five-step flow.
Marcin (verbatim):
"We started from, can we go from linear to a PR? Which is, like, low hanging fruit. But it is so complicated. A lot of the stuff is so complicated that eventually an engineer has to look. The prompting of an issue is the easy part. It takes 10% of the time, then the rest is 90%."
He then proposed the fix (verbatim):
"It would be useful where at the point of creating linear issues, it could figure out how complicated a solution to this is. And suggest that I can just do this now. Like, if it just basically finds out that hey, this is pretty easy, then it triggers a run to give you a PR. Unless there are some things here that we might not be happy about, so let's not do it."
That is two calls out of two confirming that a bare factory hits a complexity ceiling on complex codebases. Triage today decides how to do the work; it doesn't decide whether to attempt it.
What we already have
- src/triage/heuristic.ts: routing + scope (single / workflow / team) + thin-description detection (140-char threshold)
- src/triage/llm.ts: LLM-backed triage decisions
- src/triage/tiered.ts: tiered escalation
- ClarificationRecord in src/state/store.ts: "wait for human in Slack thread" state
- Slack mid-task clarification: when an agent gets stuck, it can ask in the thread and pause until answered
What that flow gets right: the agent CAN ask. What it gets wrong for the cases above: clarification happens too late — after dispatch, deep into the work, when half a PR is already wrong and the agent doesn't know enough to ask the right questions. Nicole's engineer let the agent run past every gate. Marcin's bare "linear → PR" run dispatched issues that no amount of mid-task clarification could rescue.
What to build
A pre-dispatch triage gate that runs before any coding agent is spawned. It reads the issue, any linked design, spec, or screenshots, and a quick scan of the affected files, then outputs one of:
- ship → dispatch to factory normally
- needs-spec → don't dispatch. Post specific questions to the Slack thread (or Linear comments) describing what's ambiguous. Wait for answers. Re-evaluate.
- too-complex → don't dispatch. Post a structured "why this isn't suitable for autonomy" message. Suggest splitting, scope reduction, or human pickup.
The decision must be visible: record why the issue was classified the way it was and which specific signals tipped it.
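The three verdicts and the visibility requirement can be sketched as a decision record. This is a minimal sketch assuming a standalone `ComplexityDecision` shape; the verdict is meant to extend `TriageDecision`, whose fields aren't shown here, so `Signal`, `ComplexityDecision`, and `renderDecision` are all hypothetical names.

```typescript
// Hypothetical output shape for the pre-dispatch gate. The three verdicts
// come from the spec; every field name is illustrative only.
type ComplexityVerdict = "ship" | "needs-spec" | "too-complex";

interface Signal {
  name: string;     // e.g. "no-acceptance-criteria"
  weight: number;   // contribution toward refusing dispatch
  evidence: string; // what in the issue triggered the signal
}

interface ComplexityDecision {
  verdict: ComplexityVerdict;
  signals: Signal[];   // every signal that fired, so the decision is auditable
  questions: string[]; // populated for "needs-spec"
  rationale: string;   // human-readable summary posted to Slack / Linear
}

// Render the decision as the message posted with a refusal,
// listing the signals that tipped it, heaviest first.
function renderDecision(d: ComplexityDecision): string {
  const lines = [`Verdict: ${d.verdict}`, d.rationale];
  const tipped = [...d.signals].sort((a, b) => b.weight - a.weight);
  for (const s of tipped) {
    lines.push(`- ${s.name} (weight ${s.weight}): ${s.evidence}`);
  }
  if (d.questions.length > 0) {
    lines.push("Questions:");
    d.questions.forEach((q, i) => lines.push(`${i + 1}. ${q}`));
  }
  return lines.join("\n");
}
```

Keeping every fired signal on the record, not just the verdict, is what lets a refusal explain itself in the posted message.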
Scope — concrete work
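Phase 1: new triage/complexity.ts engine alongside existing engines
- ComplexityVerdict = 'ship' | 'needs-spec' | 'too-complex', extending TriageDecision
- … llm.ts)
Phase 2: wire it into the dispatch pipeline as a gate
- orchestrator/factory.ts (or wherever triage feeds dispatch): if verdict ≠ ship, don't dispatch
- needs-spec: post a structured Slack thread / Linear comment with specific questions; persist a WaitingSpecRecord (mirrors ClarificationRecord); re-trigger when the issue is updated
- too-complex: post the rationale, label the issue (needs-human or similar), exit cleanly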
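The gate routing can be kept as a pure function that maps a verdict to an action (ship dispatches, needs-spec asks and parks the issue, too-complex refuses and labels it), leaving the Slack/Linear posting and persistence to the caller. A sketch under assumptions: `GateInput`, `GateAction`, `planGateAction`, and `shouldReevaluate` are hypothetical names, and the `WaitingSpecRecord` fields are guesses at a record that mirrors `ClarificationRecord`, whose real shape isn't shown.

```typescript
// Hypothetical pure routing step between triage and dispatch.
// WaitingSpecRecord's fields are illustrative; it is only meant to mirror ClarificationRecord.
type ComplexityVerdict = "ship" | "needs-spec" | "too-complex";

interface GateInput {
  issueId: string;
  verdict: ComplexityVerdict;
  questions: string[];
  rationale: string;
}

interface WaitingSpecRecord {
  issueId: string;
  questions: string[];
  askedAt: number; // epoch ms when the questions were posted
}

type GateAction =
  | { kind: "dispatch"; issueId: string }
  | { kind: "ask"; issueId: string; questions: string[]; record: WaitingSpecRecord }
  | { kind: "refuse"; issueId: string; rationale: string; label: string };

function planGateAction(input: GateInput, now: number): GateAction {
  switch (input.verdict) {
    case "ship":
      return { kind: "dispatch", issueId: input.issueId };
    case "needs-spec":
      return {
        kind: "ask",
        issueId: input.issueId,
        questions: input.questions,
        record: { issueId: input.issueId, questions: input.questions, askedAt: now },
      };
    case "too-complex":
      // "needs-human" is the label the spec suggests ("or similar").
      return { kind: "refuse", issueId: input.issueId, rationale: input.rationale, label: "needs-human" };
    default: {
      // Compile-time guard: adding a verdict without handling it fails type-checking.
      const unreachable: never = input.verdict;
      throw new Error(`unhandled verdict: ${String(unreachable)}`);
    }
  }
}

// Re-run the gate only for parked issues that changed after the questions were posted.
function shouldReevaluate(record: WaitingSpecRecord | undefined, issueUpdatedAt: number): boolean {
  return record !== undefined && issueUpdatedAt > record.askedAt;
}
```

Returning a plan instead of performing side effects keeps the routing testable without Slack or Linear, and makes the re-trigger rule explicit.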
Phase 3: feedback loop
Phase 4 (stretch): proactive issue-creation gate
Marcin's framing was "at the point of creating linear issues." So beyond gating dispatch, optionally run the same classifier when an issue is CREATED:
Acceptance / success
Measured outcomes after rollout on a real backlog:
- Merge-rate of dispatched PRs goes up (because we stopped dispatching the doomed ones)
- Time-to-clarification goes down (questions asked at the right moment, not 30 minutes into a wrong implementation)
- "Plate of spaghetti" rate goes to ~0 — PRs that miss obvious requirements should be caught at the gate
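The merge-rate and "plate of spaghetti" outcomes above reduce to simple ratios once each dispatched issue carries an outcome label. A minimal sketch, assuming a hypothetical `DispatchOutcome` record whose `merged` and `missedRequirements` flags are filled in after review; how those labels get collected isn't specified here.

```typescript
// Hypothetical per-issue outcome record used to measure the gate after rollout.
interface DispatchOutcome {
  verdict: "ship" | "needs-spec" | "too-complex";
  dispatched: boolean;
  merged: boolean;             // only meaningful when dispatched
  missedRequirements: boolean; // reviewer flagged the PR as missing obvious requirements
}

// Fraction of dispatched issues matching `pred`; null when nothing was
// dispatched, so "no data" is not confused with a 0% rate.
function rateAmongDispatched(
  outcomes: DispatchOutcome[],
  pred: (o: DispatchOutcome) => boolean,
): number | null {
  const dispatched = outcomes.filter((o) => o.dispatched);
  if (dispatched.length === 0) return null;
  return dispatched.filter(pred).length / dispatched.length;
}

const mergeRate = (outcomes: DispatchOutcome[]) => rateAmongDispatched(outcomes, (o) => o.merged);
const spaghettiRate = (outcomes: DispatchOutcome[]) =>
  rateAmongDispatched(outcomes, (o) => o.missedRequirements);
```

Comparing these rates on the same backlog with and without the gate gives the before/after contrast; time-to-clarification would need timestamps this sketch doesn't model.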
Demonstrable to a prospect (Marcin specifically): show a complex issue → gate refuses → posts specific questions → after a human edit, gate accepts → factory dispatches → clean PR. The contrast vs bare factory is the demo.
Non-goals
- Not replacing the existing heuristic/LLM triage engines — this is a NEW gate that runs after current triage classifies scope/routes
- Not making the gate so conservative it refuses everything (defeats the purpose)
- Not requiring perfect classification on day 1 — calibrate against real outcomes, learn
Open questions
- Where does the gate live in the pipeline? Most natural is after current triage (we know scope/routes) but before agent dispatch. Confirm with the orchestrator flow.
- What signals matter most? Description length is weak. Specific signals likely matter more (acceptance criteria, edge-case enumeration, linked design). Need real-data calibration.
- Does the classifier need codebase context? Routes already point at files — should the classifier read sample files via relayfile to assess complexity? Probably yes for v2, no for v1.
- Slack vs Linear for clarification questions? Slack is current pattern but Linear-native comments are where issue authors live. Probably both, configurable per repo.
- How does this interact with the existing ClarificationRecord mid-task flow? Both layers (pre-dispatch + mid-task) are legitimate, but it's worth asking whether mid-task clarification effectively becomes the failure case ("gate said ship, agent still hit ambiguity").
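One way to check whether description length really is weak is to extract the more specific signals named above alongside it and compare how each predicts outcomes. A rough sketch, assuming issue bodies are Markdown and that the existing check is a plain character count against the 140-char threshold; the regex patterns, `SpecSignals`, `extractSignals`, and `missingSpec` are hypothetical placeholders to calibrate on real data, not the behavior of heuristic.ts.

```typescript
// Hypothetical structural signals for a Markdown issue body.
// Patterns are placeholders to be calibrated against real outcomes.
interface SpecSignals {
  length: number;
  hasAcceptanceCriteria: boolean;
  edgeCaseMentions: number;
  hasLinkedDesign: boolean; // positive signal only; not every issue needs a design
}

const THIN_DESCRIPTION_CHARS = 140; // assumed to match the existing 140-char threshold

function extractSignals(body: string): SpecSignals {
  const text = body.trim();
  return {
    length: text.length,
    // An "acceptance criteria" heading/phrase, or a Markdown checklist item.
    hasAcceptanceCriteria: /acceptance criteria/i.test(text) || /^\s*[-*] \[[ x]\]/m.test(text),
    edgeCaseMentions: (text.match(/\b(edge case|error|fails?|empty|timeout|retry)\b/gi) ?? []).length,
    hasLinkedDesign: /figma\.com|\.(png|jpe?g|gif)\b/i.test(text),
  };
}

// Reasons the spec looks incomplete, most specific first; empty means nothing argues against shipping.
function missingSpec(s: SpecSignals): string[] {
  const reasons: string[] = [];
  if (!s.hasAcceptanceCriteria) reasons.push("no acceptance criteria");
  if (s.edgeCaseMentions === 0) reasons.push("no edge cases enumerated");
  if (s.length < THIN_DESCRIPTION_CHARS) reasons.push("thin description");
  return reasons;
}
```

Each reason string can feed the signals the gate posts, so a needs-spec refusal names what's missing rather than only noting that the ticket is short.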
Related
- Customer signal: sales/nicole-turnage/transcript-06-23-26.txt (Nicole / Apricot, zero-touch failure)
- Customer signal: sales/nango/transcript-06-25-26.txt (Marcin, verbatim spec for this feature)
- Cross-portfolio Push-vs-Pull doc: Notion "Push vs. Pull" page (gap labeled "Triage gate")
- Existing triage: src/triage/heuristic.ts, src/triage/llm.ts, src/triage/tiered.ts
- Existing clarification primitive: ClarificationRecord in src/state/store.ts