Mining counts headless `claude -p` calls and sub-agent fan-out as user tasks, inflating recurring-task signal

# Mining counts headless `claude -p` calls and sub-agent fan-out as user tasks, inflating "recurring task" signal

## TL;DR

`skillopt_sleep` harvests every `~/.claude/projects/<slug>/*.jsonl` transcript and turns each into a `TaskRecord` whose intent is the first user prompt. But Claude Code generates transcripts for **three kinds of sessions** — and only one of them is a real user task:

| Session type | `entrypoint` | `isSidechain` | What it actually is |
|---|---|---|---|
| Interactive Claude Code | `"cli"` | `false` | A real user prompt (what mining wants) |
| Headless `claude -p` | `"sdk-cli"` | `false` | A script / cron / Stop-hook / SDK call |
| Task/Agent sub-agent | varies | **`true`** | One parent session's fan-out, not a separate user ask |

`harvest.py:digest_transcript` reads `cwd`, `gitBranch`, `timestamp`, `message.content`, but ignores both `entrypoint` and `isSidechain` (they're present at the top level of every transcript record produced by recent Claude Code versions). The result is that **infrastructure traffic dominates the mined corpus** on any machine where the user has invested in automation (skills with sub-agents, Stop hooks that call `claude -p`, etc.).

## Concrete evidence (one user's machine)

On my machine, `python -m skillopt_sleep harvest --scope all --source claude --lookback-hours 720` returned 40 tasks from 120 sessions. The top three "recurring patterns" by frequency:

| Pattern | Mined tasks | Real source |
|---|---|---|
| `Translate the file <chunk000N.md> to Chinese …` | 9 | One invocation of an existing `translate-book` skill which fans out 9 parallel sub-agents (`isSidechain: true`) |
| `You are merging a personal AI memory file (X) across machines (m4/m5/m2) …` | 4 | A user-written Stop hook (`~/.tasklog/memory_merge.py`) that runs `claude -p` automatically at session end (`entrypoint: "sdk-cli"`) |
| `Score how well the response satisfies the rubric …` / `You are SkillOpt's optimizer …` | 38 (after running the deterministic experiment) | SkillOpt-Sleep's own optimizer/judge prompts that leaked from `experiments/run_experiment.py` into `~/.claude/projects/` (see related bug below) |

In other words: **the top "user pains" the miner surfaced were each already-solved infrastructure** — one is a skill, one is a script, one is SkillOpt-Sleep itself.

False-positive rate on the top-3 ranked candidates: **3/3** on this corpus. If a deficient-seed `CLAUDE.md` were nightly optimized against this mining signal, the optimizer would propose edits to handle phantom recurrence, not actual user pain.

## Why this matters

The README and `docs/sleep/README.md` frame SkillOpt-Sleep as learning from "your past sessions" — implicitly, **the user's** past sessions. For the cohort most likely to adopt the plugin (people who already use Claude Code intensively and have built automation around it), the mined corpus is dominated by their own automation surface rather than their own asks. The validation gate keeps the worst case bounded, but the *signal* the optimizer chases is structurally wrong, which lowers expected upside and inflates token spend on uninteresting candidate edits.

## Reproduce

```bash
# Pick any ~/.claude/projects/*/<id>.jsonl that came from a sub-agent or claude -p,
# and verify the fields are there:
jq -s '.[0] | {entrypoint, isSidechain}' < ~/.claude/projects/<slug>/<id>.jsonl
# Expect: {"entrypoint": "sdk-cli", "isSidechain": false}   for `claude -p`
# Expect: {"entrypoint": "cli",     "isSidechain": true}    for a sub-agent
```

Then run harvest on that machine and inspect how many returned tasks have a one-shot prompt template that no human would type twice. On a heavily-automated machine the answer dominates.

## Proposed fix (small)

`harvest.py:digest_transcript` already iterates every record. Capture the first record's `entrypoint` and `isSidechain` into `SessionDigest`, then filter at `harvest()`:

```python
# in digest_transcript, when we see the first record with these fields:
entrypoint = rec.get("entrypoint") or entrypoint
is_sidechain = rec.get("isSidechain") if "isSidechain" in rec else is_sidechain

# pass them through to SessionDigest, then in harvest():
def _is_user_session(d: SessionDigest) -> bool:
    if d.is_sidechain:
        return False              # sub-agent fan-out
    if d.entrypoint == "sdk-cli":
        return False              # headless claude -p (script / hook / SDK)
    return True
```

Defaults: include only `entrypoint == "cli"` and `isSidechain == false`. Backward-compat / opt-out via an env var or a `--include-headless` flag for users who *do* want their automation surface mined (e.g. someone training a skill that's specifically invoked via SDK).

Both filters are derived directly from fields Claude Code already writes; no schema change, no LLM call, no new dependency.

## Optional follow-on signals (cheaper but noisier)

When `entrypoint`/`isSidechain` aren't present (older Claude Code transcripts), two soft signals correlate strongly with infrastructure in practice:

- **`cwd` starts with `/private/var/folders/` or `/tmp/` or `$TMPDIR`** → almost certainly a `mkdtemp()`-spawned session, not a user task.
- **`n_user_turns == 1` and `len(user_prompts[0]) > 800`** → almost certainly a templated invocation, not an interactive ask (real users iterate).

A user-configurable allow/deny list of project paths would also help (some users *do* run real work in scratch dirs).

## Related: a separate but adjacent bug

`python -m skillopt_sleep.experiments.run_experiment` writes its synthetic Claude session transcripts into the real `~/.claude/projects/`, leaving 142 fake project directories like `-private-var-folders-ks-.../-skillopt-sleep-claude-*` on my machine after a single run (`run_experiment --persona researcher --persona programmer`). Subsequent harvests are then polluted by SkillOpt-Sleep's *own* optimizer/judge prompts — at least on first read this is the experiment harness forgetting to pass `--claude-home <tmpdir>`. Happy to file this as its own issue if you'd prefer; flagging it here because it amplifies exactly the failure mode above.

## Why I'm filing this

I ran `dry-run` and the two deterministic experiments to evaluate adoption. The dry-run's mining output (40 tasks, ranked by recurrence) looked impressively well-targeted on first read — until I traced each top cluster and found that each was already solved by existing local automation. That's a UX problem in addition to a signal-quality one: the plugin looks like it's working when it's mostly cataloging its own host's automation.

Happy to draft a PR if the maintainers prefer that route. I'd want to keep it focused on `harvest.py` (drop in two `SessionDigest` fields + the default filter + an opt-out env var) and leave the pollution bug to a separate change. Let me know.


Pattern	Mined tasks	Real source
`Translate the file <chunk000N.md> to Chinese …`	9	One invocation of an existing `translate-book` skill which fans out 9 parallel sub-agents (`isSidechain: true`)
`You are merging a personal AI memory file (X) across machines (m4/m5/m2) …`	4	A user-written Stop hook (`~/.tasklog/memory_merge.py`) that runs `claude -p` automatically at session end (`entrypoint: "sdk-cli"`)
`Score how well the response satisfies the rubric …` / `You are SkillOpt's optimizer …`	38 (after running the deterministic experiment)	SkillOpt-Sleep's own optimizer/judge prompts that leaked from `experiments/run_experiment.py` into `~/.claude/projects/` (see related bug below)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mining counts headless `claude -p` calls and sub-agent fan-out as user tasks, inflating recurring-task signal #62

Mining counts headless `claude -p` calls and sub-agent fan-out as user tasks, inflating "recurring task" signal

TL;DR

Concrete evidence (one user's machine)

Why this matters

Reproduce

Proposed fix (small)

Optional follow-on signals (cheaper but noisier)

Related: a separate but adjacent bug

Why I'm filing this

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Session type	`entrypoint`	`isSidechain`	What it actually is
Interactive Claude Code	`"cli"`	`false`	A real user prompt (what mining wants)
Headless `claude -p`	`"sdk-cli"`	`false`	A script / cron / Stop-hook / SDK call
Task/Agent sub-agent	varies	`true`	One parent session's fan-out, not a separate user ask

Mining counts headless claude -p calls and sub-agent fan-out as user tasks, inflating recurring-task signal #62

Description

Mining counts headless claude -p calls and sub-agent fan-out as user tasks, inflating "recurring task" signal

TL;DR

Concrete evidence (one user's machine)

Why this matters

Reproduce

Proposed fix (small)

Optional follow-on signals (cheaper but noisier)

Related: a separate but adjacent bug

Why I'm filing this

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Mining counts headless `claude -p` calls and sub-agent fan-out as user tasks, inflating recurring-task signal #62

Mining counts headless `claude -p` calls and sub-agent fan-out as user tasks, inflating "recurring task" signal