Mining counts headless claude -p calls and sub-agent fan-out as user tasks, inflating "recurring task" signal
TL;DR
skillopt_sleep harvests every ~/.claude/projects/<slug>/*.jsonl transcript and turns each into a TaskRecord whose intent is the first user prompt. But Claude Code generates transcripts for three kinds of sessions — and only one of them is a real user task:
| Session type | entrypoint | isSidechain | What it actually is |
|---|---|---|---|
| Interactive Claude Code | "cli" | false | A real user prompt (what mining wants) |
| Headless claude -p | "sdk-cli" | false | A script / cron / Stop-hook / SDK call |
| Task/Agent sub-agent | varies | true | One parent session's fan-out, not a separate user ask |
harvest.py:digest_transcript reads cwd, gitBranch, timestamp, message.content, but ignores both entrypoint and isSidechain (they're present at the top level of every transcript record produced by recent Claude Code versions). The result is that infrastructure traffic dominates the mined corpus on any machine where the user has invested in automation (skills with sub-agents, Stop hooks that call claude -p, etc.).
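To make the three-way split concrete, here is a minimal sketch of how a transcript's records could be labelled from those two fields. `classify_session` and `load_records` are hypothetical helpers written for this issue, not functions in `harvest.py`, and the sketch assumes the fields sit at the top level of each record as described above.

```python
import json
from typing import Iterable, List, Optional


def classify_session(records: Iterable[dict]) -> str:
    """Label a session using the first value seen for each of the two fields."""
    entrypoint: Optional[str] = None
    is_sidechain: Optional[bool] = None
    for rec in records:
        if entrypoint is None and rec.get("entrypoint"):
            entrypoint = rec["entrypoint"]
        if is_sidechain is None and "isSidechain" in rec:
            is_sidechain = bool(rec["isSidechain"])
        if entrypoint is not None and is_sidechain is not None:
            break
    if is_sidechain:
        return "subagent"      # fan-out from a parent session
    if entrypoint == "sdk-cli":
        return "headless"      # claude -p from a script, hook, or SDK call
    if entrypoint == "cli":
        return "interactive"   # the only kind the miner wants
    return "unknown"           # e.g. an older transcript without the fields


def load_records(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL lines, skipping blank and malformed rows."""
    out: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            out.append(obj)
    return out
```

The sub-agent check runs first because the table lists its entrypoint as "varies", so isSidechain is the only reliable marker for that row.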
Concrete evidence (one user's machine)
On my machine, python -m skillopt_sleep harvest --scope all --source claude --lookback-hours 720 returned 40 tasks from 120 sessions. The top three "recurring patterns" by frequency:
| Pattern | Mined tasks | Real source |
|---|---|---|
| Translate the file <chunk000N.md> to Chinese … | 9 | One invocation of an existing translate-book skill which fans out 9 parallel sub-agents (isSidechain: true) |
| You are merging a personal AI memory file (X) across machines (m4/m5/m2) … | 4 | A user-written Stop hook (~/.tasklog/memory_merge.py) that runs claude -p automatically at session end (entrypoint: "sdk-cli") |
| Score how well the response satisfies the rubric … / You are SkillOpt's optimizer … | 38 (after running the deterministic experiment) | SkillOpt-Sleep's own optimizer/judge prompts that leaked from experiments/run_experiment.py into ~/.claude/projects/ (see related bug below) |
In other words: the top "user pains" the miner surfaced were each already-solved infrastructure — one is a skill, one is a script, one is SkillOpt-Sleep itself.
False-positive rate on the top-3 ranked candidates: 3/3 on this corpus. If a deficient-seed CLAUDE.md were nightly optimized against this mining signal, the optimizer would propose edits to handle phantom recurrence, not actual user pain.
Why this matters
The README and docs/sleep/README.md frame SkillOpt-Sleep as learning from "your past sessions" — implicitly, the user's past sessions. For the cohort most likely to adopt the plugin (people who already use Claude Code intensively and have built automation around it), the mined corpus is dominated by their own automation surface rather than their own asks. The validation gate keeps the worst case bounded, but the signal the optimizer chases is structurally wrong, which lowers expected upside and inflates token spend on uninteresting candidate edits.
Reproduce
# Pick any ~/.claude/projects/*/<id>.jsonl that came from a sub-agent or claude -p,
# and verify the fields are there:
jq -s '.[0] | {entrypoint, isSidechain}' < ~/.claude/projects/<slug>/<id>.jsonl
# Expect: {"entrypoint": "sdk-cli", "isSidechain": false} for `claude -p`
# Expect: {"entrypoint": "cli", "isSidechain": true} for a sub-agent (entrypoint may vary; isSidechain is the signal)
Then run harvest on that machine and count how many returned tasks have a one-shot prompt template that no human would type twice. On a heavily automated machine, those templated tasks make up most of the result.
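For a whole-machine view rather than one file at a time, the same check can be tallied across every transcript. This is a rough sketch, not part of skillopt_sleep: `first_fields` and `tally` are hypothetical names, and it assumes the `<projects>/<slug>/*.jsonl` layout mentioned in the TL;DR.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Optional, Tuple


def first_fields(path: Path) -> Tuple[Optional[str], Optional[bool]]:
    """Return (entrypoint, isSidechain) from the first record that has either."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        for line in fh:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or malformed row
            if isinstance(rec, dict) and ("entrypoint" in rec or "isSidechain" in rec):
                return rec.get("entrypoint"), rec.get("isSidechain")
    return None, None


def tally(projects_dir: Path) -> Counter:
    """Count transcripts under <projects_dir>/<slug>/*.jsonl by field pair."""
    return Counter(first_fields(p) for p in sorted(projects_dir.glob("*/*.jsonl")))
```

Calling `tally(Path.home() / ".claude" / "projects").most_common()` lists the pairs by frequency; a large share of `("sdk-cli", False)` or `(..., True)` entries would indicate the inflation described in this issue.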
Proposed fix (small)
harvest.py:digest_transcript already iterates every record. Capture the first record's entrypoint and isSidechain into SessionDigest, then filter at harvest():
# in digest_transcript's record loop (both variables start as None before the loop);
# keep the first value seen so later records cannot overwrite it:
if entrypoint is None:
    entrypoint = rec.get("entrypoint")
if is_sidechain is None and "isSidechain" in rec:
    is_sidechain = rec["isSidechain"]
# pass them through to SessionDigest, then in harvest():
def _is_user_session(d: SessionDigest) -> bool:
    if d.is_sidechain:
        return False  # sub-agent fan-out
    if d.entrypoint == "sdk-cli":
        return False  # headless claude -p (script / hook / SDK)
    return True
Default: drop any session where isSidechain is true or entrypoint is "sdk-cli". In practice that leaves interactive "cli" sessions, plus older transcripts that carry neither field. Backward-compat / opt-out via an env var or a --include-headless flag for users who do want their automation surface mined (e.g. someone training a skill that's specifically invoked via SDK).
Both filters are derived directly from fields Claude Code already writes; no schema change, no LLM call, no new dependency.
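As a rough illustration of how the default filter and the opt-out could fit together, the sketch below uses a stand-in `SessionDigest` with only the two new fields. The environment variable name `SKILLOPT_SLEEP_INCLUDE_HEADLESS`, the `filter_digests` helper, and the choice to let the opt-out disable both filters are all assumptions made for this example, not existing behaviour.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical variable name, used here only for illustration.
INCLUDE_HEADLESS_ENV = "SKILLOPT_SLEEP_INCLUDE_HEADLESS"


@dataclass
class SessionDigest:
    # Stand-in for the real SessionDigest; only the fields this sketch needs.
    session_id: str
    entrypoint: Optional[str] = None
    is_sidechain: Optional[bool] = None


def is_user_session(d: SessionDigest) -> bool:
    if d.is_sidechain:
        return False  # sub-agent fan-out
    if d.entrypoint == "sdk-cli":
        return False  # headless claude -p (script / hook / SDK)
    return True       # interactive, or an older transcript without the fields


def filter_digests(
    digests: Iterable[SessionDigest],
    include_headless: Optional[bool] = None,
) -> List[SessionDigest]:
    """Apply the default filter unless the caller or the env var opts out."""
    if include_headless is None:
        include_headless = os.environ.get(INCLUDE_HEADLESS_ENV, "") == "1"
    if include_headless:
        return list(digests)
    return [d for d in digests if is_user_session(d)]
```

An explicit `include_headless` argument takes precedence over the environment variable, which would let a `--include-headless` flag be wired straight through.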
Optional follow-on signals (cheaper but noisier)
When entrypoint/isSidechain aren't present (older Claude Code transcripts), two soft signals correlate strongly with infrastructure in practice:
cwd starts with /private/var/folders/ or /tmp/ or $TMPDIR → almost certainly a mkdtemp()-spawned session, not a user task.
n_user_turns == 1 and len(user_prompts[0]) > 800 → almost certainly a templated invocation, not an interactive ask (real users iterate).
A user-configurable allow/deny list of project paths would also help (some users do run real work in scratch dirs).
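The two soft signals and the allow/deny list could be combined roughly as follows. This is a sketch under stated assumptions: `looks_like_infrastructure` is a hypothetical helper, `len(user_prompts)` stands in for `n_user_turns`, `tempfile.gettempdir()` stands in for `$TMPDIR`, and the allow list is given precedence over everything else.

```python
import tempfile
from typing import Sequence

TEMP_PREFIXES = ("/private/var/folders/", "/tmp/")
LONG_PROMPT_CHARS = 800  # threshold taken from the heuristic above


def _under(path: str, root: str) -> bool:
    """True if path equals root or lies beneath it."""
    root = root.rstrip("/")
    return bool(root) and (path == root or path.startswith(root + "/"))


def looks_like_infrastructure(
    cwd: str,
    user_prompts: Sequence[str],
    allow: Sequence[str] = (),
    deny: Sequence[str] = (),
) -> bool:
    """Heuristic fallback for transcripts that lack entrypoint/isSidechain."""
    if any(_under(cwd, p) for p in allow):
        return False  # user says real work happens here, even in a scratch dir
    if any(_under(cwd, p) for p in deny):
        return True
    roots = TEMP_PREFIXES + (tempfile.gettempdir(),)
    if any(_under(cwd, r) for r in roots):
        return True   # likely a mkdtemp()-spawned session
    # A single long prompt suggests a templated invocation.
    return len(user_prompts) == 1 and len(user_prompts[0]) > LONG_PROMPT_CHARS
```

Both signals are noisy, so a result of `True` here is probably better treated as a reason to down-rank a task than to discard it outright.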
Related: a separate but adjacent bug
python -m skillopt_sleep.experiments.run_experiment writes its synthetic Claude session transcripts into the real ~/.claude/projects/, leaving 142 fake project directories like -private-var-folders-ks-.../-skillopt-sleep-claude-* on my machine after a single run (run_experiment --persona researcher --persona programmer). Subsequent harvests are then polluted by SkillOpt-Sleep's own optimizer/judge prompts — at least on first read this is the experiment harness forgetting to pass --claude-home <tmpdir>. Happy to file this as its own issue if you'd prefer; flagging it here because it amplifies exactly the failure mode above.
Why I'm filing this
I ran dry-run and the two deterministic experiments to evaluate adoption. The dry-run's mining output (40 tasks, ranked by recurrence) looked impressively well-targeted on first read — until I traced each top cluster and found that each was already solved by existing local automation. That's a UX problem in addition to a signal-quality one: the plugin looks like it's working when it's mostly cataloging its own host's automation.
Happy to draft a PR if the maintainers prefer that route. I'd want to keep it focused on harvest.py (drop in two SessionDigest fields + the default filter + an opt-out env var) and leave the pollution bug to a separate change. Let me know.