Skip to content

refactor: act on the OCH duplication scan (providers, defineTool, cross-package single-sourcing)#280

Merged
theagenticguy merged 11 commits into
mainfrom
chore/remove-duckdb-ladybug-traces
Jul 4, 2026
Merged

refactor: act on the OCH duplication scan (providers, defineTool, cross-package single-sourcing)#280
theagenticguy merged 11 commits into
mainfrom
chore/remove-duckdb-ladybug-traces

Conversation

@theagenticguy

Copy link
Copy Markdown
Owner

Summary

Acts on the OCH duplication scan (jscpd — 3,450 dup lines, 4.38%). Net −1,110 source lines across the ingestion providers, the MCP tool layer, the pipeline phases, and cross-package rosters — with graphHash byte-identity preserved throughout (a new characterization harness enforces it) and every scan finding addressed.

What changed, by finding

  • Finding 1 — providers. Collapsed the near-identical extractCalls / extractDefinitions / extractHeritage loop shells into extractCallsGeneric / extractDefinitionsGeneric / extractHeritageRefBased in extract-helpers.ts, each config-driven per provider (receiver strategies, kind resolvers, promote/exported rules, heritage rules). Hoisted the two byte-identical private helpers (13 copies). python defs and the bespoke heritage providers (csharp/go/ts/java) stay custom; extractImports is irreducibly per-language and left alone.
  • Finding 2 — mcp/tools. Added a defineTool factory folding the register + withStore + try/catch + next-steps/staleness boilerplate (audit A8). Converted dependencies / license_audit / project_profile (and shared the context PROCESS_STEP reader) to @opencodehub/core-ops capabilities. Wire names, tool count (29), and annotations unchanged.
  • Finding 3 — pipeline. The cited clone was a duplicated extension→language switch, not analysis bodies. Routed cross-file / mro / accesses through the existing canonical detectLanguage (fixes a latent bug: cross-file's local switch silently omitted cobol) and extracted the incremental carry-forward loop into carryForwardEdges.
  • Cross-package. Single-sourced NODE_COLUMNS + the relation roster into core-typesfixes the D15 staleness bug where the MCP schema resource advertised only 26 of 73 logical node columns to SQL-authoring agents (now 73).

Safety

  • New characterization harness (characterization.test.ts): 16 providers × 4 extractors, value-locking golden, negative-self-check-proven. Gated every provider conversion.
  • incremental-determinism (graphHash full-vs-incremental) 4/4 throughout; mise run check green; two independent adversarial reviews found no drift.
  • dart calls are intentionally uncaptured and documented in DART_QUERY: dart's tree-sitter grammar has no invocation node, so a sound single-query capture is impossible and an unsound one would pollute CALLS-derived features with field-read false positives. Same class as the roadmap's Rust/Swift SCIP gaps.

Test plan

  • mise run check (lint + typecheck + all package tests + banned-strings) — exit 0
  • characterization.test.ts 65/65 · incremental-determinism.test.ts 4/4
  • server.test.ts 29-tool roster contract intact

🤖 Generated with Claude Code

theagenticguy and others added 11 commits July 3, 2026 14:22
Complete the single-file SQLite migration cleanup: delete dead code left by
the two-backend removal and scrub every prior-backend name from shipping
source, so `duckdb`/`ladybug`/`lbug`/`kuzu` appear only in decision history
(ADRs, CHANGELOGs) and removal prose.

Dead code deleted (~864 LOC):
- storage/src/schema-ddl.ts: whole file, superseded by inline DDL, zero callers
- cli analyze.ts: 345-LOC wide-column row decoder, superseded by payload-JSON
  rehydration through the store's typed listNodes/listEdges finders
- storage/src/test-utils/conformance.ts: 448-LOC assertIGraphStoreConformance
  suite with zero callers; drop the false "SqliteStore opts into this suite"
  claim in interface.ts. The load-bearing assertGraphParity/rebuildFromStore
  parity harness is kept (sqlite-parity.test.ts uses it).

Traces scrubbed to zero in live .ts source (9 files) plus 3 CI/acceptance
scripts and 8 per-package READMEs/docs that misdescribed current behavior.

Tests: fix one dead assertion (interface.test.ts store-path values), migrate
~40 fixture path literals and comments across 20 test files, and keep the
sqlite-adapter.test.ts guard that asserts no .lbug/.duckdb sidecar ever
reappears. No test file deleted.

Enforcement: check-banned-strings.sh now hard-bans the four prior-backend
names in packages/*/src (excluding tests + test-utils), fixes its own stale
"LadybugDB is the default backend" comments, so the traces cannot regrow.

Also fixes pack-determinism-audit.sh gating on the nonexistent .codehub/duck.db
(acceptance gate 16 was a silent permanent SKIP; now gates on store.sqlite).

Carries an in-flight fix: re-ingest cached scan.sarif on the analyze scan-skip
fast-path so a replace-mode bulkLoad no longer wipes prior findings to zero
(+ analyze-findings-survival.test.ts regression test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e guard

Two correctness bugs from the tech-debt audit (both "green tests, wrong
production behavior"):

B1 [P0] — MCP pack_codebase shipped a hollow pack + divergent packHash.
callRealPackEngine called generatePack({...}, { store }), omitting the
provenance bundle (commit, repoOriginUrl, chunkerFiles, grammarCommits) the
CLI wires. The manifest preimage binds commit/repoOriginUrl/fileHash/
chonkie_version/grammar_commits, so an MCP-triggered pack produced an empty
ast-chunks.jsonl, a byte-range-free context-bom, commit="", and a packHash
that silently diverged from the CLI's for the same repo+commit. Fix: move
resolvePackProvenance from cli/commands/code-pack.ts into @opencodehub/pack
(new provenance.ts) so both entry points call it; MCP now passes ...provenance.
provenance.test.ts pins that omitting provenance changes the packHash and that
equal provenance yields an equal packHash across entry points.

B2 [P1] — traverse recursive CTE used an unanchored instr(path, id) cycle
guard. Node ids have no disambiguating suffix, so one id being a SUBSTRING of
another already on the path (Class:a.ts:Foo in Class:a.ts:FooBar) was read as
a revisit and pruned, dropping the node and its whole subtree — silently
under-reporting blast radius in impact/api_impact/verdict. Fix: anchor both
operands on comma delimiters so only a whole id counts as a revisit.

B2b [P1] — the final SELECT grouped by node_id with a bare path column, so
SQLite picked an arbitrary tied row on equal-depth paths (diamond graphs),
making the reported predecessor/path nondeterministic. Fix: rank by
(depth, path) via ROW_NUMBER and keep rank 1 (lexicographically smallest).

traverse-substring-id.test.ts covers both; verified it fails against the
pre-fix query and passes after. B5 (pack-determinism-audit.sh gating on the
nonexistent duck.db) was fixed in the prior commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-of-concept

Introduce @opencodehub/core-ops — a transport-free "capability" layer the MCP
tools and CLI commands share, so the byte-identical resolve/finder/filter/
projection logic (audit findings D4/D7) lives once and each surface is a thin
adapter that only maps the plain Output into its own transport.

New package @opencodehub/core-ops (deps: analysis + storage + core-types; leaf,
no cycle — both cli and mcp already depend on all three):
- capability.ts: Capability<Input,Output> = { id, execute(input, ctx) }, where
  ctx carries the already-open { graph, temporal } store + resolved repoName.
  Input validation and repo/store lifecycle stay at each surface's boundary for
  now (the resolvers differ — MCP carries AMBIGUOUS_REPO the CLI does not);
  unifying them behind a StoreProvider + defineTool/defineCommand factories is
  the next step. CapabilityStore is the single seam the deferred A1 accessor
  collapse will flip.
- string-or.ts: the one canonical stringOr (kills the D7 copy in both surfaces).
- caps/findings.ts: findingsCapability — the shared reader (listFindings push-
  down of severity+ruleId), TS post-finder (severity="none", scanner, filePath
  substring), and row projection. Zod-free: each surface keeps its own schema
  (MCP raw-shape for the SDK, CLI commander flags), which was never the
  duplicated part.
- caps/findings.test.ts: 6 unit tests over a fake CapabilityStore — the first
  isolated coverage of that shared logic.

Rewire both surfaces to call findingsCapability, unchanged public behavior:
- cli/commands/findings.ts 111->63 LOC; keeps runFindings(opts) + storeFactory
  seam; findings.test.ts stays green (3/3).
- mcp/tools/list-findings.ts 191->147 LOC; keeps runListFindings /
  registerListFindingsTool signatures + the full next_steps/staleness envelope.

Verified: core-ops + cli findings tests green; every edited file typecheck-clean
and biome-clean. (The mcp package cannot be test-run in this sandbox — its zod
install is corrupted, unrelated to this change; list-findings.ts itself
typechecks clean.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the new workspace package (its analysis/storage/core-types workspace
links + typescript/@types/node dev deps) in pnpm-lock.yaml. Companion to
51ba06e, which added the package but not its lockfile entry. Run via
`pnpm install --no-frozen-lockfile`; no other dependency changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ix D15 schema staleness

Lift NODE_COLUMNS + RELATION_COLUMNS to a new core-types/node-columns.ts and
have storage/column-encode.ts and mcp/resources/repo-schema.ts import the one
copy; point storage/relations.ts at core-types RELATION_TYPES. Fixes the D15
staleness bug where the MCP schema resource advertised only 26 of 73 logical
node columns to SQL-authoring agents (now 73). Adds drift-guard tests asserting
storage rosters deep-equal core-types.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed capabilities

Fold the register-tool + withStore + try/catch + next-steps/staleness envelope
boilerplate (audit A8) into one defineTool factory, parameterized by a capability
+ args-projector + presenter. Convert dependencies/license_audit/project_profile
to thin adapters over new core-ops capabilities; retire their local stringOr
copies. Wire names, tool count (29), and annotations unchanged, so the server
contract test stays green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract the near-identical extractCalls/extractDefinitions/extractHeritage loop
shells into extractCallsGeneric + extractDefinitionsGeneric + extractHeritageRefBased
in extract-helpers.ts, each config-driven per provider (receiver strategies,
kind resolvers, promote/exported rules, heritage rules). Hoist findNameInside +
qualifiedForCapture (13 copies). python defs + bespoke heritage (csharp/go/ts/java)
kept custom; extractImports left per-language. Adds a characterization harness
(16 providers x 4 extractors, value-locking golden) so graphHash byte-identity is
enforced on every conversion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…apability

Lift the byte-identical fetchProcessParticipation reader out of the CLI and MCP
context implementations into a shared contextCapability. resolveTarget + CALLS
traversal stay per-surface (they genuinely differ); the MCP presenter
(owner/cochange/confidence/buckets) stays MCP-side. Output envelopes unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nguage

Replace the duplicated inferLanguageFromFile extension switch in cross-file/mro/
accesses with the existing detectLanguage helper (fixes a latent bug where
cross-file's local switch silently omitted cobol) and extract the cross-file/mro
incremental carry-forward loop into incremental-helper.carryForwardEdges. Adds an
extension-parity pin + cobol/multi-dot determinism fixtures. Documents in DART_QUERY
why dart calls are intentionally uncaptured (grammar has no invocation node).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ount

Seven durable lessons from the duplication-scan remediation (characterization
harness, config-factory collapse, grammar-reachability, jscpd overcounting,
dead-guard unification, template-literal DSL comment). Correct ROADMAP edge-kind
count 23->25.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
packages/core-ops landed without its scope being enumerated, so commitlint
rejects the two core-ops-scoped commits already on this branch. Add the scope
per the config's own 'extend when new workspace packages land' note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@theagenticguy theagenticguy merged commit 337dc5e into main Jul 4, 2026
38 checks passed
@theagenticguy theagenticguy deleted the chore/remove-duckdb-ladybug-traces branch July 4, 2026 01:51
@github-actions github-actions Bot mentioned this pull request Jul 4, 2026
theagenticguy pushed a commit that referenced this pull request Jul 4, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.10.8</summary>

##
[0.10.8](root-v0.10.7...root-v0.10.8)
(2026-07-04)


### Refactoring

* act on the OCH duplication scan (providers, defineTool, cross-package
single-sourcing)
([#280](#280))
([337dc5e](337dc5e))
</details>

<details><summary>cli: 0.10.8</summary>

##
[0.10.8](cli-v0.10.7...cli-v0.10.8)
(2026-07-04)


### Refactoring

* act on the OCH duplication scan (providers, defineTool, cross-package
single-sourcing)
([#280](#280))
([337dc5e](337dc5e))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant