Commit b8f3822

Adds a new Agent Skill - Acquire-Codebase-Knowledge (#1373)
* feat(skill): add acquire-codebase-knowledge skill documentation
* feat(templates): add architecture, concerns, conventions, integrations, stack, structure, and testing documentation templates
* feat(references): add inquiry checkpoints and stack detection documentation
* feat(scan): add script to collect project discovery information for acquire-codebase-knowledge skill
* feat(skills): add acquire-codebase-knowledge skill for codebase mapping and documentation
* feat(scan): enhance scan script with absolute path handling and improved output variable validation
* feat(scan): replace bash script with Python script for project discovery information collection
* feat(skills): update acquire-codebase-knowledge skill to replace scan.sh with scan.py
1 parent e163a40 commit b8f3822

12 files changed

+1450
-0
lines changed

docs/README.skills.md

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
2626

2727
| Name | Description | Bundled Assets |
2828
| ---- | ----------- | -------------- |
29+
| [acquire-codebase-knowledge](../skills/acquire-codebase-knowledge/SKILL.md) | Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery. | `assets/templates`<br />`references/inquiry-checkpoints.md`<br />`references/stack-detection.md`<br />`scripts/scan.py` |
2930
| [add-educational-comments](../skills/add-educational-comments/SKILL.md) | Add educational comments to the file specified, or prompt asking for file to comment if one is not provided. | None |
3031
| [agent-governance](../skills/agent-governance/SKILL.md) | Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when:<br />- Building AI agents that call external tools (APIs, databases, file systems)<br />- Implementing policy-based access controls for agent tool usage<br />- Adding semantic intent classification to detect dangerous prompts<br />- Creating trust scoring systems for multi-agent workflows<br />- Building audit trails for agent actions and decisions<br />- Enforcing rate limits, content filters, or tool restrictions on agents<br />- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen) | None |
3132
| [agent-owasp-compliance](../skills/agent-owasp-compliance/SKILL.md) | Check any AI agent codebase against the OWASP Agentic Security Initiative (ASI) Top 10 risks.<br />Use this skill when:<br />- Evaluating an agent system's security posture before production deployment<br />- Running a compliance check against OWASP ASI 2026 standards<br />- Mapping existing security controls to the 10 agentic risks<br />- Generating a compliance report for security review or audit<br />- Comparing agent framework security features against the standard<br />- Any request like "is my agent OWASP compliant?", "check ASI compliance", or "agentic security audit" | None |
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
1+
---
2+
name: acquire-codebase-knowledge
3+
description: 'Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery.'
4+
license: MIT
5+
compatibility: 'Cross-platform. Requires Python 3.8+ and git. Run scripts/scan.py from the target project root.'
6+
metadata:
7+
version: "1.3"
8+
enhancements:
9+
- Multi-language manifest detection (25+ languages supported)
10+
- CI/CD pipeline detection (10+ platforms)
11+
- Container & orchestration detection
12+
- Code metrics by language
13+
- Security & compliance config detection
14+
- Performance testing markers
15+
argument-hint: 'Optional: specific area to focus on, e.g. "architecture only", "testing and concerns"'
16+
---
17+
18+
# Acquire Codebase Knowledge
19+
20+
Produces seven populated documents in `docs/codebase/` covering everything needed to work effectively on the project. Only document what is verifiable from files or terminal output — never infer or assume.
21+
22+
## Output Contract (Required)
23+
24+
Before finishing, all of the following must be true:
25+
26+
1. Exactly these files exist in `docs/codebase/`: `STACK.md`, `STRUCTURE.md`, `ARCHITECTURE.md`, `CONVENTIONS.md`, `INTEGRATIONS.md`, `TESTING.md`, `CONCERNS.md`.
27+
2. Every claim is traceable to source files, config, or terminal output.
28+
3. Unknowns are marked as `[TODO]`; intent-dependent decisions are marked `[ASK USER]`.
29+
4. Every document includes a short "evidence" list with concrete file paths.
30+
5. Final response includes numbered `[ASK USER]` questions and intent-vs-reality divergences.
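The file-level parts of this contract can be checked mechanically. The following is a minimal sketch, not part of the skill: the helper name `check_output_contract` is hypothetical, and it assumes the evidence list sits under a heading containing the word "Evidence" (as the bundled templates do). It only looks at `*.md` files, so the scan output file in the same directory is ignored.

```python
from pathlib import Path

# The seven file names come from the Output Contract above.
REQUIRED_DOCS = (
    "STACK.md", "STRUCTURE.md", "ARCHITECTURE.md", "CONVENTIONS.md",
    "INTEGRATIONS.md", "TESTING.md", "CONCERNS.md",
)


def check_output_contract(docs_dir):
    """Return a list of problems; an empty list means these checks passed."""
    docs_dir = Path(docs_dir)
    present = {p.name for p in docs_dir.glob("*.md")} if docs_dir.is_dir() else set()
    problems = []
    for name in REQUIRED_DOCS:
        if name not in present:
            problems.append(f"missing: {name}")
            continue
        text = (docs_dir / name).read_text(encoding="utf-8")
        # Assumption: the evidence list lives under a heading that mentions "Evidence".
        has_evidence = any(
            line.lstrip().startswith("#") and "evidence" in line.lower()
            for line in text.splitlines()
        )
        if not has_evidence:
            problems.append(f"no evidence section: {name}")
    # "Exactly these files": flag any other Markdown file in the directory.
    for extra in sorted(present - set(REQUIRED_DOCS)):
        problems.append(f"unexpected file: {extra}")
    return problems
```

Traceability of individual claims (item 2) still needs a human or agent review; this sketch only covers file presence and the existence of an evidence heading.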
31+
32+
## Workflow
33+
34+
Copy and track this checklist:
35+
36+
```
37+
- [ ] Phase 1: Run scan, read intent documents
38+
- [ ] Phase 2: Investigate each documentation area
39+
- [ ] Phase 3: Populate all seven docs in docs/codebase/
40+
- [ ] Phase 4: Validate docs, present findings, resolve all [ASK USER] items
41+
```
42+
43+
## Focus Area Mode
44+
45+
If the user supplies a focus area (for example: "architecture only" or "testing and concerns"):
46+
47+
1. Always run Phase 1 in full.
48+
2. Fully complete focus-area documents first.
49+
3. For non-focus documents not yet analyzed, keep required sections present and mark unknowns as `[TODO]`.
50+
4. Still run the Phase 4 validation loop on all seven documents before final output.
51+
52+
### Phase 1: Scan and Read Intent
53+
54+
1. Run the scan script from the target project root:
55+
```bash
56+
python3 "$SKILL_ROOT/scripts/scan.py" --output docs/codebase/.codebase-scan.txt
57+
```
58+
Where `$SKILL_ROOT` is the absolute path to the skill folder. Works on Windows, macOS, and Linux.
59+
60+
**Quick start:** If you have the path inline:
61+
```bash
62+
python3 /absolute/path/to/skills/acquire-codebase-knowledge/scripts/scan.py --output docs/codebase/.codebase-scan.txt
63+
```
64+
65+
2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them.
66+
3. Summarise the stated project intent before reading any source code.
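Step 2 can be approximated with a short file search. This is a sketch under assumptions, not the behavior of `scan.py`: the function name `find_intent_docs`, the list of document suffixes, and the skipped directories are all illustrative choices. It matches the keywords as whole words in the file name so that names like `inspector.py` are not mistaken for a `SPEC` file.

```python
import re
from pathlib import Path

# Keywords from step 2 above.
INTENT_KEYWORDS = {"PRD", "TRD", "README", "ROADMAP", "SPEC", "DESIGN"}
# Assumptions: which suffixes count as documents, and which directories to skip.
DOC_SUFFIXES = {"", ".md", ".markdown", ".txt", ".rst", ".adoc"}
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}


def find_intent_docs(root):
    """Return sorted relative paths of likely intent documents under root."""
    root = Path(root)
    matches = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if not path.is_file() or any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.suffix.lower() not in DOC_SUFFIXES:
            continue
        # Split the stem on non-alphanumerics so keywords match whole words only.
        words = set(re.split(r"[^A-Za-z0-9]+", path.stem.upper()))
        if words & INTENT_KEYWORDS:
            matches.append(rel.as_posix())
    return sorted(matches)
```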
67+
68+
### Phase 2: Investigate
69+
70+
Use the scan output to answer questions for each of the seven templates. Load [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) for the full per-template question list.
71+
72+
If the stack is ambiguous (multiple manifest files, unfamiliar file types, no `package.json`), load [`references/stack-detection.md`](references/stack-detection.md).
73+
74+
### Phase 3: Populate Templates
75+
76+
Copy each template from `assets/templates/` into `docs/codebase/`. Fill in this order:
77+
78+
1. [STACK.md](assets/templates/STACK.md) — language, runtime, frameworks, all dependencies
79+
2. [STRUCTURE.md](assets/templates/STRUCTURE.md) — directory layout, entry points, key files
80+
3. [ARCHITECTURE.md](assets/templates/ARCHITECTURE.md) — layers, patterns, data flow
81+
4. [CONVENTIONS.md](assets/templates/CONVENTIONS.md) — naming, formatting, error handling, imports
82+
5. [INTEGRATIONS.md](assets/templates/INTEGRATIONS.md) — external APIs, databases, auth, monitoring
83+
6. [TESTING.md](assets/templates/TESTING.md) — frameworks, file organization, mocking strategy
84+
7. [CONCERNS.md](assets/templates/CONCERNS.md) — tech debt, bugs, security risks, perf bottlenecks
85+
86+
Use `[TODO]` for anything that cannot be determined from code. Use `[ASK USER]` where the right answer requires team intent.
87+
88+
### Phase 4: Validate, Repair, Verify
89+
90+
Run this mandatory validation loop before finalizing:
91+
92+
1. Validate each doc against `references/inquiry-checkpoints.md`.
93+
2. For each non-trivial claim, confirm at least one evidence reference exists.
94+
3. If any required section is missing or unsupported:
95+
- Fix the document.
96+
- Re-run validation.
97+
4. Repeat until all seven docs pass.
98+
99+
Then present a summary of all seven documents, list every `[ASK USER]` item as a numbered question, and highlight any Intent vs. Reality divergences from Phase 1.
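Gathering the `[ASK USER]` items into one numbered list could look like the sketch below. The helper names `collect_ask_user_items` and `format_questions` are hypothetical, and the parsing is deliberately naive: it takes whatever follows the marker on the same line and skips heading lines, because the CONCERNS template uses the marker in a section title.

```python
from pathlib import Path

ASK_MARKER = "[ASK USER]"


def collect_ask_user_items(docs_dir):
    """Return (file name, question) pairs, ordered by file name then line order."""
    items = []
    for doc in sorted(Path(docs_dir).glob("*.md")):
        for line in doc.read_text(encoding="utf-8").splitlines():
            # Skip headings such as "### 6) `[ASK USER]` Questions".
            if ASK_MARKER not in line or line.lstrip().startswith("#"):
                continue
            question = line.split(ASK_MARKER, 1)[1].strip()
            if question:
                items.append((doc.name, question))
    return items


def format_questions(items):
    """Render the collected items as one numbered list for the final response."""
    return "\n".join(
        f"{number}. [{name}] {question}"
        for number, (name, question) in enumerate(items, start=1)
    )
```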
100+
101+
Validation pass criteria:
102+
103+
- No unsupported claims.
104+
- No empty required sections.
105+
- Unknowns use `[TODO]` rather than assumptions.
106+
- Team-intent gaps are explicitly marked `[ASK USER]`.
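The "no empty required sections" criterion lends itself to a simple structural check. The sketch below assumes the heading layout used by the bundled templates (`## Core Sections (Required)` followed by `###` subsections); the function name `find_empty_core_sections` is hypothetical, and headings that appear inside fenced code blocks are not handled.

```python
def find_empty_core_sections(markdown_text):
    """Return titles of '###' sections under 'Core Sections' that have no body."""
    empty = []
    in_core = False    # True while inside "## Core Sections (Required)"
    current = None     # title of the '###' section being read
    has_body = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # A new '##' heading closes whatever '###' section was open.
            if in_core and current is not None and not has_body:
                empty.append(current)
            in_core = stripped.startswith("## Core Sections")
            current = None
        elif stripped.startswith("### ") and in_core:
            if current is not None and not has_body:
                empty.append(current)
            current = stripped[4:].strip()
            has_body = False
        elif stripped and current is not None:
            has_body = True
    if in_core and current is not None and not has_body:
        empty.append(current)
    return empty
```

A section that contains only `[TODO]` counts as non-empty here, which matches the rule that unknowns are marked rather than omitted.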
107+
108+
---
109+
110+
## Gotchas
111+
112+
**Monorepos:** Root `package.json` may have no source — check for `workspaces`, `packages/`, or `apps/` directories. Each workspace may have independent dependencies and conventions. Map each sub-package separately.
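One way to enumerate sub-packages in a JavaScript monorepo is sketched below. It is an illustration only: the function name `list_workspace_packages` is hypothetical, it handles the list form of `workspaces` and the object form with a `packages` key, and it does not handle negated patterns. When no `workspaces` field is declared, it falls back to the `packages/` and `apps/` directories mentioned above.

```python
import json
from pathlib import Path


def list_workspace_packages(root):
    """Return sorted relative paths of directories that hold their own package.json."""
    root = Path(root)
    manifest = root / "package.json"
    patterns = []
    if manifest.is_file():
        workspaces = json.loads(manifest.read_text(encoding="utf-8")).get("workspaces", [])
        # The field is usually a list; the object form keeps globs under "packages".
        if isinstance(workspaces, dict):
            workspaces = workspaces.get("packages", [])
        patterns = list(workspaces)
    if not patterns:
        # Fallback: the conventional directories named in the gotcha above.
        patterns = ["packages/*", "apps/*"]
    found = set()
    for pattern in patterns:
        for candidate in root.glob(pattern):
            if (candidate / "package.json").is_file():
                found.add(candidate.relative_to(root).as_posix())
    return sorted(found)
```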
113+
114+
**Outdated README:** README often describes intended architecture, not the current one. Cross-reference with actual file structure before treating any README claim as fact.
115+
116+
**TypeScript path aliases:** `tsconfig.json` `paths` config means imports like `@/foo` don't map directly to the filesystem. Map aliases to real paths before documenting structure.
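As an illustration of the mapping step, the sketch below resolves an import specifier against an already-parsed `paths` mapping. The function name `resolve_alias` is hypothetical. It takes a dictionary rather than reading `tsconfig.json` directly, because that file may contain comments that `json.loads` rejects. Trying longer literal prefixes first is an ordering chosen for this sketch, and the result is a list of candidate paths without file-extension probing, so it is not a full reimplementation of the compiler's module resolution.

```python
import posixpath


def resolve_alias(specifier, paths, base_url="."):
    """Return candidate paths for an import specifier, or [] if no alias matches.

    `paths` is the parsed `compilerOptions.paths` mapping, for example
    {"@/*": ["src/*"]}.
    """
    # Try patterns with the longest literal prefix first.
    ordered = sorted(paths, key=lambda p: len(p.split("*", 1)[0]), reverse=True)
    for pattern in ordered:
        if "*" in pattern:
            prefix, suffix = pattern.split("*", 1)
            fits = (
                len(specifier) >= len(prefix) + len(suffix)
                and specifier.startswith(prefix)
                and specifier.endswith(suffix)
            )
            if not fits:
                continue
            # The text matched by "*" is substituted into each target.
            captured = specifier[len(prefix):len(specifier) - len(suffix)]
        elif specifier == pattern:
            captured = ""
        else:
            continue
        return [
            posixpath.normpath(posixpath.join(base_url, target.replace("*", captured)))
            for target in paths[pattern]
        ]
    return []
```

For example, with `{"@/*": ["src/*"]}` the specifier `@/foo` maps to the candidate `src/foo`, which can then be checked against the real directory tree before it is written into `STRUCTURE.md`.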
117+
118+
**Generated/compiled output:** Never document patterns from `dist/`, `build/`, `generated/`, `.next/`, `out/`, or `__pycache__/`. These are artefacts — document source conventions only.
119+
120+
**`.env.example` reveals required config:** Secrets are never committed. Read `.env.example`, `.env.template`, or `.env.sample` to discover required environment variables.
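Extracting the variable names from such a file takes only a few lines. The sketch below assumes the common `NAME=value` layout with optional `export` prefixes and `#` comments; the function name `required_env_vars` is hypothetical. It returns names only and never values, so placeholder secrets are not copied into the docs.

```python
import re

# Matches "NAME=" or "export NAME=" at the start of a line; comment lines do not match.
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def required_env_vars(env_text):
    """Return variable names in first-seen order, without duplicates or values."""
    names = []
    for line in env_text.splitlines():
        match = ENV_LINE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```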
121+
122+
**`devDependencies` ≠ production stack:** Only `dependencies` (or equivalent, e.g. `[tool.poetry.dependencies]`) runs in production. Document linters, formatters, and test frameworks separately as dev tooling.
123+
124+
**Test TODOs ≠ production debt:** TODOs inside `test/`, `tests/`, `__tests__/`, or `spec/` are coverage gaps, not production technical debt. Separate them in `CONCERNS.md`.
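The separation can be done by path alone. In this sketch the function name `split_todos` is hypothetical and the input is assumed to be a list of file paths where TODO comments were found (for example from a text search); a file counts as test code when any of its parent directories is one of the four names listed above.

```python
from pathlib import PurePosixPath

# Directory names taken from the gotcha above.
TEST_DIRS = {"test", "tests", "__tests__", "spec"}


def split_todos(todo_paths):
    """Group paths containing TODOs into 'test' and 'production' buckets."""
    buckets = {"test": [], "production": []}
    for raw in todo_paths:
        # Normalize Windows separators so both path styles are handled.
        parts = PurePosixPath(raw.replace("\\", "/")).parts
        # Only directory components count; a file named spec.ts is not a test dir.
        in_test = any(part in TEST_DIRS for part in parts[:-1])
        buckets["test" if in_test else "production"].append(raw)
    return buckets
```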
125+
126+
**High-churn files = fragile areas:** Files appearing most in recent git history have the highest modification rate and likely hidden complexity. Always note them in `CONCERNS.md`.
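A churn ranking can be derived from `git log --name-only`. The sketch below is one possible approach and may differ from what `scan.py` reports: the function names and the default window of 200 commits are assumptions. Counting is split from the `git` call so the counting logic can be exercised without a repository.

```python
import subprocess
from collections import Counter


def count_churn(log_output):
    """Count how often each path appears in `git log --name-only` output."""
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def recent_churn(repo_dir=".", max_commits=200, top=10):
    """Return the `top` most frequently changed files in the last `max_commits` commits."""
    result = subprocess.run(
        # An empty --pretty format leaves only the file names in the output.
        ["git", "log", f"--max-count={max_commits}", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return count_churn(result.stdout).most_common(top)
```

The resulting `(path, count)` pairs can serve as the "Churn signal" column in the Fragile/High-Churn Areas table of `CONCERNS.md`.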
127+
128+
---
129+
130+
## Anti-Patterns
131+
132+
| ❌ Don't | ✅ Do instead |
133+
|---------|--------------|
134+
| "Uses Clean Architecture with Domain/Data layers." (when no such directories exist) | State only what directory structure actually shows. |
135+
| "This is a Next.js project." (without checking `package.json`) | Check `dependencies` first. State what's actually there. |
136+
| Guess the database from a variable name like `dbUrl` | Check manifest for `pg`, `mysql2`, `mongoose`, `prisma`, etc. |
137+
| Document `dist/` or `build/` naming patterns as conventions | Source files only. |
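The manifest check in the third row can be sketched as follows. The package names are the ones listed in the table; the descriptive labels and the function name `detect_db_packages` are illustrative assumptions, and the set is not exhaustive. The result records which manifest section each package came from, so it can be cited as evidence.

```python
import json

# Package names from the table above; labels are illustrative, not exhaustive.
DB_PACKAGES = {
    "pg": "PostgreSQL client",
    "mysql2": "MySQL client",
    "mongoose": "MongoDB ODM",
    "prisma": "Prisma ORM tooling",
}


def detect_db_packages(package_json_text):
    """Return {package: (label, manifest section)} for known database packages."""
    manifest = json.loads(package_json_text)
    found = {}
    for section in ("dependencies", "devDependencies"):
        for name in manifest.get(section, {}):
            if name in DB_PACKAGES and name not in found:
                found[name] = (DB_PACKAGES[name], section)
    return found
```

An empty result means only that none of these specific packages is declared; in that case the database entry stays `[TODO]` rather than being guessed.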
138+
139+
---
140+
141+
## Enhanced Scan Output Sections
142+
143+
The `scan.py` script now produces the following sections in addition to the original output:
144+
145+
- **CODE METRICS** — Total files, lines of code by language, largest files (complexity signals)
146+
- **CI/CD PIPELINES** — Detected GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
147+
- **CONTAINERS & ORCHESTRATION** — Docker, Docker Compose, Kubernetes, Vagrant configs
148+
- **SECURITY & COMPLIANCE** — Snyk, Dependabot, SECURITY.md, SBOM, security policies
149+
- **PERFORMANCE & TESTING** — Benchmark configs, profiling markers, load testing tools
150+
151+
Use these sections during Phase 2 to inform investigation questions and identify tool-specific patterns.
152+
153+
---
154+
155+
## Bundled Assets
156+
157+
| Asset | When to load |
158+
|-------|-------------|
159+
| [`scripts/scan.py`](scripts/scan.py) | Phase 1 — run first, before reading any code (Python 3.8+ required) |
160+
161+
| [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) | Phase 2 — load for per-template investigation questions |
162+
| [`references/stack-detection.md`](references/stack-detection.md) | Phase 2 — only if stack is ambiguous |
163+
| [`assets/templates/STACK.md`](assets/templates/STACK.md) | Phase 3 step 1 |
164+
| [`assets/templates/STRUCTURE.md`](assets/templates/STRUCTURE.md) | Phase 3 step 2 |
165+
| [`assets/templates/ARCHITECTURE.md`](assets/templates/ARCHITECTURE.md) | Phase 3 step 3 |
166+
| [`assets/templates/CONVENTIONS.md`](assets/templates/CONVENTIONS.md) | Phase 3 step 4 |
167+
| [`assets/templates/INTEGRATIONS.md`](assets/templates/INTEGRATIONS.md) | Phase 3 step 5 |
168+
| [`assets/templates/TESTING.md`](assets/templates/TESTING.md) | Phase 3 step 6 |
169+
| [`assets/templates/CONCERNS.md`](assets/templates/CONCERNS.md) | Phase 3 step 7 |
170+
171+
Template usage mode:
172+
173+
- Default mode: complete only the "Core Sections (Required)" in each template.
174+
- Extended mode: add optional sections only when the repo complexity justifies them.
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
# Architecture
2+
3+
## Core Sections (Required)
4+
5+
### 1) Architectural Style
6+
7+
- Primary style: [layered/feature/event-driven/other]
8+
- Why this classification: [short evidence-backed rationale]
9+
- Primary constraints: [2-3 constraints that shape design]
10+
11+
### 2) System Flow
12+
13+
```text
14+
[entry] -> [processing] -> [domain logic] -> [data/integration] -> [response/output]
15+
```
16+
17+
Describe the flow in 4-6 steps using file-backed evidence.
18+
19+
### 3) Layer/Module Responsibilities
20+
21+
| Layer or module | Owns | Must not own | Evidence |
22+
|-----------------|------|--------------|----------|
23+
| [name] | [responsibility] | [non-responsibility] | [file] |
24+
25+
### 4) Reused Patterns
26+
27+
| Pattern | Where found | Why it exists |
28+
|---------|-------------|---------------|
29+
| [singleton/repository/adapter/etc] | [path] | [reason] |
30+
31+
### 5) Known Architectural Risks
32+
33+
- [Risk 1 + impact]
34+
- [Risk 2 + impact]
35+
36+
### 6) Evidence
37+
38+
- [path/to/entrypoint]
39+
- [path/to/main-layer-files]
40+
- [path/to/data-or-integration-layer]
41+
42+
## Extended Sections (Optional)
43+
44+
Add only when needed:
45+
46+
- Startup or initialization order details
47+
- Async/event topology diagrams
48+
- Anti-pattern catalog with refactoring paths
49+
- Failure-mode analysis and resilience posture
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Codebase Concerns
2+
3+
## Core Sections (Required)
4+
5+
### 1) Top Risks (Prioritized)
6+
7+
| Severity | Concern | Evidence | Impact | Suggested action |
8+
|----------|---------|----------|--------|------------------|
9+
| [high/med/low] | [issue] | [file or scan output] | [impact] | [next action] |
10+
11+
### 2) Technical Debt
12+
13+
List the most important debt items only.
14+
15+
| Debt item | Why it exists | Where | Risk if ignored | Suggested fix |
16+
|-----------|---------------|-------|-----------------|---------------|
17+
| [item] | [reason] | [path] | [risk] | [fix] |
18+
19+
### 3) Security Concerns
20+
21+
| Risk | OWASP category (if applicable) | Evidence | Current mitigation | Gap |
22+
|------|--------------------------------|----------|--------------------|-----|
23+
| [risk] | [A01/A03/etc or N/A] | [path] | [what exists] | [what is missing] |
24+
25+
### 4) Performance and Scaling Concerns
26+
27+
| Concern | Evidence | Current symptom | Scaling risk | Suggested improvement |
28+
|---------|----------|-----------------|-------------|-----------------------|
29+
| [issue] | [path/metric] | [symptom] | [risk] | [action] |
30+
31+
### 5) Fragile/High-Churn Areas
32+
33+
| Area | Why fragile | Churn signal | Safe change strategy |
34+
|------|-------------|-------------|----------------------|
35+
| [path] | [reason] | [recent churn evidence] | [approach] |
36+
37+
### 6) `[ASK USER]` Questions
38+
39+
Add unresolved intent-dependent questions as a numbered list.
40+
41+
1. [ASK USER] [question]
42+
43+
### 7) Evidence
44+
45+
- [scan output section reference]
46+
- [path/to/code-file]
47+
- [path/to/config-or-history-evidence]
48+
49+
## Extended Sections (Optional)
50+
51+
Add only when needed:
52+
53+
- Full bug inventory
54+
- Component-level remediation roadmap
55+
- Cost/effort estimates by concern
56+
- Dependency-risk and ownership mapping
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
# Coding Conventions
2+
3+
## Core Sections (Required)
4+
5+
### 1) Naming Rules
6+
7+
| Item | Rule | Example | Evidence |
8+
|------|------|---------|----------|
9+
| Files | [RULE] | [EXAMPLE] | [FILE] |
10+
| Functions/methods | [RULE] | [EXAMPLE] | [FILE] |
11+
| Types/interfaces | [RULE] | [EXAMPLE] | [FILE] |
12+
| Constants/env vars | [RULE] | [EXAMPLE] | [FILE] |
13+
14+
### 2) Formatting and Linting
15+
16+
- Formatter: [TOOL + CONFIG FILE]
17+
- Linter: [TOOL + CONFIG FILE]
18+
- Most relevant enforced rules: [RULE_1], [RULE_2], [RULE_3]
19+
- Run commands: [COMMANDS]
20+
21+
### 3) Import and Module Conventions
22+
23+
- Import grouping/order: [RULE]
24+
- Alias vs relative import policy: [RULE]
25+
- Public exports/barrel policy: [RULE]
26+
27+
### 4) Error and Logging Conventions
28+
29+
- Error strategy by layer: [SHORT SUMMARY]
30+
- Logging style and required context fields: [SUMMARY]
31+
- Sensitive-data redaction rules: [SUMMARY]
32+
33+
### 5) Testing Conventions
34+
35+
- Test file naming/location rule: [RULE]
36+
- Mocking strategy norm: [RULE]
37+
- Coverage expectation: [RULE or TODO]
38+
39+
### 6) Evidence
40+
41+
- [path/to/lint-config]
42+
- [path/to/format-config]
43+
- [path/to/representative-source-file]
44+
45+
## Extended Sections (Optional)
46+
47+
Add only for large or inconsistent codebases:
48+
49+
- Layer-specific error handling matrix
50+
- Language-specific strictness options
51+
- Repo-specific commit/branching conventions
52+
- Known convention violations to clean up
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
# External Integrations
2+
3+
## Core Sections (Required)
4+
5+
### 1) Integration Inventory
6+
7+
| System | Type (API/DB/Queue/etc) | Purpose | Auth model | Criticality | Evidence |
8+
|--------|---------------------------|---------|------------|-------------|----------|
9+
| [name] | [type] | [purpose] | [auth] | [high/med/low] | [file] |
10+
11+
### 2) Data Stores
12+
13+
| Store | Role | Access layer | Key risk | Evidence |
14+
|-------|------|--------------|----------|----------|
15+
| [db/cache/etc] | [role] | [module] | [risk] | [file] |
16+
17+
### 3) Secrets and Credentials Handling
18+
19+
- Credential sources: [env/secrets manager/config]
20+
- Hardcoding checks: [result]
21+
- Rotation or lifecycle notes: [known/unknown]
22+
23+
### 4) Reliability and Failure Behavior
24+
25+
- Retry/backoff behavior: [implemented/none/partial]
26+
- Timeout policy: [where configured]
27+
- Circuit-breaker or fallback behavior: [if any]
28+
29+
### 5) Observability for Integrations
30+
31+
- Logging around external calls: [yes/no + where]
32+
- Metrics/tracing coverage: [yes/no + where]
33+
- Missing visibility gaps: [list]
34+
35+
### 6) Evidence
36+
37+
- [path/to/integration-wrapper]
38+
- [path/to/config-or-env-template]
39+
- [path/to/monitoring-or-logging-config]
40+
41+
## Extended Sections (Optional)
42+
43+
Add only when needed:
44+
45+
- Endpoint-by-endpoint catalog
46+
- Auth flow sequence diagrams
47+
- SLA/SLO per integration
48+
- Region/failover topology notes
