oceanusXXD/code2skill

code2skill

code2skill turns a Python repository into instruction files for coding assistants.

It scans source code and configuration, writes a .code2skill/ bundle, generates focused Skill documents, and publishes them to Codex, Claude Code, Cursor, GitHub Copilot, or Windsurf. The files stay in the repository, so maintainers can review them, refresh them in CI, and update them when code changes.

Use it when a Python project needs coding assistants to follow the current module boundaries, workflows, API contracts, and maintenance rules.

What This Repository Can Do

  • Analyze a Python repository with AST semantic extraction, import graph checks, call/type/data-flow evidence, config extraction, and file-role inference.
  • Write a .code2skill/ bundle with a project summary, references, a Skill plan, generated Skills, a report, and incremental state.
  • Estimate model cost and affected Skills before generation.
  • Generate Skill Markdown from repository evidence using OpenAI Responses API, OpenAI-compatible Responses endpoints, Claude, or Qwen.
  • Publish generated Skills into AGENTS.md, CLAUDE.md, .cursor/rules/*, .github/copilot-instructions.md, and .windsurfrules.
  • Refresh outputs in CI with full or incremental mode.
  • Validate the bundle and target files with doctor.
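The import graph step in the list above can be pictured with the standard library alone. The following is a minimal sketch, not code2skill's scanner: `extract_imports` is a hypothetical helper that collects the modules a source file imports, which is the raw material for an import graph.

```python
import ast


def extract_imports(source: str) -> set[str]:
    """Return the dotted module names imported by a piece of Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` contributes "a.b"
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` contributes "a.b"; relative imports are skipped
            modules.add(node.module)
    return modules


example = """
import os
from pathlib import Path
from . import sibling
import json as j
"""
print(sorted(extract_imports(example)))  # ['json', 'os', 'pathlib']
```

A real scanner would also resolve relative imports against the package layout; this sketch leaves them out to stay short.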

Who It Is For

| User | Need | What code2skill provides |
| --- | --- | --- |
| Python maintainers | Assistants should follow local architecture and naming patterns | Source-based Skill files and readiness checks |
| DevEx and platform teams | Several services need the same assistant setup process | CLI, Python API, CI refresh, and shared output layout |
| Open-source maintainers | Contributors need public project instructions instead of untracked notes | Committed files that can be reviewed with the rest of the repo |
| Tooling evaluators | One repository needs to work with several coding assistants | One generated Skill layer adapted into multiple target formats |

Common Scenarios

| Scenario | When to use it | Expected result |
| --- | --- | --- |
| First assistant setup | A repo starts using Codex, Cursor, Claude Code, Copilot, or Windsurf | scan, adapt, and doctor produce a ready target file |
| Pull request refresh | Code changes may make previous instructions stale | ci --mode auto reports changed files, affected files, and affected Skills |
| Multi-tool setup | A team uses more than one coding assistant | adapt --target all writes consistent target files |
| Platform automation | A DevEx team runs the workflow across many Python services | Python API returns structured results and readiness status |
| Contributor onboarding | New contributors need project-specific implementation rules | Generated Skills and docs describe the repo's working contracts |

Architecture

[Figure: code2skill pipeline]

The final product is a repository-owned Skill layer, not a chat transcript. Structural artifacts stay available for review, cost estimation, CI refresh, and readiness checks.

Example Generated Skills

Generated Skills are source-cited Markdown files under .code2skill/skills/*.md. These shortened examples show the kind of output code2skill is designed to produce from repository evidence.

Repository analysis pipeline
# Repository Analysis Pipeline

## Overview
Use this Skill when changing how code2skill scans a repository, builds evidence, or writes structural artifacts.

## Core Rules
- Keep `execute_repository(...)` as the orchestration entrypoint. Source: src/code2skill/core.py
- Resolve dependencies through `ImportGraph` before ranking files or computing affected files. Source: src/code2skill/import_graph.py, src/code2skill/impact.py
- Treat `project-summary.md`, `skill-blueprint.json`, `report.json`, and `state/analysis-state.json` as review and CI artifacts. Source: src/code2skill/core.py

## Common Flows
1. Scan candidates and extract source/config summaries.
2. Build import graph, PageRank, evidence coverage, and blueprint.
3. Render summary/reference/report artifacts before optional Skill generation.

Assistant target publishing
# Assistant Target Publishing

## Overview
Use this Skill when publishing generated Skills into Codex, Claude Code, Cursor, GitHub Copilot, or Windsurf target files.

## Core Rules
- Use `adapt` for target publishing; generated target content must stay inside managed blocks or manifest-tracked files. Source: src/code2skill/adapt.py, src/code2skill/capabilities/adapt/targets.py
- Run `doctor` after adaptation to verify the bundle, Skill plan, generated Skill files, state, and selected target output. Source: src/code2skill/capabilities/adoption_service.py
- Preserve hand-written target-file content outside the managed block. Source: src/code2skill/capabilities/output_bundle_service.py

## Common Flows
1. Generate or refresh `.code2skill/skills/*.md`.
2. Run `code2skill adapt . --target <tool>`.
3. Run `code2skill doctor . --target <tool>`.

Benchmark

code2skill is evaluated on structural evidence extraction before any LLM call. The benchmark compares two simple baselines against the semantic scanner used by the Skill generation pipeline.

Structural evidence benchmark

| Method | Gold evidence recall |
| --- | --- |
| Path-only baseline | 0.044 |
| AST symbols baseline | 0.356 |
| code2skill semantic scanner | 1.000 |

The gold set covers route decorators, service calls, type references, data-flow edges, dynamic imports, re-exported symbol dependencies, raised exceptions, main guards, and internal dependency edges. Reproduce it with:

python benchmarks/evaluate_structural_evidence.py

Details: Benchmark Notes, result JSON.
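For readers unfamiliar with the metric, recall here means the share of gold evidence items that a method recovers. The sketch below is an illustration only: the function name and the evidence identifiers are hypothetical, and the actual benchmark script may compute its score differently.

```python
def evidence_recall(gold: set[str], extracted: set[str]) -> float:
    """Fraction of gold evidence items that the extractor also found."""
    if not gold:
        raise ValueError("gold set must not be empty")
    return len(gold & extracted) / len(gold)


# Hypothetical evidence identifiers, for illustration only.
gold = {"route:/users", "call:UserService.create", "raises:ValueError", "edge:api->service"}
extracted = {"route:/users", "raises:ValueError", "symbol:helper"}
print(evidence_recall(gold, extracted))  # 2 of 4 gold items found -> 0.5
```

Items an extractor reports that are not in the gold set, such as `symbol:helper` above, do not affect recall.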

Install

Requires Python 3.10 or newer.

python -m pip install code2skill
code2skill --version
code2skill --help

The expected CLI commands are scan, estimate, ci, adapt, and doctor.

If the console script is not on PATH, use the module entry point:

python -m code2skill --help

First Run

Run a no-LLM structural check first. This verifies that the package can read the repository and write the local artifact bundle.

code2skill scan . --structure-only

Preview model cost and incremental impact:

code2skill estimate .

Generate Skills with a model provider:

export QWEN_API_KEY=...
code2skill scan . --llm qwen --model qwen-plus-latest

Publish the generated Skill layer to an AI tool:

code2skill adapt . --target codex

Check that the bundle and target file are ready to use:

code2skill doctor . --target codex

Review and commit the files that matter for your workflow:

  • .code2skill/adoption-guide.md
  • .code2skill/skills/index.md
  • .code2skill/skills/*.md
  • adapted target files such as AGENTS.md, CLAUDE.md, .cursor/rules/*, .github/copilot-instructions.md, or .windsurfrules

Use .code2skill/report.json to inspect selected files, execution mode, changed files, affected Skills, cost estimates, and generated outputs.
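A quick way to get oriented in the report is to list its top-level sections before drilling in. This is a minimal sketch that assumes only that report.json holds a JSON object; it deliberately makes no assumption about field names, and `report_sections` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def report_sections(report_path: Path) -> list[str]:
    """Return the sorted top-level keys of a JSON report file."""
    data = json.loads(report_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)
```

After a scan, calling `report_sections(Path(".code2skill/report.json"))` would show which sections the current version actually writes.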

Model Configuration

Common environment variables:

export CODE2SKILL_LLM=qwen
export CODE2SKILL_MODEL=qwen-plus-latest
export CODE2SKILL_OUTPUT_DIR=.code2skill
export CODE2SKILL_MAX_SKILLS=6
export CODE2SKILL_BASE_REF=origin/main

Provider keys:

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export QWEN_API_KEY=...

OpenAI Responses API:

export CODE2SKILL_LLM=openai
export CODE2SKILL_MODEL=gpt-4o-mini
export CODE2SKILL_OPENAI_API_KEY=...
code2skill scan .

OpenAI-compatible Responses endpoint:

export CODE2SKILL_LLM=openai
export CODE2SKILL_MODEL=<responses-compatible-model>
export CODE2SKILL_OPENAI_API_KEY=...
export CODE2SKILL_OPENAI_BASE_URL=https://example.com/v1
code2skill scan .

CODE2SKILL_OPENAI_BASE_URL may point either to a /v1 base URL or directly to a /responses endpoint.
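One way to picture the two accepted forms is a small normalization step that maps both to the same endpoint. This is a sketch under that assumption; `responses_url` is a hypothetical helper and code2skill's own handling may differ.

```python
def responses_url(base_url: str) -> str:
    """Resolve a /v1 base URL or a full endpoint URL to a /responses endpoint."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/responses"):
        # Already points at the endpoint; use it unchanged.
        return trimmed
    return f"{trimmed}/responses"


print(responses_url("https://example.com/v1"))            # https://example.com/v1/responses
print(responses_url("https://example.com/v1/responses"))  # https://example.com/v1/responses
```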

Commands

| Command | Calls LLM | Writes files | Primary purpose |
| --- | --- | --- | --- |
| scan | Yes, unless --structure-only | Yes | Full local analysis and Skill generation |
| estimate | No | report.json only | Cost and impact preview |
| ci | Yes, unless --structure-only | Yes | Automation-friendly full or incremental refresh |
| adapt | No | Yes | Publish generated Skills to target AI tool files |
| doctor | No | No | Validate bundle, Skill plan, state, target files, and readiness |

Output Layout

The default artifact directory is .code2skill/.

| Path | Purpose |
| --- | --- |
| adoption-guide.md | Repository-specific adoption checklist and next workflow |
| project-summary.md | Human-readable repository summary with evidence coverage and import graph signals |
| skill-blueprint.json | Structural repository blueprint with evidence counts and dependency graph stats |
| skill-plan.json | LLM-planned Skill inventory |
| references/*.md | Architecture, style, workflow, and API references |
| skills/index.md | Generated Skill index |
| skills/*.md | Generated AI working instructions |
| report.json | Execution metrics, cost estimates, changed files, affected Skills, and artifact lists |
| state/analysis-state.json | Incremental CI cache |

Target Tools

| Target | Command | Output |
| --- | --- | --- |
| Codex | code2skill adapt . --target codex | AGENTS.md |
| Claude Code | code2skill adapt . --target claude | CLAUDE.md |
| Cursor | code2skill adapt . --target cursor | .cursor/rules/*.md plus .cursor/rules/.code2skill-manifest.json |
| GitHub Copilot | code2skill adapt . --target copilot | .github/copilot-instructions.md |
| Windsurf | code2skill adapt . --target windsurf | .windsurfrules |
| All targets | code2skill adapt . --target all | all of the above |

Merge-style targets use a managed block:

<!-- code2skill:start -->
...
<!-- code2skill:end -->

Content outside the managed block is preserved. Cursor uses copied Skill files and a manifest so later runs can remove stale generated files while keeping unmanaged team rules.
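The managed-block behavior described above can be sketched as a marker-based merge. This is an illustration of the idea, not the code in code2skill: the markers come from this README, while `merge_managed_block` and its append-when-missing behavior are assumptions.

```python
START = "<!-- code2skill:start -->"
END = "<!-- code2skill:end -->"


def merge_managed_block(existing: str, generated: str) -> str:
    """Replace the managed block in `existing`, or append one if absent."""
    block = f"{START}\n{generated.strip()}\n{END}"
    start = existing.find(START)
    end = existing.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete managed block yet: keep the file as-is and append one.
        prefix = existing.rstrip("\n")
        return f"{prefix}\n\n{block}\n" if prefix else f"{block}\n"
    # Keep everything before the start marker and after the end marker.
    return existing[:start] + block + existing[end + len(END):]


target = f"# Team notes\n\n{START}\nold rules\n{END}\n\nHand-written footer\n"
print(merge_managed_block(target, "new rules"))
```

Running the merge twice with the same generated text leaves the file unchanged, which keeps repeated `adapt` runs from producing noisy diffs in a setup like this.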

CI Refresh

After the first bundle exists, use ci --mode auto to reuse state and regenerate only affected Skill outputs when code changes.

code2skill ci . --mode auto --base-ref origin/main --head-ref HEAD
code2skill adapt . --target codex
code2skill doctor . --target codex

The first CI run usually falls back to full mode because no state exists yet. Later runs can use .code2skill/state/analysis-state.json and skill-plan.json to decide whether incremental refresh is safe.
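The fallback can be pictured as a check for the two cached artifacts named above. This is a deliberately simplified sketch: `choose_mode` is a hypothetical helper that only tests for file existence, whereas the real `--mode auto` decision presumably also inspects the contents of those files.

```python
from pathlib import Path


def choose_mode(output_dir: Path) -> str:
    """Return "incremental" only when both cached artifacts exist, else "full"."""
    state = output_dir / "state" / "analysis-state.json"
    plan = output_dir / "skill-plan.json"
    return "incremental" if state.is_file() and plan.is_file() else "full"


print(choose_mode(Path(".code2skill")))
```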

Python API

The package root exports the supported high-level API:

from pathlib import Path

from code2skill import adapt_repository, doctor, estimate, scan

repo = Path(".")

preview = estimate(repo)
result = scan(
    repo,
    llm_provider="qwen",
    llm_model="qwen-plus-latest",
    max_skills=6,
)
written = adapt_repository(repo, target="codex")
readiness = doctor(repo, target="codex")

print(preview.report_path)
print(result.generated_skills)
print(written)
print(readiness.ready, readiness.score)
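In automation, the readiness result is often turned into a process exit code so a CI job fails when the repository is not ready. The sketch below assumes only the `ready` attribute printed above; `ci_exit_code` is a hypothetical helper, and the stand-in objects and their score values are placeholders rather than real `doctor` output.

```python
from types import SimpleNamespace


def ci_exit_code(readiness) -> int:
    """Map a readiness result to a process exit code: 0 if ready, 1 otherwise."""
    return 0 if readiness.ready else 1


# Stand-in objects with the same two attributes the example above prints.
passing = SimpleNamespace(ready=True, score=1.0)
failing = SimpleNamespace(ready=False, score=0.0)
print(ci_exit_code(passing), ci_exit_code(failing))  # 0 1
```

With the package installed, the same function could wrap the real call, for example `sys.exit(ci_exit_code(doctor(repo, target="codex")))`.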

For lower-level automation, use create_scan_config(...) with scan_repository(...), estimate_repository(...), or run_ci_repository(...).

Guarantees

  • Python-first analysis using ast, import graph analysis, file-role inference, and pattern detection.
  • Evidence-first prompts that require source references and keep uncertainty explicit.
  • Outputs written to files instead of kept in chat history.
  • Measurable runs through report.json.
  • Incremental operation through state reuse, diff impact, and affected Skill mapping.
  • Readiness validation through doctor.
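The diff-impact idea in the list above can be sketched as a walk over the reversed import graph: a change to one file affects every file that imports it, directly or transitively. This is an illustration under that assumption, not code2skill's algorithm; `affected_files` and the sample graph are hypothetical.

```python
from collections import deque


def affected_files(imports: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return changed files plus every file that imports them, transitively."""
    # Invert the graph: for each file, which files import it?
    importers: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for target in targets:
            importers.setdefault(target, set()).add(src)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for importer in importers.get(current, ()):
            if importer not in affected:
                affected.add(importer)
                queue.append(importer)
    return affected


# Hypothetical project: cli imports core, core imports utils, docs imports nothing.
graph = {"cli.py": {"core.py"}, "core.py": {"utils.py"}, "utils.py": set(), "docs.py": set()}
print(sorted(affected_files(graph, {"utils.py"})))  # ['cli.py', 'core.py', 'utils.py']
```

Mapping the affected files to Skills would then be a lookup from each file to the Skills that cite it.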

Limitations

  • Optimized for Python repositories.
  • Non-Python code is scanned only as supporting context, not as a first-class analysis target.
  • Output quality still depends on repository clarity and the selected model.
  • The package is in the 0.1.x stage and public behavior may continue to evolve.

License

Apache-2.0. See LICENSE.

About

Turn a codebase into AI-specific skills in one step. A context-extraction CLI tool that supports local runs and automatic CI/CD updates.
