1 | 1 | --- |
2 | 2 | name: sensei |
3 | | -description: "Improve skill frontmatter compliance iteratively using the Ralph loop pattern. **WORKFLOW SKILL**. WHEN: \"run sensei\", \"sensei help\", \"improve skill\", \"fix frontmatter\", \"skill compliance\", \"frontmatter audit\", \"score skill\", \"check skill tokens\". DO NOT USE FOR: writing new skills from scratch (use skill-authoring), general code review. INVOKES: waza CLI, git." |
| 3 | +description: "**WORKFLOW SKILL** — Iteratively improve skill frontmatter compliance using the Ralph loop pattern. WHEN: \"run sensei\", \"sensei help\", \"improve skill\", \"fix frontmatter\", \"skill compliance\", \"frontmatter audit\", \"score skill\", \"check skill tokens\". INVOKES: token counting tools, test runners, git commands. FOR SINGLE OPERATIONS: use token CLI directly for counts/checks." |
4 | 4 | license: MIT |
5 | 5 | metadata: |
6 | 6 | author: Microsoft |
7 | | - version: "1.0.0" |
8 | | -compatibility: |
9 | | - platforms: "copilot-chat" |
| 7 | + version: "1.0.5" |
10 | 8 | --- |
11 | 9 |
12 | 10 | # Sensei |
13 | 11 |
14 | | -Iteratively improve skills until they pass waza check compliance. |
| 12 | +> "A true master teaches not by telling, but by refining." - The Skill Sensei |
15 | 13 |
16 | | -## Usage |
| 14 | +Automates skill frontmatter improvement with the [Ralph loop pattern](https://github.com/soderlind/ralph): sensei iteratively improves skills until they reach Medium-High compliance with passing tests, then checks token usage and prompts the user for the next action. |
17 | 15 |
| 16 | +## Help |
| 17 | + |
| 18 | +When the user says "sensei help" or asks how to use sensei, show this: |
| 19 | + |
| 20 | +``` |
| 21 | +╔══════════════════════════════════════════════════════════════════╗ |
| 22 | +║ SENSEI - Skill Frontmatter Compliance Improver ║ |
| 23 | +╠══════════════════════════════════════════════════════════════════╣ |
| 24 | +║ ║ |
| 25 | +║ USAGE: ║ |
| 26 | +║ Run sensei on <skill-name> # Single skill ║ |
| 27 | +║ Run sensei on <skill-name> --skip-integration # Fast mode ║ |
| 28 | +║ Run sensei on <skill1>, <skill2>, ... # Multiple skills ║ |
| 29 | +║ Run sensei on all Low-adherence skills # Batch by score ║ |
| 30 | +║ Run sensei on all skills # All skills ║ |
| 31 | +║ ║ |
| 32 | +║ EXAMPLES: ║ |
| 33 | +║ Run sensei on appinsights-instrumentation ║ |
| 34 | +║ Run sensei on azure-security --skip-integration ║ |
| 35 | +║ Run sensei on azure-security, azure-observability ║ |
| 36 | +║ Run sensei on all Low-adherence skills ║ |
| 37 | +║ ║ |
| 38 | +║ WHAT IT DOES: ║ |
| 39 | +║ 1. READ - Load skill's SKILL.md, tests, and token count ║ |
| 40 | +║ 2. SCORE - Check compliance (Low/Medium/Medium-High/High) ║ |
| 41 | +║ 3. SCAFFOLD - Create tests from template if missing ║ |
| 42 | +║ 4. IMPROVE - Add WHEN: triggers (cross-model optimized) ║ |
| 43 | +║ 5. TEST - Run tests, fix if needed ║ |
| 44 | +║ 6. REFERENCES- Validate markdown links ║ |
| 45 | +║ 7. TOKENS - Check token budget, gather suggestions ║ |
| 46 | +║ 8. SUMMARY - Show before/after with suggestions ║ |
| 47 | +║ 9. PROMPT - Ask: Commit, Create Issue, or Skip? ║ |
| 48 | +║ 10. REPEAT - Until Medium-High score + tests pass ║ |
| 49 | +║ ║ |
| 50 | +║ TARGET SCORE: Medium-High ║ |
| 51 | +║ ✓ Description > 150 chars, ≤ 60 words ║ |
| 52 | +║ ✓ Has "WHEN:" trigger phrases (preferred) ║ |
| 53 | +║ ✓ No "DO NOT USE FOR:" (unless disambiguation-critical) ║ |
| 54 | +║ ✓ SKILL.md < 500 tokens (soft limit) ║ |
| 55 | +║ ║ |
| 56 | +║ MORE INFO: ║ |
| 57 | +║ See .github/skills/sensei/README.md for full documentation ║ |
| 58 | +║ ║ |
| 59 | +╚══════════════════════════════════════════════════════════════════╝ |
| 60 | +``` |
| 61 | + |
| 62 | +## When to Use |
| 63 | + |
| 64 | +- Improving a skill's frontmatter compliance score |
| 65 | +- Adding trigger phrases (and, where disambiguation requires them, anti-triggers) to skill descriptions |
| 66 | +- Batch-improving multiple skills at once |
| 67 | +- Auditing and fixing Low-adherence skills |
| 68 | + |
| 69 | +## Invocation Modes |
| 70 | + |
| 71 | +### Single Skill |
| 72 | +``` |
| 73 | +Run sensei on azure-deploy |
| 74 | +``` |
| 75 | + |
| 76 | +### Multiple Skills |
| 77 | +``` |
| 78 | +Run sensei on azure-security, azure-observability |
| 79 | +``` |
| 80 | + |
| 81 | +### By Adherence Level |
| 82 | +``` |
| 83 | +Run sensei on all Low-adherence skills |
| 84 | +``` |
| 85 | + |
| 86 | +### All Skills |
18 | 87 | ``` |
19 | | -Run sensei on <skill-name> |
20 | | -Run sensei on <skill1>, <skill2> |
21 | 88 | Run sensei on all skills |
22 | 89 | ``` |
23 | 90 |
| 91 | +### GEPA Mode (Deep Optimization) |
| 92 | +``` |
| 93 | +Run sensei on my-skill --gepa |
| 94 | +Run sensei on my-skill --gepa --skip-integration |
| 95 | +Run sensei on all skills --gepa |
| 96 | +``` |
| 97 | + |
| 98 | +When `--gepa` is used, step 5 (IMPROVE FRONTMATTER) is replaced with GEPA evolutionary optimization. |
| 99 | +Instead of applying template-based improvements, GEPA parses the trigger prompt arrays from the existing |
| 100 | +test harness and combines them with content-quality heuristics to build a fitness function. |
| 101 | +An LLM then proposes and evaluates many candidate improvements automatically. Note: GEPA does not |
| 102 | +execute Jest tests directly; it uses the test data (prompts) as evaluation inputs. |
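The prompt-parsing step described above can be sketched in a few lines. This is a minimal illustration, not the code in `auto_evaluator.py`: it assumes the arrays are declared as `const shouldTriggerPrompts = [...]` (optionally with a `: string[]` annotation), that every prompt is a plain string literal, and that no prompt contains a `]` character. The helper name `extract_prompts` is hypothetical.

```python
import re

# Assumption: arrays look like `const shouldTriggerPrompts: string[] = [ ... ];`.
# The optional `: type` annotation is skipped; the body is captured up to the first `]`.
ARRAY_TEMPLATE = r"{name}\s*(?::[^=]+)?=\s*\[(.*?)\]"

# Matches a "...", '...', or `...` literal, allowing backslash-escaped characters.
STRING_LITERAL = re.compile(r"""(["'`])((?:\\.|(?!\1).)*)\1""", re.DOTALL)


def extract_prompts(ts_source: str, array_name: str) -> list[str]:
    """Hypothetical helper: return the string literals inside `array_name = [ ... ]`."""
    pattern = ARRAY_TEMPLATE.format(name=re.escape(array_name))
    match = re.search(pattern, ts_source, re.DOTALL)
    if match is None:
        return []  # array not found: treat it as "no prompts"
    return [m.group(2) for m in STRING_LITERAL.finditer(match.group(1))]


if __name__ == "__main__":
    sample = 'const shouldTriggerPrompts: string[] = ["deploy my app", "score skill"];'
    print(extract_prompts(sample, "shouldTriggerPrompts"))
```

A regex is used here only to keep the sketch dependency-free; a real evaluator would be more robust with a proper TypeScript parser.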
| 103 | + |
| 104 | +**GEPA score-only mode** (no LLM calls; only evaluates current quality): |
| 105 | +``` |
| 106 | +Run sensei score my-skill |
| 107 | +Run sensei score all skills |
| 108 | +``` |
| 109 | + |
24 | 110 | ## The Ralph Loop |
25 | 111 |
26 | | -For each skill, repeat until compliant (max 5 iterations): |
| 112 | +For each skill, execute this loop until score >= Medium-High AND tests pass: |
| 113 | + |
| 114 | +1. **READ** - Load `plugin/skills/{skill-name}/SKILL.md`, tests, and token count |
| 115 | +2. **SCORE** - Run spec-based compliance check (see [SCORING.md](references/SCORING.md)): |
| 116 | + - Validate `name` per [agentskills.io spec](https://agentskills.io/specification) (no `--`, no leading or trailing `-`, lowercase letters, digits, and hyphens only) |
| 117 | + - Check description length and word count (≤60 words) |
| 118 | + - Check triggers (WHEN: preferred, USE FOR: accepted) |
| 119 | + - Warn on "DO NOT USE FOR:" (risky in multi-skill environments — **exception**: REQUIRED for skills that share trigger overlap with broader skills like `azure-prepare`) |
| 120 | + - Preserve optional spec fields (`license`, `metadata`, `allowed-tools`) if present |
| 121 | +3. **CHECK** - If score >= Medium-High AND tests pass → go to TOKENS step |
| 122 | +4. **SCAFFOLD** - If `tests/{skill-name}/` doesn't exist, create from `tests/_template/` |
| 123 | +5. **IMPROVE FRONTMATTER** - Add WHEN: triggers (stay under 60 words and 1024 chars) |
| 124 | +5b. **IMPROVE WITH GEPA** (when `--gepa` flag is set) — Replaces step 5 (IMPROVE FRONTMATTER) with automated optimization; step 6 (IMPROVE TESTS) still runs normally: |
| 125 | + - Auto-discovers `tests/{skill-name}/triggers.test.ts` and extracts prompt arrays |
| 126 | + - Builds a GEPA evaluator scoring content quality + trigger accuracy based on those trigger prompt arrays (not Jest test pass/fail results) |
| 127 | + - Runs `python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize --skill {skill-name} --skills-dir plugin/skills --tests-dir tests` |
| 128 | + - Shows diff of optimized SKILL.md for user approval |
| 129 | + - GEPA uses existing test trigger definitions as configuration — it does not execute, replace, or modify Jest tests |
| 130 | +6. **IMPROVE TESTS** - Update `shouldTriggerPrompts` and `shouldNotTriggerPrompts` to match the finalized frontmatter (including any GEPA changes) |
| 131 | +7. **VERIFY** - Run `cd tests && npm test -- --testPathPatterns={skill-name}` |
| 132 | +8. **VALIDATE REFERENCES** - Run `cd scripts && npm run references {skill-name}` to check markdown links |
| 133 | +9. **TOKENS** - Check token budget and line count (< 500 lines per spec), gather optimization suggestions |
| 134 | +10. **SUMMARY** - Display before/after comparison with unimplemented suggestions |
| 135 | +11. **PROMPT** - Ask user: Commit, Create Issue, or Skip? |
| 136 | +12. **REPEAT** - Go to step 2 (max 5 iterations per skill) |
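The control flow of steps 2 to 12, capped at five iterations, can be sketched as below. This is a hedged illustration rather than sensei's own code: `score`, `run_tests`, and `improve` are hypothetical callbacks standing in for the SCORE, VERIFY, and IMPROVE steps, and the `Score` ordering is inferred from the scoring table. As a simplification, tests only run once the score already reaches Medium-High.

```python
from enum import IntEnum
from typing import Callable


class Score(IntEnum):
    """Compliance levels, ordered so that `score >= Score.MEDIUM_HIGH` works."""
    INVALID = 0
    LOW = 1
    MEDIUM = 2
    MEDIUM_HIGH = 3
    HIGH = 4


MAX_ITERATIONS = 5  # "max 5 iterations per skill"


def ralph_loop(
    score: Callable[[], Score],        # hypothetical SCORE callback
    run_tests: Callable[[], bool],     # hypothetical VERIFY callback
    improve: Callable[[int], None],    # hypothetical SCAFFOLD/IMPROVE callback
) -> tuple[bool, int]:
    """Return (target_reached, number_of_improvement_passes)."""

    def meets_target() -> bool:
        # CHECK: score >= Medium-High AND tests pass (tests are skipped if the score is too low).
        return score() >= Score.MEDIUM_HIGH and run_tests()

    for iteration in range(1, MAX_ITERATIONS + 1):
        if meets_target():
            return True, iteration - 1  # continue with TOKENS, SUMMARY, PROMPT
        improve(iteration)
    # After the last pass, re-check once before moving on to the next skill.
    return meets_target(), MAX_ITERATIONS
```

Passing the steps in as callbacks keeps the loop logic testable without touching real skills or running Jest.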
| 137 | + |
| 138 | +## Scoring Criteria (Quick Reference) |
| 139 | + |
| 140 | +Sensei validates skills against the [agentskills.io specification](https://agentskills.io/specification). See [SCORING.md](references/SCORING.md) for full details. |
| 141 | + |
| 142 | +| Score | Requirements | |
| 143 | +|-------|--------------| |
| 144 | +| **Invalid** | Name fails spec validation (consecutive hyphens, start/end hyphen, uppercase, etc.) | |
| 145 | +| **Low** | Basic description, no explicit triggers | |
| 146 | +| **Medium** | Has trigger keywords/phrases, description > 150 chars, >60 words | |
| 147 | +| **Medium-High** | Has "WHEN:" (preferred) or "USE FOR:" triggers, ≤60 words | |
| 148 | +| **High** | Medium-High + compatibility field | |
| 149 | + |
| 150 | +**Target: Medium-High** (distinctive triggers, concise description) |
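One possible reading of the quick-reference table, written as code. This is an illustrative sketch only: SCORING.md remains the authoritative definition, and the helper name, the quoted-phrase heuristic for "trigger keywords", and the way the 150-character and 60-word thresholds are combined are all assumptions. The name check follows the rules listed in step 2 of the loop.

```python
import re

# Hyphen-separated groups of lowercase letters/digits: rejects "--", a leading or
# trailing "-", uppercase letters, and the empty string.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# "WHEN:" or "USE FOR:" count as explicit triggers; "DO NOT USE FOR:" does not.
TRIGGER_MARKER = re.compile(r"\bWHEN:|(?<!NOT )USE FOR:")

# Assumption: any double-quoted phrase counts as a "trigger keyword/phrase".
QUOTED_PHRASE = re.compile(r'"[^"]+"')


def score_frontmatter(name: str, description: str, has_compatibility: bool = False) -> str:
    """Hypothetical scorer mirroring the quick-reference table above."""
    if not NAME_PATTERN.fullmatch(name):
        return "Invalid"
    words = len(description.split())
    has_marker = TRIGGER_MARKER.search(description) is not None
    if has_marker and words <= 60:
        return "High" if has_compatibility else "Medium-High"
    has_phrases = has_marker or QUOTED_PHRASE.search(description) is not None
    if has_phrases and len(description) > 150:
        return "Medium"
    return "Low"
```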
| 151 | + |
| 152 | +> ⚠️ "DO NOT USE FOR:" is **risky in multi-skill environments** (15+ overlapping skills) — causes keyword contamination on fast-pattern-matching models. Safe for small, isolated skill sets. Use positive routing with `WHEN:` for cross-model safety. |
| 153 | +> |
| 154 | +> **Exception — disambiguation-critical skills:** When a skill's `USE FOR` triggers directly overlap with a broader skill (e.g., `azure-prepare` owns "deploy to Azure"), `DO NOT USE FOR:` is **REQUIRED** to prevent the broader skill from capturing prompts that belong to the specialized skill. Removing it causes routing regressions. Integration tests validate this routing -- run them before removing any `DO NOT USE FOR:` clause. |
| 155 | +
| 156 | +**Strongly recommended** (reported as suggestions if missing): |
| 157 | +- `license` — identifies the license applied to the skill |
| 158 | +- `metadata.version` — tracks the skill version for consumers |
| 159 | + |
| 160 | +## Frontmatter Template |
| 161 | + |
| 162 | +Per the [agentskills.io spec](https://agentskills.io/specification), required and optional fields: |
| 163 | + |
| 164 | +```yaml |
| 165 | +--- |
| 166 | +name: skill-name |
| 167 | +description: "[ACTION VERB] [UNIQUE_DOMAIN]. [One clarifying sentence]. WHEN: \"trigger 1\", \"trigger 2\", \"trigger 3\"." |
| 168 | +license: MIT |
| 169 | +metadata: |
| 170 | + version: "1.0" |
| 171 | +# Other optional spec fields — preserve if already present: |
| 172 | +# metadata.author: example-org |
| 173 | +# allowed-tools: Bash(git:*) Read |
| 174 | +--- |
| 175 | +``` |
| 176 | + |
| 177 | +> **IMPORTANT:** Use inline double-quoted strings for descriptions. Do NOT use `>-` folded scalars (incompatible with skills.sh). Do NOT use `|` literal blocks (preserves newlines). Keep total description under 1024 characters and ≤60 words. |
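The formatting rules in this note can be checked before running the full scorer. This is a minimal sketch that uses a regular expression instead of a YAML parser, so it assumes `description:` starts at the beginning of a line in the raw frontmatter text; the function name `check_description` is hypothetical.

```python
import re


def check_description(frontmatter: str) -> list[str]:
    """Hypothetical check: return formatting problems for `description:` (empty list = OK)."""
    match = re.search(r"^description:[ \t]*(.*)$", frontmatter, re.MULTILINE)
    if match is None:
        return ["missing description"]
    raw = match.group(1).strip()
    if raw.startswith(">"):
        return ["folded scalar (>-) is not allowed; use an inline double-quoted string"]
    if raw.startswith("|"):
        return ["literal block (|) is not allowed; it preserves newlines"]
    if len(raw) < 2 or not (raw.startswith('"') and raw.endswith('"')):
        return ["description must be an inline double-quoted string"]
    text = raw[1:-1].replace('\\"', '"')  # undo escaped inner quotes
    problems = []
    if len(text) > 1024:
        problems.append(f"description is {len(text)} chars (limit 1024)")
    if len(text.split()) > 60:
        problems.append(f"description is {len(text.split())} words (limit 60)")
    return problems
```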
| 178 | +
| 179 | +> ⚠️ **"DO NOT USE FOR:" carries context-dependent risk.** In multi-skill environments (10+ skills with overlapping domains), anti-trigger clauses introduce the very keywords that cause wrong-skill activation on Claude Sonnet and fast-pattern-matching models ([evidence](https://gist.github.com/kvenkatrajan/52e6e77f5560ca30640490b4cc65d109)). For small, isolated skill sets (1-5 skills), the risk is low. When in doubt, use positive routing with `WHEN:` and distinctive quoted phrases. |
| 180 | +> |
| 181 | +> **Exception:** `DO NOT USE FOR:` is **REQUIRED** when a specialized skill's triggers overlap with a broader skill (e.g., `azure-hosted-copilot-sdk` vs. `azure-prepare` on "deploy to Azure"). Without the negative discriminator, the broader skill captures prompts that should route to the specialized one. Always run integration tests before removing a `DO NOT USE FOR:` clause. |
27 | 182 |
28 | | -1. **READ** — Load SKILL.md and current token count |
29 | | -2. **SCORE** — Run `waza check {skill-name}` for compliance |
30 | | -3. **FIX** — Address issues: tokens, broken links, frontmatter |
31 | | -4. **VERIFY** — Re-run `waza check`; loop if issues remain |
32 | | -5. **COMMIT** — `sensei: improve {skill-name} frontmatter` |
| 183 | +## Test Scaffolding |
33 | 184 |
34 | | -Target: High compliance, ≤500 tokens, all links valid. |
| 185 | +When tests don't exist, scaffold from `tests/_template/`: |
35 | 186 |
36 | | -## Frontmatter Rules |
| 187 | +```bash |
| 188 | +cp -r tests/_template tests/{skill-name} |
| 189 | +``` |
| 190 | + |
| 191 | +Then update: |
| 192 | +1. `SKILL_NAME` constant in all test files |
| 193 | +2. `shouldTriggerPrompts` - 5+ prompts matching new frontmatter triggers |
| 194 | +3. `shouldNotTriggerPrompts` - 5+ prompts that should not activate the skill (for example, prompts matching anti-triggers) |
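The copy-and-rename part of scaffolding can be scripted. This is a hedged sketch: it assumes the template's TypeScript files declare the skill name as `const SKILL_NAME = "..."`, and it only rewrites that constant, so the trigger prompt arrays still need to be written by hand. The function name `scaffold_tests` is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumption: the template declares `const SKILL_NAME = "<something>"` (single or double quotes).
SKILL_NAME_DECL = re.compile(r"""(const\s+SKILL_NAME\s*=\s*)(["'])(.*?)\2""")


def scaffold_tests(tests_dir: Path, skill_name: str) -> list[Path]:
    """Hypothetical helper: copy tests/_template to tests/<skill_name>, set SKILL_NAME."""
    target = tests_dir / skill_name
    if target.exists():
        return []  # tests already exist, so there is nothing to scaffold
    shutil.copytree(tests_dir / "_template", target)  # equivalent of `cp -r`
    edited = []
    for path in sorted(target.rglob("*.ts")):
        source = path.read_text(encoding="utf-8")
        updated, count = SKILL_NAME_DECL.subn(
            lambda m: f"{m.group(1)}{m.group(2)}{skill_name}{m.group(2)}", source
        )
        if count:
            path.write_text(updated, encoding="utf-8")
            edited.append(path)
    return edited
```

Returning the edited files gives the loop a simple signal of which test files to review next.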
37 | 195 |
38 | | -- Use inline double-quoted `description` (not `>-` folded scalars) |
39 | | -- Lead with action verb + domain; add `WHEN:` trigger phrases |
40 | | -- Keep ≤60 words, ≤1024 chars |
| 196 | +**Commit Messages:** |
| 197 | +``` |
| 198 | +sensei: improve {skill-name} frontmatter |
| 199 | +``` |
41 | 200 |
42 | | -## Tools |
| 201 | +## Constraints |
43 | 202 |
44 | | -No MCP servers required — uses waza CLI directly. |
| 203 | +- Only modify `plugin/skills/` - these are the Azure skills used by Copilot |
| 204 | +- `.github/skills/` contains meta-skills like sensei for developer tooling |
| 205 | +- Max 5 iterations per skill before moving on |
| 206 | +- Description must stay under 1024 characters |
| 207 | +- SKILL.md should stay under 500 tokens (soft limit) |
| 208 | +- Tests must pass before prompting for action |
| 209 | +- User chooses: Commit, Create Issue, or Skip after each skill |
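The size limits in these constraints and in step 9 can be checked mechanically. This is a rough sketch: the line count is exact, but the token figure uses a characters-divided-by-four estimate, which is only an approximation and not what the token CLI computes, so use the token CLI for real counts. The function name `budget_warnings` is hypothetical.

```python
import math

TOKEN_SOFT_LIMIT = 500  # "SKILL.md should stay under 500 tokens (soft limit)"
LINE_LIMIT = 500        # "< 500 lines per spec" (step 9, TOKENS)


def budget_warnings(skill_md: str) -> list[str]:
    """Hypothetical check: human-readable size warnings for SKILL.md (empty list = OK)."""
    warnings = []
    lines = len(skill_md.splitlines())
    if lines >= LINE_LIMIT:
        warnings.append(f"{lines} lines (spec limit is < {LINE_LIMIT})")
    # Rough heuristic: about 4 characters per token for English text. Not exact.
    approx_tokens = math.ceil(len(skill_md) / 4)
    if approx_tokens >= TOKEN_SOFT_LIMIT:
        warnings.append(f"~{approx_tokens} tokens (soft limit {TOKEN_SOFT_LIMIT})")
    return warnings
```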
45 | 210 |
46 | | -| Tool | Fallback | |
47 | | -|------|----------| |
48 | | -| waza | `waza check` / `waza run` CLI | |
49 | | -| git | Standard git CLI | |
| 211 | +## Flags |
50 | 212 |
51 | | -## Examples |
| 213 | +| Flag | Description | |
| 214 | +|------|-------------| |
| 215 | +| `--skip-integration` | Skip integration tests for faster iteration. Only runs unit and trigger tests. | |
| 216 | +| `--gepa` | Use GEPA evolutionary optimization instead of template-based improvement. Auto-discovers tests and builds evaluator at runtime. | |
52 | 217 |
53 | | -- "Run sensei on pipeline-troubleshooting" |
54 | | -- "Run sensei on all skills" |
| 218 | +> ⚠️ Skipping integration tests speeds up the loop but may miss runtime issues. Consider running the full test suite before the final commit. |
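Sensei is invoked with natural-language commands rather than a CLI, but the command shapes shown in this document (`on` or `score`, comma-separated skills, and the two flags) are regular enough to sketch a parser. This is purely illustrative: the `Invocation` type and `parse_invocation` function are hypothetical, and batch selectors such as `all Low-adherence skills` are returned verbatim as a single target.

```python
from dataclasses import dataclass, field

KNOWN_FLAGS = {"--skip-integration", "--gepa"}  # flags documented in the table above


@dataclass
class Invocation:
    """Hypothetical parsed form of a sensei command."""
    targets: list[str]
    flags: set[str] = field(default_factory=set)
    score_only: bool = False  # "Run sensei score ..." (GEPA score-only mode)


def parse_invocation(command: str) -> Invocation:
    tokens = command.split()
    flags = {t for t in tokens if t.startswith("--")}
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    words = [t for t in tokens if not t.startswith("--")]
    if len(words) < 4 or [w.lower() for w in words[:2]] != ["run", "sensei"] or words[2] not in ("on", "score"):
        raise ValueError(f"unrecognized command: {command!r}")
    # Everything after "on"/"score" is a comma-separated skill list or one batch selector.
    targets = [t.strip() for t in " ".join(words[3:]).split(",") if t.strip()]
    return Invocation(targets=targets, flags=flags, score_only=words[2] == "score")
```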
55 | 219 |
56 | | -## Troubleshooting |
| 220 | +## Reference Documentation |
57 | 221 |
58 | | -If waza fails, verify it is installed and you are in the skills directory. |
| 222 | +- [SCORING.md](references/SCORING.md) - Detailed scoring criteria |
| 223 | +- [LOOP.md](references/LOOP.md) - Ralph loop workflow details |
| 224 | +- [EXAMPLES.md](references/EXAMPLES.md) - Before/after examples |
| 225 | +- [TOKEN-INTEGRATION.md](references/TOKEN-INTEGRATION.md) - Token budget integration |
59 | 226 |
60 | | -## References |
| 227 | +## Related Skills |
61 | 228 |
62 | | -- [SCORING.md](references/SCORING.md) — Scoring criteria and token budgets |
63 | | -- [LOOP.md](references/LOOP.md) — Detailed workflow |
64 | | -- [EXAMPLES.md](references/EXAMPLES.md) — Before/after examples |
| 229 | +- [markdown-token-optimizer](/.github/skills/markdown-token-optimizer) - Token analysis and optimization |
| 230 | +- [skill-authoring](/.github/skills/skill-authoring) - Skill writing guidelines |