Azure
diff --git a/‎.github/skills/sensei/README.md‎
Lines changed: 9 additions & 9 deletions b/‎.github/skills/sensei/README.md‎
Lines changed: 9 additions & 9 deletions
diff --git a/‎.github/skills/sensei/SKILL.md‎
Lines changed: 200 additions & 34 deletions b/‎.github/skills/sensei/SKILL.md‎
Lines changed: 200 additions & 34 deletions
@@ -22,7 +22,7 @@ Sensei automates the improvement of [Agent Skills](https://agentskills.io) front
 
 ### The Problem
 
-The [frontmatter audit](https://gist.github.com/spboyer/28c31bf0cafb87489406832633aa31a7) revealed that all SDK skills have:
+The [frontmatter audit](https://gist.github.com/spboyer/28c31bf0cafb87489406832633aa31a7) revealed that all Azure skills have:
 - **0% High adherence** - No skills have triggers + anti-triggers + compatibility
 - **46% Low adherence** - 12 skills have minimal descriptions without clear triggers
 - **0/26 anti-triggers** - No skills tell agents when NOT to use them
@@ -108,7 +108,7 @@ cd tests
 npm install
 
 # Verify tests run
-npm test -- --testPathPattern=azure-validation
+npm test -- --testPathPatterns=azure-validation
 ```
 
 ---
@@ -164,7 +164,7 @@ npm test -- --testPathPattern=azure-validation
 └─────────────────────┬───────────────────────────────────┘
                       ▼
 ┌─────────────────────────────────────────────────────────┐
-│  6. VERIFY: npm test -- --testPathPattern={skill-name}  │
+│  6. VERIFY: npm test -- --testPathPatterns={skill-name}  │
 │     • If tests fail → fix and retry                     │
 │     • If tests pass → continue                          │
 └─────────────────────┬───────────────────────────────────┘
@@ -283,7 +283,7 @@ To reach Medium-High, a skill must have:
 
 ### Token Budget
 
-From **skill-authoring**:
+From [skill-authoring](/.github/skills/skill-authoring):
 - **SKILL.md:** < 500 tokens (soft), < 5000 (hard)
 - **references/*.md:** < 1000 tokens each
 - Check with: `cd scripts && npm run tokens -- check plugin/skills/{skill}/SKILL.md`
@@ -297,7 +297,7 @@ From **skill-authoring**:
 ```yaml
 ---
 name: appinsights-instrumentation
-description: 'Implement retry logic for HTTP client requests with exponential backoff'
+description: 'Instrument a webapp to send useful telemetry data to Azure App Insights'
 ---
 ```
 
@@ -313,7 +313,7 @@ description: 'Implement retry logic for HTTP client requests with exponential ba
 ---
 name: appinsights-instrumentation
 description: >-
-  Implement retry logic with exponential backoff for HTTP requests.
+  Instrument web apps to send telemetry to Azure Application Insights.
   USE FOR: "add App Insights", "instrument my app", "set up monitoring",
   "add telemetry", "track requests", "ASP.NET Core telemetry", "Node.js monitoring".
   DO NOT USE FOR: querying logs (use azure-observability), creating alerts,
@@ -368,7 +368,7 @@ const shouldNotTriggerPrompts = [
 3. Run tests manually to see specific failures:
    ```bash
    cd tests
-   npm test -- --testPathPattern={skill-name} --verbose
+   npm test -- --testPathPatterns={skill-name} --verbose
    ```
 
 ### Skill Not Reaching Target Score
@@ -441,5 +441,5 @@ If Sensei produces unexpected results:
 
 ### Related Skills
 
-- **markdown-token-optimizer** - Token analysis and optimization suggestions
-- **skill-authoring** - Guidelines for writing compliant Agent Skills
+- [markdown-token-optimizer](/.github/skills/markdown-token-optimizer) - Token analysis and optimization suggestions
+- [skill-authoring](/.github/skills/skill-authoring) - Guidelines for writing compliant Agent Skills
@@ -1,64 +1,230 @@
 ---
 name: sensei
-description: "Improve skill frontmatter compliance iteratively using the Ralph loop pattern. **WORKFLOW SKILL**. WHEN: \"run sensei\", \"sensei help\", \"improve skill\", \"fix frontmatter\", \"skill compliance\", \"frontmatter audit\", \"score skill\", \"check skill tokens\". DO NOT USE FOR: writing new skills from scratch (use skill-authoring), general code review. INVOKES: waza CLI, git."
+description: "**WORKFLOW SKILL** — Iteratively improve skill frontmatter compliance using the Ralph loop pattern. WHEN: \"run sensei\", \"sensei help\", \"improve skill\", \"fix frontmatter\", \"skill compliance\", \"frontmatter audit\", \"score skill\", \"check skill tokens\". INVOKES: token counting tools, test runners, git commands. FOR SINGLE OPERATIONS: use token CLI directly for counts/checks."
 license: MIT
 metadata:
   author: Microsoft
-  version: "1.0.0"
-compatibility:
-  platforms: "copilot-chat"
+  version: "1.0.5"
 ---
 
 # Sensei
 
-Iteratively improve skills until they pass waza check compliance.
+> "A true master teaches not by telling, but by refining." - The Skill Sensei
 
-## Usage
+Automates skill frontmatter improvement using the [Ralph loop pattern](https://github.com/soderlind/ralph) - iteratively improving skills until they reach Medium-High compliance with passing tests, then checking token usage and prompting for action.
 
+## Help
+
+When user says "sensei help" or asks how to use sensei, show this:
+
+```
+╔══════════════════════════════════════════════════════════════════╗
+║  SENSEI - Skill Frontmatter Compliance Improver                  ║
+╠══════════════════════════════════════════════════════════════════╣
+║                                                                  ║
+║  USAGE:                                                          ║
+║    Run sensei on <skill-name>              # Single skill        ║
+║    Run sensei on <skill-name> --skip-integration  # Fast mode    ║
+║    Run sensei on <skill1>, <skill2>, ...   # Multiple skills     ║
+║    Run sensei on all Low-adherence skills  # Batch by score      ║
+║    Run sensei on all skills                # All skills       ║
+║                                                                  ║
+║  EXAMPLES:                                                       ║
+║    Run sensei on appinsights-instrumentation                     ║
+║    Run sensei on azure-security --skip-integration               ║
+║    Run sensei on azure-security, azure-observability             ║
+║    Run sensei on all Low-adherence skills                        ║
+║                                                                  ║
+║  WHAT IT DOES:                                                   ║
+║    1. READ      - Load skill's SKILL.md, tests, and token count  ║
+║    2. SCORE     - Check compliance (Low/Medium/Medium-High/High) ║
+║    3. SCAFFOLD  - Create tests from template if missing          ║
+║    4. IMPROVE   - Add WHEN: triggers (cross-model optimized)     ║
+║    5. TEST      - Run tests, fix if needed                       ║
+║    6. REFERENCES- Validate markdown links                        ║
+║    7. TOKENS    - Check token budget, gather suggestions         ║
+║    8. SUMMARY   - Show before/after with suggestions             ║
+║    9. PROMPT    - Ask: Commit, Create Issue, or Skip?            ║
+║   10. REPEAT    - Until Medium-High score + tests pass           ║
+║                                                                  ║
+║  TARGET SCORE: Medium-High                                       ║
+║    ✓ Description > 150 chars, ≤ 60 words                         ║
+║    ✓ Has "WHEN:" trigger phrases (preferred)                     ║
+║    ✓ No "DO NOT USE FOR:" (unless disambiguation-critical)         ║
+║    ✓ SKILL.md < 500 tokens (soft limit)                          ║
+║                                                                  ║
+║  MORE INFO:                                                      ║
+║    See .github/skills/sensei/README.md for full documentation    ║
+║                                                                  ║
+╚══════════════════════════════════════════════════════════════════╝
+```
+
+## When to Use
+
+- Improving a skill's frontmatter compliance score
+- Adding trigger phrases and anti-triggers to skill descriptions
+- Batch-improving multiple skills at once
+- Auditing and fixing Low-adherence skills
+
+## Invocation Modes
+
+### Single Skill
+```
+Run sensei on azure-deploy
+```
+
+### Multiple Skills
+```
+Run sensei on azure-security, azure-observability
+```
+
+### By Adherence Level
+```
+Run sensei on all Low-adherence skills
+```
+
+### All Skills
 ```
-Run sensei on <skill-name>
-Run sensei on <skill1>, <skill2>
 Run sensei on all skills
 ```
 
+### GEPA Mode (Deep Optimization)
+```
+Run sensei on my-skill --gepa
+Run sensei on my-skill --gepa --skip-integration
+Run sensei on all skills --gepa
+```
+
+When `--gepa` is used, Step 5 (IMPROVE) is replaced with GEPA evolutionary optimization.
+Instead of template-based improvements, GEPA parses trigger prompt arrays from the existing
+test harness and combines them with content quality heuristics to build a fitness function.
+An LLM proposes and evaluates many candidate improvements automatically. Note: GEPA does not
+execute Jest tests directly — it uses the test data (prompts) as evaluation inputs.
+
+**GEPA score-only mode** (no LLM calls, just evaluate current quality):
+```
+Run sensei score my-skill
+Run sensei score all skills
+```
+
 ## The Ralph Loop
 
-For each skill, repeat until compliant (max 5 iterations):
+For each skill, execute this loop until score >= Medium-High AND tests pass:
+
+1. **READ** - Load `plugin/skills/{skill-name}/SKILL.md`, tests, and token count
+2. **SCORE** - Run spec-based compliance check (see [SCORING.md](references/SCORING.md)):
+   - Validate `name` per [agentskills.io spec](https://agentskills.io/specification) (no `--`, no start/end `-`, lowercase alphanumeric)
+   - Check description length and word count (≤60 words)
+   - Check triggers (WHEN: preferred, USE FOR: accepted)
+   - Warn on "DO NOT USE FOR:" (risky in multi-skill environments — **exception**: REQUIRED for skills that share trigger overlap with broader skills like `azure-prepare`)
+   - Preserve optional spec fields (`license`, `metadata`, `allowed-tools`) if present
+3. **CHECK** - If score >= Medium-High AND tests pass → go to TOKENS step
+4. **SCAFFOLD** - If `tests/{skill-name}/` doesn't exist, create from `tests/_template/`
+5. **IMPROVE FRONTMATTER** - Add WHEN: triggers (stay under 60 words and 1024 chars)
+5b. **IMPROVE WITH GEPA** (when `--gepa` flag is set) — Replaces step 5 (IMPROVE FRONTMATTER) with automated optimization; step 6 (IMPROVE TESTS) still runs normally:
+   - Auto-discovers `tests/{skill-name}/triggers.test.ts` and extracts prompt arrays
+   - Builds a GEPA evaluator scoring content quality + trigger accuracy based on those trigger prompt arrays (not Jest test pass/fail results)
+   - Runs `python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize --skill {skill-name} --skills-dir plugin/skills --tests-dir tests`
+   - Shows diff of optimized SKILL.md for user approval
+   - GEPA uses existing test trigger definitions as configuration — it does not execute, replace, or modify Jest tests
+6. **IMPROVE TESTS** - Update `shouldTriggerPrompts` and `shouldNotTriggerPrompts` to match the finalized frontmatter (including any GEPA changes)
+7. **VERIFY** - Run `cd tests && npm test -- --testPathPatterns={skill-name}`
+8. **VALIDATE REFERENCES** - Run `cd scripts && npm run references {skill-name}` to check markdown links
+9. **TOKENS** - Check token budget and line count (< 500 lines per spec), gather optimization suggestions
+10. **SUMMARY** - Display before/after comparison with unimplemented suggestions
+11. **PROMPT** - Ask user: Commit, Create Issue, or Skip?
+12. **REPEAT** - Go to step 2 (max 5 iterations per skill)
+
+## Scoring Criteria (Quick Reference)
+
+Sensei validates skills against the [agentskills.io specification](https://agentskills.io/specification). See [SCORING.md](references/SCORING.md) for full details.
+
+| Score | Requirements |
+|-------|--------------|
+| **Invalid** | Name fails spec validation (consecutive hyphens, start/end hyphen, uppercase, etc.) |
+| **Low** | Basic description, no explicit triggers |
+| **Medium** | Has trigger keywords/phrases, description > 150 chars, >60 words |
+| **Medium-High** | Has "WHEN:" (preferred) or "USE FOR:" triggers, ≤60 words |
+| **High** | Medium-High + compatibility field |
+
+**Target: Medium-High** (distinctive triggers, concise description)
+
+> ⚠️ "DO NOT USE FOR:" is **risky in multi-skill environments** (15+ overlapping skills) — causes keyword contamination on fast-pattern-matching models. Safe for small, isolated skill sets. Use positive routing with `WHEN:` for cross-model safety.
+>
+> **Exception — disambiguation-critical skills:** When a skill's `USE FOR` triggers directly overlap with a broader skill (e.g., `azure-prepare` owns "deploy to Azure"), `DO NOT USE FOR:` is **REQUIRED** to prevent the broader skill from capturing prompts that belong to the specialized skill. Removing it causes routing regressions. Integration tests validate this routing -- run them before removing any `DO NOT USE FOR:` clause.
+
+**Strongly recommended** (reported as suggestions if missing):
+- `license` — identifies the license applied to the skill
+- `metadata.version` — tracks the skill version for consumers
+
+## Frontmatter Template
+
+Per the [agentskills.io spec](https://agentskills.io/specification), required and optional fields:
+
+```yaml
+---
+name: skill-name
+description: "[ACTION VERB] [UNIQUE_DOMAIN]. [One clarifying sentence]. WHEN: \"trigger 1\", \"trigger 2\", \"trigger 3\"."
+license: MIT
+metadata:
+  version: "1.0"
+# Other optional spec fields — preserve if already present:
+# metadata.author: example-org
+# allowed-tools: Bash(git:*) Read
+---
+```
+
+> **IMPORTANT:** Use inline double-quoted strings for descriptions. Do NOT use `>-` folded scalars (incompatible with skills.sh). Do NOT use `|` literal blocks (preserves newlines). Keep total description under 1024 characters and ≤60 words.
+
+> ⚠️ **"DO NOT USE FOR:" carries context-dependent risk.** In multi-skill environments (10+ skills with overlapping domains), anti-trigger clauses introduce the very keywords that cause wrong-skill activation on Claude Sonnet and fast-pattern-matching models ([evidence](https://gist.github.com/kvenkatrajan/52e6e77f5560ca30640490b4cc65d109)). For small, isolated skill sets (1-5 skills), the risk is low. When in doubt, use positive routing with `WHEN:` and distinctive quoted phrases.
+>
+> **Exception:** `DO NOT USE FOR:` is **REQUIRED** when a specialized skill's triggers overlap with a broader skill (e.g., `azure-hosted-copilot-sdk` vs. `azure-prepare` on "deploy to Azure"). Without the negative discriminator, the broader skill captures prompts that should route to the specialized one. Always run integration tests before removing a `DO NOT USE FOR:` clause.
 
-1. **READ** — Load SKILL.md and current token count
-2. **SCORE** — Run `waza check {skill-name}` for compliance
-3. **FIX** — Address issues: tokens, broken links, frontmatter
-4. **VERIFY** — Re-run `waza check`; loop if issues remain
-5. **COMMIT** — `sensei: improve {skill-name} frontmatter`
+## Test Scaffolding
 
-Target: High compliance, ≤500 tokens, all links valid.
+When tests don't exist, scaffold from `tests/_template/`:
 
-## Frontmatter Rules
+```bash
+cp -r tests/_template tests/{skill-name}
+```
+
+Then update:
+1. `SKILL_NAME` constant in all test files
+2. `shouldTriggerPrompts` - 5+ prompts matching new frontmatter triggers
+3. `shouldNotTriggerPrompts` - 5+ prompts matching anti-triggers
 
-- Use inline double-quoted `description` (not `>-` folded scalars)
-- Lead with action verb + domain; add `WHEN:` trigger phrases
-- Keep ≤60 words, ≤1024 chars
+**Commit Messages:**
+```
+sensei: improve {skill-name} frontmatter
+```
 
-## Tools
+## Constraints
 
-No MCP servers required — uses waza CLI directly.
+- Only modify `plugin/skills/` - these are the Azure skills used by Copilot
+- `.github/skills/` contains meta-skills like sensei for developer tooling
+- Max 5 iterations per skill before moving on
+- Description must stay under 1024 characters
+- SKILL.md should stay under 500 tokens (soft limit)
+- Tests must pass before prompting for action
+- User chooses: Commit, Create Issue, or Skip after each skill
 
-| Tool | Fallback |
-|------|----------|
-| waza | `waza check` / `waza run` CLI |
-| git | Standard git CLI |
+## Flags
 
-## Examples
+| Flag | Description |
+|------|-------------|
+| `--skip-integration` | Skip integration tests for faster iteration. Only runs unit and trigger tests. |
+| `--gepa` | Use GEPA evolutionary optimization instead of template-based improvement. Auto-discovers tests and builds evaluator at runtime. |
 
-- "Run sensei on pipeline-troubleshooting"
-- "Run sensei on all skills"
+> ⚠️ Skipping integration tests speeds up the loop but may miss runtime issues. Consider running full tests before final commit.
 
-## Troubleshooting
+## Reference Documentation
 
-If waza fails, verify it is installed and you are in the skills directory.
+- [SCORING.md](references/SCORING.md) - Detailed scoring criteria
+- [LOOP.md](references/LOOP.md) - Ralph loop workflow details
+- [EXAMPLES.md](references/EXAMPLES.md) - Before/after examples
+- [TOKEN-INTEGRATION.md](references/TOKEN-INTEGRATION.md) - Token budget integration
 
-## References
+## Related Skills
 
-- [SCORING.md](references/SCORING.md) — Scoring criteria and token budgets
-- [LOOP.md](references/LOOP.md) — Detailed workflow
-- [EXAMPLES.md](references/EXAMPLES.md) — Before/after examples
+- [markdown-token-optimizer](/.github/skills/markdown-token-optimizer) - Token analysis and optimization
+- [skill-authoring](/.github/skills/skill-authoring) - Skill writing guidelines