# quality-playbook v1.2.0: state machine analysis and missing safeguard detection (#1238)
Add Step 5a (state machine completeness analysis) and expand Step 6
with missing safeguard detection patterns. These catch two categories
of bugs that defensive pattern analysis alone misses: unhandled states
in lifecycle/status machines, and operations that commit users to
expensive work without adequate preview or termination conditions.
**`docs/README.skills.md`** (+1 −1)

@@ -222,7 +222,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
|[publish-to-pages](../skills/publish-to-pages/SKILL.md)| Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. |`scripts/convert-pdf.py`<br />`scripts/convert-pptx.py`<br />`scripts/publish.sh`|
|[pytest-coverage](../skills/pytest-coverage/SKILL.md)| Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None |
|[python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md)| Generate a complete MCP server project in Python with tools, resources, and proper configuration | None |
-| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
+| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
|[quasi-coder](../skills/quasi-coder/SKILL.md)| Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None |
|[readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md)| Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None |
|[refactor](../skills/refactor/SKILL.md)| Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. | None |
**`skills/quality-playbook/SKILL.md`** (+20 −3)
@@ -1,9 +1,9 @@
---
name: quality-playbook
-description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase."
+description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Includes state machine completeness analysis and missing safeguard detection. Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase."
license: Complete terms in LICENSE.txt
metadata:
-  version: 1.1.0
+  version: 1.2.0
  author: Andrew Stellman
  github: https://github.com/andrewstellman/
---
@@ -13,7 +13,7 @@ metadata:
**When this skill starts, display this banner before doing anything else:**
```
-Quality Playbook v1.1.0 — by Andrew Stellman
+Quality Playbook v1.2.0 — by Andrew Stellman
https://github.com/andrewstellman/
```
@@ -158,6 +158,21 @@ This is the most important step. Search for defensive code patterns — each one
Minimum bar: at least 2–3 defensive patterns per core source file. If you find fewer, you're skimming — read function bodies, not just signatures.
### Step 5a: Trace State Machines
If the project has any kind of state management — status fields, lifecycle phases, workflow stages, mode flags — trace the state machine completely. This catches a category of bugs that defensive pattern analysis alone misses: states that exist but aren't handled.
**How to find state machines:** Search for status/state fields in models, enums, or constants (e.g., `status`, `state`, `phase`, `mode`). Search for guards that check status before allowing actions (e.g., `if status == "running"`, `match self.state`). Search for state transitions (assignments to status fields).
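As a concrete starting point, here is a minimal sketch in Python of the searches this step describes. The field names and the `src` directory are generic assumptions, not taken from any particular project:

```python
import re
from pathlib import Path

# Candidate state-machine signals: field declarations, status guards,
# and state transitions. These names are common conventions, not
# project-specific identifiers.
PATTERNS = {
    "field": re.compile(r"\b(status|state|phase|mode)\s*[:=]"),
    "guard": re.compile(r"\bif\s+.*\b(status|state)\b\s*=="),
    "transition": re.compile(r"\b(status|state)\s*=\s*['\"]\w+['\"]"),
}

def find_state_machine_signals(root: str) -> None:
    """Print every line that looks like a state field, guard, or transition."""
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for kind, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"{path}:{lineno} [{kind}] {line.strip()}")

find_state_machine_signals("src")
```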
**For each state machine you find** (a minimal sketch follows the list):
1. **Enumerate all possible states.** Read the enum, the constants, or grep for every value the field is assigned. List them all.
2. **For each consumer of state** (UI handlers, API endpoints, control flow guards), check: does it handle every possible state? A `switch`/`match` without a meaningful default, or an `if/elif` chain that doesn't cover all states, is a gap.
3. **For each state transition**, check: can you reach every state? Are there states you can enter but never leave? Are there states that block operations that should be available?
4. **Record gaps as findings.** A status guard that allows action X for "running" but not for "stuck" is a real bug if the user needs to perform action X on stuck processes. A process that enters a terminal state but never triggers cleanup is a real bug.
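Here is a minimal sketch of the gap that step 2 catches, using hypothetical state names (nothing here comes from a specific project). It assumes Python 3.10+ for `match`:

```python
from enum import Enum

class RunStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STUCK = "stuck"        # reachable, e.g. when a watchdog marks a stalled run
    COMPLETE = "complete"

def can_kill(status: RunStatus) -> bool:
    # The gap: only RUNNING is handled, so a run in STUCK status, which is
    # exactly the one the user most needs to kill, is silently blocked.
    return status == RunStatus.RUNNING

def can_kill_fixed(status: RunStatus) -> bool:
    # Exhaustive handling: every state gets an explicit decision.
    match status:
        case RunStatus.RUNNING | RunStatus.STUCK:
            return True
        case RunStatus.PENDING | RunStatus.COMPLETE:
            return False

assert can_kill(RunStatus.STUCK) is False       # the unhandled-state bug
assert can_kill_fixed(RunStatus.STUCK) is True  # the exhaustive fix
```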
**Why this matters:** State machine gaps produce bugs that are invisible during normal operation but surface under stress or edge conditions — exactly when you need the system to work. A batch processor that can't be killed when it's in "stuck" status, a watcher that never self-terminates after all work completes, and a UI that refuses to resume a "pending" run are all symptoms of incomplete state handling. These bugs don't show up in defensive pattern analysis because the code isn't defending against them — it's simply not handling them at all.
### Step 5b: Map Schema Types
If the project has a validation layer (Pydantic models in Python, JSON Schema, TypeScript interfaces/Zod schemas, Java Bean Validation annotations, Scala case class codecs), read the schema definitions now. For every field you found a defensive pattern for, record what the schema accepts vs. rejects.
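For example, here is a minimal sketch of the accepts-vs-rejects record for a hypothetical validation layer. It assumes Pydantic v2; the model and field names are invented for illustration:

```python
from pydantic import BaseModel, Field, ValidationError

class BatchJob(BaseModel):
    name: str = Field(min_length=1)           # rejects the empty string
    chunk_size: int = Field(gt=0, le=10_000)  # accepts 1..10_000 only

BatchJob(name="nightly", chunk_size=500)      # accepted
try:
    BatchJob(name="", chunk_size=0)           # rejected on both fields
except ValidationError as exc:
    # Record each rejection as a boundary to test: "", 0, -1, 10_001, ...
    print(len(exc.errors()), "schema rejections")
```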
@@ -179,6 +194,8 @@ Every project has a different failure profile. This step uses **two sources**
- "What produces correct-looking output that is actually wrong?" — This is the most dangerous class of bug: output that passes all checks but is subtly corrupted.
- "What happens at 10x scale that doesn't happen at 1x?" — Chunk boundaries, rate limits, timeout cascading, memory pressure.
- "What happens when this process is killed at the worst possible moment?" — Mid-write, mid-transaction, mid-batch-submission.
- "What information does the user need before committing to an irreversible or expensive operation?" — Pre-run cost estimates, confirmation of scope (especially when fan-out or expansion will multiply the work), resource warnings. If the system can silently commit the user to hours of processing or significant cost without showing them what they're about to do, that's a missing safeguard. Search for operations that start long-running processes, submit batch jobs, or trigger expansion/fan-out — and check whether the user sees a preview, estimate, or confirmation with real numbers before the point of no return.
- "What happens when a long-running process finishes — does it actually stop?" — Polling loops, watchers, background threads, and daemon processes that run until completion should have explicit termination conditions. If the loop checks "is there more work?" but never checks "is all work done?", it will run forever after completion. This is especially common in batch processors and queue consumers.
Generate realistic failure scenarios from this knowledge. You don't need to have observed these failures — you know from training that they happen to systems of this type. Write them as **architectural vulnerability analyses** with specific quantities and consequences. Frame each as "this architecture permits the following failure mode" — not as a fabricated incident report. Use concrete numbers to make the severity non-negotiable: "If the process crashes mid-write during a 10,000-record batch, `save_state()` without an atomic rename pattern will leave a corrupted state file — the next run gets JSONDecodeError and cannot resume without manual intervention." Then ground them in the actual code you explored: "Read persistence.py line ~340 (save_state): verify temp file + rename pattern."
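As a minimal sketch, the temp-file-plus-rename pattern referenced above might look like this in Python (the file name and function signature are illustrative, not taken from the project):

```python
import json
import os
import tempfile

def save_state(state: dict, path: str = "state.json") -> None:
    # Write to a temp file in the same directory, then atomically rename.
    # A crash mid-write leaves the old file intact instead of a truncated
    # JSON document that fails with JSONDecodeError on the next run.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file
        raise
```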
State machines are a special category of defensive pattern. When you find status fields, lifecycle phases, or mode flags, trace the full state machine — see SKILL.md Step 5a for the complete process.
1. List every possible state value (read the enum or grep for assignments)
2. For each handler/consumer that checks state, verify it handles ALL states
3. Look for states you can enter but never leave (terminal state without cleanup)
4. Look for operations that should be available in a state but are blocked by an incomplete guard
**Converting state machine gaps to scenarios:**
```markdown
### Scenario N: [Status] blocks [operation]
**Requirement tag:** [Req: inferred — from handler() status guard]
**What happened:** The [handler] only allows [operation] when status is "[allowed_states]", but the system can enter "[missing_state]" status (e.g., due to [condition]). When this happens, the user cannot [operation] and has no workaround through the interface.
**The requirement:** [operation] must be available in all states where the user would reasonably need it, including [missing_state].
**How to verify:** Set up a [entity] in "[missing_state]" status. Attempt [operation]. Assert it succeeds or provides a clear error with a workaround.
```
## Missing Safeguard Patterns
Search for operations that commit the user to expensive, irreversible, or long-running work without adequate preview or confirmation (a sketch of one fix follows the table):
| Pattern | What to look for |
|---|---|
| Pre-commit information gap | Operations that start batch jobs, fan-out expansions, or API calls without showing estimated cost, scope, or duration |
| Silent expansion | Fan-out or multiplication steps where the final work count isn't known until runtime, with no warning shown |
| No termination condition | Polling loops, watchers, or daemon processes that check for new work but never check whether all work is done |
| Retry without backoff | Error handling that retries immediately or on a fixed interval without exponential backoff, risking rate limit floods |
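As a minimal sketch of closing the first gap in this table, a pre-commit preview might look like the following. Every name and number here is a hypothetical stand-in:

```python
def confirm_fanout(items: list, expansions_per_item: int, cost_per_unit: float) -> bool:
    """Show the real numbers before the point of no return.

    All parameters are hypothetical stand-ins for the project's own
    batch-submission API; the point is the preview, not the names.
    """
    total_units = len(items) * expansions_per_item
    print(f"{len(items)} items will fan out to {total_units} work units")
    print(f"Estimated cost: ${total_units * cost_per_unit:,.2f}")
    return input("Proceed? [y/N] ").strip().lower() == "y"

# Guard the expensive operation behind the preview:
# if confirm_fanout(items, expansions_per_item=40, cost_per_unit=0.02):
#     submit_batch(items)
```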
**Converting missing safeguards to scenarios:**
```markdown
### Scenario N: No [safeguard] before [operation]
**Requirement tag:** [Req: inferred — from init_run()/start_watch() behavior]
**What happened:** [Operation] commits the user to [consequence] without showing [missing information]. In practice, a [example] fanned out from [small number] to [large number] units with no warning, resulting in [cost/time consequence].
**The requirement:** Before committing to [operation], display [safeguard] showing [what the user needs to see].
**How to verify:** Initiate [operation] and assert that [safeguard information] is displayed before the point of no return.
```
## Minimum Bar
You should find at least 2–3 defensive patterns per source file in the core logic modules. If you find fewer, read function bodies more carefully — not just signatures and comments.
-For a medium-sized project (5–15 source files), expect to find 15–30 defensive patterns total. Each one should produce at least one boundary test.
+For a medium-sized project (5–15 source files), expect to find 15–30 defensive patterns total. Each one should produce at least one boundary test. Additionally, trace at least one state machine if the project has status/state fields, and check at least one long-running operation for missing safeguards.