Gatekeeper v2: Extend to all tools + improve AI prompt with auto-mode rules

## Summary

Extend Gatekeeper plugin from Bash-only coverage to **all Claude Code tools** (Write, Edit, WebFetch, Agent, etc.) and improve the PermissionRequest agent hook prompt with rules derived from Claude Code's [built-in auto mode classifier](https://www.anthropic.com/engineering/claude-code-auto-mode).

## Background

Analysis of the Claude Code source (`src/utils/permissions/yoloClassifier.ts`) and Anthropic's engineering blog shows that the built-in auto mode uses a set of 25+ DENY rules and 7 ALLOW rules. Gatekeeper v1 only covers Bash commands, leaving every other tool unprotected.

The built-in auto mode also distinguishes between **hard deny** (absolute block) and **soft deny** (block unless user explicitly requested). Current Gatekeeper treats all denials the same — this should be differentiated.

## Current State (v1)

- **PreToolUse**: Bash only — regex DENY/ALLOW rules + chain command parser
- **PermissionRequest**: Bash only — agent hook with 7 attack pattern checks
- **Other tools**: No coverage (Write, Edit, WebFetch, Agent, etc.)
- **No soft_deny**: All denials are hard blocks — no intent-aware classification

## Proposed Changes

### 1. Introduce 3-tier decision system (hard_deny / soft_deny / allow)

The key insight from auto-mode source analysis: **not all denials are equal**. Some actions should be blocked unconditionally (`rm -rf /`), while others should be blocked *unless the user explicitly requested them* (`git push --force`).

#### Hook flow with soft_deny

```
PreToolUse (Layer 1 — static rules, ~0ms)
  │
  ├─ hard_deny  → exit code 2 + stderr message → tool blocked immediately
  │                (rm -rf /, mkfs, dd if=/dev/zero, etc.)
  │
  ├─ allow      → exit code 0 + JSON { permissionDecision: "allow" }
  │                (npm test, git status, ls, etc.)
  │
  ├─ soft_deny  → return null (passthrough) → proceeds to PermissionRequest
  │                (git push --force, npm publish, kubectl apply, etc.)
  │
  └─ unknown    → return null (passthrough) → proceeds to PermissionRequest

PermissionRequest (Layer 2 — AI review, ~2-5s)
  │
  ├─ AI judges: "Did the user explicitly request this?"
  │   ├─ Yes → { decision: { behavior: "allow" } }
  │   └─ No  → { decision: { behavior: "deny", message: "..." } }
  │
  └─ AI unsure → {} (empty response = passthrough to next hook)
      └─ No more hooks → user sees permission prompt (= ask)
```
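
The Layer 1 half of this flow can be sketched as a pure mapping from a tier to what the hook process would emit. This is a minimal sketch, not the plugin's actual code: `HookResult` and `toPreToolUseResult` are hypothetical names, and the `hookSpecificOutput` envelope reflects one reading of the hooks documentation, so the field names should be verified against the current docs.

```typescript
type Tier = 'hard_deny' | 'soft_deny' | 'allow' | 'unknown'

// Hypothetical shape: what the hook process would do on exit.
interface HookResult {
  exitCode: 0 | 2
  stdout: string // JSON for Claude Code, or '' for passthrough
  stderr: string // reason text, only used when exitCode is 2
}

function toPreToolUseResult(tier: Tier, reason: string): HookResult {
  switch (tier) {
    case 'hard_deny':
      // Exit code 2 blocks the tool call; stderr carries the reason.
      return { exitCode: 2, stdout: '', stderr: `Gatekeeper: ${reason}` }
    case 'allow':
      // Assumed envelope; verify the field names against the hooks docs.
      return {
        exitCode: 0,
        stdout: JSON.stringify({
          hookSpecificOutput: {
            hookEventName: 'PreToolUse',
            permissionDecision: 'allow',
            permissionDecisionReason: reason,
          },
        }),
        stderr: '',
      }
    case 'soft_deny':
    case 'unknown':
      // Exit 0 with no output: the decision is deferred to PermissionRequest.
      return { exitCode: 0, stdout: '', stderr: '' }
  }
}
```

Keeping the mapping free of I/O means the exit-code and JSON behaviour can be unit-tested without spawning the hook.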

#### Classification examples

| Command | Layer 1 | Reason |
|---------|---------|--------|
| `rm -rf /` | **hard_deny** | Irreversible system destruction |
| `mkfs.ext4 /dev/sda` | **hard_deny** | Disk format |
| `dd if=/dev/zero of=/dev/sda` | **hard_deny** | Disk zeroing |
| `git push --force` | **soft_deny** → AI | Destructive but sometimes intended |
| `git push origin main` | **soft_deny** → AI | Bypasses PR review but sometimes intended |
| `npm publish` | **soft_deny** → AI | Public release, may be intended |
| `kubectl apply` | **soft_deny** → AI | Infrastructure change, may be intended |
| `terraform apply` | **soft_deny** → AI | Infrastructure change, may be intended |
| `npm test` | **allow** | Safe dev command |
| `git status` | **allow** | Read-only |
| `ls`, `cat`, `grep` | **allow** | Read-only inspection |

#### Implementation in PreToolUse

```typescript
type Decision = 'hard_deny' | 'soft_deny' | 'allow'

interface Rule { pattern: RegExp; reason: string }
interface Classification { decision: Decision; reason: string }

// HARD_DENY_RULES, SOFT_DENY_RULES and ALLOW_RULES are Rule[] tables.
// Tiers are checked in order of severity; the first match wins.
// Returns null when no rule matches (unknown command).
function classifyBash(cmd: string): Classification | null {
  // 1. Hard deny: absolute blocks, never passthrough
  for (const rule of HARD_DENY_RULES) {
    if (rule.pattern.test(cmd)) return { decision: 'hard_deny', reason: rule.reason }
  }

  // 2. Soft deny: passthrough to AI for intent judgment
  for (const rule of SOFT_DENY_RULES) {
    if (rule.pattern.test(cmd)) return { decision: 'soft_deny', reason: rule.reason }
  }

  // 3. Allow: safe commands
  for (const rule of ALLOW_RULES) {
    if (rule.pattern.test(cmd)) return { decision: 'allow', reason: rule.reason }
  }

  return null // unknown → passthrough to AI
}

// In the main evaluate function:
const result = classifyBash(cmd)
if (result === null) return null  // unknown → PermissionRequest hook handles it
switch (result.decision) {
  case 'hard_deny':
    return makeDecision('deny', result.reason)  // exit code 2
  case 'soft_deny':
    return null  // passthrough → PermissionRequest hook handles it
  case 'allow':
    return makeDecision('allow', result.reason)
}
```
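
Current State mentions a chain command parser. One way the three tiers could combine across a chained command is "the most severe segment wins", sketched below. This is an assumption about how the pieces might fit together, not a description of the existing parser: `splitChain`, `classifyChain`, and the severity ordering are hypothetical, and the split is deliberately naive (it ignores quoting and subshells, and leaves `|` intact so pipe-based rules such as `curl | bash` can still match).

```typescript
type Tier = 'hard_deny' | 'soft_deny' | 'allow' | 'unknown'

// Higher number = more severe. 'unknown' outranks 'allow' so that a single
// unrecognized segment still sends the whole command to Layer 2.
const SEVERITY: Record<Tier, number> = { allow: 0, unknown: 1, soft_deny: 2, hard_deny: 3 }

// Naive split on &&, || and ; only. Quoting and subshells are not handled.
function splitChain(cmd: string): string[] {
  return cmd
    .split(/&&|\|\||;/)
    .map(segment => segment.trim())
    .filter(segment => segment.length > 0)
}

// Classifies each segment and returns the most severe tier found.
function classifyChain(cmd: string, classifySegment: (segment: string) => Tier): Tier {
  const segments = splitChain(cmd)
  if (segments.length === 0) return 'unknown' // nothing to judge: defer

  let worst: Tier = 'allow'
  for (const segment of segments) {
    const tier = classifySegment(segment)
    if (SEVERITY[tier] > SEVERITY[worst]) worst = tier
  }
  return worst
}
```

With this ordering, `npm test && rm -rf /` is blocked because of its second segment, and `npm test && some-unknown-tool` goes to Layer 2 rather than being allowed.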

#### Soft deny rules (passthrough to AI)

These match auto-mode `soft_deny` rules — dangerous but sometimes intentionally requested:

```typescript
const SOFT_DENY_RULES: Rule[] = [
  // Git destructive
  { pattern: /git\s+push\s+.*--force\b/i, reason: 'Force push needs user intent verification' },
  { pattern: /git\s+push\s+.*\b(main|master)\b/i, reason: 'Push to default branch needs user intent verification' },
  { pattern: /git\s+reset\s+--hard/i, reason: 'Hard reset needs user intent verification' },
  { pattern: /git\s+clean\s+-[a-z]*f/i, reason: 'Git clean needs user intent verification' },

  // Deploy/publish
  { pattern: /^npm\s+publish\b/i, reason: 'Package publish needs user intent verification' },
  { pattern: /^(terraform|pulumi)\s+apply\b/i, reason: 'Infrastructure apply needs user intent verification' },
  { pattern: /^kubectl\s+(apply|delete)\b/i, reason: 'Kubernetes mutation needs user intent verification' },

  // Self-modification
  { pattern: /(\.claude\/settings|\bCLAUDE\.md\b)/i, reason: 'Agent self-modification needs user intent verification' },

  // Security weakening
  { pattern: /(^|\s)--no-verify\b/i, reason: 'Skipping verification needs user intent verification' },
  { pattern: /\bchmod\s+(-\S+\s+)*777\b/i, reason: 'Broad permission change needs user intent verification' },
]
```
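
Rules whose literal text starts with a non-word character (`-`, `.`) deserve a table-driven check, because `\b` only matches between a word character and a non-word character and so never matches between a space and `-`. The sketch below shows the pitfall and one way around it; `failingCases` and the two example rules are hypothetical names used only for illustration.

```typescript
interface Rule { pattern: RegExp; reason: string }
interface Case { cmd: string; shouldMatch: boolean }

// Returns the commands whose match result differs from the expectation.
function failingCases(rule: Rule, cases: Case[]): string[] {
  return cases
    .filter(c => rule.pattern.test(c.cmd) !== c.shouldMatch)
    .map(c => c.cmd)
}

// A leading \b cannot match here: both the space and '-' are non-word characters.
const withLeadingBoundary: Rule = { pattern: /\b--no-verify\b/i, reason: 'example' }
// Anchoring on start-of-string or whitespace avoids the problem.
const withWhitespaceAnchor: Rule = { pattern: /(^|\s)--no-verify\b/i, reason: 'example' }

const cases: Case[] = [
  { cmd: 'git commit --no-verify -m wip', shouldMatch: true },
  { cmd: 'git commit -m wip', shouldMatch: false },
]

const missed = failingCases(withLeadingBoundary, cases) // ['git commit --no-verify -m wip']
const clean = failingCases(withWhitespaceAnchor, cases) // []
```

The same fixture style could back the "Add tests" task: one list of commands per rule, with the expected match result next to each.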

### 2. Extend PreToolUse to all tools (Layer 1 — static rules, ~0ms)

| Tool | Rule Type | hard_deny | soft_deny | allow |
|------|-----------|-----------|-----------|-------|
| **Bash** | Regex | rm -rf /, mkfs, dd, curl\|bash | git push --force, npm publish, kubectl apply | npm test, git status, ls |
| **Write/Edit** | Path-based | — | `.env`, `.claude/settings`, CI configs | project-relative paths |
| **WebFetch** | URL-based | — | paste services, script downloads | localhost, known dev services |
| **Safe tools** | Instant allow | — | — | Read, Glob, Grep, LSP, TaskCreate, etc. |
| **Unknown tools** | Passthrough | — | — | → Layer 2 |

Change `matcher` from `"Bash"` to `""` (all tools).
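
A minimal sketch of how the table could translate into a per-tool dispatcher. Everything here is illustrative: `classifyTool`, the pattern lists, and the input field names `file_path` and `url` are assumptions, and the patterns are examples rather than a vetted rule set. Absolute paths are left as `unknown` because deciding whether they are inside the project requires resolving them against the project root, which this sketch does not do.

```typescript
type Tier = 'hard_deny' | 'soft_deny' | 'allow' | 'unknown'

// Tool names taken from the table; the exact allowlist is an assumption.
const SAFE_TOOLS = new Set(['Read', 'Glob', 'Grep', 'LSP', 'TaskCreate'])

// Illustrative patterns only.
const SENSITIVE_PATHS = [
  /(^|\/)\.env(\.|$)/,             // .env, .env.local
  /(^|\/)\.claude\/settings/,      // agent settings
  /(^|\/)\.github\/workflows\//,   // one example of a CI config location
]
const SUSPICIOUS_URLS = [/^https?:\/\/(www\.)?pastebin\.com\//i, /\.(sh|ps1)(\?|$)/i]
const LOCAL_URLS = [/^https?:\/\/(localhost|127\.0\.0\.1)(:\d+)?(\/|$)/i]

function readString(input: Record<string, unknown>, key: string): string {
  const value = input[key]
  return typeof value === 'string' ? value : ''
}

function classifyTool(toolName: string, toolInput: Record<string, unknown>): Tier {
  if (SAFE_TOOLS.has(toolName)) return 'allow'

  if (toolName === 'Write' || toolName === 'Edit') {
    const path = readString(toolInput, 'file_path') // assumed field name
    if (path === '') return 'unknown'
    if (SENSITIVE_PATHS.some(p => p.test(path))) return 'soft_deny'
    // Relative paths without '..' segments are treated as project-scoped.
    if (!path.startsWith('/') && !path.split('/').includes('..')) return 'allow'
    return 'unknown'
  }

  if (toolName === 'WebFetch') {
    const url = readString(toolInput, 'url') // assumed field name
    if (SUSPICIOUS_URLS.some(p => p.test(url))) return 'soft_deny'
    if (LOCAL_URLS.some(p => p.test(url))) return 'allow'
    return 'unknown'
  }

  // Bash has its own classifier; any other tool goes to Layer 2.
  return 'unknown'
}
```

Sensitive-path checks run before the project-scoped allow so that a relative `.env` is still routed to Layer 2.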

### 3. Improve PermissionRequest prompt (Layer 2 — AI review)

Key improvements based on auto-mode analysis:

- **Add intent judgment**: "Did the user explicitly request this action?" (auto-mode core principle — catches overeager behavior, essential for soft_deny items)
- **Add context about soft_deny**: Layer 2 receives commands that Layer 1 flagged as "potentially dangerous but possibly intended" — the AI should know this
- **Add missing DENY rules from auto-mode defaults**:
  - Data exfiltration to external endpoints
  - Supply chain (agent-chosen packages vs declared dependencies)
  - External system writes (closing issues, posting comments user didn't ask for)
  - Content fabrication / impersonation
  - Credential exploration (scanning behavior itself is the violation)
- **Add ALLOW rules** (reduces false positives): project-scoped ops, declared deps, read-only network
- **Add tool-specific guidance**: Write/Edit path rules, WebFetch URL rules, Agent subagent rules
- **Change matcher** from `"Bash"` to `""` (all tools)
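
The bullets above could be assembled into the hook prompt roughly as follows. This is a sketch with illustrative wording, not the shipped prompt: `PROMPT_SECTIONS` and `buildReviewPrompt` are hypothetical names, and the only element taken from the existing setup is the `$ARGUMENTS` placeholder.

```typescript
// Hypothetical prompt sections; the wording is illustrative only.
const PROMPT_SECTIONS: string[] = [
  'You review a single tool call before it runs.',
  // soft_deny context: tell the reviewer what Layer 1 has already filtered.
  'Static rules have already blocked known-destructive calls and allowed known-safe ones. ' +
    'What reaches you is either unrecognized or potentially dangerous but possibly intended.',
  // Intent judgment: the core question.
  'Core question: did the user explicitly request this action?',
  'Deny if the call: sends data to external endpoints; installs packages the agent chose ' +
    'rather than declared dependencies; writes to external systems the user did not ask for; ' +
    'fabricates content or impersonates someone; scans for credentials.',
  'Allow if the call is: scoped to the project; installing declared dependencies; ' +
    'a read-only network request.',
  'If you cannot tell, return an empty response so the user is asked.',
  'Tool call (tool_name and tool_input only): $ARGUMENTS',
]

// Joins the sections with blank lines so each rule group reads as a paragraph.
function buildReviewPrompt(sections: string[] = PROMPT_SECTIONS): string {
  return sections.join('\n\n')
}
```

Keeping the sections in an array makes it straightforward to add tool-specific guidance (Write/Edit paths, WebFetch URLs, Agent subagents) as separate entries.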

### 4. Consider model optimization

- Current: `model: "sonnet"` for all AI reviews
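- Proposed: `model: "haiku"`, which should be sufficient for classification while being cheaper and faster (the ~3x speedup is an estimate, to be confirmed by the haiku vs sonnet evaluation in Tasks)
- Layer 1 regex already resolves most commands; Layer 2 only sees soft_deny matches and unknown commands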
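
One way to compare the two models is to score each against the same labeled set of tool calls. The sketch below assumes a hypothetical `Reviewer` function that wraps a model call; it is synchronous here for brevity, whereas a real reviewer would be asynchronous. False allows are counted separately because allowing something that should have been denied is the costly error.

```typescript
type Verdict = 'allow' | 'deny' | 'passthrough'

interface LabeledCase { input: string; expected: Verdict }

// Hypothetical stand-in for "ask model X to review this tool call".
type Reviewer = (input: string) => Verdict

interface Score { accuracy: number; falseAllows: number }

function scoreReviewer(review: Reviewer, cases: LabeledCase[]): Score {
  let correct = 0
  let falseAllows = 0
  for (const c of cases) {
    const got = review(c.input)
    if (got === c.expected) correct++
    // An allow where a deny was expected is tracked on its own.
    if (got === 'allow' && c.expected === 'deny') falseAllows++
  }
  const accuracy = cases.length === 0 ? 0 : correct / cases.length
  return { accuracy, falseAllows }
}
```

Running `scoreReviewer` once per model over the same cases gives directly comparable numbers; a cheaper model with equal accuracy but more false allows would not be a safe swap.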

## Design Principles (from auto-mode source analysis)

1. **3-tier classification**: hard_deny (absolute block) / soft_deny (AI judges intent) / allow (instant pass)
2. **Reasoning-blind**: `$ARGUMENTS` naturally provides only `tool_name` + `tool_input` (no agent reasoning text, no tool results) — this is correct
3. **Minimize AI calls**: Static rules in Layer 1 should handle 80%+ of decisions at zero cost. Soft deny items are the primary AI workload.
4. **Intent over pattern**: The AI prompt should judge "did the user want this?" not just "is this a known attack pattern?"
5. **Fail-open for unknowns**: The hooks pass unknown tools/commands through instead of blocking them, so the user sees the normal permission prompt (never a silent allow or deny)

## Complementary with Sandbox

This enhancement works best alongside `sandbox: { enabled: true }`:
- **Sandbox**: kernel-level isolation (filesystem + network boundaries)
- **Gatekeeper**: intent-level judgment (within allowed boundaries)
- Together: the sandbox bounds the worst case, and Gatekeeper handles the everyday judgment calls

## References

- [Anthropic: Claude Code Auto Mode](https://www.anthropic.com/engineering/claude-code-auto-mode)
- [Anthropic: Claude Code Sandboxing](https://www.anthropic.com/engineering/claude-code-sandboxing)
- [Claude Code Hooks Documentation](https://code.claude.com/docs/en/hooks)
- Source: `src/utils/permissions/yoloClassifier.ts`, `src/utils/permissions/classifierDecision.ts`

## Tasks

- [ ] Implement 3-tier decision system (hard_deny / soft_deny / allow) in PreToolUse
- [ ] Define soft_deny rules for Bash (git force push, npm publish, kubectl apply, self-modification, security weakening)
- [ ] Define soft_deny rules for Write/Edit (.env, .claude/settings, CI configs)
- [ ] Define soft_deny rules for WebFetch (paste services, script downloads)
- [ ] Extend `pre-tool-use.ts` evaluate function to handle Write, Edit, WebFetch, and safe tool allowlist
- [ ] Add missing Bash hard_deny rules
- [ ] Update PermissionRequest prompt with intent judgment, soft_deny context, and auto-mode rules
- [ ] Change both hook matchers from `"Bash"` to `""`
- [ ] Evaluate haiku vs sonnet for Layer 2 accuracy
- [ ] Add tests for hard_deny, soft_deny, and allow classifications
- [ ] Update README with 3-tier system and all-tool coverage documentation
