
Gatekeeper v2: Extend to all tools + improve AI prompt with auto-mode rules #135

@amondnet

Summary

Extend the Gatekeeper plugin from Bash-only coverage to all Claude Code tools (Write, Edit, WebFetch, Agent, etc.) and improve the PermissionRequest agent hook prompt with rules derived from Claude Code's built-in auto mode classifier.

Background

Analysis of the Claude Code source (src/utils/permissions/yoloClassifier.ts) and Anthropic's engineering blog shows that the built-in auto mode uses more than 25 DENY rules and 7 ALLOW rules. The current Gatekeeper covers only Bash commands, leaving every other tool unprotected.

The built-in auto mode also distinguishes between hard deny (always blocked) and soft deny (blocked unless the user explicitly requested the action). The current Gatekeeper treats all denials the same; the two cases should be handled differently.

Current State (v1)

  • PreToolUse: Bash only — regex DENY/ALLOW rules + chain command parser
  • PermissionRequest: Bash only — agent hook with 7 attack pattern checks
  • Other tools: No coverage (Write, Edit, WebFetch, Agent, etc.)
  • No soft_deny: All denials are hard blocks — no intent-aware classification

Proposed Changes

1. Introduce 3-tier decision system (hard_deny / soft_deny / allow)

The key insight from auto-mode source analysis: not all denials are equal. Some actions should be absolutely blocked (rm -rf /), while others should be blocked unless the user explicitly requested them (git push --force).

Hook flow with soft_deny

PreToolUse (Layer 1 — static rules, ~0ms)
  │
  ├─ hard_deny  → exit code 2 + stderr message → tool blocked immediately
  │                (rm -rf /, mkfs, dd if=/dev/zero, etc.)
  │
  ├─ allow      → exit code 0 + JSON { permissionDecision: "allow" }
  │                (npm test, git status, ls, etc.)
  │
  ├─ soft_deny  → return null (passthrough) → proceeds to PermissionRequest
  │                (git push --force, npm publish, kubectl apply, etc.)
  │
  └─ unknown    → return null (passthrough) → proceeds to PermissionRequest

PermissionRequest (Layer 2 — AI review, ~2-5s)
  │
  ├─ AI judges: "Did the user explicitly request this?"
  │   ├─ Yes → { decision: { behavior: "allow" } }
  │   └─ No  → { decision: { behavior: "deny", message: "..." } }
  │
  └─ AI unsure → {} (empty response = passthrough to next hook)
      └─ No more hooks → user sees permission prompt (= ask)
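The Layer 1 half of this flow reduces to mapping a decision onto the hook's process contract. The sketch below assumes the Claude Code hook convention in which exit code 2 blocks the call and surfaces stderr to the model, while exit code 0 with a `hookSpecificOutput` object on stdout carries `permissionDecision`; check the field names against the hooks reference. `HookOutput` and `toHookOutput` are hypothetical names.

```typescript
type Decision = 'hard_deny' | 'soft_deny' | 'allow'

// What the hook process should do: exit code plus stdout/stderr text.
interface HookOutput {
  exitCode: number
  stdout: string
  stderr: string
}

// Hypothetical mapping from a Layer 1 result to the PreToolUse process contract.
// `null` means no static rule matched (unknown command).
function toHookOutput(result: { decision: Decision; reason: string } | null): HookOutput {
  if (result === null || result.decision === 'soft_deny') {
    // Passthrough: exit 0 with no output, so the PermissionRequest hook decides.
    return { exitCode: 0, stdout: '', stderr: '' }
  }
  if (result.decision === 'hard_deny') {
    // Block immediately; stderr is what the model sees as the reason.
    return { exitCode: 2, stdout: '', stderr: `Blocked by Gatekeeper: ${result.reason}` }
  }
  // Allow: exit 0 with a JSON permission decision on stdout (assumed envelope).
  const payload = {
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      permissionDecisionReason: result.reason,
    },
  }
  return { exitCode: 0, stdout: JSON.stringify(payload), stderr: '' }
}
```

The entry point would then write `stdout` and `stderr` and exit with `exitCode`. Keeping soft_deny and unknown on the same silent path is what lets Layer 2 see both without Layer 1 needing to know anything about intent.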

Classification examples

| Command | Layer 1 | Reason |
|---|---|---|
| rm -rf / | hard_deny | Irreversible system destruction |
| mkfs.ext4 /dev/sda | hard_deny | Disk format |
| dd if=/dev/zero of=/dev/sda | hard_deny | Disk zeroing |
| git push --force | soft_deny → AI | Destructive but sometimes intended |
| git push origin main | soft_deny → AI | Bypasses PR review but sometimes intended |
| npm publish | soft_deny → AI | Public release, may be intended |
| kubectl apply | soft_deny → AI | Infrastructure change, may be intended |
| terraform apply | soft_deny → AI | Infrastructure change, may be intended |
| npm test | allow | Safe dev command |
| git status | allow | Read-only |
| ls, cat, grep | allow | Read-only inspection |

Implementation in PreToolUse

type Decision = 'hard_deny' | 'soft_deny' | 'allow'

interface Rule {
  pattern: RegExp // no /g flag: test() on a global regex advances lastIndex between calls
  reason: string
}

interface Classification {
  decision: Decision
  reason: string
}

// HARD_DENY_RULES, SOFT_DENY_RULES and ALLOW_RULES are Rule[] lists defined alongside this function.
function classifyBash(cmd: string): Classification | null {
  // 1. Hard deny: absolute blocks, never passthrough
  for (const rule of HARD_DENY_RULES) {
    if (rule.pattern.test(cmd)) return { decision: 'hard_deny', reason: rule.reason }
  }

  // 2. Soft deny: passthrough to AI for intent judgment
  for (const rule of SOFT_DENY_RULES) {
    if (rule.pattern.test(cmd)) return { decision: 'soft_deny', reason: rule.reason }
  }

  // 3. Allow: safe commands
  for (const rule of ALLOW_RULES) {
    if (rule.pattern.test(cmd)) return { decision: 'allow', reason: rule.reason }
  }

  return null // unknown → passthrough to AI
}

// In the main evaluate function:
const result = classifyBash(cmd)
if (result === null) return null  // unknown → passthrough → PermissionRequest hook handles it

switch (result.decision) {
  case 'hard_deny':
    return makeDecision('deny', result.reason)  // exit code 2
  case 'soft_deny':
    return null  // passthrough → PermissionRequest hook handles it
  case 'allow':
    return makeDecision('allow', result.reason)  // exit code 0 + JSON
}
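A self-contained, runnable version of the same three-pass classifier is sketched below. The rule entries are illustrative placeholders drawn from the classification examples, not the plugin's actual rule set, and the prefix-anchored allow rules assume the v1 chain parser has already split compound commands such as `a && b`. The point is the precedence: hard deny beats soft deny, which beats allow, and anything unmatched returns null.

```typescript
type Decision = 'hard_deny' | 'soft_deny' | 'allow'

interface Rule {
  pattern: RegExp // no /g flag: test() on a global regex is stateful
  reason: string
}

interface Classification {
  decision: Decision
  reason: string
}

// Illustrative rules only; not the full Gatekeeper rule set.
const HARD_DENY_RULES: Rule[] = [
  { pattern: /\brm\s+-(rf|fr)\s+\/(\s|$)/i, reason: 'Irreversible system destruction' },
  { pattern: /\bmkfs\b/i, reason: 'Disk format' },
  { pattern: /\bdd\b.*\bof=\/dev\/(sd|nvme|hd|disk)/i, reason: 'Raw write to a disk device' },
]

const SOFT_DENY_RULES: Rule[] = [
  { pattern: /\bgit\s+push\b.*\s(-f|--force)\b/i, reason: 'Force push needs user intent verification' },
  { pattern: /\bgit\s+push\b.*\b(main|master)\b/i, reason: 'Push to default branch needs user intent verification' },
  { pattern: /^npm\s+publish\b/i, reason: 'Package publish needs user intent verification' },
  { pattern: /^(kubectl\s+apply|terraform\s+apply)\b/i, reason: 'Infrastructure change needs user intent verification' },
]

// Prefix-anchored, so only safe on a single simple command (assumes prior chain splitting).
const ALLOW_RULES: Rule[] = [
  { pattern: /^npm\s+test\b/i, reason: 'Safe dev command' },
  { pattern: /^git\s+(status|diff|log)\b/i, reason: 'Read-only git command' },
  { pattern: /^(ls|cat|grep)\b/i, reason: 'Read-only inspection' },
]

// The first tier with a matching rule wins: hard_deny, then soft_deny, then allow.
function classifyBash(cmd: string): Classification | null {
  const tiers: [Decision, Rule[]][] = [
    ['hard_deny', HARD_DENY_RULES],
    ['soft_deny', SOFT_DENY_RULES],
    ['allow', ALLOW_RULES],
  ]
  for (const [decision, rules] of tiers) {
    const hit = rules.find((rule) => rule.pattern.test(cmd))
    if (hit) return { decision, reason: hit.reason }
  }
  return null // unknown: passthrough to Layer 2
}
```

Run over the classification examples, this returns hard_deny for the rm, mkfs and dd rows, soft_deny for the git push, npm publish, kubectl apply and terraform apply rows, allow for npm test, git status and ls, and null for an unmatched command such as `make build`.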

Soft deny rules (passthrough to AI)

These mirror auto-mode's soft_deny rules: actions that are dangerous but sometimes intentionally requested.

const SOFT_DENY_RULES: Rule[] = [
  // Git destructive (-f and --force-with-lease also count as force pushes)
  { pattern: /\bgit\s+push\b.*\s(-f|--force)\b/i, reason: 'Force push needs user intent verification' },
  { pattern: /\bgit\s+push\b.*\b(main|master)\b/i, reason: 'Push to default branch needs user intent verification' },
  { pattern: /\bgit\s+reset\s+--hard\b/i, reason: 'Hard reset needs user intent verification' },
  { pattern: /\bgit\s+clean\s+-[a-z]*f/i, reason: 'Git clean needs user intent verification' },

  // Deploy/publish (Pulumi's deploy command is `pulumi up`, not `pulumi apply`)
  { pattern: /^npm\s+publish\b/i, reason: 'Package publish needs user intent verification' },
  { pattern: /^(terraform\s+apply|pulumi\s+up)\b/i, reason: 'Infrastructure apply needs user intent verification' },
  { pattern: /^kubectl\s+(apply|delete)\b/i, reason: 'Kubernetes mutation needs user intent verification' },

  // Self-modification (no leading \b before '.': \b cannot match between a space or '/' and '.')
  { pattern: /(\.claude\/settings|\bCLAUDE\.md\b)/i, reason: 'Agent self-modification needs user intent verification' },

  // Security weakening (no leading \b before '--', for the same reason)
  { pattern: /--no-verify\b/i, reason: 'Skipping verification needs user intent verification' },
  { pattern: /\bchmod\s+(-[a-z]+\s+)*0?777\b/i, reason: 'Broad permission change needs user intent verification' },
]
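Word boundaries are an easy place for these patterns to go wrong: `\b` only matches between a word character and a non-word character, so `\b--no-verify` does not match after a space, and `\b\.claude` does not match after a space or `/`. A small table-driven check catches this class of bug. The `rules`, `cases` and `checkRules` names below are hypothetical, and the patterns are variants that drop the leading `\b` wherever the pattern starts with punctuation.

```typescript
interface Rule {
  pattern: RegExp
  reason: string
}

interface Case {
  input: string
  shouldMatch: boolean
}

// Leading \b removed where the pattern begins with '-' or '.'.
const rules: Record<string, Rule> = {
  selfModification: { pattern: /(\.claude\/settings|\bCLAUDE\.md\b)/i, reason: 'Agent self-modification' },
  noVerify: { pattern: /--no-verify\b/i, reason: 'Skipping verification' },
  forcePush: { pattern: /\bgit\s+push\b.*\s(-f|--force)\b/i, reason: 'Force push' },
}

const cases: Record<string, Case[]> = {
  selfModification: [
    { input: 'echo {} > ~/.claude/settings.json', shouldMatch: true },
    { input: 'vim CLAUDE.md', shouldMatch: true },
    { input: 'cat README.md', shouldMatch: false },
  ],
  noVerify: [
    { input: 'git commit --no-verify -m wip', shouldMatch: true },
    { input: 'git commit -m wip', shouldMatch: false },
  ],
  forcePush: [
    { input: 'git push -f origin feature', shouldMatch: true },
    { input: 'git push --force-with-lease', shouldMatch: true }, // still a force push
    { input: 'git push origin feature', shouldMatch: false },
  ],
}

// Returns a description of every case whose outcome differs from the expectation.
function checkRules(): string[] {
  const failures: string[] = []
  for (const [name, rule] of Object.entries(rules)) {
    for (const c of cases[name] ?? []) {
      if (rule.pattern.test(c.input) !== c.shouldMatch) {
        failures.push(`${name}: "${c.input}" expected match=${c.shouldMatch}`)
      }
    }
  }
  return failures
}
```

Because the cases live in plain data, each new soft_deny rule can ship with its own positive and negative examples, which also covers the "Add tests" task for this tier.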

2. Extend PreToolUse to all tools (Layer 1 — static rules, ~0ms)

| Tool | Rule type | hard_deny | soft_deny | allow |
|---|---|---|---|---|
| Bash | Regex | rm -rf /, mkfs, dd, curl piped to bash | git push --force, npm publish, kubectl apply | npm test, git status, ls |
| Write/Edit | Path-based | - | .env, .claude/settings, CI configs | Project-relative paths |
| WebFetch | URL-based | - | Paste services, script downloads | localhost, known dev services |
| Safe tools | Instant allow | - | - | Read, Glob, Grep, LSP, TaskCreate, etc. |
| Unknown tools | Passthrough to Layer 2 | - | - | - |

Change matcher from "Bash" to "" (all tools).
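With an empty matcher, the PreToolUse entry point receives every tool call and has to dispatch on the tool name. The sketch below assumes the hook input carries `tool_name` and `tool_input`, with `file_path` for Write/Edit and `url` for WebFetch, that paths are absolute POSIX paths, and that the project root is passed in explicitly. `classifyToolCall`, `SAFE_TOOLS`, `SENSITIVE_PATHS` and `PASTE_HOSTS` are hypothetical, and their contents are illustrative rather than the final lists.

```typescript
type Decision = 'hard_deny' | 'soft_deny' | 'allow'

interface Classification {
  decision: Decision
  reason: string
}

// Illustrative lists taken from the table above; not exhaustive.
const SAFE_TOOLS = new Set(['Read', 'Glob', 'Grep', 'LSP', 'TaskCreate'])
const SENSITIVE_PATHS: RegExp[] = [/(^|\/)\.env(\.|$)/, /(^|\/)\.claude\/settings/, /(^|\/)\.github\/workflows\//]
const PASTE_HOSTS = new Set(['pastebin.com', 'paste.ee'])

// Collapse '.' and '..' segments of an absolute POSIX path.
function normalizePosix(p: string): string {
  const out: string[] = []
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue
    if (seg === '..') out.pop()
    else out.push(seg)
  }
  return '/' + out.join('/')
}

function isInside(filePath: string, root: string): boolean {
  const r = normalizePosix(root)
  return filePath === r || filePath.startsWith(r === '/' ? '/' : r + '/')
}

function classifyToolCall(
  toolName: string,
  toolInput: Record<string, unknown>,
  projectDir: string,
): Classification | null {
  if (SAFE_TOOLS.has(toolName)) return { decision: 'allow', reason: 'Read-only tool' }

  if (toolName === 'Write' || toolName === 'Edit') {
    const rawPath = toolInput['file_path']
    if (typeof rawPath !== 'string') return null
    const filePath = normalizePosix(rawPath)
    // Sensitive files are checked first: .env inside the project is still soft_deny.
    if (SENSITIVE_PATHS.some((re) => re.test(filePath))) {
      return { decision: 'soft_deny', reason: 'Sensitive file needs user intent verification' }
    }
    if (isInside(filePath, projectDir)) return { decision: 'allow', reason: 'Project-relative path' }
    return null // outside the project: let Layer 2 decide
  }

  if (toolName === 'WebFetch') {
    const rawUrl = toolInput['url']
    if (typeof rawUrl !== 'string') return null
    let url: URL
    try {
      url = new URL(rawUrl)
    } catch {
      return null // malformed URL: let Layer 2 (or the user) decide
    }
    if (PASTE_HOSTS.has(url.hostname)) return { decision: 'soft_deny', reason: 'Paste service' }
    if (/\.(sh|ps1)$/i.test(url.pathname)) return { decision: 'soft_deny', reason: 'Script download' }
    if (url.hostname === 'localhost' || url.hostname === '127.0.0.1') {
      return { decision: 'allow', reason: 'Local dev service' }
    }
    return null
  }

  return null // Bash goes through classifyBash; unknown tools pass through to Layer 2
}
```

Normalizing before the prefix check matters: without it, a path like `/project/../etc/passwd` would look project-relative.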

3. Improve PermissionRequest prompt (Layer 2 — AI review)

Key improvements based on auto-mode analysis:

  • Add intent judgment: "Did the user explicitly request this action?" (auto-mode core principle — catches overeager behavior, essential for soft_deny items)
  • Add context about soft_deny: Layer 2 receives commands that Layer 1 flagged as "potentially dangerous but possibly intended" — the AI should know this
  • Add missing DENY rules from auto-mode defaults:
    • Data exfiltration to external endpoints
    • Supply chain (agent-chosen packages vs declared dependencies)
    • External system writes (closing issues, posting comments user didn't ask for)
    • Content fabrication / impersonation
    • Credential exploration (scanning behavior itself is the violation)
  • Add ALLOW rules (reduces false positives): project-scoped ops, declared deps, read-only network
  • Add tool-specific guidance: Write/Edit path rules, WebFetch URL rules, Agent subagent rules
  • Change matcher from "Bash" to "" (all tools)
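One way to keep these additions organized is to assemble the Layer 2 prompt from named sections, so the intent question, the soft_deny context, and the DENY/ALLOW lists stay separately editable and testable. The section wording and `buildReviewPrompt` are hypothetical; only the `$ARGUMENTS` placeholder (tool_name + tool_input) and the three response shapes come from the flow described earlier.

```typescript
interface PromptSection {
  title: string
  lines: string[]
}

// Hypothetical wording; each section maps to one bullet of the proposal above.
const SECTIONS: PromptSection[] = [
  {
    title: 'Context',
    lines: [
      'You review a single tool call that static rules did not allow outright.',
      'It may be a soft_deny item: potentially dangerous but possibly intended.',
    ],
  },
  { title: 'Core question', lines: ['Did the user explicitly request this action?'] },
  {
    title: 'Deny when',
    lines: [
      'Data is sent to an external endpoint the user did not name.',
      'A package is installed that the agent chose rather than a declared dependency.',
      'An external system is written to (issues closed, comments posted) without a request.',
      'Content is fabricated or someone is impersonated.',
      'The call searches for or enumerates credentials (the scanning itself is the violation).',
    ],
  },
  {
    title: 'Allow when',
    lines: [
      'The operation stays inside the project directory.',
      'Only declared dependencies are installed.',
      'Network access is read-only.',
    ],
  },
  {
    title: 'Tool-specific',
    lines: [
      'Write/Edit: treat .env, .claude/settings and CI configs as sensitive.',
      'WebFetch: treat paste services and script downloads as suspicious.',
    ],
  },
  {
    title: 'Response',
    lines: [
      'Allow: {"decision": {"behavior": "allow"}}',
      'Deny: {"decision": {"behavior": "deny", "message": "<reason>"}}',
      'Unsure: {}',
    ],
  },
]

// Joins the sections into one prompt; the tool call itself is appended last.
function buildReviewPrompt(sections: PromptSection[]): string {
  const body = sections
    .map((s) => `## ${s.title}\n${s.lines.map((l) => `- ${l}`).join('\n')}`)
    .join('\n\n')
  return `${body}\n\n## Tool call\n$ARGUMENTS`
}
```

Appending only `$ARGUMENTS` at the end keeps the reviewer reasoning-blind: nothing in the assembled prompt carries the agent's own explanation.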

4. Consider model optimization

  • Current: model: "sonnet" for all AI reviews
  • Proposed: model: "haiku", likely sufficient for classification while being cheaper and roughly 3x faster (to be confirmed)
  • Layer 1 regexes already resolve most commands, so Layer 2 only sees the ambiguous remainder
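Comparing haiku and sonnet needs a score that weighs errors asymmetrically: a false allow on a hard-to-reverse action costs far more than a false deny, which mostly costs friction. A minimal scorer over a hand-labeled set might look like this; the `LabeledCase` shape and the metric names are assumptions, and collecting the model verdicts is left out.

```typescript
type Verdict = 'allow' | 'deny' | 'passthrough'

interface LabeledCase {
  expected: Verdict // human label for the tool call
  actual: Verdict // what the model returned for the same call
}

interface Score {
  accuracy: number
  falseAllowRate: number // share of non-allow labels that the model allowed
  falseDenyRate: number // share of allow labels that the model denied
}

// Avoids NaN when a category is empty.
function ratio(num: number, den: number): number {
  return den === 0 ? 0 : num / den
}

function score(cases: LabeledCase[]): Score {
  const correct = cases.filter((c) => c.actual === c.expected).length
  const shouldNotAllow = cases.filter((c) => c.expected !== 'allow')
  const shouldAllow = cases.filter((c) => c.expected === 'allow')
  return {
    accuracy: ratio(correct, cases.length),
    falseAllowRate: ratio(shouldNotAllow.filter((c) => c.actual === 'allow').length, shouldNotAllow.length),
    falseDenyRate: ratio(shouldAllow.filter((c) => c.actual === 'deny').length, shouldAllow.length),
  }
}
```

A model that matches on accuracy but has a higher false-allow rate is the worse choice for this layer, so the two rates are worth reporting separately rather than folding into one number.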

Design Principles (from auto-mode source analysis)

  1. 3-tier classification: hard_deny (absolute block) / soft_deny (AI judges intent) / allow (instant pass)
  2. Reasoning-blind: $ARGUMENTS naturally provides only tool_name + tool_input (no agent reasoning text, no tool results). This is intended: the reviewer judges the action itself, not the agent's justification for it
  3. Minimize AI calls: Static rules in Layer 1 should handle 80%+ of decisions at zero cost. Soft deny items are the primary AI workload.
  4. Intent over pattern: The AI prompt should judge "did the user want this?" not just "is this a known attack pattern?"
  5. Defer on unknowns: Unknown tools/commands → passthrough → user prompt (neither a silent allow nor a silent deny)

Complementary with Sandbox

This enhancement works best alongside the sandbox setting (sandbox: { enabled: true }):

  • Sandbox: kernel-level isolation (filesystem + network boundaries)
  • Gatekeeper: intent-level judgment (within allowed boundaries)
  • Together: the sandbox bounds the worst case, while Gatekeeper handles everyday decisions

Tasks

  • Implement 3-tier decision system (hard_deny / soft_deny / allow) in PreToolUse
  • Define soft_deny rules for Bash (git force push, npm publish, kubectl apply, self-modification, security weakening)
  • Define soft_deny rules for Write/Edit (.env, .claude/settings, CI configs)
  • Define soft_deny rules for WebFetch (paste services, script downloads)
  • Extend pre-tool-use.ts evaluate function to handle Write, Edit, WebFetch, and safe tool allowlist
  • Add missing Bash hard_deny rules
  • Update PermissionRequest prompt with intent judgment, soft_deny context, and auto-mode rules
  • Change both hook matchers from "Bash" to ""
  • Evaluate haiku vs sonnet for Layer 2 accuracy
  • Add tests for hard_deny, soft_deny, and allow classifications
  • Update README with 3-tier system and all-tool coverage documentation

Metadata

Labels: p2 (Priority 2 - Medium)
