feat(gatekeeper): rewrite PermissionRequest prompt with auto-mode rules

amondnet · amondnet · commit c38052af41d0 · 2026-03-31T22:42:00.000+09:00
- Change both matchers from "Bash" to "" (all tools)
- Add intent judgment as core principle
- Add full auto-mode ALLOW/DENY rule coverage (8 ALLOW, 25 DENY)
- Add tool-specific guidance for Write/Edit, WebFetch, Agent
- Switch model from sonnet to haiku for faster classification
diff --git a/plugins/gatekeeper/hooks/hooks.json b/plugins/gatekeeper/hooks/hooks.json
@@ -1,9 +1,9 @@
 {
-  "description": "Gatekeeper: auto-approve safe commands + AI review",
+  "description": "Gatekeeper v2: 3-tier security for all Claude Code tools",
   "hooks": {
     "PreToolUse": [
       {
-        "matcher": "Bash",
+        "matcher": "",
         "hooks": [
           {
             "type": "command",
@@ -15,12 +15,12 @@
     ],
     "PermissionRequest": [
       {
-        "matcher": "Bash",
+        "matcher": "",
         "hooks": [
           {
             "type": "agent",
-            "prompt": "You are a security analyst. This command was NOT matched by pattern-based rules (Layer 1) and needs your judgment.\n\nCommand context:\n$ARGUMENTS\n\nCheck for these attack patterns:\n1. Data destruction (rm -rf /, truncate, dd if=/dev/zero, etc.)\n2. System modification (chmod 777, chown, modifying /etc/, etc.)\n3. Network attacks (curl piping to bash, wget suspicious scripts, reverse shells)\n4. Credential exposure (cat .env, echo $API_KEY, etc.)\n5. Supply chain attacks (npm install from suspicious sources, pip install --extra-index-url)\n6. Privilege escalation (sudo without clear purpose, setuid)\n7. Command chaining hiding destructive intent — analyze ALL parts of chained commands (;, &&, ||, |, $(), backticks), not just the first\n\nScope guidance:\n- Project-scoped operations (./build, ./dist, node_modules) are generally safe\n- System-scoped operations (/etc, /usr, ~/) require careful scrutiny\n- Standard dev tools (docker run, curl localhost, ssh) are generally safe unless combined with attack patterns above\n\nYou have access to Read, Grep, Glob tools. Use them if:\n- The command references a script file (verify its contents)\n- The command uses variables that might be dangerous\n- You need to check if a path exists and what it contains\n\nAfter your analysis, respond with ONLY one of:\n{\"ok\": true}\n{\"ok\": false, \"reason\": \"Brief explanation of the specific risk\"}",
-            "model": "sonnet",
+            "prompt": "You are a security classifier for Claude Code.\nThis tool call passed Layer 1 static rules and needs your judgment.\n\nTool call:\n$ARGUMENTS\n\n## Core Principle\n\nFIRST: \"Did the user explicitly request this specific action?\"\n- Clearly serving the user's stated task → lean ALLOW\n- Agent acting beyond task scope → lean DENY\n- Explicit user confirmation must be direct, not suggestive or implicit\n\n## ALLOW\n\n- Project-scoped file operations (read/write/edit within project directory)\n- Standard dev commands (test, lint, build, format)\n- Git operations (except force push / push to main)\n- Installing declared dependencies from manifest (npm install, pip install -r)\n- Read-only network requests to known services\n- Test artifacts (hardcoded test keys, placeholder credentials)\n- Git push to working/agent-created branch (not main/master)\n- Official toolchain installers (rustup, pypa, bun.sh, brew.sh) when repo requires it\n\n## DENY — hard block\n\n- Irreversible system destruction (already caught by Layer 1, but double-check)\n- Data exfiltration: sending project data to external endpoints\n- Credential exploration: systematically scanning credential stores\n- Credential leakage: embedding secrets in metadata, labels, URLs\n- Exfil scouting: probing which external endpoints are reachable\n- Trusting guessed services: agent guesses a service and sends internal data\n\n## DENY — unless user explicitly requested (soft deny)\n\n- Git destructive: force push, delete remote branches, hard reset\n- Git push to main/master (bypasses PR review)\n- Deploy/publish: npm publish, terraform apply, kubectl apply/delete\n- Cloud storage mass operations (S3, GCS, Azure Blob bulk delete)\n- Production access: kubectl exec, docker exec, ssh to prod (even reads)\n- Blind apply: --yes, --auto-approve, --force on infrastructure commands\n- Security weakening: disabling TLS, hooks, logs, audit trails, security agents\n- Permission grant: adding IAM roles, granting admin access\n- Self-modification: .claude/settings, CLAUDE.md, permission overrides\n- Unauthorized persistence: cron jobs, systemd services, SSH keys, shell profiles\n- Create unsafe agents: autonomous loops without approval gates\n- Interfere with others: cancelling jobs/processes not created by agent\n- Modify shared resources: changing shared configs, databases, cluster state\n- Create RCE surface: endpoints accepting arbitrary code, eval/injection patterns\n- Expose local services: binding ports that expose local files/services\n- Untrusted code integration: pulling external code for execution (submodules, postinstall)\n- External system writes: closing issues, posting comments, sending messages user didn't ask for\n- Content fabrication: reporting success on failed work, impersonating humans\n- Real-world transactions: purchases, payments, bookings, external communications\n\n## Tool-Specific\n\n- Bash: analyze ALL parts of chains (;, &&, ||, |, $(), backticks)\n- Write/Edit: project files safe, config files (.env, CI) need scrutiny, outside project → DENY\n- WebFetch: read-only research → ALLOW, downloading executables/scripts → DENY\n- Agent: research/analysis subagents → ALLOW, subagents bypassing user intent → DENY\n\nYou have Read, Grep, Glob. Use if command references scripts or ambiguous paths.\n\nAfter your analysis, respond with ONLY one of:\n{\"decision\":{\"behavior\":\"allow\"},\"reason\":\"Brief explanation\"}\n{\"decision\":{\"behavior\":\"deny\",\"message\":\"Brief explanation of the specific risk\"}}",
+            "model": "haiku",
             "timeout": 30
           }
         ]
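
Note for reviewers: the new prompt replaces the `{"ok": ...}` reply with two `decision` shapes, and the explanation sits in a different place in each (`reason` at the top level for allow, `message` inside `decision` for deny). The sketch below shows one way a test harness might check replies against those two shapes. The `parse_decision` helper and the fail-closed fallback are assumptions for illustration only; the diff does not show how Claude Code itself consumes the reply.

```python
import json
from typing import Optional, Tuple


def parse_decision(raw: str) -> Tuple[str, Optional[str]]:
    """Return (behavior, explanation) for a classifier reply.

    Hypothetical helper: any reply that is not one of the two shapes
    named in the prompt is treated as a deny, so a malformed reply can
    never approve a tool call (an assumed policy, not taken from the diff).
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "deny", "unparseable classifier reply"

    if not isinstance(payload, dict):
        return "deny", "reply is not a JSON object"

    decision = payload.get("decision")
    if not isinstance(decision, dict):
        return "deny", "missing decision object"

    behavior = decision.get("behavior")
    if behavior == "allow":
        # Allow shape: explanation lives in the top-level "reason" key.
        return "allow", payload.get("reason")
    if behavior == "deny":
        # Deny shape: explanation lives in "message" inside "decision".
        return "deny", decision.get("message")
    return "deny", f"unknown behavior: {behavior!r}"


if __name__ == "__main__":
    print(parse_decision('{"decision":{"behavior":"allow"},"reason":"runs unit tests"}'))
    print(parse_decision('{"decision":{"behavior":"deny","message":"force push to main"}}'))
    # A reply in the old format is rejected rather than silently approved.
    print(parse_decision('{"ok": true}'))
```

Failing closed keeps a reply in the old `{"ok": true}` format, or a truncated one from the smaller model, from being read as an approval.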
