Anthropic Security Plugin: Catch Security Issues as Claude Writes Code

Anthropic just shipped a free plugin that reviews code the moment it's written. I review code for a living — here's my honest take on what it catches, what it misses, and whether it's worth turning on.

The 20-second version

AI assistants write code fast. That's the pitch.

The part nobody puts on the slide: they write insecure code just as fast. The same SQL injection, the same hardcoded API key, the same eval() on user input we've been shipping for thirty years, except now it arrives in seconds, and in volume.

Anthropic's fix is a plugin called security-guidance. It's free, and it makes Claude review its own code changes for common vulnerabilities while it works, then fix what it finds in the same session.

The plugin catches issues such as injection, unsafe deserialization, and unsafe DOM APIs before the code reaches a pull request, reducing how much security review falls to human reviewers downstream. Once installed, the plugin runs automatically. There is nothing to invoke and no separate command to remember.

I review code for a living. I came in skeptical. Here's my honest read.

What it actually is

Not a scanner you run after the fact. It lives inside the session where the code is born, and it works in three layers — each catching a different kind of bug at a different moment.

Layer 1 — Pattern matching (instant, zero cost)

The moment Claude writes or edits a file, around 25 regex rules fire. No AI, no latency, no token cost. It's hunting the classic landmines: eval() and new Function(), os.system() and unsafe child_process.exec(), Python pickle deserialization, DOM sinks like innerHTML and dangerouslySetInnerHTML, and command injection in GitHub Actions workflows.

Think of it as a linter with a security grudge.
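
To make the idea concrete, here is a minimal sketch of how a regex layer like this could work. The rule names, patterns, and messages below are my own illustrations, not the plugin's actual rule set, which I have not reproduced here.

```python
import re

# Hypothetical rules for illustration only; these are NOT the plugin's real patterns.
RULES = [
    ("eval-call", re.compile(r"\beval\s*\("), "eval() can execute arbitrary code"),
    ("os-system", re.compile(r"\bos\.system\s*\("), "os.system() hands a string to the shell"),
    ("pickle-load", re.compile(r"\bpickle\.loads?\s*\("), "pickle can run code while deserializing"),
    ("inner-html", re.compile(r"\.innerHTML\s*="), "innerHTML assignment is a DOM XSS sink"),
]


def scan(source: str):
    """Return (line_number, rule_id, message) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern, message in RULES:
            if pattern.search(line):
                findings.append((lineno, rule_id, message))
    return findings


sample = "import os\nos.system('ls ' + user_input)\nresult = eval(expr)\n"
for finding in scan(sample):
    print(finding)
```

A scan like this is cheap because it never parses or understands the code. That is also why it can only flag the shape of a call, not whether the input reaching it is actually attacker-controlled.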

Layer 2 — Turn-end LLM review

After each turn, a separate Claude model (Opus 4.7 by default) reads the git diff and actually reasons about it. This is where the bugs no regex can find get caught: authorization bypass, IDOR (insecure direct object references), weak or misused cryptography, and SSRF (server-side request forgery).
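
Why can't a regex find these? A small IDOR example shows the problem. Everything here (the Invoice type, the in-memory store, the function names) is hypothetical and exists only to illustrate the bug class. The vulnerable and fixed versions differ by one ownership check, and nothing in the syntax of the vulnerable version looks dangerous.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    owner_id: int
    total: float


# Hypothetical in-memory store standing in for a database table.
INVOICES = {
    1: Invoice(1, owner_id=10, total=99.0),
    2: Invoice(2, owner_id=20, total=15.5),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: any authenticated user can read any invoice by guessing its id.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    # The fix is an authorization check tied to the caller's identity.
    if invoice.owner_id != current_user_id:
        raise PermissionError("invoice does not belong to the current user")
    return invoice
```

Spotting the missing check means knowing that invoices have owners and that the caller's identity matters. That is a reasoning task, which is the kind of finding this layer is described as targeting.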

Layer 3 — Commit-time agentic review

On git commit, an SDK-driven reviewer does the deep pass — reading across multiple files for context, catching multi-file vulnerabilities, and cutting false positives before they burn a developer's afternoon.

Why this design is genuinely smart

1. It catches bugs at the cheapest possible second

Every security pro knows the cost curve: a bug caught while the code is being written costs next to nothing; the same bug caught in production costs a war room. Layer 1 fires the moment a file is written or edited. That's exactly where the money should be spent.

2. It is layered, not monolithic

Regex is fast but dumb. LLM review is smart but expensive. Stack them and you get instant coverage on known landmines and reasoning-level coverage on subtle logic flaws — paying the AI cost only where it earns its keep.

3. It is honest about its limits

This is the part I respect most. The docs say it plainly: this is a best-effort tool, not a guarantee. It surfaces findings as suggestions, not hard blocks. And it tells you outright that it does not replace human review, SAST/DAST, dependency scanning, or pen-testing.

In Anthropic's internal testing, the plugin cut security-related comments on pull requests by 30 to 40 percent. Not a claim that security is solved — just real time handed back to senior engineers.

What it detects

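The usual web-vuln suspects: injection, unsafe deserialization, unsafe DOM APIs, authorization bypass, IDOR, weak or misused cryptography, and SSRF. The list maps cleanly onto the OWASP Top 10.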

Install it in two commands

/plugin install security-guidance@claude-plugins-official
/reload-plugins

Prerequisites:

Claude Code CLI ≥ v2.1.144
Python 3.8+ on your PATH
A working API path (Claude subscription, API key, or a third-party provider)

It runs on Claude Opus 4.7 by default, but the model is configurable.

Making it yours — the part teams will care about

Out of the box it knows generic vulnerabilities. The real power is teaching it your threat model.

Encode your own rules in plain English

Drop a claude-security-guidance.md file in one of three places:

~/.claude/claude-security-guidance.md — applies to everything you do
<project>/.claude/claude-security-guidance.md — committed to the repo, shared with the team
<project>/.claude/claude-security-guidance.local.md — local-only overrides (gitignored)
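
If you want to check which of these files exist on a given machine, a small helper can do it. This is a sketch of my own, not part of the plugin: the function name is hypothetical, and it only reports which files are present. It makes no claim about how the plugin merges or prioritizes them.

```python
from pathlib import Path


def find_guidance_files(project_root, home=None):
    """Return the guidance files that exist, in the order listed above.

    Hypothetical helper: it checks for presence only and implies nothing
    about the plugin's own precedence or merge behavior.
    """
    home = Path(home) if home is not None else Path.home()
    project = Path(project_root)
    candidates = [
        home / ".claude" / "claude-security-guidance.md",
        project / ".claude" / "claude-security-guidance.md",
        project / ".claude" / "claude-security-guidance.local.md",
    ]
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    for path in find_guidance_files("."):
        print(path)
```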

The rules read like a Slack message to a new hire:

# Acme security rules
- All SELECTs against `customers`/`orders` must use `db.replica`, not `db.primary`
- Background jobs require service-account creds from `jobs.get_service_account()`
- User-controlled URLs to `requests.get()` need the `acme.net.safe_request` wrapper

Tune the layers with environment variables

SECURITY_GUIDANCE_DISABLE=1: master kill switch
ENABLE_PATTERN_RULES=0: turn off Layer 1
ENABLE_CODE_SECURITY_REVIEW=0: turn off LLM reviews
ENABLE_COMMIT_REVIEW=0: turn off Layer 3
SECURITY_REVIEW_MODEL=claude-sonnet-4-6: use a cheaper model
SG_DUAL_OR=on: parallel review calls (~2x cost, higher recall)
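
Before debugging a quiet or noisy session, it can help to see what your shell is actually exporting. The sketch below is my own, not something the plugin ships. The variable names come from the list above, but the defaults are an assumption: I treat each layer as enabled unless its variable is set to 0, so verify that against the official docs.

```python
import os

# Variable names are from the article; "enabled unless set to 0" is an ASSUMPTION.
LAYER_FLAGS = {
    "pattern rules": "ENABLE_PATTERN_RULES",
    "LLM reviews": "ENABLE_CODE_SECURITY_REVIEW",
    "commit review": "ENABLE_COMMIT_REVIEW",
}


def summarize(env=None):
    """Map each layer to True/False based on the given environment mapping."""
    env = os.environ if env is None else env
    if env.get("SECURITY_GUIDANCE_DISABLE") == "1":
        # Master kill switch: report every layer as off.
        return {name: False for name in LAYER_FLAGS}
    return {name: env.get(var, "1") != "0" for name, var in LAYER_FLAGS.items()}


if __name__ == "__main__":
    for layer, enabled in summarize().items():
        print(f"{layer}: {'on' if enabled else 'off'}")
```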

Too noisy? Drop to a cheaper model, or document why a flagged line is safe — inline or in your guidance file — and it learns to leave it alone.

Read this before you roll it out: data and privacy

First question I ask of anything that "reads your code." Here's what leaves your machine:

Changed file paths and diff hunks
Relevant file contents (for context)
The contents of your claude-security-guidance.md

Where it goes depends on your config:

Default: api.anthropic.com
LLM gateway: your gateway URL
Bedrock / Vertex: your provider's endpoint

Debug logs sit at ~/.claude/security/log.txt, and per Anthropic, full file contents are not uploaded there. Under strict data-residency rules? Route it through Bedrock or Vertex and keep the traffic inside your own boundary.

My verdict

security-guidance is not a silver bullet, and to Anthropic's credit, it never pretends to be.

What it is: a smart, layered, free shift-left control that catches a meaningful slice of vulnerabilities at the cheapest possible moment — the second the code is written.

If your team already leans on Claude Code, turning this on is close to a no-brainer. It won't retire your SAST pipeline or your human reviewers. It will just make all of them argue with the AI a little less often.

Shift-left security used to be a slogan. This is one of the most concrete versions of it I have seen — and it is free.

Sources and further reading

ClaudeDevs announcement on X: https://x.com/ClaudeDevs/status/2059385239781384341

Official plugin on GitHub: https://github.com/anthropics/claude-plugins-official/tree/main/plugins/security-guidance

Disclaimer: independent analysis based on Anthropic's public docs and reporting. Verify configuration and data-handling details against the official docs before deploying in a regulated environment.

If you found this useful

Follow me for more hands-on DevSecOps and AI-tooling breakdowns. Got the plugin running on your stack? Drop your false-positive rate in the responses — I'm collecting real-world numbers.
