AI vulnerability scanning needs an attacker story

Most AI security scanning demos have the same problem.

Frank

~7 min read · May 19, 2026 (Updated: May 19, 2026)

They look impressive for five minutes, then they collapse into a pile of suspicious-sounding guesses.

The model points at a parser and says "possible buffer overflow." It sees a string concatenated into a shell command and says "possible command injection." It notices a file path and says "possible path traversal." Sometimes it's right. Often it isn't. And when it's wrong, the output still looks polished enough to waste a human reviewer's afternoon.

That isn't good enough.

An upcoming Swival release is going to include a major revamp of /audit, Swival's security-audit command. The goal is pretty simple: make agentic security review less like a vulnerability-themed autocomplete run and more like a rigorous harness for finding, checking, and reporting real bugs.

The current public version of /audit has already found a lot of vulnerabilities in real-world software. Some of those reports are public in the Swival security-audits repository, which collects automated audits run against widely used open-source projects.

Other audits have surfaced complex, exploitable vulnerabilities that were reported privately to maintainers, so I'm not going to name projects or describe triggers here.

That matters because the revamp isn't trying to make /audit useful for the first time.

It's building on something that already works.

Until this lands publicly, people who want this kind of audit run against their own code can reach out to me privately.

If you have been following AI security research, this is in the same broad category as Project Glasswing: using powerful models defensively to find vulnerabilities in important software before attackers do.

The difference is the shape. Project Glasswing is the large, controlled-access, frontier-model version of that idea. Swival is the open CLI harness version: something you can point at a repository, inspect as plain files, and use with the models and workflow you already trust.

The interesting part isn't that an LLM can say "this code looks dangerous."

We already knew that.

The interesting part is building enough machinery around the model that the final report has to survive contact with the code.

The old shape was file-centric

The original /audit command already had the right instinct.

It profiled the repository, triaged files, escalated suspicious code, asked for deeper review, verified proposed findings in isolated worktrees, then generated reports and patches. It wasn't just a single prompt sprayed over a repository.

But the unit of work was still mostly "file."

That's a natural first version. Files are easy to enumerate. They have paths. They have extensions. They give the model a bounded chunk of context.

Security bugs don't always cooperate with that shape.

Real vulnerabilities often live across boundaries: an HTTP route, a deserializer, a cache key, a permission check, a helper buried three calls below an entry point, a parser that is only dangerous when reached from one specific caller. Looking at one file in isolation can find bugs, but it can also miss the attacker story.

The revamped /audit adds a new hunt mode that starts from that attacker story.

Instead of only asking "which files look suspicious?", Swival can now build attacker-anchored hunt tasks. It profiles the repository, identifies entry points and trust boundaries, scans for sink patterns across attack classes, and creates tasks around concrete attacker positions and controlled inputs.

That sounds more abstract, but in practice it is more specific.

The question becomes less "does this file contain something scary?" and more "can this kind of attacker get this kind of input across this boundary into this kind of sink?"

That's the question that matters.
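To make that question concrete, here is a minimal sketch of what an attacker-anchored hunt task could look like as data. Every name in it (EntryPoint, Sink, HuntTask, build_hunt_tasks) is hypothetical and chosen for illustration; none of it is taken from Swival's code.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class EntryPoint:
    boundary: str          # where untrusted data enters, e.g. an HTTP route
    attacker: str          # who can reach that boundary
    controlled_input: str  # what that attacker actually controls


@dataclass(frozen=True)
class Sink:
    location: str      # where the dangerous operation lives
    attack_class: str  # e.g. "path traversal", "command injection"


@dataclass(frozen=True)
class HuntTask:
    entry: EntryPoint
    sink: Sink

    def question(self) -> str:
        # The task is phrased as an attacker story, not as "is this file scary?"
        return (
            f"Can {self.entry.attacker} get {self.entry.controlled_input} "
            f"across {self.entry.boundary} into {self.sink.location} "
            f"({self.sink.attack_class})?"
        )


def build_hunt_tasks(entries: List[EntryPoint], sinks: List[Sink]) -> List[HuntTask]:
    # Naive pairing of every entry point with every sink (an assumption made
    # for brevity). A real harness would prune pairs that cannot connect.
    return [HuntTask(e, s) for e, s in product(entries, sinks)]


entries = [EntryPoint("POST /upload", "an unauthenticated client", "the filename field")]
sinks = [Sink("storage.save_file", "path traversal")]
tasks = build_hunt_tasks(entries, sinks)
print(tasks[0].question())
```

The useful property is that each task carries its attacker position and controlled input with it, so whatever reviews the task cannot quietly drop them.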

Reachability isn't optional

A lot of AI-generated security reports fail on reachability.

The model finds a dangerous helper, but the helper is only called with constants. It finds a parser bug, but the parser isn't exposed to untrusted input. It finds a scary operation behind a permission check and then forgets the permission check exists.

This is where /audit is becoming much stricter.

Hunt-derived findings carry an explicit reachability status. If the model finds a local bug but can't prove that attacker-controlled input reaches it, Swival queues a separate reachability task. That task has one job: trace the path from a real boundary to the sink, or say why the path is blocked.
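One way to picture this is a status field on each finding plus a small function that turns unproven findings into follow-up work. The sketch below assumes three status values and hypothetical type names; the post does not specify how Swival represents any of this.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Reachability(Enum):
    PROVEN = "proven"      # attacker-controlled input is shown to reach the sink
    UNPROVEN = "unproven"  # local bug found, path from a boundary not yet traced
    BLOCKED = "blocked"    # a guard or missing caller stops the path


@dataclass
class Finding:
    title: str
    sink: str
    reachability: Reachability


@dataclass
class ReachabilityTask:
    finding_title: str
    sink: str
    instruction: str


def queue_reachability_tasks(findings: List[Finding]) -> List[ReachabilityTask]:
    # Only findings whose reachability is still open get a dedicated task.
    return [
        ReachabilityTask(
            finding_title=f.title,
            sink=f.sink,
            instruction=(
                "Trace the path from a real boundary to the sink, "
                "or say why the path is blocked."
            ),
        )
        for f in findings
        if f.reachability is Reachability.UNPROVEN
    ]


findings = [
    Finding("Unchecked length in header parser", "parse_header", Reachability.UNPROVEN),
    Finding("Shell call with constant arguments", "run_tool", Reachability.BLOCKED),
]
queued = queue_reachability_tasks(findings)
print([t.finding_title for t in queued])
```

Keeping the status explicit means a blocked or unproven finding stays visible as such instead of being silently promoted or dropped.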

There's also an opt-in in-repo reachability trace phase for verified findings. It asks, after the bug has already survived verification, whether an external entry point in this repository can actually reach the vulnerable operation.

That distinction matters a lot.

A local bug can be real and still not be an exploitable vulnerability in the current program. Conversely, a boring-looking helper can become serious when a particular route feeds it attacker-controlled data.

The new pipeline tries to keep those facts separate until it has enough evidence to join them.

The model gets an adversary

Another big change is the adversarial disproof gate.

Before a proposed finding reaches the expensive proof verifier, Swival can hand it to a reviewer whose job isn't to find more bugs. The reviewer is only allowed to try to falsify the claim.

It can say the finding is invalid. It can say the finding needs a specific missing proof step. Or it can say the finding is plausible and should move forward.

This sounds small, but it changes the pressure on the system.

LLMs are good at building a case. Sometimes they are too good at it. If the first agent has already framed a bug as real, the next agent can drift into confirming the narrative.

An explicit disproof step gives the system a chance to ask the opposite question: what concrete guard, missing caller, trusted-only path, or impossible precondition would make this report wrong?

When it works, it removes noise before verification. When it can't decide, it can pass a required proof step into the verifier instead of letting the claim stay vague.

That's the kind of boring harness detail that makes the final output better.
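The three outcomes of the disproof gate map naturally onto a small routing function. This is a sketch under assumptions: the verdict labels, the Review and VerifierJob types, and the rule that a "needs proof" verdict must name its step are all illustrative, not Swival's actual interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    INVALID = "invalid"          # the claim was falsified
    NEEDS_PROOF = "needs_proof"  # a specific proof step is missing
    PLAUSIBLE = "plausible"      # no disproof found, move forward


@dataclass
class Review:
    verdict: Verdict
    reason: str
    required_proof_step: Optional[str] = None


@dataclass
class VerifierJob:
    claim: str
    required_proof_step: Optional[str] = None


def apply_disproof_gate(claim: str, review: Review) -> Optional[VerifierJob]:
    if review.verdict is Verdict.INVALID:
        return None  # noise removed before the expensive verifier runs
    if review.verdict is Verdict.NEEDS_PROOF:
        if not review.required_proof_step:
            raise ValueError("a needs_proof verdict must name the missing step")
        # The open question travels with the claim instead of staying vague.
        return VerifierJob(claim, review.required_proof_step)
    return VerifierJob(claim)


job = apply_disproof_gate(
    "Path traversal in save_file",
    Review(Verdict.NEEDS_PROOF, "normalization may happen upstream",
           "show that the filename is not normalized before save_file"),
)
print(job)
```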

Proofs run in isolated worktrees

The verification phase treats each proposed finding as a hypothesis.

A separate verifier agent runs against an isolated Git worktree at HEAD. It can inspect the code. It can compile or run small proof-of-concept programs when appropriate. It must return a structured proof block with a verdict, proof kind, commands actually run, observed output, trigger, impact, and limitations.

Malformed proof blocks don't count as success.

That matters because "the model says reproduced" isn't a proof. A verifier that can't explain what it did, what input triggered the issue, what output it observed, and what security impact follows shouldn't be able to promote a finding into a report.
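A validator for such a proof block might look like the sketch below. The field list follows the one given above, but the key names, the verdict labels, and the rule that runtime proofs must list commands and output are assumptions made for illustration.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = (
    "verdict", "proof_kind", "commands_run",
    "observed_output", "trigger", "impact", "limitations",
)
PROOF_KINDS = {"runtime", "source", "mixed"}
VERDICTS = {"confirmed", "rejected"}  # assumed labels


def validate_proof_block(block: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the block is well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in block]
    if problems:
        return problems
    if block["verdict"] not in VERDICTS:
        problems.append("unknown verdict")
    if block["proof_kind"] not in PROOF_KINDS:
        problems.append("unknown proof kind")
    ran_something = bool(block["commands_run"]) and bool(block["observed_output"])
    if block["proof_kind"] in {"runtime", "mixed"} and not ran_something:
        problems.append("runtime proof without commands and observed output")
    for name in ("trigger", "impact"):
        if not str(block[name]).strip():
            problems.append(f"empty field: {name}")
    return problems


def can_promote(block: Dict[str, Any]) -> bool:
    # A malformed block never counts as success, whatever verdict it claims.
    return not validate_proof_block(block) and block["verdict"] == "confirmed"


vague = {"verdict": "confirmed", "proof_kind": "runtime"}
print(validate_proof_block(vague))
print(can_promote(vague))
```

Here the vague block claims "confirmed" but is rejected because it cannot say what was run, what triggered the issue, or what follows from it.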

Not every bug needs a runtime exploit. Some security-control failures and simple logic bugs are better proven from source. But the pipeline now makes that distinction explicit. Runtime, source, and mixed proofs are tracked separately in the run summary, so a run that quietly drifts toward cheap source-only claims is visible.

Again, this isn't glamorous.

It's useful.
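The per-kind tally in the run summary is simple to sketch. The function below assumes each verified proof is a dict with a proof_kind key and adds a source-only share so drift is easy to spot; the summary layout is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def proof_kind_summary(proofs: List[Dict[str, Any]]) -> Dict[str, float]:
    counts = Counter(p["proof_kind"] for p in proofs)
    total = sum(counts.values())
    return {
        "runtime": counts["runtime"],
        "source": counts["source"],
        "mixed": counts["mixed"],
        # Share of findings backed only by source reading, 0.0 for an empty run.
        "source_only_share": counts["source"] / total if total else 0.0,
    }


run = [
    {"proof_kind": "runtime"},
    {"proof_kind": "source"},
    {"proof_kind": "source"},
    {"proof_kind": "source"},
]
print(proof_kind_summary(run))
```

Comparing that share across runs is one way a shift toward source-only claims would show up.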

Coverage has to be observable

One of the traps in agentic review is pretending that the agent looked everywhere.

It usually didn't.

The new /audit pipeline adds an observable coverage and gapfill mechanism. Hunt tasks can report which high-priority seeds they didn't inspect or which concrete scope items they didn't cover. Swival can then queue bounded follow-up tasks for those gaps.

The important word is bounded.

Security review can become infinite if you let every unanswered question spawn three more questions. Swival tracks a global token budget, caps gapfill expansion, drops low-priority work when the run is already spending too much, and writes structured run summaries that can be compared across runs.
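A bounded gapfill planner could be sketched as follows. The thresholds, the priority scale, and the per-task token estimates are all assumptions for illustration; the post does not say how Swival sets its caps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gap:
    scope_item: str        # seed or scope item a hunt task reported as uncovered
    priority: int          # 0 is most important
    estimated_tokens: int  # rough cost of a follow-up task


def plan_gapfill(
    gaps: List[Gap],
    tokens_spent: int,
    token_budget: int,
    max_tasks: int,
    tight_ratio: float = 0.8,
    low_priority_from: int = 2,
) -> List[Gap]:
    budget_is_tight = tokens_spent >= tight_ratio * token_budget
    remaining = token_budget - tokens_spent
    planned: List[Gap] = []
    for gap in sorted(gaps, key=lambda g: g.priority):
        if len(planned) >= max_tasks:
            break  # hard cap on gapfill expansion
        if budget_is_tight and gap.priority >= low_priority_from:
            continue  # drop low-priority work when the run is already expensive
        if gap.estimated_tokens > remaining:
            continue  # never plan past the global budget
        planned.append(gap)
        remaining -= gap.estimated_tokens
    return planned


gaps = [
    Gap("auth middleware", 0, 30_000),
    Gap("legacy importer", 3, 10_000),
    Gap("session cache", 1, 50_000),
    Gap("upload parser", 1, 20_000),
]
plan = plan_gapfill(gaps, tokens_spent=850_000, token_budget=1_000_000, max_tasks=5)
print([g.scope_item for g in plan])
```

Anything the planner skips is still a known gap, which is the point: it can be recorded in the run summary as uncovered rather than assumed reviewed.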

The point isn't to make exhaustive review magically cheap.

The point is to stop pretending that invisible coverage is the same thing as real coverage.

Reports should be fewer and stronger

The final stages also got sharper.

Verified findings are deduplicated by root cause, with care around trust boundaries. Two endpoints that share a vulnerable helper aren't automatically the same report if the attacker reachability and impact differ. On the other hand, exact duplicate variants shouldn't produce five noisy markdown files.
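One way to express that rule is to group findings by a key that includes more than the root cause. The sketch below uses root cause, attacker position, and impact as the key; that choice, and the VerifiedFinding type, are assumptions rather than Swival's actual dedupe logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VerifiedFinding:
    root_cause: str   # e.g. the shared vulnerable helper
    attacker: str     # who can reach it through this entry point
    impact: str
    entry_point: str


def dedupe(findings: List[VerifiedFinding]) -> List[Tuple[VerifiedFinding, List[str]]]:
    groups: Dict[Tuple[str, str, str], List[VerifiedFinding]] = defaultdict(list)
    for f in findings:
        # Same helper but different attacker or impact stays a separate report.
        groups[(f.root_cause, f.attacker, f.impact)].append(f)
    # One report per group, listing every entry point that exhibits it.
    return [(group[0], [f.entry_point for f in group]) for group in groups.values()]


findings = [
    VerifiedFinding("unsafe_unpack", "unauthenticated", "file overwrite", "POST /import"),
    VerifiedFinding("unsafe_unpack", "unauthenticated", "file overwrite", "POST /restore"),
    VerifiedFinding("unsafe_unpack", "admin", "file read", "POST /admin/backup"),
]
reports = dedupe(findings)
for representative, entry_points in reports:
    print(representative.attacker, representative.impact, entry_points)
```

The two unauthenticated variants collapse into one report with two entry points, while the admin-only path keeps its own report because reachability and impact differ.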

For each surviving finding, Swival generates a structured report and a patch. The report includes classification, affected locations, provenance, preconditions, proof, why the issue is a real bug, the fix requirement, patch rationale, residual risk, and the patch itself.

The output still lands in audit-findings/, because plain files are a good interface. You can read them, diff them, edit them, apply the patch, or throw them away.

There's no dashboard required to understand what happened.

A step towards Mythos-grade audits for everyone

I'm usually skeptical of AI security tooling claims.

There's a lot of demo gravity in this space. A model can produce plausible vulnerability prose very easily. A screenshot of ten generated findings doesn't tell you whether any of them survive a real maintainer reading the code.

The reason I'm excited about the new /audit work isn't that it makes Swival sound more like a scanner.

It's the opposite.

It makes Swival behave less like a scanner and more like a coordinated review process: profile the repository, choose attacker models, hunt by boundary, split local bug proof from reachability proof, try to disprove claims, verify in isolation, fill observable gaps, dedupe by root cause, then write artifacts a human can actually inspect.

That shape is heavier than a single prompt.

Good.

Security work is full of places where cheap confidence is actively harmful.

This future release won't make /audit perfect. It still depends on model quality. It still sees committed code, not your whole deployment. It still has cost and runtime trade-offs. And human review still matters, especially before public disclosure or production changes.

But it's a serious step toward something I've wanted from coding agents for a while: a tool that doesn't merely suggest that a bug might exist, but keeps working until the claim has either been killed or made concrete.

That's the line I care about.

And with the new /audit, Swival is getting much closer to it.

#ai-agent #security #vulnerability #llm #security-research
