Implementing Agentic Zero Trust

The next phase of AI security is about identity, cryptographic accountability, and infrastructure that enforces trust instead of assuming it.

Your agent shipped last quarter, and it works. It reconciles invoices, opens pull requests, and answers customers at 2 a.m. without complaint. Then one morning, someone from security, or legal, or a regulator asks a deceptively simple question: which agent did that, on whose authority, and can you prove it?

If your honest answer is "it ran under a developer's OAuth token," you don't have an answer. You have a liability.

For the last couple of years, the security model for autonomous agents was roughly this: write a very stern system prompt, grant the agent a human's credentials so it can actually do its job, and hope it behaves. That worked right up until it didn't. We are now entering the era of Agentic Zero Trust, where trust stops being something you assume in a prompt and becomes something the infrastructure proves on every single action.

Three forces are colliding to force this shift, and if you build agents or you're accountable for securing them, all three are now your problem. Capability is outrunning the defenders. Attackers are weaponizing the agents we deployed. And regulators have stopped treating AI as a tool and started treating it as a digital actor that someone has to answer for. Let's walk through what's actually happening and what a defensible architecture looks like.

The Delusion Gap

Start with the uncomfortable numbers. According to Gravitee's 2026 State of AI Agent Security report, 88% of organizations experienced a confirmed or suspected AI agent security incident in the past year. In the same survey, 82% of executives said they were confident their existing security policies had them covered. Those two figures cannot both be true, and attackers have noticed which one is real.

The deeper problem is visibility. Roughly half of all enterprise identity activity is now driven by non-human identities that security teams cannot see, which means that when an autonomous agent takes a consequential action, it is often impossible to trace it back to a specific actor in a way that would stand up legally. This is the accountability vacuum, and it exists by design. Because most agents inherit the broad access of the human who launched them, your audit log shows a person doing something a person never did.

It gets worse at machine speed. Shadow AI on corporate devices has jumped roughly 300% in a single year, with a large share of employees feeding corporate data into agents nobody approved. The average enterprise still takes close to eight months to claw back excessive cloud permissions. At human speed, that is merely embarrassing. At agent speed, where a single workflow can spawn thousands of sub-tasks before your SOC finishes reading the first alert, eight months is a geological epoch. This year, a credential exposure put billions of stolen agentic sessions into circulation, harvested by infostealer malware that scoops up authentication cookies and quietly sidesteps MFA. Attackers don't crack anything. They borrow a live session and ride your agent straight into the data lake as a "legitimate" user.

Underneath all of it is what people now call the lethal trifecta: an agent with access to private data, exposure to untrusted external content like a PDF or an email or a market feed, and the ability to reach external APIs. Any one of those is manageable. All three together, running in an autonomous loop, are a self-service exfiltration pipeline that you built and pay to host.
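The trifecta is simple enough to check mechanically. As a minimal sketch, assuming you can summarize each agent's capabilities as three booleans (the class and field names here are hypothetical, not taken from any framework):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability summary for one agent."""
    reads_private_data: bool
    ingests_untrusted_content: bool
    calls_external_apis: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risk factors are present at once."""
    return (
        caps.reads_private_data
        and caps.ingests_untrusted_content
        and caps.calls_external_apis
    )


# Illustrative agents: one holds all three capabilities, one holds only one.
invoice_bot = AgentCapabilities(True, True, True)
summarizer = AgentCapabilities(False, True, False)
```

A check like this will not stop an attack, but it does tell you which agents to split up or sandbox first.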

Here is the part that security officers already suspect. Legacy identity and access management (IAM) was never designed for this. It assumes a human rhythm of requests, hands static and broad permissions to service accounts built for one fixed job, and expects identities to be relatively stable. Agents violate all three assumptions. They fire thousands of calls a second, they change their intent and tool requirements on the fly based on model output, and most of them ride on a borrowed human token. Bringing legacy IAM to this fight is like bringing a knife to a thousand-subagent gunfight.

A Short Tour of Things That Were "Secure by Default"

If you still trust your agent framework out of the box, here are a few recent reasons not to.

The Semantic Kernel surprise. Microsoft's own researchers showed that prompt injection isn't just about coaxing a chatbot into saying something rude. It can escalate to remote code execution on the host. They found two CVEs in the popular Semantic Kernel framework. The Python SDK's default vector store filter was implemented with eval() on unsanitized input, which is the security equivalent of leaving the front door open with a polite note. The .NET SDK quietly exposed a file-transfer helper to the model with no path validation. Neither was an obscure edge case. Both shipped in the default configuration.
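To see why eval() on a filter string is so dangerous, and what the alternative looks like, here is a minimal sketch of a filter evaluator that parses the expression and walks only a whitelisted set of syntax nodes. This is not Semantic Kernel's code or its fix. The function names and the tiny filter grammar are assumptions made for illustration.

```python
import ast
import operator

# Only these comparison operators are permitted in a filter.
_COMPARATORS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


def safe_filter(expression: str, record: dict) -> bool:
    """Evaluate a tiny filter language without ever calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return bool(_evaluate(tree.body, record))


def _evaluate(node: ast.AST, record: dict):
    if isinstance(node, ast.BoolOp):  # "and" / "or"
        values = [_evaluate(v, record) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARATORS.get(type(node.ops[0]))
        if compare is None:
            raise ValueError("comparison operator not allowed")
        left = _evaluate(node.left, record)
        right = _evaluate(node.comparators[0], record)
        return compare(left, right)
    if isinstance(node, ast.Name):  # a field lookup, never a Python global
        if node.id not in record:
            raise ValueError(f"unknown field: {node.id}")
        return record[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (str, int, float, bool)):
        return node.value
    # Calls, attribute access, subscripts, lambdas and everything else land here.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

With eval(), a filter such as `__import__("os").system("id")` would run on the host. Here it is rejected at the first function-call node, before anything executes.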

The transport-layer "feature." Researchers at OX Security found that the MCP STDIO transport layer executes any operating system command it receives with zero sanitization or execution boundary enforcement, exposing servers to unauthenticated command injection. The scale of the exposure is massive, given that MCP has over 150 million downloads and acts as the connective backbone for the agentic ecosystem. OX Security extrapolated that approximately 200,000 MCP instances were exposed, based on scanning 7,000 servers facing the public internet.

Controversially, Anthropic labeled this raw execution-without-sanitization behavior as a "feature" rather than a vulnerability. This stance has raised significant accountability concerns, as it places the burden of security entirely on the enterprise deployer rather than the protocol designers.

SymJack, or why "human in the loop" can be a lie. Adversa AI disclosed a symlink race-condition attack, dubbed SymJack, that affected all five major coding agents, including Claude Code, Cursor, and Copilot. The agent shows you a safe file path. You click approve. In the sliver of time between your click and the actual execution, an attacker swaps that path for a symlink pointing at your SSH keys or OS binaries. The agent runs against the real target, not the one you blessed. Your approval was honest when you read it and a lie by the time it mattered. If your compliance story depends on a human pressing approve, that story has a race condition in it. Long story short: claiming "human in the loop" will not help you in court.
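The underlying bug class is time-of-check to time-of-use. One common way to narrow it, sketched here under the assumption of a POSIX system, is to bind the approval to the file's identity (device and inode) rather than to its path, and to refuse symlinks when opening. The `Approval` type and function names are hypothetical, and this is not a description of how any of the affected agents were patched.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    """What the human actually approved: a path plus the file's identity."""
    path: str
    device: int
    inode: int


def approve(path: str) -> Approval:
    info = os.lstat(path)  # lstat does not follow symlinks
    return Approval(path, info.st_dev, info.st_ino)


def open_approved(approval: Approval) -> int:
    """Open the approved file, refusing symlinks and swapped files."""
    # O_NOFOLLOW makes open() fail if the final path component is a symlink.
    fd = os.open(approval.path, os.O_RDONLY | os.O_NOFOLLOW)
    info = os.fstat(fd)  # inspects the opened file, not the path
    if (info.st_dev, info.st_ino) != (approval.device, approval.inode):
        os.close(fd)
        raise PermissionError("file changed after approval")
    return fd  # the caller must operate on this descriptor, then close it
```

The important design choice is that everything after approval works on the returned file descriptor, never on the path string again. Note that O_NOFOLLOW only protects the final path component, so a swapped parent directory needs additional defenses.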

The skill supply chain. A Snyk audit of the public skill ecosystem found that 13.4% of skills contained critical security issues, 36% were vulnerable to prompt injection, and 76 shipped confirmed credential-stealing backdoors. Your agent is only as trustworthy as the capabilities you let it install, and the marketplace is, charitably, a mixed neighborhood.

Why a Better Prompt Will Never Save You

The instinct here is to reach for the thing that always felt like it worked: write a stricter prompt, add more rules, tell the model very firmly this time. The research says stop.

A 2026 paper by Sahar Abdelnabi and Eugene Bagdasarian argues, using contextual integrity theory, that prompt injection in agentic systems cannot be fully prevented at the application layer. The logic is elegant and a little bleak. Any filter strict enough to block every malicious instruction will inevitably block legitimate ones as well, because an adversary can always construct a context in which the forbidden instruction appears entirely reasonable. The difference between a good instruction and an attack lives in context, not in the text, and the model has no privileged channel telling it which is which.

It compounds when agents talk to each other. Researcher Krti Tallam formalized what happens to authority in multi-agent systems, and the results aren't reassuring. When Agent A delegates to Agent B, permission leaks. You get transitive delegation, where authority flows further than anyone intended, and aggregation inference, where agents quietly assemble restricted information from individually unrestricted pieces. No attacker required. It emerges from normal production behavior. A separate result from Bajaj and colleagues makes the point land harder: safety in a multi-agent system is determined by the network topology, not by how well-aligned each individual model is. Wire your agents together in the wrong order, and you get ordering instability and information cascades, where an early mistake compounds irreversibly even when every agent passed its safety benchmarks with a gold star.
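Transitive delegation, at least, has a well-known structural countermeasure: a delegate can receive only a subset of what the delegator holds, and chains have a maximum length. The sketch below illustrates that idea with hypothetical names (`Grant`, `delegate`, and an assumed depth limit of 2). It is not the formalism from the research cited above, and it does nothing about aggregation inference.

```python
from dataclasses import dataclass

MAX_DELEGATION_DEPTH = 2  # assumed policy value, chosen for illustration


@dataclass(frozen=True)
class Grant:
    holder: str
    scopes: frozenset
    depth: int = 0  # how many hops this grant is from the original owner


def delegate(parent: Grant, child: str, requested: set) -> Grant:
    """Hand authority to a child agent, never more than the parent holds."""
    if parent.depth >= MAX_DELEGATION_DEPTH:
        raise PermissionError("delegation chain too long")
    # Intersection: scopes the parent lacks are silently dropped.
    attenuated = parent.scopes & frozenset(requested)
    return Grant(child, attenuated, parent.depth + 1)
```

If Agent A holds `invoices:read` and `invoices:write` and Agent B asks for `invoices:read` plus `crm:read`, B ends up with `invoices:read` only. Authority can shrink along the chain but never grow.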

So tape this sentence to your monitor. For a transformer, your instructions and the attacker's injected instructions are the same tokens. That is precisely why security cannot live in the prompt. It has to live in the harness: the orchestration layer around the model that governs which tools it can reach, what it can touch, and what gets logged. Prompt engineering is interior decorating. Harness engineering is the load-bearing wall.

The Regulatory Awakening

Here is the shift that turns all of this from a good-practices blog post into a board-level mandate. The frameworks most enterprises lean on, ISO/IEC 42001 and the NIST AI Risk Management Framework, were written to govern static models that produce text or predictions. They are genuinely useful, and they offer almost nothing on the questions agents raise: how to set an autonomy threshold, how to monitor goal drift, or how to manage an agent that can spawn its own sub-agents and delegate authority on its own initiative. The map predates the territory.

Regulators have figured this out, and the response is consistent across jurisdictions. Intelligence and standards bodies, including CISA, the NSA, and the broader Five Eyes coalition, are pushing organizations to extend zero trust and least privilege directly to autonomous systems rather than treating AI as an exception. The practical translation is blunt: stop sharing human OAuth tokens with AI, and give every agent a unique, auditable, short-lived non-human identity.

The accountability question is being closed from the legal side too. Singapore's Model AI Governance Framework, which has been extended toward agentic systems, anchors a principle the whole field is converging on: humans remain ultimately accountable for what an AI system does, no matter how autonomous it becomes. "The AI made a mistake" is being struck from the list of acceptable answers. In the United States, state-level frontier-model legislation, including bills like Illinois' SB 315, points toward third-party audits and published catastrophic-risk assessments becoming a de facto national baseline rather than a nice-to-have.

The financial sector is operationalizing accountability through a paradigm worth knowing by name: Know Your Agent, or KYA. Just as Know Your Customer ties a banking action to a verified human, KYA issues real-time trust tokens that cryptographically bind a verified human, their device, and the transacting agent into a single unbroken chain. Every autonomous trade, transfer, or query becomes attributable to a named owner, which is what makes dispute resolution and audit tractable instead of theoretical.
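What "cryptographically bind" means is easier to see in code. The sketch below signs a small set of claims (human, device, agent, expiry) with HMAC-SHA256 so that changing any one of them invalidates the token. It is an illustration of the binding idea only. The claim names and functions are assumptions, and a production KYA scheme would more plausibly use asymmetric signatures and hardware-backed keys than a shared secret.

```python
import hashlib
import hmac
import json


def issue_token(key: bytes, human: str, device: str, agent: str, expires_at: int) -> dict:
    """Bind human, device, and agent into one signed set of claims."""
    claims = {"human": human, "device": device, "agent": agent, "exp": expires_at}
    return {"claims": claims, "sig": _sign(key, claims)}


def verify_token(key: bytes, token: dict, now: int) -> bool:
    """Valid only if the signature matches and the token has not expired."""
    expected = _sign(key, token["claims"])
    signature_ok = hmac.compare_digest(expected, token["sig"])
    return signature_ok and now < token["claims"]["exp"]


def _sign(key: bytes, claims: dict) -> str:
    # Canonical JSON so the same claims always produce the same bytes.
    payload = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()
```

Swapping the agent ID, the device, or the human in a captured token breaks the signature, which is what keeps the chain from human to agent unbroken.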

And when an action carries financial or regulatory weight, a plain text log file stops being sufficient evidence. The emerging expectation is that agentic operations be cryptographically verifiable end to end. Several standards are racing to fill that gap. The Agent Trust Protocol aims to be the SSL/TLS of the agentic web, providing cryptographic proof of who an agent is, what it is authorized to do, and whether its instructions were tampered with in transit. Supply-chain frameworks require that downloaded skills be scanned and cryptographically signed before an agent is allowed to use them. And forward-looking work such as MAGIQ enforces agent-to-agent communication policies with post-quantum primitives, so that long delegation chains stay auditable even against an adversary with tomorrow's hardware. The throughline is simple to state and hard to fake: compliance is becoming code.

Building the Brakes

The good news for everyone who has to live with these mandates is that the tooling is arriving fast, and a lot of it is shipping inside platforms you already run. A defensible architecture rests on four pillars.

Pillar one: give every agent a real identity. Treat agents as non-human identities and govern them with the same paranoia you reserve for your most accident-prone contractor. The first move is to stop letting them borrow human tokens. Microsoft made this concrete with Entra Agent ID, now generally available to all Entra customers. It assigns each agent its own governed identity rather than a human's, applies conditional access and real-time risk evaluation before the agent reaches any sensitive data, and logs every action for auditing. Two of its constructs map directly onto the regulatory demands above. A "sponsor" records the specific human accountable for an agent, which is KYA-style accountability baked into the identity object, and an "agent identity blueprint" standardizes owners, access envelopes, and lifecycle controls so you are not hand-rolling governance per agent. Microsoft even warns that assigning a normal human user account to an agent breaks zero trust enforcement because policies tuned for human sign-in patterns misfire on machine behavior. SailPoint's Agentic Fabric and Cequence's persona-based scoping push the same idea from other angles, the latter handing each agent persona its own virtual endpoint so it can't inherit a human's full and dangerous permission set. The organizing principle underneath all of it is zero standing privilege: no permanent permissions, only short-lived access granted at the moment of need and revoked right after.
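Zero standing privilege is easy to say and worth seeing in miniature. The following sketch is a toy in-memory broker, not the API of Entra Agent ID or any other product. The class and method names are hypothetical. It shows the two properties that matter: access exists only inside an explicit grant, and it disappears on expiry or when the task ends, whichever comes first.

```python
import time
from contextlib import contextmanager


class AccessBroker:
    """Hypothetical broker: nothing is granted until asked for, nothing outlives its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._grants = {}  # (agent_id, scope) -> expiry time

    def is_allowed(self, agent_id: str, scope: str) -> bool:
        expiry = self._grants.get((agent_id, scope))
        return expiry is not None and self._clock() < expiry

    @contextmanager
    def just_in_time(self, agent_id: str, scope: str, ttl_seconds: float):
        """Grant one scope for the duration of one task."""
        self._grants[(agent_id, scope)] = self._clock() + ttl_seconds
        try:
            yield
        finally:
            # Revoke as soon as the task finishes, even if it raised.
            self._grants.pop((agent_id, scope), None)
```

Usage would look like `with broker.just_in_time("agent-42", "ledger.read", ttl_seconds=30): ...`, with every tool call inside the block checking `is_allowed` first.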

Pillar two: trap the agent in a box it cannot escape. If you can't guarantee an agent won't run hostile code, make sure it can't do damage when it does. NVIDIA's OpenShell builds kernel-level sandboxes using Landlock and Seccomp BPF, enforcing filesystem, network, and process limits far below the application layer, where a compromised agent cannot reach to override them. Google's Scion acts as a hypervisor for agents, isolating each agent with its own credentials and a copy of the working tree. Microsoft's micro-VM pattern goes further: the agent writes a short script to call its tools, that script runs exactly once in a fresh disposable VM, and then the VM is thrown away. This thinking is also reaching the surface of everyday development. Google Antigravity, the agent-first development platform Google launched in late 2025, ships with allow lists, deny lists, browser allow lists, and terminal execution modes that require the agent to ask before running a command rather than executing blindly. When governance shows up as a default in the IDE itself, you know the center of gravity has moved.

Pillar three: take the model out of the routing decision. A large share of agentic risk comes from letting the LLM decide what happens next, because that decision is exactly what injection hijacks. So don't let it. Deterministic routing tools like Microsoft Conductor define orchestration logic in inspectable YAML, so a malicious prompt has nothing to grab onto in the control flow. Inline gateways sit on the network and adjudicate every tool call in real time, enforcing least privilege at the level of individual read and write operations and logging each one against the identity behind it. The model gets to be clever about content. It does not get to be in charge.
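The gateway idea reduces to a small, boring function, which is the point. In this sketch the policy table, agent IDs, and tool names are all hypothetical, and it is not modeled on Microsoft Conductor or any specific gateway product. The model proposes a tool call, and a static lookup decides.

```python
POLICY = {  # hypothetical static policy, reviewed like any other config
    "invoice-reconciler": {"ledger.read", "ledger.write"},
    "support-responder": {"tickets.read", "tickets.reply"},
}


def adjudicate(agent_id: str, tool: str, audit_log: list) -> bool:
    """Allow a tool call only if the static policy lists it for this agent."""
    # Unknown agents get an empty set, so the default is deny.
    allowed = tool in POLICY.get(agent_id, set())
    # Every decision is recorded against the calling identity, allowed or not.
    audit_log.append({"agent": agent_id, "tool": tool, "allowed": allowed})
    return allowed
```

Nothing in the model's output can change the contents of `POLICY`, so an injected instruction can ask for `ledger.write` as persuasively as it likes and still be refused.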

Pillar four: vet the skills like third-party malware. Until proven otherwise, treat every downloadable capability as hostile. Scanning tools inspect skills for credential-exfiltration paths and hidden instructions, then cryptographically sign the clean ones using the OpenSSF Model Signing standard, which is real, shipping, and already being adopted by major model hubs. The payoff is a software bill of materials for your agent's capabilities: you know exactly what it's allowed to learn, and you can prove nobody tampered with it on the way in.
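The load-time half of that workflow can be sketched as digest pinning: the agent refuses any skill whose bytes do not hash to the value recorded in a manifest. This is a deliberately simplified stand-in, not the OpenSSF Model Signing format or tooling. The manifest shape and function names are assumptions, and the sketch assumes the manifest itself has already been signature-checked elsewhere.

```python
import hashlib
import hmac


def digest(skill_bytes: bytes) -> str:
    """SHA-256 of the exact bytes the agent is about to load."""
    return hashlib.sha256(skill_bytes).hexdigest()


def may_load(skill_name: str, skill_bytes: bytes, manifest: dict) -> bool:
    """Allow a skill only if it is listed and its bytes match the pinned digest."""
    pinned = manifest.get(skill_name)
    if pinned is None:
        return False  # unlisted skills are denied by default
    return hmac.compare_digest(pinned, digest(skill_bytes))
```

A single changed byte in the skill, whether a new hidden instruction or an added exfiltration call, produces a different digest and the load is refused.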

The Connective Tissue

Two open standards are quietly becoming the TCP/IP of this world, and both matter to anyone responsible for the blast radius. The Model Context Protocol, introduced by Anthropic, is how agents connect to tools and data, and used well, it supports outbound-only tunnels that let an agent reach private data without you punching an inbound hole in the firewall. Agent2Agent, originally built by Google and now governed by the Linux Foundation, lets agents from different vendors discover one another and delegate work using signed agent cards for identity, without handing over their internal memory. The reason to lean on open, inspectable standards is the same reason it always is in security: you want plumbing you can audit, not a black box you have to trust on faith.

Watch Everything, Then Prove It

In regulated environments, being secure isn't enough. You have to be able to reconstruct exactly what happened. Trace-and-replay tooling lets you rebuild a full multi-agent decision chain after the fact, stitching every model call, tool invocation, and handoff into one coherent story. The question shifts from "what did the agent say" to "what did the agent do, and why," which is the only version a regulator cares about.

One subtle warning if you adopt the popular pattern of using guardian agents to police your other agents. The guardian has to run on a different model from the agent it watches. If the actor and the overseer share a model, they share blind spots, and your oversight becomes theater. A guard who fails for the exact same reasons as the prisoner is not a guard.

What To Do Monday Morning

You don't need to boil the ocean. You need to start, and you need a sequence that your security and engineering teams can both sign off on.

Run a shadow-AI inventory. You cannot govern what you cannot see. Surface the non-human identities already operating in your environment before you do anything else.
Evict the everything-agent. Stop handing one monolithic agent your CRM, your email, and your databases at once. Break it into narrow agents with tightly scoped jobs and route between them deterministically.
Move identity to the infrastructure layer. Ban human-token sharing today, not next quarter. Give every agent a unique, short-lived cryptographic identity tied to a named human sponsor, using a platform like Entra Agent ID or SailPoint Agentic Fabric.
Sandbox by default. No agent executes on bare metal. Wrap every execution in kernel-level isolation or a disposable micro-VM so a compromise stays contained.
Vet the supply chain. Treat every third-party skill as guilty until scanned and signed. Never let an agent load an unverified skill off the open internet.
Make the audit trail cryptographic. For any action with financial or regulatory weight, log it in a form you can prove later, and tie it to the human who is accountable.
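For the last item, one widely used building block is a hash chain, where each log entry commits to the one before it. The sketch below is a minimal illustration with hypothetical field names, not a compliance-grade implementation. In practice you would also sign entries and anchor the latest hash somewhere the agent cannot write.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log: list, agent: str, sponsor: str, action: str) -> None:
    """Append an entry that commits to the hash of the previous one."""
    previous = log[-1]["hash"] if log else GENESIS
    body = {"agent": agent, "sponsor": sponsor, "action": action, "prev": previous}
    log.append({**body, "hash": _hash(body)})


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; any edit or deletion breaks the chain."""
    previous = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != previous or entry["hash"] != _hash(body):
            return False
        previous = entry["hash"]
    return True


def _hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
```

Because each entry names both the agent and its human sponsor, the same record answers "which agent did that" and "on whose authority" at once.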

The agentic era is going to be spectacular, assuming we don't let an unverified agent set fire to the data center on the way to the future. Responsible AI has quietly stopped being a philosophical debate about fairness and become an infrastructure requirement with a budget line, an owner, and an audit.

The reframe worth holding onto is this. Zero trust is not the brake that slows you down. It's the brake that lets you go fast. Nobody drives a race car at 200 mph because the brakes are weak. They do it because the brakes are extraordinary. So go ahead and unleash your agents. Just give them a verifiable identity, put them in a sandbox, and make sure that when someone asks who did that and on whose authority, you have an answer instead of a liability.

References

Introducing the Model Context Protocol, Anthropic
Linux Foundation launches the Agent2Agent (A2A) Protocol project, Linux Foundation
An Introduction to the OpenSSF Model Signing (OMS) Specification, Open Source Security Foundation
What is Microsoft Entra Agent ID?, Microsoft Learn
Build with Google Antigravity, our new agentic development platform, Google Developers Blog
AI Risk Management Framework, NIST
ISO/IEC 42001 Artificial intelligence management system, ISO
