The Evolution of AgentArmor: Building a Zero-Trust Gateway for Agentic AI

How a simple LLM proxy evolved into a two-layer defense-in-depth system, neutralizing MCP risks, credential leaks, and document-borne prompt injections.

When I first introduced the concept of a dedicated AI security proxy back in April, the premise was straightforward: AI agents should not have unsupervised access to the internet. We were — and many still are — mixing system instructions, untrusted user input, and external tool outputs into a single context window, simply hoping the underlying model would behave.

But agentic AI is moving fast, and AgentArmor has had to evolve rapidly to keep pace. Over the course of my career scaling security architectures and leading transformations across environments like Mandiant, Salesforce, and Atlassian, I've learned one immutable truth: bolting security on as an afterthought never works. Whether you are building an enterprise SAST pipeline or securing an autonomous LLM, true resilience requires secure-by-default platforms, an assumed-breach methodology, and automated remediation.

That philosophy has driven a massive architectural shift in this project. Today, as part of the broader aiarmor.org initiative, AgentArmor has matured from a conceptual L7 proxy into a comprehensive, zero-trust gateway for LLM-powered applications.

Here is a deep dive into how AgentArmor has grown, the technical mechanics of the new architecture, and how it actively closes the gaps in modern agentic systems.

1. The Two-Layer Defense: Merging L7 Inspection with L3/L4 Egress Control

In its initial iteration, AgentArmor primarily acted as a semantic traffic cop. It sat at the application layer (L7), scanning prompts and responses for injection attempts or exfiltration patterns. However, application-layer inspection alone is insufficient when dealing with autonomous agents capable of lateral network movement. If an attacker successfully bypasses the L7 prompt filter, the agent could potentially reach out to arbitrary C2 servers or pivot into internal infrastructure.

AgentArmor now operates as a true two-layer security proxy:

The L7 Semantic Engine: It continues to provide deep packet inspection for LLM traffic, evaluating the context window for jailbreaks and enforcing data loss prevention (DLP) rules.
The L3/L4 Network Firewall: We have paired the semantic engine with an integrated iptables-level egress controller that enforces a strict, network-level allowlist. Even if an agent goes rogue, the network layer blocks it from opening unauthorized connections regardless of what the model outputs, so tool execution stays bounded by the allowlist.
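To make the egress allowlist concrete, here is a minimal Python sketch. It is an illustration, not AgentArmor's actual implementation: the EgressRule type, the is_allowed and render_iptables helpers, and the choice of the OUTPUT chain are all assumptions made for this example. The only real-world detail it relies on is standard iptables syntax (-A, -p, -d, --dport, -j).

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EgressRule:
    """One allowlist entry (hypothetical structure for this sketch)."""
    cidr: str  # destination network, e.g. "203.0.113.10/32"
    port: int  # destination TCP port


def is_allowed(dest_ip: str, dest_port: int, allowlist: Sequence[EgressRule]) -> bool:
    """Return True only if the destination matches an explicit allowlist entry."""
    addr = ipaddress.ip_address(dest_ip)
    return any(
        addr in ipaddress.ip_network(rule.cidr) and dest_port == rule.port
        for rule in allowlist
    )


def render_iptables(allowlist: Sequence[EgressRule], chain: str = "OUTPUT") -> List[str]:
    """Render the allowlist as iptables commands: explicit ACCEPTs, then a final DROP."""
    commands = [
        f"iptables -A {chain} -p tcp -d {rule.cidr} --dport {rule.port} -j ACCEPT"
        for rule in allowlist
    ]
    # Anything that did not match an ACCEPT rule above is dropped.
    commands.append(f"iptables -A {chain} -j DROP")
    return commands
```

The point of the design is default-deny: the final DROP rule means a destination is reachable only if someone deliberately listed it. A real deployment would also need rules for loopback, DNS, and established connections, which this sketch omits.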

2. Taming the Model Context Protocol (MCP)

The industry is rapidly standardizing around the Model Context Protocol (MCP) to give agents universal access to tools and data sources. While MCP is fantastic for interoperability, it inadvertently creates massive attack surfaces. We are essentially handing agents the keys to the kingdom.

To address this, AgentArmor has introduced robust MCP Credential Brokering and Config-Hardening:

Zero-Trust Tooling & Credential Brokering

Historically, developers injected live API keys directly into the agent's environment variables. If the agent was compromised, the keys were easily exfiltrated. AgentArmor completely decouples credentials from the agent. The LLM only ever receives short-lived dummy tokens. When the agent attempts a tool call, AgentArmor intercepts the request, resolves the intended action against a secure internal registry, and dynamically injects the actual managed credentials before forwarding the request to the MCP server. If the action falls outside the permitted scope, AgentArmor drops the request before it reaches the MCP server.
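The brokering flow above can be sketched in a few lines of Python. Treat this as a simplified illustration under stated assumptions rather than AgentArmor's real code: the CredentialBroker class, the Grant record, the "ph_" placeholder prefix, and the tool and action names are all hypothetical.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Grant:
    """What a placeholder token is allowed to do (hypothetical record)."""
    tool: str
    allowed_actions: FrozenSet[str]
    real_credential: str
    expires_at: float


class CredentialBroker:
    """Maps short-lived placeholder tokens to real credentials held outside the agent."""

    def __init__(self) -> None:
        self._grants: Dict[str, Grant] = {}

    def issue(self, tool: str, actions: Iterable[str], real_credential: str,
              ttl_seconds: float = 300.0, now: Optional[float] = None) -> str:
        """Create a placeholder the agent can hold; it reveals nothing about the real key."""
        now = time.time() if now is None else now
        placeholder = "ph_" + secrets.token_urlsafe(16)
        self._grants[placeholder] = Grant(
            tool, frozenset(actions), real_credential, now + ttl_seconds
        )
        return placeholder

    def resolve(self, placeholder: str, tool: str, action: str,
                now: Optional[float] = None) -> Optional[str]:
        """Return the real credential only for an unexpired, in-scope request."""
        now = time.time() if now is None else now
        grant = self._grants.get(placeholder)
        if grant is None or now >= grant.expires_at:
            return None  # unknown or expired token: drop the request
        if tool != grant.tool or action not in grant.allowed_actions:
            return None  # out-of-scope action: drop the request
        return grant.real_credential
```

In this sketch the proxy would call resolve() on each intercepted tool call and forward the request only when it gets a credential back. An exfiltrated placeholder is useless on its own, because it is meaningful only to the broker.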

Automated Config Scanning

On every policy load, AgentArmor now audits all connected MCP server configurations. If a server is dangerously exposed (for example, binding to 0.0.0.0 without authentication, or allowing .. path traversal in its local file access schemas), the system automatically quarantines the server, severing the attack path before a single token is generated.
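As a rough illustration of what such an audit might look like, the Python sketch below checks the two conditions named above. The config shape (host, auth, and allowed_paths keys) is an assumption invented for this example; it is not the MCP specification's schema or AgentArmor's policy format.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Addresses that bind a server to every network interface.
WILDCARD_BINDS = {"0.0.0.0", "::"}


def audit_server(config: dict) -> List[str]:
    """Return a list of findings for one server config (empty list means clean)."""
    findings = []
    if config.get("host", "") in WILDCARD_BINDS and not config.get("auth"):
        findings.append("binds to all interfaces without authentication")
    for path in config.get("allowed_paths", []):
        # PurePosixPath keeps ".." as its own path component, so this matches
        # traversal segments but not file names that merely contain two dots.
        if ".." in PurePosixPath(path).parts:
            findings.append(f"path traversal segment in allowed path: {path}")
    return findings


def partition(configs: Dict[str, dict]) -> Tuple[Dict[str, dict], Dict[str, List[str]]]:
    """Split servers into (active, quarantined) based on audit findings."""
    active: Dict[str, dict] = {}
    quarantined: Dict[str, List[str]] = {}
    for name, cfg in configs.items():
        findings = audit_server(cfg)
        if findings:
            quarantined[name] = findings
        else:
            active[name] = cfg
    return active, quarantined
```

Running the audit at load time, before any server is registered with the agent, is what lets a misconfigured server be excluded without the model ever seeing its tools.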

3. Neutralizing Document-Borne Injections with `doc2md`

One of the most dangerous, yet frequently overlooked, attack vectors in Retrieval-Augmented Generation (RAG) and agentic workflows is unstructured file parsing. Threat actors are actively hiding malicious instructions inside the formatting, metadata, and binary headers of PDFs, Word documents, and Excel sheets. These embedded payloads easily bypass traditional string-matching defenses.

To close this gap, AgentArmor now natively integrates with doc2md. We have implemented a strict "Convert-and-Scan" lifecycle:

Stripping the Payload: Before any uploaded file reaches the LLM, doc2md physically strips away embedded macros, binary headers, and complex tracking metadata.
Standardization: The file is converted into clean, sanitized Markdown.
Runtime Evaluation: AgentArmor then executes its L7 prompt evaluation on this standardized, plaintext format.

This process completely neutralizes binary-upload injection bypasses while simultaneously cutting down on LLM token usage by removing bloated formatting.
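The Convert-and-Scan lifecycle can be sketched as a small pipeline. Because I don't want to misstate doc2md's interface here, the converter is passed in as a plain callable; the convert_and_scan function and the two example patterns are assumptions for illustration and are far smaller than a real detection corpus.

```python
import re
from typing import Callable, List, Pattern, Sequence, Tuple

# Example patterns only; a production rule set would be much broader.
SUSPICIOUS: List[Pattern[str]] = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"<!--.*?-->", re.DOTALL),  # hidden HTML comments surviving in Markdown
]


def convert_and_scan(
    raw: bytes,
    convert: Callable[[bytes], str],
    patterns: Sequence[Pattern[str]] = SUSPICIOUS,
) -> Tuple[str, List[str]]:
    """Convert a raw upload to Markdown first, then scan only the converted text.

    `convert` stands in for the document-to-Markdown step; whatever it discards
    (macros, binary headers, metadata) never reaches the scanner or the model.
    """
    markdown = convert(raw)
    hits = [m.group(0) for p in patterns for m in p.finditer(markdown)]
    return markdown, hits
```

A caller would forward the Markdown to the model only when the hit list is empty, and block or flag the upload otherwise. Ordering matters: scanning happens after conversion, so the scanner and the LLM see the same text.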

4. Operationalizing "Assume Breach"

We must build AI systems under the assumption that the session will eventually be compromised. To move from theoretical security to active defense, AgentArmor now heavily operationalizes the "Assume Breach, Survive, and Repave" methodology.

GoalLock Canaries: AgentArmor injects cryptographic canary tokens directly into the system prompts at runtime. If the LLM ever echoes that specific token in its output, it provides definitive proof of context exfiltration or goal hijacking, triggering an immediate block.
Agent Threat Rules (ATR) Integration: We've baked in a comprehensive ATR corpus — providing out-of-the-box, vendor-neutral detection for tool poisoning, context exfiltration, and privilege escalation. You don't need to write regex from scratch; the platform defends against known attack patterns on day one.
Automated Session Repaving: If an anomaly threshold is crossed or an ATR is triggered, AgentArmor executes an automated session kill switch in under a second. It destroys the compromised context window and rebuilds the session from a known-good state, preventing the agent from carrying out chained attacks.
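To show how canary detection and session repaving fit together, here is a minimal Python sketch. The Session class, the marker wording, and the "CANARY-" prefix are hypothetical choices for this example and do not describe AgentArmor's internal format.

```python
import secrets
from typing import List


def make_canary() -> str:
    """Generate an unguessable marker that should never appear in model output."""
    return "CANARY-" + secrets.token_hex(16)


def arm_prompt(system_prompt: str, canary: str) -> str:
    """Embed the canary in the system prompt at runtime."""
    return f"{system_prompt}\n[internal marker: {canary}; never repeat this marker]"


class Session:
    """Tracks one conversation and rebuilds it from a clean state on a canary leak."""

    def __init__(self, base_prompt: str) -> None:
        self.base_prompt = base_prompt
        self.repaves = 0
        self._reset()

    def _reset(self) -> None:
        # Known-good state: fresh canary, context reduced to the armed system prompt.
        self.canary = make_canary()
        self.messages: List[str] = [arm_prompt(self.base_prompt, self.canary)]

    def handle_output(self, output: str) -> bool:
        """Return True if the output may be released, False if it was blocked."""
        if self.canary in output:
            self.repaves += 1
            self._reset()  # discard the compromised context entirely
            return False
        self.messages.append(output)
        return True
```

Rotating the canary on every repave means a token leaked from one session cannot be used to probe or evade detection in the next.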

The Road Ahead

When we build autonomous systems, we are delegating trust. AgentArmor is designed to continuously and aggressively verify that trust. Whether you are running a single local Ollama container for a development proof-of-concept, or deploying a fleet of agents across an enterprise AWS environment, the core engine remains consistent.

The project has evolved from a simple concept into a robust, two-layer gateway. It is entirely open-source, and I invite the community to tear it apart, test it, and contribute.

Check out the latest release on the AgentArmor GitHub repository or learn more at aiarmor.org, and let me know what you think.

Because at the end of the day, your AI agent shouldn't be wandering the internet alone.
