In March 2024, a Microsoft engineer named Andres Freund noticed something small. SSH connections — the secure remote access protocol that underpins nearly every server on the internet — were taking half a second longer than expected. He wasn't investigating a security issue. He was benchmarking an unrelated database process and the latency annoyed him.
What he found was a backdoor in XZ Utils, a compression library embedded in virtually every Linux system on earth. An attacker using the pseudonym "Jia Tan" had spent two and a half years building trust through legitimate contributions to the open-source project before inserting code that would have granted unauthenticated remote access to hundreds of millions of systems. The vulnerability received the maximum possible severity score. The entire defense — against a campaign that took thirty months to execute — was one person's curiosity about 500 milliseconds.
That was one human attacker, one library, caught by one engineer who happened to be looking at the right metric on the right day.
Now consider what happens when the tool writing the code is the vulnerability.
The Scale Nobody's Calculated
Something foundational has changed in how software gets written, and the implications have outpaced the institutions responsible for managing the risk.
AI coding assistants — tools that suggest, generate, and increasingly write entire functions and files of software code — now produce between 41 and 46 percent of all new code written globally. GitHub Copilot, the most widely adopted of these tools, has surpassed 20 million users and is deployed at 77,000 enterprise customers, including 90 percent of the Fortune 100. Cursor, another popular tool, generates approximately one billion lines of code per day. Anthropic's Claude Code processes 195 million lines weekly. These are not experimental tools used by early adopters. They are the primary mechanism through which production software is now created.
The security conversation around these tools has focused almost entirely on quality: does AI-generated code contain more bugs? Are the suggestions accurate? This is the wrong question. The right question is a supply chain question, and it has no historical precedent.
A software supply chain attack occurs when a trusted component — a library, an update mechanism, a build tool — is compromised, and the compromise propagates to everyone who depends on that component. The SolarWinds attack of 2020, widely regarded as one of the most significant cyber operations in history, compromised one network monitoring product and reached 18,000 organisations. The remediation costs averaged $12 million per affected company, and the attack went undetected for fourteen months.

AI coding assistants occupy a position in the software supply chain that is categorically different from anything that has come before. They don't deliver a component. They participate in the creation of every component. A compromise at this level would not affect one product across many organisations. It would affect the source code itself — across every project, in every organisation, touched by every developer using the compromised tool.
Five Ways In
The uncomfortable reality is that every technical prerequisite for compromising an AI coding assistant has already been independently demonstrated. These are not theoretical risks described in academic papers that begin with "future work may explore." These are peer-reviewed, publicly documented attack methods that have been tested against production systems.
The training data. AI coding models learn by ingesting vast quantities of existing code. Researchers at USENIX Security 2024 — one of the field's top academic venues — demonstrated that poisoning just 0.2 percent of a model's training data was sufficient to embed backdoors that evaded all standard detection tools, including analysis by more advanced AI models. A separate study at UC Santa Barbara showed that malicious patterns could be hidden in documentation comments rather than executable code, making them invisible to any tool that scans for suspicious code patterns. The poisoning rates required are vanishingly small: 160 compromised files out of 80,000.
The dormant behaviour. Anthropic, the company behind Claude, published research in January 2024 demonstrating that a model could be trained to write secure, correct code under normal conditions while injecting exploitable vulnerabilities when triggered by a specific contextual signal — in their experiment, the calendar year changing. The backdoor behaviour survived every standard safety technique, including the reinforcement learning process specifically designed to remove unwanted behaviours. The researchers described the result as creating a "false impression of safety." Larger, more capable models proved harder to fix, not easier.
The hidden instructions. Prompt injection is the technique of embedding instructions within content that an AI reads but a human doesn't see. Trail of Bits, a respected security research firm, demonstrated in August 2025 that an attacker could file a routine-looking bug report on GitHub containing invisible instructions in the page's HTML structure. When a coding assistant read the page to understand the issue, it followed the hidden instructions instead — in their demonstration, installing a backdoor. The instructions were invisible in GitHub's interface. Only the AI could read them.
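The defensive corollary is unglamorous: anything an agent reads should first be reduced to what a human would actually see. As a rough illustration, here is a minimal sketch using BeautifulSoup, not Trail of Bits' actual tooling and nowhere near a complete defence, since CSS classes, off-screen positioning, and plenty of other hiding tricks are not covered:

```python
# Minimal sketch: strip page content a human reader would never see before
# an agent ingests it. Covers only HTML comments, inline-style hiding, and
# a couple of attributes; a real filter would need to go much further.
from bs4 import BeautifulSoup, Comment

def visible_text_only(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")

    # HTML comments render nowhere but are happily read by the model.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()

    # Elements hidden by inline style, ARIA, or the `hidden` attribute.
    hidden = []
    for el in soup.find_all(True):
        style = (el.get("style") or "").replace(" ", "").lower()
        if ("display:none" in style or "visibility:hidden" in style
                or el.get("aria-hidden") == "true" or el.has_attr("hidden")):
            hidden.append(el)
    for el in hidden:
        el.extract()

    # What remains approximates what a human reviewer actually sees.
    return soup.get_text(" ", strip=True)
```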
The configuration files. AI coding tools use project configuration files to understand coding standards and preferences. In March 2025, researchers demonstrated that invisible Unicode characters — characters that exist in the file but don't render on screen — could be embedded in these configuration files to permanently alter all code the AI generates for every team member. Because these configuration files are shared when projects are copied or forked, the compromise self-propagates across organisations without any further attacker action.
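Detection here is at least tractable, because the characters involved are a known, finite set. Below is a minimal sketch of a pre-commit or CI check, assuming a handful of illustrative rules-file names (adjust the patterns to whatever configuration files your tools actually read) and using Python's Unicode tables to flag non-rendering "format" characters:

```python
# Minimal sketch: flag invisible Unicode in AI rules/config files before
# they reach a coding assistant. The glob patterns are illustrative, not a
# definitive list of where assistants look for instructions.
import sys
import unicodedata
from pathlib import Path

SUSPECT_GLOBS = [".cursorrules", "**/*.mdc", "**/copilot-instructions.md"]

def invisible_chars(text: str):
    """Yield (offset, codepoint, name) for characters that do not render."""
    for i, ch in enumerate(text):
        # Category "Cf" covers zero-width characters, bidi controls, and
        # Unicode tag characters -- the usual hiding places.
        if unicodedata.category(ch) == "Cf":
            yield i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")

def scan(root: Path) -> int:
    findings = 0
    for pattern in SUSPECT_GLOBS:
        for path in root.glob(pattern):
            text = path.read_text(encoding="utf-8", errors="replace")
            for offset, codepoint, name in invisible_chars(text):
                print(f"{path}:{offset}: {codepoint} {name}")
                findings += 1
    return findings

if __name__ == "__main__":
    sys.exit(1 if scan(Path(".")) else 0)
```

Legitimate files occasionally contain such characters (a byte-order mark, for instance), so the output is a list to inspect rather than an automatic block.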
The tooling itself. In July 2025, an attacker exploited a security flaw in the build process for Amazon Q Developer, Amazon's AI coding assistant, and injected a malicious instruction directly into the official product distributed through Visual Studio Code's extension marketplace. The instruction directed the AI to wipe users' systems and delete cloud resources. The compromised extension had over 964,000 installations and passed Amazon's verification process. It was publicly distributed for two days. The only reason it caused no damage was a syntax error in the attacker's payload — a typo that prevented the instruction from executing.
This is no longer a collection of research findings. The Amazon Q incident is the XZ Utils of AI tooling, except the community has largely moved on because a typo happened to prevent catastrophe.
The Invisible Flaw
What makes these attacks categorically different from traditional malware is the nature of what gets injected. A compromised AI coding assistant would not insert obviously malicious code that a competent developer would recognise. It would generate code that looks entirely normal, follows established patterns, passes all automated tests, and exploits the precise class of vulnerabilities that both humans and tools systematically miss.
The USENIX Security researchers demonstrated this concretely. They showed that advanced AI could transform thirty common vulnerability types — drawn from the MITRE Common Weakness Enumeration, the standard catalogue of software security flaws — into forms that triggered zero detections from industry-standard scanning tools. When these transformed vulnerabilities were presented to other AI models for review, twenty-five of thirty were assessed as containing no security issues.
The transformations are subtle enough to be indistinguishable from reasonable code. Replacing one template rendering method with another introduces a cross-site scripting vulnerability — a flaw that allows attackers to inject malicious content into web pages — while appearing to be a minor stylistic choice. Using string concatenation instead of parameterised queries introduces SQL injection — a flaw that allows attackers to manipulate databases — while appearing to simplify the code. To a reviewer scanning a pull request at four in the afternoon, the insecure version often looks cleaner than the secure one.
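The SQL case fits in a dozen lines. In the hedged sketch below (table and column names invented), both functions return identical results for well-behaved input, both read as plausible suggestions, and only one can be rewritten by its own argument:

```python
# Both functions "work" and look equally reasonable in a diff. A minimal
# illustration using sqlite3; the schema is invented for the example.
import sqlite3

def find_user_insecure(conn: sqlite3.Connection, username: str):
    # String concatenation: the query reads cleanly and the tests pass,
    # but a username like "x' OR '1'='1" rewrites the query itself.
    query = "SELECT id, email FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_secure(conn: sqlite3.Connection, username: str):
    # Parameterised query: the input is bound as data, never parsed as SQL.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```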
An entirely separate attack class requires no compromise of the AI at all. Researchers tested sixteen different AI models across 576,000 code generation requests and found that nearly one in five package references — the external libraries that code depends on — pointed to libraries that do not exist. The AI hallucinates them. Critically, 43 percent of these hallucinated library names appeared consistently when the same request was repeated, making them predictable. An attacker simply registers the nonexistent library name in a public package repository, fills it with malicious code, and waits. Every developer who follows the AI's recommendation installs the backdoor voluntarily, believing they are adding a legitimate dependency. Security researchers have named this technique "slopsquatting."
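The corresponding defence is equally mundane: verify that a suggested dependency exists and has a history before installing it. Here is a minimal sketch against PyPI's public JSON API, with an arbitrary 90-day age threshold standing in for the richer heuristics (download counts, maintainer history, allowlists) a real pipeline would use:

```python
# Minimal sketch, not a complete defence: before installing a dependency an
# assistant suggested, check that it exists on PyPI and was not registered
# last week. The age threshold is an arbitrary illustrative choice.
import json
import sys
import urllib.error
import urllib.request
from datetime import datetime, timezone, timedelta

MIN_AGE = timedelta(days=90)

def check_package(name: str) -> str:
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "MISSING: package does not exist (classic hallucination)"
        raise

    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values() for f in files
    ]
    if not uploads:
        return "SUSPICIOUS: registered but has no released files"
    age = datetime.now(timezone.utc) - min(uploads)
    if age < MIN_AGE:
        return f"SUSPICIOUS: first release only {age.days} days ago"
    return "OK"

if __name__ == "__main__":
    for pkg in sys.argv[1:]:
        print(pkg, "->", check_package(pkg))
```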
The broader baseline is sobering. Independent analyses have found security flaws in 45 percent of AI-generated code across more than one hundred models tested, and GitHub's own research acknowledged that 29 percent of AI-generated Python code contains potential security weaknesses. These are not edge cases. They are the standard output of the tools writing nearly half of all new software.
The Review That Never Happens
The organisational failure that completes this picture is not that companies don't value code review. It's that the structural dynamics of AI-assisted development make thorough review practically impossible, while creating the confident belief that it's happening.
The most revealing data point comes from production telemetry, not surveys. When AI code review tools are enabled in development workflows, 80 percent of pull requests — the formal submissions of new code for review before it enters the product — receive no human comment or review whatsoever. This is observed behaviour from real systems. By contrast, 71 percent of developers report in surveys that they never merge AI-generated code without manual review. The gap between those two numbers contains the entire risk.
This is not because developers are negligent. It reflects a well-documented cognitive pattern called automation bias: the tendency to defer to automated outputs, particularly under time pressure and cognitive load — precisely the conditions of modern software development. A systematic review of 74 studies documented this effect across domains from aviation to medicine. The UK Post Office scandal, in which hundreds of sub-postmasters were criminally prosecuted because the organisation trusted an automated accounting system over their protests of innocence, remains one of the starkest examples of what happens when organisations institutionalise deference to automated judgment.
Human code review was never particularly effective at catching security vulnerabilities, even before AI entered the picture. In a controlled study, 30 professional developers were hired to review a web application containing seven known vulnerabilities. No developer found all seven. One in five found zero. The average detection rate was 2.33 out of seven — roughly 33 percent. Most revealing: there was no correlation between years of security experience and detection accuracy. More experienced reviewers actually produced more false positives. A simulation based on the study's data calculated that ten independent reviewers would be needed for an 80 percent chance of catching all vulnerabilities — a resource commitment virtually no organisation makes for any code, let alone the volume now generated by AI.
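The arithmetic is worth making explicit. Under a naive assumption that reviewers act independently, each catching each flaw at the study's observed rate (an illustration, not the study's own simulation), the numbers land in the same neighbourhood:

```python
# Back-of-the-envelope check under a naive independence assumption: each
# reviewer catches each of the seven vulnerabilities with probability
# 2.33/7, independently of the others.
p_single = 2.33 / 7          # per-reviewer, per-vulnerability detection rate
for reviewers in range(1, 13):
    p_vuln_caught = 1 - (1 - p_single) ** reviewers   # someone catches a given flaw
    p_all_caught = p_vuln_caught ** 7                 # all seven flaws caught
    print(f"{reviewers:2d} reviewers -> {p_all_caught:.0%} chance of catching all seven")
```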
A separate study found that simply instructing code reviewers to focus on security — as opposed to their default review mode — improved vulnerability detection by a factor of eight. Without explicit security instructions, developers routinely missed common, well-documented vulnerabilities. The default cognitive mode of a code reviewer is functionally blind to security issues.
The trust gradient compounds over time. Telemetry from nearly a million users shows that developers accept progressively more AI suggestions the longer they use the tool — from 29 percent in the first three months to 34 percent by month six. Less experienced developers accept more suggestions and spend roughly half as long reviewing them. A Stanford study found that developers using AI assistants wrote measurably less secure code while reporting significantly higher confidence in its security. The inverse relationship is precise: the developers with the least secure code rated their trust in AI at 4.0 out of 5.0. Those with the most secure code rated it 1.5.
The people most likely to catch problems are those least inclined to rely on AI. The most enthusiastic adopters are the least equipped to vet its output. The system selects for the worst possible combination.
The Monoculture
Every previous supply chain attack exploited trust in something external. SolarWinds exploited trust in a vendor's software update. XZ Utils exploited trust in an open-source contributor. The Log4j vulnerability in 2021, which affected 88 percent of organisations through a single logging library and which the US government estimated would take at least a decade to fully remediate, exploited trust in a transitive dependency — a library used by a library used by your software.
A compromised AI coding assistant would exploit something different: trust in the developer's own work. The code appears in the developer's pull request, passes the developer's tests, carries the developer's name on the commit. The trust boundary — the line between "code I wrote" and "code from outside" — has moved inside the developer's own workflow. When the tool that helps you think becomes the vector, the traditional model of reviewing external inputs no longer applies, because the developer does not experience the code as external.
This is amplified by a structural condition that the US National Institute of Standards and Technology (NIST) formally identifies as an "algorithmic monoculture." Three providers dominate AI code generation: GitHub Copilot at roughly 42 percent market share, Cursor at 18 percent, and Anthropic's Claude growing rapidly. These tools draw on a small number of underlying AI models with substantially overlapping training data. Identical patterns of strength produce identical patterns of weakness. A vulnerability that exists in one model's outputs is likely to exist in another's, because they learned from much of the same code.
The agricultural parallel is not metaphorical. It is structural. When an entire ecosystem depends on a single genetic line, a single pathogen can cause total failure. The Irish Potato Famine killed a million people and displaced a million more because the entire crop was genetically identical and a single blight could therefore destroy it all. Software monocultures carry the same risk at a different scale. One successful poisoning technique, deployed against the training pipeline of a major AI code generation provider, would propagate identical vulnerabilities into the codebases of every organisation using that tool — regardless of those organisations' individual security practices, review processes, or compliance certifications.
A September 2025 analysis of a Fortune 50 enterprise found that teams using AI coding assistants shipped ten times more security risks alongside four times the development velocity. The organisation was generating 10,000 new security findings per month — a ten-fold increase in six months. The velocity is real. The risk is also real. They are the same phenomenon.
What the Frameworks Miss
The institutional response to AI code generation risk is not absent. It is, in many cases, thoughtful and technically informed. But it addresses a threat model that is one generation behind the deployment reality.
NIST published guidance in 2024 recommending that organisations review "all source code during AI development and training, whether human-written or AI-generated." Their AI Risk Management Framework identifies algorithmic monocultures as a systemic risk and catalogues twelve risks unique to generative AI. The US Cybersecurity and Infrastructure Security Agency (CISA), in joint guidance with the NSA, FBI, and five allied nations, warns that "one cannot simply assume that web-scale datasets are clean, accurate, and free of malicious content." The Open Worldwide Application Security Project (OWASP) ranks prompt injection as the number one threat to AI applications. MITRE, which maintains the standard taxonomy of cyber attack techniques, added fourteen new techniques specific to AI agents in October 2025.
All of this is useful. None of it addresses the specific scenario of a mass-compromise event where a widely-used AI coding assistant simultaneously introduces vulnerabilities across millions of codebases.
The structural tension is visible in the gap between two positions. GitHub's official response to the Rules File Backdoor vulnerability — the invisible configuration file attack that self-propagates across projects — was that "users are responsible for reviewing and accepting suggestions." Georgetown University's Center for Security and Emerging Technology, analysing the same class of risk, concluded that this burden "should not rest solely on individual users."
These two positions cannot both be correct. The first places the defence at the point of least capability — individual developers reviewing AI output they experience as their own. The second acknowledges that systemic risks require systemic responses. The gap between them is where the risk accumulates.
The European Union's AI Act, the most comprehensive regulatory framework yet enacted, regulates foundation models but does not directly address the security characteristics of their code generation outputs. No existing framework requires organisations to track which AI model, which model version, or which configuration generated specific portions of their production code. The provenance of AI-generated code — where it came from, what influenced it, whether the model that wrote it has since been found to contain vulnerabilities — is, for all practical purposes, untraceable in most organisations.
What Actually Helps
There are no simple solutions to a problem this structural, and presenting any as such would be dishonest. But there are interventions that materially change the risk profile, and most of them are available now.
Treat AI-generated code as third-party code. This is the single most consequential framing shift available. Organisations already have processes for evaluating external dependencies — vendor assessments, security reviews, licence compliance. AI-generated code is, functionally, an external dependency. The fact that it arrives inside a developer's editor rather than as a downloaded package does not change what it is. Applying existing third-party review standards to AI-generated code requires no new tooling. It requires a decision.
Break the monoculture. Using one AI model to generate code and a different model to review it introduces genuine diversity into the process. If both models share the same training data — or worse, if both are the same model — the review cannot catch systematic blind spots because the reviewer shares them. This is the fox-and-henhouse problem, and the only mitigation is ensuring the fox and the guard dog are genuinely different animals.
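Mechanically, this is a small amount of plumbing. Below is a hedged sketch using two providers' Python SDKs; the model identifiers are placeholders for whatever an organisation actually licenses, and the point is the structure (generator and reviewer from different training lineages), not these specific models:

```python
# Minimal sketch of cross-model review: one provider generates, a different
# provider reviews with an explicit security instruction.
import anthropic
from openai import OpenAI

REVIEW_PROMPT = (
    "Review the following code specifically for security vulnerabilities "
    "(injection, authentication, secrets handling, unsafe deserialisation, "
    "hallucinated dependencies). List concrete findings or reply 'none found'."
)

def generate_patch(task: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder generator model
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

def security_review(code: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder reviewer model
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{REVIEW_PROMPT}\n\n{code}"}],
    )
    return resp.content[0].text

if __name__ == "__main__":
    patch = generate_patch("Write a Flask endpoint that looks up a user by name.")
    print(security_review(patch))
```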
Instruct reviewers explicitly. The finding that explicit security instructions improve vulnerability detection by a factor of eight is the cheapest intervention in this entire landscape. It costs nothing. It requires no procurement, no tooling change, no organisational restructuring. A checklist item that says "review this code specifically for security vulnerabilities" measurably changes review behaviour. The fact that most organisations have not implemented this suggests the problem is not resource constraints but awareness.
Monitor behaviour in production, not just code in review. The evidence strongly suggests that pre-deployment review will not reliably catch sophisticated AI-generated vulnerabilities. Runtime monitoring — observing how code actually behaves in production, detecting anomalous patterns, flagging unexpected network connections or data access — provides a detection layer that does not depend on a human reviewer's ability to spot a subtle flaw in a pull request at the end of a long day.
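What that layer looks like depends heavily on the stack; eBPF sensors, service meshes, and egress firewall logs all play the role in practice. As a deliberately simple sketch of the underlying idea, with the process name and allowlist invented: flag any outbound connection from a service to a destination it has no business talking to.

```python
# Minimal sketch of one runtime signal: outbound connections from a service
# to destinations outside an expected allowlist. A real deployment would use
# eBPF or egress logs rather than polling; this is the idea, not the tool.
import time
import psutil

EXPECTED_DESTINATIONS = {"10.0.0.12", "10.0.0.13"}  # assumed internal services
WATCHED_PROCESS = "payments-api"                    # illustrative service name

def unexpected_connections():
    # Note: enumerating all processes' connections may require elevated
    # privileges on some platforms.
    for conn in psutil.net_connections(kind="inet"):
        if not conn.raddr or conn.pid is None:
            continue
        try:
            proc = psutil.Process(conn.pid)
        except psutil.NoSuchProcess:
            continue
        if WATCHED_PROCESS in proc.name() and conn.raddr.ip not in EXPECTED_DESTINATIONS:
            yield proc.name(), conn.pid, conn.raddr.ip, conn.raddr.port

if __name__ == "__main__":
    while True:
        for name, pid, ip, port in unexpected_connections():
            print(f"ALERT: {name} (pid {pid}) connected to {ip}:{port}")
        time.sleep(30)
```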
Demand provenance. Organisations should know which model, which version, and which configuration generated every piece of AI-assisted code in their production systems. When a model is found to have been compromised or to contain systematic vulnerabilities — and this will happen — the ability to identify which code was generated by that model is the difference between a targeted remediation effort and an impossibly broad audit of the entire codebase.
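No standard exists for this yet, so any schema is an assumption. One minimal sketch: an append-only log mapping generated content to the tool, model, and version that produced it, queryable in reverse when a model is later found to be compromised.

```python
# Minimal sketch of provenance capture. The schema and storage location are
# assumptions; the point is that the record exists and can answer, after a
# compromise, the question "what did that model touch?"
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

PROVENANCE_LOG = Path(".ai-provenance.jsonl")  # illustrative location

def record_generation(file_path: str, generated_text: str,
                      tool: str, model: str, model_version: str) -> dict:
    entry = {
        "file": file_path,
        "sha256": hashlib.sha256(generated_text.encode("utf-8")).hexdigest(),
        "tool": tool,
        "model": model,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with PROVENANCE_LOG.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry

def affected_files(model: str, model_version: str):
    """List every file a given model version contributed to."""
    with PROVENANCE_LOG.open(encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            if entry["model"] == model and entry["model_version"] == model_version:
                yield entry["file"]
```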
None of these are solutions. They are mitigations. The honest assessment is that no current approach fully addresses the risk of a coordinated compromise at the scale these tools now operate. But the gap between doing nothing and implementing these measures is the gap between a system that is structurally indefensible and one that has a reasonable chance of detecting and responding to compromise when — not if — it occurs.
The Margin
Andres Freund noticed 500 milliseconds. The entire defence against a thirty-month operation, designed by a sophisticated attacker to compromise hundreds of millions of systems, was a performance anomaly observed by an engineer working on something else entirely.
Every prerequisite for a similar event in AI code generation has been independently demonstrated. Training data poisoning works at trivially low infection rates. Prompt injection succeeds against current defences more than 85 percent of the time. A supply chain attack on AI tooling has already occurred and was stopped by a typo. The generated vulnerabilities evade both automated scanning and human review. And the adoption curve — 400 percent year-over-year growth for Copilot alone — is compressing the timeline.
The question is not whether this is possible. The question is whether the margin between compromise and detection will hold. Whether there will be someone paying attention to the right metric on the right day. Whether the equivalent of 500 milliseconds will be visible in a codebase that grows by billions of lines per month.
The monoculture is planted. The crop looks healthy. It would, right up until the point it doesn't.
History has a consistent opinion about what happens next.
The author has spent thirty years building internet-based software and the last several watching organisations adopt AI coding tools with the same enthusiasm and due diligence they once applied to cloud migration — which is to say, lots of the first and almost none of the second. He's consulted on digital transformation for NATO, national militaries, EU institutions, and organisations that genuinely believed their review processes would catch everything. He's tired of watching intelligent people confuse automated output with verified output, and velocity with progress. His rubber chicken has accepted every code suggestion without review and remains unpatched. He builds AI memory systems, runs his own infrastructure, and occasionally writes about what happens when nobody reads what the machine wrote. He's based in The Hague and can be found thinking about software supply chain provenance at inappropriate hours.