70+ AI FLAWS | AI Agent Coding Risks

Black Hat Asia 2026 Singapore

Jane Lo @Misscyberpenny

~4 min read · May 23, 2026 (Updated: May 23, 2026) · Free: Yes

"We're now in a world where every bit of software can be phished."

70+ vulnerabilities. Across multiple AI coding agents. Same mistakes.

At #BlackHatAsia, Philip Tsukerman and Nil Ashkenazi (Cyberark) shared research into tools like GitHub Copilot, Codex and others.

The issue isn't just bugs — it's how these systems are designed:

Trusted inputs that shouldn't be trusted
Guardrails that can be bypassed
Integrations that expand the attack surface
AI models generating vulnerable code patterns repeatedly

AI is transforming software development — but also redefining the security model around it.

The tools are useful, but are we deploying and integrating them safely enough?

00:54 — Intro — Vulnerabilities & real impacts

01:01 — AI Coding agent impacts — multiple layers
01:12 — Impacts from (1) prompt injection — attackers manipulate the AI agent (2) AI agent itself that carries out actions.
01:38 — Some vulnerabilities, combined with prompt injection, can allow attackers to execute arbitrary code on a victim's machine.
01:50 — i.e if a prompt contains malicious input, system can be taken over

02:09 — Examples of real life attacks?

02:18 — Many POCs (proof of concepts) mirror real attacks. None observed in the wild — but doesn't mean there aren't any

02:40 — Background into the research

02:59 — Total number looks large, but the number of distinct vulnerability types is much smaller.
03:41 — As more AI agents are rapidly developed, each new one becomes another instance where the same flaws can appear.

04:11 — Highlights of research

04:18 — Similarities: beyond shared dependencies — functionality was nearly identical, leading to same bugs recurring

04:46 — Why same bugs recurring?

04:55 — One assumption: limited training data so the same vulnerabilities tend to repeat.
05:27 — even if one model outperforms another, their outputs are similar, because they draw from a finite pool of code

05:45 — Hallucination — "AI slop"?

05:54 — Vulnerabilities in coding agents likely stem from models writing the code for their own systems

06:26 — What is prompt injection?

06:49 — Prompt injection: sneaking instructions into a model through data the user didn't explicitly provide
07:04 — For example, uploading a document to ChatGPT with a hidden malicious instruction may cause the model to unknowingly follow it
07:39 — Variations — comments in code, GitHub issues, even text embedded in websites.
07:53 — E.g. if code includes a comment like "fix it by doing X" and X is malicious, the model may treat it as valid and execute it
08:11 — This can happen anywhere as the model can't reliably distinguish benign from malicious text.

08:34 — Detect benign vs malicious code?

08:46 — Models are improving at resisting prompt injection, but it's far from solved.
09:12 — No easy answer — the field is new, and more issues will emerge as it evolves.
09:29 — Challenge — integrations & input sources — as models grow more powerful and connected, so do the attack surface
09:59 — Hard boundaries: executable vs non-executable text?
10:10 — Hard boundary — ideal for preventing prompt injection.
10:31 — Today "soft" boundaries: train the model to infer source of text and how to treat it
10:42 — Soft boundaries can be exploited attackers can override safeguards with persuasion (i.e. jailbreaks)

11:06 — Tension between autonomy & security?

11:21 — Fine balance — e.g. sometimes the model need to act on code comments
11:55 — Soft-boundary issue is subtle: no single clean fix, but harder boundaries where possible can be key

12:05 — Mitigating vulnerabilities?

12:19 — Sandboxing: increasingly built into platforms (e.g. Claude/ Codex) — becoming the front line for mitigating these vulnerabilities.
13:09 — Guardrails?
13:23 — Guardrails are soft boundaries: discourage actions, (not prevent) attackers can still bypass them (e.g. jailbreak)
13:49 — E.g acting on inputs (e.g code comments) relies on semantics which are inherently soft boundaries

14:00 — Mythos impacts on vulnerability discovery?

14:25 — Today's OPUS models are already very effective at finding vulnerabilities (when used properly)
14:52 — Downside: flood of low-quality, AI-generated reports, submitted without proper validation, creating noise for security teams
15:01 — Vulnerability research challenge: AI-generated noise increases without proper validation

15:28 — Exploiting the vulnerabilities discovered?

15:47 — Prompt injection as the entry point
16:08 — Influence the model (many pathways) — e.g comments, GitHub issues, built-in browsers loading attacker-controlled pages, supply chain compromises
16:50 — Any integrated source is a potential injection point
17:21 — A breach in almost any connected system can be a stepping stone for attackers to reach the agent and potentially trigger code execution

17:41 — Wrap-up — Takeaway

17:53 — Run agents in a virtual machine or similarly isolated environment
18:46 — Understand what you connect to — Exploitation typically starts by compromising the model via its integrations
19:15 — Any software can be a phishing vector, validate every AI integration before trusting it

Philip Tsukerman Cyberark

Philip Tsukerman, several years ago, decided that computers are in fact really cool, and that he wants to spend a lot of time breaking and protecting them. Computers, on the other hand, don't share a similar sentiment about Philip and frankly consider him to be a bit of a nerd. He currently leads the vulnerability research team at Cyberark.

Nil Ashkenazi Cyberark

Nil Ashkenazi is a security researcher at CyberArk, specializing in discovering vulnerabilities in AI models and the applications that they are embedded in. He has been involved in identifying security risks, supporting research into emerging threats, and contributing to the development of internal tools and processes.

Recorded at Black Hat Asia Singapore, 23rd April 2026, 1.30pm.

#cybersecurity #cyber-security-awareness #ai #software-development #software-engineering

< Go to the original

70+ AI FLAWS | AI Agent Coding Risks

Black Hat Asia 2026 Singapore

Reporting a Problem