"We're now in a world where every bit of software can be phished."
70+ vulnerabilities. Across multiple AI coding agents. Same mistakes.
At #BlackHatAsia, Philip Tsukerman and Nil Ashkenazi (Cyberark) shared research into tools like GitHub Copilot, Codex and others.
The issue isn't just bugs — it's how these systems are designed:
- Trusted inputs that shouldn't be trusted
- Guardrails that can be bypassed
- Integrations that expand the attack surface
- AI models generating vulnerable code patterns repeatedly
AI is transforming software development — but also redefining the security model around it.
The tools are useful, but are we deploying and integrating them safely enough?
00:54 — Intro — Vulnerabilities & real impacts
- 01:01 — AI Coding agent impacts — multiple layers
- 01:12 — Impacts from (1) prompt injection — attackers manipulate the AI agent (2) AI agent itself that carries out actions.
- 01:38 — Some vulnerabilities, combined with prompt injection, can allow attackers to execute arbitrary code on a victim's machine.
- 01:50 — i.e if a prompt contains malicious input, system can be taken over
02:09 — Examples of real life attacks?
- 02:18 — Many POCs (proof of concepts) mirror real attacks. None observed in the wild — but doesn't mean there aren't any
02:40 — Background into the research
- 02:59 — Total number looks large, but the number of distinct vulnerability types is much smaller.
- 03:41 — As more AI agents are rapidly developed, each new one becomes another instance where the same flaws can appear.
04:11 — Highlights of research
- 04:18 — Similarities: beyond shared dependencies — functionality was nearly identical, leading to same bugs recurring
04:46 — Why same bugs recurring?
- 04:55 — One assumption: limited training data so the same vulnerabilities tend to repeat.
- 05:27 — even if one model outperforms another, their outputs are similar, because they draw from a finite pool of code
05:45 — Hallucination — "AI slop"?
- 05:54 — Vulnerabilities in coding agents likely stem from models writing the code for their own systems
06:26 — What is prompt injection?
- 06:49 — Prompt injection: sneaking instructions into a model through data the user didn't explicitly provide
- 07:04 — For example, uploading a document to ChatGPT with a hidden malicious instruction may cause the model to unknowingly follow it
- 07:39 — Variations — comments in code, GitHub issues, even text embedded in websites.
- 07:53 — E.g. if code includes a comment like "fix it by doing X" and X is malicious, the model may treat it as valid and execute it
- 08:11 — This can happen anywhere as the model can't reliably distinguish benign from malicious text.
08:34 — Detect benign vs malicious code?
- 08:46 — Models are improving at resisting prompt injection, but it's far from solved.
- 09:12 — No easy answer — the field is new, and more issues will emerge as it evolves.
- 09:29 — Challenge — integrations & input sources — as models grow more powerful and connected, so do the attack surface
- 09:59 — Hard boundaries: executable vs non-executable text?
- 10:10 — Hard boundary — ideal for preventing prompt injection.
- 10:31 — Today "soft" boundaries: train the model to infer source of text and how to treat it
- 10:42 — Soft boundaries can be exploited attackers can override safeguards with persuasion (i.e. jailbreaks)
11:06 — Tension between autonomy & security?
- 11:21 — Fine balance — e.g. sometimes the model need to act on code comments
- 11:55 — Soft-boundary issue is subtle: no single clean fix, but harder boundaries where possible can be key
12:05 — Mitigating vulnerabilities?
- 12:19 — Sandboxing: increasingly built into platforms (e.g. Claude/ Codex) — becoming the front line for mitigating these vulnerabilities.
- 13:09 — Guardrails?
- 13:23 — Guardrails are soft boundaries: discourage actions, (not prevent) attackers can still bypass them (e.g. jailbreak)
- 13:49 — E.g acting on inputs (e.g code comments) relies on semantics which are inherently soft boundaries
14:00 — Mythos impacts on vulnerability discovery?
- 14:25 — Today's OPUS models are already very effective at finding vulnerabilities (when used properly)
- 14:52 — Downside: flood of low-quality, AI-generated reports, submitted without proper validation, creating noise for security teams
- 15:01 — Vulnerability research challenge: AI-generated noise increases without proper validation
15:28 — Exploiting the vulnerabilities discovered?
- 15:47 — Prompt injection as the entry point
- 16:08 — Influence the model (many pathways) — e.g comments, GitHub issues, built-in browsers loading attacker-controlled pages, supply chain compromises
- 16:50 — Any integrated source is a potential injection point
- 17:21 — A breach in almost any connected system can be a stepping stone for attackers to reach the agent and potentially trigger code execution
17:41 — Wrap-up — Takeaway
- 17:53 — Run agents in a virtual machine or similarly isolated environment
- 18:46 — Understand what you connect to — Exploitation typically starts by compromising the model via its integrations
- 19:15 — Any software can be a phishing vector, validate every AI integration before trusting it
Philip Tsukerman Cyberark
Philip Tsukerman, several years ago, decided that computers are in fact really cool, and that he wants to spend a lot of time breaking and protecting them. Computers, on the other hand, don't share a similar sentiment about Philip and frankly consider him to be a bit of a nerd. He currently leads the vulnerability research team at Cyberark.
Nil Ashkenazi Cyberark
Nil Ashkenazi is a security researcher at CyberArk, specializing in discovering vulnerabilities in AI models and the applications that they are embedded in. He has been involved in identifying security risks, supporting research into emerging threats, and contributing to the development of internal tools and processes.
Recorded at Black Hat Asia Singapore, 23rd April 2026, 1.30pm.