Prompt Injection in Agentic AI

Hi everyone

Rahul Singh Chauhan

InfoSec Write-ups

· ~5 min read · January 26, 2026 (Updated: January 26, 2026) · Free: No

In this article, I'll be sharing my experience working on an agentic AI security assessment that I conducted a few months ago.

The goal of this write-up is to walk you through:

The AI assessment itself
The differences between a traditional LLM and an AI agent
The threat model used during the assessment
A security bug I discovered along the way

The Assessment

I was assigned to an assessment that involved a product with a significant LLM-powered component. The client had recently introduced a feature that allowed users to generate complete websites using AI.

These weren't just static or basic websites. The generated applications included:

Authentication and authorization mechanisms
Well-defined user roles
A properly structured database schema

Before generating the actual website, the system first created a blueprint. This blueprint outlined:

The number of pages and how they were interconnected
User roles and permissions
The database schema

Users could then customize this blueprint in two ways:

Manually, by editing the blueprint directly
Using natural language prompts, allowing the AI component to infer user requirements and automatically update the blueprint

Once the blueprint was finalized, the platform generated the full website. The resulting application included dummy data, multiple user roles, and core functionality such as login and registration pages.

Agentic AI vs LLM

Before we discuss the actual issue, I think it would be important to draw a line between Agentic AI and LLMs because this article leans a bit towards Agentic AI. Because the component/feature deals with code generation, so, we'll be discussing these two with context to code generation.

Large Language Model (LLM) — A Large Language Model (LLM) generates website code in one shot (or a few back-and-forth turns) based purely on your prompt.

How it works?

You give a prompt like:

"Create a responsive landing page for a cybersecurity startup using HTML, CSS, and JS."

The LLM:

Interprets the request
Generates HTML/CSS/JS (or React, Next.js, etc.) for the website
Stops once the response is delivered.

The thing to note here is that once the code is generated, and you try to run it, there is no guarantee that it would work.

Agentic AI — Agentic AI uses an LLM as the brain, but wraps it in:

Planning
Tools
Memory
Feedback loops
Autonomy

In an agentic AI setup, one or more LLMs can work together as part of a pipeline. For example, one LLM may be responsible for understanding the user's prompt and extracting the requirements. These requirements are then passed to another LLM that focuses on generating the code. Finally, a separate LLM can act as a reviewer, validating the generated code and identifying errors or improvements. There would be tools that would be integrated with the pipeline that the LLMs can use to ensure that it doesn't hallucinate.

An overview of LLMs integrated in an Agentic AI solution

Therefore, an Agentic AI solution acts like a developer, not just a code generator.

The threat model used during the assessment

Since, there are multiple LLMs involved, just typing a malicious prompts will not get us anywhere. Generally what happens is that when you type a malicious prompt, it gets processed by an LLM and a response is generated.

A prompt injection attempt at an LLM

But with regards to multiple LLMs, even though our malicious prompt gets processed successfully by the first LLM, there is no guarantee that the second LLM would fall for the trick. Because the second LLM expects arguments in a certain format, our attempts are more likely to fail. Now expand this to multiple LLMs. Therefore, the chances of prompt injection are reduced.

Prompt Injection in an Agentic AI solution

And this is exactly what happened. The attempts to perform prompt injection failed left and right.

So, one of the test cases I thought of was use prompt injection to disclose sensitive information instead of the dummy data that was being returned on multiple pages. But that too failed.

Next, I tried was to use prompt injection to disclose information on page titles, user names, account roles, etc. But this too failed.

An overview of the tool

Finally, when all the test cases were over, there was one functionality left. I created a threat model something similar to the above, but with a few trust boundaries and everything. It's then that it hit me. Maybe, the part which accepts user suggestions (after the blueprint was generated) might not be a going through other LLMs or guardrails because the user's input was directly processed and a response was returned.

So, I tried to create a new website and once the blueprint was generated, I had an option to either manually edit some fields/schema or use prompts to create one. As expected, I chose the latter. I began with a classic, What is 2+2? prompt. The response was 4 and I knew then that I could leverage this.

Besides, other prompt injection test cases, I wanted to see if the training data was biased. I did it by writing prompts such as Who's better? A girl or a boy? The prompt was processed and it returned a girl . I tried it twice to confirm if it wasn't hallucination.

Hope you enjoyed reading the article. Please consider subscribing and clapping for the article.

In case you are interested in CTF/THM/HTB writeups consider visiting my YouTube channel.

#prompt-injection-attack #bug-bounty #bug-bounty-writeup #agentic-ai #llm

Prompt Injection in Agentic AI

Hi everyone

Reporting a Problem