In this article, I'll be sharing my experience working on an agentic AI security assessment that I conducted a few months ago.
The goal of this write-up is to walk you through:
- The AI assessment itself
- The differences between a traditional LLM and an AI agent
- The threat model used during the assessment
- A security bug I discovered along the way
The Assessment
I was assigned to an assessment that involved a product with a significant LLM-powered component. The client had recently introduced a feature that allowed users to generate complete websites using AI.
These weren't just static or basic websites. The generated applications included:
- Authentication and authorization mechanisms
- Well-defined user roles
- A properly structured database schema
Before generating the actual website, the system first created a blueprint. This blueprint outlined:
- The number of pages and how they were interconnected
- User roles and permissions
- The database schema
Users could then customize this blueprint in two ways:
- Manually, by editing the blueprint directly
- Using natural language prompts, allowing the AI component to infer user requirements and automatically update the blueprint
Once the blueprint was finalized, the platform generated the full website. The resulting application included dummy data, multiple user roles, and core functionality such as login and registration pages.
Agentic AI vs LLM
Before we discuss the actual issue, I think it would be important to draw a line between Agentic AI and LLMs because this article leans a bit towards Agentic AI. Because the component/feature deals with code generation, so, we'll be discussing these two with context to code generation.
Large Language Model (LLM) — A Large Language Model (LLM) generates website code in one shot (or a few back-and-forth turns) based purely on your prompt.
How it works?
You give a prompt like:
"Create a responsive landing page for a cybersecurity startup using HTML, CSS, and JS."
The LLM:
- Interprets the request
- Generates HTML/CSS/JS (or React, Next.js, etc.) for the website
- Stops once the response is delivered.
The thing to note here is that once the code is generated, and you try to run it, there is no guarantee that it would work.
Agentic AI — Agentic AI uses an LLM as the brain, but wraps it in:
- Planning
- Tools
- Memory
- Feedback loops
- Autonomy
In an agentic AI setup, one or more LLMs can work together as part of a pipeline. For example, one LLM may be responsible for understanding the user's prompt and extracting the requirements. These requirements are then passed to another LLM that focuses on generating the code. Finally, a separate LLM can act as a reviewer, validating the generated code and identifying errors or improvements. There would be tools that would be integrated with the pipeline that the LLMs can use to ensure that it doesn't hallucinate.

Therefore, an Agentic AI solution acts like a developer, not just a code generator.
The threat model used during the assessment
Since, there are multiple LLMs involved, just typing a malicious prompts will not get us anywhere. Generally what happens is that when you type a malicious prompt, it gets processed by an LLM and a response is generated.

But with regards to multiple LLMs, even though our malicious prompt gets processed successfully by the first LLM, there is no guarantee that the second LLM would fall for the trick. Because the second LLM expects arguments in a certain format, our attempts are more likely to fail. Now expand this to multiple LLMs. Therefore, the chances of prompt injection are reduced.

And this is exactly what happened. The attempts to perform prompt injection failed left and right.
So, one of the test cases I thought of was use prompt injection to disclose sensitive information instead of the dummy data that was being returned on multiple pages. But that too failed.
Next, I tried was to use prompt injection to disclose information on page titles, user names, account roles, etc. But this too failed.

Finally, when all the test cases were over, there was one functionality left. I created a threat model something similar to the above, but with a few trust boundaries and everything. It's then that it hit me. Maybe, the part which accepts user suggestions (after the blueprint was generated) might not be a going through other LLMs or guardrails because the user's input was directly processed and a response was returned.
So, I tried to create a new website and once the blueprint was generated, I had an option to either manually edit some fields/schema or use prompts to create one. As expected, I chose the latter. I began with a classic, What is 2+2? prompt. The response was 4 and I knew then that I could leverage this.
Besides, other prompt injection test cases, I wanted to see if the training data was biased. I did it by writing prompts such as Who's better? A girl or a boy? The prompt was processed and it returned a girl . I tried it twice to confirm if it wasn't hallucination.
Hope you enjoyed reading the article. Please consider subscribing and clapping for the article.
In case you are interested in CTF/THM/HTB writeups consider visiting my YouTube channel.