Learning Objectives

By completing this task, you will be able to:

  • Define sensitive information disclosure (LLM02)
  • Distinguish between parametric memory and retrieval-based leakage
  • Identify architectural points where data can leak
  • Understand why LLM confidentiality is a system design issue
  • Prepare to analyse real disclosure scenarios in the next task

Prerequisites

This room is part of a broader AI Security path. It is recommended that you work through the path in its intended order to establish the core fundamentals. At a minimum, you should be familiar with the concepts covered in the RAG Security Fundamentals and Data Poisoning in RAG Systems rooms. You should also be familiar with:

  • How LLMs generate responses
  • High-level knowledge of vector databases (recommended)

No machine learning or mathematical background is required.

Answer the questions below

  • What OWASP category covers sensitive data exposure in LLM systems?

LLM02

  • What mathematical mechanism determines which documents are retrieved in RAG systems?

Similarity

  • What retrieval parameter controls how many documents are returned?

Top-k

  • What CVE demonstrated zero-click prompt injection via retrieved content?

EchoLeak

  • What mathematical metric is commonly used to measure similarity between embeddings?

Cosine

  • What attack attempts to reconstruct text from stored vectors?

Inversion

  • What retrieval configuration change increases exposure surface by expanding the number of ranked chunks?

Top-k

  • What logical grouping inside a vector database separates datasets?

Namespace

  • Which segmentation model provides the strongest isolation but at a higher cost?

Per-tenant

  • What type of enforcement operates before computation instead of after?

Deterministic

  • What control removes sensitive data before embedding?

Redaction

  • What policy ensures deleted embeddings are removed from storage?

Retention

  • What caused the assistant to expose confidential data?

Broad Retrieval

  • Why did Tom Russo's HR record appear when asking about benefits?

Semantic Collision

  • What control could have prevented the disclosure in Phase 2?

Metadata Filtering

Conclusion

Guarding the Context, Not Just the Model

Everything in this room came back to the same point: LLM systems can leak sensitive information without anyone exploiting the model itself. The model didn't choose to disclose anything. The pipeline fed it data it should never have seen, and the model did what it always does: it used whatever was in the context window.

Confidentiality in LLM systems is a pipeline problem. Not a model problem. Not a prompt engineering problem. A pipeline problem. If unauthorised data makes it into the context window, the exposure has already happened. It doesn't matter what the system prompt says, what guardrails you added, or how carefully you tuned the output filters. The model saw it. You can't walk that back.

Security has to be enforced before the similarity search runs. Before ranking. Before the context window gets assembled. By the time the model is generating a response, it's too late to decide what it should and shouldn't know.
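As a concrete illustration, here is a toy in-memory index in which the authorisation filter runs before similarity ranking. The index layout and field names (`tenant_id`, `embedding`) are made up for this sketch and do not match any particular vector database's API:

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def query(index, vector, top_k, tenant_id):
    # Filter first: documents outside the tenant never enter ranking,
    # so they can never be assembled into the context window.
    candidates = [d for d in index if d["tenant_id"] == tenant_id]
    candidates.sort(key=lambda d: cosine(vector, d["embedding"]), reverse=True)
    return candidates[:top_k]

index = [
    {"id": "acme-handbook", "tenant_id": "acme", "embedding": [0.9, 0.1]},
    {"id": "beta-payroll",  "tenant_id": "beta", "embedding": [0.9, 0.1]},  # identical vector, different tenant
]

results = query(index, [0.9, 0.1], top_k=5, tenant_id="acme")
# Only "acme-handbook" comes back, despite the identical embedding.
```

In a real deployment the same filter belongs inside the vector database query itself, not in application code, but the ordering is the point: filter, then rank.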

Key Takeaways

Sensitive information disclosure in RAG systems usually comes down to a handful of recurring mistakes:

  • Top-k retrieval that pulls too broadly and drags in documents from outside the user's scope
  • Metadata filtering that's missing or misconfigured
  • Shared vector indexes where tenants aren't properly isolated from each other
  • Stale embeddings from deleted documents that are still sitting in the index, still searchable
  • Logging systems that record full augmented prompts, creating an unguarded copy of the data
  • Namespace enforcement that exists on paper but breaks under edge cases

Similarity search is math. It finds the closest vectors. That's all it does. It has no concept of who should see what, which department owns a document, or whether something was supposed to be deleted last month.

Authorisation is policy. And if that policy isn't enforced at the vector layer, before the search runs, similarity will happily ignore every boundary you thought you had.
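To make that concrete, here is a minimal sketch of the computation behind a similarity search: cosine similarity between embedding vectors. The vectors and document names are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.0]
docs = {
    "public_benefits_faq": [0.8, 0.2, 0.1],
    "hr_record_tom_russo": [0.85, 0.15, 0.05],  # semantically close, off-limits
}

# Pure math: the confidential HR record ranks first because its vector
# happens to sit closest to the query. Nothing here models authorisation.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```

This is exactly the semantic-collision failure from the scenario: a benefits question and an HR record land near each other in embedding space, and the math alone cannot tell them apart.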

Red-Team Lessons

When you're assessing an LLM system, forget prompt tricks for a minute. The real findings are in the architecture.

  • Can you influence what gets retrieved? Widen the scope, change the filters, see what comes back that shouldn't.
  • Can you trigger similarity results that cross tenant boundaries?
  • Are namespaces enumerable? Can you figure out what other tenants or collections exist just by poking around?
  • Are embeddings directly accessible through the API or some debug endpoint nobody locked down?
  • Are logs capturing full context windows in plaintext somewhere?

The uncomfortable part about disclosure in these systems is that it's quiet. Nothing crashes. No error messages. The system looks stable, responses seem normal, and the whole time it's surfacing sensitive associations that nobody authorised. You won't find it unless you're specifically looking for it.

Blue-Team Lessons

Defenders need to stop thinking of vector databases as just search infrastructure. They're sensitive storage. Treat them that way.

  • Enforce metadata filtering before running similarity searches. Not in the application layer. Not in the prompt. In the query.
  • Apply deterministic namespace isolation to prevent tenants from accidentally seeing each other's data.
  • Remove stale embeddings during retention cycles. If the source document is gone, the embedding should be gone too.
  • Minimise logging exposure. If your logs contain the same content you locked down in retrieval, you've built a side door.
  • Continuously monitor for abnormal retrieval behaviour: volume spikes, cross-tenant access attempts, repeated probing.
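The stale-embedding control above can be sketched as a simple retention sweep. The storage layout here (a mapping from document id to embedding) is hypothetical; real vector stores expose delete operations rather than raw dictionaries:

```python
def retention_sweep(vector_store: dict, live_document_ids: set) -> list:
    """Delete embeddings whose source documents no longer exist.

    Returns the ids that were purged, e.g. for an audit log.
    """
    stale = [doc_id for doc_id in vector_store if doc_id not in live_document_ids]
    for doc_id in stale:
        del vector_store[doc_id]
    return stale

store = {"doc-1": [0.1, 0.2], "doc-2": [0.3, 0.4]}
purged = retention_sweep(store, live_document_ids={"doc-1"})
# "doc-2" is removed: its source document is gone, so its embedding
# is no longer searchable either.
```

Running a sweep like this on the same cycle as document deletion closes the gap where an embedding outlives the record it was derived from.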

The model will not enforce any of this. It can't. It works with whatever lands in the context window and has no idea whether it should be there. The retrieval engine is the last line of defence before the model sees anything. That's where segmentation lives or dies.

Framework Alignment

This room focused on: OWASP LLM02 — Sensitive Information Disclosure

Related OWASP risks include:

  • LLM01 — Prompt Injection (when retrieval trust is abused)
  • LLM05 — Supply Chain Vulnerabilities (when external corpora are indexed)

From a governance perspective:

  • The NIST AI Risk Management Framework (AI RMF) emphasises data governance and access control as foundational risk controls.
  • The EU AI Act requires appropriate data management and technical safeguards for systems handling sensitive information.

LLM disclosure risk is not theoretical. It is an architectural governance issue.

Bridge to Upcoming Challenges

In the next challenge rooms, you will encounter scenarios where disclosure and poisoning intersect.

You will analyse systems where attackers:

  • Inject malicious documents
  • Exploit retrieval boundaries
  • Manipulate embeddings
  • Trigger exfiltration pathways

Now that you understand the risks of LLM02 and have practised deterministic mitigations, you are prepared to detect and defend against combined poisoning and disclosure attacks.

Answer the questions below

All done!