June 15, 2026
Safety and governance for LLM systems: guardrails, PII, audit, and memory
Varun Jindal
The level where an LLM system stops being a demo and earns the right to touch real data and real decisions: layered guardrails that fail closed, PII handled at the boundary, an immutable audit trail, and scoped memory.
By the time an LLM system is making decisions that matter, "it usually works" is no longer the bar. This is Level 4 of the maturity model — safety and governance — and it's where four disciplines that teams tend to bolt on late have to be designed in instead. They share one idea: don't trust a single point to do the right thing. Layer independent guardrails so a miss at one is caught at the next. Handle sensitive data at every boundary so it never accumulates where it shouldn't. Write an immutable audit trail so "why did it do that?" has an answer months later. And scope memory by category so one customer's data can never reach another's decision — with a clean line between what you ship and what the system earns.
None of these is exotic. Each is a small, testable contract enforced in code. Here is how they fit together.
Defense in depth: layer your guardrails
A lot of teams "add guardrails" by bolting one moderation filter onto the model output and calling it done. That's the AI equivalent of a single firewall rule. Real safety, like real security, is defense in depth: several independent layers, each catching a different class of problem, arranged so a miss at one is caught at the next.
Make every guardrail the same small contract, so you can add, remove, test, and reorder them independently.
from typing import Protocol, Literal
from dataclasses import dataclass, field

@dataclass
class GuardrailResult:
    action: Literal["ok", "block", "redact", "flag"]
    layer: str
    rule: str = ""
    detail: dict = field(default_factory=dict)  # e.g. {"fields": ["email"]}

class Guardrail(Protocol):
    layer: str
    def check(self, ctx: "Context") -> GuardrailResult: ...

A request then hits checkpoints on the way in and out:
#  Layer                          Catches                                                 Stage
-  -----------------------------  ------------------------------------------------------  ---------------------
1  Input / pre-prompt             injection, oversized/malformed input                    before the model
2  Grounding constraints          model using disallowed actions/data; off-schema output  shapes the model call
3  Output scrub                   policy violations, PII echoed back                      after generation
4  Verification                   business-rule / consistency violations                  deterministic code
5  Judge                          "plausible but wrong" that passed mechanical checks     sampled second model
6  Composed confidence + routing  the unknowns: anything still uncertain                  final net

in ─▶[1 input]─▶[2 grounding]─▶(model)─▶[3 output scrub]─▶[4 verify]─▶[5 judge]─▶[6 confidence/route]─▶ act|escalate

The order and the pass/fail logic live in one readable place. Two rules: fail closed (an errored or unavailable guardrail blocks or escalates; it never waves the request through) and log every block as a first-class signal.
def run_guardrails(ctx, layers, metrics) -> list[GuardrailResult]:
    applied = []  # accumulate: a redacted result must be visible to the caller/audit
    for g in layers:
        try:
            r = g.check(ctx)
        except GuardrailError:  # NARROW: don't swallow your own bugs as a "block"
            # rule = bounded enum, NEVER matched content
            metrics.incr("guardrail_blocks_total", layer=g.layer, rule="error")
            return applied + [GuardrailResult("block", g.layer, "error")]  # same label the metric emitted
        if r.action == "ok":
            continue
        metrics.incr("guardrail_blocks_total", layer=g.layer, rule=r.rule)
        if r.action in ("redact", "flag"):
            ctx.apply(r)
            applied.append(r)
            continue
        if r.action == "block":
            return applied + [r]  # stop at first hard block
        return applied + [GuardrailResult("block", g.layer, "unknown_action")]  # unknown action == block (fail closed)
    return applied or [GuardrailResult("ok", "all")]

The one test that matters most here proves it fails closed:
def test_fail_closed_on_error():
    class Boom:  # a guardrail that throws
        layer = "x"
        def check(self, ctx):
            raise GuardrailError("boom")
    ctx = None  # Boom never reads the context
    result = run_guardrails(ctx, [Boom()], NullMetrics())
    assert result[-1].action == "block"  # an erroring guardrail BLOCKS, never silently passes

Why layers beat one big filter: there's no single point of failure (injection slips past layer 1? grounding limits what it can do; an off output? scrub or verification catches it). Each layer is simple and testable: six single-purpose checks each have a clear contract, where one mega-filter is impossible to reason about. And different layers catch different failure modes: input scrub catches attacks, verification catches logic errors, the judge catches plausible-wrongness, confidence catches unknown unknowns. No single mechanism covers all four.
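To make "simple and testable" concrete, here is a minimal sketch of two layer-1 guardrails written against the contract above. The `Context` shape, the class names, the size limit, and the single regex are all assumptions made for illustration; a real injection check would need far more than one pattern.

```python
import re
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class GuardrailResult:
    action: Literal["ok", "block", "redact", "flag"]
    layer: str
    rule: str = ""
    detail: dict = field(default_factory=dict)

@dataclass
class Context:  # hypothetical: the article never defines Context
    user_input: str

class InputSizeGuardrail:
    """Layer 1: block oversized input before it reaches the model."""
    layer = "input"

    def __init__(self, max_chars: int = 4000):
        self.max_chars = max_chars

    def check(self, ctx: Context) -> GuardrailResult:
        if len(ctx.user_input) > self.max_chars:
            # detail carries the length only, never the matched content
            return GuardrailResult("block", self.layer, "oversized_input",
                                   {"length": len(ctx.user_input)})
        return GuardrailResult("ok", self.layer)

class InjectionPhraseGuardrail:
    """Layer 1: block one well-known injection phrasing (illustrative only)."""
    layer = "input"
    PATTERN = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)

    def check(self, ctx: Context) -> GuardrailResult:
        if self.PATTERN.search(ctx.user_input):
            return GuardrailResult("block", self.layer, "injection_phrase")
        return GuardrailResult("ok", self.layer)
```

Each class has one job and one rule label, so each can be unit-tested on its own and reordered without touching the others.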
One more reason to meter every block: guardrail_blocks_total{layer, rule} is one of the sharpest production health signals you have. A spike is an attack, a regression, or a bad deploy — page on it.
The anti-patterns are mostly the inverse of the rules. Fail-open is worse than no guardrail — it gives false assurance. One layer doing everything is unmaintainable. Guardrails the model can talk past ("please don't do X" in the prompt) aren't guardrails — constrain the output space (enum/schema) so the disallowed thing is unrepresentable. Silent blocks mean you can't tell an attack from a bug. And note that last layer-3 job — scrubbing PII the model echoed back — which is the natural handoff to the next discipline.
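The point about constraining the output space can be sketched as follows. The `Action` values and the `{"action": ...}` payload shape are hypothetical; the idea is only that anything outside the enum cannot be represented downstream, and anything off-schema falls back to escalation rather than passing through.

```python
import json
from enum import Enum

class Action(Enum):  # hypothetical action set; anything outside it is unrepresentable
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"

def parse_decision(model_output: str) -> Action:
    """Parse model output; anything off-schema becomes ESCALATE (fail closed)."""
    try:
        payload = json.loads(model_output)
        # exactly one key, named "action"; extra keys are treated as off-schema
        if not isinstance(payload, dict) or set(payload) != {"action"}:
            return Action.ESCALATE
        return Action(payload["action"])  # raises ValueError for unknown values
    except (json.JSONDecodeError, ValueError, TypeError):
        return Action.ESCALATE
```

A prompt that asks the model not to do something leaves the decision with the model. Here the decision sits in the parser, which the model cannot talk past.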
PII at the boundary
AI systems are unusually hungry for data — they want rich context to reason well, and they generate records of everything they decide. That collides with a basic obligation: don't accumulate sensitive personal data you don't need. The resolution is to handle PII at the boundary — scrub it on the way in and out, and never let raw sensitive data settle into your stores, logs, or model traffic.
Drive redaction from a declared classification, not ad-hoc if field == "email" scattered around.
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = 0    # ok anywhere
    INTERNAL = 1  # ok in tier-1/2 logs + ledger summary
    PII = 2       # mask in summaries; never in shared memory; tier-3 only if retained at all
    SECRET = 3    # never persisted, never logged, never to the model unless essential

FIELD_POLICY = {  # the single source of truth
    "name": Sensitivity.PII, "email": Sensitivity.PII, "card_number": Sensitivity.SECRET,
    "amount": Sensitivity.INTERNAL, "category": Sensitivity.PUBLIC,
}

Every place data moves between components is a boundary: into the system, into the model, into the ledger, into logs, out to a service. At each, ask: does what crosses here need raw sensitive data? Usually no, so redact, mask, or hash before it crosses.
inbound ─▶[scrub]─▶ working set ─▶[scrub]─▶ model
                        ├─▶[redact + hash]─▶ ledger (no raw PII)
                        └─▶[redact]─▶ logs (no raw PII)

The ledger boundary deserves special care: you need to prove what a decision was made on, not retain the sensitive payload. Store a hash plus a redacted summary, and re-hash later to prove equivalence. That gives you verifiability without the liability.
def classify(key) -> Sensitivity:
    return FIELD_POLICY.get(key, Sensitivity.PII)  # DEFAULT-DENY: unknown keys are treated as PII

def redact(obj):  # MUST recurse: real payloads are nested
    if isinstance(obj, dict):
        return {k: ("•••" if classify(k).value >= Sensitivity.PII.value else redact(v))
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj

def drop_secrets(obj):  # SECRET fields never even enter the hash input
    if isinstance(obj, dict):
        return {k: drop_secrets(v) for k, v in obj.items() if classify(k) is not Sensitivity.SECRET}
    if isinstance(obj, list):
        return [drop_secrets(v) for v in obj]
    return obj

def ledger_view(raw: dict, tenant_key: bytes) -> dict:  # the ONLY way to write to the ledger
    safe = drop_secrets(raw)
    # HMAC with a per-tenant key, NOT bare sha256: a plain hash of low-entropy PII (card, email,
    # phone) is brute-forceable / rainbow-tableable, so "verifiability without liability" needs a key.
    # The key is named inputs_hash to match the ledger column of the same name.
    return {"inputs_hash": "hmac-sha256:" + hmac_sha256(tenant_key, canonical_json(safe)),
            "inputs_summary": redact(raw)}

Two decisions to pin once and reuse everywhere: (a) a keyed hash (HMAC, per-tenant key in KMS) for any low-entropy field, because a bare SHA-256 of a 16-digit number can be recovered by brute force (the input space is small enough to enumerate); (b) a single canonical-JSON spec (e.g. RFC 8785 / JCS: sorted keys, fixed string escaping, fixed number format), because the audit-ledger hash chain depends on re-hashing producing identical bytes.
The model is an external boundary too, and often a third party. Strip out of prompts any sensitive field the model doesn't need in order to reason, and scrub its output before persisting or returning it, because models echo input back. (That output scrub is exactly layer 3 above.)
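One way to do the prompt-side stripping is an allow-by-classification filter driven by the same field policy. This is a sketch under assumptions: `prompt_view` is a hypothetical helper name, and the policy is copied inline only so the block runs on its own. Unlike `redact`, it drops fields entirely instead of masking them, so the model never sees that they existed.

```python
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = 0
    INTERNAL = 1
    PII = 2
    SECRET = 3

FIELD_POLICY = {
    "name": Sensitivity.PII, "email": Sensitivity.PII,
    "card_number": Sensitivity.SECRET,
    "amount": Sensitivity.INTERNAL, "category": Sensitivity.PUBLIC,
}

def prompt_view(obj, max_level: Sensitivity = Sensitivity.INTERNAL):
    """Keep only fields classified at or below max_level; drop the rest entirely."""
    if isinstance(obj, dict):
        # unknown keys default to PII, so they are dropped (default-deny)
        return {k: prompt_view(v, max_level) for k, v in obj.items()
                if FIELD_POLICY.get(k, Sensitivity.PII).value <= max_level.value}
    if isinstance(obj, list):
        return [prompt_view(v, max_level) for v in obj]
    return obj
```

For a payload with `name`, `card_number`, `amount`, and `category`, only `amount` and `category` would reach the prompt.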
The failure mode is "we redact in most places" — a leak with extra steps. Make scrubbing the only path through each boundary, so a developer can't forget:
ledger.append(ledger_view(raw, tenant_key))  # there is no ledger.append(raw): redaction isn't optional
log.tier2(redact(event))                     # the logging helper redacts; raw logging isn't exposed

And design for residency and access up front. Sensitive data carries constraints on where it may live and who may read it: per-tenant keys, region-pinned storage for regulated tiers, access-gated audit reads. Retrofitting after data has spread everywhere is the nightmare you're avoiding. The subtle leak isn't out, it's sideways (per-user context reaching the components that decide for other users), but that one is best handled structurally, in the memory model below.
The append-only audit ledger
The first time someone asks "why did the system decide that for case X back in March?", you find out whether you built an audit trail or just have logs. Logs are for debugging — they rotate, they're unstructured, they're not the truth. An audit ledger is the canonical, append-only, tamper-evident record of every decision and why. For anything consequential it's not optional.
CREATE TABLE decision_ledger (
    decision_id     TEXT PRIMARY KEY,      -- threads through logs/traces for this decision
    ts              TIMESTAMPTZ NOT NULL,
    tenant_id       TEXT NOT NULL,
    identity        TEXT NOT NULL,         -- who/what authority it ran with
    capability      TEXT NOT NULL,
    inputs_hash     TEXT NOT NULL,         -- keyed hash of canonical inputs (NOT the raw payload)
    inputs_summary  JSONB NOT NULL,        -- redacted, PII-free human-readable summary
    model_version   TEXT NOT NULL,
    prompt_version  TEXT NOT NULL,
    decision        JSONB NOT NULL,        -- the structured decision
    confidence      REAL,                  -- nullable: a manual supersede carries no model confidence
    routing         TEXT NOT NULL,         -- auto | hitl_recommended | hitl_required | reject | abstain
    outcome         JSONB,                 -- the ONE mutable column, filled in later; EXCLUDED from the hash
    supersedes      TEXT REFERENCES decision_ledger(decision_id),
    seq             BIGSERIAL,             -- monotonic, for the hash chain
                                           -- (a sequence can skip values after a rollback; allocate seq
                                           --  under the append lock if the verifier requires contiguity)
    prev_hash       TEXT,                  -- entry_hash of seq-1
    entry_hash      TEXT NOT NULL          -- H(canonical(row-without-hash) + prev_hash)
);

-- App role gets INSERT + SELECT only. Revoke mutation in the DB, not just in code.
REVOKE UPDATE, DELETE, TRUNCATE ON decision_ledger FROM app_role;
GRANT UPDATE (outcome) ON decision_ledger TO app_role;  -- column-level: ONLY `outcome` is writable later
-- NOTE: this stops the APP, not a DB superuser/owner who can still DROP/TRUNCATE/rewrite. True
-- append-only against an operator needs WORM/object-lock storage or external anchoring (below).

Three design choices do the heavy lifting.
Append-only — corrections supersede, never overwrite. You never edit or delete an entry; a correction is a new row that points at the old one.
seq 1041  decision=approve  conf=0.91  supersedes=NULL
seq 1207  decision=reject   conf=NULL  supersedes=<id of 1041>  reason="manual review"

The history is the truth: a half-remembered old entry is never silently wrong, it's visibly superseded. Enforce it at the database (revoke UPDATE/DELETE), because "we promise not to update it" is not append-only.
Hash the inputs — prove what, don't warehouse it. This is the same ledger_view from the PII section: store a hash plus a redacted summary, and prove a decision was made on specific inputs by re-hashing and comparing. Storing the full sensitive payload "for completeness" just builds a honeypot.
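The PII section's code calls `canonical_json` and `hmac_sha256` without defining them. A minimal standard-library sketch of both, plus the re-hash-and-compare step, might look like this. The `matches` helper is a hypothetical name, and the `json.dumps` settings only approximate a canonical form; they are not a full RFC 8785 implementation.

```python
import hashlib
import hmac
import json

def canonical_json(obj) -> str:
    """Deterministic JSON: sorted keys, no insignificant whitespace.

    An approximation only, NOT a full RFC 8785 (JCS) implementation;
    number formatting in particular follows Python's json module.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

def hmac_sha256(key: bytes, message: str) -> str:
    """Keyed hash of the message, returned as a hex string."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()

def matches(claimed_inputs: dict, stored_hmac: str, key: bytes) -> bool:
    """Re-hash the claimed inputs and compare against the stored value in constant time."""
    recomputed = hmac_sha256(key, canonical_json(claimed_inputs))
    return hmac.compare_digest(recomputed, stored_hmac)
```

Because keys are sorted before hashing, two payloads with the same fields in a different order produce the same digest, and a different tenant key produces a different digest for the same payload.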
Tamper-evidence — sign or chain. "Append-only" is a discipline until you make it cryptographic.
import hashlib

IMMUTABLE = ("decision_id", "ts", "tenant_id", "identity", "capability", "inputs_hash", "inputs_summary",
             "model_version", "prompt_version", "decision", "confidence", "routing", "seq", "supersedes")
GENESIS = "0" * 64

def _h(payload: dict) -> str:  # runnable: encode + hexdigest
    return hashlib.sha256(canonical_json(payload).encode()).hexdigest()

def seal(row: dict, prev_hash: str) -> dict:
    row["prev_hash"] = prev_hash
    # hash ONLY immutable fields (NOT `outcome`, written later) via a STRUCTURED payload;
    # never string-concat hash+json (boundary ambiguity: two different rows could serialize identically).
    row["entry_hash"] = _h({"row": {k: row[k] for k in IMMUTABLE}, "prev": prev_hash})
    return row

A hash chain is tamper-evidence against mutation: alter one row and its entry_hash no longer verifies, and recomputing it breaks the link to every later row. It does not stop truncation (delete the tail) or a full rewrite from genesis. So the verifier asserts seq contiguity and a known GENESIS; and for an operator-level threat, also sign each entry_hash with an asymmetric key a separate signer holds, publishing periodic signed checkpoints to external storage. REVOKE stops the app; signing plus anchoring is what stops someone with DB write access.
def verify_chain(rows):  # returns first broken/missing seq, or None
    prev, expect = GENESIS, None
    for r in sorted(rows, key=lambda r: r["seq"]):
        # contiguity: detects rows deleted mid-chain (tail truncation needs an external checkpoint)
        if expect is not None and r["seq"] != expect:
            return expect  # a gap means rows were removed
        if r["entry_hash"] != _h({"row": {k: r[k] for k in IMMUTABLE}, "prev": prev}):
            return r["seq"]  # mutation detected
        prev, expect = r["entry_hash"], r["seq"] + 1
    return None

Append under a lock: read the tail's entry_hash with SELECT … FOR UPDATE (or an advisory lock) in the same transaction as the INSERT, or two concurrent appends fork the chain on the same prev_hash.
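The fork risk is easier to see in a toy model. The sketch below is an in-memory stand-in, not the database pattern itself: `InMemoryLedger` and `entry_hash` are hypothetical names, and a `threading.Lock` plays the role that the row lock or advisory lock plays in a real transaction. What carries over is that reading the tail and inserting the new row form one critical section.

```python
import hashlib
import json
import threading

GENESIS = "0" * 64

def entry_hash(row: dict, prev_hash: str) -> str:
    """Hash a structured payload of the row plus the previous entry's hash."""
    payload = {"row": row, "prev": prev_hash}
    data = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()

class InMemoryLedger:
    """Toy stand-in for the table; one lock plays the role of the DB row lock."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()

    def append(self, decision: dict) -> dict:
        with self._lock:  # read-tail and insert happen in one critical section
            prev = self._rows[-1]["entry_hash"] if self._rows else GENESIS
            row = {"seq": len(self._rows) + 1, "decision": decision}
            sealed = {**row, "prev_hash": prev, "entry_hash": entry_hash(row, prev)}
            self._rows.append(sealed)
            return sealed

    def rows(self) -> list:
        with self._lock:
            return list(self._rows)
```

Without the lock, two writers could both read the same tail and both seal against the same `prev_hash`, which is exactly the fork described above.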
Prove the tamper-evidence — the whole value prop in one test:
def test_tamper_detected():
    chain = seal_all([row(1), row(2), row(3)])      # three sealed, chained rows
    chain[1]["decision"] = {"label": "tampered"}    # mutate a sealed row in place
    assert verify_chain(chain) == chain[1]["seq"]   # detected exactly at the altered row

And the queries the ledger is for:
-- "why did the system do X, and what's its current state?": the FULL supersede chain, both directions
WITH RECURSIVE lineage AS (
    SELECT * FROM decision_ledger WHERE decision_id = $1
    UNION
    SELECT d.* FROM decision_ledger d JOIN lineage l
        ON d.decision_id = l.supersedes   -- walk back to ancestors
        OR d.supersedes = l.decision_id   -- walk forward to whatever superseded it
)
SELECT * FROM lineage ORDER BY seq;       -- the current state is the last row

-- auto-execution rate by capability, last 7 days
SELECT capability, avg((routing='auto')::int) FROM decision_ledger
WHERE ts > now() - interval '7 days' GROUP BY capability;

The ledger is the backbone, not a side-effect: analytics, drift detection, override rates, and debugging all read from it, and decision_id ties each entry to its trace. So write the entry as the last node of every decision, unconditionally. Never "skip the ledger if the queue is full," or your audit has holes exactly when things went wrong.
A memory model for multi-tenant agents
The moment your agents serve more than one customer — or even more than one user — "memory" stops being a feature and becomes a data-governance problem wearing a feature's clothes. A single undifferentiated memory store is the sideways leak from the PII section made concrete: cross-tenant data exposure waiting to happen, and an audit you can't pass. The fix isn't a fancier vector DB; it's a typed memory model — named categories, each with explicit rules for scope, access, and sensitivity, enforced in code at the store boundary.
from enum import Enum

class MemoryCategory(Enum):
    TENANT_SHARED = "tenant_shared"            # org-wide knowledge; NO personal data
    AGENT_NAMESPACE = "agent_namespace"        # learned patterns (abstractions only, never raw records)
    WORKFLOW_CONTEXT = "workflow_context"      # scratch state for ONE invocation; discarded after
    AUDIT = "audit"                            # immutable record; role-gated reads only
    SEMANTIC_KNOWLEDGE = "semantic_knowledge"  # curated reference/grounding facts
    CONVERSATION = "conversation"              # per-user, per-session; FIREWALLED from decision agents

The point isn't these exact six. It's that every datum belongs to exactly one category, and the category dictates the rules. For each, pin down three things and enforce them in code, not docs:
category            scope (partition key)  who can read             may hold PII?
------------------  ---------------------  -----------------------  -----------------------------
TENANT_SHARED       tenant                 any caller in tenant     **no**
AGENT_NAMESPACE     tenant                 the system               patterns only, **no raw PII**
WORKFLOW_CONTEXT    invocation             that invocation          transient only
AUDIT               tenant                 audit role only          hashes/redacted
SEMANTIC_KNOWLEDGE  global/tenant          the system               curated, **no** user PII
CONVERSATION        user + session         that user's own session  yes, firewalled

Tenant scoping is non-negotiable: every read/write carries tenant_id (and user_id where the category is user-scoped), checked at the store, with cross-tenant access a hard error.
from dataclasses import dataclass

@dataclass
class Caller:  # comes from the validated auth context, NOT the request body
    tenant_id: str
    user_id: str | None = None
    role: str | None = None

def partition_key(category, tenant_id, *, user_id=None, session_id=None, invocation_id=None) -> str:
    C = MemoryCategory  # each branch matches the table's scope column
    if category is C.WORKFLOW_CONTEXT:  # invocation-scoped: the next run can't read it
        if not invocation_id:  # explicit raise, not assert (-O strips asserts)
            raise ValueError("workflow context is invocation-scoped")
        return f"{category.value}:{tenant_id}:{invocation_id}"
    if category is C.CONVERSATION:  # per-user, per-session
        if not (user_id and session_id):
            raise ValueError("conversation is per-user, per-session")
        return f"{category.value}:{tenant_id}:{user_id}:{session_id}"
    if category is C.SEMANTIC_KNOWLEDGE:  # may be global; 'global' never collapses tenants together
        return f"{category.value}:{tenant_id or 'global'}"
    return f"{category.value}:{tenant_id}"  # TENANT_SHARED, AGENT_NAMESPACE, AUDIT

def read(category, *, tenant_id, key, caller: Caller, user_id=None, session_id=None, invocation_id=None):
    enforce_access(category, tenant_id, caller, user_id)  # raises Forbidden on ANY violation
    return store.get(partition_key(category, tenant_id, user_id=user_id,
                                   session_id=session_id, invocation_id=invocation_id), key)

def enforce_access(category, tenant_id, caller: Caller, user_id):
    if caller.tenant_id != tenant_id:  # THE cross-tenant gate
        raise Forbidden("cross-tenant access")
    if category is MemoryCategory.AUDIT and caller.role != "audit":
        raise Forbidden("audit memory is role-gated")
    if category is MemoryCategory.CONVERSATION:  # firewall: only your OWN session
        if user_id is None or caller.user_id != user_id:
            raise Forbidden("conversation memory is per-user")

def write(category, *, tenant_id, key, value, caller: Caller, user_id=None, session_id=None, invocation_id=None):
    enforce_access(category, tenant_id, caller, user_id)
    if category in (MemoryCategory.TENANT_SHARED, MemoryCategory.AGENT_NAMESPACE) and contains_pii(value):
        raise Forbidden(f"{category} must not hold personal data")  # explicit raise, NOT assert (-O strips asserts)
    store.put(partition_key(category, tenant_id, user_id=user_id,
                            session_id=session_id, invocation_id=invocation_id), key, value)

The firewall that prevents the embarrassing leak is CONVERSATION: a decision agent for user B must never read user A's conversation. Keep it in its own category, readable only within A's own session, and never injected into a shared decision path. That one boundary prevents a whole class of "why does the AI know that about me?" incidents, and it's the structural answer to the sideways leak flagged earlier.
Test the leaks you're most afraid of on day one — at the boundary, not by string-searching a serialized context (a base64 or embedding of the data would slip a substring check):
import pytest

def test_no_cross_tenant_read():
    write(MemoryCategory.TENANT_SHARED, tenant_id="A", key="policy", value="x", caller=Caller("A"))
    with pytest.raises(Forbidden):  # A's caller may not read B's partition
        read(MemoryCategory.TENANT_SHARED, tenant_id="B", key="policy", caller=Caller("A"))

def test_conversation_firewalled_to_owning_user():
    write(MemoryCategory.CONVERSATION, tenant_id="A", user_id="u1", session_id="s1", key="msg",
          value="secret", caller=Caller("A", user_id="u1"))
    with pytest.raises(Forbidden):  # u2 can't read u1's conversation
        read(MemoryCategory.CONVERSATION, tenant_id="A", key="msg",
             user_id="u1", session_id="s1", caller=Caller("A", user_id="u2"))

Even if you are single-tenant today, writing it this way makes the move to multi-tenant a config change instead of a rewrite. The anti-patterns: one undifferentiated store (no category, no rule, eventual leak); filtering "in the application, usually" instead of at the store boundary on every access; raw records in shared/namespace memory (those are abstractions-only, so check at write time); and clearing memory in a way that wipes curated knowledge, which is the seam the last section exists to draw.
Seed vs runtime: what you ship vs what the system earns
Here's a bug I've watched happen more than once: someone runs a "clear memory" or "reset" to wipe accumulated state, and it also deletes the prompts, rules, and reference data the system needs to function. It comes back amnesiac — not just forgetting what it learned, but forgetting what it is.
The root cause is a missing distinction. In any stateful AI system there are two completely different kinds of data, and they should be stored, versioned, and lifecycle-managed differently: seed (what you ship — prompts, prompt versions, decision rules and policies, golden sets and eval baselines, grounding data) and runtime (what the system earns — the audit ledger, learned memory, drift signals, incidents, per-session context). Seed is the genome: authored by you, version-controlled, travels with the deploy, never mutated by the running system. Runtime is experience: a byproduct of operating, mutable, much of it disposable — clear it and the system still works, back to baseline behavior, because its identity lives in seed.
A directory layout that encodes the seam:
seed/                # shipped with the release, READ-ONLY at runtime, version-controlled
  prompts/
  rules/
  golden_sets/
  grounding/
runtime/             # earned by operating, mutable, clearable without breaking behavior
  ledger/            # decisions made (canonical audit)
  memory/            # learned/earned state
  drift/             # signals, incidents, governance
  sessions/          # per-invocation context

The single most important function is a scoped reset, and the guard must be a real raise, not an assert (a production run under python -O would drop the assert and silently allow a seed wipe):
def clear_state(scope: str):
    if scope != "runtime":  # explicit raise: `assert` is STRIPPED under python -O
        raise ValueError("refusing to clear seed: seed is shipped config, not state")
    for area in ("memory", "drift", "sessions"):  # NOT ledger by default; it is the audit record
        storage.purge(f"runtime/{area}")
    # seed/** is never touched. The amnesia bug cannot happen.

Note that even within runtime, the ledger is retained: it is the audit record from earlier, so purge memory, drift, and sessions, not history. (Caveat: "clear runtime and behavior returns to seed-baseline" assumes earned memory is additive context, not load-bearing input to decisions. State that assumption for your system.)
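To make the reset contract testable, here is a minimal filesystem sketch. It assumes the seed/runtime directory layout described above lives under a single root directory and swaps the article's `storage.purge` for `shutil.rmtree`. The `root` parameter, the `CLEARABLE` tuple, and the `demo` helper are hypothetical names introduced for illustration.

```python
import shutil
import tempfile
from pathlib import Path

CLEARABLE = ("memory", "drift", "sessions")  # ledger is deliberately absent

def clear_state(root: Path, scope: str) -> None:
    """Purge disposable runtime areas under root; refuse any other scope."""
    if scope != "runtime":  # explicit raise, since asserts vanish under python -O
        raise ValueError("refusing to clear seed: seed is shipped config, not state")
    for area in CLEARABLE:
        target = root / "runtime" / area
        shutil.rmtree(target, ignore_errors=True)   # drop the area and its contents
        target.mkdir(parents=True, exist_ok=True)   # leave an empty directory behind

def demo() -> dict:
    """Build a throwaway tree, reset it, and report what survived."""
    root = Path(tempfile.mkdtemp())
    (root / "seed" / "prompts").mkdir(parents=True)
    (root / "seed" / "prompts" / "system.txt").write_text("placeholder prompt")
    for area in ("ledger", "memory", "drift", "sessions"):
        (root / "runtime" / area).mkdir(parents=True)
        (root / "runtime" / area / "item.json").write_text("{}")

    clear_state(root, "runtime")
    result = {
        "seed_kept": (root / "seed" / "prompts" / "system.txt").exists(),
        "ledger_kept": (root / "runtime" / "ledger" / "item.json").exists(),
        "memory_cleared": not any((root / "runtime" / "memory").iterdir()),
    }
    try:
        clear_state(root, "seed")
        result["seed_scope_refused"] = False
    except ValueError:
        result["seed_scope_refused"] = True
    shutil.rmtree(root)  # clean up the temporary tree
    return result
```

Calling `demo()` exercises the three properties worth asserting in CI: seed survives, the ledger survives, and any scope other than "runtime" is refused.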
New behavior doesn't sneak from runtime into how the system acts. You promote validated runtime signals into seed through a deliberate, reviewed step:
runtime signal   (e.g. a pattern recurs, humans keep correcting X the same way)
  └─▶ candidate   (proposed prompt/rule/golden-case change)
        └─▶ review + eval gate   (does it improve quality without regressing baseline?)
              └─▶ merged into seed/ → ships in the next release

The system never edits its own seed at runtime; that would be an unversioned, unreviewed, irreproducible behavior change. Learning is a PR, not a side effect. The seam makes the hard questions easy: reset clears runtime only; nothing in seed changes without a reviewed, eval-gated release; anything seed-driven is reproducible given the input, while learned runtime state is explicitly not; and new behavior always comes from promoted runtime signals through review, never from runtime → behavior directly.
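The eval-gate step ("does it improve quality without regressing baseline?") can be written as a small, fail-closed predicate. This is one possible sketch, not the article's implementation: it assumes each eval run yields a per-case score on the golden set (higher is better), and the function name `passes_gate` and both thresholds are hypothetical values chosen for illustration.

```python
def passes_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    min_gain: float = 0.01,       # assumed: required improvement in mean score
    max_case_drop: float = 0.05,  # assumed: largest tolerated drop on any single case
) -> bool:
    """Return True only if the candidate improves the mean score on the
    golden set and no individual case regresses by more than max_case_drop."""
    if baseline.keys() != candidate.keys():
        return False  # fail closed: both runs must cover the same golden cases
    if not baseline:
        return False  # fail closed: an empty golden set proves nothing

    n = len(baseline)
    mean_gain = (sum(candidate.values()) - sum(baseline.values())) / n
    worst_drop = max(baseline[case] - candidate[case] for case in baseline)
    return mean_gain >= min_gain and worst_drop <= max_case_drop
```

Checking the worst single-case drop alongside the mean is a design choice: a candidate that lifts the average while breaking one golden case is exactly the regression the gate exists to stop. The human review step still sits in front of the merge; this predicate only covers the automated half.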
The takeaway
Safety and governance at this level isn't a feature you add; it's an architecture you arrange, and the four pieces reinforce each other. Guardrails compose behind one contract and fail closed, so a single failure is a caught failure. PII is scrubbed at every boundary and hashed instead of stored, so the rich context an AI needs accumulates without the raw liability — and the output scrub is just a guardrail layer, while the ledger hash is just the boundary's ledger_view. The audit ledger is append-only, tamper-evident, and written as the final unconditional step of every decision, so "why did it do that in March?" is a one-row query with verifiable lineage. Memory is typed and scoped so one tenant's — or one user's — data can never reach another's decision, which is the structural cure for the sideways leak. And the seed/runtime seam keeps all of this resettable and reproducible, with learning flowing through review rather than mutating behavior in place. Build these in at Level 4 and the system becomes one you can stand behind under audit — instead of one you can only apologize for.
Series: Running LLM systems in production — Level 4 of 6: Safety & governance.