As systems evolve into distributed architectures — microservices, event-driven pipelines, and shared data platforms — the location of sensitive information becomes increasingly opaque. In practice, most organizations don't struggle with storing data securely; they struggle with knowing exactly where sensitive data exists across hundreds of schemas and services.

We faced this exact problem: PII was scattered across databases, inconsistently classified, and often unknown outside of individual development teams. This created a structural risk — without visibility, consistent encryption and access control cannot be enforced reliably.

We needed more than guidelines. We needed a systematic way to inventory, classify, and continuously govern PII fields across the enterprise.

Defining the Problem: PII as an Invisible Asset

Personally Identifiable Information (PII) refers to any data that can directly or indirectly identify an individual.

This includes:

  • Direct identifiers (e.g., SSN, passport number, email, phone number)
  • Quasi-identifiers (e.g., ZIP code, device ID, partial identifiers)
  • Behavioral or contextual attributes that can contribute to re-identification

In large systems, PII rarely exists in isolation. It is embedded inside domain models, replicated across services, and propagated into analytics pipelines.

The real challenge was not classification itself — it was the lack of a consistent, enterprise-wide inventory mechanism.

Without it:

  • Encryption decisions were inconsistent across teams
  • Sensitive fields were discovered late (often during audits or incidents)
  • Security enforcement was reactive rather than systemic

Our Design Principle: Treat PII as a Managed Enterprise Asset

We approached the problem with a simple principle:

You cannot secure what you cannot inventory.

This led us to design a structured PII field inventory model with three goals:

  1. Standardize classification of data fields
  2. Make classification discoverable programmatically
  3. Enforce governance for all schema changes

Step 1: Enterprise-Wide PII Classification Model

We introduced a classification system that separated PII into two practical categories:

High PII (Directly Sensitive Data)

High PII refers to fields that can directly identify an individual or cause significant harm if exposed.

Examples:

  • Government identifiers (SSN, passport number)
  • Financial data (credit card, bank account numbers)
  • Authentication secrets or tokens
  • Full identity attributes (name + identifier combinations)

Security requirement:

  • Mandatory encryption at rest
  • Strict access control policies
  • Masking in logs and responses
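The masking requirement can be sketched as a small helper applied in a logging filter or serialization layer. This is an illustrative example, not our production implementation; the function names and the "keep the last few characters" policy are assumptions for the sketch.

```python
def mask_high_pii(value: str, visible: int = 4) -> str:
    """Mask a High PII value, keeping only the last `visible` characters.

    Hypothetical helper illustrating the masking requirement; real systems
    typically apply this in a logging filter or response serializer.
    """
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_email(email: str) -> str:
    """Mask the local part of an email address, keeping the domain visible."""
    local, _, domain = email.partition("@")
    return mask_high_pii(local, visible=1) + "@" + domain
```

The exact masking policy (how many characters survive, whether domains stay visible) is a business decision; what matters is that it is applied uniformly wherever High PII fields are rendered.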

Low PII (Contextual or Indirect Identifiers)

Low PII refers to fields that are not uniquely identifying on their own but may contribute to identification when combined with other attributes.

Examples:

  • First name or partial name
  • City, state, or region
  • Partial phone number
  • Device or session metadata

Key architectural insight: Low PII is not inherently safe — it becomes sensitive in combination.

Critical Design Rule (Emergent Risk Model)

During analysis, we identified a compounding risk pattern:

When a table contains multiple Low PII fields, the combined dataset becomes re-identifiable.

To address this, we introduced a rule:

  • If a table contains 3 or more Low PII fields, the table is automatically treated as High PII classification

This effectively converted combinatorial risk into deterministic policy enforcement.
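The rule above is deliberately simple enough to encode directly. A minimal sketch, assuming field-level classifications are already available (the constant names and `classify_table` helper are illustrative, not our actual code):

```python
HIGH, LOW, NON_PII = "HIGH", "LOW", "NONE"

# 3 or more Low PII fields escalate the whole table to High PII
LOW_PII_THRESHOLD = 3


def classify_table(field_levels: dict) -> str:
    """Apply the emergent-risk rule: any High PII field, or 3+ Low PII
    fields, makes the table High PII as a whole."""
    levels = list(field_levels.values())
    if HIGH in levels:
        return HIGH
    if levels.count(LOW) >= LOW_PII_THRESHOLD:
        return HIGH  # combinatorial re-identification risk
    return LOW if LOW in levels else NON_PII
```

Because the rule is deterministic, it can run anywhere the inventory is visible — in CI, in the catalog, or at review time — and always produce the same answer.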

Even in cases where this threshold was not met, encryption of Low PII remained a recommended practice based on business sensitivity.

Step 2: Systematic PII Discovery Through Schema Intelligence

We then performed an enterprise-wide schema analysis across databases and services.

This was not a one-time audit — it was a structured discovery process:

  • Database schema extraction
  • Application data model mapping
  • API payload inspection

Each field was evaluated and classified into:

  • High PII
  • Low PII
  • Non-PII

The output of this phase was a normalized PII inventory dataset, reviewed collaboratively by engineering and security teams.

This became the foundational layer of our governance model.
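To make the discovery process concrete: a first-pass classifier can suggest labels from column names before human review. The patterns below are illustrative only — the real process combined schema extraction, data-model mapping, and payload inspection, with engineering and security teams confirming every label.

```python
import re

# Illustrative name patterns; a real ruleset would be much larger and
# every suggestion would still go through human review.
HIGH_PII_PATTERNS = [r"ssn", r"passport", r"credit_card", r"bank_account", r"token"]
LOW_PII_PATTERNS = [r"first_name", r"city", r"state", r"device_id", r"zip"]


def suggest_classification(column_name: str) -> str:
    """Suggest a PII level for a column based on its name."""
    name = column_name.lower()
    if any(re.search(p, name) for p in HIGH_PII_PATTERNS):
        return "HIGH"
    if any(re.search(p, name) for p in LOW_PII_PATTERNS):
        return "LOW"
    return "NONE"  # unmatched fields are flagged for manual review
```

Name-based heuristics are only a starting point — they miss PII hidden in free-text or JSON columns — which is why the reviewed, normalized inventory dataset remained the source of truth.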

Step 3: Centralized but Restricted PII Inventory System

Once classification was completed, we created a centralized repository for PII metadata.

Key design decisions:

  • Stored in an internal, access-restricted documentation platform
  • Version-controlled for auditability
  • Searchable by service, schema, and field name


This became the authoritative system of record for PII classification across the enterprise.

However, we quickly identified a limitation: documentation alone does not scale with engineering velocity.

This led to the next evolution.

Step 4: Making PII Inventory Programmable

To operationalize classification, we embedded PII metadata directly into database schemas.

We leveraged column-level annotations as a lightweight but powerful mechanism for machine-readable classification.

Example:

ALTER TABLE user_profile 
MODIFY email VARCHAR(255) 
COMMENT 'PII: HIGH | encryption_required=true | masking_required=true';

ALTER TABLE user_profile 
MODIFY city VARCHAR(100) 
COMMENT 'PII: LOW';

This enabled:

  • Automated discovery of sensitive fields
  • Integration with static analysis tools
  • Runtime enforcement in data pipelines and services

This was a key architectural shift — from documentation-driven security to metadata-driven enforcement.
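Once classification lives in column comments, any tool that can read the schema can recover it. A minimal parser for the `PII: LEVEL | key=value` convention shown above (the function name and returned keys are assumptions of this sketch):

```python
def parse_pii_comment(comment: str) -> dict:
    """Parse a 'PII: LEVEL | key=value | ...' column comment into metadata.

    Returns e.g. {'level': 'HIGH', 'encryption_required': True, ...}.
    """
    meta = {"level": None}
    for part in (p.strip() for p in comment.split("|")):
        if part.upper().startswith("PII:"):
            meta["level"] = part.split(":", 1)[1].strip().upper()
        elif "=" in part:
            key, value = part.split("=", 1)
            meta[key.strip()] = value.strip().lower() == "true"
    return meta
```

In MySQL, the comments themselves are queryable via `information_schema.COLUMNS`, so a scanner can walk every table, parse each comment, and diff the result against the central inventory.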

Step 5: Modern Extension — PII as a Governance Service

To further scale the model, we evaluated modern patterns used in large enterprises:

  • Centralized data catalogs
  • Policy-as-code enforcement
  • Metadata-driven security engines

The most scalable evolution of this model is a PII Classification Service, exposing APIs such as:

  • GET /pii/classification/{table}
  • GET /pii/fields/{service}
  • GET /pii/risk-level/{field}

This enables:

  • CI/CD validation of schema changes
  • Automated enforcement of encryption policies
  • Cross-system consistency in classification decisions

In this model, PII becomes a queryable security primitive, not static documentation.
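The service's core lookups can be sketched in-memory; a production version would back these methods with the central inventory and expose them over HTTP behind the endpoints listed above. The class name and method signatures here are hypothetical.

```python
class PiiClassificationService:
    """In-memory sketch of a PII Classification Service.

    A real deployment would serve these lookups over HTTP
    (e.g. GET /pii/classification/{table}) from the central inventory.
    """

    def __init__(self, inventory: dict):
        # inventory: table name -> {field name -> "HIGH" | "LOW" | "NONE"}
        self._inventory = inventory

    def classification(self, table: str) -> str:
        """Table-level classification, applying the 3+ Low PII escalation rule."""
        levels = list(self._inventory.get(table, {}).values())
        if "HIGH" in levels or levels.count("LOW") >= 3:
            return "HIGH"
        return "LOW" if "LOW" in levels else "NONE"

    def risk_level(self, table: str, field: str) -> str:
        """Field-level classification, UNKNOWN if the field is not inventoried."""
        return self._inventory.get(table, {}).get(field, "UNKNOWN")
```

An `UNKNOWN` answer is itself useful: a CI/CD check can fail a schema change whenever a new field has no classification on record.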

Step 6: Governance for Schema Evolution

A critical gap we addressed was schema drift — new fields being added without security review.

We introduced a mandatory change control workflow:

  • Any new or modified database field requires a service request (JIRA-based workflow)
  • The request must include:
      • PII classification
      • Justification of data collection
      • Encryption and retention decisions
  • Security review is required before approval

This ensured that:

No data field enters production without explicit classification and governance review.
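This gate is straightforward to automate. A CI sketch, assuming the change-control tooling can hand the check a map of changed columns to their proposed comments (the function name and input shape are assumptions):

```python
def validate_schema_change(changed_columns: dict) -> list:
    """CI gate sketch: every new or modified column must carry a PII
    classification comment before the change can merge.

    changed_columns maps column name -> proposed COMMENT string (or None).
    Returns a list of violations; an empty list means the change passes.
    """
    errors = []
    for column, comment in changed_columns.items():
        if not comment or "PII:" not in comment.upper():
            errors.append(f"{column}: missing PII classification")
    return errors
```

Wiring this into the pipeline turns the policy from a review checklist into a hard merge gate — an unclassified field cannot reach production by accident.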

Continuous Validation and Review

To prevent classification decay over time, we implemented periodic reviews:

  • Monthly for high-risk domains
  • Quarterly for general systems

This ensured:

  • New services were correctly classified
  • Existing classifications remained accurate
  • Security posture evolved with system changes

Impact

This approach fundamentally changed how we managed sensitive data:

  • Established a single source of truth for PII across enterprise systems
  • Standardized encryption decisions across engineering teams
  • Reduced inconsistent handling of sensitive fields
  • Improved audit readiness and regulatory compliance

More importantly, it shifted the organization from a reactive posture:

"We secure data when we discover it"

to a proactive model:

"We know exactly where sensitive data exists, at all times, across all systems"

Final Reflection

The key realization was that PII management is not a documentation problem — it is a systems design problem.

Once PII is treated as a first-class, queryable, governed entity, it becomes possible to enforce security consistently at scale without relying on manual reviews or fragmented team practices.