Your Encryption Is Already Compromised…You Just Don’t Know It Yet

It's written for security engineers who know RSA and ECDSA cold and people who are comfortable with modular arithmetic and want to understand not just what Dilithium does, but why every decision was made the way it was. By the end you'll be able to read the NIST spec, spot implementation mistakes, and explain the security proof to a colleague.

Dilithium creates signatures by picking a random masking vector y, committing to A·y, generating a challenge c from a hash of the message and commitment, responding with z = y + c·s where s is the secret key, and using rejection sampling to ensure z reveals nothing about s. Verification checks that A·z equals the expected lattice value using the public key and a hint.

Don't worry if that's opaque right now. Every piece of it will make sense by the time we're done.

Let's build that understanding layer by layer.

Part 1: Why Everything You're Using Is Broken

RSA's security is a combination lock. For a classical computer, trying every combination of a 2048-bit number takes longer than the age of the universe.

Now imagine a machine that doesn't try combinations sequentially, but tries all of them simultaneously using quantum superposition. That's Shor's algorithm. Your 2048-bit RSA lock? Broken in polynomial time on a large quantum computer.

The critical distinction: Shor's algorithm is surgical. It factors integers and solves discrete logarithms. It does nothing useful against lattice problems. And best known quantum algorithms for lattice problems (BKZ with quantum walks) are still sub-exponential parameters can be sized to resist them.

Dilithium keeps the same external contract as RSA and ECDSA:

KeyGen()  →  (public_key, secret_key)
Sign(secret_key, message)  →  signature
Verify(public_key, message, signature)  →  true / false

KeyGen()  →  (public_key, secret_key)
Sign(secret_key, message)  →  signature
Verify(public_key, message, signature)  →  true / false

The internals are completely different. The interface is identical. Your existing code that calls Sign() and Verify() needs minimal changes.

Part 2: The Mathematical Foundation

Modular Arithmetic : Everything Lives Here

Every single number in Dilithium lives in modular arithmetic mod q = 8,380,417.

This is a clock with 8,380,417 positions. After q, you wrap back to 0. Formally: a ≡ b (mod q) means q divides (a − b).

Why this specific prime? Three deliberate reasons:

1. It's prime. Every nonzero element has a multiplicative inverse. Division works. Security proofs work.

2. NTT-friendly. q = 2²³ − 2¹³ + 1 means primitive 512th roots of unity exist in Z_q. There exists ω = 1753 such that 1753⁵¹² ≡ 1 mod q but 1753^k ≢ 1 for k < 512. This is exactly what the Number Theoretic Transform requires for fast polynomial multiplication. Without it, NTT doesn't work.

3. Fits in 23 bits. Since q < ²²³, each coefficient fits in a 23-bit integer, enabling efficient packed arithmetic without overflow.

Dilithium uses two representations of coefficients mod q and confusing them is a common implementation error:

Standard form [0, q−1]: how coefficients are stored
Centered form [−(q−1)/2, (q−1)/2]: how "smallness" is measured

The value q−2 = 8,380,415 looks large in standard form. In centered form it's −2 — obviously small. When we say the secret key has "small" coefficients, we mean centered coefficients in {−2, −1, 0, 1, 2}.

The ℓ∞ norm is how Dilithium measures smallness throughout:

‖v‖∞ = max{ |v₀|, |v₁|, ..., |v₂₅₅| }

‖v‖∞ = max{ |v₀|, |v₁|, ..., |v₂₅₅| }

Checking ‖v‖∞ ≤ B means: scan all coefficients, verify each |vᵢ| ≤ B. O(n) and simple. This appears in every rejection check.

Polynomial Rings : The Efficiency Engine

Dilithium works in the polynomial ring R_q = Z_q[X] / (X²⁵⁶ + 1).

Think of it as a "polynomial clock" — just as integers wrap at q, polynomials wrap when degree reaches 256, using the rule X²⁵⁶ ≡ −1. So X²⁵⁶ = −1, X²⁵⁷ = −X, X²⁵⁸ = −X², and so on. Any degree ≥ 256 reduces to degree < 256.

One polynomial = 256 integers packed into one algebraic object.

Why does this matter? Compare:

Plain integer LWE at 128-bit quantum security: matrix is 1024×1024 = 1,048,576 elements. Key size: ~10 MB.
Module LWE with polynomials: matrix is 4×4 = 16 polynomials = 16×256 = 4096 integers. Same security. Same 256× smaller.

Dilithium2's keypair: 1312 + 2528 = 3840 bytes. A 250× size reduction from naive integer LWE.

The NTT exploits the structure of R_q to multiply polynomials in O(n log n) instead of O(n²). In practice: 2,048 operations instead of 65,536. A is always stored in NTT domain. Every A·y in signing uses NTT internally.

Lattices : The Geometric Heart

A lattice is an infinite, perfectly regular grid in high-dimensional space defined by basis vectors.

The classic 2D example: take vectors b₁ = (2,1) and b₂ = (−1,3). The lattice is all integer combinations z₁·b₁ + z₂·b₂; a skewed grid extending infinitely in every direction.

The same lattice can be described by infinitely many different bases. This asymmetry is the cryptographic security:

The Shortest Vector Problem (SVP): given a bad basis, find the shortest nonzero lattice vector. NP-hard in the worst case. Best algorithms (LLL, BKZ) take exponential time in the dimension. For n = 256, breaking it requires ~²¹²⁸ operations. No known quantum speedup beyond quadratic (Grover).

The entire security of Dilithium rests on this asymmetry: the good basis makes hard problems easy. The bad basis gives the adversary only a hard problem.

MLWE and MSIS : The Two Security Pillars

Dilithium's security reduces to two problems:

Module Learning With Errors (MLWE); protects the secret key.

Without noise, recovering s from A and A·s is just Gaussian elimination; trivial. With small noise e added: given A and A·s + e, recovering s is computationally intractable. The noise destroys the linear structure you'd exploit.

MLWE lifts this to polynomial matrices. Given public matrix A ∈ R_q^{k×ℓ} and the noisy sample t = A·s₁ + s₂ (where s₁, s₂ have small coefficients), recovering s₁ or s₂ reduces to SVP on module lattices. This is proven, not just conjectured.

Module Short Integer Solution (MSIS); ensures unforgeability.

Given A, find a nonzero short vector z satisfying A·z ≡ 0 (mod q) with ‖z‖ ≤ β. This is finding a short vector in the null-space of A. A valid Dilithium forgery requires either solving MSIS or breaking SHAKE-256 : both intractable.

MLWE → key secrecy. MSIS → signature unforgeability.

Part 3: Complete Walkthrough

Before diving in, one mental model to hold onto throughout: Dilithium is essentially a game of hide and seek with your secret key. Key generation hides it inside a hard math problem. Signing proves you know it without showing it. Verification confirms the proof without needing it.

Key Generation

Everything starts from a single 256-bit random seed ξ from your OS CSPRNG. Back this up and you can regenerate the entire keypair deterministically.

Everything in Dilithium starts from a single random number. Just 32 bytes (256 bits)pulled from your operating system's random number generator. That's it. That's the only randomness the entire key generation process needs.

This single seed is called ξ (xi), and it's deliberately the only source of entropy. Why? Because if you can reproduce the same seed, you can reproduce the entire keypair deterministically, byte for byte. Lose your seed, lose your keys. Recreate your seed, recreate your keys. This makes backups simple and key management predictable.

From ξ, Dilithium runs a cryptographic hash function (SHAKE-256) to stretch those 32 bytes into 128 bytes of output. This expanded output is sliced into three pieces: ρ (rho), ρ′ (rho-prime), and K. Think of this as one master key splitting into three specialized keys; one for building the public matrix, one for generating the secret, and one reserved for use during signing.

The first piece, ρ, is used to generate a large public matrix called A. This is a 4×4 grid of polynomials; essentially 16 arrays of 256 numbers each. A is completely public; anyone can regenerate it from ρ. Here's a subtle but important engineering decision: A is never actually stored anywhere. At roughly 11 kilobytes, it would bloat every keypair. Instead, only the 32-byte seed ρ is stored, and A is regenerated from it on demand. This is only possible because ρ is public and the verifier can do the same regeneration independently.

The second piece, ρ′, is used to sample two secret polynomial vectors: s₁ and s₂. These are the actual secrets. What makes them "secret" in a mathematical sense isn't just that they're hidden; it's that every single one of their coefficients is deliberately small, drawn uniformly from the set {−2, −1, 0, 1, 2}. This smallness is load-bearing; without it, the security proof doesn't work.

Now comes the mathematical heart of key generation. Dilithium computes:

t = A·s₁ + s₂

t = A·s₁ + s₂

This is called an MLWE sample; Module Learning With Errors. The public matrix A is multiplied by the secret s₁, and then the small "error" polynomial s₂ is added. The result t is published as part of the public key.

Why is this secure? Because recovering s₁ from A and t is believed to be computationally intractable and equivalent to solving the Shortest Vector Problem in a high-dimensional lattice. The added noise s₂ destroys the linear structure an attacker would need to simply run Gaussian elimination. Without the noise, breaking it would take milliseconds. With it, even a quantum computer has no known polynomial-time algorithm.

But t is large and each of its coefficients needs 23 bits to represent. Publishing t as-is would make the public key enormous. So Dilithium applies a clever compression step called Power2Round: each coefficient of t is split into its top 10 bits (called t₁) and its bottom 13 bits (called t₀). Only t₁ goes into the public key. t₀ which the low bits that goes into the secret key.

This is where a lot of people pause: why keep t₀ secret at all? Here's the answer. If you published both t₁ and t₀, an attacker would have the full t. And full t, combined with A, gives a noiseless linear system A·s₁ = t − s₂. Without the noise confusion, recovering s₁ is just Gaussian elimination: an O(n³) operation any laptop can do in seconds. The split is simultaneously a compression trick and a security mechanism.

Finally, Dilithium hashes the entire public key into a 64-byte fingerprint called tr. This gets stored in the secret key and will be used in every future signature to bind the signature to this specific keypair.

The result: a public key of 1,312 bytes (ρ plus the compressed t₁) and a secret key of 2,528 bytes (ρ, K, tr, s₁, s₂, and t₀).

Signing

Signing is the most intricate of the three algorithms. It uses a loop, it can fail and restart, and it involves several operations that only make sense once you understand the core challenge: how do you prove you know a secret without showing it?

The classical answer is that used in Schnorr signatures and its relatives that is masking. You cover your secret with a large random value, answer a challenge, and the math works out so that your answer proves knowledge without exposure. Dilithium does exactly this, adapted for polynomial lattices. The paradigm is called Fiat-Shamir with Aborts, and every design decision in signing flows from it.

Binding the message to the keypair: Signing doesn't begin with the message M directly. It begins by computing μ = H(tr ‖ M); a hash of the public key fingerprint concatenated with the message. This single step ensures that a signature produced for one keypair cannot be "transplanted" to validate under a different keypair, even if the message is the same. The signature is inseparably tied to both the message and the specific public key.

The masking vector (your disguise): Dilithium samples a fresh random polynomial vector y, with each coefficient drawn uniformly from a range roughly ±131,072. This is the mask. When you later compute z = y + c·s₁, the secret s₁'s contribution is at most ±78; completely buried inside y's range of ±131,072. The ratio between those two numbers (about 1,681) determines how well s₁ is hidden. The larger that ratio, the more z looks like pure noise.

The commitment (announcing where you stand). Dilithium multiplies A by y to produce a commitment w = A·y. But rather than hashing all of w, only its high bits are extracted: w₁ = HighBits(w). The low bits of w are discarded from the hash. This matters for two reasons. First, an attacker who could see all of w might use it to reverse-engineer y, which would compromise s₁. Second and more practically, the verifier never sees w at all; they'll need to reconstruct w₁ on their own, which they can do using the hint mechanism.

w = A·y   →   w₁ = HighBits(w, 2γ₂)

w = A·y   →   w₁ = HighBits(w, 2γ₂)

The challenge (the random question). The challenge c is computed by hashing the message representative μ together with the commitment w₁. This is the Fiat-Shamir transform in action: instead of an interactive protocol where a verifier sends you a random challenge, you generate the challenge yourself by hashing the data you've committed to. The result is a sparse polynomial with exactly 39 coefficients of ±1 and 217 zeros; a deliberately lightweight structure that makes subsequent polynomial multiplications cheap.

c̃ = H(μ ‖ Encode(w₁))   →   c = SampleInBall(c̃)

c̃ = H(μ ‖ Encode(w₁))   →   c = SampleInBall(c̃)

The response. With the challenge in hand, Dilithium computes the response z = y + c·s₁. This is the Schnorr pattern applied to polynomials. The response encodes your knowledge of s₁, but only indirectly, through the mask y. The math of verification will check that A·z equals the right value, which can only happen if s₁ was known when z was computed.

z = y + c·s₁

z = y + c·s₁

The rejection check (the critical safety valve). Here's where Dilithium diverges sharply from classical signature schemes. The response z is inspected coefficient by coefficient. If any single coefficient has an absolute value of 130,994 or more, the entire attempt is thrown away; y, z, the challenge, all of it, and the loop restarts from scratch with a fresh y.

IF ‖z‖∞ ≥ γ₁ − β = 130,994  → RESTART
IF ‖r₀‖∞ ≥ γ₂ − β = 95,154  → RESTART

IF ‖z‖∞ ≥ γ₁ − β = 130,994  → RESTART
IF ‖r₀‖∞ ≥ γ₂ − β = 95,154  → RESTART

Why is this necessary? Because a large coefficient in z is a signal. The challenge's contribution c·s₁ is bounded at ±78. So if z has a large coefficient, it means y had a large coefficient in that direction. An attacker who accumulates many such signatures can use that constraint to progressively narrow down what s₁ could be; eventually recovering it entirely using lattice reduction techniques. Rejection cuts off that attack at the source: every accepted z is statistically independent of s₁, carrying zero information about the secret key.

Roughly 75 - 81% of signing attempts are rejected and restarted. This sounds expensive, but each attempt is fast, and the expected number of tries before an accepted signature is only about 4 to 5.

The hint. One subtlety remains. The verifier will compute w′ = A·z − c·t₁·2^d, which equals w plus a small correction term (c·t₀ − c·s₂). That correction is small, bounded by ±78, but it can occasionally nudge a coefficient across a rounding boundary, changing what HighBits returns. The hint h is a sparse binary vector that records exactly where those boundary crossings happen. The verifier uses it to correct w′ back to the original w₁. If the total number of hint bits exceeds 80, the attempt is rejected and restarted; valid signatures almost always use fewer than 10.

h = MakeHint(−c·t₀, w − c·s₂ + c·t₀, 2γ₂)

MakeHint returns 1 if HighBits(r+z) ≠ HighBits(r), else 0

h = MakeHint(−c·t₀, w − c·s₂ + c·t₀, 2γ₂)

MakeHint returns 1 if HighBits(r+z) ≠ HighBits(r), else 0

The output. Once an attempt passes all checks, the signature is simply the tuple (c̃, z, h): the challenge hash, the response vector, and the hint. At 2,420 bytes for Dilithium2, this is about 38 times larger than an Ed25519 signature. That's the cost of quantum resistance.

σ = (c̃, z, h)  ← 2,420 bytes

c̃ (32 bytes)
The challenge hash. Verifier re-derives the challenge polynomial c from this.

z (2,304 bytes)
The response. 4 polynomials, 18 bits per coefficient. Proves knowledge of s₁.

h (84 bytes)
The hint. Sparse binary vector. Lets verifier recover HighBits(w) without ever seeing w.

σ = (c̃, z, h)  ← 2,420 bytes

c̃ (32 bytes)
The challenge hash. Verifier re-derives the challenge polynomial c from this.

z (2,304 bytes)
The response. 4 polynomials, 18 bits per coefficient. Proves knowledge of s₁.

h (84 bytes)
The hint. Sparse binary vector. Lets verifier recover HighBits(w) without ever seeing w.

Verification

Verification is deliberately simple. No loops, no randomness, no decisions; just computation and three checks. It either passes or it fails.

The verifier receives the public key (ρ, t₁), the message M, and the signature (c̃, z, h). From these, they must determine whether the signature is genuine.

Reconstructing the shared context. The verifier begins by regenerating A from ρ, the same expansion the signer use and recomputing the message representative μ = H(H(ρ ‖ t₁) ‖ M). Both of these must match exactly what the signer computed, which they will as long as the public key and message are authentic.

Recovering the challenge. The challenge polynomial c is recovered deterministically from c̃ using the same SampleInBall algorithm used during signing. Because this is a deterministic function, the verifier always gets the same c that the signer used.

The key computation. The verifier computes w′ = A·z − c·t₁·2^d. To understand why this works, trace the algebra. Since z = y + c·s₁, we have A·z = A·y + c·A·s₁ = w + c·A·s₁. And since t₁·2^d = t − t₀ = A·s₁ + s₂ − t₀, the c·t₁·2^d term evaluates to c·A·s₁ + c·s₂ − c·t₀. Subtracting, the c·A·s₁ terms cancel completely, leaving w′ = w + c·t₀ − c·s₂. The verifier has computed something that differs from the original commitment w by only a small, bounded correction, without ever knowing y, s₁, s₂, or t₀.

Using the hint. The verifier applies UseHint(h, w′) to recover w₁′. The hint encodes exactly where the small correction c·t₀ − c·s₂ shifted a coefficient across a rounding boundary. UseHint uses the hint bits to adjust w′'s high-bit representation by ±1 at those positions, recovering the original HighBits(w) = w₁. The mathematical identity that guarantees this UseHint(MakeHint(z, r, α), r, α) = HighBits(r + z, α) is the single equation that makes the whole verification scheme work.

The three checks. With w₁′ recovered, the verifier performs three checks in sequence.

The first check is ‖z‖∞ < 130,994. This confirms that the response vector z is within the bounds imposed by honest rejection sampling. A forger constructing z freely to satisfy the verification equation has no incentive to keep all coefficients within this range, genuine signatures always satisfy it; forged ones very likely don't.

The second check is ‖h‖₁ ≤ 80. This limits the total number of hint bits. A forger who submits a dense hint vector can make UseHint adjust many values, creating too much freedom to manufacture a matching w₁′. Keeping the hint sparse closes that door.

The third check is the decisive one: c̃ must equal H(μ ‖ Encode(w₁′)). The verifier recomputes the challenge hash from scratch using the recovered w₁′ and the message. If w₁′ is genuinely equal to the original w₁, which happens if and only if the signer knew s₁ and produced z and h honestly the hash matches. If the signature is forged or tampered with, the hash will differ with overwhelming probability. The chance of a random forgery passing this check is 2⁻²⁵⁶: effectively zero.

If all three checks pass, the signature is accepted. The entire verification requires one matrix-vector multiplication, one application of UseHint, and one hash comparison; no secrets, no randomness, no loops.

Part 4: How We Actually Know It's Secure

"Nobody has broken it yet" is not a security proof. Dilithium has formal proofs.

EUF-CMA: The Security Model

The game: a challenger runs KeyGen and gives the adversary the public key. The adversary can request signatures on any messages they choose (a "signing oracle"). After observing as many (message, signature) pairs as they like, they must forge a valid signature on a new message.

Dilithium is EUF-CMA secure if no efficient adversary wins this game with non-negligible probability (< 2⁻¹²⁸). Same security level targeted by RSA-3072 and ECDSA P-256 classically, except Dilithium maintains it against quantum adversaries.

The Security Reduction

The proof logic: if an adversary A can forge Dilithium signatures, we use A to solve MSIS. Since MSIS is hard, A can't exist.

Assume A forges with probability ε.
Apply the Forking Lemma (Bellare-Neven 2006): run A twice with the same randomness but different hash oracle responses at one critical query. With probability ε²/Q_H, both runs produce valid forgeries (c̃, z, h) and (c̃', z', h') on the same message with the same commitment w₁ but different challenges c ≠ c'.
Extract MSIS solution: from the two verification equations, subtracting gives A·(z − z') = (c − c')·t₁·2^d. With careful norm analysis, (z − z') becomes a short vector z with A·z ≈ 0, a valid MSIS solution.
Contradiction: MSIS is hard, so ε must be negligible. A cannot forge.

Part 5: What You Actually Need to Do

The math is settled. NIST finalized ML-DSA (FIPS 204) in August 2024. The algorithm isn't experimental, it's a published standard, and the clock on migration has already started.

The threat timeline matters here. Quantum computers capable of running Shor's algorithm at cryptographic scale don't exist today. But "harvest now, decrypt later" attacks do: adversaries are collecting encrypted traffic right now, betting they'll be able to decrypt it once the hardware catches up. For signatures specifically, the risk is slightly different and you can't retroactively forge a past signature, but any long-lived signing key (code signing certificates, PKI roots, firmware signing) that you intend to use past the mid-2030s is a migration target now.

What changes in your codebase is less than you think. Dilithium preserves the same KeyGen / Sign / Verify interface you already call. The migration surface is key sizes (public key 1,312 bytes vs 64 bytes for Ed25519), signature sizes (2,420 bytes vs 64 bytes), and anywhere you've hardcoded assumptions about those sizes; buffer lengths, database column widths, protocol message formats. Find those assumptions first; they'll take longer to fix than swapping the algorithm.

The rejection sampling loop and the hint mechanism aren't bugs to work around, they're the security. If you're implementing Dilithium rather than calling a library, the single most dangerous mistake is removing or weakening the rejection checks to speed things up. The bounds ‖z‖∞ < γ₁ − β and ‖h‖₁ ≤ ω are load-bearing. Every accepted signature leaking even a single bit of information about s₁ is recoverable via lattice reduction given enough signatures.

Use a vetted library. In order of preference: liboqs (Open Quantum Safe), the reference implementation from the CRYSTALS team, or BouncyCastle's post-quantum module. Do not roll your own. Constant-time polynomial arithmetic, correct centered reduction, and safe NTT implementations are the kinds of things that look correct and aren't.

The bigger picture: Dilithium isn't a drop-in speed upgrade; it's a deliberate trade. You pay 38× in signature size and ~3–5× in signing time compared to Ed25519, and you get a scheme whose security reduces to a problem that has resisted three decades of cryptanalysis and has no known quantum speedup beyond sub-exponential. For most systems, that's a trade worth making now rather than under pressure later.

The math has been done. The standard exists. The libraries are ready. The remaining work is yours.

References

Specification: NIST FIPS 204. This is the normative reference.

Original paper: Ducas et al., "CRYSTALS-Dilithium: A Lattice-Based Digital Signature Scheme" (TCHES 2018); best source for design rationale and security proof.

Security proof foundations: Lyubashevsky, "Lattice Signatures Without Trapdoors" (Eurocrypt 2012); the origin of Fiat-Shamir with Aborts.

Lattice crypto survey: Peikert, "A Decade of Lattice Cryptography" (2016); comprehensive SVP, LWE, SIS coverage with full proofs.

Quantum security: Kiltz-Lyubashevsky-Schaffner (2018); the Q-EUF-CMA proof in the QROM.

Contents