June 13, 2026
PII Masking in Go: Why Regex Fails and What Actually Works
A production-tested, byte-level masking middleware that catches sensitive data before it leaks — no regex, no handler changes.
Opalski
Pager goes off at 2 AM. Developer does what any of us would do — kubectl logs --tail=500, finds the error, copies the output, posts it in Slack. Problem solved, right? Not quite. That Slack message contained raw credit card numbers. Sixteen digits, plain as day, in a channel with 200 people. PCI-DSS violation, PDP breach, and a very uncomfortable meeting with legal.
All because of one innocent-looking line:
INFO: payment processed | credit_card=5212345678901234 | amount=150000
You almost never find PII leaks during development. You find out during an audit. Or when a customer complains. Or when a security researcher DMs you.
Stop Using Regex for Sensitive Data Masking
Your first instinct is regex. It was mine too. You've likely written this exact function before:
var cardRegex = regexp.MustCompile(`(\d{4})\d{8,12}(\d{4})`)
func maskCreditCard(log string) string {
	return cardRegex.ReplaceAllString(log, "$1****$2")
}
Looks clean. Simple PR, quick review, merged. But here's what actually happens when that code runs:
Regex is 18x slower with 17x more allocations. At scale — and I mean anything above a few hundred requests per second — that difference is literal server cost. At 1,000 req/s, you're burning CPU cycles making garbage for the GC to clean up. Every. Single. Request.
But here's the kicker: performance isn't even the real problem. Regex has zero context awareness. That pattern doesn't know the difference between a credit card, a transaction ID, or a timestamp. It just sees sixteen or more digits in a row and nukes them. Congratulations, now half your logs are unreadable and debugging takes twice as long.
Your team's regex-based solution:
INFO: payment success | order_id=5212****1234 | ts=1678****5678
Good luck debugging with that.
Regex solves the wrong problem. It's a post-processing hack, not an architecture decision.
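To see the false positives for yourself, here is a minimal, runnable sketch that feeds the regex from above a log line with no card number in it at all. The log line itself is a made-up example (a 16-digit order ID and a microsecond timestamp), not taken from a real system.

```go
package main

import (
	"fmt"
	"regexp"
)

// cardRegex is the pattern from the article: it matches 16 to 20 consecutive digits.
var cardRegex = regexp.MustCompile(`(\d{4})\d{8,12}(\d{4})`)

// maskCreditCard keeps the first and last four digits of every match.
func maskCreditCard(log string) string {
	return cardRegex.ReplaceAllString(log, "$1****$2")
}

func main() {
	// Hypothetical log line: neither field is a card number.
	line := "INFO: payment success | order_id=5212345678901234 | ts=1678901234565678"
	fmt.Println(maskCreditCard(line))
	// prints: INFO: payment success | order_id=5212****1234 | ts=1678****5678
}
```

Both fields get mangled because the pattern only looks at the digits, never at the key in front of them.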
Mask at the Boundary, Not at the Output
Here's the mindset shift: mask data as close to the source as possible.
Think of it like this — you wouldn't install a water filter at your neighbor's house. You install it at the pipe entering your own home. Same goes for sensitive data in logs.
If you only mask at the output layer, here's every single place your data can still leak:
- Log files rotated to S3 without server-side encryption
- ELK or Loki aggregators storing raw, unmasked payloads
- Alert webhooks firing with full JSON bodies to PagerDuty or Slack
- Developers running tail -f on production instances (yes, you do this too)
The fix? Move masking to the middleware layer. One config, one place, before data ever reaches an output destination. It's the only way to guarantee coverage.
The Implementation: 3 Layers, Zero Pain
This is what we run in production. Three layers, each with a single job. No magic, no overengineering.
Layer 1: Configuration
Just a struct. Declarative. You say what to mask and how.
type MaskConfig struct {
	Field     string
	Type      MaskType // HideMask or PartialMask
	ShowFirst int
	ShowLast  int
}
Five lines. That's the entire config layer.
Layer 2: The Engine
Byte-level traversal. No regex. No intermediate string allocations. JSON payloads arrive as []byte in Go, so we work with them directly.
func ApplyMask(payload []byte, masks map[string]MaskConfig) []byte {
	// Work on a copy so the caller's payload is left untouched.
	masked := make([]byte, len(payload))
	copy(masked, payload)
	// Apply each configured field mask in turn.
	for field, cfg := range masks {
		masked = maskField(masked, field, cfg)
	}
	return masked
}
The engine scans for "field_name": patterns byte-by-byte, locates the value boundaries, and mutates the slice in-place. Skipping the string() cast alone saves thousands of allocations per second at scale.
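The article does not show maskField, so what follows is only one possible sketch of it, not the production code. It assumes compact JSON with string values (no whitespace after the colon), defines MaskType as a plain int enum, and collapses the middle of a partially masked value to a fixed "****", which is inferred from the sample outputs in this post. The helpers valueEnd and appendMasked are hypothetical names. For simplicity it builds a fresh output slice per field instead of mutating in place, which is easier to get right when the masked value changes length.

```go
package main

import (
	"bytes"
	"fmt"
)

// MaskType is assumed to be a simple enum; the article only names its two values.
type MaskType int

const (
	HideMask    MaskType = iota // replace every byte of the value with '*'
	PartialMask                 // keep ShowFirst/ShowLast bytes, collapse the middle
)

type MaskConfig struct {
	Field     string
	Type      MaskType
	ShowFirst int
	ShowLast  int
}

// valueEnd returns the index of the closing quote of the string that
// starts at start, skipping escaped bytes, or -1 if it never closes.
func valueEnd(b []byte, start int) int {
	for i := start; i < len(b); i++ {
		switch b[i] {
		case '\\':
			i++ // skip the escaped byte
		case '"':
			return i
		}
	}
	return -1
}

// appendMasked appends the masked form of value to dst. Values too short
// for a partial mask are hidden completely. Offsets are counted in bytes.
func appendMasked(dst, value []byte, cfg MaskConfig) []byte {
	if cfg.Type == PartialMask && cfg.ShowFirst >= 0 && cfg.ShowLast >= 0 &&
		len(value) > cfg.ShowFirst+cfg.ShowLast {
		dst = append(dst, value[:cfg.ShowFirst]...)
		dst = append(dst, "****"...)
		return append(dst, value[len(value)-cfg.ShowLast:]...)
	}
	return append(dst, bytes.Repeat([]byte{'*'}, len(value))...)
}

// maskField masks every `"field":"value"` occurrence in payload.
// The needle includes the closing quote of the key, so "password"
// does not match "password_confirmation".
func maskField(payload []byte, field string, cfg MaskConfig) []byte {
	needle := []byte(`"` + field + `":"`)
	out := make([]byte, 0, len(payload))
	for {
		i := bytes.Index(payload, needle)
		if i < 0 {
			return append(out, payload...)
		}
		start := i + len(needle)
		end := valueEnd(payload, start)
		if end < 0 { // unterminated string: copy the rest unchanged
			return append(out, payload...)
		}
		out = append(out, payload[:start]...)
		out = appendMasked(out, payload[start:end], cfg)
		payload = payload[end:]
	}
}

func ApplyMask(payload []byte, masks map[string]MaskConfig) []byte {
	masked := make([]byte, len(payload))
	copy(masked, payload)
	for field, cfg := range masks {
		masked = maskField(masked, field, cfg)
	}
	return masked
}

func main() {
	in := []byte(`{"credit_card":"5212345678901234","cvv":"123","pin":"12"}`)
	out := ApplyMask(in, map[string]MaskConfig{
		"credit_card": {Type: PartialMask, ShowFirst: 4, ShowLast: 4},
		"cvv":         {Type: HideMask},
		"pin":         {Type: PartialMask, ShowFirst: 1, ShowLast: 1},
	})
	fmt.Println(string(out))
	// prints: {"credit_card":"5212****1234","cvv":"***","pin":"**"}
}
```

Known gaps in this sketch: non-string values (numbers, booleans) are not masked, pretty-printed JSON with a space after the colon is not matched, and ShowFirst/ShowLast count bytes, so they can split a multi-byte UTF-8 character.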
Layer 3: Middleware
This is the magic. Other developers on your team don't change a single line of their code.
func MaskingMiddleware(masks map[string]MaskConfig) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			mrw := &maskingResponseWriter{
				ResponseWriter: w,
				masks:          masks,
			}
			next.ServeHTTP(mrw, r)
		})
	}
}
Drop it in main.go, and every HTTP response is automatically masked. Your team keeps writing handlers. They don't think about masking. They don't need to.
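The maskingResponseWriter type is not shown in this post. One way it could work, sketched here under assumptions, is to buffer the status and body while the handler runs and send the masked result afterwards. In this variant the middleware takes a mask function (a stand-in for ApplyMask with its config already bound) and calls a hypothetical flush method after next.ServeHTTP returns; both of those are choices made for this sketch, not the author's code.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// maskingResponseWriter buffers the body and status so the whole
// payload can be masked at once before anything reaches the client.
type maskingResponseWriter struct {
	http.ResponseWriter
	body   bytes.Buffer
	status int
	mask   func([]byte) []byte // stand-in for ApplyMask with its config bound
}

// WriteHeader only records the status; it is sent later by flush.
func (w *maskingResponseWriter) WriteHeader(code int) { w.status = code }

// Write collects the body in memory instead of sending it.
func (w *maskingResponseWriter) Write(p []byte) (int, error) { return w.body.Write(p) }

// flush masks the buffered body and sends it through the real writer.
func (w *maskingResponseWriter) flush() error {
	masked := w.mask(w.body.Bytes())
	w.Header().Set("Content-Length", strconv.Itoa(len(masked)))
	w.ResponseWriter.WriteHeader(w.status)
	_, err := w.ResponseWriter.Write(masked)
	return err
}

func MaskingMiddleware(mask func([]byte) []byte) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			mrw := &maskingResponseWriter{ResponseWriter: w, status: http.StatusOK, mask: mask}
			next.ServeHTTP(mrw, r)
			_ = mrw.flush() // the client is gone if this fails; nothing left to do
		})
	}
}

// demo runs one request through the middleware and returns what the client sees.
func demo() (int, string) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusCreated)
		fmt.Fprint(w, `{"cvv":"123"}`)
	})
	// Toy mask for the demo only; a real setup would call ApplyMask here.
	toy := func(b []byte) []byte {
		return bytes.ReplaceAll(b, []byte(`"cvv":"123"`), []byte(`"cvv":"***"`))
	}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/payment", nil)
	MaskingMiddleware(toy)(handler).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(demo())
	// prints: 201 {"cvv":"***"}
}
```

The trade-off of buffering is that streaming responses no longer stream, and optional interfaces such as http.Flusher are not forwarded by this wrapper.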
Complete Gin Example: Middleware → Handler → Output
Here's the end-to-end implementation with Gin. The middleware, response writer, handler, and wiring — everything in one compact block. Layer 1 (Config) and Layer 2 (Engine) stay the same from above.
package main

import (
	"bytes"
	"net/http"
	"strconv"

	"github.com/gin-gonic/gin"
)

func MaskingMiddleware(masks map[string]MaskConfig) gin.HandlerFunc {
	return func(c *gin.Context) {
		// Swap in a writer that buffers the body instead of sending it.
		// The status defaults to 200 in case the handler never sets one.
		w := &bufferedWriter{ResponseWriter: c.Writer, body: new(bytes.Buffer), masks: masks, status: http.StatusOK}
		c.Writer = w
		c.Next()
		// Handlers are done: mask the buffered body, then send it for real.
		masked := ApplyMask(w.body.Bytes(), w.masks)
		w.ResponseWriter.Header().Set("Content-Length", strconv.Itoa(len(masked)))
		w.ResponseWriter.WriteHeader(w.status)
		w.ResponseWriter.Write(masked)
	}
}

// bufferedWriter captures the status and body written by the handlers.
type bufferedWriter struct {
	gin.ResponseWriter
	body   *bytes.Buffer
	masks  map[string]MaskConfig
	status int
}

func (w *bufferedWriter) WriteHeader(c int)                   { w.status = c }
func (w *bufferedWriter) Write(d []byte) (int, error)         { return w.body.Write(d) }

// WriteString must be buffered too, or it would bypass masking via the embedded writer.
func (w *bufferedWriter) WriteString(s string) (int, error)   { return w.body.WriteString(s) }

// paymentHandler knows nothing about masking and returns plaintext data.
func paymentHandler(c *gin.Context) {
	c.JSON(200, gin.H{
		"status": "success", "credit_card": "5212345678901234",
		"cvv": "123", "customer": "Budi Santoso",
	})
}

func main() {
	r := gin.Default()
	r.Use(MaskingMiddleware(map[string]MaskConfig{
		"credit_card": {Type: PartialMask, ShowFirst: 4, ShowLast: 4},
		"cvv":         {Type: HideMask},
	}))
	r.POST("/payment", paymentHandler)
	r.Run()
}
Output: the raw response body (what hits the wire):
{"credit_card":"5212****1234","customer":"Budi Santoso","cvv":"***","status":"success"}
The handler called c.JSON(200, gin.H{...}) with full plaintext data. The middleware masked credit_card (partial) and cvv (fully hidden) before the response ever left the server. The handler's code is completely clean: no masking calls, no special structs, no mask_credit_card() helpers sprinkled everywhere.
This is the boundary-first approach in practice: mask once, at the middleware, and every handler is protected by default. New team members don't need to know it exists.
The Edge Cases Everyone Misses
Three things that'll bite you if you're not careful:
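Nested JSON. That credit_card in data.payment.credit_card? A flat bytes.Index search has no idea which object a key belongs to, and it breaks as soon as the payload is pretty-printed or the value isn't a plain string. You need token-based parsing with json.Decoder that tracks depth.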
URL-encoded bodies. Payment gateways send callbacks as application/x-www-form-urlencoded. If your middleware only intercepts JSON content types, you've got a blind spot.
Headers. Authorization, X-Api-Key, X-Msisdn — nobody masks headers. But they contain the same sensitive data your body does.
And here's the testing approach:
func TestApplyMask(t *testing.T) {
	tests := []struct {
		name  string
		input string
		masks map[string]MaskConfig
		want  string
	}{
		{
			name:  "partial mask credit card",
			input: `{"credit_card":"5212345678901234","name":"Budi"}`,
			masks: map[string]MaskConfig{
				"credit_card": {Type: PartialMask, ShowFirst: 4, ShowLast: 4},
			},
			want: `{"credit_card":"5212****1234","name":"Budi"}`,
		},
		{
			name:  "hide mask password",
			input: `{"password":"rahasia123"}`,
			masks: map[string]MaskConfig{
				"password": {Type: HideMask},
			},
			want: `{"password":"**********"}`,
		},
		{
			name:  "short value doesn't panic",
			input: `{"pin":"12"}`,
			masks: map[string]MaskConfig{
				"pin": {Type: PartialMask, ShowFirst: 1, ShowLast: 1},
			},
			want: `{"pin":"**"}`,
		},
	}
	// Run every case as its own subtest so failures are reported by name.
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := string(ApplyMask([]byte(tt.input), tt.masks))
			if got != tt.want {
				t.Errorf("ApplyMask() = %s, want %s", got, tt.want)
			}
		})
	}
}
Negative tests matter here: empty payloads should not panic. Missing fields should not panic. Unicode with + signs should work. Fields with similar names like password vs password_confirmation should be treated independently.
The Numbers That Matter
Real benchmark on a 4-core server, 2KB JSON payload with 5 sensitive fields:
Byte-level overhead: 7.5%. Regex overhead: 37%. At 10 million requests per day, that 7.5% costs you basically nothing. The regex approach? That's an extra 21 hours of CPU time. Every day. Translate that to your cloud bill.
Masking isn't glamorous. Nobody files a "Great Job" JIRA ticket when you deploy a middleware. No standup applause.
But when the auditors come and they review your logging pipeline and find zero PII leaks — that's when you know the invisible work paid off. That's the quiet satisfaction of building something that nobody notices because it never failed.
Your production logs are probably leakier than you think. Go check them before someone else does. And if you've found edge cases I haven't covered — I'd genuinely like to hear about them.