This is my process for figuring out whether I'm solving for the right thing, and walking through a real example seems more useful than describing it abstractly. So here is one feature I shipped a few years ago, and what the process actually looked like from the inside.
A fintech app I was working on needed to render full news articles inline, pulled from several external sources at the moment a user opened them. Three of the four sources returned the full hostile-web payload — ads, third-party frames, tracking, inline scripts, malformed markup — and one returned plain text. Browser-side fetching, browser-side rendering, next to a financial UI carrying a real session. I hadn't worked on browser security before. The brief was, effectively, make this safe.
The first thing I did was decide what "safe" had to mean before I touched any code. That's the part of the work I think actually matters, and it's the part I want to talk about.
Defining what success has to mean
The way I approach work generally is to figure out what would count as the thing being done before I start building it. Not in the loose "what are we trying to accomplish" sense, but a definition specific enough that I can hold a candidate solution up against it and get a clear verdict. The point of having that definition early is so the rest of the work — the build, the testing, the validation — has a fixed target to point at.
For this feature, the definition I settled on was: I should be able to explain, to myself, why the design is safe. A positive case, not "I haven't found a problem with it." If I can't construct that explanation, the feature isn't done.
The reason that phrasing matters is that it forces validation to produce something specific. "Is it safe enough" is unanswerable. "Can I make the case for why it's safe" is answerable, and the answer is either yes or no. When the answer is no, I have to do something about it.
Building the obvious thing carefully
The first approach I took was to sanitize the content. Some of that was domain inexperience — I didn't have a deep background in browser security, but I knew sanitization was something people did for this kind of problem, and it seemed like a reasonable place to start. The rest was that the same content needed cleaning anyway for cosmetic reasons. The sources were sending ads, third-party frames, layout junk that didn't belong in the app, CSS that would have collided with the app's own styles. I was already going to be doing a cleaning pass on the way in. Folding safety into that same pass felt like the natural shape — one pipeline, doing one job thoroughly, instead of two separate concerns layered on top of each other.
So I built it. I researched everything I could find about hostile browser content — what gets injected, what tricks people use, what categories of markup and inline construction need to be stripped or normalized — and built a cleaner against all of it. Unit tests covering every condition I could find documented anywhere. Then I started validating against real source content, looking for anything that shouldn't have made it through.
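The cleaner itself isn't the point of this story, but for concreteness, a deny-list cleaner of this kind looks roughly like the sketch below. This is illustrative JavaScript, not the production code; the real version had far more rules and proper HTML parsing rather than regexes.

```javascript
// A simplified deny-list cleaner: strip the markup categories documented
// as dangerous, pass through everything else. Illustrative only.
const DENY_RULES = [
  /<script\b[\s\S]*?<\/script\s*>/gi,             // inline and external scripts
  /<iframe\b[\s\S]*?(?:<\/iframe\s*>|\/?>)/gi,    // third-party frames
  /\son\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi,  // on* event handler attributes
];

function clean(html) {
  let out = html;
  for (const rule of DENY_RULES) {
    out = out.replace(rule, "");
  }
  return out;
}

console.log(clean('<p>News</p><script>steal()</script>')); // → "<p>News</p>"
console.log(clean('<img src=x onerror=alert(1)>'));        // → "<img src=x>"
```

Note what the guarantee actually is: "nothing on the list survives." That is not the same guarantee as "nothing executable survives," which is the distinction the rest of this post turns on.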
Five hours into validation, I found a bypass. A construction I hadn't anticipated passed cleanly through the cleaner and would have rendered as executable in the browser. I patched it.
Next day, halfway through another validation pass, another one. Different mechanism, same outcome.
I patched again and kept validating. About a day later, a third.
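I won't reproduce the actual bypasses, but the general shape is well known, and a hypothetical naive stripper makes the pattern easy to see. These are classic textbook payloads, not the ones I found:

```javascript
// A naive stripper that only knows about <script> tags.
const stripScripts = html =>
  html.replace(/<script\b[\s\S]*?<\/script\s*>/gi, "");

// None of these contains a <script> tag, so each passes through untouched,
// yet each one executes in a browser:
const payloads = [
  '<img src=x onerror="doEvil()">',              // handler fires when the load fails
  '<svg onload="doEvil()"></svg>',               // SVG event handler
  '<a href="javascript:doEvil()">read more</a>', // javascript: URL
];

for (const p of payloads) {
  console.log(stripScripts(p) === p); // true -- the cleaner saw nothing wrong
}
```

Patching any one of these is easy. The problem is that the list of such constructions is open-ended, which is exactly the shape that kept showing up in validation.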
Going back to what I was actually solving for
When validation kept producing bypasses, the obvious move was to keep patching. Each individual one was a real bug with a real fix. But what was showing up wasn't a list of independent bugs. It was a pattern, and the way I deal with patterns is to fix them at the pattern level, not the instance level. Fixing at the instance level just lets the same shape come back in a new costume.

So instead of writing a fourth patch, I went back to the question I'd started with — what does it mean for this feature to be safe — and tried to construct the positive case from scratch. And the moment I tried to actually write out the chain of reasoning from "the cleaner runs" to "the user is safe," I noticed it didn't connect. The cleaner's job is to remove things that look dangerous in the markup. But the property I needed was about what the rendered content was allowed to do in the browser. Those aren't the same property. A perfectly clean cleaner doesn't get me what I need, because cleanliness of markup isn't what determines whether content can do damage. Privilege is.
I'd been treating this as a content cleanliness problem. It was a browser security problem. They're different domains, and I'd been working in the wrong one.
This is the move I think actually matters in the work, and it's smaller than it sounds. It's not a sudden insight. It's a habit of going back to the problem statement when the evidence starts not adding up, and re-deriving the approach from there instead of continuing forward from where you happen to be.
The actual answer to the actual problem
Once the question became "how do I prevent the content from having privilege" instead of "how do I make the content clean," the design space looked completely different. The browser already has a mechanism for restricting privilege: sandboxed iframes, ideally on a different origin, with sandbox flags that strip the capabilities that would let the content reach the parent. The content can do whatever it wants inside the frame. It just can't touch anything outside.
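Concretely, the design ends up shaped roughly like this. The sketch is mine, not a standard API: it assumes the article content is served from a separate, cookieless origin, and the exact flags depend on what the articles actually need.

```javascript
// Build the embed markup for untrusted article content. A sandbox attribute
// with no allow-* tokens is maximally restrictive: no script execution, no
// forms, no popups, and -- critically -- a unique opaque origin, so the
// framed content cannot reach the parent page's DOM, cookies, or session.
function sandboxedArticleFrame(articleUrl) {
  return `<iframe
  src="${articleUrl}"
  sandbox=""
  referrerpolicy="no-referrer"
  loading="lazy"
></iframe>`;
}

console.log(sandboxedArticleFrame("https://articles.example.com/view/123"));
```

If the articles later need some capability back, each `allow-*` token is an explicit, reviewable widening of the boundary, rather than a new class of payload to anticipate. (The one combination to avoid is `allow-scripts` together with `allow-same-origin`, which lets the content remove its own sandbox.)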
This is a structurally different kind of safety than a cleaner provides. The cleaner approach makes me responsible for proving the content is safe. The sandbox approach hands that responsibility to the domain's actual experts — the browser engineers who design and harden the sandbox boundary as their specialty. The first approach requires validating against the entire creative output of the adversarial web. The second is a property the browser enforces, on a boundary that's already been hardened by the people best positioned to harden it. The trust surface drops by orders of magnitude, and it drops onto something far better defended than anything I was going to build.
I kept the cleaner. It just stopped being a safety tool. It became a cosmetic one — strip the ads, strip the trackers, strip the visual junk so the rendered article looks like it belongs in the app instead of looking like a 2008 news site. That's a job a cleaner can do well, because being almost always right is fine for cosmetics. It's only fatal when you mistake it for a safety boundary.
What I think this generalizes to
The technical lesson here is narrow. The thing I find more broadly useful is the move that surfaced it: when validation against your own definition of done is producing failures with a recognizable shape, the work isn't on the failures, it's on the shape. Going back to what you're actually solving for is what gives that shape somewhere to resolve.
The important part, for me, is to watch for a pattern in the problems that arise, and to make sure I account for the pattern rather than the instance.