June 9, 2026
The code was right. We couldn’t read it.
There is a phrase I keep hearing on engineering teams now, including my own.
Matt Whetton
We didn't really check it. But it works.
A year ago that sentence would have stopped me. Now it slides past. Something shifted in the last six months or so. The oldest safety net in software, the fact that a human could always read the code, stopped holding. Not because it became impossible to read. Because skipping it became acceptable. Nobody decided it should. It just quietly stopped being the rule, one reasonable decision at a time.
I want to take that seriously, because I don't think it's a discipline problem. I think it's a new abstraction we accepted without applying the test we applied to every previous one.
One distinction first, because it turns out to be the whole point. The code was not unreadable. Any single piece of it you could open and follow. What we lost was the ability to read the system as a whole, to look at it and understand what it was doing. The parts stayed legible. The shape they made together did not.
What we used to be able to check
Engineering has always traded away work it no longer needed to do by hand. We stopped writing assembly. We stopped managing memory. We stopped reading the machine code the compiler produced. Each time we gave up some visibility and got a good deal in return.
Every one of those trades had the same property underneath it. The thing we stopped checking was safe to stop checking. A calculator returns the same answer every time. A compiler is deterministic, and if you ever needed to, you could inspect what it did and follow the logic. We stopped looking not because looking became impossible but because it became unnecessary. The work below the abstraction was settled, checked once, properly, behaving the same way forever after.
That is the quiet condition that made it all work. We removed work we no longer needed to check.
AI removes work we still need to be able to check.
Not all of it. Plenty of what an agent produces (the tests, the migrations, the glue) you can let go of safely, the same way you let go of the assembly. The trouble is the part you still need to understand, and it does not announce itself as different from the rest.
It does not fail loudly either. A compiler that breaks tends to break visibly. AI-generated code that is wrong usually looks exactly like code that is right. It compiles, it passes the obvious tests, it reads as something a competent engineer would have written. The wrongness is in the judgement, not the syntax, and judgement does not show up in a diff.
There is also a lot of it, arriving at a volume no review process was built to absorb. So we stop checking, for the rational reason that there is too much and most of it is fine. What we are left with is a new abstraction with none of the properties that made the old ones safe to stop checking.
The pattern that looked safe
Here is where it bit us.
We were building something routine. Our core system raises events, and we sync those events and various data attributes out to a third-party platform that handles customer messaging. The pattern for doing this is reproducible. You wire up one event, prove it works, then repeat the shape for all the others.
There were a lot of others. Dozens of events, each a minor variation on the same idea. So we did the sensible thing. We built one properly, confirmed the pattern held, and let an agent complete the rest. Because each case was so similar to the one before it, and because the first was correct, we did not review the rest in any detail.
Every individual piece was correct. The agent replicated the pattern faithfully. Each event synced, each attribute landed where it should. If you had reviewed any single one of them you would have signed it off.
But the agent never consolidated. A human building the tenth, then the twentieth, then the fortieth version of the same thing starts to feel the weight of it and reaches for a central place to route it all through. That instinct is not tidiness. It is what you do when you know you will have to come back and understand this later. The agent has no later. It produced the fortieth case as readily as the first, wired up exactly where it sat. The result was that our event syncing was scattered across the entire system, with no single place you could look to understand it.
Nothing was broken. Everything worked. And we could no longer answer a basic question by inspection. What are we sending out of here, and when?
Legibility is a property of the whole
This is the distinction I keep coming back to. Correctness is local. Legibility is global.
You can verify any single change and still miss that the system as a whole has stopped being explainable. No single correct decision protects the legibility of the whole, because the whole is never any single task's problem. It was nobody's job to consolidate, so nobody did, and each correct piece made the overall picture slightly harder to read.
Then legibility stopped being abstract. When we came back weeks later to add more events, some were firing at times that made no sense. Debugging meant reconstructing by hand what was being sent from where, the exact thing the scattered structure had made hard. In the course of that we found the second issue. The agent had made wrong assumptions about which events should be sent when, misreading the business context in a way no single diff would show.
The misrouting was the real bug. But the illegibility is what let it hide. If everything had run through one pipe, the wrong events would have been sitting there in plain sight. Legibility is not a nicety you add once the system works. It is the thing that makes the next problem findable.
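To make "plain sight" concrete, here is a minimal sketch of what one place to look could be. It is not our actual code. The event names, triggers and attributes are hypothetical, and `OutboundEvent`, `OUTBOUND_EVENTS` and `describe_outbound` are names invented for the illustration. The idea is only that every outbound event is declared in a single table, next to a plain-language statement of when it fires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundEvent:
    name: str          # event name as the messaging platform sees it
    trigger: str       # plain-language statement of when it should fire
    attributes: tuple  # data attributes sent along with the event


# Hypothetical entries for illustration; a real table would list every event.
OUTBOUND_EVENTS = (
    OutboundEvent("order_placed", "customer completes checkout", ("order_id", "total")),
    OutboundEvent("order_shipped", "warehouse marks order dispatched", ("order_id", "carrier")),
    OutboundEvent("trial_ending", "three days before trial expiry", ("plan",)),
)


def describe_outbound(events=OUTBOUND_EVENTS):
    """Return one line per event: what is sent, and when."""
    return [
        f"{e.name}: sent when {e.trigger}; carries {', '.join(e.attributes)}"
        for e in events
    ]


if __name__ == "__main__":
    for line in describe_outbound():
        print(line)
```

With a table like this, a wrong assumption about when an event should fire is one line that a person who knows the business can read and challenge, rather than something to reconstruct from call sites spread across the system.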
Why a human would have consolidated
It is tempting to call that consolidating instinct good engineering. I don't think that's quite it. I think it is self-interest with a long time horizon.
When you build something you know you will own, you build it differently. You will be the one debugging it late at night, extending it in six months, answering for it when someone asks why it behaves the way it does. That future shapes the work. You make the system comprehensible because you are the one who will have to comprehend it. Legibility is not a virtue you bring to the code. It is what anticipated ownership produces.
This is part of why outsourcing a build is so hard to get truly right. The people writing the code are not the people who have to live in it, so the incentive that quietly keeps a system legible is missing, and no process fully replaces it.
An AI is the purest version of that. It has no stake in what the codebase will be like to maintain next year, because there is no version of it that will be there next year. Every visit is a cold start. It does not get frustrated untangling something it wrote, it just untangles it, every time, at no cost to itself. You can see the absence in how these tools respond when you point out a mess. They agree at once, fix the instance, and carry nothing into the next file. Nothing pulls them toward comprehensibility, because nothing makes them pay for the lack of it.
Putting the incentive back by hand
So what do you do about a property that used to come for free and has stopped?
Not read every line. That boundary is gone and it is not coming back, and pretending otherwise is nostalgia. Most generated code can go unread, the same way you do not read the assembly. The line was never how much you read. It is what you keep legible. The shape of how data moves, the places business intent is encoded, the decisions that would be expensive to reconstruct. The work now is to protect those deliberately, because the thing writing the code will not.
The first move is ownership. Put a named human on the hook for explaining a system, not just for shipping it. Owners build legibly because they expect to be asked why it does what it does, so re-create that expectation and you re-create most of the instinct. It is also why, on the build itself, the human owns the structure the volume hangs off. You build the central pipe, the agent wires each case into it. Not because design-first is a virtue, but because the thing producing the volume has no reason to converge on a single shape.
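As a rough sketch of that division of labour, assuming a Python codebase: the human writes something like the class below once, and each case the agent adds becomes one `register` call plus a `send` at the point where the event is raised. `OutboundPipe`, `UnregisteredEvent` and the `transport` callable are hypothetical names for the illustration, not our real implementation and not any vendor's API.

```python
from typing import Callable, Dict, List, Tuple


class UnregisteredEvent(Exception):
    """Raised when code tries to send an event the pipe was never told about."""


class OutboundPipe:
    """Single route for everything sent to the messaging platform."""

    def __init__(self, transport: Callable[[str, dict], None]) -> None:
        self._transport = transport  # assumed wrapper around the vendor client
        self._allowed: Dict[str, Tuple[str, ...]] = {}
        self.sent: List[Tuple[str, dict]] = []  # in-memory record, for inspection

    def register(self, name: str, attributes: Tuple[str, ...]) -> None:
        """Declare an event and the attributes it is allowed to carry."""
        if name in self._allowed:
            raise ValueError(f"event {name!r} is already registered")
        self._allowed[name] = attributes

    def send(self, name: str, data: dict) -> None:
        """Send a registered event, forwarding only its declared attributes.

        A missing declared attribute raises KeyError rather than sending
        a partial payload.
        """
        if name not in self._allowed:
            raise UnregisteredEvent(name)
        payload = {key: data[key] for key in self._allowed[name]}
        self._transport(name, payload)
        self.sent.append((name, payload))

    def registered(self) -> List[str]:
        """Answer 'what can leave this system?' in one call."""
        return sorted(self._allowed)


if __name__ == "__main__":
    # Stand-in transport that just prints; a real one would call the platform.
    pipe = OutboundPipe(transport=lambda name, payload: print(name, payload))
    pipe.register("order_placed", ("order_id", "total"))
    pipe.send("order_placed", {"order_id": 42, "total": 19.99, "note": "not forwarded"})
```

The design choice that matters is the refusal in `send`. An event that has not been declared cannot go out, so the fortieth case has to land in the same place as the first, whoever or whatever wrote it.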
The second is to keep it explainable, and to treat that as a real property of the system, not a nice-to-have. If nobody can say why a piece of code works, that is a defect, even when the code is correct. Fix illegibility when you spot it, the way you would a broken window, without making a ceremony of it. And on the parts where a wrong answer is expensive, do the old-fashioned thing. Sit people down and have someone explain what a piece of code is doing, out loud, from the code. If they cannot, that is the finding.
The third is to stop optimising only for the agent. There is a fashionable idea that the goal is an agent-friendly codebase. It is half right. The code has two readers now, the agent and the human who has to answer for it. The goal is not a codebase AI finds easy to work in. It is one that stays comprehensible to a person after AI has touched every part of it. Optimise only for the first reader and you have shipped the exact problem this piece is about.
What it costs, and when
We caught ours. It cost us a confusing afternoon and a refactor, not an incident. But it was luck dressed as a near miss.
What I keep coming back to is that the AI did not create this problem. It removed the friction that used to warn us about it. Writing the fortieth near-identical thing by hand was irritating, and the irritation was the signal to stop and consolidate. We used to credit the result to judgement. A lot of it was just people refusing to be annoyed a forty-first time.
Two things used to keep systems legible almost for free. The near-term friction of doing repetitive work by hand, and the long-term fact that someone would have to live with the result. The agent feels neither. It is never annoyed, and it never comes back.
So legibility does not look after itself any more. It is a discipline you protect deliberately, or you watch it leave one reasonable decision at a time.