June 11, 2026
We Tested Many Cases. The Critical Bug Still Escaped.
The message came through on a Monday at 8:52, tagged urgent, from a support agent who had already been on the phone for twenty minutes.
Martin Cáceres
A logistics company — one of the larger accounts — had placed an order the previous Friday afternoon. The confirmation email arrived. The order processed correctly in every system the team monitored. But when the customer's finance team reconciled invoices on Monday morning, the number was wrong. The platform had been running a week-long promotional code for a new product category. The customer also had a standing loyalty discount on their account, applied automatically at checkout. The API had combined both. The final invoice reflected a price that nobody on the team had intended to offer, and when the support agent asked the customer to describe what they had done, the answer was: nothing unusual. They had placed an order the way they always did.
That was the part that stayed with the QA engineer for the rest of the week. The customer had not exploited a loophole or done something unexpected. The system had done exactly what it was built to do. The problem was that nobody on the team had fully understood what it was built to do.
— -
The Question That Arrives First
By mid-morning, the senior developer had reproduced the issue. Two lines of validation logic were missing from the discount application service. The fix was deployed before lunch. The customer was contacted, the invoice was corrected, and by mid-afternoon the team was sitting in a brief retrospective.
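The story does not show the fix itself. As a rough sketch of what a two-line guard of this kind might look like, assume a hypothetical `apply_discounts` function and assume that "one or the other" means the larger of the two discounts wins (the team's actual tie-breaking rule is not stated):

```python
from decimal import Decimal


def apply_discounts(subtotal, loyalty_pct, promo_pct, allow_stacking=False):
    """Return the discounted total for a hypothetical order.

    All names and the "larger discount wins" rule are illustrative
    assumptions, not the team's real service.
    """
    loyalty = Decimal(loyalty_pct)
    promo = Decimal(promo_pct)

    # Stand-in for the missing validation: when both discounts are
    # eligible and stacking is not allowed, keep only the larger one.
    if not allow_stacking and loyalty > 0 and promo > 0:
        loyalty, promo = max(loyalty, promo), Decimal(0)

    total = subtotal * (1 - loyalty / 100) * (1 - promo / 100)
    return total.quantize(Decimal("0.01"))


# With the guard, only the 20% promotion applies to a 100.00 order.
guarded = apply_discounts(Decimal("100.00"), 10, 20)

# Without it, both discounts compound, which is the invoice the customer saw.
stacked = apply_discounts(Decimal("100.00"), 10, 20, allow_stacking=True)
```

The guard is small, which is part of the point: the defect was not hard to fix once someone knew the rule had to be enforced in code.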
Someone asked the question that was already in the room.
"We ran over three hundred cases for this release. How did this get through?"
It did not sound like blame. It sounded like a reasonable response to a major account calling support on a Monday. But the question carried a specific assumption inside it: that this kind of failure is caused by a missing test case, and that the solution is to find the gap and fill it. That assumption is accurate in a narrow set of situations — when a known scenario was dropped from the execution plan, or when regression was compressed to hit a deadline. It does not hold when the team tested everything they believed was in scope and the failure came from something that had never been identified as in scope at all. In that situation, pointing at the test suite is not entirely wrong — the case is indeed missing — but it locates the problem in the wrong place, which means the retrospective produces a fix for the symptom and leaves the underlying condition untouched.
— -
What Three Hundred Test Cases Were Actually Covering
The automated regression suite for the promotional feature was careful and thorough. It covered valid codes, expired codes, malformed input, codes applied to partially empty carts, codes applied after the cart had been modified mid-session, and codes for product categories the promotion did not apply to. The manual session had followed the acceptance criteria closely and added a handful of exploratory scenarios around the boundaries of the new product category. The QA report showed 94% coverage and zero blockers.
None of that work was careless. Every case followed from a coherent picture of the feature. The issue is that tests do not test the world — they test the team's model of the world. They encode the shared understanding of how the system is supposed to behave and verify that the actual system matches it. When the model is accurate, high coverage produces confidence that is earned. When the model contains an error, the same coverage produces confidence that is misplaced — and misplaced confidence is harder to recognize than a failed test, because nothing is visibly wrong.
The team's suite had been internally consistent. Every case traced back to the same underlying picture of how discounts worked, and that consistency had made the gap invisible. There was no failing assertion pointing at the problem, no anomaly in the results, nothing to pause on. There was a clean report, a passing pipeline, and an assumption that had been encoded into the test strategy before anyone had thought to examine it.
Three hundred cases had measured how thoroughly the team had tested their understanding of the product. They had not measured how accurate that understanding was.
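One way to make that difference visible is to enumerate the combinations the system permits and compare them with the combinations the suite exercised. The sketch below is an assumption-heavy illustration: the two eligibility flags and the `TESTED` set are hypothetical stand-ins for the team's real scenarios, reduced to the smallest case that shows the gap.

```python
from itertools import product

# Hypothetical eligibility dimensions for a checkout session.
DIMENSIONS = {
    "loyalty_discount": [False, True],
    "promo_code": [False, True],
}

# Scenarios a suite built on the "one or the other" model would exercise.
TESTED = {
    (False, False),
    (True, False),
    (False, True),
}


def untested_combinations(dimensions, tested):
    """List every combination of dimension values that no test exercised."""
    names = list(dimensions)
    return [
        dict(zip(names, combo))
        for combo in product(*dimensions.values())
        if combo not in tested
    ]


gaps = untested_combinations(DIMENSIONS, TESTED)
```

Here `gaps` holds the single scenario in which both discounts are eligible at once. The enumeration only helps if the dimensions come from what the system allows rather than from what the team believes it allows, which is the same problem one level up.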
— -
Where the Risk Actually Lived
The assumption had not started as a technical decision. It had started as a product explanation during a refinement meeting three sprints earlier, offered while the team was reviewing mockups and before anyone had read the API documentation for the discount service.
The product owner had described the two discount mechanisms — loyalty discounts and promotional codes — as separate systems. A customer could use one or the other, not both. The explanation was brief, it made intuitive sense given the UI design, and it was accepted without follow-up. The developer who later built the integration had probably encountered the detail in the API contract — stacking was technically possible and not restricted by default — but by the time the code reached review, the team's working understanding had already settled around "one or the other," and nothing had surfaced the contradiction.
This is where risk tends to accumulate in software teams: not in the scenarios that were identified and skipped, but in the early conversations that felt conclusive and were never returned to. Standard risk frameworks, the probability-times-impact models that most teams absorb informally from project management practice, work reasonably well when the risk is visible and named. They have little to say about assumption-based risk: behavior that nobody marked as uncertain because it matched what had been described in a meeting, and the meeting had seemed to close with agreement. These risks are invisible not because the team was careless, but because the assumption was never held up against the technical reality long enough for anyone to notice the gap.
The failure had not started in QA execution. It had started in a refinement room, in a brief description of how two systems would interact, and in the fact that nobody had asked whether the description was technically accurate.
— -
What an Escaped Bug Is Actually Telling You
A retrospective that ends with "add a test case for simultaneous discounts" will prevent this specific combination from escaping again. It will leave the conditions that produced it exactly as they were.
The more useful question to ask after a critical escaped defect is not "where was the missing test case" but "where did our shared understanding of the product break down, and how early did it happen." The answer is rarely in the execution phase. It is usually upstream: in a ticket that was written before a key conversation took place, in an acceptance criterion that described expected behavior without anyone checking that expectation against the system, in a refinement meeting where a plausible explanation settled into shared fact and nobody slowed down to separate the description from the verification.
For this team, the bug was pointing at a repeatable pattern: the boundary between two business rules had been defined once, verbally, and the definition had been trusted without being confirmed. That pattern will appear again — the next time two systems interact in a way that nobody drew out, the next time a rule is introduced through a conversational explanation rather than a checked constraint, the next time an API does something consistent with its own documentation but inconsistent with what the team had discussed two sprints earlier. Naming the pattern — not just closing the specific gap — is the thing that changes the risk surface over time. One new test case updates the test plan. Changing how the team treats early assumptions changes what gets tested before the test plan is written.
— -
A Note That Was Easy to Miss
The test plan for the following release looked mostly the same. Pass counts, automated results, manual cases grouped by feature area, a summary of open defects. Under the checkout section, added between two other line items in the same plain formatting, was a single note:
"Verify API-level behavior when multiple discount mechanisms are eligible in the same session. Confirm that technical constraints match documented product rules before test cases are written."
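A check of that kind could be as small as a probe that asks the service for prices and infers the rule from the answers, before any test case is written. The following is only a sketch: `quote` is a hypothetical stand-in for a call to the real discount service, and the two sample implementations exist purely to show the probe telling them apart.

```python
from decimal import Decimal


def stacks_discounts(quote):
    """Report whether a quote function combines both discounts.

    `quote(subtotal, loyalty_pct, promo_pct)` is a hypothetical wrapper
    around the real service; the percentages are arbitrary probe values.
    """
    subtotal = Decimal("100.00")
    both = quote(subtotal, 10, 20)
    best_single = min(quote(subtotal, 10, 0), quote(subtotal, 0, 20))
    # A total below the best single-discount price means both were applied.
    return both < best_single


def stacking_quote(subtotal, loyalty_pct, promo_pct):
    """Sample service that applies both discounts in sequence."""
    return subtotal * (100 - loyalty_pct) / 100 * (100 - promo_pct) / 100


def exclusive_quote(subtotal, loyalty_pct, promo_pct):
    """Sample service that applies only the larger discount."""
    return subtotal * (100 - max(loyalty_pct, promo_pct)) / 100


observed = stacks_discounts(stacking_quote)
```

If the documented product rule says discounts are exclusive and the probe reports stacking, the mismatch surfaces during refinement rather than on a customer's invoice.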
Three weeks later, during refinement for a change to subscription pricing tiers, the developer presenting the ticket explained how the new tier discounts would interact with existing promotional codes. It was a brief explanation, offered naturally, between two other agenda items. Before the conversation moved on, the QA engineer opened the ticket on her screen.
"Is that how the discount service actually handles it, or is that how we're expecting it to handle it?"
The meeting ran twelve minutes longer than scheduled.