TL;DR; If you are that lazy, paste into your favorite Gen AI machine and ask for it. Does nobody read anymore?

Reality Check 1: The Code Review.

Have you seen this PR review message before?

LGTM!

Or its more discreet variants? "All good", "Looks nice", "Minor nitpicks but approved", etc.

Maybe no message at all? Just the approved action.

If you had to review other people's code recently, there is a good chance you were in a hurry, because you also have other tasks with your name on them that you have to get to. Plus, if the people you work with are good at their job, you kind of trust that things are mostly in order. The economic calculation will probably involve a few questions like these:

  • "What is the cost to lift all the context for this task as if I were working on it from scratch?"
  • "Does this author usually produce good code?"
  • "How much can I actually contribute here?"
  • "Is it worth holding this PR to address the things I have in mind?"
  • "Do I have anything in mind? If not, is it because I understand everything completely or am I just lacking context? Oh man, I might need to pull this branch locally and take a better look"
  • [add your questions here]

Of course this changes a lot with team seniority composition, but there is an economic calculation going on under there somewhere even if you are not aware of it, because you can always spend more time reviewing code. How do you decide when to stop?

Reality Check 2: The Bugs.

All the bugs in production got there through a complex pipeline that included The Code Review step. You approved a lot of them. You wrote some bugs yourself and got them approved by a Code Reviewer. You know it.

But they said: "Code Reviews are not supposed to [fill in the blanks with the failure a Code Review did not prevent]!"

My friend, this is just the tail end of the invention's degradation curve. I was there, "3000 sprints ago", when Code Reviews were actually awesome. Now they are but one step in the modern-day SDLC, which is bogged down by all the brilliant ideas made up by someone who is not on a Squad that uses them.

I was there, 3000 sprints ago.

Reality Check 3: The Silos.

Does your team have enough people? I mean, do you have more people than silos?

Well, maybe you do, maybe you don't, but I'll say that on the highest performing teams I've worked on we had more silos than people, and that's how we managed to build everything fast enough to grow, hire more people and specialize teams. That doesn't seem to ever stop if your business is successful.

So the problem is not having these specialized areas one engineer is very good at, but isolating that knowledge. Now with remote teams it is even easier to fall into that trap and become the "[insert project name]" guy. Good luck taking a vacation in peace, those docs you wrote are not even close to good enough. Good luck onboarding people onto your team, one new hire will bog down everyone with simple knowledge requests.

If you are engaged with the business you work for, it may feel good being "the guy" for that project or code area. But if that is happening, you are probably going to be "the guy" for many, and so are your peers. At that point you are not far from no longer being an actual team, just a band of rogue rangers trying to keep the tribe alive by hunting alone.

Reality Check 4: The Testing Scenario.

Do you find fulfillment in thinking up and writing complex multi-layer test scenarios?

Writing automated tests for toy apps is all fun and games, but when you have a million line code base with more layers than a George R. R. Martin plot you kind of want to #### yourself, or use stubs and mocks, which is basically the same thing.

I have seen engineers who find joy in seeing their code work, writing the actual code word by word as if they were doing woodworking, but I have yet to see a software engineer who likes to test software, even if the machine is the one actually running the test.

In the golden age of automated testing our tests worked like this:

  • Start at the highest level of the application layer, like the Controller in MVC, the Event Emitter/Handler, etc.
  • End at the assertion of the expected end result.
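As a rough illustration of that style, here is a minimal sketch using Minitest. Every class name in it (`SubscriptionsController`, `SubscribeUser`, `SubscriptionStore`) is made up for this example, and the "application" is just a few in-memory objects. The point is only the shape: the test enters at the top layer, lets the real layers underneath run, and asserts on the end result.

```ruby
require "minitest/autorun"

# Hypothetical in-memory stand-in for a persistence layer.
class SubscriptionStore
  def initialize
    @rows = {}
  end

  def save(user_id, plan)
    @rows[user_id] = plan
  end

  def plan_for(user_id)
    @rows[user_id]
  end
end

# Hypothetical service layer with one business rule.
class SubscribeUser
  VALID_PLANS = %w[basic pro].freeze

  def initialize(store)
    @store = store
  end

  def call(user_id:, plan:)
    return :invalid_plan unless VALID_PLANS.include?(plan)

    @store.save(user_id, plan)
    :ok
  end
end

# The highest-level entry point: raw params in, status code out.
class SubscriptionsController
  def initialize(store)
    @service = SubscribeUser.new(store)
  end

  def create(params)
    result = @service.call(user_id: params.fetch("user_id"), plan: params.fetch("plan"))
    if result == :ok
      201
    else
      422
    end
  end
end

class SubscribeFlowTest < Minitest::Test
  def setup
    @store = SubscriptionStore.new
    @controller = SubscriptionsController.new(@store)
  end

  # Start at the controller, end at the stored result. Nothing is mocked.
  def test_valid_request_ends_with_a_stored_subscription
    status = @controller.create("user_id" => 1, "plan" => "pro")

    assert_equal 201, status
    assert_equal "pro", @store.plan_for(1)
  end

  def test_unknown_plan_is_rejected_and_nothing_is_stored
    status = @controller.create("user_id" => 1, "plan" => "gold")

    assert_equal 422, status
    assert_nil @store.plan_for(1)
  end
end
```

If the rule inside `SubscribeUser` changes, these tests notice, because they exercise it for real.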

Was there unit testing? Sometimes, maybe you want to break the problem down into small pieces because that's your thing, but you didn't need it.

Now everywhere you look you see:

  • Start at the edge of the class you are updating, mock and stub away the rest of the application;
  • Assert the outputs are what you expect, when the inputs are also what you expect.
  • Pray that makes sense for the application as a whole.
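For contrast, a minimal sketch of the isolated style, again in Minitest with hand-rolled stubs. `InvoiceTotal`, `StubDiscounts` and `StubTaxes` are hypothetical names, and the numbers are arbitrary. Notice that the assertion can only ever confirm the values the stubs were told to return.

```ruby
require "minitest/autorun"

# Hypothetical class under test: applies a discount, then asks for tax.
class InvoiceTotal
  def initialize(discounts:, taxes:)
    @discounts = discounts
    @taxes = taxes
  end

  def call(amount_cents)
    discounted = @discounts.apply(amount_cents)
    @taxes.add_tax(discounted)
  end
end

# Hand-rolled stubs: canned answers, whatever the real collaborators would do.
class StubDiscounts
  def apply(_amount_cents)
    900
  end
end

class StubTaxes
  def add_tax(_amount_cents)
    990
  end
end

class InvoiceTotalTest < Minitest::Test
  def test_returns_whatever_the_stubs_were_told_to_return
    total = InvoiceTotal.new(discounts: StubDiscounts.new, taxes: StubTaxes.new).call(1_000)

    # Green, but it only proves the wiring. The 900 and the 990 are our own assumptions.
    assert_equal 990, total
  end
end
```

Change the input to any other amount and the test still passes, which is the "pray" step in practice.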

Of course, if you have a big code base you will need to mock and stub for "reasons", I get it. Still, you are not programming a system anymore, you are programming a component. Yes, it is part of the whole, but you test it as if you could trust the whole to behave in a very controlled way. That is doable, even good enough, if you have a finite state machine, but do you? You probably don't know, because you are not designing it right for testing.

You are wrong, that's not me.

Well, does this sound familiar?

Tests pass, PRs get approved, and then staging — or worse, production — tells us what we missed.

The pattern became clear: we test fragments, not behaviors. We mock the hard parts and hope for the best. And when someone without context reviews the code, they can't tell if it's right because the tests don't describe what the thing actually does.

Tests that mock everything are just expensive assertions that the code you wrote is the code you wrote.

You pick up a ticket. It's in a part of the codebase you haven't touched before — maybe it's a third party integration, maybe it's a decision making machine, maybe it's something in the payments and subscriptions system that three people have context on and none of them are you.

You start reading. You open one file, which leads to another, which leads to a service you've never seen, which leads to some cluster of files that introduce a new pattern that was trending the last time interest rates were low, which calls something that's mocked in every test so you can't actually tell what it does for real. You spend half a day just figuring out where the logic lives.

Eventually, you get enough context to start. You write the code. It works locally. Now you need tests.

Here's where it gets interesting.

You look at the existing tests for guidance. They're full of mocks. `allow(SomeService).to receive(:call).and_return(true)`. The test doesn't actually test the integration — it tests that your code calls the right method with the right arguments. If the mock is accurate. If the real service still behaves that way.
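To make the "if the mock is accurate" part concrete, here is a small sketch of mock drift. It uses Minitest and a hand-rolled stub rather than RSpec's `allow`, so it runs with a stock Ruby install. `RealChargeService`, `StubChargeService` and `Checkout` are invented for the example, and the assumption is that the real service reports failure through a result object while the stub still returns `true`.

```ruby
require "minitest/autorun"

# Hypothetical real service: failure is reported in a result object, not as false.
class RealChargeService
  Result = Struct.new(:success, :error)

  def call(amount_cents)
    return Result.new(false, "card_declined") if amount_cents > 10_000

    Result.new(true, nil)
  end
end

# The stub every existing test copies, the equivalent of `and_return(true)`.
class StubChargeService
  def call(_amount_cents)
    true
  end
end

class Checkout
  def initialize(charge_service)
    @charge_service = charge_service
  end

  def pay(amount_cents)
    # Bug: a Result struct is always truthy, so a declined charge looks paid.
    @charge_service.call(amount_cents) ? :paid : :failed
  end
end

class CheckoutTest < Minitest::Test
  def test_with_the_stub_everything_is_green
    assert_equal :paid, Checkout.new(StubChargeService.new).pay(50_000)
  end

  def test_with_the_real_service_a_declined_charge_still_looks_paid
    # This assertion documents the bug, not the behavior anyone wants.
    assert_equal :paid, Checkout.new(RealChargeService.new).pay(50_000)
  end
end
```

Both tests pass. Only the second one tells you something is wrong, and it is the one nobody wrote.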

You don't have time to rewrite the test suite, so you follow the pattern. You mock what everyone else mocks. You stub the hard parts. The tests pass. Green. Ship it.

The PR goes up.

Your reviewer opens the diff. They see 400 lines of code in a domain they don't work in. They skim the implementation, maybe catch a typo, suggest using `&:method_name` instead of a block. There is an "aha!" moment where they can share a new language feature you are not using. They look at the tests but can't tell if the scenarios are complete — because they don't have the context either.
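For reference, this is the kind of suggestion in question, shown on a made-up list of names. The two forms are equivalent, which is exactly why the comment improves style and nothing else.

```ruby
names = ["ada", "grace", "linus"]

# Block form, as originally written.
with_block = names.map { |name| name.upcase }

# Symbol-to-proc shorthand the reviewer suggests.
with_shorthand = names.map(&:upcase)

puts with_block == with_shorthand # prints "true"
```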

"LGTM!"

Two days later, Sentry or DataDog or what have you starts complaining about a New Error. The edge case nobody thought of, or even worse, a "not edge case" everyone should have totally spotted. The integration that behaves differently than the mock suggested. The scenario that seemed obvious after it broke.

Back to the code. Another round of changes. Another PR. Another review from someone without context, but now everyone tries to pay more attention because they know this was broken recently. Of course now you sound even more confident you know what you are doing, as you learned from your mistake.

The review added value to the code style. It added nothing to the logic. And logic is where bugs live.

Sound familiar?

Getting Practical: The Solution

There are no solutions, only compromises.

The approach of "Start pairing, deliver solo" is an old but good way to dampen some of these issues, or even to offer you a way out of a few.

Before writing implementation code, two engineers sit together and write the test scenarios. Not the implementation — just the tests.

  • What should this feature do?
  • What is the full scenario, and what data do we need around it to test this? If this were in the staging environment or even production, where would we need to go to create all the things we need to test it?
  • Do we have fixtures like proper chads, or do we need a tree of factories?
  • What are the edge cases?
  • What exceptions must we handle?
  • Where do we even put the test file?

As they work on these answers together, implementation design discussions emerge. No engineer can resist the temptation of imagining how the solution will work, and that is a good thing to collaborate on. Once these questions are answered and the full scenarios and test skeletons are written, one engineer implements. Solo.
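What the pair walks away with might look something like the sketch below: a Minitest file where every agreed scenario has a name and a `skip`, and nothing else. The feature (`SubscriptionRenewalTest`) and the four scenarios are hypothetical. Yours will come out of your own answers to the questions above.

```ruby
require "minitest/autorun"

# Skeleton written during the pairing session. Names and scenarios are hypothetical.
class SubscriptionRenewalTest < Minitest::Test
  # What should this feature do?
  def test_renews_an_active_subscription_and_charges_the_card
    skip "scenario agreed in pairing, implementation pending"
  end

  # Edge case: nothing to renew.
  def test_does_nothing_for_a_cancelled_subscription
    skip "scenario agreed in pairing, implementation pending"
  end

  # Exception we must handle: the payment provider declines.
  def test_marks_the_subscription_past_due_when_the_charge_is_declined
    skip "scenario agreed in pairing, implementation pending"
  end

  # Exception we must handle: the payment provider times out.
  def test_retries_later_when_the_provider_times_out
    skip "scenario agreed in pairing, implementation pending"
  end
end
```

The suite stays green while the skips are there, and the engineer who implements solo removes them one by one.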

That's it.

What just happened? We created opportunity for:

  • The engineer who potentially owns a Silo to share with another engineer a lot more than an afterthought document. The best way to transfer knowledge is not documentation, it's collaboration on real work.
  • The engineer with more experience to share how they think about system design, architecture, error handling, etc.
  • Both engineers to engage in seriously thinking about a comprehensive test scenario, sharing the load — and sometimes pain — of doing so.
  • Thinking about testing before implementing, which can increase the design quality, error handling and logic, hopefully reducing bugs just a little bit.
  • Two people doing the context lift for a piece of work. Do you think your pair buddy will be able to do a better Code Review than before? Even if there is no extra value in the Code Review, it will certainly be more efficient.
  • PR merging cycles get shorter.
  • Bonus: when the team grows and needs to onboard more people, now you have at least two people with context on something, maybe you can even take a two week vacation.

Why this matters

Long ago when I thought clever code was fancy, I used to think pairing was about catching typos in real-time. It's not. The real value is before the code exists — when you're deciding what "done" looks like.

Imagine you're building a Code Review feature. You sit down with a teammate. Before anyone writes a line of code, you ask each other:

  • "What happens when the PR is already approved?"
  • "What if a comment gets deleted, do we roll back the notification?"
  • "Should we hold the email notification then? For how long?"

These are crappy examples, I know, but they serve their purpose.

These questions are easy to ask when you're not neck-deep in implementation. When you're in flow, writing code, you don't stop to question the happy path as easily. You just build it, and if you are lucky you hit walls and trip wires that will raise questions. The pairing session is the moment where questioning is the whole point: you broaden the view before narrowing in.

Two people asking "what should happen when…" will cover more ground than one person deep in implementation mode. The test scenarios become the contract. The implementation becomes the easy part.
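Here is a small sketch of what "the contract" could look like for the first of those questions. The answer encoded in it (a second approval changes nothing and notifies nobody again) is an assumption made up for the example, as are the `PullRequest` class and its methods. Your pair might decide differently, and then the tests would say so.

```ruby
require "minitest/autorun"

# Hypothetical decision from the pairing session:
# approving an already approved PR is a no-op and sends no second notification.
class PullRequest
  attr_reader :notifications

  def initialize
    @approved = false
    @notifications = []
  end

  def approved?
    @approved
  end

  # Returns true when the approval was recorded, false when it was a no-op.
  def approve(reviewer)
    return false if @approved

    @approved = true
    @notifications << "approved by #{reviewer}"
    true
  end
end

class ApprovePullRequestTest < Minitest::Test
  def test_first_approval_notifies_the_author
    pr = PullRequest.new

    assert pr.approve("ana")
    assert pr.approved?
    assert_equal ["approved by ana"], pr.notifications
  end

  def test_approving_an_already_approved_pr_is_a_no_op
    pr = PullRequest.new
    pr.approve("ana")

    refute pr.approve("bo")
    assert_equal ["approved by ana"], pr.notifications
  end
end
```

The tests were the hard part to agree on. The `approve` method is a handful of lines once the behavior is settled.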

When you write the test first, you're forced to understand the problem before you solve it.

This is old-school TDD. Not the "write a unit test, mock everything, move fast" version. The original version — where the test describes behavior, and the code exists to satisfy it.

What this is not

This is not a policy. It's a tactic. We're trying it because we think it will help in some cases. If we find it doesn't, we'll adjust.

This is not pairing for the entire task. Just the beginning — the part where decisions matter most. But if you get blocked and need to pair again, you have this teammate with lots of context on what you are doing to help.

This is not about slowing down. It's about doing less rework. If you consider the team's actual velocity not to be time from "task card" to deployed in production, but time from "task card" to value in the hands of customers, then designing better solutions means shipping faster overall.

Picture two futures

In the first one, you pick up a ticket, spend half a day or more building context alone, write tests that mock the hard parts, ship a PR that gets a rubber-stamp review, and find the bug in staging three days later (you shipped on Thursday). You fix it. Another PR. Another review cycle. Another not-that-useful Code Review. The feature that was supposed to take a week takes two.

In the second one, you pick up a ticket and spend an hour with a teammate. You talk through the scenarios. You write tests that describe what the feature actually does. You implement solo, with clarity. Your pair partner reviews in 20 minutes after you post it for review because they already know what they're looking at. The feature ships. It works.

Every time I see a defect caught in staging, I ask myself: "Would two people writing the test scenarios together have caught this?"

Usually, the answer is yes.

Let's find out.