By the end of this article, you'll have a repeatable framework you can run in your next architecture meeting — one that replaces gut-feel arguments with a scored, multi-dimensional analysis in about an hour.
The framework has five steps:
- frame the decision,
- build a weighted matrix,
- surface characteristic tensions,
- apply a reframe that ends debates,
- and produce a risk-assessed recommendation.
Most architecture discussions follow a predictable script. Someone proposes Kafka. Someone else says "that's overkill." A third person draws boxes on a whiteboard. Everyone argues from intuition dressed up as experience. The loudest voice or the most senior title wins.
Six months later, the team lives with consequences nobody fully thought through.
The problem isn't that engineers lack technical knowledge. It's that they evaluate technology on one dimension — usually performance or scalability — while ignoring five others that matter just as much.
Everything in software architecture is a trade-off. Not some things. Everything. There is no "best" database, no "right" messaging system, no "correct" architecture style. There are only trade-offs that are more or less appropriate for a specific team, in a specific context, at a specific time.
Here's a framework that makes those trade-offs visible, scorable, and debatable — before you commit to an irreversible decision.
Step 1: Frame Before You Compare
Before comparing any technology, answer five questions. Most teams skip this and jump straight to pros-and-cons lists. That's why most teams make decisions they regret.
Take a straightforward example: a 6-person startup choosing between a monolith and microservices.
What are the actual options? Not "should we use microservices?" — that's one option with nothing to compare against. You need 2–5 genuinely different approaches. If someone proposes a single technology, generate alternatives that aren't strawmen.
What's the context? A 6-person startup has different constraints than a 200-person platform team. Team size, existing skills, timeline pressure, budget, compliance requirements — these are the filters that eliminate options before you score anything.
Which architecture characteristics matter? Not all of them. Pick 5–8. A payment system cares about data integrity, security, and reliability. A content feed cares about scalability and developer experience. The characteristics you choose — and how you weight them — determine the outcome.
What's the time horizon? A 6-month decision favors simplicity and speed. A 5-year decision favors evolvability and maintainability.
Is this a one-way door or a two-way door? Jeff Bezos popularized this distinction. A two-way door (choosing a logging library) gets a 15-minute gut call. A one-way door (choosing your primary datastore) gets the full treatment. Don't waste rigor on reversible decisions, and don't wing it on irreversible ones.
For our 6-person startup, the framing reveals something important before any scoring happens: with a tiny team, tight budget, and 6-month runway to prove product-market fit, simplicity and cost aren't just "nice to have" — they're existential. That framing alone eliminates microservices for most honest evaluators.
Now let's apply the remaining four steps to a more complex scenario. A 20-engineer team runs a Django + Postgres monolith for a B2B logistics SaaS. Customers want real-time updates — live dashboards, instant webhooks, activity feeds. The team currently polls every 30 seconds. Competitors offer sub-second updates. Three options are on the table: add Kafka alongside Postgres, go full event sourcing with Kafka as the primary store, or stay on Postgres using Change Data Capture to push updates via WebSockets.
Here's a snapshot of the decision context before we score anything:
- Team size: 20 engineers
- Relevant expertise: Deep Postgres, zero Kafka
- Delivery window: 3 weeks max
- Current event volume: 2M events/day
- Growth expectation: 5x in 12 months
- Latency SLA: ≤ 2 seconds
This table alone eliminates options that require months of migration or skills the team doesn't have. Context is the filter.
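To make "context is the filter" concrete, here's a minimal sketch of the idea in Python. The per-option estimates (weeks to ship, required skills) are hypothetical placeholders rather than numbers from the logistics team; the point is only that hard-constraint checks come before any scoring.

```python
# Context-as-filter: check each option against hard constraints before any scoring.
# All per-option estimates below are hypothetical, for illustration only.
CONTEXT = {"delivery_window_weeks": 3, "team_has_kafka_experience": False}

OPTIONS = {
    "kafka_sidecar":  {"estimated_weeks_to_ship": 6,  "requires_kafka_skills": True},
    "event_sourcing": {"estimated_weeks_to_ship": 16, "requires_kafka_skills": True},
    "postgres_cdc":   {"estimated_weeks_to_ship": 3,  "requires_kafka_skills": False},
}

def context_red_flags(attrs: dict) -> list[str]:
    """Return the hard-constraint violations for a single option."""
    flags = []
    if attrs["estimated_weeks_to_ship"] > CONTEXT["delivery_window_weeks"]:
        flags.append("does not fit the 3-week delivery window")
    if attrs["requires_kafka_skills"] and not CONTEXT["team_has_kafka_experience"]:
        flags.append("requires skills the team does not have")
    return flags

for name, attrs in OPTIONS.items():
    print(f"{name}: {context_red_flags(attrs) or 'no context red flags'}")
```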
Step 2: Build the Trade-Off Matrix
This is the core mechanism. Score each option against each characteristic on a 1–5 scale, with anchored definitions to keep scores honest.
A score of 1 means "actively harmful for this characteristic" — a known anti-pattern. A 3 means "adequate, nothing to write home about." A 5 means "purpose-built for this dimension." The anchors prevent score inflation and make it possible for the team to disagree productively — you're debating whether something is a 3 or a 4, not arguing vibes.
Here's the matrix for the logistics team:

Three Principles That Make Scoring Useful
Without these, the matrix is just a decorated spreadsheet.
Scores are relative, not absolute. You're not measuring against a Platonic ideal of scalability. You're measuring which option is better for this team, with these skills, at this moment. A monolith scores higher on simplicity for a team of 5 than for a team of 50.
Divergence is the signal. Look at the rows where options score very differently. Simplicity ranges from 1 to 4 across these options. Data integrity ranges from 2 to 5. Those rows are where the real trade-off lives. If all options score 3–4 on a row, that characteristic isn't a differentiator — remove it and focus on what actually separates the options.
Weight by context, not in the abstract. Our logistics team weighted simplicity as "high" because they had a 3-week delivery window and zero Kafka experience. A team with 6 months of runway and Kafka expertise would weight it "medium" or "low." The weights encode your situation, not universal truth.
Multiply high-weight scores by 3, medium by 2, and sum the columns. The logistics team's result: Postgres CDC scored 78, Kafka Sidecar scored 60, Event Sourcing scored 56. A 30% margin isn't a close call. But the matrix alone doesn't tell the full story.
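If you'd rather run the arithmetic in a script than by hand, the mechanism is just a weighted sum. A minimal sketch: the weight rule follows the text, but the individual cell scores are illustrative placeholders (the real matrix also had more characteristics), so the totals below won't reproduce 78, 60, and 56.

```python
# Weighted trade-off matrix: score each option 1-5 per characteristic,
# multiply by the characteristic's weight, and sum per option (column).
# Weight rule from the text: high = 3, medium = 2; low = 1 is an assumption.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}

# Characteristic -> weight label. Only simplicity = "high" comes from the text;
# the other labels, and every cell score below, are illustrative placeholders.
CHARACTERISTICS = {
    "simplicity": "high",
    "data_integrity": "high",
    "scalability": "medium",
    "evolvability": "medium",
}

SCORES = {
    "postgres_cdc":   {"simplicity": 4, "data_integrity": 5, "scalability": 3, "evolvability": 2},
    "kafka_sidecar":  {"simplicity": 2, "data_integrity": 3, "scalability": 4, "evolvability": 4},
    "event_sourcing": {"simplicity": 1, "data_integrity": 2, "scalability": 4, "evolvability": 5},
}

def weighted_total(option: str) -> int:
    return sum(
        SCORES[option][characteristic] * WEIGHTS[weight]
        for characteristic, weight in CHARACTERISTICS.items()
    )

for option in SCORES:
    print(option, weighted_total(option))
```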
Step 3: Name the Tensions Nobody Wants to Say Out Loud
Certain architecture characteristics are in natural conflict. Scalability conflicts with simplicity — scaling requires distribution, which adds complexity. Data integrity conflicts with scalability — strong consistency limits horizontal scaling. Performance conflicts with maintainability — optimized code is harder to change.
When two conflicting characteristics both rank "high" in your context, the tension is the trade-off. Name it explicitly.
For the logistics team, the central tension was simplicity vs. evolvability. Postgres CDC preserves simplicity but produces row-level change events ("column X in table Y changed"), not rich domain events ("shipment 456 was re-routed to warehouse B"). Kafka delivers a proper event backbone for future consumers but at a complexity cost the team can't absorb in 3 weeks.
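The gap is easier to see in payload form. A rough sketch, assuming a Debezium-style change event (shape heavily simplified) and a hypothetical to_domain_event translation layer the team would have to write and maintain if they go the CDC route:

```python
# What CDC hands you: a row-level change event (Debezium-style shape, simplified).
row_change = {
    "source": {"table": "shipments"},
    "op": "u",  # update
    "before": {"id": 456, "warehouse_id": "A"},
    "after":  {"id": 456, "warehouse_id": "B"},
}

# What downstream consumers actually want: a domain event. This translation
# layer is hypothetical glue code the team would own, test, and keep in sync
# with the schema if they choose CDC.
def to_domain_event(change: dict) -> dict | None:
    if change["source"]["table"] == "shipments" and change["op"] == "u":
        before, after = change["before"], change["after"]
        if before["warehouse_id"] != after["warehouse_id"]:
            return {
                "type": "ShipmentRerouted",
                "shipment_id": after["id"],
                "from_warehouse": before["warehouse_id"],
                "to_warehouse": after["warehouse_id"],
            }
    return None  # not every row change maps to a meaningful domain event

print(to_domain_event(row_change))
```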
Naming this tension transformed the discussion from "which technology is better?" to "which tension are we more willing to live with right now?"
Then go deeper. For each option, ask: if we choose this, what else changes?
These second-order effects don't appear in the matrix, but they often determine whether a decision succeeds or fails.
Introducing Kafka to a team with zero experience means every future hire needs Kafka familiarity — shrinking the candidate pool. On-call rotations now include Kafka broker failures with no runbooks. Every feature PR must consider dual-write consistency, adding cognitive load to every developer on every commit.
Event sourcing is more dramatic still. It changes how 20 developers think about their domain. Debugging shifts from SQL queries to replaying event streams. You effectively need a 2–3 person "event platform" team — 10–15% of your engineering org dedicated to infrastructure plumbing instead of customer features.
These costs are invisible in a pros-and-cons list. They're visible in a structured analysis.
Step 4: "What Would Have to Be True?"
This is the single most powerful tactic in the framework, borrowed from Roger Martin's strategy work.
Instead of asking "which option is best?" — a question that triggers ego and preference — ask "under what conditions would each option be the clearly right choice?"
For the logistics team:
For Kafka Sidecar to be the right choice, the team would need 2–3 engineers with Kafka experience, 6+ weeks of runway, a near-term need for a domain event backbone, and budget for $1,500–3,000/month in managed Kafka infrastructure.
For Event Sourcing to be the right choice, this would need to be a greenfield system (not a retrofit), the team would need event sourcing experience, and complete historical replay would need to be a first-class business requirement.
For Postgres CDC to be the right choice, the team's Postgres expertise would need to be their primary technical asset, the delivery window would need to be tight, event volume would need to stay under ~50M events/day, and row-level change capture would need to be sufficient for immediate use cases.
Now instead of debating preferences, the team assesses reality. Which conditions actually hold? For the logistics team, every CDC condition was true. Zero of the Kafka conditions held. The debate was over — not because someone won the argument, but because the diagnostic made the answer obvious.
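If it helps, the diagnostic can be written down as a literal checklist rather than held in people's heads. A minimal sketch, paraphrasing the conditions above (Event Sourcing omitted for brevity):

```python
# "What would have to be true" as an explicit checklist, with whether each
# condition actually held for this team.
CONDITIONS = {
    "postgres_cdc": {
        "Postgres expertise is the team's primary technical asset": True,
        "Delivery window is tight (weeks, not months)": True,
        "Event volume stays under ~50M events/day": True,
        "Row-level change capture is sufficient for immediate use cases": True,
    },
    "kafka_sidecar": {
        "2-3 engineers have Kafka experience": False,
        "6+ weeks of runway": False,
        "Near-term need for a domain event backbone": False,
        "Budget for $1,500-3,000/month of managed Kafka": False,
    },
}

for option, checks in CONDITIONS.items():
    held = sum(checks.values())
    print(f"{option}: {held}/{len(checks)} conditions hold")
```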
This reframe works because it's psychologically safe. Nobody has to admit their preferred option is wrong. They just have to acknowledge which conditions hold in their current reality.
Step 5: Risk Profile and the Recommendation
Score each option across four risk dimensions: technical risk (what could go wrong), organizational risk (does the team have the skills), timeline risk (hidden complexity that surfaces late), and reversibility cost (if this doesn't work in 12 months, what does it cost to change course).
For the logistics team, Event Sourcing carried extreme organizational and timeline risk — a complete paradigm shift for 20 engineers, realistically 3–6 months of migration, fundamentally incompatible with a 3-week constraint. Kafka Sidecar carried moderate organizational risk (zero Kafka expertise) and medium reversibility cost (once Kafka becomes load-bearing, removing it is a multi-month project). Postgres CDC carried low organizational risk and high reversibility (Debezium is additive infrastructure; remove it without touching application code).
The recommendation: Postgres CDC with high confidence.
Not because it's the "best" technology in the abstract — Event Sourcing scores higher on evolvability and Kafka on scalability. But because every condition required for CDC to be the right choice actually holds for this team, right now.
Every recommendation should include two things most people skip. First, confidence caveats — name what would change the recommendation. If event volume hits 50M/day within 6 months, if customers demand historical replay, if the team gains Kafka expertise through an acquisition — any of these would shift the analysis.
Second, fitness functions — 2–3 metrics to monitor post-decision so you know whether the choice is aging well. For the logistics team: CDC latency at p99, Postgres replication slot lag, and event-to-domain transformation failure rate.
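The second of those, replication slot lag, can be read straight out of Postgres. A minimal sketch using psycopg2; the connection string and the alert threshold are placeholders you'd tune to your own setup:

```python
# Fitness function: lag on Postgres logical replication slots. If the CDC
# consumer falls behind, lag grows and the <= 2 second SLA is at risk.
import psycopg2

LAG_ALERT_BYTES = 100 * 1024 * 1024  # placeholder threshold: 100 MB of unconsumed WAL

def replication_slot_lag(dsn: str) -> list[tuple[str, int]]:
    """Return (slot_name, lag_bytes) for every logical replication slot."""
    query = """
        SELECT slot_name,
               pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
        FROM pg_replication_slots
        WHERE slot_type = 'logical';
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(query)
        return cur.fetchall()

for slot_name, lag_bytes in replication_slot_lag("dbname=logistics"):  # placeholder DSN
    status = "ALERT" if lag_bytes > LAG_ALERT_BYTES else "ok"
    print(f"{slot_name}: {lag_bytes} bytes behind ({status})")
```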
"But My Team Won't Sit Through a Scoring Exercise"
The most common objection. Three responses.
For two-way doors, they shouldn't. A 15-minute gut call is appropriate for reversible decisions. The full framework is for one-way doors where the wrong choice costs 6–18 months of engineering effort.
On subjectivity: yes, the scores are subjective. But transparent subjectivity that an entire team can challenge, debate, and improve is categorically better than invisible subjectivity in one architect's head.
On time: the framework takes about an hour. The wrong architecture decision takes 6–18 months to unwind. The math is not complicated.
The Question That Changes Everything
The framework's deepest value isn't the matrix or the weighted scores. It's forcing the conversations the team was avoiding — the tensions between competing priorities, the second-order effects nobody wanted to quantify, the conditions that must hold for the preferred option to actually work.
Architecture decisions don't fail because engineers pick the wrong technology. They fail in the gap between what was analyzed and what was assumed.
The next time someone proposes a technology in a meeting, don't ask "is it good?" Ask: what would have to be true for this to be the right choice for us, right now?
If the room can't answer that question, you're not ready to decide.