CVSS is a severity score. You’re using it as a priority list.

It came in through a bug bounty platform. The hunter had chained together a set of techniques, some of them held together with scripts they'd clearly written themselves for exactly this, to coax a MariaDB pipeline into handing over secrets it should never have exposed. An SQL injection, technically, but not the kind you find in any tutorial I know of. It was a novel chain. The kind where you read the report twice, thrice even, go quiet for a second, and then start checking whether anyone else could have found it first.

When it came time to triage, there was no score to reach for. The technique was new enough that nothing in the usual catalogues described it. So the hunter did what I assume most would do in that situation: they made a number up. They called it critical, justified it in a paragraph, and they were right. We paid out at the top of the range, because the finding really deserved it.

That was the most important vulnerability we dealt with that quarter, and it arrived with no CVSS score, got one invented for it on the spot, and that improvised number was more accurate than most of the "official" ones sitting in our backlog.

That's the whole problem in one story. But let me make the general case, because it's not just us, and it's not just bug bounties.

Severity and priority are different questions

CVSS, the Common Vulnerability Scoring System, answers one question well: how bad is this vulnerability, in the abstract, to anyone who has it? This is a really useful thing to standardise. Before CVSS, "critical" meant whatever the person saying it wanted it to mean. Having that shared vocabulary is genuinely good, and I won't pretend otherwise.

But that is not the question a working security team actually needs to answer. The question we live with is different: what should my team fix first, in our codebase, this week?

They look similar but they are not the same question.

Most security teams' Monday mornings tend to follow the same pattern. You know the morning I mean. The scan finished overnight. Three hundred and something new findings turn up. Forty of them are marked critical. You already know, in your gut, even before you've opened a single one, that maybe five of those criticals really matter, that two of the mediums are scarier than all forty criticals combined, and that a good chunk of the list is noise you've triaged and dismissed three times already (you even have a policy file in your repo to help). But the dashboard sorts by CVSS. The SLA policy is written against CVSS. Your manager's spreadsheet is coloured by CVSS. So you spend the week on a 9.8 sitting in a code path that hasn't shipped since 2022, while the thing that actually keeps you up at night waits its turn.

The score isn't lying to you. It is simply answering a question you didn't ask.

Where it breaks, specifically

CVSS is context-free by design. If you have a 9.8 in dead, unreachable code and a 6.5 in your authentication flow, CVSS will rank the 9.8 higher every time, because CVSS cannot know which of your code is reachable, which service faces the internet, or which database holds the data that would end you if it leaked. It simply scores the vulnerability. It knows nothing about your exposure to it. And exposure is the game. The same CVE is a non-event in one architecture and a company-ending incident in another, and the base score is identical in both.

"Critical" has been inflated into meaninglessness. Scanners are incentivised to flag high. Nobody ever got sued for over-warning. So when thirty percent of your backlog is "critical," the word stops carrying information. You're left with two bad options: honour every critical and burn out, or quietly start ignoring the severity field (which is what some teams actually do), at which point severity is doing no work at all. It is worth noting here that the exploit-prediction data (EPSS, if you want to go read it) consistently shows only a small fraction of CVEs are ever exploited in the wild. CVSS treats a theoretical bug and one being actively exploited identically, unless someone is religiously maintaining temporal scores by hand, and trust me, nobody is religiously maintaining temporal scores by hand.

The score says nothing about the fix. Two criticals land in your queue. One is a dependency bump: ten minutes' work, one line, done before coffee. The other needs you to re-architect session handling across three services over a quarter. CVSS ranks them identically. But any sane human prioritises by weighing urgency against effort (i.e. what does delay cost me, versus what does the fix cost me), and CVSS structurally cannot help, because remediation effort was never one of its inputs. A priority list that ignores how hard things are to fix isn't a priority list. It's a wish list.
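To make the urgency-against-effort weighing concrete, here is a minimal sketch using a plain ratio, in the spirit of weighted-shortest-job-first scheduling. It is an illustration, not a method from any standard: the `Fix` type, the `payoff` function, and every number below are hypothetical estimates for the two criticals just described.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    name: str
    cvss: float                 # base score as reported by the scanner
    delay_cost_per_week: float  # assumed: your own estimate, arbitrary units
    effort_days: float          # assumed: your own estimate of fix effort


def payoff(fix: Fix) -> float:
    """Delay cost avoided per day of effort; higher means do it sooner."""
    return fix.delay_cost_per_week / fix.effort_days


fixes = [
    # Roughly a quarter of working days for the re-architecture.
    Fix("re-architect session handling", cvss=9.8,
        delay_cost_per_week=8.0, effort_days=60.0),
    # Roughly ten minutes of an eight-hour day for the dependency bump.
    Fix("bump vulnerable dependency", cvss=9.8,
        delay_cost_per_week=5.0, effort_days=0.02),
]

# CVSS cannot separate the two: the scores tie, so input order survives.
by_cvss = sorted(fixes, key=lambda f: f.cvss, reverse=True)

# The ratio can: the cheap fix wins even with a lower assumed delay cost.
by_payoff = sorted(fixes, key=payoff, reverse=True)

print([f.name for f in by_cvss])
print([f.name for f in by_payoff])
```

The ratio is deliberately crude. The point is only that effort has to be an input somewhere, and the base score has no slot for it.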

And the genuinely dangerous things often have no score at all. Back to the MariaDB chain. CVSS scores catalogued, categorised vulnerabilities with a known shape. The attacks that actually hurt are frequently novel chains, three minor things combined in an order nobody documented, each one harmless on its own. There's no entry for that. No vector string to paste in. The scoring system is silent at exactly the moment it should be loudest, because the model only knows what's already been named. The hunter who found ours had to invent the number, and the fact that an improvised score beat the catalogue should tell you something about where the catalogue's authority actually comes from.

There's a quieter cost on top of all this, too. Every time a ticket says "critical, fix within 7 days per policy" and the engineer who opens it can plainly see it's in dead code, you spend a little credibility. Do that enough and engineering learns to ignore your tickets wholesale, including the one time it really, truly matters. CVSS-as-policy is one of the most reliable ways an AppSec function makes itself easy to tune out.

To be fair to CVSS

It is good at the thing it was built for. As a common severity vocabulary, it works. CVSS even gestures at the context problem with environmental metrics, which have been part of the standard since well before 4.0 and let you adjust the base score for your own situation. That is the right idea.

In practice, almost nobody fills them in, because doing it well requires exactly the per-asset, per-codebase judgement that the rest of your week doesn't leave time for. So the capability exists and goes unused, and we're back to sorting by the base score because the base score is the number that's actually sitting there.

None of this is a moral failing of CVSS. It's a category error in how we use it. We took a severity dictionary and pressed it into service as a prioritisation engine, and then acted surprised when it prioritised badly.

The questions that actually order a backlog

If severity alone can't do it, what does? Honestly, it's just the questions you already ask in your head when you triage well:

Is this actually exploitable in practice, or only in theory?
Is the affected code even reachable in a way an attacker could trigger?
What does the affected asset mean to the business: is this the marketing blog or the payments service?
What does delay cost if we leave it a week, a month, a quarter?
What does the fix cost: ten minutes or a re-architecture?

Order a backlog by those five and it looks nothing like the CVSS-sorted version. The reachable, business-critical, cheap-to-fix 6.5 rises to the top where it belonged the whole time. And crucially, the novel chain that has no catalogue score still gets a place in the line, because you're scoring exposure and consequence, not waiting for someone to assign a vector string.
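As a rough illustration of ordering by those five questions, the sketch below scores three findings that mirror the examples in this post. Everything in it is an assumption made for the example: the `Finding` fields, the 0-to-1 scales, the choice to zero out unreachable code, and the formula itself. It is not how Sinterly or any other tool actually scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    title: str
    cvss: Optional[float]  # None for a novel chain with no catalogue score
    exploitable: float     # 0..1: exploitable in practice, not just in theory
    reachable: bool        # can an attacker trigger the affected code?
    asset_value: float     # 0..1: marketing blog near 0, payments near 1
    delay_cost: float      # 0..1: how much worse it gets if left alone
    fix_days: float        # estimated effort to remediate


def priority(f: Finding) -> float:
    """Exposure and consequence, discounted by what the fix costs."""
    if not f.reachable:
        return 0.0  # simplification: unreachable code sinks to the bottom
    exposure = f.exploitable * f.asset_value * f.delay_cost
    return exposure / (1.0 + f.fix_days)


findings = [
    Finding("9.8 in dead code", 9.8, exploitable=0.9, reachable=False,
            asset_value=0.2, delay_cost=0.1, fix_days=1.0),
    Finding("6.5 in auth flow", 6.5, exploitable=0.8, reachable=True,
            asset_value=1.0, delay_cost=0.8, fix_days=0.5),
    Finding("novel SQL chain, unscored", None, exploitable=0.9, reachable=True,
            asset_value=1.0, delay_cost=0.9, fix_days=5.0),
]

# Sorting by CVSS pushes the unscored finding to the very end.
by_cvss = sorted(findings,
                 key=lambda f: f.cvss if f.cvss is not None else -1.0,
                 reverse=True)

# Sorting by the five questions does not need a CVSS score at all.
by_priority = sorted(findings, key=priority, reverse=True)

for f in by_priority:
    print(f"{priority(f):.3f}  {f.title}")
```

With these made-up inputs, the 6.5 in the auth flow comes first, the unscored chain second, and the 9.8 in dead code last. A real tool would need better handling of unreachable code than a hard zero, and the inputs themselves are the hard part; the arithmetic is not.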

That's it. That's the framework. It's not clever or novel; it's what good practitioners already do by instinct, written down so a tool can do it consistently.

I'm building something around exactly these five questions, a project called Sinterly that takes raw scanner output and turns it into a prioritised, explainable list, scored on exposure rather than abstract severity. It's early. If this is a problem you live with, follow along; I'll be writing about how the scoring actually works next, and I'd rather build it in the open with people who've felt this than in a vacuum.

And if you've got your own "the worst bug had no score" story, I'd genuinely like to hear it.
