
Outcome Hit Rate

Working · Last updated 2026-01-15
CS Operations Metrics

Outcome Hit Rate measures whether CS interventions actually change outcomes, not just whether they happen. For every intervention your team runs (QBR, outreach, health score response, save attempt), it asks three questions: Was the target right? Was the timing right? Did the outcome change?

A hit requires all three. Which, in my experience, is a higher bar than most teams realize.

Most CS dashboards track activity: touches completed, QBRs held, cadences run, playbooks fired. Those are input metrics, and while they tell you how busy the team is, they say almost nothing about whether any of it mattered. McKinsey's analysis of 100+ B2B SaaS companies found that top-quartile companies (median 24x EV/revenue multiple) achieve 113% NRR while bottom-quartile peers (5x multiple) sit at 98%. That's fifteen points of NRR separating the two groups, and the companies on the high end aren't necessarily doing more; they're allocating their interventions more precisely.

The three questions work as a sequential filter, and the distinctions between them matter more than they seem at first.

"Was the target right?" asks whether this account had a problem you could actually influence. Some accounts are going to churn regardless. The deal was wrong, or the use case was a stretch, or the real constraint sits upstream of your product. Intervening there isn't proactive CS. It's a non-productive intervention, capacity spent on an account where the outcome was already determined (more on that in the Velocity Trap page).

"Was the timing right?" asks whether the customer could still change course. AI-enhanced health scores can now flag churn risk 60-90 days in advance at 85%+ accuracy. But accuracy of detection is not the same as actionability of intervention. If the decision to leave was made three weeks ago in a meeting you weren't invited to, your QBR deck isn't saving anything.

"Did the outcome change?" is the hardest of the three because it's asking about the counterfactual: not "did we do the thing" but "did the trajectory of this account actually shift because we did it?" In my experience, most teams never get around to asking this question, partly because the answer is uncomfortable and partly because the methodology for answering it is genuinely difficult.

I use this primarily as a retrospective exercise. Once a quarter, pull a sample of 50-80 interventions and score each one against all three questions. The scoring is imperfect (you're making judgment calls on each criterion), but even rough scoring reveals patterns that activity metrics completely miss. I'll admit I resisted doing this the first time because I suspected the answer would be uncomfortable.
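As a sketch of what that quarterly scoring pass can look like in practice (the field names and sample records here are hypothetical, not a prescribed schema):

```python
# Illustrative sketch of the quarterly hit-rate scoring pass.
# Field names and sample data are hypothetical, not a prescribed schema.

def is_hit(intervention):
    """A hit requires all three criteria: right target, right timing, outcome changed."""
    return (intervention["right_target"]
            and intervention["right_timing"]
            and intervention["outcome_changed"])

def hit_rate(interventions):
    hits = sum(1 for i in interventions if is_hit(i))
    return hits / len(interventions)

# A toy sample standing in for the 50-80 scored interventions.
sample = [
    {"right_target": True,  "right_timing": True,  "outcome_changed": True},
    {"right_target": True,  "right_timing": False, "outcome_changed": False},
    {"right_target": False, "right_timing": True,  "outcome_changed": False},
    {"right_target": True,  "right_timing": True,  "outcome_changed": False},
]
print(f"{hit_rate(sample):.0%}")  # 25% — only 1 of 4 passes all three
```

The sequential-filter structure matters: an intervention that fails "right target" never gets credit for good timing or a changed outcome.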

Across the client portfolios where I've run this exercise, the first-time hit rate consistently lands around 15%.

That number shocks people, though it probably shouldn't. Bain's CS report found that CSMs spend over half their time on low-value, repetitive tasks. If more than 50% of activity is structurally unproductive, a 15% hit rate on outcome-changing interventions starts to look less like a people problem and more like a targeting problem.

Where CS Interventions Actually Land
Across multiple client portfolios, roughly 85% of interventions fail at least one of the three criteria: right target, right timing, outcome changed.

The math: 200 interventions per quarter at 15% accuracy generates 30 meaningful outcomes. The other 170 are non-productive. Now restructure targeting (using something like the Milestone-to-Intervention Model) and cut volume to 120 interventions while raising accuracy to 40%. That's 48 meaningful outcomes. A 60% improvement in impact from a 40% reduction in volume.
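The same arithmetic, spelled out (all numbers are the illustrative ones from the paragraph above):

```python
# Worked version of the volume-vs-accuracy math above.
baseline_hits = 200 * 0.15       # 200 interventions at 15% accuracy -> 30 outcomes
restructured_hits = 120 * 0.40   # 120 interventions at 40% accuracy -> 48 outcomes

impact_gain = restructured_hits / baseline_hits - 1  # +60% in meaningful outcomes
volume_cut = 1 - 120 / 200                           # 40% fewer interventions run
```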

At fully-loaded CSM costs of $55-75/hour, those 80 eliminated non-productive interventions (averaging 2 hours each with prep, execution, follow-up) represent $8,800-$12,000 per quarter in recaptured capacity per CSM. Across a team of 8, that's $70K-$96K quarterly redirected from waste to impact.
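A quick way to sanity-check the capacity math (all figures are the illustrative ones above):

```python
# Recaptured-capacity math from the paragraph above.
eliminated = 200 - 120                       # 80 non-productive interventions removed
hours_each = 2                               # prep + execution + follow-up
hours_recaptured = eliminated * hours_each   # 160 hours per quarter, per CSM

low, high = 55, 75                           # fully-loaded CSM cost per hour ($)
per_csm = (hours_recaptured * low, hours_recaptured * high)  # ($8,800, $12,000)
team_of_8 = (per_csm[0] * 8, per_csm[1] * 8)                 # ($70,400, $96,000)
```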

There's a related problem that I think gets underexplored. When I've audited client expansion pipelines, roughly 40% of deals classified as "expansion" are actually repair. A second team adopting the product because the first implementation was too narrow. An "upsell" that's really a feature the customer should have had from day one. Benchmarkit's 2025 data shows expansion becomes the dominant growth engine beyond $20M ARR, with companies in the $15-30M range getting 40% of growth from existing customers (up from 30% in 2021). But if a significant chunk of that expansion revenue is repair in disguise, the real NRR picture is weaker than the dashboard shows.

At one client, I tagged every expansion deal. Forty-one percent were repair. The forecast had them at 115% of target. Adjusted for repair, they were at 68%. That was a difficult conversation to have with the team, but it reframed their entire planning cycle in a way that I think was ultimately more useful than the original forecast.
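The adjustment itself is simple enough to verify directly (numbers are from this one client example):

```python
# Repair adjustment from the example above: 41% of "expansion" reclassified as repair.
reported_attainment = 1.15   # forecast at 115% of target
repair_share = 0.41          # share of expansion deals that were actually repair

adjusted = reported_attainment * (1 - repair_share)
print(f"{adjusted:.0%}")  # roughly 68% of target
```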

Expansion Forecast: Reported vs. Adjusted for Repair
When 41% of "expansion" deals are reclassified as repair, the forecast picture changes dramatically.

Outcome Hit Rate applied to expansion motions catches this, because repair doesn't satisfy the "did the outcome change?" criterion the way genuine expansion does.

Question three, the counterfactual, is where this whole framework gets shaky, and I want to be honest about that. Would this account have renewed without the QBR? Would that expansion have happened organically? You won't always know. Sometimes the best you can do is tag the intervention and check back in 90 days. Staircase AI's research found that customers with regular QBRs are twice as likely to renew, which is a useful baseline. But "regular QBRs" and "well-targeted QBRs" are not the same population. The correlation between activity and retention is real. The question is which subset of that activity is driving the correlation and which is noise.

Some teams I've worked with have started using a simple attribution framework: tag interventions at execution time, then at renewal or churn, have the CSM (and ideally the customer) assess which interventions were consequential. It's subjective and it's directional, but directional is a meaningful upgrade from "we have no idea whether any of this mattered," which, if I'm honest, is where most teams are operating today.