
Signal Architecture

Status: working · Last updated 2026-03-14
Systems · Architecture · Data

Signal architecture is the design layer that determines which data signals feed into your account risk detection, where they come from, how they combine, and whether they actually predict anything. Most B2B SaaS teams call the output a "health score." The question I've been spending time with is whether the problem with most health scores is the scoring model or the data infrastructure underneath it.

The starting point is uncomfortable. A 2020 survey of roughly 200 US-based SaaS companies by CSM Practice found no clear correlation between having a customer health score and renewal forecast accuracy. Health score usage didn't track with lower churn, either. Companies using CS software (regardless of whether they scored accounts) had the lowest churn, which suggests the tooling matters more than the specific metric it produces. That finding has been sitting with me since I first encountered it, because it implies a lot of teams are maintaining a number that doesn't change their ability to forecast.

It's worth looking at NPS specifically, because most scoring models weight it as a primary input, and the academic evidence on its predictive validity is surprisingly thin. Fred Reichheld, the Bain consultant who created NPS, originally claimed it was the "single most reliable indicator" of company growth. Keiningham et al. (2007) tested that claim using longitudinal data from 21 firms and over 15,500 interviews. They couldn't replicate it. The paper won the Marketing Science Institute's H. Paul Root Award, which is not the kind of thing you ignore. More striking is Zaki et al.'s 2016 study out of Cambridge, which examined 3,000 B2B customers over three years and found that most of the highest-value churners had been classified as "promoters." When the same researchers built a model that combined behavioral data (what customers actually did: purchase patterns, support interactions, product usage) with the attitudinal data (what they said on surveys), that model predicted churn at 98% accuracy in their validation set. The NPS scores, on their own, had pointed in the wrong direction for the accounts that mattered most.
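The Zaki finding is easy to illustrate with a toy rule. This is not their model (which was a multi-source ML classifier); it's a minimal sketch showing how an attitudinal-only rule can miss a "promoter" whose behavior is deteriorating. The field names and thresholds are invented for illustration.

```python
def nps_only_risk(account):
    """Attitudinal-only rule: flag detractors (NPS 0-6) as at risk."""
    return account["nps"] <= 6

def composite_risk(account):
    """Toy composite rule (illustrative thresholds, not Zaki et al.'s model):
    survey sentiment combined with what the account actually does."""
    behavioral_red_flags = (
        account["usage_trend"] < 0       # shrinking product usage
        or account["open_tickets"] >= 5  # escalating support load
    )
    return account["nps"] <= 6 or behavioral_red_flags

# A "promoter" (NPS 9) whose behavior says otherwise
acct = {"nps": 9, "usage_trend": -0.4, "open_tickets": 6}
print(nps_only_risk(acct), composite_risk(acct))  # False True
```

The NPS-only rule clears this account; the composite rule flags it. That's the structural shape of the high-value-churner problem, even before any machine learning enters the picture.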

This connects to a pattern I keep seeing in the systems I audit. The signals that teams choose to track tend to be the ones that are easiest to collect, not the ones with the strongest predictive relationship to retention. Login frequency shows up in almost every scoring model I've reviewed, but the academic literature on SaaS churn prediction (Calli & Kasim, 2023; Dahlen & Mauritzon, 2023) consistently finds that feature depth and breadth of adoption outperform login frequency as predictors. An Aalto University study (2024) found that a composite "relationship strength" metric, combining engagement patterns across multiple dimensions, was the single most important predictor in their model. The number of times someone logs in matters less than what they do after they log in, and whether they're pulling other teams into the product.
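To make "depth and breadth of adoption" concrete: both can be computed from a flat usage-event log, which most product analytics tools can export. The event shape below (`user_id`, `feature` keys) and the multi-user-adoption definition of depth are my assumptions, not a standard from the cited papers.

```python
from collections import defaultdict

def adoption_signals(events):
    """Compute adoption breadth and depth from a list of usage events.

    `events` is a hypothetical flat event log: dicts with 'user_id'
    and 'feature' keys. Field names are illustrative.
    """
    features_used = set()
    users_per_feature = defaultdict(set)
    for e in events:
        features_used.add(e["feature"])
        users_per_feature[e["feature"]].add(e["user_id"])

    breadth = len(features_used)  # distinct features touched at all
    # Depth proxy: features adopted by more than one user (multi-team pull)
    depth = sum(len(u) > 1 for u in users_per_feature.values())
    return {"breadth": breadth, "multi_user_features": depth}

events = [
    {"user_id": "a", "feature": "reports"},
    {"user_id": "b", "feature": "reports"},
    {"user_id": "a", "feature": "export"},
]
print(adoption_signals(events))  # {'breadth': 2, 'multi_user_features': 1}
```

Note that login frequency never appears here; the point is that the event log teams already have usually contains richer signals than the one they default to counting.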

Across every study I've pulled into this review, the multi-signal finding keeps holding up. Keiningham et al. explicitly stated that "a combination of VOC metrics universally outperforms the use of only the NPS's recommendation intentions." Zaki's multi-source model hit 98% accuracy where NPS failed. Amin et al. (2024) in Scientific Reports achieved 95% accuracy with an ensemble-fusion approach combining multiple ML algorithms. The direction here is always the same. Composite behavioral signals outperform single metrics, and the accuracy gaps are large enough that you can't attribute them to methodology differences.

This is where it becomes an infrastructure problem. If composite behavioral signals are what work, the scoring formula is probably fine. The harder problem is whether your systems can actually collect and combine those signals before the formula ever runs. In most stacks I've worked in, product usage lives in the product analytics tool, support data lives in the ticketing system, engagement signals live wherever marketing put them, and the CRM assembling the score can pull from maybe one of those. So the score is a function of data availability, and data availability has very little to do with predictive value. And that might be why the CSM Practice survey found no correlation: teams build scores from whatever data already exists in the CRM, and the question of whether that data is the RIGHT data never gets asked.
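The assembly step above can be sketched as a join keyed on account, with one deliberate design choice: a source that can't see an account should surface as an explicit gap, not silently default to zero. The source names and signal fields are hypothetical.

```python
def assemble_signals(accounts, *sources):
    """Join per-account signals from separate systems before any scoring.

    Each source is a (name, dict) pair mapping account_id -> signal dict,
    e.g. product analytics, the ticketing system, the CRM.
    """
    rows = {}
    for acct in accounts:
        row, gaps = {}, []
        for name, data in sources:
            if acct in data:
                row.update({f"{name}_{k}": v for k, v in data[acct].items()})
            else:
                gaps.append(name)
        row["missing_sources"] = gaps  # data availability is itself a signal
        rows[acct] = row
    return rows

usage = {"acme": {"weekly_active": 14, "features": 6}}
tickets = {"acme": {"open": 2}, "globex": {"open": 5}}
out = assemble_signals(["acme", "globex"], ("usage", usage), ("tickets", tickets))
print(out["globex"])  # {'tickets_open': 5, 'missing_sources': ['usage']}
```

Recording `missing_sources` is the honest version of the problem this section describes: a score computed for globex here is a support-data score wearing a health-score label, and the output should say so.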

About a third of companies still update scores by hand (CSM Practice, 2020), and this compounds the infrastructure gap. Practitioner literature consistently flags recency bias, inconsistency between reps, and data staleness as failure modes. But automated scores fail differently: they tend toward over-complexity and opacity, producing numbers that nobody trusts enough to act on. What I find interesting is that both problems point to the same structural issue. The scoring model and the data layer feeding it were never designed as a single system.

There's also a lifecycle dimension I'm still working through. What constitutes a meaningful signal changes as an account matures. At 30 days post-go-live, what matters is integration completion and activation breadth. By 18 months, the meaningful signals have shifted to depth of adoption, multi-team penetration, and whether the customer is actually hitting the outcomes they bought the product for. Most scoring models I've seen apply the same weights regardless of where the account sits in its lifecycle, and I suspect that's another reason the aggregate data on score effectiveness is so underwhelming.
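One way to express the lifecycle idea is a weight schedule keyed on account age, so the same signals get re-weighted as the account matures. The stage boundaries and weight values below are illustrative placeholders, not calibrated numbers.

```python
def lifecycle_weights(days_since_go_live):
    """Return signal weights appropriate to the account's lifecycle stage.

    Splits and weights are illustrative assumptions, not calibrated values.
    """
    if days_since_go_live <= 90:
        # Onboarding: integration completion and activation breadth dominate
        return {"activation_breadth": 0.5, "integration_complete": 0.3,
                "adoption_depth": 0.2}
    if days_since_go_live <= 365:
        # Ramp: adoption depth takes over
        return {"activation_breadth": 0.2, "integration_complete": 0.1,
                "adoption_depth": 0.7}
    # Mature: depth, multi-team penetration, and outcome attainment
    return {"adoption_depth": 0.5, "multi_team_penetration": 0.3,
            "outcome_attainment": 0.2}

def score(signals, days_since_go_live):
    """Weighted sum over normalized (0-1) signals; missing signals score 0."""
    w = lifecycle_weights(days_since_go_live)
    return sum(w[k] * signals.get(k, 0.0) for k in w)

print(round(score({"adoption_depth": 0.8, "multi_team_penetration": 0.5}, 540), 2))  # 0.55
```

A static-weight model is just this function with the conditionals deleted, which is exactly what makes it blind to the fact that strong activation breadth at month 18 tells you almost nothing.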

[Figure: Churn Prediction Accuracy: Multi-Signal vs. Single-Algorithm Models. Sources: Zaki et al. 2016 (behavioral composite model); Amin et al. 2024 (ML algorithm comparison). Models that combine multiple behavioral signal types consistently outperform single-algorithm approaches.]

I started this review thinking the question was how to build better scores. I'm finishing it thinking the question might be how to build better signal infrastructure. Everything I've reviewed suggests that the inputs matter more than the formula. And the inputs are an architecture challenge, which is a different discipline than the analytics work most teams focus on.

When I've done this work with clients, the sequence that seems to hold up goes something like this: first, inventory what data you actually have flowing into your CRM or CS platform today, and be honest about what's missing. Second, work backward from what the research says predicts retention (feature depth, behavioral composites, relationship strength) and identify which of those signals your systems can't currently see. Third, map the integration gaps: where does the data live, what would it take to pipe it into the system that calculates the score, and who owns each connection. The scoring model comes last, once the infrastructure is actually feeding it the right inputs. Most teams I've worked with had the formula-building instinct first, and I understand why, but the research keeps pointing me toward the plumbing.
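The first two steps of that sequence reduce to a set difference: the signals the research says are predictive, minus the union of what your connected systems can actually emit. The system names and signal labels below are hypothetical examples.

```python
# Hypothetical inventory: which signals each connected system provides today.
AVAILABLE = {
    "crm": {"contract_value", "renewal_date"},
    "product_analytics": {"login_frequency"},
}

# Signals the research points toward (feature depth, behavioral
# composites, relationship strength) -- labels are illustrative.
NEEDED = {"feature_depth", "adoption_breadth", "support_sentiment",
          "multi_team_usage"}

def signal_gaps(available, needed):
    """Return the predictive signals no connected system can see."""
    covered = set().union(*available.values())
    return sorted(needed - covered)

print(signal_gaps(AVAILABLE, NEEDED))
# In this example every needed signal is a gap -- the stack only
# sees contract data and login counts.
```

The output of this exercise is the integration backlog from step three: each gap maps to a source system, a pipe that needs building, and an owner.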

Dillon Young is the founder of Customer Value Labs, where he builds and maintains revenue systems infrastructure for B2B SaaS teams. If your signal architecture could use a structural review, that's a good place to start.