Most customer health scores are a lie. Someone in a meeting decided that login frequency is worth 30 points, support tickets are minus 10, and NPS is worth 20. Those weights came from gut feelings, not data. And they've probably never been updated since.
AI health scores are different. They learn the actual relationship between customer behavior and churn outcomes. They discover that for your specific product, "customers who connect 3+ integrations but stop using the reporting dashboard" is a stronger churn signal than any single metric alone.
Why traditional health scores fail
I've audited health scoring systems at about 30 SaaS companies. The same problems show up everywhere:
- Arbitrary weights. Why is login frequency 30% and NPS 20%? Because someone guessed. Maybe NPS matters 5x more for your product. Maybe login frequency is a terrible proxy for value delivery. Without data, you're guessing.
- Static thresholds. "Green if they log in 10+ times per month." But a marketing analytics tool and a payroll tool have completely different natural usage frequencies. Static thresholds punish products with lower-frequency use cases.
- Missing interactions between features. A customer with high login frequency AND high support ticket volume is at higher risk than either signal alone would suggest. They're logging in frequently because they're struggling. Rule-based scores miss these interactions.
- No feedback loop. Traditional scores don't learn from outcomes. If a "green" customer churns, the score doesn't automatically adjust. Someone has to manually re-tune the weights, which almost never happens.
How AI health scoring works
An AI health score takes the same inputs (usage data, support interactions, billing behavior) but learns the weights from your actual churn history. The process (a minimal code sketch follows the list):
- Collect 6-12 months of customer activity data paired with outcomes (renewed vs. churned)
- Train a model (typically gradient boosted trees, same as churn prediction) on this data
- The model outputs a probability score for each customer, updated daily or weekly
- Map probabilities to health categories (green/yellow/red) based on your intervention capacity
- Retrain quarterly as your product and customer base evolve
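Here's a minimal sketch of that pipeline in Python, assuming you've already assembled a one-row-per-customer-per-snapshot feature table with a churn label. Every file name, column name, and threshold below is a placeholder for your own schema, not a prescription:

```python
# Minimal scoring pipeline. Assumes a feature table with one row per customer
# per snapshot date, sorted chronologically, with a churned_next_90d label.
# All file, column, and threshold choices are placeholders.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_parquet("customer_snapshots.parquet")  # hypothetical feature table
features = ["logins_vs_baseline", "feature_breadth", "days_since_core_action",
            "active_seat_ratio", "ticket_volume_30d", "nps_trend",
            "seat_change_90d", "payment_failures_90d"]
X, y = df[features], df["churned_next_90d"]

# shuffle=False holds out the most recent snapshots for validation, so the
# model is evaluated the way it will be used: predicting the future.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  shuffle=False)

model = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05)
model.fit(X_train, y_train)

# Map probabilities to health categories by intervention capacity, not round
# numbers: "red" should roughly match the accounts your CSMs can work this week.
df["churn_prob"] = model.predict_proba(df[features])[:, 1]
red_cutoff = df["churn_prob"].quantile(0.90)     # riskiest 10% -> red
yellow_cutoff = df["churn_prob"].quantile(0.70)  # next 20% -> yellow
df["health"] = pd.cut(df["churn_prob"],
                      bins=[-0.01, yellow_cutoff, red_cutoff, 1.01],
                      labels=["green", "yellow", "red"])
```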
The key difference from a standalone churn prediction model: health scores are designed for your CSM team to use daily. They need to be interpretable ("this customer is red because usage dropped 60% and they downgraded seats") and actionable ("here's what to do about it").
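For the interpretability piece, SHAP values are one common way to generate per-customer reason codes. A sketch continuing from the model above; whether TreeExplainer handles your exact model class depends on your shap version, so treat this as illustrative:

```python
# Per-customer "reason codes" via SHAP, continuing from the model above.
# Feature names are the same placeholders as before.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(df[features])  # one contribution row per customer

def top_reasons(row_idx, n=3):
    # The largest positive contributions are the features pushing this
    # customer's score toward churn.
    contribs = pd.Series(shap_values[row_idx], index=features)
    return contribs.sort_values(ascending=False).head(n)

print(top_reasons(42))
# Illustrative output:
# days_since_core_action    0.61
# logins_vs_baseline        0.48
# seat_change_90d           0.22
```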
What inputs matter most?
After training models on a range of SaaS datasets, I've found these inputs consistently drive health scores (a few of the usage signals are sketched in code after the lists):
Usage signals (biggest predictors):
- Core action frequency relative to the customer's own baseline (are they using it more or less than is normal for them?)
- Feature breadth (how many different features are they using?)
- Last meaningful action date
- Active users vs. total seats on the account
Relationship signals:
- Support ticket volume and sentiment trends
- NPS or CSAT score changes (the trend matters more than the absolute number)
- Response time to your outreach (are they ignoring you?)
Business signals:
- Contract value changes (upgrades vs. downgrades)
- Payment failures or delays
- Approaching renewal date without expansion signals
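To make the usage signals concrete, here's how a few might be computed from a raw event log. The `events` and `accounts` tables and all their columns (`customer_id`, `user_id`, `timestamp`, `event_name`, `total_seats`) are hypothetical stand-ins for your own schema:

```python
# Computing usage signals from a raw event log. Tables and column names
# are hypothetical stand-ins for your own schema.
import pandas as pd

events = pd.read_parquet("product_events.parquet")
accounts = pd.read_parquet("accounts.parquet")
events["timestamp"] = pd.to_datetime(events["timestamp"])
now = events["timestamp"].max()

recent = events[events["timestamp"] > now - pd.Timedelta(days=30)]
baseline = events[(events["timestamp"] <= now - pd.Timedelta(days=30)) &
                  (events["timestamp"] > now - pd.Timedelta(days=120))]

# Core action frequency vs. the customer's own baseline: average daily events
# in the last 30 days divided by their prior 90-day average. Below 1.0 means
# usage is slipping relative to their own normal.
usage_vs_baseline = ((recent.groupby("customer_id").size() / 30)
                     / (baseline.groupby("customer_id").size() / 90))

# Feature breadth: distinct features touched in the last 30 days.
feature_breadth = recent.groupby("customer_id")["event_name"].nunique()

# Active users vs. total seats on the account.
active_seat_ratio = (recent.groupby("customer_id")["user_id"].nunique()
                     / accounts.set_index("customer_id")["total_seats"])
```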
One thing that surprises people: company firmographics (size, industry, funding) are usually weak predictors compared to behavioral data. A well-funded enterprise customer can churn just as fast as an SMB if they stop getting value from your product.
Setting up alerts that your team will actually use
The score itself is useless if nobody acts on it. Here's the alert framework that works:
Daily digest: A Slack or email summary of customers whose health dropped significantly in the last 24 hours. Not every red customer, just the ones that changed. This keeps alert fatigue low.
Threshold alerts: Immediate notification when a high-value customer (top 20% by MRR) crosses from yellow to red. These get a CSM assigned within 24 hours.
Trend alerts: Weekly report on customers with a consistent downward trend over 2-3 weeks, even if they haven't hit "red" yet. Early intervention is the whole point of AI scoring.
The biggest mistake: alerting on every score change. Your team will ignore everything within a week. Be selective. Alert on changes and high-value accounts, not on the full red list.
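A minimal version of the daily digest, assuming you keep a history table of daily scores and have a Slack incoming webhook set up. The table, column names, and 10-point threshold are all placeholders to tune for your own volume:

```python
# Daily digest: only customers whose health dropped meaningfully in the last
# 24 hours, posted to Slack. Table, columns, and thresholds are placeholders.
import pandas as pd
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # your webhook

scores = pd.read_parquet("daily_health_scores.parquet")  # one row per customer per day
today = scores["score_date"].max()
yesterday = today - pd.Timedelta(days=1)

pivot = scores.pivot_table(index="customer_id", columns="score_date",
                           values="churn_prob")
delta = pivot[today] - pivot[yesterday]

# Alert on changes, not the full red list: here, a jump of more than
# 10 percentage points in churn probability since yesterday.
dropped = delta[delta > 0.10].sort_values(ascending=False)

if not dropped.empty:
    lines = [f"- {cust}: churn probability +{chg:.0%} since yesterday"
             for cust, chg in dropped.items()]
    requests.post(SLACK_WEBHOOK_URL,
                  json={"text": "Health drops (last 24h):\n" + "\n".join(lines)})
```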
From rule-based to AI: a migration path
You don't have to throw out your existing health score overnight. Here's a phased approach:
Phase 1: Keep your current score. In parallel, start logging the raw data inputs (usage metrics, support data, billing events) into a single table with customer outcomes. You need 6 months of this data minimum.
Phase 2: Train an AI model on the historical data. Run it alongside your current score for 4-6 weeks and compare: which one more accurately flags the customers who actually churn? (A backtest sketch follows Phase 3.)
Phase 3: Replace the old score with the AI score. Use the interpretability features of the model to explain each score to CSMs ("this customer is at risk because of X, Y, Z").
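For the Phase 2 bake-off, a simple backtest works: score last quarter's customers with both systems and see which ranking concentrates the actual churners. This sketch assumes a table with the known outcome and both scores, with the rule-based score already converted so that higher means riskier; all names are placeholders:

```python
# Phase 2 backtest: compare how well each score surfaces actual churners.
# Assumes old_risk is the rule-based score, oriented so higher = riskier.
import pandas as pd
from sklearn.metrics import roc_auc_score

backtest = pd.read_parquet("phase2_backtest.parquet")  # hypothetical

def precision_at_top_decile(risk, churned):
    # Of the 10% of customers this score calls riskiest, how many churned?
    top = risk >= risk.quantile(0.90)
    return churned[top].mean()

for name, col in [("rule-based", "old_risk"), ("AI", "churn_prob")]:
    auc = roc_auc_score(backtest["churned"], backtest[col])
    p10 = precision_at_top_decile(backtest[col], backtest["churned"])
    print(f"{name}: AUC={auc:.2f}, precision@top-decile={p10:.2f}")
```

Precision in the top decile is the CSM-friendly metric here: it answers "of the accounts this score tells my team to work first, how many were real saves?"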
The health score monitoring experiment has the full implementation playbook. And if you want to understand how health scores fit into the bigger AI retention picture, the AI churn reduction guide covers the full stack.
If you're not sure whether your churn problem is best solved with health scoring or another approach, start with the churn risk quiz to identify your biggest lever.