What the practice was dealing with.
In a value-based care contract, the practice is financially responsible for avoidable cardiac and renal events in its hypertensive population. Every stroke, every MI, every CKD progression hits the P&L.
The challenge: hypertension is common, and most patients look "fine" on paper, until they do not. The patients who cause 80% of the avoidable cost are usually quiet clinically until something acute happens, because they are the ones skipping follow-up visits or self-titrating their medication.
Traditional chart review caught these patients, but usually only after a BP reading or a lab result had already crossed a threshold. By then the practice had already lost the intervention window. The care team wanted to move detection upstream, before the labs looked bad.
How we scoped and built it.
We aggregated four years of de-identified chart data across roughly 14,000 adult hypertensive patients: demographics, BP readings, labs (eGFR, A1c, lipids), medications, visit patterns, social determinants, and documented comorbidities. All of it already existed in the EHR, but nobody had ever pulled it into one frame.
The outcome variable was any hypertension-related event within a 12-month window: stroke, MI, CKD stage progression, hypertensive urgency, or a new CHF diagnosis. We held out the most recent 12 months for validation, so that we were not grading our own homework.
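As a rough illustration of the labeling and hold-out logic described here, the sketch below shows one way to read it. The event codes, the `index_date` field, and both function names are hypothetical, and the 365-day window is only an approximation of "12 months"; none of this is the practice's actual pipeline.

```python
from datetime import date, timedelta

# Event types come from the text; the string codes themselves are hypothetical.
EVENT_TYPES = {"stroke", "mi", "ckd_progression", "hypertensive_urgency", "new_chf"}
WINDOW = timedelta(days=365)  # assumed stand-in for the 12-month outcome window


def label_patient(index_date, events):
    """Return 1 if any qualifying event falls within the window after index_date."""
    return int(any(
        kind in EVENT_TYPES and index_date < when <= index_date + WINDOW
        for kind, when in events
    ))


def temporal_split(rows, cutoff):
    """Split by time, not at random: older index dates train, recent ones validate."""
    train = [r for r in rows if r["index_date"] < cutoff]
    held_out = [r for r in rows if r["index_date"] >= cutoff]
    return train, held_out
```

Splitting on time rather than at random keeps the validation set strictly later than the training set, which is closer to how the model is used in production. A stricter version would also drop training rows whose outcome window extends past the cutoff.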
We tested logistic regression, gradient-boosted trees (XGBoost), and a small tabular transformer. XGBoost won on AUROC and calibration, and gave us clean SHAP attribution, which turned out to be the deciding factor. The clinicians had to trust it, and "the model said so" was not going to be enough.
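To make the comparison criteria concrete, here is a minimal sketch of scoring candidate models on the same held-out labels. AUROC is computed from its pairwise definition. The Brier score is used as one common calibration-sensitive summary, which is an assumption, since the write-up does not say how calibration was measured. The function names and the toy inputs are hypothetical.

```python
def auroc(y_true, y_score):
    """Chance that a random positive outranks a random negative (ties count half).

    Pairwise O(n_pos * n_neg) version: fine for a sketch, slow for large panels.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared error between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def compare(y_true, candidates):
    """Score each candidate's held-out predictions; candidates maps name -> probabilities."""
    return {name: {"auroc": auroc(y_true, probs), "brier": brier(y_true, probs)}
            for name, probs in candidates.items()}


# Toy, made-up inputs purely to show the call shape.
y = [0, 0, 1, 1]
scores = compare(y, {"model_a": [0.1, 0.4, 0.35, 0.8], "model_b": [0.2, 0.1, 0.9, 0.8]})
```

Looking at both numbers matters because a model can rank patients well (high AUROC) while still producing probabilities that are too high or too low to act on.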
The model runs nightly on the patient panel. Every morning the care team opens a dashboard showing the 30 highest-risk patients who have not been seen in the past 90 days, and the top three reasons each one is flagged. No black box. Every score is explainable down to the feature contributions.
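The morning worklist described above could be assembled along these lines. This is a sketch under assumptions: each patient record is taken to already carry a risk score and per-feature contributions (for example, SHAP values computed upstream), and the field names, feature names, and `build_worklist` function are hypothetical.

```python
from datetime import date, timedelta


def build_worklist(patients, today, top_n=30, gap_days=90, n_reasons=3):
    """Pick the highest-risk patients not seen recently, with their top risk drivers."""
    cutoff = today - timedelta(days=gap_days)
    # Keep patients with no visit on record or whose last visit predates the cutoff.
    overdue = [p for p in patients
               if p["last_visit"] is None or p["last_visit"] < cutoff]
    overdue.sort(key=lambda p: p["risk"], reverse=True)

    worklist = []
    for p in overdue[:top_n]:
        # Only contributions that push risk up count as "reasons" for a flag.
        drivers = [(name, v) for name, v in p["contributions"].items() if v > 0]
        drivers.sort(key=lambda kv: kv[1], reverse=True)
        worklist.append({
            "patient_id": p["patient_id"],
            "risk": p["risk"],
            "reasons": [name for name, _ in drivers[:n_reasons]],
        })
    return worklist
```

Filtering on visit recency before ranking is what keeps the list actionable: a high-risk patient who was seen last week is already in front of a clinician.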
"The model does not replace our judgment. It tells us which twenty charts to review this week instead of which four hundred."
What changed in the practice.
On held-out data, the model achieved an AUROC of 0.87 with strong calibration in the top two deciles, exactly where the clinical decisions happen. High specificity at the top of the list was the point, not a good-looking ROC curve overall.
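One way to check calibration where it matters is to bin held-out patients by predicted risk and compare the mean prediction with the observed event rate in each bin. The sketch below assumes equal-size bins ranked from highest to lowest risk. The function name and output fields are hypothetical, and it does not reproduce the reported figures.

```python
def calibration_by_decile(y_true, y_prob, n_bins=10):
    """Rank by predicted risk (highest first) and summarize each equal-size bin."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    n = len(ranked)
    rows = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:  # fewer patients than bins
            continue
        rows.append({
            "decile": b + 1,  # 1 = highest predicted risk
            "n": len(chunk),
            "mean_predicted": sum(p for p, _ in chunk) / len(chunk),
            "observed_rate": sum(y for _, y in chunk) / len(chunk),
        })
    return rows
```

The first two rows of the result correspond to the top two deciles. A well-calibrated model shows `mean_predicted` close to `observed_rate` there, and the observed rate in those rows is also the share of flagged patients who actually went on to have an event.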
In the first six months of production use, the care team intervened with 38% more patients in the pre-event window compared to the prior year. Each "intervention" is a real thing: an appointment, a medication change, or a care-management touch. Something happened before an event, not after.
Importantly, the care team reported that the model changed which patients they worried about. Several of the top-flagged patients had normal-looking recent BP readings, but their trajectories showed patterns that the model picked up and a human chart reviewer would not have. The model became a second set of eyes, not a replacement pair.
