Pressman AR, Lo JC, Chandra M, Ettinger B. J Clin Densitom. 2011;14(4):407-15. doi: 10.1016/j.jocd.2011.06.006. Epub 2011 Oct 1.
The area under the receiver operating characteristic curve (AUROC) is often used to evaluate risk models. However, reclassification tests provide an alternative assessment of model performance.
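As a minimal sketch (the abstract does not describe the authors' software), AUROC can be computed as the Mann-Whitney probability that a randomly chosen woman who fractured has a higher predicted risk than a randomly chosen woman who did not, with ties counted as one half. The function name `auroc` and the toy inputs are hypothetical illustrations, not study data.

```python
from itertools import product


def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    scores: predicted risks; labels: 1 if the outcome (e.g. hip fracture)
    occurred, 0 otherwise. Every (case, non-case) pair is compared: a case
    with the higher predicted risk scores 1, a tie scores 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for case, control in product(cases, controls):
        if case > control:
            total += 1.0
        elif case == control:
            total += 0.5
    return total / (len(cases) * len(controls))


# Hypothetical toy data: five women, two with a fracture (label 1).
risks = [0.03, 0.01, 0.02, 0.004, 0.012]
fractured = [1, 0, 0, 0, 1]
print(auroc(risks, fractured))  # 5 of 6 case/non-case pairs correctly ordered
```

The pairwise loop is quadratic and only meant to show the definition; for a cohort the size of this one, a rank-based formula or a library routine would be the practical choice.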
We performed both evaluations on results from FRAX (World Health Organization Collaborating Centre for Metabolic Bone Diseases, University of Sheffield, UK), a fracture risk tool, in Kaiser Permanente Northern California women older than 50 yr who had bone mineral density (BMD) measured during 1997-2003.
We compared FRAX performance with and without BMD in the model. Among 94,489 women with a mean follow-up of 6.6 yr, 1579 (1.7%) sustained a hip fracture. Overall, AUROCs were 0.83 and 0.84 for FRAX without and with BMD, respectively, suggesting that BMD did not contribute to model performance. AUROC decreased with increasing age, and adding BMD significantly increased AUROC among women aged 70 yr and older. Using an 81% sensitivity threshold (the optimum level from the receiver operating characteristic curve, corresponding to a 1.2% risk cutoff), 35% of women categorized above the cutoff were reassigned below it when BMD was added. In contrast, only 10% of those categorized below the cutoff were reassigned to the higher-risk category when BMD was added. The net reclassification improvement was 5.5% (p<0.01).
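The two-category net reclassification improvement (NRI) can be sketched as follows, assuming the standard category-based definition: net upward movement among women who fractured plus net downward movement among women who did not. Treating a predicted risk at or above the 1.2% cutoff as "above" is an assumption, as are the helper name `categorical_nri` and the toy data; this is not the study's code.

```python
def categorical_nri(risk_old, risk_new, events, cutoff=0.012):
    """Two-category net reclassification improvement (hypothetical helper).

    A risk >= cutoff is treated as "above" the threshold (an assumption).
    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    up_ev = down_ev = up_ne = down_ne = 0
    n_ev = n_ne = 0
    for old, new, event in zip(risk_old, risk_new, events):
        old_high = old >= cutoff
        new_high = new >= cutoff
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if event:
            n_ev += 1
            up_ev += moved_up  # bool counts as 0 or 1
            down_ev += moved_down
        else:
            n_ne += 1
            up_ne += moved_up
            down_ne += moved_down
    if n_ev == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_ev - down_ev) / n_ev + (down_ne - up_ne) / n_ne


# Hypothetical predicted risks without BMD (old) and with BMD (new).
without_bmd = [0.015, 0.010, 0.020, 0.013, 0.005, 0.011]
with_bmd = [0.010, 0.014, 0.008, 0.009, 0.013, 0.016]
fractured = [1, 1, 0, 0, 0, 1]
print(categorical_nri(without_bmd, with_bmd, fractured))
```

With these six hypothetical women, the event component is (2 - 1)/3 and the non-event component is (2 - 1)/3, giving an NRI of about 0.67. The example shows why NRI can register an improvement that AUROC misses: many women without fractures moving below the cutoff counts in the model's favor even when the overall ranking barely changes.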
The two versions of this risk tool have similar AUROCs, but alternative assessments indicate that adding BMD improves performance. Multiple methods should be used to evaluate risk tool performance, rather than relying on AUROC alone.