Development and validation of a model for predicting incident type 2 diabetes using quantitative clinical data and a Bayesian logistic model: A nationwide cohort and modeling study

Academic Article


  • © 2020 Wilkinson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

    Background: Obesity is closely related to the development of insulin resistance and type 2 diabetes (T2D). The prevention of T2D has become imperative to stem the rising rates of this disease. Weight loss is highly effective in preventing T2D; however, the at-risk pool is large, and a clinically meaningful metric for risk stratification to guide interventions remains a challenge. The objective of this study is to predict T2D risk using full-information continuous analysis of nationally sampled data from white and black American adults aged ≥45 years.

    Methods and findings: A sample of 12,043 black (33%) and white individuals from a population-based cohort, REasons for Geographic And Racial Differences in Stroke (REGARDS) (enrolled 2003–2007), was observed through 2013–2016. The mean participant age was 63.12 ± 8.62 years, and 43.7% were male. Mean BMI was 28.55 ± 5.61 kg/m². Risk factors for T2D regularly recorded in the primary care setting were used to evaluate future T2D risk using Bayesian logistic regression. External validation was performed using 9,710 participants (19% black) from Atherosclerosis Risk in Communities (ARIC) (enrolled 1987–1989), observed through 1996–1998. The mean participant age in this cohort was 53.86 ± 5.65 years, and 44.6% were male. Mean BMI was 27.15 ± 4.92 kg/m². Predictive performance was assessed using receiver operating characteristic (ROC) curves and area under the curve (AUC) statistics. The primary outcome was incident T2D. By 2016 in REGARDS, there were 1,602 incident cases of T2D. Risk factors used to predict T2D progression included age, sex, race, BMI, triglycerides, high-density lipoprotein, blood pressure, and blood glucose. The Bayesian logistic model (AUC = 0.79) outperformed the Framingham risk score (AUC = 0.76), the American Diabetes Association risk score (AUC = 0.64), and a cardiometabolic disease system (using Adult Treatment Panel III criteria) (AUC = 0.75). Validation in ARIC was robust (AUC = 0.85). Main limitations include the limited generalizability of the REGARDS sample, which comprises only older black and white Americans, and the absence of time-to-diagnosis data for T2D.

    Conclusions: Our results show that a Bayesian logistic model using full-information continuous predictors has high predictive discrimination and can be used to quantify race- and sex-specific T2D risk, providing a new, powerful predictive tool. This tool can be used for T2D prevention efforts, including weight loss therapy, by allowing clinicians to target high-risk individuals in a manner that could be used to optimize outcomes.
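    The two techniques named in the abstract, a logistic model with priors on the coefficients and discrimination measured by AUC, can be illustrated with a minimal sketch. This is not the authors' model or data: the synthetic predictors, the zero-mean Gaussian prior with `prior_sd=2.5`, the MAP point estimate obtained by gradient ascent, and every function name below are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_map_logistic(X, y, prior_sd=2.5, lr=0.5, steps=300):
    """MAP estimate for logistic regression with an assumed N(0, prior_sd^2)
    prior on each slope (flat prior on the intercept), via gradient ascent."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(steps):
        grad_b = 0.0
        grad_w = [-wj / prior_sd ** 2 for wj in w]  # gradient of the log prior
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b += lr * grad_b / n
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
    return b, w


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random
    non-case, counting ties as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, standardized continuous predictors (hypothetical stand-ins,
# e.g. a BMI z-score and a glucose z-score); not REGARDS or ARIC data.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
y = [1 if random.random() < sigmoid(-2.0 + 0.8 * a + 1.2 * g) else 0
     for a, g in X]

b, w = fit_map_logistic(X, y)
scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
model_auc = auc(scores, y)
print("intercept:", round(b, 2), "slopes:", [round(v, 2) for v in w])
print("in-sample AUC:", round(model_auc, 3))
```

    The AUC here is computed on the same data used for fitting, so it is optimistic; the study's external validation in a separate cohort is the more meaningful check. A full Bayesian analysis would also summarize the posterior distribution rather than a single point estimate.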
  • Published In

  • PLoS Medicine  Journal
  • Digital Object Identifier (doi)

    Author List

  • Wilkinson L; Yi N; Mehta T; Judd S; Garvey WT
  • Volume

  • 17
  • Issue

  • 8