Abstract
Type 2 diabetes mellitus (T2D) is a prevalent metabolic disorder that imposes substantial health and economic burdens worldwide. The relationship between inflammation-related indicators and the risk of new-onset T2D remains underexplored. This study aimed to develop and validate an interpretable predictive model for incident T2D using inflammation-related indicators. We analyzed data from 220,937 UK Biobank participants who were free of diabetes at baseline. Features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression, and six machine learning algorithms were used to construct predictive models. SHapley Additive exPlanations (SHAP) were used to interpret the best-performing model. A genetic risk score (GRS, an aggregate measure of genetic susceptibility to T2D) was constructed, and multivariate Cox regression was used to assess the combined effects of genetic and inflammatory factors on T2D incidence. The Extreme Gradient Boosting model performed best (training set AUC = 0.863, testing set AUC = 0.838). Key predictors included body mass index, cholesterol, age, alanine aminotransferase, high-density lipoprotein, and the Prognostic Nutritional Index (a marker predictive of inflammation and nutritional outcomes). SHAP analysis indicated that these features contributed substantially to the model's predictions. C-reactive protein and white blood cell count were strongly associated with future T2D risk. Integrating the GRS significantly improved the model's predictive performance (ΔAUC = +0.025, P < 0.05 by DeLong's test). This study presents an interpretable machine learning model for predicting new-onset T2D risk and highlights the roles of inflammation and genetic factors. The findings provide a tool for early T2D prevention and intervention and offer insight into the interplay between inflammation and diabetes development.
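To make the two quantities reported above more concrete, the following is a minimal sketch of how a GRS and a ΔAUC could be computed. It rests on several assumptions that the abstract does not confirm: the GRS is taken to be a weighted sum of risk-allele counts (a common construction), the function names `weighted_grs` and `auc` are hypothetical, the genotypes, weights, and risk scores are invented toy values, and the GRS is simply added to the baseline score for illustration rather than refitted within a model. DeLong's test for the significance of the AUC difference is not implemented here.

```python
from typing import Sequence


def weighted_grs(allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted genetic risk score: risk-allele counts times per-variant weights, summed."""
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(c * w for c, w in zip(allele_counts, weights))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = incident T2D, 0 = no T2D).
labels = [0, 0, 0, 1, 1, 1]
# Hypothetical predicted risks from a baseline model without genetic data.
base_scores = [0.10, 0.40, 0.35, 0.30, 0.80, 0.70]
# Hypothetical risk-allele counts (0, 1, or 2) at three variants per participant.
genotypes = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [2, 1, 1], [1, 2, 0], [2, 2, 1]]
# Hypothetical per-variant weights, e.g. log odds ratios from a prior GWAS.
weights = [0.10, 0.05, 0.08]

grs = [weighted_grs(g, weights) for g in genotypes]
# Naive combination for illustration: add the GRS to the baseline score.
combined = [b + g for b, g in zip(base_scores, grs)]

auc_base = auc(labels, base_scores)
auc_combined = auc(labels, combined)
delta_auc = auc_combined - auc_base

print(f"baseline AUC = {auc_base:.3f}")
print(f"combined AUC = {auc_combined:.3f}")
print(f"delta AUC    = {delta_auc:+.3f}")
```

The pairwise definition of AUC used here is quadratic in the number of participants, which is fine for a toy example but would be replaced by a rank-based computation on a cohort of this size.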