Abstract
Background
Coronary artery disease (CAD) is the leading cause of death globally, and early risk prediction is vital for timely intervention. In this study, we used advanced machine learning techniques incorporating clinical, lifestyle, and genetic factors to improve 10-year CAD risk prediction in early middle-aged adults (40-55 years).
Methods
To develop the machine learning models, we used data from the UK Biobank, a large cohort of over 500,000 participants. Among participants aged 40 to 55 without any prior CAD diagnosis, we identified 3,012 CAD cases and 155,176 controls. More than 700 clinical and lifestyle variables were evaluated, alongside a polygenic risk score (PRS) derived from more than 560,000 genetic variants. Forward feature selection was performed, followed by a stacking ensemble model that combined Random Forest and Deep Learning Neural Networks as base learners, with a Logistic Regression model serving as the meta-learner. The model's performance for 10-year prediction of CAD risk was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUC) and calibrated against the ACC/AHA guidelines.
Results
Forward feature selection identified 22 CAD risk factors, including the PRS. The final model yielded an AUC of 0.81 in the training dataset and 0.80 in the testing dataset. At the 7.5% risk threshold, the model showed higher sensitivity (0.511 vs. 0.346) and comparable specificity (0.863 vs. 0.866) relative to the ACC ASCVD model. The Net Reclassification Index (NRI) improved by 13.79%, underscoring the model's improved classification accuracy.
Conclusions
Our findings highlight the potential of combining genetic information with clinical and lifestyle factors through machine learning to improve CAD risk prediction in early middle-aged adults. The stacking ensemble approach enabled modeling of complex relationships and outperformed standard risk models.
This framework supports more precise and actionable 10-year risk stratification, reinforcing the value of PRSs when combined with traditional factors. Future research should validate these methods across diverse populations and clinical environments.</p>
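As an illustration of the modeling approach summarized in the Methods, the following is a minimal sketch of a stacking ensemble with a random forest and a neural network as base learners and logistic regression as the meta-learner, scored by AUC and by sensitivity and specificity at a 7.5% risk threshold. It is not the study's implementation: the scikit-learn estimators, all hyperparameters, the helper `sensitivity_specificity`, and the synthetic data (22 features, roughly 10% positives) are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sensitivity_specificity(y_true, risk, threshold):
    """Hypothetical helper: metrics when risk >= threshold is called high risk."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(risk) >= threshold
    tp = np.sum(flagged & y_true)
    fn = np.sum(~flagged & y_true)
    tn = np.sum(~flagged & ~y_true)
    fp = np.sum(flagged & ~y_true)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic stand-in data (assumption): 22 features, about 10% positives.
X, y = make_classification(
    n_samples=2000, n_features=22, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners feed out-of-fold predicted probabilities to the meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                          random_state=0),
        )),
    ],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_train, y_train)

# Predicted probability of the positive class, treated here as the risk score.
risk = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, risk)
sens, spec = sensitivity_specificity(y_test, risk, threshold=0.075)
print(f"AUC={auc:.3f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

The numbers printed by this sketch come from synthetic data and have no bearing on the results reported above. In a real analysis the predicted probabilities would also need to be checked for calibration before a fixed threshold such as 7.5% is applied.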