Abstract
Background
Polygenic risk scores (PRSs) are increasingly being used to predict disease risk from genetic data. While promising in research, their clinical utility, especially when combined with non-genetic (NG) data such as lab results, physical measurements, and diagnostic history, remains uncertain. Myocardial infarction (MI), a leading cause of morbidity and mortality, is a key use case for assessing the incremental value of PRSs in risk models.

Methods
Using UK Biobank data, we evaluated the added value of PRSs for 10-year MI risk prediction. We trained models with NG data alone and in combination with PRSs, varying model complexity and the NG feature space. Two modeling frameworks were used: logistic regression and a neural network. NG data was defined using two feature sets: NG1, which included established MI risk factors from structured fields; and NG2, a high-dimensional dataset derived from millions of diagnostic codes across five linked UK Biobank electronic health record (EHR) datasets, combined with NG1 features. NG2 was generated using a deep representation learning approach that produced low-dimensional embeddings capturing latent medical concepts and disease co-occurrence patterns. Each model was trained with and without PRSs and evaluated using metrics such as the area under the ROC curve (AUC).

Results
PRSs add minimal predictive value when used alone. In contrast, diagnostic data from EHRs significantly improve performance. The best results are achieved using a multimodal neural network combining NG1, NG2, and PRSs.

Conclusions
PRSs provide limited standalone utility for MI prediction compared to detailed diagnostic data. Their clinical value likely lies in integration with EHR-based models. Future work should focus on multimodal approaches that contextualize PRS information within broader clinical data.
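
The with-and-without-PRS comparison described in the Methods can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the two non-genetic features (`age_z`, `sbp_z`), the effect sizes, and all helper names are assumptions made for illustration, and a plain gradient-descent logistic regression stands in for the models evaluated in the study. The sketch fits one model on non-genetic features only and one with a PRS column appended, then compares held-out AUC on the same split.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit unregularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Return predicted probabilities for each row of X."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formula, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def run_comparison(seed=0, n=2000):
    """Simulate a cohort and compare held-out AUC with and without the PRS."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        # Hypothetical standardized features; effect sizes are made up.
        age_z, sbp_z, prs_z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        logit = -2.0 + 0.8 * age_z + 0.5 * sbp_z + 0.3 * prs_z
        rows.append((age_z, sbp_z, prs_z))
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    split = int(0.7 * n)  # first 70% for training, the rest held out
    results = {}
    for name, cols in (("NG only", (0, 1)), ("NG + PRS", (0, 1, 2))):
        X = [[r[c] for c in cols] for r in rows]
        model = fit_logistic(X[:split], y[:split])
        results[name] = auc(y[split:], predict(model, X[split:]))
    return results


results = run_comparison()
for name, value in results.items():
    print(f"{name}: held-out AUC = {value:.3f}")
```

Because the outcome is simulated, the printed AUCs say nothing about MI. The point is only the paired design: both models are trained and evaluated on the same split, so the two AUCs are directly comparable.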