Abstract
Background: Despite high heritability estimates, complex genetic disorders have proven difficult to predict with genetic data. Genomic research has documented polygenic inheritance, cross-disorder genetic correlations, and enrichment of risk by functional genomic annotation, but the vast potential of that combined knowledge has not yet been leveraged to build optimal risk models. Additional methods are likely required to progress genetic risk models of complex genetic disorders towards clinical utility.

Methods: We developed a modeling framework that uses annotations providing genomic context alongside genotype data as input to convolutional neural networks. To test this framework, we used a matched-pairs dataset of individuals with and without type 2 diabetes. We compared the performance of models trained on genotype data alone to those trained on context-informed genotype data. We also introduced adversarial tasks to remove the ability of models to predict genetic ancestry while assessing whether risk prediction performance was preserved.

Results: Here, we show that a neural network using genotype data (AUC: 0.66) and a convolutional neural network using context-informed genotype data (AUC: 0.65) both significantly outperform polygenic risk score approaches in classifying type 2 diabetes. Adversarial ancestry tasks eliminate the predictability of ancestry without changing model performance.

Conclusions: We present neural network approaches that improve classification performance by integrating genotype data with genomic context while accounting for ancestry. Although current performance is not yet sufficient for clinical use in type 2 diabetes, incremental advances such as these may help move genetic risk prediction toward future clinical utility.
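To make the idea of context-informed genotype input concrete, the following is a minimal sketch and not the authors' architecture. It assumes each variant is encoded as an allele dosage (0, 1, or 2) followed by a few binary annotation flags, and that a single convolutional filter with a ReLU activation slides across neighboring variants. The function names, the two annotation flags, and the kernel weights are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def build_context_input(
    dosages: Sequence[int],
    annotations: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Stack each variant's dosage with its annotation values.

    Returns one row per variant: [dosage, annot_1, ..., annot_k].
    The encoding is an illustrative assumption, not the paper's.
    """
    if len(dosages) != len(annotations):
        raise ValueError("need exactly one annotation row per variant")
    return [
        [float(d)] + [float(a) for a in ann]
        for d, ann in zip(dosages, annotations)
    ]


def conv1d_relu(
    rows: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
    bias: float = 0.0,
) -> List[float]:
    """Apply one filter across variants (no padding), then ReLU.

    kernel is indexed as kernel[offset][channel], so its width is the
    number of adjacent variants the filter sees at once.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(rows) - width + 1):
        total = bias
        for offset in range(width):
            for channel, weight in enumerate(kernel[offset]):
                total += weight * rows[start + offset][channel]
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


# Hypothetical toy data: four variants, two annotation flags per variant
# (for example "in coding region" and "in enhancer").
dosages = [0, 1, 2, 1]
annotations = [[1, 0], [0, 1], [1, 1], [0, 0]]
context_rows = build_context_input(dosages, annotations)

# Hypothetical filter spanning two adjacent variants and three channels.
kernel = [[0.5, 0.25, 0.0], [-0.5, 0.0, 0.25]]
features = conv1d_relu(context_rows, kernel)
print(features)  # [0.0, 0.0, 0.75]
```

In a trained model the kernel weights would be learned rather than fixed, and many filters would be stacked. The point of the sketch is only that the annotation channels let the same filter respond differently to identical genotypes in different genomic contexts.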