Abstract
OBJECTIVES: Proteome-wide risk models for lupus remain underexplored. We developed classification models to identify lupus from serum proteomic profiles.
METHODS: Lupus patients and individuals with other autoimmune diseases in the UK Biobank were included. Differential protein expression was characterized, followed by hierarchical clustering analysis. Linear and machine learning proteomic models were developed to classify established disease and to predict future lupus, and were compared with polygenic risk scores (PRS). Two additional independent lupus cohorts, one from Sweden (part of the Human Protein Atlas, HPA) and one from China, were used for replication.
RESULTS: A total of 44,173 UK Biobank participants with proteomic data were studied, including 2,063 individuals with at least one autoimmune disease at enrollment. This included 383 lupus patients with a mean age at disease onset of 43.6 ± 11.7 years. Lupus showed the largest number of dysregulated proteins among the autoimmune diseases and clustered with rheumatoid arthritis. Comparison with HPA showed that ~70% of lupus proteomic associations replicated, with moderate correlation in effect sizes. The machine learning model outperformed the linear model in identifying preexisting lupus and generalized well to future lupus prediction. Among lupus patients on immunomodulatory medications, the model reached ~90% sensitivity at 95% specificity, which was replicated in an independent cohort. Model interpretation highlighted SCARB2, SOD2, CD302, Galectin-9, and GGT5 as proteins with substantial effects on lupus identification.
CONCLUSIONS: Proteomic machine learning models show excellent performance for identifying preexisting lupus and generalize well to predicting lupus before clinical diagnosis. Model interpretation identified novel candidate biomarkers for lupus.
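
The results report sensitivity at a fixed 95% specificity. The abstract does not state how this operating point was computed, so the following is only a minimal sketch of one common approach: set the decision threshold from the score distribution of non-lupus controls, then measure the fraction of lupus cases scoring above it. The function name, argument names, and the strict "score > threshold" rule are assumptions made for illustration, not the authors' implementation.

```python
import math

import numpy as np


def sensitivity_at_specificity(y_true, scores, specificity=0.95):
    """Return (sensitivity, threshold, achieved_specificity).

    Hypothetical helper: y_true holds 1 for cases and 0 for controls,
    scores holds the model's risk scores (higher = more likely a case).
    A sample is called positive when its score is strictly above the threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)

    controls = np.sort(scores[y_true == 0])
    cases = scores[y_true == 1]
    if controls.size == 0 or cases.size == 0:
        raise ValueError("need at least one case and one control")

    # Smallest control score such that at least `specificity` of all
    # controls fall at or below it (and are therefore called negative).
    k = max(1, math.ceil(specificity * controls.size))
    threshold = controls[k - 1]

    sensitivity = float(np.mean(cases > threshold))
    achieved_specificity = float(np.mean(controls <= threshold))
    return sensitivity, float(threshold), achieved_specificity


if __name__ == "__main__":
    # Synthetic scores only; these are not data from the study.
    rng = np.random.default_rng(0)
    labels = np.array([0] * 1000 + [1] * 50)
    risk = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 50)])
    sens, thr, spec = sensitivity_at_specificity(labels, risk, specificity=0.95)
    print(f"threshold={thr:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Choosing the threshold from the controls alone guarantees that the achieved specificity is at least the target, at the cost of being slightly conservative when control scores are tied.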