Publication 19082

Title:	MetImputBERT: a pretrained BERT framework for missing value imputation in NMR metabolomics data
Journal:	Briefings in Bioinformatics
Published:	1 Nov 2025
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/41470048/
DOI:	https://doi.org/10.1093/bib/bbaf682

Abstract

Missing values in nuclear magnetic resonance (NMR) metabolomics data compromise downstream clinical interpretation. Here, we present MetImputBERT, an imputation method based on a pretrained BERT framework. MetImputBERT uses the masks of the masked language model to simulate missing values and treats the model's predictions (reconstructions) at these masked positions as the imputation step. Training is driven by minimizing the reconstruction error. MetImputBERT was pretrained on the largest metabolomics dataset to date, comprising data from over 230 000 individuals in the UK Biobank. When a new dataset with missing values is encountered, MetImputBERT loads the pretrained parameters and imputes the missing values directly from its reconstructed estimates. On two independent test sets, MetImputBERT outperformed commonly used methods (K-nearest neighbors, multiple imputation by chained equations, and singular value decomposition) in imputation performance. We provide an open-source Python tool that allows users to quickly impute missing values in their own NMR metabolomics data without any additional training.
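The abstract describes simulating missing values by masking entries and scoring the model on how well it reconstructs them. The sketch below illustrates only that generic masked-reconstruction idea with NumPy; it is not MetImputBERT's code or API. The helper names (`mask_observed`, `masked_mse`, `column_mean_impute`), the 15% mask rate (the usual BERT default, assumed here), and the column-mean imputer are all hypothetical stand-ins. The column-mean imputer takes the place of any real model, whether MetImputBERT, KNN, MICE, or SVD.

```python
import numpy as np

def mask_observed(x, mask_rate=0.15, rng=None):
    """Hide a random fraction of the observed (non-NaN) entries of x.

    Hypothetical helper. Returns a masked copy of x and a boolean array
    marking the positions that were hidden. Entries that were already
    missing are never selected, because their true values are unknown.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = ~np.isnan(x)
    hidden = observed & (rng.random(x.shape) < mask_rate)
    x_masked = x.copy()
    x_masked[hidden] = np.nan
    return x_masked, hidden

def masked_mse(x_true, x_pred, hidden):
    """Mean squared reconstruction error, computed only at hidden positions."""
    diff = x_true[hidden] - x_pred[hidden]
    return float(np.mean(diff ** 2))

def column_mean_impute(x):
    """Trivial stand-in imputer (not MetImputBERT): fill each NaN with its column's observed mean."""
    col_means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_means, x)

# Usage on synthetic data (rows = individuals, columns = metabolite measures).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 10))
x_masked, hidden = mask_observed(x, mask_rate=0.15, rng=rng)
x_imputed = column_mean_impute(x_masked)
print(masked_mse(x, x_imputed, hidden))
```

In a trained masked model, the loss minimized during training would be an error like `masked_mse`, computed at the masked positions. At inference, the model's reconstructions at the truly missing positions serve as the imputed values.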

5 Keywords

Algorithms
Humans
Magnetic Resonance Spectroscopy
Metabolomics
Software

5 Authors

Shizheng Qiu
Yang Hu
the Alzheimer's Disease Neuroimaging Initiative
Guiyou Liu
Yadong Wang

1 Application

Application ID	Title
249728	Investigating Risk Factors, Susceptibility Genes, and Early Prediction of Complex Brain Disorders Using UK Biobank Multimodal Data

Enabling scientific discoveries that improve human health