Machine-learning assisted molecular formula assignment to high-resolution mass spectrometry data of dissolved organic matter

Qiong Pan, Wenya Hu, Ding He, Chen He*, Linzhou Zhang, Quan Shi

State Key Laboratory of Heavy Oil Processing, Petroleum Molecular Engineering Center (PMEC), China University of Petroleum, Beijing, 102249, China

Department of Ocean Science and Hong Kong Branch of the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, 999077, China

*Corresponding author: E-mail address: hechen@cup.edu.cn (C. He).

DOI: 10.1016/j.talanta.2023.124484


Keywords: Dissolved organic matter; FT-ICR MS; Molecular formula assignment; Orbitrap MS


Abstract: High-resolution mass spectrometry (HRMS) provides molecular compositional information on dissolved organic matter (DOM) through isotopic assignment from the molecular mass. However, because of the inevitable deviation in molecular mass measurement and the limited resolving power, a given molecular mass frequently has multiple possible solutions. Lowering the mass deviation threshold and adding assignment restriction rules are commonly used to exclude incorrect solutions, but this generally involves time-consuming manual post-processing of the mass data. To improve the accuracy of the results in an automated manner, we developed a molecular formula assignment algorithm based on machine learning. The method integrates a logistic regression model trained on manually corrected isotopic compositions and the peak features of HRMS data (e.g., m/z, signal-to-noise ratio, and isotope type and number). The resulting model evaluates the correctness of a candidate formula for a given mass peak from these peak features. The method was verified with FT-ICR MS data (direct infusion, negative-mode electrospray) from various DOM samples, achieving ∼90% accuracy (compared to the traditional approach) for formula assignment. The method was applied to a series of natural organic matter (NOM) samples and showed a significant improvement in formula assignment compared with the mass matching method.
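The two ideas in the abstract, mass matching that can return several candidate formulas for one peak and a logistic model that scores each candidate from peak features, can be illustrated with a minimal sketch. It is not the authors' implementation. The element set (CHNOS), the search limits, the [M-H]- ion assumption, the three features, and the values of `WEIGHTS` and `BIAS` are all assumptions chosen for illustration; in particular, the coefficients are placeholders, not values fitted in the paper.

```python
import math

# Monoisotopic masses (Da) of the most abundant isotopes, and the electron mass.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117}
ELECTRON = 0.00054858

# Placeholder coefficients (assumption, NOT fitted values) for the features
# (|mass error in ppm|, log10 of signal-to-noise, isotopologue-peak flag).
WEIGHTS = (-2.0, 1.0, 1.5)
BIAS = 0.0


def ion_mz(formula):
    """m/z of the singly charged [M-H]- ion of a neutral formula (dict of counts)."""
    neutral = sum(MASS[el] * count for el, count in formula.items())
    return neutral - MASS["H"] + ELECTRON


def candidates(mz, tol_ppm, max_n=2, max_s=1):
    """Brute-force mass matching: every CHNOS formula within tol_ppm of mz."""
    neutral = mz + MASS["H"] - ELECTRON
    hits = []
    for c in range(1, int(neutral // MASS["C"]) + 1):
        for n in range(max_n + 1):
            for s in range(max_s + 1):
                for o in range(int(neutral // MASS["O"]) + 1):
                    rest = neutral - (c * MASS["C"] + n * MASS["N"]
                                      + s * MASS["S"] + o * MASS["O"])
                    if rest < 0:
                        break  # adding more oxygen only overshoots further
                    h = round(rest / MASS["H"])  # nearest hydrogen count
                    if h < 1 or h > 2 * c + n + 2:  # simple valence bound on H
                        continue
                    formula = {"C": c, "H": h, "N": n, "O": o, "S": s}
                    err_ppm = (mz - ion_mz(formula)) / mz * 1e6
                    if abs(err_ppm) <= tol_ppm:
                        hits.append((formula, err_ppm))
    return hits


def logistic_score(features, weights, bias):
    """Logistic-regression probability that a candidate formula is correct."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Usage: score every candidate for the theoretical m/z of C20H30O10.
mz = ion_mz({"C": 20, "H": 30, "N": 0, "O": 10, "S": 0})
for formula, err_ppm in candidates(mz, tol_ppm=1.0):
    # The S/N of 50 and the isotopologue flag of 1.0 are made-up inputs.
    features = (abs(err_ppm), math.log10(50.0), 1.0)
    print(formula, round(err_ppm, 3),
          round(logistic_score(features, WEIGHTS, BIAS), 3))
```

In a real workflow the coefficients would be fitted to manually corrected assignments, as the abstract describes, and the candidate with the highest probability would be kept for each peak.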