CompMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC-MS Data Sets

Academic Article


  • A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using a peak-picker-output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far outweigh the amount of time required for pain-staking manual interpretation; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. CompMS2Miner integrates many useful tools in a single workflow for metabolite annotation and also provides a means to overview the MS2 data with a Web application GUI compMS2Explorer (Comprehensive MS2 Explorer) that also facilitates data-sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans, (ii) filtration of spectral noise (dynamic noise filter), (iii) generation of composite mass spectra by multiple similar spectrum signal summation and redundant/contaminant spectra removal, (iv) interpretation of possible fragment ion substructure using an internal database, (v) annotation of unknowns with chemical and spectral databases with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest neighbor chemical similarity scoring, random forest based retention time prediction, text-mining based false positive removal/true positive ranking, chemical taxonomic prediction and differential evolution based global annotation score optimization, and (vi) network graph visualizations, data curation, and sharing are made possible via the compMS2Explorer application. Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knockdown the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publically available compMS2Explorer applications for both positive and negative modes ( and The workflow provided rapid annotation of a diversity of endogenous and gut microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the Massbank of North America (MoNA) spectral repository. These experimental and virtual (lipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholines spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. It was found that an average of the 7 individual scoring metrics provided the most effective weighted average ranking ability of 3 for the MoNA matched spectra in spite of potential risk of false positive annotations emerging from automation. Minor structural differences such as relative carbon-carbon double bond positions were found in several cases to affect the correct rank of the MoNA annotated metabolite. The latest release and an example workflow is available in the package vignette ( and a version of the published application is available on the site (
  • Published In

    Digital Object Identifier (doi)

    Pubmed Id

  • 17524076
  • Author List

  • Edmands WMB; Petrick L; Barupal DK; Scalbert A; Wilson MJ; Wickliffe JK; Rappaport SM
  • Start Page

  • 3919
  • End Page

  • 3928
  • Volume

  • 89
  • Issue

  • 7