The Global Natural Products Social Molecular Networking (GNPS) service is a popular online toolbox for analyzing untargeted tandem mass spectrometry data. The most popular tool on GNPS is Molecular Networking (MN). MN is great for:
- Collapsing similar spectra into consensus spectra, reducing the size of a dataset;
- Helping you visualize the chemical space represented in your experiment;
- Suggesting identities for spectra that don’t have library matches.
Crucially, MN doesn’t use MS1-level data generated in your experiments. This can cause it to miss links between similar spectra, or make it lump together spectra that belong to different compounds. This means you miss compounds that were in your sample, many of which may be relevant to your research.
To remedy this, GNPS has released two new Molecular Networking tools that use MS1-level data in addition to the MS2 data used by classic MN. This lets you better distinguish between similar compounds, such as isomers or coeluting metabolites, that classic MN cannot tell apart.
We spoke to Daniel Petras, part of the team that developed these tools, to understand how they work and how you can use them to annotate metabolites in your data.
Let’s first delve deeper into how classical MN annotates your spectra, and see where Daniel and the GNPS team saw room for improvement.
Classical molecular networking only uses MS2 data
MN groups similar spectra into clusters, which helps propagate identifications obtained from spectral library matching. MN accomplishes this by carrying out a few steps under the hood:
- Condenses matching spectra into consensus spectra: Identifies spectra with the same precursor ion and high spectral similarity and groups them, then collapses each group into a single consensus spectrum. These groups become nodes in the molecular network.
- Links similar consensus spectra: Calculates pairwise cosine similarity between consensus spectra and constructs edges in the molecular network based on these similarity values.
- Library annotation: Matches as many nodes in the network to library spectra as possible.
- Propagates identities: Uses chemical rationale to infer the identity of unannotated nodes that neighbor putatively identified nodes (called “identity propagation”).
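The linking step above can be sketched in a few lines of Python. This is a minimal illustration, not the GNPS implementation: it uses a plain binned cosine score (GNPS uses a modified cosine that also accounts for precursor mass shifts), and the bin width, the 0.7 threshold, the node names, and the peak lists are all assumptions made for the example.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks to {bin index: summed intensity}."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Plain cosine score between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(consensus_spectra, min_cosine=0.7):
    """Connect every pair of nodes whose score reaches the threshold."""
    edges = []
    for (id_a, spec_a), (id_b, spec_b) in combinations(consensus_spectra.items(), 2):
        score = cosine_similarity(spec_a, spec_b)
        if score >= min_cosine:
            edges.append((id_a, id_b, round(score, 3)))
    return edges


# Hypothetical consensus spectra: node id -> list of (m/z, intensity) peaks
spectra = {
    "node_1": [(100.0, 50.0), (150.0, 100.0), (200.0, 30.0)],
    "node_2": [(100.0, 45.0), (150.0, 100.0), (200.0, 35.0)],
    "node_3": [(120.0, 100.0), (180.0, 80.0)],
}
edges = build_edges(spectra)
print(edges)
```

Here node_1 and node_2 share all their fragment peaks and end up connected, while node_3 shares none and stays a singleton.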
Despite the power of MN, there are some issues that it cannot resolve, since it doesn’t use MS1-level data. For example, isomers sometimes exhibit the same fragmentation patterns but elute from the LC column at different times. MN would lump these isomers together. You can circle back to the original data to distinguish these compounds using additional software packages like MS-DIAL or MZmine, but using multiple software tools to analyze the same dataset is laborious and time-consuming.
To allow MN to capture these differences, and to make MN a one-stop shop for annotating your data, Daniel and his team developed a new version of MN that leverages MS1 data called Feature-Based Molecular Networking.
Feature-Based Molecular Networking uses MS1 data to improve clustering results
Like MN, Feature-Based Molecular Networking (FBMN) clusters MS2 spectra, produces a consensus spectrum for each cluster, and builds edges between spectra based on cosine similarity. FBMN achieves superior resolving power to MN by incorporating MS1-level information into its clustering calculations, as well as two optional information sources: ion mobility separation and MSE.
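To see why MS1-level information helps, consider a rough sketch of the isomer case. It assumes a set of MS2 scans whose fragmentation is identical, so only precursor m/z and retention time are shown. The scan values, the feature list, and the tolerances are hypothetical, and the real FBMN workflow takes its features from tools such as MZmine or MS-DIAL rather than from a hand-written list.

```python
from collections import defaultdict

# Hypothetical MS2 scans: (scan id, precursor m/z, retention time in minutes)
scans = [
    ("scan_1", 529.3160, 6.02),
    ("scan_2", 529.3162, 6.05),
    ("scan_3", 529.3161, 7.48),
    ("scan_4", 529.3159, 7.51),
]

# Hypothetical LC-MS features from MS1 feature detection: (feature id, m/z, RT)
features = [("feature_A", 529.3161, 6.03), ("feature_B", 529.3160, 7.50)]


def group_by_mz(scans, mz_tol=0.01):
    """MS2-only view: scans with the same precursor m/z fall into one group."""
    groups = defaultdict(list)
    for scan_id, mz, _rt in scans:
        groups[round(mz / mz_tol)].append(scan_id)
    return list(groups.values())


def assign_to_features(scans, features, mz_tol=0.01, rt_tol=0.2):
    """Feature-based view: a scan joins the first feature matching m/z and RT."""
    assigned = defaultdict(list)
    for scan_id, mz, rt in scans:
        for feature_id, f_mz, f_rt in features:
            if abs(mz - f_mz) <= mz_tol and abs(rt - f_rt) <= rt_tol:
                assigned[feature_id].append(scan_id)
                break
    return dict(assigned)


ms2_only = group_by_mz(scans)
feature_based = assign_to_features(scans, features)
print(len(ms2_only), "group without retention time")
print(len(feature_based), "groups with retention time")
```

Without retention time, all four scans collapse into one group. With it, the two isomers eluting about 1.5 minutes apart stay as separate nodes.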
To illustrate the advantages of FBMN, Daniel and the GNPS team ran FBMN on data from a drug discovery project on Euphorbia samples. They found that FBMN uncovered several positional isomers and stereoisomers of 4-deoxyphorbol ester compounds that were missed by MN.
In this example, using FBMN allowed the team to identify potential new anti-viral compounds that would have remained hidden had they used classical MN. Incorporating MS1-level information into clustering through FBMN is a clear step forward, and FBMN has been cited in more than 80 publications since its release in 2017.
However, even with the advanced power of FBMN, you can still encounter difficulties when linking nodes derived from the same precursor compound in your data. Ions from the same compound sometimes fragment differently, resulting in different MS2 spectra, which isn’t traceable using the information leveraged by MN and FBMN. To fix this, a cutting-edge tool released this year called Ion Identity Molecular Networking (IIMN) improves the accuracy of Molecular Networking results by accounting for these scenarios using new ion identity information sources.
Ion Identity Molecular Networking uses ion identity correlation to find further links between spectra
IIMN lets you link together ions derived from the same compound by using MS1 information such as chromatographic feature shapes. FBMN and IIMN have a lot of similarities, but they use different types of MS1 information.
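The idea behind chromatographic feature shape correlation can be illustrated with a small sketch: ions that come from the same compound should rise and fall together as the compound elutes. The code below is not the metaCorrelate algorithm itself. The intensity profiles, the 0.85 threshold, and the function names are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally sampled intensity profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def shapes_correlate(profile_a, profile_b, min_r=0.85):
    """Flag two features as candidate ions of one compound (threshold assumed)."""
    return pearson(profile_a, profile_b) >= min_r


# Hypothetical extracted ion chromatograms sampled at the same retention times
protonated = [0, 10, 60, 100, 55, 12, 0]
sodiated = [0, 4, 25, 41, 22, 5, 0]       # same shape, lower intensity
unrelated = [0, 0, 5, 20, 70, 100, 40]    # peak apex elutes later

print(shapes_correlate(protonated, sodiated))
print(shapes_correlate(protonated, unrelated))
```

The sodiated profile is a scaled copy of the protonated one, so the two correlate strongly. The unrelated feature peaks later and does not pass the threshold.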
Ion Identity Molecular Networking significantly increases your annotation rate
Daniel and the GNPS team tested IIMN on 24 public datasets, using the metaCorrelate algorithm from the MZmine workflow. These public datasets ran the gamut of biospecimen varieties, from standards to cell cultures, feces, food, marine samples, and biofluids (saliva, urine, and plasma). The team propagated identities to the first neighboring nodes in the IIMN networks built on these datasets. They found that in 16 of the 24 datasets, the number of MS2 library-annotated features increased by over 10%.
One way that IIMN improves annotation rates is by helping to overcome the adduct bias of most MS2 spectral libraries. The team observed differing proportions of protonated, sodiated, and ammoniated adducts across their samples, all of which differed from the adduct proportions in MoNA and GNPS (both are ~65% protonated adducts, while in most datasets the proportion of protonated adducts was ~23%). IIMN can be used to expand spectral libraries to these less well-annotated adducts, which can have big payoffs for biological samples that feature unusual ions.
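As a worked example of how different adducts of one compound relate to each other, the sketch below checks whether two observed m/z values agree on a single neutral mass under different adduct hypotheses. The adduct mass shifts are standard values for singly charged positive ions. The observed m/z values, the tolerance, and the function names are hypothetical, and this is only an illustration of the arithmetic, not the IIMN implementation.

```python
# Mass shifts (Da) for common singly charged positive-mode adducts
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
}


def neutral_mass(mz, adduct):
    """Neutral mass implied by an observed m/z under one adduct hypothesis."""
    return mz - ADDUCT_SHIFTS[adduct]


def matching_adduct_pairs(mz_a, mz_b, tol=0.005):
    """List adduct pairs under which both ions imply the same neutral mass."""
    matches = []
    for adduct_a in ADDUCT_SHIFTS:
        for adduct_b in ADDUCT_SHIFTS:
            if adduct_a == adduct_b:
                continue
            delta = neutral_mass(mz_a, adduct_a) - neutral_mass(mz_b, adduct_b)
            if abs(delta) <= tol:
                matches.append((adduct_a, adduct_b))
    return matches


# Hypothetical features from a compound with a neutral mass near 408.2876 Da
matches = matching_adduct_pairs(409.2949, 431.2768)
print(matches)
```

Only the protonated/sodiated pairing explains both values with one neutral mass, so a library match for the protonated ion could be carried over to the sodiated one.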
Find all the links between your spectra by combining FBMN and IIMN
Lastly, you can use FBMN and IIMN in tandem for maximum benefit, since they leverage distinct information sources. To demonstrate this, the team applied FBMN to a dataset with 88 feces/gall bladder extracts from several animal species. They zoomed in on a specific class of lipids (bile acids and bile acid conjugates) and observed that FBMN placed highly similar molecules with the same adducts in different subnetworks – an undesirable outcome.
The team then applied IIMN to this same network, and it introduced additional edges based on ion identity information. As a result, the bile acid subnetworks became far more connected. Additionally, the number of nodes was reduced because IIMN was able to collapse different ion species from the same compound into a single node. IIMN also integrated a number of singleton nodes whose ion identity could be determined, which incorporated more features into the final network.
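The collapsing step can be pictured as merging every set of features joined by ion identity edges into one node. The sketch below does this with a small union-find structure. The feature names and edges are hypothetical, and the approach is an assumption made for illustration rather than a description of how GNPS implements it.

```python
def collapse_ion_identities(nodes, ion_identity_edges):
    """Merge features connected by ion identity edges into single groups."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the group representative, shortening the path on the way
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in ion_identity_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


# Hypothetical features: f1, f2, f3 are different ion species of one compound
nodes = ["f1", "f2", "f3", "f4", "f5"]
ion_edges = [("f1", "f2"), ("f2", "f3")]

collapsed = collapse_ion_identities(nodes, ion_edges)
print(len(nodes), "nodes before,", len(collapsed), "nodes after")
```

Five feature nodes become three, because the three ion species of one compound are represented by a single collapsed node.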
Increasing the number of connected nodes in your molecular networks greatly improves your chances of annotating as many features as possible in your data. Getting started with the right tools is as easy as accessing the GNPS website.
How to get started with FBMN and IIMN
Integrating FBMN and IIMN into your computational workflow is a breeze. FBMN accepts MS1-level information from popular feature-detection and alignment tools like XCMS, MZmine, and MS-DIAL, as well as standard formats like mzTab-M. You can try out FBMN at the GNPS website, where you can also find instructions on exporting output from your tools for use in the online FBMN interface. You can read more about how FBMN works in the original publication in Nature Methods.
IIMN is also available on the GNPS website. IIMN is fully integrated into the FBMN workflow and interfaces with all the tools above that work with FBMN. IIMN is also featured in a publication in Nature Communications.
We help researchers understand and use the latest algorithms in metabolomics. If you want to improve your workflow and need advice on integrating FBMN and IIMN into your research, go ahead and reach out to us.