Metabolome mining and annotation tools have proliferated in the last few years, so much so that it’s become difficult to meaningfully combine all the information from these different tools.
That's why Madeleine Ernst, Justin van der Hooft, and colleagues developed MolNetEnhancer: a tool that adds the output from several of the most useful tools onto a molecular network. Conveniently, you can then view this annotated network in Cytoscape.
When all the information is in context like this, it’s much easier to understand the chemical space in your sample, find new molecular families, and gain clues into their structures and origins.
The different annotation layers that MolNetEnhancer combines
Molecular network
Everything starts with a molecular network: a network that organizes all the MS/MS spectra in your sample by their spectral similarity. Each node in the network represents one MS/MS spectrum, and the links between the nodes (edges) represent spectral similarity.
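To make the idea of nodes and edges concrete, here is a minimal Python sketch that links toy spectra by a simple binned cosine score. It is only an illustration: the function names, the 1 Da bin width, the 0.7 threshold, and the toy peak lists are all assumptions, and GNPS applies its own scoring and filtering, which this sketch does not reproduce.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Plain cosine score between two binned spectra (0 = unrelated, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, min_score=0.7):
    """Return edges (id_a, id_b, score) for spectrum pairs scoring at least min_score."""
    edges = []
    for (id_a, peaks_a), (id_b, peaks_b) in combinations(sorted(spectra.items()), 2):
        score = cosine_similarity(peaks_a, peaks_b)
        if score >= min_score:
            edges.append((id_a, id_b, score))
    return edges


# Toy data (hypothetical): each spectrum is a list of (m/z, intensity) peaks.
toy_spectra = {
    "s1": [(85.03, 20.0), (127.04, 100.0), (145.05, 60.0)],
    "s2": [(85.03, 25.0), (127.04, 90.0), (145.05, 70.0)],
    "s3": [(91.05, 100.0), (119.08, 40.0)],
}

edges = build_network(toy_spectra)
for id_a, id_b, score in edges:
    print(f"{id_a} -- {id_b}: {score:.2f}")
```

In this toy example, s1 and s2 share all their fragment bins and end up connected, while s3 shares none and stays an unconnected node.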
MolNetEnhancer works with the GNPS Molecular Networking workflows and you can choose between either classical molecular networking or the more recently added feature-based molecular networking. Classical molecular networking only considers the MS/MS spectra, whereas feature-based molecular networking also factors in MS1 information, such as isotope patterns, retention time, and ion mobility.
In both workflows, GNPS builds a molecular network and runs a search for hits in the GNPS spectral library.
However, in most cases, a library search only annotates a small fraction of the nodes in your network (~2-4%), depending on your sample and the database you match against. For example, human metabolites are generally described better than plant metabolites.
You will have more information than when you started, but most features in your network will still have no annotation. If you’re looking at a molecular family where no member has a library hit, then you’ll have to rely on a tedious manual inspection of the spectra to get any information.
Many classical workflows stop here, but MolNetEnhancer helps you add much more information in an automated way.
MS2LDA
MS2LDA learns commonly co-occurring mass peaks and neutral losses (Mass2Motifs). The underlying assumption is that these relate to certain molecular substructures.
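As a rough illustration of the raw material such an approach works with, the Python sketch below turns one spectrum into fragment and neutral-loss features, where a neutral loss is the precursor m/z minus a fragment m/z. The function name, the feature naming, the rounding, and the minimum loss cutoff are assumptions made for this example and are not taken from the MS2LDA code.

```python
def spectrum_to_features(precursor_mz, peaks, decimals=2, min_loss=10.0):
    """Map one spectrum to fragment and neutral-loss features with intensities.

    A neutral loss is the mass difference between the precursor and a fragment.
    Very small losses are skipped (hypothetical cutoff, for illustration only).
    """
    features = {}
    for mz, intensity in peaks:
        features[f"fragment_{mz:.{decimals}f}"] = intensity
        loss = precursor_mz - mz
        if loss >= min_loss:
            features[f"loss_{loss:.{decimals}f}"] = intensity
    return features


# Toy spectrum (hypothetical): precursor at m/z 163.06 with two fragment peaks.
features = spectrum_to_features(163.06, [(145.05, 60.0), (127.04, 100.0)])
for name, intensity in sorted(features.items()):
    print(name, intensity)
```

Features that keep turning up together across many spectra are the kind of pattern that a Mass2Motif captures.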
When you then use MS2LDA to label each of your mass spectra with the Mass2Motifs it contains, you can do three exciting things.
Learn more about molecular relations in your sample: The substructure annotations allow you to highlight which features are structurally related and to see which samples contain which distribution of certain substructures.
Gain clues about the structure of features: If you get a library hit for the substructure, you can use this as an important clue to the structure of any molecule that contains this substructure.
Re-use Motif databases: The MS2LDA website hosts a set of pre-learned MotifSets in MotifDB that already come with analytically helpful annotations.
MS2LDA is also great because it’s an unsupervised technique, so you can annotate all your spectra with it, without limitations. It also comes with a web application where you can inspect the discovered substructure patterns.
MolNetEnhancer enables you to visualize the information from MS2LDA on top of the molecular network in two ways: a pie chart indicates the relative importance of Mass2Motifs for each feature; and additional colored edges between nodes indicate shared Motifs.
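The two views can be sketched in a few lines of Python. This is a simplified illustration, not MolNetEnhancer's implementation: the motif names, the scores, and both helper functions are hypothetical.

```python
from itertools import combinations

# Toy data (hypothetical): per-node Mass2Motif scores.
motif_scores = {
    "s1": {"motif_12": 0.6, "motif_40": 0.2},
    "s2": {"motif_12": 0.3},
    "s3": {"motif_77": 0.5},
}


def pie_fractions(scores):
    """Normalize one node's motif scores so the pie slices sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {}
    return {motif: value / total for motif, value in scores.items()}


def shared_motif_edges(scores_by_node):
    """Return one (node_a, node_b, motif) edge for every motif two nodes share."""
    edges = []
    for node_a, node_b in combinations(sorted(scores_by_node), 2):
        shared = scores_by_node[node_a].keys() & scores_by_node[node_b].keys()
        for motif in sorted(shared):
            edges.append((node_a, node_b, motif))
    return edges


pies = {node: pie_fractions(scores) for node, scores in motif_scores.items()}
motif_edges = shared_motif_edges(motif_scores)
print(pies["s1"])
print(motif_edges)
```

Here s1 and s2 gain an extra edge labeled with their shared motif, even if their overall spectral similarity is too low to connect them in the molecular network.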
And this is just the beginning. MolNetEnhancer also allows you to add annotations from the best in silico annotation tools.
In silico structural annotations
Spectral libraries are still quite small, but structural libraries, like PubChem, have many more entries.
Several techniques can bridge the gap between your mass spectrum and these described structures. They usually involve a combination of rules and predictions.
These tools all give you a list of potentially matching structures for a large portion of the mass spectra in your sample.
MolNetEnhancer can integrate the output from different in silico annotation tools. You can choose to run either:
- Network Annotation Propagation;
- DEREPLICATOR;
- VARQUEST; or
- SIRIUS+CSI:FingerID (not supported through the GNPS workflow, but Jupyter notebooks are available)
MolNetEnhancer takes the resulting structural candidates and adds them as another layer to your molecular network.
Once you have putative structures for many of your features, you can take an important step further.
Annotating molecular classes
ClassyFire assigns molecular classes to any molecular structure that its extensive chemical class ontology, ChemOnt, describes. With ClassyFire’s recent release, you can now add molecular class annotations to nearly all the structural candidates in your network.
MolNetEnhancer refines ClassyFire results further by using the topology of the molecular network: molecules that are clustered together should probably all belong to the same molecular class. So MolNetEnhancer finds the majority consensus among all class annotations and labels the molecular family with that majority class.
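The majority-vote idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the input layout, and the toy annotations are hypothetical, and the scoring MolNetEnhancer actually uses may differ in its details.

```python
from collections import Counter, defaultdict


def consensus_class_per_family(node_family, node_classes):
    """Pick the most frequent class annotation within each molecular family.

    node_family:  node id -> molecular family id
    node_classes: node id -> class labels of that node's structural candidates
    Returns family id -> (majority class, fraction of annotations supporting it).
    """
    counts_by_family = defaultdict(Counter)
    for node, family in node_family.items():
        counts_by_family[family].update(node_classes.get(node, []))

    consensus = {}
    for family, counts in counts_by_family.items():
        if not counts:
            continue  # no candidate in this family received a class annotation
        majority_class, votes = counts.most_common(1)[0]
        consensus[family] = (majority_class, votes / sum(counts.values()))
    return consensus


# Toy data (hypothetical): four nodes in two molecular families.
node_family = {"s1": 1, "s2": 1, "s3": 1, "s4": 2}
node_classes = {
    "s1": ["Fatty acyls", "Fatty acyls"],
    "s2": ["Fatty acyls", "Flavonoids"],
    "s4": ["Flavonoids"],
}

consensus = consensus_class_per_family(node_family, node_classes)
print(consensus)
```

Keeping the supporting fraction next to the label is a design choice in this sketch: it lets you see at a glance how strongly a family agrees on its class, and node s3, which has no candidates of its own, still inherits the label of its family.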
In just one glance at the network, you can now see which types of molecular classes are present in your sample. Put all this together and you get a richly annotated molecular network that makes drawing conclusions or even generating new hypotheses from your dataset much easier.
Visualize your results in Cytoscape
You can give MolNetEnhancer the outputs from your molecular networking, your MS2LDA Motif labeling, and your in silico annotation, and it will combine them and add ClassyFire class annotations.
MolNetEnhancer exports all this information into a GraphML file, which you can easily read with Cytoscape, an open source tool for exploring networks and annotations visually.
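Because GraphML is plain XML, you can also inspect such a file in a script. The Python sketch below parses a tiny hand-written GraphML document with the standard library. The attribute name chemical_class and the node ids are made up for this example; the attributes in a real MolNetEnhancer export will be named differently.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Toy GraphML document (hypothetical attribute name and node ids).
graphml_text = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="chemical_class" attr.type="string"/>
  <graph edgedefault="undirected">
    <node id="s1"><data key="d0">Fatty acyls</data></node>
    <node id="s2"><data key="d0">Fatty acyls</data></node>
    <edge source="s1" target="s2"/>
  </graph>
</graphml>"""

root = ET.fromstring(graphml_text)

# Map the short key ids (d0, d1, ...) to their readable attribute names.
key_names = {key.get("id"): key.get("attr.name") for key in root.findall("g:key", NS)}

# Collect the annotations stored on each node.
node_attributes = {}
for node in root.findall("g:graph/g:node", NS):
    node_attributes[node.get("id")] = {
        key_names[data.get("key")]: data.text for data in node.findall("g:data", NS)
    }

# Collect the edges as (source, target) pairs.
graph_edges = [
    (edge.get("source"), edge.get("target"))
    for edge in root.findall("g:graph/g:edge", NS)
]

print(node_attributes)
print(graph_edges)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring.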
Here are just a few examples of exactly how MolNetEnhancer can streamline your research.
Applications of MolNetEnhancer
MolNetEnhancer for biomarker discovery
Let’s say you’re studying the difference between affected and healthy patients. After you’ve identified compounds that explain the difference between the two groups, MolNetEnhancer’s annotations might tell you that the majority of the compounds in your list are fatty acids.
Since you know you should look at fatty acids, you can now refine your sample preparation and instrument settings to focus on lipids, and stop wasting time on peptides.
MolNetEnhancer for natural product discovery
Now, say a particular plant species contains a molecule with interesting antimicrobial properties, but the species doesn’t produce enough of this compound to make it worth cultivating.
However, other species of the same genus could possess similar molecules, and have them in greater abundance.
Once you have a dataset with the spectra from a range of species, you can use MolNetEnhancer to find molecular families that share substructures with the compound you’re interested in.
You can draw general conclusions about the chemical spaces present in different species and potentially link these insights to genetic differences. More detailed examples are available in the MolNetEnhancer paper.
Of course, this is just a taster; the fastest way to see it enhance your research is to give it a try.
Use MolNetEnhancer now
Whether you’re experienced in R or Python, or prefer a web interface, MolNetEnhancer is easy to use:
- R – Use MolNetEnhancer as an R package. Check it out on GitHub;
- Python – It’s also available as an easy-to-use Python package;
- GNPS – MolNetEnhancer is fully integrated in the GNPS platform. You simply run the full workflow there, without writing any code.
GNPS also allows you to easily share your workflow with anybody so that they can reproduce your research.
In less than two years, MolNetEnhancer’s innovative visualizations of compound annotations have already helped multiple teams improve their workflow. It’s no surprise it’s already been cited over 50 times.
What’s next for MolNetEnhancer
MolNetEnhancer is a great platform and will keep improving at integrating and making the most of the annotation techniques that are available. Here are some advances we are looking forward to:
- Spec2Vec: Using more advanced spectral similarity metrics to build more refined molecular networks and improve library searches. Read more about Spec2Vec.
- Consensus building: With the right approach, substructure and in silico annotations could be merged from multiple tools to re-rank candidate lists by consensus and plausibility, such as the presence of expected substructures.
- Large and reliable MotifSets: Mass2Motif libraries that describe a larger space of possible substructures for chemical mixtures of various origins will certainly improve the results.
- Mass2Motif network optimization: Trusted Mass2Motif annotations could also improve the structure of the molecular networks by contributing their information to the building of the molecular networks.
Speed up your research
We can help you explore metabolomics datasets more efficiently. Our tools and infrastructure make automating analysis steps with machine learning much easier, so if you’re struggling with metabolomics analysis bottlenecks, reach out to us.