You may have found parameter selection for feature finding a major obstacle in your metabolomics research. Feature-finding algorithms need well-chosen parameters to see past noise and extract useful features, but optimizing these parameters by hand is inefficient: there are simply too many possible parameter combinations.
AutoTuner is a new tool built to address this issue: it finds suitable parameters much more quickly, without compromising accuracy. Craig McLean and Elizabeth Kujawinski, the creators of AutoTuner, were happy to talk us through the details of how AutoTuner works.
AutoTuner and another commonly used automatic optimization tool, isotopologue parameter optimization (IPO), both spare you from tuning parameters by hand. However, IPO optimizes by iteratively searching the parameter space, which requires repeated runs of the expensive centWave algorithm. This can tie up your computers for days on end.
Instead, AutoTuner relies on a statistical inference strategy that slashes runtime down to minutes in many cases. In this post, we dig into how AutoTuner achieves this blazing fast computational speed, and show how AutoTuner can improve your results, as well as how to get started with it.
AutoTuner quickly optimizes parameters for the centWave algorithm
AutoTuner optimizes seven different parameters for the popular centWave peak-picking algorithm used in XCMS and MZmine. This means you can integrate AutoTuner into any workflow that already uses centWave.
Craig and Elizabeth chose to optimize the seven centWave parameters with the highest impact on peak selection: PPM (mass error), S/N threshold, scan count, noise, prefilter intensity, and minimum and maximum peak-width.
How AutoTuner finds optimal parameters
At the outset, AutoTuner asks you to identify peaks in the total ion chromatogram (TIC) of the experiment. This is a representation of the total amount of ions that passed through the column at each time point in the chromatography run. AutoTuner helps you out by applying a sliding window analysis to suggest potential peaks.
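A sliding-window analysis of this kind can be sketched as follows. The window size, threshold rule, and function name here are illustrative assumptions for explanation only, not AutoTuner's actual implementation:

```python
import numpy as np

def suggest_tic_peaks(intensity, window=15, factor=3.0):
    """Flag candidate TIC peaks: scans whose intensity exceeds the local
    sliding-window mean by `factor` standard deviations.
    (Illustrative sketch only; AutoTuner's own criterion may differ.)"""
    n = len(intensity)
    half = window // 2
    candidates = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        local = intensity[lo:hi]
        mu, sd = local.mean(), local.std()
        # A scan far above its local background is a candidate peak apex
        if sd > 0 and intensity[i] > mu + factor * sd:
            candidates.append(i)
    return candidates
```

AutoTuner only *suggests* peaks this way; you still confirm or adjust the selections yourself.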
It then automatically estimates the group difference parameter as the maximum retention-time difference between TIC peaks of the same feature matched across samples.
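As a toy illustration of this estimate (the function name and input layout are hypothetical):

```python
def estimate_group_difference(matched_peak_rts):
    """matched_peak_rts: list of lists; each inner list holds the retention
    times (seconds) of one feature's TIC peak across samples. The group
    difference is the largest spread among matched peaks."""
    return max(max(rts) - min(rts) for rts in matched_peak_rts)
```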
Once you’ve identified TIC peaks, AutoTuner computes the remaining parameters by itself:
- It estimates PPM (mass error) using empirical distributions of binned m/z values, which it uses to test whether hypothesized true features differ from noise.
- It calculates S/N Threshold using the minimum intensity per bin, minus the average noise intensity, divided by the standard deviation of noise intensity.
- It estimates Scan count as the smallest number of scans contained across all bins.
- It computes Noise and Prefilter intensity as 90% of the minimum integrated bin and single scan intensities, respectively.
- The Minimum Peak-width is the smallest number of scans in any bin, multiplied by the instrument-specific duty cycle (the time per scan).
- AutoTuner estimates Maximum Peak-width by expanding bins until it finds a scan that falls below the computed PPM threshold.
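The per-peak calculations above can be sketched roughly like this. All names, the data layout, and the exact handling of bins are simplifying assumptions rather than AutoTuner's code:

```python
import numpy as np

def per_peak_estimates(bins, noise_intensities, duty_cycle):
    """Sketch of the per-peak estimates listed above. `bins` holds one array
    of scan intensities per m/z bin in the peak; `noise_intensities` are
    intensities classified as noise; `duty_cycle` is the instrument's time
    per scan in seconds."""
    noise_mu = noise_intensities.mean()
    noise_sd = noise_intensities.std()
    min_scan_intensity = min(b.min() for b in bins)  # smallest single-scan intensity
    return {
        # (minimum intensity - mean noise intensity) / sd of noise intensity
        "sn_threshold": (min_scan_intensity - noise_mu) / noise_sd,
        # smallest number of scans contained in any bin
        "scan_count": min(len(b) for b in bins),
        # 90% of the minimum integrated bin intensity
        "noise": 0.9 * min(b.sum() for b in bins),
        # 90% of the minimum single-scan intensity
        "prefilter": 0.9 * min_scan_intensity,
        # smallest scans-per-bin times the duty cycle
        "min_peak_width": min(len(b) for b in bins) * duty_cycle,
    }
```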
AutoTuner then estimates dataset-wide parameters:
- PPM and S/N Threshold are averaged across peaks, weighted by the number of bins per peak.
- For Scan count, Noise, Prefilter intensity, and Minimum Peak-width, AutoTuner returns the minimum values from all bins detected.
- Group difference is the dataset-wide maximum.
- Maximum Peak-width is the average of the largest peak-width from each sample.
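The aggregation rules above can be summarized in a short sketch; the dict layout and field names are illustrative assumptions, not AutoTuner's API:

```python
import numpy as np

def dataset_wide_params(per_peak, per_sample_max_widths):
    """Combine per-peak estimates into dataset-wide parameters. `per_peak`
    is a list of dicts (one per TIC peak) with the keys shown below;
    `per_sample_max_widths` holds each sample's largest observed peak-width."""
    w = np.array([p["n_bins"] for p in per_peak], dtype=float)
    return {
        # bin-count-weighted averages across peaks
        "ppm": np.average([p["ppm"] for p in per_peak], weights=w),
        "sn_threshold": np.average([p["sn_threshold"] for p in per_peak], weights=w),
        # minima across all detected bins/peaks
        "scan_count": min(p["scan_count"] for p in per_peak),
        "noise": min(p["noise"] for p in per_peak),
        "prefilter": min(p["prefilter"] for p in per_peak),
        "min_peak_width": min(p["min_peak_width"] for p in per_peak),
        # dataset-wide maximum
        "group_difference": max(p["group_difference"] for p in per_peak),
        # average of each sample's largest peak-width
        "max_peak_width": float(np.mean(per_sample_max_widths)),
    }
```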
Elizabeth and Craig designed each step of AutoTuner to minimize computational load and improve performance, without sacrificing accuracy. To test this design, they compared AutoTuner’s performance to IPO in an example metabolomics dataset with 85 pure metabolite standards.
AutoTuner finds more true features than IPO
Elizabeth and Craig first compared the rates at which AutoTuner and IPO detected features belonging to standards spiked into their samples. AutoTuner detected a larger proportion of known features across isotopologues.
They then compared AutoTuner and IPO on a different dataset, generated from cell cultures. Although the two methods identified a large number of features in common (1,022), IPO found many features not seen by AutoTuner (2,606) while AutoTuner found few that were not found by IPO (203).
Could this be a matter of quantity vs quality? To answer this question, Craig and Elizabeth calculated the continuous wavelet transform (CWT) coefficient of these non-overlapping features, which measures peak steepness, and compared it between methods. Here, they used the CWT coefficient as a proxy for feature resolution (how well peaks were separated by chromatography), allowing them to compare the quality of features found uniquely by each method.
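To give a sense of what a CWT coefficient measures, here is a minimal sketch that convolves a signal with the Mexican-hat (Ricker) wavelet used by centWave's CWT and reports the largest coefficient. The scales, sizes, and function names are arbitrary choices for illustration, not the actual analysis:

```python
import numpy as np

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet of width parameter `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1 - (t / a) ** 2) * np.exp(-t ** 2 / (2 * a ** 2))

def max_cwt_coefficient(signal, scales=(2, 4, 8)):
    """Largest CWT coefficient of `signal` over a few scales; stronger,
    better-resolved peaks yield larger coefficients."""
    best = 0.0
    for a in scales:
        w = ricker(min(len(signal), 10 * a), a)
        best = max(best, float(np.max(np.convolve(signal, w, mode="same"))))
    return best
```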
They found that the CWT coefficient was significantly higher in unique AutoTuner features compared to IPO features. This suggests that the unique features picked up by AutoTuner were more likely to be real.
Of course, when you’re preprocessing data, your end goal is often to find features that can be matched to database spectra. It’s also easier to be confident in your feature table when it contains more identifiable metabolites. Interestingly, when Craig and Elizabeth looked for MS2 spectra associated with features found by one method and not the other, they found that unique AutoTuner-identified features were more likely to have associated MS2 spectra than IPO-identified features.
Overall, AutoTuner identified more isotopologues of standards than IPO, and found identifiable metabolites at a higher rate than IPO. However, given the stochastic nature of some of AutoTuner's calculations, Craig and Elizabeth wanted to make sure that AutoTuner's estimates were stable when calculated many times on the same data. Stable parameter estimates would ensure that AutoTuner performs equally well on datasets of different sizes.
Estimated parameters are consistent
To test the stability of AutoTuner parameter estimates, Craig and Elizabeth compared parameter estimates that were made from 55 subsets (3 to 9 samples each) of a 90-sample rat fecal microbiome (community) dataset.
They found that results were highly consistent across these subsets, although the coefficient of variation (CV) of these predictions decreased with sample size. In six out of the seven parameters tested in the community dataset, the CV across predictions was less than or equal to 0.1 when using a sample number of 9. In other words, AutoTuner parameter estimates are highly stable and become more stable as more samples are included.
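For reference, the coefficient of variation used in this stability test is simply the standard deviation of the repeated estimates divided by their mean:

```python
import numpy as np

def coefficient_of_variation(estimates):
    """CV (sd / mean) of one parameter estimated on repeated subsets;
    a CV at or below 0.1 matches the stability Craig and Elizabeth report."""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.std(ddof=1) / estimates.mean()
```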
AutoTuner dramatically improves run time
The most remarkable feature of AutoTuner is its blazing fast speed.
Across two ionization modes in the standards, culture, and community datasets, AutoTuner ran between 400 and 7,000 times faster than IPO. It's important to note that these comparisons were run on subsets of the full data (6, 4, and 6 samples from the culture, standards, and community data, respectively), but they illustrate the amazing gains in efficiency that AutoTuner provides.
How to use AutoTuner in your work
AutoTuner is implemented as a Bioconductor package for the R programming language. As with all Bioconductor packages, instructions on installation, documentation, and vignettes can be found on the Bioconductor website.
A potential drawback of AutoTuner is that it works only with the centWave feature detection algorithm, unlike IPO, which can estimate parameters for many different feature detection methods. You should therefore integrate AutoTuner into a workflow that uses XCMS or MZmine. Craig and Elizabeth also recommend a minimum sample size of 9 for best results with culture data, and 12 for community data.
We love helping researchers implement cutting-edge tools and methods in metabolomics. If you have any questions, please contact us.