Unlabelled LCMS Data Processing in ElMaven

Raghav Sehgal edited this page Aug 25, 2017 · 1 revision

This is a detailed guide for processing unlabelled LCMS data with ElMaven. We advise you to go through Getting started with ElMaven first. The general workflow in ElMaven is as follows:


ElMaven Workflow. Peak detection, alignment, grouping and scoring are done multiple times for best results. Data from two different sample sets can be compared using visualization tools and easily exported to other formats.

1. Load Sample

Load all the .mzXML files into ElMaven. If your data has blank samples, select them and choose Set as a blank sample, as depicted below. If the sample list is not visible, click on Show Samples on the right toolbar.

The data set provided does not have blanks; this step is shown for demonstration purposes only.

2. Alignment

Multiple runs in LCMS can lead to a drift in retention time across samples. Aligning the samples corrects for that drift. To perform alignment, click on the Align button on the top toolbar as shown.

You will see a dialog box for alignment settings. The first panel is for group selection criteria. 'Group' here refers to the set of LCMS peaks caused by a particular metabolic ion in all the samples. Groups with high quality peaks are used for alignment purposes.

The first entry in the panel, Group must contain at least [X] good peaks, defines the minimum number of samples in which a peak must be present for its peak group to be considered for alignment. Its value should therefore not exceed the number of samples in your experiment. The second entry limits the total number of groups used in an alignment. Setting it too high increases the computation time, while setting it too low might cause important peak groups to be missed. The default value of 1000 is considered appropriate for most experiments. The third entry, Peak Grouping Window, sets the number of scans used as the window when grouping peaks. Enter a higher number if reproducibility is low.

The next panel is for Peak Selection settings. The appropriate Minimum Peak Intensity value depends on the instrumentation. Minimum Peak S/N Ratio is the lowest signal-to-noise ratio a peak may have and still be used, and Minimum Peak Width is the smallest number of scans a peak must span. If you are performing untargeted feature detection, choose the Automated Peak Detection algorithm for peak detection.

In the Alignment Algorithm panel, fill in the number of times ElMaven should fit a model to the data in order to align it. The polynomial degree is the degree of the non-linear model. Recommended settings are entered by default. Click the Align button at the bottom.
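To make the idea of fitting a polynomial model to retention-time drift concrete, here is a minimal sketch in pure Python. It is not ElMaven's actual implementation: the function names `polyfit` and `align_retention_times`, the anchor data, and the single-pass fit are all assumptions made for illustration. It assumes a set of anchor peak groups whose retention times are known both in one sample and in a reference, fits the deviation as a polynomial of the observed retention time, and applies that correction.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ sum(c[k] * x**k).
    """
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def align_retention_times(sample_rts, reference_rts, degree=2):
    """Return a function that maps an observed rt to its corrected rt."""
    deviations = [ref - rt for rt, ref in zip(sample_rts, reference_rts)]
    coeffs = polyfit(sample_rts, deviations, degree)

    def correct(rt):
        return rt + sum(c * rt ** k for k, c in enumerate(coeffs))

    return correct


# Hypothetical anchor groups: observed rt in one sample vs. reference rt (minutes).
sample_rts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reference_rts = [rt + 0.05 + 0.02 * rt - 0.003 * rt ** 2 for rt in sample_rts]

correct = align_retention_times(sample_rts, reference_rts, degree=2)
corrected = [correct(rt) for rt in sample_rts]
```

In this sketch the polynomial degree plays the role of the degree setting in the dialog. An iterative variant would re-select good groups and refit after each pass, which is one plausible reading of the iteration count.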

Alignment Analysis

Once the process is complete, an alignment graph is created that plots the deviation in retention time across samples against the retention time. You can also see the extracted ion chromatogram (EIC) for all samples at a particular retention time. To make sure the alignment has been done properly, enter an m/z value in the top-right text box and look for peaks in the EIC. If no peak is visible, zoom out by clicking on the magnifying glass icon above the EIC panel. The ppm (parts per million) window beside the m/z text box should reflect the mass accuracy of the instrument. If there are still no peaks visible, increase the value in the ppm window. Conversely, if there is too much noise in the data, decrease the value in the ppm window and check again.

The circles on top of the EIC indicate the quality of each peak: the larger the circle, the higher the peak quality.

You can also see a bar graph above the EIC. By default, it shows the Area Top, i.e. the average intensity of the top three points of a peak. You can also display the Peak Area, retention time, quality, etc. of the peak by choosing the appropriate option in the drop-down menu in the top-right corner.

Repeat the process for a few m/z values and check how well the peaks have been aligned. If the alignment is unsatisfactory, change the alignment settings, in particular the grouping window size and peak intensity, and perform the alignment again. After successful alignment of the samples, you may save your workspace as a .mzroll file from the File menu for future use.
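The effect of the ppm window on what appears in the EIC can be sketched as follows. This is an illustrative example, not ElMaven code: the `extract_eic` function, the scan layout, and the choice of taking the maximum intensity inside the window are assumptions.

```python
def extract_eic(scans, target_mz, ppm):
    """Build an EIC as a list of (rt, intensity) pairs.

    scans: list of (rt, [(mz, intensity), ...]) tuples.
    For each scan, keep the highest intensity whose m/z lies within
    +/- ppm of target_mz (0.0 if nothing falls inside the window).
    """
    tolerance = target_mz * ppm / 1e6
    eic = []
    for rt, peaks in scans:
        inside = [i for mz, i in peaks if abs(mz - target_mz) <= tolerance]
        eic.append((rt, max(inside) if inside else 0.0))
    return eic


# Hypothetical scans: (retention time in minutes, centroided peaks).
scans = [
    (1.0, [(180.0630, 100.0), (181.0000, 50.0)]),
    (1.1, [(180.0640, 300.0)]),
    (1.2, [(180.0700, 900.0)]),
]

narrow = extract_eic(scans, target_mz=180.0634, ppm=10)
wide = extract_eic(scans, target_mz=180.0634, ppm=50)
```

At 10 ppm the window is about 0.0018 m/z wide on either side, so the third scan contributes nothing. At 50 ppm it is picked up, which is how widening the window can reveal a peak and also admit noise.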
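As a small illustration of the two quantities mentioned above, the sketch below computes an Area Top style value and a peak area from a toy peak. The function names are hypothetical, "top three points" is read here as the three highest intensities, and the trapezoidal rule is an assumed way to integrate the area. ElMaven's own definitions may differ in detail.

```python
def area_top(intensities, n_points=3):
    """Average of the n_points highest intensities in the peak."""
    top = sorted(intensities, reverse=True)[:n_points]
    return sum(top) / len(top)


def peak_area(rts, intensities):
    """Area under the peak using the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(rts)):
        width = rts[k] - rts[k - 1]
        area += width * (intensities[k] + intensities[k - 1]) / 2.0
    return area


# Toy peak sampled at five scans.
rts = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [10.0, 50.0, 100.0, 60.0, 20.0]

top_value = area_top(intensities)        # (100 + 60 + 50) / 3 = 70.0
area_value = peak_area(rts, intensities)  # 30 + 75 + 80 + 40 = 225.0
```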

3. Peak Detection

Peak detection is performed after alignment to find more peaks/features. The algorithm groups identical peaks across samples and assigns each a quality score computed by a machine learning algorithm. Click on the Peaks icon on the top toolbar to open the Peak Detection settings window.

EIC Processing and Filtering

By default, the EIC Processing and Filtering tab is open. Fill in the user settings in the EIC Processing panel. There are three algorithms provided for EIC smoothing:

1) Savitzky-Golay: a low-pass filter that fits a polynomial by least squares to a small window of data points and takes the value of the fitted curve at the center of the window as the smoothed data point. It preserves the original shape and features of the signal better than most other filters.
2) Gaussian: reduces noise and detail by averaging over a neighborhood, with the central point given the highest weight.
3) Moving Average: takes the simple average of the points in the window and is therefore suitable only when there are no trends in the data. It is the least preferred method for smoothing.

Specify the EIC Smoothing Window size; larger values lead to greater smoothing. Peak Grouping defines the rt window used when grouping the peaks. The value depends on the reproducibility of your experiment: use a larger window if there is more variability between samples.

Fill in the settings for Baseline Calculation. ElMaven calculates a baseline for each EIC, which is used in the signal/noise ratio and peak area calculations. To do so, it removes the top x% of data points in the original EIC and then smooths the result based on the value of Baseline Smoothing. The value of x is entered by the user in the Drop top x% intensities from chromatogram text box.
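The three smoothing approaches can be compared with a short pure-Python sketch. It is illustrative only and does not reproduce ElMaven's code: the `smooth` helper, the edge handling (repeating the end values), and the fixed 5-point window are assumptions. The Savitzky-Golay coefficients used are the standard 5-point quadratic set (-3, 12, 17, 12, -3)/35.

```python
import math


def smooth(signal, kernel):
    """Apply a symmetric, odd-length kernel; edges repeat the end values."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # clamp index at the edges
            acc += weight * signal[j]
        out.append(acc)
    return out


def moving_average_kernel(window):
    """Equal weights over the window."""
    return [1.0 / window] * window


def gaussian_kernel(window, sigma):
    """Normalized Gaussian weights, highest at the center."""
    half = window // 2
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SAVGOL_5 = [c / 35.0 for c in (-3, 12, 17, 12, -3)]

# Toy signal with curvature: intensity = scan_index ** 2.
eic = [float(x * x) for x in range(10)]

sg = smooth(eic, SAVGOL_5)
ma = smooth(eic, moving_average_kernel(5))
ga = smooth(eic, gaussian_kernel(5, sigma=1.0))
```

On this curved toy signal the Savitzky-Golay filter returns the interior points unchanged, while the moving average shifts them the most and the Gaussian filter falls in between. This matches the shape-preservation ordering described in the list above.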
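The baseline procedure described above can be sketched as follows. This is a simplified illustration, not ElMaven's implementation: the `estimate_baseline` name is hypothetical, and two details are assumptions, namely that "removing" the top x% means clipping those points to the cutoff intensity, and that the smoothing step is a moving average.

```python
def estimate_baseline(intensities, drop_top_percent, smoothing_window):
    """Estimate an EIC baseline.

    1. Find the intensity cutoff below which (100 - x)% of the points lie.
    2. Clip every point above the cutoff down to it (assumed behavior).
    3. Smooth the clipped trace with a moving average.
    """
    n = len(intensities)
    n_drop = int(n * drop_top_percent / 100)
    n_keep = max(1, n - n_drop)
    cutoff = sorted(intensities)[n_keep - 1]
    clipped = [min(v, cutoff) for v in intensities]

    half = smoothing_window // 2
    baseline = []
    for i in range(n):
        window = clipped[max(0, i - half):min(n, i + half + 1)]
        baseline.append(sum(window) / len(window))
    return baseline


# Toy EIC: flat background of 5 with one peak.
eic = [5.0, 5.0, 5.0, 5.0, 100.0, 200.0, 100.0, 5.0, 5.0, 5.0]
baseline = estimate_baseline(eic, drop_top_percent=30, smoothing_window=5)

# Signal/baseline ratio at the apex of the peak.
apex = eic.index(max(eic))
signal_to_baseline = eic[apex] / baseline[apex]
```

With the top 30% of points dropped, the three peak points no longer influence the estimate and the baseline stays at the background level of 5, giving a signal/baseline ratio of 40 at the apex.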

Fill in the settings for Peak Scoring. As mentioned earlier, a machine learning algorithm in ElMaven assigns a quality score to every peak based on the probability that it represents a genuine analyte. The parameters can be adjusted to modify the scoring without retraining the algorithm. If a trained model file is available for the data, you can load it as the Peak Classifier Model File. Enter the Min Peak Intensity below which peaks cannot be quantified reliably. For Peak Width, enter the desired scanning rate to distinguish a peak from baseline and noise; a higher value leads to broader peaks. Min Signal/Blank Ratio helps reduce noise by eliminating peaks that are also present in the blank. Similarly, Min Signal/Baseline Ratio eliminates noise in the signal itself. For Min Good Peak/Group, enter a value less than or equal to the number of samples being analyzed. It defines the minimum number of samples in a group that must have a peak at a particular m/z and rt combination.
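How these thresholds might combine to accept or reject a peak group is sketched below. The `Peak` record, the function names, and the `min_quality` threshold are hypothetical and only loosely mirror the dialog labels. The actual scoring in ElMaven is done by its trained classifier, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    sample: str
    intensity: float        # peak intensity in this sample
    baseline: float         # baseline level under the peak
    blank_intensity: float  # intensity at the same m/z and rt in the blank
    quality: float          # classifier score between 0 and 1


def is_good_peak(peak, min_intensity, min_signal_baseline,
                 min_signal_blank, min_quality):
    """Apply each threshold in turn; multiplication avoids dividing by zero."""
    if peak.intensity < min_intensity:
        return False
    if peak.intensity < min_signal_baseline * peak.baseline:
        return False
    if peak.intensity < min_signal_blank * peak.blank_intensity:
        return False
    return peak.quality >= min_quality


def keep_group(peaks, min_good_peaks, **thresholds):
    """A group is kept if enough of its peaks pass every threshold."""
    good = [p for p in peaks if is_good_peak(p, **thresholds)]
    return len(good) >= min_good_peaks


# Hypothetical peak group observed in three samples.
group = [
    Peak("sample1", 5e5, 1e4, 2e3, 0.9),
    Peak("sample2", 4e5, 1e4, 2e3, 0.8),
    Peak("sample3", 2e3, 1e4, 2e3, 0.2),
]
thresholds = dict(min_intensity=1e4, min_signal_baseline=3,
                  min_signal_blank=2, min_quality=0.5)

kept_with_two = keep_group(group, min_good_peaks=2, **thresholds)
kept_with_three = keep_group(group, min_good_peaks=3, **thresholds)
```

Here two of the three peaks pass, so the group survives when at least two good peaks are required but not when all three are, which is why the Min Good Peak/Group value should not exceed the number of samples.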

Feature Detection Selection

Move to the Feature Detection Selection tab. For untargeted feature detection, check the box titled Automated Feature Detection and enter the Mass Domain Resolution, which should be equal to or greater than the resolution of the instrument in ppm (parts per million). Time Domain Resolution is the average number of scans across a peak; this value also depends on the instrument. If you are looking for peaks in a particular m/z, intensity or rt range, you can adjust the ranges. There is an Auto Detect and Ignore Isotope option in case you do not wish to find isotopic peaks.

The Match Fragmentation panel for LC/MS/MS data processing is currently under development.

Check the Report Isotopic Peaks box if you have labeled data. Clicking on the Isotope Detection Options will open the user settings window for Isotope Detection. Choose the labels present in your sample and how you wish to visualize them. Click on Find Peaks to perform Peak Detection.

Manual data inspection

A new table will appear at the bottom with peak groups as rows and their associated m/z, retention time, max intensity, etc. as columns. To hide/unhide this table, click on the newly created, lowermost icon on the right toolbar. A new icon will be created every time you perform peak detection. Click on any row to look at the detected peak groups across samples.

You can go through the table and mark the peaks as 'good' by clicking on the blue check mark on the toolbar above the table and 'bad' by clicking on the orange X. You can also right-click an entry and mark it as good or bad. There are multiple export options available for peak data: you can generate a PDF report to save the EIC for every metabolite, export data for a particular group in .csv format, or export the EICs to a JSON file.

4. Comparing samples from different sets

Suppose you are analyzing samples that differ in a certain respect, for example, samples from diseased cells and healthy cells. In that case, you would want to compare the EICs from the two sets of samples to study the differences between them. By default, all sample files in the samples panel are marked as set 'A'. You can go through the list and classify the samples by double-clicking the Set column and entering a text label for each row. If your signal needs to be normalized by a scaling factor, double-click the Scaling column and enter the factor by which that sample should be scaled. Normalization may be required if, for example, the sample volumes or the dilution factors differ between the two sets. Repeat the peak detection process after you change the scaling factor.

Select the Scatter Plot icon in the toolbar above the table to open the Compare Samples window. Set the min p. value, Min. Intensity and the number of Min. Good Samples to filter the data. The minimum number of good samples should be less than or equal to the number of samples being analyzed. The circle size in the scatter plot represents the log2 fold change between the two sets. Enter the min LOG2 Fold Diff to be used as the cut-off. Set Missing Value substitutes the entered value wherever data is missing. Select the two sets to be compared in the top panel, then click Compare Sets and Done.
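The log2 fold change and the missing-value substitution can be illustrated with a short sketch. The function names and the choice of comparing set means are assumptions for illustration, and the p-value filter is not reproduced here.

```python
import math


def log2_fold_change(set_a, set_b, missing_value=100.0):
    """log2(mean(B) / mean(A)); None entries are replaced by missing_value."""
    a = [missing_value if v is None else v for v in set_a]
    b = [missing_value if v is None else v for v in set_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return math.log2(mean_b / mean_a)


def passes_fold_cutoff(fold_change, min_log2_fold_diff):
    """Keep groups whose change in either direction reaches the cut-off."""
    return abs(fold_change) >= min_log2_fold_diff


# Hypothetical intensities of one peak group in two sets of samples.
set_a = [1000.0, None]      # one sample has no detected peak
set_b = [2200.0, 2200.0]

fc = log2_fold_change(set_a, set_b, missing_value=100.0)
keep = passes_fold_cutoff(fc, min_log2_fold_diff=1.0)
```

With the missing value filled in as 100, set A averages 550 and set B averages 2200, a four-fold difference, so the log2 fold change is 2 and the group passes a cut-off of 1. The choice of missing value matters: a very small fill value inflates the fold change for groups with missing data.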

Analyze the resulting plot. In this example, set B samples have consistently higher intensities than set A samples. Most of the points falling on the diagonal line would indicate high similarity between the sets. Click on a data point (circle) to look at the associated EIC. The Show Spectra widget on the right toolbar opens the mass spectrum of the selected data point in a new window. The x-axis represents the m/z value and the y-axis represents the intensity. Double-click on your ion of interest so that it is marked red. Pointing your cursor at the next peak will show the percentage difference in abundance between the two ions. If your data has C13 ions, the m/z difference between the C12 and C13 peaks will be approximately 1.0033 for a singly charged ion. You can look for other isotopes in the same way.
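The C12/C13 spacing can be checked numerically. The sketch below uses the known mass difference between 13C and 12C (about 1.003355 Da). The function names and example m/z values are hypothetical, and the spacing is divided by the charge so that the example also covers multiply charged ions.

```python
C13_C12_MASS_DIFF = 1.003355  # mass difference between 13C and 12C, in Da


def labelled_mz(monoisotopic_mz, n_c13, charge=1):
    """Expected m/z of the ion carrying n_c13 carbon-13 atoms."""
    return monoisotopic_mz + n_c13 * C13_C12_MASS_DIFF / abs(charge)


def count_c13(mz_parent, mz_peak, charge=1):
    """Estimate how many 13C atoms separate a peak from the parent ion."""
    return round((mz_peak - mz_parent) * abs(charge) / C13_C12_MASS_DIFF)


# Hypothetical singly charged parent ion at m/z 179.0561.
m_plus_1 = labelled_mz(179.0561, n_c13=1)
n_labels = count_c13(179.0561, 181.0628)
```

For a doubly charged ion the same single 13C substitution shifts the peak by only about 0.5017 m/z, so the spacing seen in the spectrum also hints at the charge state.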

5. Compound Identification

The m/z value can be used to look up the compound formula in the KEGG database. Click on the Show Match Compound widget on the right toolbar to obtain a list of all compounds in KEGG that match the provided m/z value or the mass of the selected EIC. The error margin can be adjusted in the compound search window.
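Matching a mass against a compound list within an error margin can be sketched as follows. This is an illustration rather than ElMaven's lookup: the small `COMPOUNDS` table is hand-picked and is not an extract of KEGG, the `match_compounds` function is hypothetical, and only [M-H]- and [M+H]+ ions are assumed.

```python
# Illustrative reference table: (name, formula, monoisotopic neutral mass in Da).
# Hand-picked for this example; not an extract of the KEGG database.
COMPOUNDS = [
    ("Pyruvic acid", "C3H4O3", 88.01604),
    ("Lactic acid", "C3H6O3", 90.03169),
    ("D-Glucose", "C6H12O6", 180.06339),
    ("D-Fructose", "C6H12O6", 180.06339),
    ("Citric acid", "C6H8O7", 192.02700),
]

PROTON_MASS = 1.007276  # Da


def match_compounds(observed_mz, ppm, polarity=-1):
    """Return (name, formula, error in ppm) for compounds within the margin.

    polarity=-1 assumes [M-H]- ions, polarity=+1 assumes [M+H]+ ions.
    """
    neutral_mass = observed_mz - polarity * PROTON_MASS
    hits = []
    for name, formula, mass in COMPOUNDS:
        error_ppm = (neutral_mass - mass) / mass * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((name, formula, round(error_ppm, 2)))
    return hits


# Hypothetical negative-mode ion observed at m/z 179.0561.
hits = match_compounds(179.0561, ppm=10, polarity=-1)
names = [name for name, _, _ in hits]
```

Both D-Glucose and D-Fructose are returned because they share the formula C6H12O6. An m/z match narrows the search to a formula, and isomers have to be told apart by other evidence such as retention time.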