sneumann edited this page Oct 24, 2016 · 10 revisions

This page collects updates on current xcms development.

New parallel processing support in XCMS

XCMS has supported parallel processing since 2008 in several functions that promise a linear speed-up when run in parallel on multiple input files, e.g. findPeaks() as used in the xcmsSet() function. The parallelism was controlled by the nSlaves argument.

Several mechanisms were supported. The first one, the Message Passing Interface (MPI), is the most powerful, as it is the standard on big HPC cluster systems. MPI is (still) a widespread standard for message passing (i.e. it covers more than just firing up a bunch of sub-tasks). It runs on single multi-core servers, but can also "glue" a whole HPC cluster into a seemingly single machine, and it can be integrated with batch systems such as Sun Grid Engine (SGE, which later evolved into the Oracle Grid Engine (OGE) and several other descendants). At one stage, we were able to use "nSlaves=100" on such a setup!

Later, other backend packages such as SNOW and parallel were added as well. They were tried in a fixed order until one was found to be installed, which was not very flexible.

In 2012 Martin Morgan started the BiocParallel package to provide a common interface to a number of different approaches for (massively) parallel execution. As part of the current xcms3 development effort, Johannes Rainer has reworked the parallel execution in xcms to use this interface. The benefit is that you now have much more control over parallel processing in XCMS.

Examples

some_code_snippets
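As a minimal sketch of what this control looks like, the snippet below uses only BiocParallel itself: it creates two backends, registers one as the session default, and runs a stand-in per-file task through bplapply(). The file names and the worker count are hypothetical, and the final xcmsSet() call is left as a comment because it assumes an xcms version whose xcmsSet() accepts a BPPARAM argument.

```r
library(BiocParallel)

# Choose a backend explicitly instead of relying on a fixed fallback order.
# SerialParam evaluates everything in the current R session (handy for debugging).
serial <- SerialParam()

# SnowParam uses separate R worker processes and works on all platforms;
# MulticoreParam (forking) is an alternative on Linux and macOS.
# The worker count of 2 is an arbitrary choice for illustration.
snow <- SnowParam(workers = 2)

# Register one backend as the session-wide default; bpparam() returns it.
register(serial)

# Stand-in for per-file processing: bplapply() is BiocParallel's parallel lapply().
files <- c("a.mzML", "b.mzML", "c.mzML")  # hypothetical file names
res <- bplapply(files, function(f) nchar(f), BPPARAM = bpparam())

# With xcms loaded, a backend object would be handed over in the same way
# (assumption: your xcms version supports the BPPARAM argument), e.g.
# xset <- xcmsSet(files, BPPARAM = snow)
```

Switching between serial and parallel execution then only means passing a different parameter object, without touching the processing code.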

We will deprecate the nSlaves argument in April 2017 and remove it in October 2017.

Why xcms3?

You might wonder why we jump from XCMS_1.51.X via 2.99.X straight to XCMS_3.0.0 in April 2017. The reason is that there are some quite substantial improvements behind the scenes. First, the code is being refactored, which means that functions change their names, some arguments change and R files are restructured. Most of this is invisible to the end user. During the reorganisation, Johannes also did a rigorous code review and spotted issues, e.g. in the binning functions used for plotting and matchedFilter. Some of the binning functions also contained optimisations that traded accuracy for speed and lower memory consumption. These were certainly important back in 2006, but now we can drop some of them in favour of consistent results. This implies that with xcms3 you might not be able to fully replicate all numbers you obtained with some 1.X.Y version. Finally, xcms3 also paves the way for a new on-disk format for xcmsRaw files, implemented in MSnbase by Laurent Gatto.

All of these developments are a good reason to bump the major version. In addition, there was already a paper by Paul Benton, "XCMS²: Processing Tandem Mass Spectrometry Data....". Due to formatting limitations on PubMed, this XCMS² always got changed to XCMS2, so to avoid confusion we decided to go straight to XCMS3!
