I'm a biologist currently pursuing a Master's degree in Biology at Université de Sherbrooke, in Canada. I also hold a bachelor's degree in Economics, and I speak French, English and German. I've been a craft beer specialist since 2011 and still have a foot in that sector in my hometown. My résumé can be found here: https://bit.ly/JRL_CV and a short bio is available on this page: https://www.laforestlab.com/team/jonathan-rondeau-leclaire
The Laforest-Lapointe Lab deals in microbial ecology. I apply several bioinformatic tools in the context of microbiome studies in humans and plants. I have been testing all kinds of tools and implementing them in pipelines suited to the needs of my colleagues' research. I am currently working on a Boreal Moss Microbiome project and a Human Saliva Microbiome project, each of which presents several technical and methodological challenges. Alongside this, I try to steadily push my understanding of statistics and key ecological theories.
Since so many available tools are worthy of a place in any single step of a good metagenomic pipeline, I regularly review the literature to understand, and eventually advise on, which tool to use for a given task. Publications in metagenomics and microbial ecology often give no rationale for the choice of tools when multiple approaches exist for a given aim; this suggests that the choice is often based on what others have done. Any bias introduced by a given tool then spreads across the literature, and a tool with a high citation rate may give the false impression of being a gold standard.
I have tested and integrated on our supercomputer several upstream data processing tools that aim to solve similar problems, e.g. taxonomic classification, metabolic profiling, metagenome assembly, binning and clustering. For downstream analysis of the data these pipelines produce, I have also been testing a variety of approaches: widespread, _old school_ ones regarded as standards in ecology, as well as novel, cutting-edge ones adapted to the systemic properties of NGS data. My aim is to help choose among these tools and to understand where they disagree and why.
I am particularly interested in metagenome assembly as a tool to tackle the so-called _dark matter_ of metagenomes. From binning and clustering strategies to strain richness exploration, I think it's a fascinating way of looking at community characteristics while stepping as far away as possible from the biases inherent to reference-based methods, even though all approaches eventually need references to attribute taxonomy or functional potential.
I am also a strong, albeit amateur, proponent of proper language in microbiome research. In metagenomics, we are looking not at the abundance of taxa, but at that of the reads we attribute to them. We are looking not at the diversity of the _gut microbiota_, but at that of a _sample's microbiome_, which is proposed to be an _estimator_ of actual diversity. Although my formal statistical training is only intermediate, I understand that the properties of NGS data impose limitations on how they can be analysed, a notion that many studies fail to mention, let alone address. I always look forward to discussing these matters with knowledgeable people, all the more since there are different schools of thought out there. I hope one day to tackle these divergent approaches to a single problem creatively myself, and perhaps to unify them.
As of 2022, many bioinformatic tools have been reviewed, tested and benchmarked; however, the focus is often on computational requirements. I want to find out which tools are the most accurate and what biases each approach introduces. Although some benchmarking studies have produced such information for certain tools (especially assemblers and aligners), the answer is often non-trivial, especially with assembly-based (de novo) approaches.
Though gold standards may one day be recognized in metagenomic bioinformatics, many parameter and tool choices seem to depend on data quality and origin, research objectives, and the processing approach chosen. Since the field of metagenomics is evolving extremely fast, I expect that the best way to help researchers prepare their data for downstream analyses is not only to make the tools accessible, but also to guide them in choosing the right tools and their parameters.
In this regard, my goal is to summarize:
- which tools are better suited to various data processing tasks;
- their function and crucial parameters;
- the reviews supporting their accuracy;
- articles supporting the choice of parameters;
- any recommended best practices for their use.
I believe that such information will be valuable to any researcher who does not want to rely blindly on ready-made pipelines that may or may not suit their data and scientific objectives. I think any bioinformatic analysis should be understood and thought through before being run, and that this holds all the more for a tool-intensive process such as the processing of metagenome sequencing data.