TODO.md

Configuration and flexibility

  • Write a flexible configuration file so that the workflow can assemble:
    • one genome
    • multiple genomes
  • Select the adapter FASTA file according to the sequencing platform (NextSeq or MiSeq), or detect the platform from the filename
  • Download the adapter files from the Trimmomatic repository, and the PhiX genome (NC_001422.1) likewise, to enable full reproducibility
  • SPAdes provides an alternative to --careful, the --isolate flag (introduced in 3.14.0), which could be used for high-coverage (100x) isolate genomes. As always, no single setting fits every case
  • Likewise, run Recycler with -i True for isolates
  • Adjust the maximum k-mer length required by Recycler according to the SPAdes output
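The configuration items above could be handled by a few helper functions at the top of the Snakefile. This is a minimal sketch under stated assumptions, not the SOP: the config keys `sample`, `reads` and `samples`, the convention that the platform name appears in the read filename (e.g. `S1_NextSeq_R1.fastq.gz`), and the platform-to-adapter mapping are all hypothetical. The Recycler k value is read from the `K<k>` subdirectories that SPAdes creates in its output directory.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical platform -> Trimmomatic adapter file mapping. Which adapter set
# is right depends on the library prep, so these paths are placeholders.
ADAPTERS = {
    "nextseq": "resources/adapters/NexteraPE-PE.fa",
    "miseq": "resources/adapters/TruSeq3-PE-2.fa",
}


def samples_from_config(config: dict) -> Dict[str, List[str]]:
    """Normalise a one-genome or multi-genome config (hypothetical keys) into {sample: [R1, R2]}."""
    if "samples" in config:
        return {name: list(reads) for name, reads in config["samples"].items()}
    return {config["sample"]: list(config["reads"])}


def detect_platform(read_file: str) -> str:
    """Guess the platform from a token in the filename (assumed naming convention)."""
    name = Path(read_file).name.lower()
    for platform in ADAPTERS:
        if platform in name:
            return platform
    raise ValueError(f"cannot detect platform from {read_file!r}; set it in the config")


def adapter_file(read_file: str, platform: Optional[str] = None) -> str:
    """Prefer an explicit platform from the config, else detect it from the filename."""
    return ADAPTERS[platform.lower() if platform else detect_platform(read_file)]


def max_k_from_spades(spades_dir: str) -> int:
    """Largest k used by SPAdes, taken from its K<k> subdirectories (e.g. K21, K33, K55)."""
    ks = []
    for path in Path(spades_dir).iterdir():
        match = re.fullmatch(r"K(\d+)", path.name)
        if path.is_dir() and match:
            ks.append(int(match.group(1)))
    if not ks:
        raise FileNotFoundError(f"no K<k> directories found in {spades_dir}")
    return max(ks)
```

Normalising both config shapes into one dictionary lets every rule iterate over samples the same way, whether one or many genomes are assembled.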

Reporting and quality control

  • Count the contigs shorter than 1 kb and remove them with seqkit
  • Assess completeness and contamination with CheckM, but drop the plasmid check. Available in bioconda (1.1.3)
  • Annotate the genome with Bakta (including 5S rRNA extraction)
  • Extract the LSU (23S) and SSU (16S) rRNA genes with Metaxa2
  • Assess contamination with MDMcleaner
  • Compute base-pair statistics and coverage with seqkit
  • Compute assembly statistics with QUAST
  • Generate MD5 checksums of the gzipped raw reads and of the final genome for deposition on Coscine
  • Collect the statistics above into a standards-compliant table based on the sample table
  • Include a report rule
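For the contig-length and checksum items, a pure-Python sketch makes explicit what the seqkit and MD5 steps compute; inside the workflow, `seqkit seq -m 1000` would do the length filtering. The function names and the 60-column output width are illustrative choices. The checksum is taken over the bytes of the `.gz` file as stored, matching the deposition requirement, not over the decompressed reads.

```python
import gzip
import hashlib
from typing import Iterator, Tuple

MIN_LEN = 1000  # 1 kb threshold from the list above


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short_contigs(src: str, dst: str, min_len: int = MIN_LEN) -> int:
    """Write contigs of at least min_len bp to dst; return how many were dropped."""
    dropped = 0
    with open(dst, "w") as out:
        for header, seq in read_fasta(src):
            if len(seq) < min_len:
                dropped += 1
                continue
            out.write(f">{header}\n")
            for i in range(0, len(seq), 60):  # wrap sequence lines at 60 columns
                out.write(seq[i:i + 60] + "\n")
    return dropped


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of the file bytes as stored (for .gz files: the compressed file itself)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The dropped-contig count returned by `filter_short_contigs` is the number that would go into the statistics table.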

Convert the SOP steps into a Snakemake workflow

The steps are listed in reverse order because that makes the Snakemake design easier, since Snakemake works backwards from the requested outputs. Each subsection could become a separate Snakefile to be included in the main one.

File management

  • Generate the genome FASTA file only
  • Generate the genome FASTA file including plasmids, if any are present (consider Snakemake checkpoints to evaluate this condition)
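The plasmid condition could be expressed as a small function called from a checkpoint-aware input function. This is a sketch under assumptions: the `results/{sample}/...` layout is hypothetical, and it assumes the plasmid FASTA produced upstream contains one header per plasmid. In a Snakefile, `genome_outputs` would receive the path returned by `checkpoints.<rule>.get(sample=...).output[0]`, so the decision is only taken after the checkpoint has run.

```python
from pathlib import Path
from typing import List


def has_plasmids(plasmid_fasta: str) -> bool:
    """True if the plasmid FASTA exists and holds at least one record header."""
    path = Path(plasmid_fasta)
    if not path.exists():
        return False
    with path.open() as handle:
        return any(line.startswith(">") for line in handle)


def genome_outputs(sample: str, plasmid_fasta: str) -> List[str]:
    """Final files for a sample (hypothetical paths); add the plasmid version only if needed."""
    outputs = [f"results/{sample}/genome.fasta"]
    if has_plasmids(plasmid_fasta):
        outputs.append(f"results/{sample}/genome_with_plasmids.fasta")
    return outputs


def write_genome_with_plasmids(genome: str, plasmids: str, dst: str) -> None:
    """Concatenate genome contigs and plasmid sequences into one FASTA file."""
    with open(dst, "w") as out:
        for src in (genome, plasmids):
            with open(src) as handle:
                text = handle.read()
            if text and not text.endswith("\n"):
                text += "\n"  # keep records separated when a file lacks a final newline
            out.write(text)
```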

Genome assembly

  • Assemble with SPAdes (v3.13.1). A Snakemake wrapper exists only for metaSPAdes. Available in bioconda (3.15.3)
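The choice between --careful and --isolate can be made explicit when the SPAdes call is built. A minimal sketch, assuming the SPAdes version is recorded in the config: because --isolate only exists from 3.14.0 and cannot be combined with --careful, the function falls back to --careful for older releases such as the v3.13.1 used in the SOP. The default thread count is arbitrary.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'3.13.1' -> (3, 13, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def spades_command(r1: str, r2: str, outdir: str, version: str = "3.15.3",
                   isolate: bool = False, threads: int = 8) -> List[str]:
    """Build a spades.py argument list with exactly one of --isolate or --careful."""
    cmd = ["spades.py", "-1", r1, "-2", r2, "-o", outdir, "-t", str(threads)]
    if isolate and parse_version(version) >= (3, 14, 0):
        cmd.append("--isolate")  # available from SPAdes 3.14.0
    else:
        cmd.append("--careful")  # older versions, or non-isolate data
    return cmd
```

Returning an argument list rather than a string keeps it usable with `subprocess.run(cmd, check=True)` without shell quoting issues.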

Plasmid reconstruction

  • Remove reads matching the plasmid contigs with BBDuk, included in BBMap (v38.84). Snakemake wrapper available (38.90)
  • Extract plasmid sequences with Recycler (version unknown) from the de novo assembly graph and the read alignment. Available in bioconda (v0.7)
  • Manage BAM/SAM files with samtools (v0.1.19). Snakemake wrapper available (1.10)
  • Align the reads to the assembly graph with bwa mem (v0.7.5). Snakemake wrapper available (0.7.17)
  • Index the assembly graph with bwa (v0.7.5). Snakemake wrapper available (0.7.17)
  • Convert the assembly graph to FASTA with make_fasta_from_fastg from Recycler (0.62). Available in bioconda (0.7-3)
  • Reconstruct plasmids with plasmidSPAdes (v3.13.1). A Snakemake wrapper exists only for metaSPAdes. Available in bioconda (3.15.3)
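Read in forward order, the plasmid steps above chain together as shell commands. The sketch follows the usage shown in the Recycler README but is not the SOP's exact invocation. Assumptions: the graph is plasmidSPAdes' `assembly_graph.fastg`, Recycler writes `<graph prefix>.cycs.fasta` next to the graph, the output file names are hypothetical, and the BBDuk settings `k=31 hdist=1` are common values rather than validated ones. The samtools syntax targets current releases; v0.1.19 used a different `sort` syntax.

```python
import shlex
from pathlib import Path
from typing import List


def plasmid_commands(r1: str, r2: str, outdir: str, max_k: int, threads: int = 4) -> List[str]:
    """Shell commands for plasmid reconstruction, in execution order (a sketch)."""
    q = shlex.quote
    out = Path(outdir)
    fastg = out / "plasmidspades" / "assembly_graph.fastg"
    nodes = out / "assembly_graph.nodes.fasta"
    bam = out / "reads_pe_primary.sort.bam"
    # Assumed Recycler output name: <graph prefix>.cycs.fasta next to the graph.
    cycs = fastg.parent / (fastg.stem + ".cycs.fasta")
    return [
        # plasmidSPAdes assembly
        f"spades.py --plasmid -1 {q(r1)} -2 {q(r2)} -o {q(str(fastg.parent))} -t {threads}",
        # assembly graph -> FASTA of graph nodes
        f"make_fasta_from_fastg.py -g {q(str(fastg))} -o {q(str(nodes))}",
        # index the node sequences for bwa
        f"bwa index {q(str(nodes))}",
        # align reads, drop supplementary alignments (flag 0x800), sort
        f"bwa mem -t {threads} {q(str(nodes))} {q(r1)} {q(r2)} "
        f"| samtools view -b -F 0x0800 - | samtools sort -o {q(str(bam))} -",
        f"samtools index {q(str(bam))}",
        # Recycler on isolate data, with the assembler's maximum k
        f"recycle.py -g {q(str(fastg))} -k {max_k} -b {q(str(bam))} -i True",
        # keep only reads that do NOT match the plasmid sequences (out1/out2)
        f"bbduk.sh in1={q(r1)} in2={q(r2)} "
        f"out1={q(str(out / 'R1.noplasmid.fastq.gz'))} out2={q(str(out / 'R2.noplasmid.fastq.gz'))} "
        f"ref={q(str(cycs))} k=31 hdist=1",
    ]
```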

Quality filtering

  • Remove PhiX sequences from the reads with BBDuk, included in BBMap (v38.84). Snakemake wrapper available
  • Trim adapters and filter by length with Trimmomatic (v0.39). A Snakemake wrapper is available but uses an older version (0.36), so use the bioconda package instead.
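In forward order, quality filtering is adapter trimming followed by PhiX removal on the surviving read pairs. A sketch of the two calls, assuming the bioconda `trimmomatic` launcher and hypothetical output names; the ILLUMINACLIP values `2:30:10` and `MINLEN:36` are the Trimmomatic manual's example settings, not tuned values. The PhiX reference URL uses NCBI E-utilities so the genome can be fetched once into a (hypothetical) `resources/` path for reproducibility.

```python
from typing import List

# NCBI E-utilities URL for the PhiX genome (NC_001422.1); download once and keep
# the file under version control. PHIX_REF is a hypothetical local path.
PHIX_URL = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
            "?db=nuccore&id=NC_001422.1&rettype=fasta&retmode=text")
PHIX_REF = "resources/phix_NC_001422.1.fasta"


def trimmomatic_command(r1: str, r2: str, prefix: str, adapters: str,
                        threads: int = 4, min_len: int = 36) -> List[str]:
    """Paired-end adapter clipping and length filtering with Trimmomatic."""
    return [
        "trimmomatic", "PE", "-threads", str(threads), r1, r2,
        f"{prefix}_R1.paired.fastq.gz", f"{prefix}_R1.unpaired.fastq.gz",
        f"{prefix}_R2.paired.fastq.gz", f"{prefix}_R2.unpaired.fastq.gz",
        # seed mismatches : palindrome clip threshold : simple clip threshold
        f"ILLUMINACLIP:{adapters}:2:30:10",
        f"MINLEN:{min_len}",
    ]


def bbduk_phix_command(r1: str, r2: str, prefix: str, ref: str = PHIX_REF) -> List[str]:
    """Discard pairs sharing k-mers with PhiX; unmatched reads are written to out1/out2."""
    return [
        "bbduk.sh", f"in1={r1}", f"in2={r2}",
        f"out1={prefix}_R1.nophix.fastq.gz", f"out2={prefix}_R2.nophix.fastq.gz",
        f"ref={ref}", "k=31", "hdist=1", f"stats={prefix}_phix_stats.txt",
    ]


def quality_filtering(r1: str, r2: str, prefix: str, adapters: str) -> List[List[str]]:
    """Trimming first, then PhiX removal on the paired Trimmomatic outputs only."""
    paired_r1 = f"{prefix}_R1.paired.fastq.gz"
    paired_r2 = f"{prefix}_R2.paired.fastq.gz"
    return [trimmomatic_command(r1, r2, prefix, adapters),
            bbduk_phix_command(paired_r1, paired_r2, prefix)]
```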