Prepare data for wastewater materials #43

tavareshugo · 2023-12-21T08:35:47Z

We have a subset of 60 samples from Karthikeyan et al. that show a transition between delta and omicron variants.

To run it live in a workshop, the pipeline needs to finish in under 1h.
If that is not possible, we may instead need to provide a preprocessed folder with results from the 60 samples and have participants process only 5-10 samples during the workshop.

- Test how long it takes to run 30 samples. Locally, this ran for over 2h without finishing (we killed the process).
- Test running only 5 samples with default options (see comment below).
- Test using --freyja_repeats 0 with viralrecon on a small number of samples, to see whether this step can be skipped to save pipeline run time. If this throws an error, use 1 and see if that works.
- Find primer locations for the kit used in the publication: Swift Normalase Amplicon Panels (SNAP) kit (PN: SN-5X296 (core) COVG1V2-96 (amplicon primers), Integrated DNA Technologies).
  - Hugo contacted idtdna, who now commercialise this product.
  - Bajuna will open an issue on the C-VIEW repo to ask what they did in the publication.
- Run 3 samples with the SWIFT BED file passed directly using the --primer_bed option. Also download the reference FASTA and GFF files and pass them directly with --fasta and --gff.
- Prepare participant data directories with the files needed for the workshop. There is a folder on the HPC under sars-wastewater/participants for this. It includes:
  - data/reads - FASTQ files for the 5 samples to be processed
  - resources - reference genome FASTA and GFF annotation (may be useful for some analysis)
  - preprocessed - results from 30 samples
  - scripts - shell scripts that participants will fix in the exercises
  - utilities - Python scripts we provide, e.g. to prepare the samplesheet or tidy Freyja output files
  - sample_info.csv - metadata table with columns "sample,date,country,location,latitude,longitude"
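
As an illustration of the kind of utility script mentioned above, here is a minimal sketch that builds a samplesheet from a reads directory. It assumes the Illumina samplesheet columns sample,fastq_1,fastq_2 and a hypothetical file naming scheme of <sample>_R1.fastq.gz / <sample>_R2.fastq.gz; the function name build_samplesheet is made up for this example and the actual workshop script may differ.

```python
import csv
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"  # assumed naming scheme, adjust to the real files
R2_SUFFIX = "_R2.fastq.gz"


def build_samplesheet(reads_dir, out_csv):
    """Write a CSV with columns sample,fastq_1,fastq_2 and return its rows."""
    reads_dir = Path(reads_dir)
    rows = []
    for r1 in sorted(reads_dir.glob("*" + R1_SUFFIX)):
        # Sample name is the file name without the forward-read suffix.
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = reads_dir / (sample + R2_SUFFIX)
        # Leave fastq_2 empty when no reverse-read file is found.
        rows.append([sample, str(r1), str(r2) if r2.exists() else ""])
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2"])
        writer.writerows(rows)
    return rows

# Example call (paths are illustrative):
# build_samplesheet("data/reads", "samplesheet.csv")
```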

tavareshugo · 2023-12-21T11:38:28Z

Command to run pipeline:

nextflow run nf-core/viralrecon \
  -profile singularity  -r dev  \
  --max_memory '16.GB' --max_cpus 8 \
  --input samplesheet.csv \
  --genome 'MN908947.3' \
  --platform illumina \
  --protocol amplicon \
  --primer_set artic \
  --primer_set_version 4.1 \
  --outdir results/viralrecon \
  --variant_caller ivar \
  --consensus_caller ivar \
  --skip_picard_metrics \
  --skip_asciigenome \
  --skip_assembly

bsalehe · 2024-01-05T09:55:45Z

Hi @tavareshugo, it took me 2h 11m 9s to run 3 samples on Ubuntu, so 5 samples may take at least 4 hours. I think we may end up using only the preprocessed results instead?
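
For reference, a rough way to sanity-check that estimate is a simple linear extrapolation from the observed timing. This is only a sketch: it assumes run time scales proportionally with the number of samples, which may not hold because the pipeline runs tasks in parallel within the CPU and memory limits. The helper names are made up for this example.

```python
def extrapolate_runtime(observed_seconds, observed_samples, target_samples):
    """Linear estimate: assumes run time is proportional to sample count."""
    return observed_seconds / observed_samples * target_samples


def format_hms(seconds):
    """Format a duration in seconds as 'Xh Ym Zs'."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


observed = 2 * 3600 + 11 * 60 + 9  # 2h 11m 9s for 3 samples
estimate = extrapolate_runtime(observed, 3, 5)
print(format_hms(estimate))  # 3h 38m 35s
```

Under that assumption the estimate for 5 samples is a little under 4 hours, which is still far above the 1h target.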

tavareshugo · 2024-01-05T12:37:55Z

Bajuna tested running 3 samples with --freyja_repeats 1 and it ran in 50 minutes. This is a good alternative to run in the workshop.

--freyja_repeats 0 throws an error.

bsalehe · 2024-01-05T17:38:20Z

For training purposes, we could instruct participants to download the BED file for the SWIFT_V2 primer set version, which Freyja also appears to use, judging from the code.

bsalehe · 2024-01-16T16:02:26Z

I consistently get this error:

Command error:
  Traceback (most recent call last):
    File "/home/bajuna/.nextflow/assets/nf-core/viralrecon/bin/collapse_primer_bed.py", line 86, in <module>
      sys.exit(main())
    File "/home/bajuna/.nextflow/assets/nf-core/viralrecon/bin/collapse_primer_bed.py", line 82, in main
      collapse_primer_bed(args.FILE_IN, args.FILE_OUT, args.LEFT_PRIMER_SUFFIX, args.RIGHT_PRIMER_SUFFIX)
    File "/home/bajuna/.nextflow/assets/nf-core/viralrecon/bin/collapse_primer_bed.py", line 58, in collapse_primer_bed
      chrom, start, end, name, score, strand = line.strip().split("\t")
  ValueError: not enough values to unpack (expected 6, got 4)

Work dir:
  /home/bajuna/wastewater_surveillance/work/68/2c5eed79b029de1fa0c8e8530cf35b

I think the BED file, even after trying to convert it to BED6 as instructed, is still missing columns (the error shows 4 fields where 6 are expected), apparently the score and strand columns that the ARTIC BED file has. I have also found a swift_v2_masterfile. So I wonder whether we need to extract the left and right primer sequences from it into the SWIFT BED file to avoid this error?
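
A quick way to confirm this before launching the pipeline is to count the columns on each line of the BED file. The sketch below mirrors the tab-based split shown in the traceback (which unpacks exactly 6 fields); the function name check_bed6 is made up for this example.

```python
def check_bed6(path):
    """Return (line_number, n_fields) for every non-empty line that does not
    have exactly 6 tab-separated fields (chrom, start, end, name, score, strand)."""
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.strip().split("\t")
            if len(fields) != 6:
                problems.append((number, len(fields)))
    return problems

# Example call (path is illustrative):
# for number, n_fields in check_bed6("resources/swift_primers.bed"):
#     print(f"line {number}: expected 6 fields, got {n_fields}")
```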

Cheers!

bsalehe · 2024-01-18T09:38:34Z

Now, with the SWIFT-provided BED file converted to BED6 using this one-liner:

perl -lane 'print join("\t", @F[0..3], 0, (/F/ ? "+" : "-"))' sarscov2_v2_primers.bed > swift_primers.bed

the pipeline ran successfully in approximately 51 minutes using 3 samples. The full viralrecon command used is:

nextflow run nf-core/viralrecon \
  -profile singularity  -r dev \
  --max_memory '16.GB' --max_cpus 8 \
  --fastq_dir "reads" \
  --input samplesheet.csv \
  --genome 'NC_045512.2' \
  --fasta resources/NC_045512.2.fa \
  --gff resources/NC_045512.2.gff \
  --platform illumina \
  --protocol amplicon \
  --primer_bed resources/swift_primers.bed \
  --ivar_trim_offset 5 \
  --primer_left_suffix '_F' \
  --primer_right_suffix '_R' \
  --outdir results/viralrecon \
  --variant_caller ivar \
  --consensus_caller ivar \
  --skip_picard_metrics \
  --skip_asciigenome \
  --skip_assembly \
  --skip_nextclade \
  --skip_pangolin \
  --freyja_repeats 1
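
The BED4-to-BED6 conversion described above could also be provided as a small Python utility. This is a sketch rather than the script actually used: it assumes forward primer names contain "_F" (matching the --primer_left_suffix value in the command) and, unlike the one-liner, it checks only the name column instead of the whole line. The function name bed4_to_bed6 is made up for this example.

```python
def bed4_to_bed6(in_path, out_path, left_marker="_F"):
    """Append score (0) and strand columns to a 4-column primer BED file.

    Strand is '+' when the primer name contains left_marker, otherwise '-'.
    Returns the number of records written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            chrom, start, end, name = fields[:4]
            strand = "+" if left_marker in name else "-"
            dst.write("\t".join([chrom, start, end, name, "0", strand]) + "\n")
            written += 1
    return written

# Example call (file names taken from the one-liner above):
# bed4_to_bed6("sarscov2_v2_primers.bed", "swift_primers.bed")
```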

Should we now start preparing learning objectives for the wastewater surveillance section of sars-cov2-genomics?

tavareshugo assigned bsalehe Dec 21, 2023

tavareshugo added the wastewater label Jan 10, 2024
