Prepare data for wastewater materials #43

Open
5 of 6 tasks
tavareshugo opened this issue Dec 21, 2023 · 6 comments

@tavareshugo
Collaborator

tavareshugo commented Dec 21, 2023

We have a subset of 60 samples from Karthikeyan et al. that show a transition between the Delta and Omicron variants.

To run it live in a workshop, we need the pipeline to finish in under 1h.
If that is not possible, we may instead need to provide a preprocessed folder with results from 60 samples and have participants process only 5-10 samples during the workshop.

  • test how long it takes to run 30 samples - locally, this ran for over 2h without finishing (we killed the process)
  • test running 5 samples only with default options (see comment below).
  • test using --freyja_repeats 0 with viralrecon on a small number of samples, to see if it's possible to skip the bootstrap step. If this throws an error, use 1 and see if that works. The aim is to reduce the time needed to run the pipeline.
  • Find primer locations for the kit used in the publication: Swift Normalase Amplicon Panels (SNAP) kit (PN: SN-5X296 (core) COVG1V2-96 (amplicon primers), Integrated DNA Technologies)
    • Hugo contacted idtdna who now commercialise this product.
    • Bajuna will open issue on C-VIEW repo to ask what they did in the publication.
  • Run 3 samples using the SWIFT BED file directly using the --primer_bed option. Also download the reference FASTA file and GFF and pass directly with --fasta and --gff.
  • prepare participant data directories with the files needed for the workshop. There is a folder on the HPC under sars-wastewater/participants for this. This includes:
    • data/reads - FASTQ files for the 5 samples to be processed
    • resources - reference genome FASTA and GFF annotation (may be useful for some analysis)
    • preprocessed - with results from 30 samples
    • scripts - shell scripts that they will fix in the exercises
    • utilities - python scripts we provide, e.g. to prepare samplesheet or tidy freyja output files
    • sample_info.csv - metadata table with columns "sample,date,country,location,latitude,longitude"
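
As an illustration of the kind of utility script mentioned above, here is a minimal sketch for preparing the samplesheet from the FASTQ files in data/reads. It writes the sample,fastq_1,fastq_2 columns that viralrecon expects for Illumina data. The function name build_samplesheet and the file naming convention (_1/_2 or _R1/_R2 before .fastq.gz) are assumptions, not something stated in this issue, so adjust the pattern to the actual files.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention (not stated in the issue): paired files end in
# _1/_2 or _R1/_R2, optionally followed by _001, then .fastq.gz or .fq.gz.
PAIR_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<read>[12])(?:_001)?\.(?:fastq|fq)\.gz$"
)


def build_samplesheet(reads_dir, out_csv):
    """Write a sample,fastq_1,fastq_2 samplesheet for paired-end reads.

    Returns the data rows that were written (without the header).
    """
    pairs = {}
    for path in sorted(Path(reads_dir).iterdir()):
        match = PAIR_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        pairs.setdefault(match["sample"], {})[match["read"]] = str(path)

    rows = []
    for sample, reads in sorted(pairs.items()):
        if "1" not in reads or "2" not in reads:
            raise ValueError(f"sample {sample!r} is missing one read file")
        rows.append([sample, reads["1"], reads["2"]])

    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2"])
        writer.writerows(rows)
    return rows
```

For the participant directories this could be called as build_samplesheet("data/reads", "samplesheet.csv"). Raising an error on unpaired files is a deliberate choice, so that a missing download is noticed before the pipeline starts.
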
@tavareshugo
Collaborator Author

tavareshugo commented Dec 21, 2023

Command to run pipeline:

nextflow run nf-core/viralrecon \
  -profile singularity  -r dev  \
  --max_memory '16.GB' --max_cpus 8 \
  --input samplesheet.csv \
  --genome 'MN908947.3' \
  --platform illumina \
  --protocol amplicon \
  --primer_set artic \
  --primer_set_version 4.1 \
  --outdir results/viralrecon \
  --variant_caller ivar \
  --consensus_caller ivar \
  --skip_picard_metrics \
  --skip_asciigenome \
  --skip_assembly

@bsalehe
Contributor

bsalehe commented Jan 5, 2024

Hi @tavareshugo, it took me 2h 11m 9s to run 3 samples on Ubuntu, so 5 samples may take around 4h. I think we may end up using only the preprocessed results instead?

viralrecon_run_screenshot
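
The estimate above can be reproduced with a rough linear extrapolation. This is only a sketch: it assumes run time scales linearly with the number of samples, which may not hold when Nextflow runs samples in parallel, and the function name extrapolate_runtime is made up for illustration.

```python
from datetime import timedelta


def extrapolate_runtime(observed, n_observed, n_target):
    """Scale an observed wall-clock time linearly with the sample count.

    Assumption: every sample costs the same amount of time and nothing
    runs in parallel, so this is a rough figure rather than a prediction.
    """
    return observed * n_target / n_observed


# Observed run from this thread: 3 samples in 2h 11m 9s.
observed_run = timedelta(hours=2, minutes=11, seconds=9)
estimate_for_five = extrapolate_runtime(observed_run, 3, 5)
print(estimate_for_five)  # 3:38:35
```

Under that assumption 5 samples come out at roughly 3h 40m, which is still far beyond the 1h target for the workshop.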

@tavareshugo
Collaborator Author

Bajuna tested running 3 samples with --freyja_repeats 1 and it ran in 50 minutes - this is a good alternative to run in the workshop.

--freyja_repeats 0 throws an error.

@bsalehe
Contributor

bsalehe commented Jan 5, 2024

For training purposes, we could instruct participants to download the BED file here for the SWIFT_V2 primer set version, which seems to be the one used by Freyja as well, as the code here shows.

@bsalehe
Contributor

bsalehe commented Jan 16, 2024

I consistently get this error:

Command error:
  Traceback (most recent call last):
    File "/home/bajuna/.nextflow/assets/nf-core/viralrecon/bin/collapse_primer_bed.py", line 86, in <module>
      sys.exit(main())
    File "/home/bajuna/.nextflow/assets/nf-core/viralrecon/bin/collapse_primer_bed.py", line 82, in main
      collapse_primer_bed(args.FILE_IN, args.FILE_OUT, args.LEFT_PRIMER_SUFFIX, args.RIGHT_PRIMER_SUFFIX)
    File "/home/bajuna/.nextflow/assets/nf-core/viralrecon/bin/collapse_primer_bed.py", line 58, in collapse_primer_bed
      chrom, start, end, name, score, strand = line.strip().split("\t")
  ValueError: not enough values to unpack (expected 6, got 4)

Work dir:
  /home/bajuna/wastewater_surveillance/work/68/2c5eed79b029de1fa0c8e8530cf35b

I think the BED file, even after trying to convert it to BED6 as instructed here, is still missing columns: the script expects 6 tab-separated fields per line but finds only 4, so the score and strand columns that the ARTIC BED file has appear to be absent. I have found this swift_v2_masterfile here. So I wonder whether we need to add the left and right primer information to the SWIFT BED file to avoid this error?

Cheers!
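
The traceback above comes from unpacking each BED line into six tab-separated fields. A minimal sketch for checking a primer BED file before launching the pipeline could look like this. The function name find_malformed_bed_lines is hypothetical and not part of viralrecon; it only mirrors the six-field expectation visible in the traceback.

```python
def find_malformed_bed_lines(bed_path, expected_fields=6):
    """Return (line_number, field_count) for lines without six fields.

    Mirrors the unpacking in the traceback, which needs exactly
    chrom, start, end, name, score and strand on every line.
    """
    problems = []
    with open(bed_path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.strip().split("\t")
            if len(fields) != expected_fields:
                problems.append((number, len(fields)))
    return problems
```

An empty list means every line has six fields. A result such as [(1, 4)] would match the "expected 6, got 4" message, i.e. a BED4 file without score and strand.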

@bsalehe
Contributor

bsalehe commented Jan 18, 2024

Now, with the SWIFT-provided BED file converted to BED6 using this one-liner:

cat sarscov2_v2_primers.bed | perl -lane 'if (/F/) {print "$F[0]\t$F[1]\t$F[2]\t$F[3]\t0\t+"} else {print "$F[0]\t$F[1]\t$F[2]\t$F[3]\t0\t-"}' > swift_primers.bed

the pipeline ran successfully in approximately 51 minutes using 3 samples. The full viralrecon command used is:

nextflow run nf-core/viralrecon \
  -profile singularity  -r dev \
  --max_memory '16.GB' --max_cpus 8 \
  --fastq_dir "reads" \
  --input samplesheet.csv \
  --genome 'NC_045512.2' \
  --fasta resources/NC_045512.2.fa \
  --gff resources/NC_045512.2.gff \
  --platform illumina \
  --protocol amplicon \
  --primer_bed resources/swift_primers.bed \
  --ivar_trim_offset 5 \
  --primer_left_suffix '_F' \
  --primer_right_suffix '_R' \
  --outdir results/viralrecon \
  --variant_caller ivar \
  --consensus_caller ivar \
  --skip_picard_metrics \
  --skip_asciigenome \
  --skip_assembly \
  --skip_nextclade \
  --skip_pangolin \
  --freyja_repeats 1

Should we now start preparing learning objectives for the sars-cov2-genomics wastewater surveillance section?
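
For the utilities folder, the BED4 to BED6 conversion done with the Perl one-liner could also be sketched in Python. This is an illustrative version only: the function name bed4_to_bed6 is hypothetical, and it assigns the strand from the primer name suffix (assumed to be _F for left primers, matching --primer_left_suffix above), whereas the Perl one-liner matches F anywhere in the line.

```python
def bed4_to_bed6(in_path, out_path, left_suffix="_F"):
    """Append score (0) and strand columns to a 4-column primer BED file.

    Assumption: primer names ending in `left_suffix` are forward (+)
    primers and all others are reverse (-) primers.
    Returns the number of lines written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            # Fails with ValueError if a line has fewer than 4 fields.
            chrom, start, end, name = line.rstrip("\r\n").split("\t")[:4]
            strand = "+" if name.endswith(left_suffix) else "-"
            dst.write("\t".join([chrom, start, end, name, "0", strand]) + "\n")
            written += 1
    return written
```

It could be called as bed4_to_bed6("sarscov2_v2_primers.bed", "resources/swift_primers.bed"). Matching on the suffix rather than on any F in the line avoids mislabelling a reverse primer whose name or reference ID happens to contain an F.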


2 participants