Add info about the sequencing tech TAG and reflect that on the reports #150

Open
abhi18av opened this issue Apr 4, 2023 · 7 comments

@abhi18av
Member

abhi18av commented Apr 4, 2023

From the 4 April meeting:

  • Focus on homogeneous (single-technology) sequencing datasets
  • (In future) Accommodate hybrid datasets (Nanopore/Illumina) and reflect this in the final results

@vrennie @TimHHH, where exactly do we need to add this sequencing platform information, i.e. which summary files?

abhi18av changed the title from "Accommodate the nanopore (only) and hybrid datasets and reflect on the final results (nanopore/illumina)" to "Add info about the sequencing tech TAG and reflect that on the reports" on Apr 4, 2023
abhi18av added the enhancement label on Apr 4, 2023
@TimHHH
Collaborator

TimHHH commented Apr 4, 2023

> @vrennie @TimHHH, where exactly do we need to add this sequencing platform information, i.e. which summary files?

I would think a column in the summary stats file.

abhi18av self-assigned this on Apr 4, 2023
@vrennie
Collaborator

vrennie commented Apr 11, 2023

Yes, I agree with Tim, just a column that looks like this:

Sequencing Technology
Illumina
ONT
ONT
Illumina
Illumina
Illumina
...
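
A minimal sketch of what adding such a column could look like, written in Python for illustration rather than in the pipeline's own Nextflow/Groovy. It assumes the summary stats file is a CSV with a per-sample identifier column; the function name `add_technology_column`, the `Sample` column name, and the example rows are hypothetical and not taken from the pipeline.

```python
import csv
import io


def add_technology_column(summary_text, technology_by_sample,
                          sample_column="Sample",
                          header="Sequencing Technology"):
    """Return the summary CSV text with one extra column appended.

    `sample_column` is a hypothetical name for the column that
    identifies each sample; adjust it to the real summary file.
    """
    reader = csv.DictReader(io.StringIO(summary_text))
    fieldnames = list(reader.fieldnames) + [header]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Look up the technology for this sample; a missing sample
        # raises KeyError so gaps are noticed rather than left blank.
        row[header] = technology_by_sample[row[sample_column]]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    summary = "Sample,Coverage\nS1,30\nS2,12\n"  # made-up example rows
    print(add_technology_column(summary, {"S1": "Illumina", "S2": "ONT"}))
```

Raising on a missing sample, instead of writing an empty cell, is a design choice in this sketch so that an incomplete samplesheet does not silently produce a partially filled column.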

@abhi18av
Member Author

Okay, I understand this would be added to the summary stats file 👍

However, there's one more detail worth mentioning here: currently we hard-code the sequencing technology in the bam_rg_string:

bam_rg_string = "@RG\\tID:${flowcell}.${lane}\\tSM:${study}.${sample}\\tPL:illumina\\tLB:lib${library}\\tPU:${flowcell}.${lane}.${index_sequence}"

Should we not add this column to the input-samplesheet as well?

@vrennie
Collaborator

vrennie commented Apr 11, 2023

Yes, good catch @abhi18av, let's add this as a column to the samplesheet.

@TimHHH
Collaborator

TimHHH commented Apr 17, 2023

Yes, ideally the user provides the sequencing technology in the sample sheet and this is then used in the bam_rg_string along the lines of PL:${technology}. The documentation has to be clear that only one technology is allowed per sample sheet.
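
The idea above can be sketched roughly as follows. This is a Python illustration, not the pipeline's Nextflow/Groovy code. It assumes the samplesheet is a CSV with a `technology` column (a hypothetical column name), and the helper names `read_technology` and `bam_rg_string` are made up for this example. Lower-casing the value is also an assumption, chosen only to match the existing `PL:illumina`.

```python
import csv
import io


def read_technology(samplesheet_text, column="technology"):
    """Return the single sequencing technology declared in a samplesheet.

    Raises ValueError if the sheet is empty or mixes technologies,
    since only one technology is allowed per sample sheet.
    """
    rows = csv.DictReader(io.StringIO(samplesheet_text))
    technologies = {row[column].strip().lower() for row in rows}
    if len(technologies) != 1:
        raise ValueError(
            "expected exactly one technology per sample sheet, "
            f"found: {sorted(technologies)}"
        )
    return technologies.pop()


def bam_rg_string(study, sample, library, flowcell, lane,
                  index_sequence, technology):
    """Build the read-group string with PL taken from the samplesheet.

    The escaped "\\t" keeps a literal backslash-t in the result,
    mirroring the escaped tabs in the original string.
    """
    return (
        f"@RG\\tID:{flowcell}.{lane}"
        f"\\tSM:{study}.{sample}"
        f"\\tPL:{technology}"
        f"\\tLB:lib{library}"
        f"\\tPU:{flowcell}.{lane}.{index_sequence}"
    )


if __name__ == "__main__":
    sheet = "sample,technology\nA1,illumina\nB2,illumina\n"  # made-up rows
    tech = read_technology(sheet)
    print(bam_rg_string("study1", "A1", "1", "FC1", "2", "ACGT", tech))
```

Checking the whole sheet before building any read-group string means a mixed sheet fails early, which is one way to enforce the one-technology rule in addition to documenting it.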

@abhi18av
Member Author

Guys, what about reflecting that on the actual sample name as well? Something like Shea2017_2021_396.SRR16089406.LNA.A1.ILMN.1.1.1

NCBI currently lists the following sequencing platforms:

  • ILLUMINA
  • ION_TORRENT
  • ABI_SOLID
  • PACBIO_SMRT
  • CAPILLARY
  • OXFORD_NANOPORE
  • LS454
  • BGISEQ

To avoid long names, we could perhaps standardize on acronyms such as ILMN / ONT / PCB / ION etc. What do you think?
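
As a rough sketch of that proposal (in Python, for illustration only): validate the platform against the list above and optionally shorten it. The pairing of each acronym with a platform name is an assumption inferred from the names, the function `platform_tag` is hypothetical, and platforms with no acronym suggested so far simply keep their full name.

```python
# Platform names as listed above.
NCBI_PLATFORMS = (
    "ILLUMINA", "ION_TORRENT", "ABI_SOLID", "PACBIO_SMRT",
    "CAPILLARY", "OXFORD_NANOPORE", "LS454", "BGISEQ",
)

# Assumed pairing of the proposed acronyms with platform names.
ACRONYMS = {
    "ILLUMINA": "ILMN",
    "OXFORD_NANOPORE": "ONT",
    "PACBIO_SMRT": "PCB",
    "ION_TORRENT": "ION",
}


def platform_tag(platform, use_acronym=True):
    """Return the tag to embed in a sample name or report column.

    Unknown platforms raise ValueError; platforms without a proposed
    acronym fall back to the full name.
    """
    name = platform.strip().upper()
    if name not in NCBI_PLATFORMS:
        raise ValueError(f"unknown sequencing platform: {platform!r}")
    return ACRONYMS.get(name, name) if use_acronym else name


if __name__ == "__main__":
    print(platform_tag("illumina"))                      # short form
    print(platform_tag("illumina", use_acronym=False))   # full name
```

Keeping `use_acronym` as a switch leaves both options open, short tags or full names, without changing the validation step.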

@vrennie
Collaborator

vrennie commented Apr 17, 2023

I think that unless the full name messes up the .csv, it's better to keep the full name.
