Hi and a Couple of Questions #29

Open
bradfordwinkelman opened this issue Aug 26, 2022 · 0 comments

Hello! I'm a new user of miRTop. I've started using it through the nf-core smrnaseq pipeline while providing computational support for an academic lab that studies miRNA. While examining the miRTop outputs from that pipeline, we've come across a couple of things that I would like to understand better. I'm not sure if this is the best place to pose my questions, so please point me in the right direction if somewhere else would be more appropriate.

Here are my questions:

  1. iso_snv vs iso_snv_seed

One of the reads being fed into miRTop is TGGTGTTGTCCCCCCGAGTGGC. It is correctly called a 2GA variant of TAGTGTTGTCCCCCCGAGTGGC (kshv-miR-K12-10a-3p). However, miRTop labels it iso_snv rather than iso_snv_seed, which seems odd because the substitution is at position 2, inside the seed region. Is there some reason for this that I might be missing?
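To make the expectation concrete, here is a minimal sketch that locates the substitution and maps its position to a category. The position ranges are my reading of the mirGFF3 definitions (seed at positions 2-7, and so on) and should be treated as an assumption. The function names are hypothetical and are not part of miRTop, and this does not reproduce miRTop's internal logic.

```python
# Hypothetical helpers for illustration only; not miRTop code.

def find_substitutions(read, reference):
    """Return (1-based position, reference nt, read nt) for each mismatch."""
    if len(read) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (pos, ref_nt, read_nt)
        for pos, (ref_nt, read_nt) in enumerate(zip(reference, read), start=1)
        if ref_nt != read_nt
    ]


def expected_snv_category(position):
    """Map a 1-based position to an iso_snv category.

    Assumption: ranges follow my reading of the mirGFF3 definitions.
    """
    if 2 <= position <= 7:
        return "iso_snv_seed"
    if position == 8:
        return "iso_snv_central_offset"
    if 9 <= position <= 12:
        return "iso_snv_central"
    if 13 <= position <= 17:
        return "iso_snv_central_supp"
    return "iso_snv"


reference = "TAGTGTTGTCCCCCCGAGTGGC"  # kshv-miR-K12-10a-3p
read = "TGGTGTTGTCCCCCCGAGTGGC"

substitutions = find_substitutions(read, reference)
categories = [expected_snv_category(pos) for pos, _, _ in substitutions]
print(substitutions, categories)
```

Under these assumed ranges, the single A-to-G change at position 2 would fall in the seed category, which is why the iso_snv label was surprising.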

  2. Output Table Format (mirtop.tsv)

Another read, TGAGGTAGTAGGTTGTATAGTT, is called correctly as hsa-let-7a-5p. However, this sequence aligns equally well to three different stem-loop records for hsa-let-7a (hsa-let-7a-1, hsa-let-7a-2, hsa-let-7a-3), each of which can be the parent of the mature sequence hsa-let-7a-5p, so the output table ends up with three separate records for the read. Within each row, a sample seems to randomly get either the correct count or a zero, as shown in the example below. It seems that we can get the correct counts for each read-miRNA combination by aggregating these rows, but I wanted to (1) make sure that this is an appropriate way to handle them and (2) understand why the output looks this way.

| UID | Read | miRNA | Sample 1 | Sample 2 | Sample 3 |
| --- | --- | --- | --- | --- | --- |
| iso-22-XKVLRYVPQ | TGAGGTAGTAGGTTGTATAGTT | hsa-let-7a-5p | 172722 | 0 | 114082 |
| iso-22-XKVLRYVPQ | TGAGGTAGTAGGTTGTATAGTT | hsa-let-7a-5p | 0 | 124121 | 0 |
| iso-22-XKVLRYVPQ | TGAGGTAGTAGGTTGTATAGTT | hsa-let-7a-5p | 0 | 0 | 0 |
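For reference, the aggregation I have in mind is sketched below, using only the Python standard library. It assumes a tab-separated table with just the simplified columns from the example above (the real mirtop.tsv has more columns), and `aggregate_counts` is a hypothetical helper name. Whether summing is the right way to handle these rows is exactly the open question.

```python
import csv
import io
from collections import defaultdict

# Simplified stand-in for mirtop.tsv, limited to the columns in the example.
EXAMPLE_TSV = (
    "UID\tRead\tmiRNA\tSample 1\tSample 2\tSample 3\n"
    "iso-22-XKVLRYVPQ\tTGAGGTAGTAGGTTGTATAGTT\thsa-let-7a-5p\t172722\t0\t114082\n"
    "iso-22-XKVLRYVPQ\tTGAGGTAGTAGGTTGTATAGTT\thsa-let-7a-5p\t0\t124121\t0\n"
    "iso-22-XKVLRYVPQ\tTGAGGTAGTAGGTTGTATAGTT\thsa-let-7a-5p\t0\t0\t0\n"
)


def aggregate_counts(handle, key_columns=("UID", "Read", "miRNA")):
    """Sum every non-key column over rows that share the same key columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_columns = [c for c in reader.fieldnames if c not in key_columns]
    totals = defaultdict(lambda: [0] * len(sample_columns))
    for row in reader:
        key = tuple(row[c] for c in key_columns)
        for i, column in enumerate(sample_columns):
            totals[key][i] += int(row[column])
    return sample_columns, dict(totals)


sample_columns, totals = aggregate_counts(io.StringIO(EXAMPLE_TSV))
for key, counts in totals.items():
    print(key, dict(zip(sample_columns, counts)))
```

On the three example rows this collapses to a single record per read-miRNA combination, with each sample keeping its one non-zero count.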

Any thoughts, suggestions, or guidance from someone on these issues would be greatly appreciated! Thank you!
