0.1.0 branch questions #67

Open
haannguyen opened this issue Jan 31, 2024 · 2 comments

Comments

@haannguyen

Hi Adnan,

Thank you so much for the great software! I have been trying to use it to look at the content of the polyA tails in my reaction.
I have a few questions:

  1. The polya_tail_fasta_seq column output doesn't match the length in the tail_length column, e.g. tail_length is 95.6 but polya_tail_fasta_seq is 'UAAAAAAGU'. How should I interpret this data?
  2. Are there any confidence/quality metrics for the estimations? I.e., how confident are we in the non-A calls in 'UAAAAAAGU'?
  3. Do you have any idea about the baseline or expected noise for the outputs? I have a sample that is supposed to be all As, but the polya_tail_fasta_seq output contains only ~71% As. Is there a way to set thresholds for tail content estimates?

Thank you so much.

Yours,
Ha An

@adnaniazi
Owner

Hi,

Thank you for using tailfindr.

Here are answers to your questions:

  1. When tail lengths are greater than 10-15 nt, the basecaller is not very accurate in outputting the corresponding number of bases. For example, a tail might actually be 30 nt long, but the basecalled sequence for that region might contain only 12 As, because the basecaller struggles to output the correct number of bases in a homopolymer. That is the whole reason tools like tailfindr exist: to accurately predict the tail length when the basecaller cannot output the correct number of bases for long tails.
  2. No, we don't have any confidence metric for the non-A calls. I just find a monotonous signal region that is supposed to correspond to the polyA tail, and then I output the bases that the basecaller predicted in this region.
  3. If a read is just the adapter, followed by all As, and nothing else, then tailfindr output would not be reliable. For tailfindr to work properly, you must have the sequencing adapter, followed by polyA tail, followed by some non-polyA sequence.

Best,
Adnan
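
To make the mismatch from question 1 concrete, here is a minimal Python sketch that compares the number of basecalled bases in `polya_tail_fasta_seq` with the `tail_length` estimate and computes the fraction of A among the basecalled bases. The two column names and the example values come from this thread; the helper name `summarize_tail` and the in-memory row are illustrative assumptions, not part of tailfindr.

```python
def summarize_tail(tail_length, tail_seq):
    """Compare the tail length estimate with the basecalled tail sequence.

    Hypothetical helper for illustration; not a tailfindr function.
    """
    seq = tail_seq.upper()
    n_called = len(seq)          # bases the basecaller emitted in the tail region
    n_a = seq.count("A")         # how many of those bases are A
    return {
        "tail_length": tail_length,
        "n_basecalled": n_called,
        "frac_a": n_a / n_called if n_called else float("nan"),
        # Ratio well below 1 means the basecaller emitted far fewer bases
        # than the estimated tail length.
        "called_to_estimated": n_called / tail_length if tail_length else float("nan"),
    }


# Example values quoted in the original question.
row = {"tail_length": 95.6, "polya_tail_fasta_seq": "UAAAAAAGU"}
summary = summarize_tail(row["tail_length"], row["polya_tail_fasta_seq"])
print(summary)
```

For the quoted read, only 9 bases were basecalled against an estimated length of 95.6, so a per-read A fraction here rests on very few bases.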

@haannguyen
Author

haannguyen commented Feb 5, 2024

Hi Adnan,

Thanks for the detailed reply!
I see. I did not realize that polya_tail_fasta_seq is pulled from the basecalled sequence rather than tailfindr reprocessing the data itself to output it.

> If a read is just the adapter, followed by all As, and nothing else, then tailfindr output would not be reliable. For tailfindr to work properly, you must have the sequencing adapter, followed by polyA tail, followed by some non-polyA sequence.

Yes, I do have a non-polyA sequence attached to a polyA tail (that is supposed to be A only), and I get a 'polya_tail_fasta_seq' output that is only ~70% A. Is nanopore basecalling really this noisy/error-prone? For clarity, I am trying to compare a tailing reaction with ATP vs. one with mixed nucleotides, and to assess tail content.

Thank you for all your help!!
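
One possible way to frame the ATP vs. mixed-nucleotide comparison described above is to treat the A-only reaction as an empirical noise baseline and report how much the non-A content of the other sample exceeds it. The sketch below assumes the `polya_tail_fasta_seq` values for each sample have already been loaded into Python lists; the function name `base_composition` and the sequences are made-up placeholders for illustration, not real data or a tailfindr feature.

```python
from collections import Counter


def base_composition(seqs):
    """Pooled base fractions over all basecalled tail sequences of one sample."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


# Placeholder sequences for illustration only (not real data).
atp_only = ["UAAAAAAGU", "AAAAAAAAAC"]   # control: tails expected to be all A
mixed = ["AAGAAUAAGA", "AACAAGAAAU"]     # reaction with mixed nucleotides

baseline = base_composition(atp_only)
sample = base_composition(mixed)

# Non-A fraction in the mixed sample beyond what the A-only control shows.
excess_non_a = (1 - sample.get("A", 0.0)) - (1 - baseline.get("A", 0.0))
print(f"baseline A: {baseline.get('A', 0.0):.3f}")
print(f"sample A:   {sample.get('A', 0.0):.3f}")
print(f"excess non-A: {excess_non_a:.3f}")
```

Because there is no confidence metric for the non-A calls, a difference like this is at best a relative comparison between samples processed the same way, not an absolute measure of tail content.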
