Issue with reads when template length is shorter than read length #5
Hi @lhomas, thank you for using Mitty. Unfortunately, Mitty cannot perform this kind of simulation out of the box. However, it should be possible for you to write a sequencer model that allows it. The built-in Illumina model is here. It should give you a decent idea of what you need to modify to get the result you want. You can copy this code and start a new sequencer model (say …). Right now, I don't have a formal plugin system; once you have your model ready, you can add it to …. I'm happy to help you along in this process, should you decide to do this. Thanks!
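The core change such a model needs is to stop reading at the end of the template instead of always emitting `read_len` bases. Below is a minimal, self-contained sketch of that idea in Python. It is *not* Mitty's sequencer-model interface: `paired_reads` and `reverse_complement` are hypothetical names, and it assumes read 1 starts at one end of the fragment and read 2 at the other end on the opposite strand.

```python
# Hypothetical helpers for illustration only; this is NOT Mitty's API.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_reads(template: str, read_len: int) -> tuple[str, str]:
    """Return (read 1, read 2) for one template (fragment).

    Read 1 is taken from the start of the forward strand and read 2 from
    the start of the reverse strand. Python slicing stops at the end of the
    string, so a template shorter than read_len yields reads of
    len(template) instead of full-length reads.
    """
    r1 = template[:read_len]
    r2 = reverse_complement(template)[:read_len]
    return r1, r2


if __name__ == "__main__":
    template = "A" * 25 + "C" * 25  # a 50 bp fragment
    r1, r2 = paired_reads(template, read_len=75)
    print(len(r1), len(r2))  # 50 50
```

With a 50 bp template and a 75-cycle read length, both reads come out 50 bp, which is the behavior the reports in this thread expected.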
Hi Kaushik, we have a similar issue here. We are also simulating Illumina reads, but we are trying to simulate the case where the DNA fragments (templates) are shorter than the number of sequencer cycles. This is the case in forensics and ancient DNA. However, even when we specify a template length shorter than the read length (50 vs 120), we still get 120 bp reads. Cheers,
Hi @yassineS, this is an interesting use case. When the fragment is shorter than the number of cycles (and the run is presumably paired-end), how does the real machine behave? Say the machine runs 10 cycles (for easier counting): what kind of reads will be produced? Thanks!
Well, the first thing is that the sequencer reads into the barcodes and index, and then it spits out a random sequence of bases at very low quality. So during the demultiplexing step we trim low-quality bases. Here's a snippet from a real dataset (forward reads; R1):
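The read-through behavior described above, and the quality trimming applied afterwards, can be sketched as follows. This is a simplified illustration, not a model of any particular instrument: `read_through` and `trim_low_quality_tail` are hypothetical names, the adapter string is assumed to be the common Illumina TruSeq adapter prefix (substitute your library's actual adapter/index), and the quality values are illustrative Phred scores encoded as Phred+33.

```python
import random

# Assumptions for illustration: common Illumina TruSeq adapter prefix and
# made-up Phred scores; replace them with values for your own library/run.
ADAPTER = "AGATCGGAAGAGC"
HIGH_Q, LOW_Q = 35, 2


def read_through(template: str, read_len: int, rng: random.Random) -> tuple[str, str]:
    """Simulate one read of exactly read_len cycles as (sequence, Phred+33 qualities).

    Bases come from the template first, then from the adapter, and any
    remaining cycles are random base calls with low quality.
    """
    seq = template[:read_len]
    tail = ADAPTER[: read_len - len(seq)]
    n_random = read_len - len(seq) - len(tail)
    quals = [HIGH_Q] * (len(seq) + len(tail)) + [LOW_Q] * n_random
    seq += tail + "".join(rng.choice("ACGT") for _ in range(n_random))
    return seq, "".join(chr(q + 33) for q in quals)


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    rng = random.Random(42)
    seq, qual = read_through("A" * 50, read_len=75, rng=rng)
    trimmed, _ = trim_low_quality_tail(seq, qual)
    print(len(seq), len(trimmed))  # 75 63
```

In this toy model the adapter bases keep high quality, so quality trimming alone removes only the random tail and leaves the adapter in place; a separate adapter-trimming step would be needed to get back to the 50 bp insert.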
Hello,
I am trying to simulate reads with a template-length distribution such that some templates are shorter than the read length, meaning that not all reads in the file should reach the full read length of 75 bp that I am using. However, all reads in these files are 75 bp.
As an example of the issue, I have provided a link to a Dropbox folder containing two files: the simulated reads and the read model used to create them. The template length is set to a mean of 50 and a std of 0, meaning that the DNA fragments should all be 50 bp, yet all of the reads are 75 bp (the read length set in the model).
https://www.dropbox.com/sh/uz2zjo2ze33978f/AAC8OXPwwnOtevohZ5dv_qjka?dl=0
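The expected outcome can be stated directly: each read should be `min(template length, read length)` bases long. The sketch below computes that expectation, which is handy for checking simulator output. It assumes template lengths are drawn from a normal distribution with the given mean and std (the simulator's actual distribution may differ), and `expected_read_lengths` is a hypothetical helper, not part of Mitty.

```python
import random
from collections import Counter


def expected_read_lengths(n: int, mean: float, std: float, read_len: int,
                          seed: int = 0) -> list[int]:
    """Draw n template lengths from a normal distribution (an assumption) and
    return the read length each template should produce: min(template, read_len)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        tlen = max(1, round(rng.gauss(mean, std)))  # clamp to at least 1 bp
        lengths.append(min(tlen, read_len))
    return lengths


if __name__ == "__main__":
    # Settings from the report: mean 50, std 0, read length 75.
    print(Counter(expected_read_lengths(1000, mean=50, std=0, read_len=75)))
    # Counter({50: 1000})
```

With these settings every read is expected to be 50 bp, whereas the simulated file contains only 75 bp reads.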