Coverage by region vs. Global dist #184
Comments
Hi, |
The gene IDs were not clearly laid out in my BED file, so each gene has been assigned the original contig name rather than an updated gene name -- so it's hard to determine exactly which entries are not included in the distribution file. I have attached a thresholds output and the distribution text file for a sample (compressed in attachment). If you look for the # of entries for these contig IDs, you can see they do not always match in frequency between the thresholds file and the distribution file. |
do the missing IDs have 0 coverage? |
No, it appears that in the regions.bed file, all of the genes have coverage > 0. |
Hi, Now, if I look at a few of the differences:
I can see it's very low coverage globally, but I can't see whether your regions are covered. So I don't think I have enough information to debug this further. Note that your contig IDs must be unique. |
I can provide you with the regions coverage file if that would be helpful. I tried to name the individual genes in the BED file I supplied, but it seemed to get gene names from the BAM file I attached, which is why they are not unique between genes -- however, the contig IDs are unique. The documentation says that the gene names will come from the 4th column from the supplied BED file - but this has not been working for me, which is why the gene/region IDs are not unique. |
the names in the first column of the bed output must always come from the chromosome/contig column of the bam. the latest regions.bed file you sent has nothing in the 4th column. |
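A quick way to confirm the point about the 4th column is to scan the BED file before handing it to mosdepth. The following is a minimal sketch, assuming a tab-separated BED file with the gene name in column 4; the function name `check_bed_names` and the example rows are hypothetical and not taken from the attached files.

```python
from collections import Counter


def check_bed_names(lines):
    """Return (line numbers with no 4th-column name, names used more than once)."""
    missing = []
    names = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4 or not fields[3].strip():
            missing.append(lineno)
        else:
            names[fields[3]] += 1
    duplicated = sorted(name for name, n in names.items() if n > 1)
    return missing, duplicated


# Hypothetical example rows: line 3 has no name, and gene_A is used twice.
example = [
    "contig_1\t0\t300\tgene_A\n",
    "contig_1\t350\t900\tgene_B\n",
    "contig_2\t10\t500\n",
    "contig_2\t600\t800\tgene_A\n",
]
missing, duplicated = check_bed_names(example)
print("rows without a name:", missing)
print("duplicated names:", duplicated)
```

For a real file, pass an open file handle instead of the list (for example `check_bed_names(open("genes.bed"))`, where `genes.bed` is a placeholder path).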
Hi there, sorry for the long period of radio silence. I am trying to determine the depth of coverage for genes within my specific MAG bins. I aligned my raw reads to my assembled contigs, binned these contigs into MAGs, and functionally annotated them to get the BED file. For some reason, my output files are only including the names of the contigs and not the names of the genes - so I am getting coverage values in the output files, but they are not associated with specific genes as far as I can tell. Could I be looking at the wrong files, or am I doing something incorrectly? The contig IDs I have are unique, and the BED files I am now using have a unique gene ID in the 4th column. Ideally the mean coverage per gene results would appear like this (an excerpt from a colleague's output):
Sorry for the long message and radio silence, thank you so much for your help. BDC_D_ATL_HL_1_20_1_bin.6.mosdepth.region.dist.txt |
Hi there. I am trying to retain the coverage for genes of interest found in my assembled contigs, and have a region BED file, a thresholds BED file, a global.dist.txt file, and a summary.txt file.
I noticed that for certain genes with multiple gene predictions in the BED file (from Prodigal), the number of entries in the global.dist.txt file describing the distribution of their coverage is less than the number of entries of this predicted gene in the regions & threshold files. For example, a gene with multiple predictions appears 102 times in my BED files, but only 49 times in the global.dist file...
Is there a reason for this? Just trying to understand the output of these files better. Thanks!
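To see exactly which contigs differ in row count between two of these files, the rows can be tallied by their first column. This is a minimal sketch, assuming both files are tab-separated with the contig name in column 1; `rows_per_contig`, `compare_counts`, and the example rows are hypothetical. As far as I understand the mosdepth README, the dist files hold one row per contig and coverage level (plus `total` rows) rather than one row per BED region, so differing counts may be expected; treat that reading as an assumption to verify against the documentation.

```python
from collections import Counter


def rows_per_contig(lines):
    """Count rows per value in the first tab-separated column."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        counts[line.split("\t")[0]] += 1
    return counts


def compare_counts(bed_lines, dist_lines):
    """Return {contig: (rows in BED, rows in dist)} where the counts differ."""
    bed = rows_per_contig(bed_lines)
    dist = rows_per_contig(dist_lines)
    return {
        contig: (bed[contig], dist[contig])
        for contig in sorted(set(bed) | set(dist))
        if bed[contig] != dist[contig]
    }


# Hypothetical example: three regions on contig_1, but only two dist rows.
bed_example = [
    "contig_1\t0\t300\tgene_A\t4.2",
    "contig_1\t350\t900\tgene_B\t3.0",
    "contig_1\t950\t1200\tgene_C\t1.5",
    "contig_2\t10\t500\tgene_D\t2.0",
]
dist_example = [
    "contig_1\t2\t0.40",
    "contig_1\t1\t0.90",
    "contig_2\t1\t1.00",
    "total\t2\t0.30",
    "total\t1\t0.95",
]
differences = compare_counts(bed_example, dist_example)
for contig, (n_bed, n_dist) in differences.items():
    print(f"{contig}: {n_bed} rows in BED, {n_dist} rows in dist")
```

Note that mosdepth writes the regions and thresholds BED output gzip-compressed, so those files would need to be opened with `gzip.open(path, "rt")` before being passed in.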