Duplicate Gene Names Implications? #41

Open
amnahsiddiqa opened this issue Jun 7, 2024 · 1 comment
Comments

@amnahsiddiqa

Hi @lachmann12. I really appreciate this resource; it is truly a great help. However, I noticed this too, and as you can see in the attached screenshot, the values for the duplicated entries are not identical. Should I infer that it was quantified at the transcript level rather than the gene level?
Screenshot 2024-06-07 at 1 13 02 PM

@lachmann12
Collaborator

lachmann12 commented Jun 7, 2024

Thank you for your feedback! I was not aware that there are cases where the counts differ. The duplicates come from issues in the Ensembl annotation, where multiple Ensembl gene IDs map to the same gene symbol. When we investigated the Ensembl genes that map to the same symbol, we found that they usually have identical transcript sequences, meaning they are indistinguishable from each other based on reads. As a result, the counts for duplicated genes are normally the same, and in that case it is safe to keep one gene entry and discard the others. In the cases where the counts differ, I would suggest using the entry with the most counts. We will try to resolve this in future updates, and I will leave the issue open until we have resolved it.
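The suggestion above (keep one entry per gene symbol, preferring the entry with the most counts) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `collapse_duplicate_symbols`, the input layout (a list of symbols plus an aligned list of per-sample count rows), and the toy values are all hypothetical and not part of the resource. "Most counts" is interpreted here as the highest total across samples.

```python
def collapse_duplicate_symbols(symbols, counts):
    """Keep one row per gene symbol: the row with the highest total count.

    symbols: list of gene symbols, one per row (hypothetical input layout)
    counts:  list of per-sample count lists, aligned with `symbols`
    Ties (e.g. identical duplicated rows) keep the first occurrence.
    """
    if len(symbols) != len(counts):
        raise ValueError("symbols and counts must have the same length")

    best = {}  # symbol -> (row index, total count) of the row kept so far
    for i, (symbol, row) in enumerate(zip(symbols, counts)):
        total = sum(row)
        if symbol not in best or total > best[symbol][1]:
            best[symbol] = (i, total)

    # Preserve the original row order of the retained entries.
    keep = sorted(i for i, _ in best.values())
    return [symbols[i] for i in keep], [counts[i] for i in keep]


# Toy data, not taken from the resource: GENE_A is duplicated with
# differing counts, GENE_C is duplicated with identical counts.
symbols = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_C"]
counts = [[5, 0, 2], [1, 1, 1], [9, 3, 4], [2, 2, 2], [2, 2, 2]]
dedup_symbols, dedup_counts = collapse_duplicate_symbols(symbols, counts)
```

For identical duplicates the first row is kept, so the result is the same as simply dropping the later copies.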
