Making geneLists for gseKEGG()/gseGO()/GSEA() #651
Unanswered
Lil-Psilocybe asked this question in Q&A
Replies: 0 comments
Hello! Thanks for a fantastic package! I am quite confused about the input gene and annotation term info, though. I cross-posted this on Biostars, but would like some advice from the folks who wrote the package.
Basically my question boils down to this: for any annotation used (GO/KEGG/etc.), does each GO/KEGG number identifier (GO:5478201, K12345, etc.) have to be associated with a single gene? I'm generating my annotation info (for my non-model organism) from EggNOG mapper, which spits out multiple K#s and GO#s for each queried entry like so:
![EggNOG KEGG GO](https://private-user-images.githubusercontent.com/51445787/291873991-eb4e8e29-e59a-4aa7-a999-c15e76c25f67.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5MTczNDgsIm5iZiI6MTcxOTkxNzA0OCwicGF0aCI6Ii81MTQ0NTc4Ny8yOTE4NzM5OTEtZWI0ZThlMjktZTU5YS00YWE3LWE5OTktYzE1ZTc2YzI1ZjY3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDEwNDQwOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTM0MjIzNTBhZWFlMTZkOGJkM2YzMTI2NDgzNWY4OGNmMjFlODAxZWM4ZGNjMTZlOTQzNDY3NjEzMGZlZTYwZjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Ax0pqUCaA01vXaXdGmD3NhkmP9zc6rEgvybWVOxLCp8)
I got gseKEGG() to run when I reduced the amount of K#s-per-gene like so:
![KEGG geneID](https://private-user-images.githubusercontent.com/51445787/291872130-8f665fa9-350b-4a58-acd6-b76e020730e4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5MTczNDgsIm5iZiI6MTcxOTkxNzA0OCwicGF0aCI6Ii81MTQ0NTc4Ny8yOTE4NzIxMzAtOGY2NjVmYTktMzUwYi00YTU4LWFjZDYtYjc2ZTAyMDczMGU0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDEwNDQwOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc5N2RjODM2OGEzYTA2ZmJhYTk1MjJlNGNkNTU3MmU4N2Q5MzJhODJkMzcyZTZkZGU4MGVhOTFjZGRlOGM4NjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.-WX1P8FHXVvgtsqPoEptUUs0XoMJaWGUi1GdQzeeFTY)
but am having trouble getting gseGO()/GSEA() to run since I still have multiple GOIDs for each gene like so:
![GO geneID](https://private-user-images.githubusercontent.com/51445787/291874315-8bcba05e-0ab6-4cf9-8fc9-6d36d8bb7ab3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5MTczNDgsIm5iZiI6MTcxOTkxNzA0OCwicGF0aCI6Ii81MTQ0NTc4Ny8yOTE4NzQzMTUtOGJjYmEwNWUtMGFiNi00Y2Y5LThmYzktNmQzNmQ4YmI3YWIzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDEwNDQwOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTIwMmVjMTQ5ZmRmMTRiMjJjMjFlNTc1MmFkOTM5MmY4ZjE5NGQzZmI0MTk0YzNiMTllM2EyZmEzNDhkYWFiODUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.9iEbtBARMvvMCR95gMaFgdXhZzNy1Xq0M0OVqpo6KFM)
For GSEA(), TERM2NAME looks like this:
![TERM2NAME](https://private-user-images.githubusercontent.com/51445787/291873112-3309fcdd-bdee-4987-971d-0b85bca24dda.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5MTczNDgsIm5iZiI6MTcxOTkxNzA0OCwicGF0aCI6Ii81MTQ0NTc4Ny8yOTE4NzMxMTItMzMwOWZjZGQtYmRlZS00OTg3LTk3MWQtMGI4NWJjYTI0ZGRhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDEwNDQwOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTFjYmFiYmViNzY0Yjc4YzYzMWEzY2NiY2NlMzM3YWVhNjk5NzAyNzM4Mjg1MzJlOTE4NzNhZjY2YjZjMzlmOTgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.rdbCF3lCF6ByRY7ZK6OvBtzDFKabEJNEF-ln_KPm-BE)
And TERM2GENE looks like this:
![TERM2GENE](https://private-user-images.githubusercontent.com/51445787/291873132-45205106-f840-4aaf-8a0a-d54c930b405f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5MTczNDgsIm5iZiI6MTcxOTkxNzA0OCwicGF0aCI6Ii81MTQ0NTc4Ny8yOTE4NzMxMzItNDUyMDUxMDYtZjg0MC00YWFmLThhMGEtZDU0YzkzMGI0MDVmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDEwNDQwOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI5NGQ2ZDFkZTI4ZGRjMWVjNTQ4Yzc5YThmMDlmYWFlY2JkYjU2MWE3YjFlYWEzNzdmYzcwN2I3Njg1MjYzNWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.ETWJ4SvtTGRyhsu9SG78oEcZP7ejgffpqBe7UZNtwos)
With all this in mind, do I need to reduce my GO numbers to a single GO number per gene? And if so, how do I account for the GSEA statistic being tested, since it is specific to each gene?
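To make the question concrete: the alternative to trimming annotations would be to keep every GO ID and expand the EggNOG output into a long two-column table, one row per (term, gene) pair, so that a gene with several GO IDs simply appears on several rows. Below is a minimal sketch of that reshaping in plain Python, written only for illustration. The column names (`query`, `GOs`), the `-` placeholder for "no annotation", the sample rows, and the helper `wide_to_long` are all assumptions of this sketch, not confirmed details of EggNOG mapper or of the package.

```python
import csv
import io

# Hypothetical EggNOG-mapper-style table: one query per row, with
# comma-separated GO IDs. Column names and the "-" placeholder are assumed.
EGGNOG_TSV = (
    "query\tGOs\n"
    "gene1\tGO:0003674,GO:0005575\n"
    "gene2\tGO:0003674\n"
    "gene3\t-\n"
)


def wide_to_long(handle, gene_col="query", term_col="GOs", sep=","):
    """Yield one (term, gene) pair per annotation, skipping empty entries."""
    seen = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row[gene_col].strip()
        for term in row[term_col].split(sep):
            term = term.strip()
            if not term or term == "-":
                continue  # gene has no annotation in this column
            pair = (term, gene)
            if pair not in seen:  # drop exact duplicate pairs
                seen.add(pair)
                yield pair


pairs = list(wide_to_long(io.StringIO(EGGNOG_TSV)))

# Print as a tab-separated term/gene table, term first.
for term, gene in pairs:
    print(f"{term}\t{gene}")
```

In this layout each gene would still carry a single ranking statistic in the gene list, while the many-to-many relationship between genes and terms lives only in the term/gene table. Whether that is the intended input is exactly what I would like confirmed.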
Thanks for any recommendations!