This script takes outdated Ensembl IDs (in this case from Rnor6.0, Ensembl version 80) and returns the gene symbol and the updated Ensembl ID, if available.

lindseydruschel/Ensembl-ID-Match

Ensembl-ID-Match

This R script reads old Ensembl gene IDs from an Excel file (from a column titled "Given"), maps them to their corresponding new Ensembl gene IDs using the biomaRt package, and saves the full mapping, including gene symbols, to a new Excel file. The script is designed to handle Ensembl IDs from an older genome version (such as Rnor6.0, Ensembl version 80), and it connects to both the old and the latest Ensembl databases to perform the mapping.

The input Excel file should be named "Unmatched Ensembl.xlsx" and should contain a column labeled "Given", where the old Ensembl IDs are stored. Update the file path in the script to match the location of your file.

The output is an Excel file named "Updated Ensembl.xlsx", which includes the old Ensembl IDs, the associated gene symbols, and the new Ensembl IDs. If an old ID cannot be mapped to a new one, the script inserts NA in the New Ensembl ID column. The script also calculates and displays the percentage of IDs that were successfully mapped to gene symbols and to new Ensembl IDs.

The script requires the R packages biomaRt, readxl, and writexl, and installs any that are missing. To run the script, open it in R or RStudio and execute it. The updated Excel file will be saved in the same directory, providing a convenient way to update old gene annotation data with the latest Ensembl information.
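The workflow described above can be sketched roughly as follows. This is an illustrative outline, not the repository's actual code. It assumes the mapping goes from old ID to gene symbol (archived release 80) and then from gene symbol to new ID (current release). The function names `combine_mappings` and `map_old_to_new` and the output column names are hypothetical. Unlike the script, this sketch stops if a package is missing rather than installing it, and it depends on the release 80 archive still being reachable through biomaRt.

```r
# Sketch only: the old ID -> symbol -> new ID strategy is an assumption.

# Join the two lookup tables, keeping every old ID (NA where nothing matched).
combine_mappings <- function(old_ids, old_map, new_map) {
  # Drop rows without a usable symbol so that NA never matches NA below.
  new_map <- new_map[!is.na(new_map$external_gene_name) &
                       new_map$external_gene_name != "", ]
  out <- data.frame(Old_Ensembl_ID = old_ids, stringsAsFactors = FALSE)
  out$Gene_Symbol <- old_map$external_gene_name[
    match(out$Old_Ensembl_ID, old_map$ensembl_gene_id)]
  # BioMart returns "" for genes without a symbol; treat those as missing.
  out$Gene_Symbol[!is.na(out$Gene_Symbol) & out$Gene_Symbol == ""] <- NA
  out$New_Ensembl_ID <- new_map$ensembl_gene_id[
    match(out$Gene_Symbol, new_map$external_gene_name)]
  out
}

# Full pipeline (requires network access and the three packages).
map_old_to_new <- function(input = "Unmatched Ensembl.xlsx",
                           output = "Updated Ensembl.xlsx") {
  for (pkg in c("biomaRt", "readxl", "writexl")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package not installed: ", pkg)
    }
  }
  old_ids <- unique(as.character(readxl::read_excel(input)$Given))
  old_ids <- old_ids[!is.na(old_ids)]

  dataset <- "rnorvegicus_gene_ensembl"
  old_mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset,
                                  version = 80)  # archived release
  new_mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset)

  attrs <- c("ensembl_gene_id", "external_gene_name")
  old_map <- biomaRt::getBM(attributes = attrs, filters = "ensembl_gene_id",
                            values = old_ids, mart = old_mart)
  symbols <- setdiff(unique(old_map$external_gene_name), c("", NA))
  new_map <- biomaRt::getBM(attributes = attrs, filters = "external_gene_name",
                            values = symbols, mart = new_mart)

  result <- combine_mappings(old_ids, old_map, new_map)
  writexl::write_xlsx(result, output)
  result
}
```

Calling `map_old_to_new()` with the defaults would read "Unmatched Ensembl.xlsx" from the working directory and write "Updated Ensembl.xlsx" next to it. If a symbol maps to several current genes, this sketch keeps only the first match.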

The script displays the success rate of symbol matching and of new Ensembl ID matching. Currently I'm getting approximately 95% success with symbol matching and 33% for new IDs. The low new-ID rate could be due to sequences that were merged or split between Ensembl releases, but feel free to comment if you have suggestions.
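As a rough illustration of how such success rates can be computed, the sketch below counts the share of non-missing entries in each column. The helper name `match_rate`, the column names, and the example data are hypothetical and not taken from the script.

```r
# Percentage of entries that are neither NA nor empty, to one decimal place.
match_rate <- function(x) {
  if (length(x) == 0) return(NA_real_)
  round(100 * mean(!is.na(x) & x != ""), 1)
}

# Made-up example result: three old IDs, two symbols found, one new ID found.
result <- data.frame(
  Gene_Symbol    = c("Abc1", "Def2", NA),
  New_Ensembl_ID = c("ENSRNOG9", NA, NA),
  stringsAsFactors = FALSE
)

cat(sprintf("Symbols matched: %.1f%%\n", match_rate(result$Gene_Symbol)))
cat(sprintf("New IDs matched: %.1f%%\n", match_rate(result$New_Ensembl_ID)))
```

Inspecting the rows where the symbol is present but the new ID is NA is one way to see which genes account for the gap between the two rates.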
