This R script reads old Ensembl gene IDs from the "Given" column of an Excel file, maps them to their current Ensembl gene IDs using the biomaRt package, and saves the full mapping, including gene symbols, to a new Excel file. It is designed for IDs from an older genome annotation (such as Rnor_6.0, Ensembl release 80) and connects to both the old and the latest Ensembl databases to perform the mapping.

Input: an Excel file named "Unmatched Ensembl.xlsx" with a column labeled "Given" that holds the old Ensembl IDs. Update the file path in the script to match where the file is stored.

Output: an Excel file named "Updated Ensembl.xlsx" containing the old Ensembl IDs, the associated gene symbols, and the new Ensembl IDs. If an old ID cannot be mapped to a new one, the New Ensembl ID column contains NA for that row. The script also calculates and prints the percentage of IDs that were successfully mapped to gene symbols and to new Ensembl IDs.

Requirements: the R packages biomaRt, readxl, and writexl. The script installs any of them that are missing.

To run the script, open it in R or RStudio and execute it. The updated Excel file is saved in the same directory, giving you a convenient way to bring old gene annotation data up to date with the latest Ensembl information.
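The mapping described above can be sketched roughly as follows. This is not the script itself: it assumes the lookup goes old ID, then gene symbol, then new ID, and that the rat dataset is "rnorvegicus_gene_ensembl". The function names `combine_mappings` and `fetch_mappings` and the column names `old_id`, `symbol`, and `new_id` are hypothetical, chosen only for illustration.

```r
# Sketch only. Assumes: old ID -> gene symbol (old release) -> new ID (latest release).
# Function and column names here are hypothetical, not taken from the script.

# Pure helper: build one row per old ID, with NA where a step found nothing.
combine_mappings <- function(old_ids, old_map, new_map) {
  old_ids <- unique(old_ids)
  symbol <- old_map$symbol[match(old_ids, old_map$old_id)]
  # BioMart returns "" for genes without a name; treat that as unmapped.
  symbol[!is.na(symbol) & symbol == ""] <- NA
  # match() keeps only the first new ID when a symbol has several.
  new_id <- new_map$new_id[match(symbol, new_map$symbol)]
  new_id[is.na(symbol)] <- NA
  data.frame(old_id = old_ids, symbol = symbol, new_id = new_id,
             stringsAsFactors = FALSE)
}

# Network part (needs biomaRt and an internet connection; not called here).
fetch_mappings <- function(old_ids, old_version = 80,
                           dataset = "rnorvegicus_gene_ensembl") {
  old_mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset,
                                  version = old_version)
  new_mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset)

  attrs <- c("ensembl_gene_id", "external_gene_name")
  old_map <- biomaRt::getBM(attributes = attrs, filters = "ensembl_gene_id",
                            values = old_ids, mart = old_mart)
  names(old_map) <- c("old_id", "symbol")

  symbols <- setdiff(unique(old_map$symbol), "")
  new_map <- biomaRt::getBM(attributes = attrs, filters = "external_gene_name",
                            values = symbols, mart = new_mart)
  names(new_map) <- c("new_id", "symbol")

  combine_mappings(old_ids, old_map, new_map)
}

# Possible usage with the files named in this description:
# ids <- readxl::read_excel("Unmatched Ensembl.xlsx")$Given
# writexl::write_xlsx(fetch_mappings(ids), "Updated Ensembl.xlsx")
```

Keeping the join in a separate helper means the NA handling can be checked on small hand-made tables without contacting Ensembl.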
The script displays the success rate for both symbol matching and new Ensembl ID matching. Currently I'm getting approximately 95% success for symbol matching but only 33% for new IDs. The low new-ID rate could be due to sequences that were merged or split between Ensembl releases, but feel free to comment if you have suggestions.
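For anyone who wants to look into the gap between the two rates, here is a minimal sketch of how the percentages can be computed and how to list the IDs that received a symbol but no new ID. It assumes the mapping is a data frame with columns `old_id`, `symbol`, and `new_id`; those names, the helper functions, and the toy rows are hypothetical and are not taken from the script or from real Ensembl data.

```r
# Sketch only: column names, helper names, and the toy rows are hypothetical.

# Percentage of rows with a non-NA symbol and with a non-NA new ID.
mapping_success <- function(mapping) {
  c(symbol = 100 * mean(!is.na(mapping$symbol)),
    new_id = 100 * mean(!is.na(mapping$new_id)))
}

# Old IDs that got a symbol but no new ID: the cases worth inspecting by hand.
symbol_only <- function(mapping) {
  mapping$old_id[!is.na(mapping$symbol) & is.na(mapping$new_id)]
}

# Toy mapping table with made-up identifiers.
toy <- data.frame(
  old_id = c("OLD1", "OLD2", "OLD3", "OLD4"),
  symbol = c("GeneA", "GeneB", "GeneC", NA),
  new_id = c("NEW1", NA, NA, NA),
  stringsAsFactors = FALSE
)

rates <- mapping_success(toy)
print(round(rates, 1))
print(symbol_only(toy))
```

Checking a few of the symbol-only IDs on the Ensembl website may show whether those genes were retired, merged, or renamed between releases.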