The AnGeLi tool on the Bahler Lab website (http://bahlerweb.cs.ucl.ac.uk/cgi-bin/GLA/GLA_input) is used to look up terms associated with gene lists, e.g. FYPO or GO terms.
It runs on a webserver controlled by the UCL CS department.
The data is stored in a flat, tab-separated value file with a .txt extension.
Requires Python 3. Install the dependencies with `pip install gffutils requests`.
Please refer to the code for exact implementation :)
This script focuses on updating the GO and FYPO terms associated with the genes. Where a field is noted as not having a source (NA), the value from the original file is used. Otherwise, the latest values are deemed to be correct.
The script opens the original AnGeLiDatabase file and parses the content into a map. The sources are then downloaded into memory and parsed. Values are calculated (see the code for particular details) and a new results map is created. Finally, the script outputs the new AnGeLiDatabase file.
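The selection rule can be sketched as a small function. This is only an illustration, not the script's actual code: the name `merge_field` and its parameters are hypothetical, and it assumes a missing latest value is represented as `None` or an empty string.

```python
def merge_field(source, original_value, latest_value):
    """Pick the value for one field (hypothetical helper, not from the script)."""
    # Fields with no source (marked "NA") keep the value from the original file.
    if source == "NA":
        return original_value
    # Assumption: a missing latest value shows up as None or an empty string,
    # in which case the original value is kept.
    if latest_value is None or latest_value == "":
        return original_value
    # Otherwise the latest value is deemed to be correct.
    return latest_value
```

For example, `merge_field("NA", "old", "new")` keeps `"old"`, while `merge_field("PomBase", "old", "new")` takes `"new"`.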
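As a rough sketch of the parse and output steps, the following reads tab-separated text into a map and writes it back out. It assumes a single header row and a key column called `Systematic_ID`; the real AnGeLiDatabase file may well carry several heading rows per column, so treat this as an outline only. The function names are hypothetical.

```python
import csv
import io

def parse_tsv(text, key_column="Systematic_ID"):
    """Parse tab-separated text into {key: row_dict} (assumes one header row)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key_column]: row for row in reader}

def write_tsv(records, fieldnames):
    """Serialise the map back to tab-separated text, ordered by key."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for key in sorted(records):
        writer.writerow(records[key])
    return out.getvalue()

# Made-up sample content, for illustration only.
sample = "Systematic_ID\tMass\nSPAC1002.01\t20.1\nSPAC1F8.01\t55.3\n"
records = parse_tsv(sample)
```

Keeping everything in memory as a dictionary keyed by gene identifier makes the later merge a per-key lookup rather than a file-to-file join.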
The original file, AnGeLiDatabase.txt (downloaded 18/12/2024), is stored alongside this README.md file. It is used as the original source for some values that are considered static.
There is also a test list of genes in the file XXXXX. The output from the updated database can be compared to the original output to confirm that the changes are working as expected.
SPAC1F8.01 SPAC31A2.12 SPAC22F3.10c SPAC1687.21 SPAC222.06 SPAC56F8.03 SPAC56F8.11 SPAC17A5.14 SPAC9G1.12 SPAC688.11 SPAC1142.04 SPAC25B8.02 SPAC144.03 SPAC17D4.01 SPAPYUG7.05 SPAC1B3.05 SPAC11E3.06 SPAC3G6.01 SPAC29B12.04 SPAC24B11.11c SPAC806.02c SPAC23G3.12c SPAC10F6.03c SPAC6F12.16c SPAC9.03c SPAC23H4.04 SPAC1399.03 SPAC17H9.04c SPAC18G6.04c SPAC4G9.20c SPAC22H10.12c SPAC19A8.04 SPAC4A8.07c SPAC823.08c SPAC1786.01c SPAC3G9.08 SPAC3G9.06 SPAC3G9.05 SPAC6G9.02c SPAC3A11.09 SPAC17G6.15c SPAP7G5.02c SPAC4H3.11c SPAC27E2.03c SPAC25G10.05c SPAC29E6.10c SPAC458.02c SPAC4F10.09c SPAC4F10.13c SPAPB8E5.07c SPAC29A4.20 SPAC26F1.07 SPAC14C4.12c SPBC359.06 SPBC1683.05 SPBC660.16 SPBC800.11 SPBC106.15 SPBC1685.11 SPBC530.01 SPBC530.02 SPBC530.04 SPBC1709.06 SPBC19C2.08 SPBP16F5.06 SPBC1711.04 SPBC12D12.03 SPBC24C6.06 SPBC29A10.07 SPBC2G5.03 SPBC4F6.04 SPBC4F6.09 SPBC19C7.03 SPBC19C7.06 SPBP4H10.15 SPBC4C3.05c SPBC405.01 SPBC20F10.01 SPBC17D11.05 SPBC17D1.05 SPBPB7E8.01 SPBC19F5.04 SPBC26H8.10 SPBC3F6.05 SPBC16A3.15c SPBC839.05c SPBC530.15c SPBC36.02c SPBC713.04c SPBC1734.01c SPBC651.01c SPBC651.08c SPBC27B12.12c SPBC17A3.04c SPBC19G7.08c SPBC1826.01c SPBC1703.03c SPBC25H2.05 SPBC887.04c SPBP8B7.20c SPBC23E6.07c SPBC26H8.08c SPBC14F5.05c SPBC1539.09c SPCC1529.01 SPCC1183.07 SPCC895.06 SPCC550.11 SPCC645.13 SPCC285.17 SPCC1223.01 SPCC737.08 SPCC1739.14 SPCC1494.07 SPCC1672.05c SPCC1672.11c SPCC18B5.01c SPCP25A2.02c SPCC1281.06c SPCC1450.16c
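If the tool output differs between the two databases, one way to narrow down the cause is to compare the database records for the test genes directly. The sketch below assumes both databases have already been parsed into maps of the form `{gene_id: {field: value}}`; the function name `diff_records` and the field name `GO` are hypothetical.

```python
def diff_records(original, updated, gene_ids):
    """Return {gene_id: {field: (old, new)}} for every field that differs."""
    changes = {}
    for gene_id in gene_ids:
        old = original.get(gene_id, {})
        new = updated.get(gene_id, {})
        changed = {
            field: (old.get(field), new.get(field))
            for field in sorted(set(old) | set(new))
            if old.get(field) != new.get(field)
        }
        if changed:
            changes[gene_id] = changed
    return changes

# Illustrative records only; the values are not taken from the real database.
original = {"SPAC1F8.01": {"GO": "GO:0005826"}, "SPAC31A2.12": {"GO": ""}}
updated = {"SPAC1F8.01": {"GO": "GO:0005826"}, "SPAC31A2.12": {"GO": "GO:0005826"}}
changes = diff_records(original, updated, ["SPAC1F8.01", "SPAC31A2.12"])
```

Genes whose records are identical in both maps are left out of the result, so an empty dictionary means no differences were found for the test list.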
NOTES:
- The primary join field is Systematic_ID, e.g. SPAC1002.01.
- Mapped values come from the source noted below; if there is no value, the original value is used.
- GO and FYPO fields are ordered. It has not been verified whether this is a requirement, but for the sake of simplicity this convention has been retained.
- For GO and FYPO, the script scans the directory and fetches the latest file, which is identified by the date embedded in the file name.
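Selecting the latest file from a directory listing could look roughly like this. The sketch assumes the date appears in the file name as `YYYY-MM-DD`; the actual naming scheme used by the sources may differ, and the file names in the example are invented.

```python
import re
from datetime import date

# Assumption: dates are embedded in ISO form, e.g. "2025-01-05".
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def latest_file(filenames):
    """Return the name carrying the most recent embedded date, or None."""
    dated = []
    for name in filenames:
        match = DATE_PATTERN.search(name)
        if match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    if not dated:
        return None
    # Tuples compare by date first, so max() picks the newest file.
    return max(dated)[1]

# Invented file names, for illustration only.
listing = ["terms.2024-11-01.tsv", "terms.2025-01-05.tsv", "README"]
newest = latest_file(listing)
```

Parsing the date rather than sorting the raw names avoids surprises if files with different prefixes end up in the same directory.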
Heading details
Header Name | Description | Sample Value |
---|---|---|
Short name | A shorthand name for the field | Mass |
Long name | A slightly more descriptive name | Molecular weight (kDa) |
Scale of measurement | What is the measure, typically this is a unit measurement or a flag | Metric OR Binary |
Group | The type of the field | Protein Feature |
Source | Where the data comes from | PomBase |
Author | The author? | This will be copied from the "old" file |
Update | The date the data was last updated | 2025-01-05 |
Link | A link to the source or reference | http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005826 |
Field Details