ActioBio/human-gene-ontology

 
 


Human Gene Ontology annotations

The parent project provides easy-to-use Gene Ontology annotations.

This repository hosts a modified version of the Gene Ontology annotation script, adapted to keep only human gene annotations by filtering for tax_id 9606 (Homo sapiens).

Execution

conda env create -f environment.yml 

conda activate human-gene-ontology

bash download.sh

python process.py

Input

  • protein_coding_gene.csv
    • A CSV file of protein-coding gene data extracted from the NCBI dataset.
  • go-basic.obo
    • The Gene Ontology file in OBO format. It contains ontological information in a structured form, describing gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner.
  • gene2go.gz
    • The file from NCBI that links genes from the Entrez Gene database to Gene Ontology (GO) terms, establishing a connection between gene identifiers and their functional annotations.
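The human-only filter described above can be sketched as follows. This is a minimal illustration and not the repository's process.py. It assumes gene2go is tab-separated with a header line beginning `#tax_id` followed by `GeneID`, `GO_ID`, `Evidence`, `Qualifier`, `GO_term`, `PubMed` and `Category`. The function name `read_human_annotations` and the sample rows are made up for the example.

```python
import csv
import gzip
import io

HUMAN_TAX_ID = "9606"  # NCBI taxonomy ID for Homo sapiens


def read_human_annotations(handle):
    """Yield (GeneID, GO_ID, Evidence, Category) for rows whose tax_id is 9606."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the header line starts with '#tax_id'; drop the leading '#'.
    header[0] = header[0].lstrip("#")
    col = {name: i for i, name in enumerate(header)}
    for row in reader:
        if row[col["tax_id"]] == HUMAN_TAX_ID:
            yield (
                row[col["GeneID"]],
                row[col["GO_ID"]],
                row[col["Evidence"]],
                row[col["Category"]],
            )


# Made-up sample rows standing in for gene2go.gz (not real annotations).
sample = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t101\tGO:TOY1\tIDA\tenables\ttoy function\t-\tFunction\n"
    "10090\t202\tGO:TOY1\tIDA\tenables\ttoy function\t-\tFunction\n"
    "9606\t103\tGO:TOY2\tIEA\tlocated_in\ttoy component\t-\tComponent\n"
)

# Compress in memory so the example reads gzip data the same way it would
# read the real file, e.g. gzip.open("gene2go.gz", "rt", newline="").
compressed = gzip.compress(sample.encode("utf-8"))
with gzip.open(io.BytesIO(compressed), "rt", newline="") as fh:
    human_rows = list(read_human_annotations(fh))

for row in human_rows:
    print(row)
```

Streaming the file row by row keeps memory use low, which matters because the full gene2go file covers many species.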

Output

  • node_{domain}.csv.gz
    • Lists GO terms and names for specific domains (biological process, cellular component and molecular function).
  • edge_gene_to_{domain}.csv.gz
    • Shows gene-GO term relationships for each domain, indicating if annotations are direct or inferred and if the evidence is experimental.
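The direct versus inferred distinction can be illustrated with a small sketch. It assumes that an inferred annotation is obtained by propagating a direct annotation to every ancestor of its GO term. It also assumes that "experimental" means the evidence code is one of EXP, IDA, IPI, IMP, IGI or IEP, and that an edge counts as experimental if any supporting annotation is. The repository's actual rules may differ. The toy hierarchy, gene IDs and function names are hypothetical.

```python
# Assumption: the classic GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Toy hierarchy, child -> set of parents (GO:C is_a GO:B is_a GO:A).
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:A": set(),
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Map (gene, term) -> {'direct': bool, 'experimental': bool}.

    An edge is direct if the gene was annotated to that exact term, and
    experimental if any annotation supporting it has an experimental code.
    """
    edges = {}
    for gene, term, evidence in direct_annotations:
        is_experimental = evidence in EXPERIMENTAL_CODES
        targets = [(term, True)] + [(a, False) for a in ancestors(term, parents)]
        for target, is_direct in targets:
            edge = edges.setdefault(
                (gene, target), {"direct": False, "experimental": False}
            )
            edge["direct"] = edge["direct"] or is_direct
            edge["experimental"] = edge["experimental"] or is_experimental
    return edges


edges = propagate(
    [("g1", "GO:C", "IDA"), ("g1", "GO:B", "IEA")],
    PARENTS,
)
for key in sorted(edges):
    print(key, edges[key])
```

In this toy input, g1 reaches GO:A only through propagation, so that edge is inferred, while GO:B is both annotated directly and inherited from GO:C.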

Note: These CSV files are formatted for easy import into Neo4j.
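As a rough sketch of how a gzip-compressed CSV with a header row might be produced, the snippet below uses only the Python standard library. The column names `go_id` and `name`, the helper `write_csv_gz` and the sample row are hypothetical; the files this repository generates may use different headers.

```python
import csv
import gzip
import io


def write_csv_gz(target, header, rows):
    """Write a header row plus data rows as gzip-compressed CSV.

    `target` may be a file path or a binary file-like object.
    """
    with gzip.open(target, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


# Write to an in-memory buffer here; a path such as "node_example.csv.gz"
# would work the same way.
buffer = io.BytesIO()
write_csv_gz(buffer, ["go_id", "name"], [("GO:TOY1", "toy term")])

# Read the bytes back to confirm the round trip.
lines = gzip.decompress(buffer.getvalue()).decode("utf-8").splitlines()
print(lines)
```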
