Harshad edited this page Jan 19, 2021 · 9 revisions

Knowledge Graphs for Microbial data

This repository is derived from kg-covid-19.

Knowledge Graph Hub concept

Please see here

Prerequisites

  • Java/JDK is required in order for the transform step to work properly. Installation instructions can be found here.
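
Because the transform step depends on Java, it can help to confirm that a Java launcher is on the PATH before running the pipeline. The following is a minimal sketch, not part of this repository; the helper name `java_available` is hypothetical.

```python
import shutil
import subprocess


def java_available(executable: str = "java") -> bool:
    """Return True if the given Java launcher is on PATH and runs cleanly.

    Hypothetical helper for illustration; not part of kg-microbe.
    """
    path = shutil.which(executable)
    if path is None:
        return False
    # `java -version` exits with status 0 on a working installation.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    if java_available():
        print("Java found")
    else:
        print("Java not found; install a JDK before running the transform step")
```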

Setup

  • git clone https://github.com/Knowledge-Graph-Hub/kg-microbe
  • cd kg-microbe
  • pip install -r requirements.txt
  • python setup.py install

Pipeline stages:

  1. Download
  2. Transform
  3. Merge

Download

This step downloads all files from the URLs declared in the download.yaml file.

script - python run.py download
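
To illustrate the idea of a declarative download list, here is a minimal sketch. It assumes each entry has a `url` key and an optional `local_name` key, and that the entries have already been parsed from YAML into a list of dictionaries (for example with PyYAML's `yaml.safe_load`). The actual schema of download.yaml may differ, and the function names and the `data/raw` directory are hypothetical.

```python
import os
from typing import Dict, List, Tuple
from urllib.parse import urlparse
from urllib.request import urlretrieve


def plan_downloads(entries: List[Dict[str, str]], output_dir: str) -> List[Tuple[str, str]]:
    """Pair each URL with the local path it would be saved to.

    Assumed entry shape: {"url": ..., "local_name": ...}, where
    local_name is optional and defaults to the last URL path segment.
    """
    plan = []
    for entry in entries:
        url = entry["url"]
        name = entry.get("local_name") or os.path.basename(urlparse(url).path)
        plan.append((url, os.path.join(output_dir, name)))
    return plan


def download_all(entries: List[Dict[str, str]], output_dir: str, skip_existing: bool = True) -> None:
    """Fetch every planned file, optionally skipping files already on disk."""
    os.makedirs(output_dir, exist_ok=True)
    for url, target in plan_downloads(entries, output_dir):
        if skip_existing and os.path.exists(target):
            continue
        urlretrieve(url, target)


if __name__ == "__main__":
    # Placeholder entry; not a real data source.
    example = [{"url": "https://example.org/files/sample.csv"}]
    for url, target in plan_downloads(example, "data/raw"):
        print(url, "->", target)
```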

Files currently downloaded:

  1. Traits data from the bacteria-archaea-traits repository. Only 'condensed_traits_NCBI.csv' is used for now.
  2. Environments data from the same repository, provided as a conversion table titled 'environments.csv'.
  3. ROBOT jar and shell script files. ROBOT is used to convert ontologies from OWL format into OBOJSON format so that nodes and edges can be extracted from them. In this case, we also leverage the 'extract' feature of ROBOT to get subsets of ontologies. Documentation on ROBOT can be found here.
  4. CHEBI.owl is used as a dictionary while running OGER to annotate 'carbon substrate' information from the traits data.
  5. NCBITaxon.owl is used as the ontology source to capture organismal classification information.
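
The ROBOT conversion and extraction mentioned in item 3 could be driven from Python roughly as follows. This is a sketch under assumptions: it only builds the command lines for ROBOT's `convert` and `extract` commands, it assumes the launcher is reachable as `robot` (the downloaded shell script may need to be called by its path instead), and the extraction method and seed term shown are illustrative, since the ones used by this repository are not stated here. The function names are hypothetical.

```python
from typing import List


def robot_convert_cmd(owl_path: str, json_path: str, robot: str = "robot") -> List[str]:
    """Command to convert an OWL file to OBO Graphs JSON with `robot convert`."""
    return [robot, "convert", "--input", owl_path, "--format", "json", "--output", json_path]


def robot_extract_cmd(
    owl_path: str,
    term: str,
    out_path: str,
    method: str = "BOT",  # assumed method; ROBOT also offers STAR, TOP and MIREOT
    robot: str = "robot",
) -> List[str]:
    """Command to extract an ontology subset around one seed term."""
    return [
        robot, "extract",
        "--method", method,
        "--input", owl_path,
        "--term", term,
        "--output", out_path,
    ]


if __name__ == "__main__":
    # File names and the seed term below are placeholders.
    print(" ".join(robot_convert_cmd("ontology.owl", "ontology.json")))
    print(" ".join(robot_extract_cmd("ontology.owl", "EX:0000001", "subset.owl")))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.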

Transform

In this step, we create nodes and edges corresponding to three of the downloaded files mentioned above (#1, #4 and #5).

scripts

  1. All together - python run.py transform

OR

Running transforms individually:

  1. For traits data - python run.py transform -s TraitsTransform
  2. For CHEBI.owl - python run.py transform -s ChebiTransform
  3. For NCBITaxon.owl - python run.py transform -s NCBITransform

Merge

In this step, the outputs of all the above transforms are merged, and cumulative nodes and edges files are generated.

script - python run.py merge
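
Conceptually, merging means combining the per-transform node and edge tables while dropping duplicates. The sketch below shows that idea only; it is not the repository's merge implementation. It assumes tab-separated tables in which nodes carry an `id` column and edges carry `subject`, `predicate` and `object` columns, and all function names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text with a header row into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_nodes(tables: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Union node rows by `id`; the first occurrence of an id wins."""
    merged: Dict[str, Dict[str, str]] = {}
    for rows in tables:
        for row in rows:
            merged.setdefault(row["id"], dict(row))
    return list(merged.values())


def merge_edges(tables: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Concatenate edge rows, dropping repeated (subject, predicate, object) triples."""
    seen = set()
    merged = []
    for rows in tables:
        for row in rows:
            key = (row["subject"], row["predicate"], row["object"])
            if key not in seen:
                seen.add(key)
                merged.append(dict(row))
    return merged


if __name__ == "__main__":
    # Placeholder rows; identifiers are made up for illustration.
    first = read_tsv("id\tname\nEX:1\talpha\nEX:2\tbeta\n")
    second = read_tsv("id\tname\nEX:2\tbeta\nEX:3\tgamma\n")
    for node in merge_nodes([first, second]):
        print(node["id"], node["name"])
```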

Data

The final merged data is available here