A complex ETL workflow centered around analyzing protein and genomic context similarity
Documentation: https://socialgene.github.io
Nextflow workflow: https://github.com/socialgene/sgnf
Python package: https://github.com/socialgene/sgpy
Knowledge graphs are increasingly popular in bioinformatics, cheminformatics, and drug discovery. Table-based databases tend to be hard to comprehend without time and specialized knowledge (e.g. ChEMBL's tables), whereas graph databases let scientists query data in a manner closer to how it is conceptualized. However, few biomedical knowledge graphs are built from reproducible large-scale computations; most simply link disparate databases. SocialGene attempts to create a knowledge graph for natural product drug discovery that is built by incorporating disparate data and the calculations performed on it. I've attempted to make it as simple as possible to create and work with the databases, and to extend them to new sources.