JBris/nextflow-graph-machine-learning

Latest commit: 27ed9a7 · Nov 20, 2024

Nextflow Graph Machine Learning

Website: Nextflow Graph Machine Learning

A Nextflow pipeline demonstrating how to train graph neural networks for gene regulatory network reconstruction using DREAM5 data.

Introduction

This project provides a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, that performs gene regulatory network (GRN) reconstruction using graph neural networks (GNNs). In practice, GRN reconstruction is an unsupervised link prediction problem.
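
To make the link prediction framing concrete, here is a minimal sketch of one common way to build training pairs when only observed interactions are available: treat known edges as positives and sample absent node pairs as negatives. The function name `sample_negative_edges` and the toy edge list are illustrative assumptions, not code from this repository.

```python
import random


def sample_negative_edges(num_nodes, positive_edges, num_samples, seed=0):
    """Draw directed node pairs that are absent from the known edge set.

    Assumes positive_edges contains no self-loops.
    """
    rng = random.Random(seed)
    known = set(positive_edges)
    max_possible = num_nodes * (num_nodes - 1) - len(known)
    if num_samples > max_possible:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < num_samples:
        src = rng.randrange(num_nodes)
        dst = rng.randrange(num_nodes)
        # Skip self-loops and pairs that are already known edges.
        if src != dst and (src, dst) not in known:
            negatives.add((src, dst))
    return sorted(negatives)


# Toy regulator -> target edges over 5 genes (made-up data).
positive_edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
negative_edges = sample_negative_edges(5, positive_edges, num_samples=4)

# Label 1 for observed interactions, 0 for sampled non-interactions.
labelled_pairs = [(e, 1) for e in positive_edges] + [(e, 0) for e in negative_edges]
```

A model can then be trained to score the positive pairs above the sampled negatives, with no externally supplied labels.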

For developing GNNs, we use PyTorch Geometric.
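
As a rough illustration of what a SAGE-style convolution computes, the sketch below implements mean aggregation over incoming neighbours in plain Python. It does not use PyTorch Geometric and is not the model trained by this pipeline; the function names, the identity weight matrices, and the ReLU applied at the end are all assumptions chosen for readability.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sage_mean_layer(features, edges, w_self, w_neigh):
    """Compute relu(W_self h_i + W_neigh mean_j h_j) for every node i."""
    incoming = {node: [] for node in features}
    for src, dst in edges:
        # Messages flow along the edge direction, from src to dst.
        incoming[dst].append(features[src])
    updated = {}
    for node, h in features.items():
        msgs = incoming[node]
        if msgs:
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            # Nodes without incoming edges receive a zero neighbourhood vector.
            mean = [0.0] * len(h)
        self_term = matvec(w_self, h)
        neigh_term = matvec(w_neigh, mean)
        updated[node] = [max(0.0, a + b) for a, b in zip(self_term, neigh_term)]
    return updated


# Toy graph: nodes 0 and 1 both point at node 2 (made-up data).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 2), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = sage_mean_layer(features, edges, identity, identity)
```

In a real model the two weight matrices are learned, and several such layers are stacked so that each node's embedding reflects a wider neighbourhood.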

The Nextflow pipeline

Nextflow has been included to orchestrate the GRN reconstruction pipeline.

The pipeline is composed of the following steps:

  1. Exploratory data analysis: View the GRN and calculate some summary statistics.
  2. Processing: Process the graph feature matrix and edge list. Remove the disconnected subgraph.
  3. ArangoDB Importing: Import the graph into ArangoDB.
  4. GNN training: Train a GNN using SAGE convolutional layers.
  5. GNN training: Train a variational autoencoder GNN, and save the neural embeddings.
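
The processing step can be sketched as follows, assuming that "remove the disconnected subgraph" means keeping only the largest connected component and dropping everything else. That interpretation, the helper names `largest_component` and `filter_graph`, and the toy gene data are assumptions for illustration only.

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the node set of the largest weakly connected component."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Treat edges as undirected when testing connectivity.
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_graph(features, edges):
    """Keep only features and edges that belong to the largest component."""
    keep = largest_component(edges)
    kept_features = {n: f for n, f in features.items() if n in keep}
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return kept_features, kept_edges


# Toy feature matrix and edge list (made-up data).
features = {
    "G1": [0.1, 0.4], "G2": [0.3, 0.2], "G3": [0.9, 0.5],
    "G4": [0.7, 0.7], "G5": [0.2, 0.8],
}
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]
kept_features, kept_edges = filter_graph(features, edges)
```

Here the smaller G4/G5 component is dropped, so the feature matrix and edge list stay consistent with each other before training.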

Run nextflow.sh to execute the full pipeline.

Run clean_nf.sh to clean up the output and logging files from the Nextflow run.

Python Environment

Python dependencies are specified in this requirements.txt file.

These dependencies are installed during the build process for the following Docker image: ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0

Execute the following command to pull the image: docker pull ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0

MLOps

ArangoDB

This pipeline provides a simple demonstration of saving graph data to ArangoDB and retrieving it again, combined with NetworkX integration.
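
As a hedged sketch of what saving a graph to ArangoDB involves, the snippet below converts an edge list into vertex and edge documents. ArangoDB identifies documents by `_key` and links edge documents through `_from` and `_to` handles of the form `collection/key`; the collection name `genes`, the helper `to_arango_documents`, and the toy edges are assumptions, and no database connection is made here.

```python
import json


def to_arango_documents(edges, vertex_collection="genes"):
    """Build ArangoDB-style vertex and edge documents from an edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    vertex_docs = [{"_key": node} for node in nodes]
    edge_docs = [
        {
            # Edge endpoints are document handles: "<collection>/<_key>".
            "_from": f"{vertex_collection}/{src}",
            "_to": f"{vertex_collection}/{dst}",
        }
        for src, dst in edges
    ]
    return vertex_docs, edge_docs


# Toy regulator -> target edges (made-up data).
edges = [("G1", "G2"), ("G2", "G3")]
vertex_docs, edge_docs = to_arango_documents(edges)

# One JSON document per line, ready to be written to a file.
edge_lines = "\n".join(json.dumps(doc) for doc in edge_docs)
```

Reading the graph back reverses the mapping: strip the collection prefix from `_from` and `_to` to recover the node pairs that a NetworkX graph can be built from.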