GitHub

🧬 Helix: Modular Components for Bioinformatics Workflows with Modal

Helix provides a set of modular, Lego-like components for constructing bioinformatics workflows. By abstracting away infrastructure complexities, Helix allows researchers to focus on biological problems rather than computational logistics. Built on Modal, it offers efficient cloud-based execution for large-scale computational tasks. Leveraging Modal's features, Helix provides flexible environments, seamless integrations with various services, efficient data management, advanced job scheduling, and built-in debugging tools, all while enabling easy deployment of web services.

🧩 Core Philosophy

Helix provides modular components for building scalable bioinformatics pipelines. We've abstracted away infrastructure complexities, allowing researchers to construct workflows using a clean Python API. By leveraging Modal's cloud capabilities, Helix offers powerful distributed computing without the typical overhead. Our design emphasizes programmatic interfaces over CLIs, enabling seamless integration into existing codebases. The goal is to empower bioinformaticians to focus on algorithm development and data analysis, rather than resource management and deployment logistics.

⚙️ Getting Started

Create an account at modal.com for cloud execution access.
Install Helix: pip install helixbio (Python 3.10+ required)
Set up Modal: modal token new

🧬 Examples

Here are some examples of how to run various functions using the Modal app context:

Compute Protein Embeddings with ESM

from helix.core import app
from helix.functions.embedding import get_protein_embeddings


with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]
    result = get_protein_embeddings.remote(
        sequences,
        model_name="facebook/esm2_t33_650M_UR50D",
        embedding_strategy="cls"
    )

Predict Protein Structures with Chai and ESMFold

from helix.core import app
from helix.functions import chai, esmfold

# Example for Chai
with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]

    inference_params = {
        "num_recycles": 3,
        "num_samples": 1
    }
    chai_results = chai.predict_structures.remote(sequences, inference_params)
    print(f"Chai predicted {len(chai_results)} structures")

# Example for ESMFold
with app.run():
    sequences = [
        "MALLWMRLLPLLALLALWGPD",
        "MKTVRQERLKSIVRILERSKEPVSGAQ"
    ]
    esmfold_results = esmfold.predict_structures.remote(sequences, batch_size=2)
    print(f"ESMFold predicted {len(esmfold_results)} structures")

Score Mutation using Protein Language Models

Mutation scoring uses pre-trained language models to evaluate the impact of amino acid substitutions in protein sequences. This implementation is based on the methods developed by Brian Hie and colleagues, as described in their Nature Biotechnology paper (Hie et al., 2024). The function supports different scoring methods:

"wildtype_marginal": Computes the difference in log probability between the mutant and wild-type amino acids without masking.
"masked_marginal": Masks each position before scoring.
"pppl": (Pseudo-perplexity) Calculates the change in model perplexity caused by the mutation.

These methods have been shown to be effective in guiding the evolution of human antibodies and other proteins.

Here's an example of how to use the mutation scoring function:

from helix.core import app
from helix.functions.scoring.protein import score_mutations

with app.run():
    # Define the sequence and mutations
    sequence = "MALLWMRLLPLLALLALWGPD"
    mutations = ["M1A", "L2A", "W5A"]

    # Define model and metric
    model_name = "facebook/esm2_t33_650M_UR50D"
    metric = "wildtype_marginal"

    # Score mutations
    scores = score_mutations.remote(
        model_name=model_name,
        sequence=sequence,
        mutations=mutations,
        metric=metric
    )
    print(f"Scores for {model_name} using {metric}:")
    print(scores)

Reference: Hie, B.L., Shanker, V.R., Xu, D. et al. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol 42, 275–283 (2024). https://doi.org/10.1038/s41587-023-01763-2

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
.github/workflows		.github/workflows
helix		helix
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧬 Helix: Modular Components for Bioinformatics Workflows with Modal

🧩 Core Philosophy

⚙️ Getting Started

🧬 Examples

Compute Protein Embeddings with ESM

Predict Protein Structures with Chai and ESMFold

Score Mutation using Protein Language Models

About

Releases 2

Packages

Languages

License

ragnorc/helix

Folders and files

Latest commit

History

Repository files navigation

🧬 Helix: Modular Components for Bioinformatics Workflows with Modal

🧩 Core Philosophy

⚙️ Getting Started

🧬 Examples

Compute Protein Embeddings with ESM

Predict Protein Structures with Chai and ESMFold

Score Mutation using Protein Language Models

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages