Hypha

Hypha is a self-managing platform for distributed machine learning ("Kubernetes for AI", but simpler). Train and serve models across heterogeneous infrastructure, from GPU farms to commodity hardware.

Get started in minutes following the quick start guide.

Built on the battle-tested libp2p network stack with additional security features, Hypha maintains high security and reliability while remaining simple to set up. The system implements DiLoCo (Distributed Low-Communication) style training, an approach that dramatically reduces communication overhead compared to traditional data-parallel training, making it feasible to train across data centers.
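To make the communication savings concrete, here is a minimal, self-contained sketch of a DiLoCo-style loop on a toy quadratic objective. It is not Hypha's implementation; the worker count, step counts, and learning rates are illustrative assumptions. The key idea it demonstrates is that workers take many local optimizer steps with no communication, and only the averaged parameter delta (the "pseudo-gradient") is synchronized, here with a Nesterov-style outer momentum update.

```python
import numpy as np

def grad(x):
    # Gradient of the toy loss f(x) = 0.5 * ||x||^2, standing in
    # for a real model's gradient computed on local data.
    return x

def diloco(x0, workers=4, outer_steps=10, inner_steps=20,
           inner_lr=0.1, outer_lr=0.7, momentum=0.9):
    """DiLoCo-style training sketch: each worker runs many local
    steps, then the averaged delta is applied with outer momentum.
    All names and hyperparameters here are illustrative."""
    x = x0.copy()          # globally shared parameters
    v = np.zeros_like(x)   # outer (server-side) momentum buffer
    for _ in range(outer_steps):
        deltas = []
        for _ in range(workers):
            local = x.copy()
            # Communication-free phase: many local optimizer steps.
            for _ in range(inner_steps):
                local -= inner_lr * grad(local)
            # Pseudo-gradient: how far this worker moved.
            deltas.append(x - local)
        # One synchronization per outer step instead of per step.
        g = np.mean(deltas, axis=0)
        v = momentum * v + g
        x -= outer_lr * (momentum * v + g)  # Nesterov-style update
    return x

x0 = np.full(8, 5.0)
x = diloco(x0)
print(float(np.linalg.norm(x)) < float(np.linalg.norm(x0)))
```

With `inner_steps=20`, each outer synchronization replaces 20 per-step all-reduces with a single exchange of parameter deltas, which is the source of the bandwidth savings described above.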

Key Features

  • Distributed Training — Run DiLoCo-style training across workers with infrequent synchronization, ideal for bandwidth-constrained or geographically distributed setups. Learn more →
  • Production Inference (in development) — The same decentralized architecture supports scalable, resilient inference serving with automatic load balancing.
  • Security — End-to-end encryption via mTLS, certificate revocation for immediate access control, and a permissioned network model. Security guide →

Installation

Install Hypha using the standalone installer script:

curl -LsSf https://hypha-space.org/install.sh | sh

For alternative installation methods (GitHub releases, Cargo), see the Installation Guide.

Next Steps

New to Hypha? Start with the Quick Start to get a local cluster running in minutes, then explore the architecture and deployment guides to move into production.

  • Quick Start — Set up a local cluster and run your first training job
  • Architecture — How Gateways, Schedulers, Workers, and Data Nodes fit together
  • Deployment — Deploy Hypha on cloud infrastructure

Contributing

Want to help improve Hypha's capabilities for distributed training and inference? We welcome contributions of all kinds: bug fixes, feature enhancements, and documentation improvements. See CONTRIBUTING.md for detailed instructions on how to contribute.


License

Hypha is dual-licensed under AGPL-3.0 (LICENSE-AGPL) and Apache-2.0 (LICENSE-APACHE).