doc: added links to RTD in README.md, some restructuring
sehoffmann committed Oct 25, 2024
1 parent f5776d3 commit 39c0720
Showing 1 changed file with 20 additions and 12 deletions.
32 changes: 20 additions & 12 deletions README.md
@@ -1,21 +1,29 @@
![dmlcloud logo](./misc/logo/dmlcloud_color.png)
![Dmlcloud Logo](./misc/logo/dmlcloud_color.png)
---------------
[![](https://img.shields.io/pypi/v/dmlcloud)](https://pypi.org/project/dmlcloud/)
[![](https://img.shields.io/github/actions/workflow/status/sehoffmann/dmlcloud/run_tests.yml?label=tests&logo=github)](https://github.com/sehoffmann/dmlcloud/actions/workflows/run_tests.yml)
[![](https://img.shields.io/github/actions/workflow/status/sehoffmann/dmlcloud/run_linting.yml?label=lint&logo=github)](https://github.com/sehoffmann/dmlcloud/actions/workflows/run_linting.yml)
[![PyPI Status](https://img.shields.io/pypi/v/dmlcloud)](https://pypi.org/project/dmlcloud/)
[![Documentation Status](https://readthedocs.org/projects/dmlcloud/badge/?version=latest)](https://dmlcloud.readthedocs.io/en/latest/?badge=latest)
[![Test Status](https://img.shields.io/github/actions/workflow/status/sehoffmann/dmlcloud/run_tests.yml?label=tests&logo=github)](https://github.com/sehoffmann/dmlcloud/actions/workflows/run_tests.yml)

*Flexible, easy-to-use, opinionated*
A torch library for easy distributed deep learning on HPC clusters. Supports both slurm and MPI. No unnecessary abstractions or overhead. Simple, yet powerful, API.

*dmlcloud* is a library for **distributed training** of deep learning models with *torch*. Unlike similar frameworks, dmlcloud adds as little additional complexity and abstraction as possible. It is tailored towards a carefully selected set of libraries and workflows.
## Highlights
- Simple, yet powerful, API
- Easy initialization of `torch.distributed`
- Distributed checkpointing and metrics
- Extensive logging and diagnostics
- Wandb support
- A wealth of useful utility functions

## Installation
```
pip install dmlcloud
```

## Why dmlcloud?
- Easy initialization of `torch.distributed` (supports *slurm* and *MPI*); see the sketch below this list.
- Simple, yet powerful, API. No unnecessary abstractions or complications.
- Checkpointing and metric tracking (distributed)
- Extensive logging and diagnostics out-of-the-box. Greatly improves reproducibility and traceability.
- A wealth of useful utility functions required for distributed training (e.g. dataset sharding)
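
To make the first bullet concrete, the sketch below shows the hand-rolled setup that plain `torch.distributed` needs under *slurm*, i.e. the kind of boilerplate dmlcloud is meant to take care of. It uses only the standard `torch.distributed` API and the environment variables slurm exports by default; the helper name `init_from_slurm` and the default port are illustrative and are **not** part of dmlcloud's API.

```
# Illustrative only -- not dmlcloud's API. A hand-rolled torch.distributed
# setup for a slurm job; dmlcloud aims to replace this kind of boilerplate.
import os

import torch
import torch.distributed as dist


def init_from_slurm(master_port: str = '29500') -> int:
    # slurm exports SLURM_PROCID / SLURM_NTASKS / SLURM_LOCALID per task, while
    # torch's default env:// rendezvous expects RANK, WORLD_SIZE, MASTER_ADDR
    # and MASTER_PORT, so the variables have to be translated by hand.
    os.environ.setdefault('RANK', os.environ.get('SLURM_PROCID', '0'))
    os.environ.setdefault('WORLD_SIZE', os.environ.get('SLURM_NTASKS', '1'))
    os.environ.setdefault('MASTER_ADDR', 'localhost')  # in practice: the first host in SLURM_NODELIST
    os.environ.setdefault('MASTER_PORT', master_port)

    backend = 'nccl' if torch.cuda.is_available() else 'gloo'
    dist.init_process_group(backend=backend)

    local_rank = int(os.environ.get('SLURM_LOCALID', '0'))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)  # pin this process to its local GPU
    return dist.get_rank()
```

The "easy initialization" bullet refers to hiding exactly this kind of setup (plus the MPI case) behind the library, so training scripts can start from a ready-to-use process group.
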
## Minimal Example
*TODO*

## Documentation

You can find the official documentation at [Read the Docs](https://dmlcloud.readthedocs.io/en/latest/).

