
Commit a41d5fb

Merge branch 'main' into loadams/update-transformers

2 parents: bbf2fc0 + d100a85

15 files changed (+48, -47 lines)

`.github/workflows/nv-a6000-fastgen.yml` (1 addition, 1 deletion)

```diff
@@ -41,7 +41,7 @@ jobs:
           python -m pip install .
       - name: Install deepspeed
         run: |
-          git clone --depth=1 https://github.com/microsoft/DeepSpeed
+          git clone --depth=1 https://github.com/deepspeedai/DeepSpeed
           cd DeepSpeed
           python -m pip install .
           ds_report
```

`.github/workflows/nv-v100-legacy.yml` (1 addition, 1 deletion)

```diff
@@ -35,7 +35,7 @@ jobs:
 
       - name: Install dependencies
         run: |
-          pip install git+https://github.com/microsoft/DeepSpeed.git@lekurile/bloom_v_check
+          pip install git+https://github.com/deepspeedai/DeepSpeed.git@lekurile/bloom_v_check
           pip install git+https://github.com/huggingface/transformers.git
           pip install -U accelerate
           ds_report
```

`CODEOWNERS` (1 addition, 1 deletion)

```diff
@@ -1 +1 @@
-* @tohtana @tjruwase @awan-10 @loadams
+* @tohtana @tjruwase @loadams
```

`README.md` (19 additions, 18 deletions)

````diff
@@ -1,7 +1,7 @@
-[![Formatting](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg?branch=main)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml)
-[![nv-v100-legacy](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml/badge.svg?branch=main)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml)
-[![nv-a6000-fastgen](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml/badge.svg?branch=main)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml)
-[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/Microsoft/DeepSpeed/blob/master/LICENSE)
+[![Formatting](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg?branch=main)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml)
+[![nv-v100-legacy](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml/badge.svg?branch=main)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml)
+[![nv-a6000-fastgen](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml/badge.svg?branch=main)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml)
+[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/deepspeedai/DeepSpeed/blob/master/LICENSE)
 [![PyPI version](https://badge.fury.io/py/deepspeed-mii.svg)](https://pypi.org/project/deepspeed-mii/)
 <!-- [![Documentation Status](https://readthedocs.org/projects/deepspeed/badge/?version=latest)](https://deepspeed.readthedocs.io/en/latest/?badge=latest) -->
 
@@ -12,8 +12,8 @@
 
 ## Latest News
 
-* [2024/01] [DeepSpeed-FastGen: Introducting Mixtral, Phi-2, and Falcon support with major performance and feature enhancements.](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19)
-* [2023/11] [DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen)
+* [2024/01] [DeepSpeed-FastGen: Introducting Mixtral, Phi-2, and Falcon support with major performance and feature enhancements.](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19)
+* [2023/11] [DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen)
 * [2022/11] [Stable Diffusion Image Generation under 1 second w. DeepSpeed MII](mii/legacy/examples/benchmark/txt2img)
 * [2022/10] [Announcing DeepSpeed Model Implementations for Inference (MII)](https://www.deepspeed.ai/2022/10/10/mii.html)
@@ -33,7 +33,7 @@
 
 Introducing MII, an open-source Python library designed by DeepSpeed to democratize powerful model inference with a focus on high-throughput, low latency, and cost-effectiveness.
 
-* MII features include blocked KV-caching, continuous batching, Dynamic SplitFuse, tensor parallelism, and high-performance CUDA kernels to support fast high throughput text-generation for LLMs such as Llama-2-70B, Mixtral (MoE) 8x7B, and Phi-2. The latest updates in v0.2 add new model families, performance optimizations, and feature enhancements. MII now delivers up to 2.5 times higher effective throughput compared to leading systems such as vLLM. For detailed performance results please see our [latest DeepSpeed-FastGen blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19) and [DeepSpeed-FastGen release blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen).
+* MII features include blocked KV-caching, continuous batching, Dynamic SplitFuse, tensor parallelism, and high-performance CUDA kernels to support fast high throughput text-generation for LLMs such as Llama-2-70B, Mixtral (MoE) 8x7B, and Phi-2. The latest updates in v0.2 add new model families, performance optimizations, and feature enhancements. MII now delivers up to 2.5 times higher effective throughput compared to leading systems such as vLLM. For detailed performance results please see our [latest DeepSpeed-FastGen blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19) and [DeepSpeed-FastGen release blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen).
 
 <div align="center">
 <img src="docs/images/fastgen-24-01-hero-light.png#gh-light-mode-only" width="850px">
@@ -58,7 +58,7 @@ MII provides accelerated text-generation inference through the use of four key t
 * Dynamic SplitFuse
 * High Performance CUDA Kernels
 
-For a deeper dive into understanding these features please [refer to our blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen) which also includes a detailed performance analysis.
+For a deeper dive into understanding these features please [refer to our blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen) which also includes a detailed performance analysis.
 
 ## MII Legacy
 
@@ -78,14 +78,14 @@ In the past, MII introduced several [key performance optimizations](https://www.
 </div>
 
 
-Figure 1: MII architecture, showing how MII automatically optimizes OSS models using DS-Inference before deploying them. DeepSpeed-FastGen optimizations in the figure have been published in [our blog post](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen).
+Figure 1: MII architecture, showing how MII automatically optimizes OSS models using DS-Inference before deploying them. DeepSpeed-FastGen optimizations in the figure have been published in [our blog post](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen).
 
-Under-the-hood MII is powered by [DeepSpeed-Inference](https://github.com/microsoft/deepspeed). Based on the model architecture, model size, batch size, and available hardware resources, MII automatically applies the appropriate set of system optimizations to minimize latency and maximize throughput.
+Under-the-hood MII is powered by [DeepSpeed-Inference](https://github.com/deepspeedai/DeepSpeed). Based on the model architecture, model size, batch size, and available hardware resources, MII automatically applies the appropriate set of system optimizations to minimize latency and maximize throughput.
 
 
 # Supported Models
 
-MII currently supports over 37,000 models across eight popular model architectures. We plan to add additional models in the near term, if there are specific model architectures you would like supported please [file an issue](https://github.com/microsoft/DeepSpeed-MII/issues) and let us know. All current models leverage Hugging Face in our backend to provide both the model weights and the model's corresponding tokenizer. For our current release we support the following model architectures:
+MII currently supports over 37,000 models across eight popular model architectures. We plan to add additional models in the near term, if there are specific model architectures you would like supported please [file an issue](https://github.com/deepspeedai/DeepSpeed-MII/issues) and let us know. All current models leverage Hugging Face in our backend to provide both the model weights and the model's corresponding tokenizer. For our current release we support the following model architectures:
 
 model family | size range | ~model count
 ------ | ------ | ------
@@ -120,7 +120,7 @@ The fasest way to get started is with our [PyPI release of DeepSpeed-MII](https:
 pip install deepspeed-mii
 ```
 
-For ease of use and significant reduction in lengthy compile times that many projects require in this space we distribute a pre-compiled python wheel covering the majority of our custom kernels through a new library called [DeepSpeed-Kernels](https://github.com/microsoft/DeepSpeed-Kernels). We have found this library to be very portable across environments with NVIDIA GPUs with compute capabilities 8.0+ (Ampere+), CUDA 11.6+, and Ubuntu 20+. In most cases you shouldn't even need to know this library exists as it is a dependency of DeepSpeed-MII and will be installed with it. However, if for whatever reason you need to compile our kernels manually please see our [advanced installation docs](https://github.com/microsoft/DeepSpeed-Kernels#source).
+For ease of use and significant reduction in lengthy compile times that many projects require in this space we distribute a pre-compiled python wheel covering the majority of our custom kernels through a new library called [DeepSpeed-Kernels](https://github.com/deepspeedai/DeepSpeed-Kernels). We have found this library to be very portable across environments with NVIDIA GPUs with compute capabilities 8.0+ (Ampere+), CUDA 11.6+, and Ubuntu 20+. In most cases you shouldn't even need to know this library exists as it is a dependency of DeepSpeed-MII and will be installed with it. However, if for whatever reason you need to compile our kernels manually please see our [advanced installation docs](https://github.com/deepspeedai/DeepSpeed-Kernels#source).
 
 ## Non-Persistent Pipeline
 
@@ -321,13 +321,14 @@ Users can also control the generation characteristics for individual prompts (i.
 
 # Contributing
 
-This project welcomes contributions and suggestions. Most contributions require you to agree to a
-Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
-the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
+This project welcomes contributions and suggestions.
 
-When you submit a pull request, a CLA bot will automatically determine whether you need to provide
-a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
-provided by the bot. You will only need to do this once across all repos using our CLA.
+DeepSpeed-MII has adopted the [DCO](https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin). All deepspeedai repos require a DCO.
+(DeepSpeed previously used a CLA which is being replaced with DCO).
+
+DCO is provided by including a sign-off-by line in commit messages. Using the `-s` flag for `git commit` will automatically append this line.
+For example, running `git commit -s -m 'commit info.'` will produce a commit that has the message `commit info. Signed-off-by: My Name <my_email@my_company.com>.`
+The DCO bot will ensure commits are signed with an email address that matches the commit author before they are eligible to be merged.
 
 This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
 For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
````
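The DCO sign-off flow that the README's updated Contributing section describes can be exercised end to end in a throwaway repository. This is a minimal sketch: the user name and email are the README's own placeholder values, not real identities, and the temporary repo exists only for the demonstration.

```shell
set -e

# Create a disposable repository so nothing real is touched.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q
git config user.name "My Name"
git config user.email "my_email@my_company.com"

echo "hello" > file.txt
git add file.txt

# The -s flag appends the Signed-off-by trailer automatically.
git commit -q -s -m "commit info."

# Print the full commit message; the trailer is what the DCO bot checks:
#   Signed-off-by: My Name <my_email@my_company.com>
git log -1 --format=%B
```

The trailer must match the commit author's configured email, which is exactly the check the DCO bot performs before a PR is eligible to merge.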

`docs/source/index.rst` (5 additions, 5 deletions)

```diff
@@ -14,15 +14,15 @@ democratize powerful model inference with a focus on high-throughput, low
 latency, and cost-effectiveness.
 
 MII v0.1 introduced several features as part of our `DeepSpeed-FastGen release
-<https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
+<https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
 such as blocked KV-caching, continuous batching, Dynamic SplitFuse, tensor
 parallelism, and high-performance CUDA kernels to support fast high throughput
 text-generation with LLMs. The latest version of MII delivers up to 2.5 times
 higher effective throughput compared to leading systems such as vLLM. For
 detailed performance results please see our `DeepSpeed-FastGen release blog
-<https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
+<https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
 and the `latest DeepSpeed-FastGen blog
-<https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19>`_.
+<https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19>`_.
 
 MII-Legacy
 ----------
@@ -32,9 +32,9 @@ We first `announced MII <https://www.deepspeed.ai/2022/10/10/mii.html>`_ in
 of DeepSpeed-FastGen. MII-Legacy, which covers all prior releases up to v0.0.9,
 provides support for running inference for a wide variety of language model
 tasks. We also support accelerating `text2image models like Stable Diffusion
-<https://github.com/Microsoft/DeepSpeed-MII/tree/main/mii/legacy/examples/benchmark/txt2img>`_.
+<https://github.com/deepspeedai/DeepSpeed-MII/tree/main/mii/legacy/examples/benchmark/txt2img>`_.
 For more details on our previous releases please see our `legacy APIs
-<https://github.com/Microsoft/DeepSpeed-MII/tree/main/mii/legacy/>`_.
+<https://github.com/deepspeedai/DeepSpeed-MII/tree/main/mii/legacy/>`_.
 
 
 Contents
```

`docs/source/install.rst` (2 additions, 2 deletions)

```diff
@@ -19,11 +19,11 @@ pip to install from source:
 
 .. code-block:: console
 
-   (.venv) $ pip install git+https://github.com/Microsoft/DeepSpeed-MII.git
+   (.venv) $ pip install git+https://github.com/deepspeedai/DeepSpeed-MII.git
 
 Or you can clone the repository and install:
 
 .. code-block:: console
 
-   (.venv) $ git clone https://github.com/Microsoft/DeepSpeed-MII.git
+   (.venv) $ git clone https://github.com/deepspeedai/DeepSpeed-MII.git
    (.venv) $ pip install ./DeepSpeed-MII
```

`examples/README.md` (1 addition, 1 deletion)

```diff
@@ -1,2 +1,2 @@
 # MII Examples
-Please see [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/mii) for a few examples on using MII.
+Please see [DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples/tree/master/inference/mii) for a few examples on using MII.
```

`mii/aml_related/templates.py` (2 additions, 2 deletions)

```diff
@@ -165,8 +165,8 @@
 RUN /opt/miniconda/envs/amlenv/bin/pip install torch torchvision --index-url https://download.pytorch.org/whl/cu113 && \
     /opt/miniconda/envs/amlenv/bin/pip install -r "$BUILD_DIR/requirements.txt" && \
     /opt/miniconda/envs/amlenv/bin/pip install azureml-inference-server-http && \
-    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed.git && \
-    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed-MII.git && \
+    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed.git && \
+    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed-MII.git && \
     /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/huggingface/transformers.git
 
```

`mii/legacy/README.md` (4 additions, 4 deletions)

````diff
@@ -1,6 +1,6 @@
-<!-- [![Build Status](https://github.com/microsoft/deepspeed-mii/workflows/Build/badge.svg)](https://github.com/microsoft/DeepSpeed-MII/actions) -->
-[![Formatting](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml)
-[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/Microsoft/DeepSpeed/blob/master/LICENSE)
+<!-- [![Build Status](https://github.com/deepspeedai/DeepSpeed-mii/workflows/Build/badge.svg)](https://github.com/deepspeedai/DeepSpeed-MII/actions) -->
+[![Formatting](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml)
+[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/deepspeedai/DeepSpeed/blob/master/LICENSE)
 [![PyPI version](https://badge.fury.io/py/deepspeed-mii.svg)](https://pypi.org/project/deepspeed-mii/)
 <!-- [![Documentation Status](https://readthedocs.org/projects/deepspeed/badge/?version=latest)](https://deepspeed.readthedocs.io/en/latest/?badge=latest) -->
 
@@ -195,7 +195,7 @@ result = generator.query({"query": ["DeepSpeed is", "Seattle is"]}, do_sample=Tr
 
 ```
 
-You can find a complete example [here]("https://github.com/microsoft/DeepSpeed-MII/tree/main/examples/non_persistent")
+You can find a complete example [here]("https://github.com/deepspeedai/DeepSpeed-MII/tree/main/examples/non_persistent")
 
 Any HTTP client can be used to call the APIs. An example of using curl is:
 ```bash
````

`mii/legacy/aml_related/templates.py` (2 additions, 2 deletions)

```diff
@@ -165,8 +165,8 @@
 RUN /opt/miniconda/envs/amlenv/bin/pip install torch torchvision --index-url https://download.pytorch.org/whl/cu113 && \
     /opt/miniconda/envs/amlenv/bin/pip install -r "$BUILD_DIR/requirements.txt" && \
     /opt/miniconda/envs/amlenv/bin/pip install azureml-inference-server-http && \
-    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed.git && \
-    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed-MII.git && \
+    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed.git && \
+    /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed-MII.git && \
     /opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/huggingface/transformers.git
 
```
