
Commit 38b41df

Authored by cmikeh2, jeffra, mrwyattii, awan-10, and Masahiro Tanaka
DeepSpeed-FastGen (#4604)
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
1 parent 737ef29 commit 38b41df


263 files changed, +19167 -33 lines changed

.github/workflows/nv-a6000.yml

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+name: nv-a6000
+
+on:
+  pull_request:
+    paths-ignore:
+      - 'docs/**'
+      - 'blogs/**'
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  issues: write
+
+jobs:
+  unit-tests:
+    runs-on: [self-hosted, nvidia, a6000]
+    container:
+      image: nvcr.io/nvidia/pytorch:23.03-py3
+      ports:
+        - 80
+      options: --gpus all --shm-size "8G"
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Check container state
+        run: |
+          ldd --version
+          nvcc --version
+          nvidia-smi
+          python -c "import torch; print('torch:', torch.__version__, torch)"
+          python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
+      - name: Install transformers
+        run: |
+          git clone https://github.com/huggingface/transformers
+          cd transformers
+          git rev-parse --short HEAD
+          python -m pip install .
+      - name: Install deepspeed
+        run: |
+          python -m pip install docutils==0.18.1 jinja2==3.0 urllib3==1.26.11 ninja
+          python -m pip install .[dev,1bit,autotuning]
+          ds_report
+      - name: Python environment
+        run: |
+          python -m pip list
+      - name: Unit tests
+        run: |
+          unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
+          cd tests
+          python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2' unit/ --torch_ver="2.0" --cuda_ver="12"
+          python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2_ops' unit/ --torch_ver="2.0" --cuda_ver="12"
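
The `unset TORCH_CUDA_ARCH_LIST` step lets a JIT build target only the GPU architecture present on the runner rather than a fixed list. A minimal sketch of that mapping, assuming the compute capabilities are already known as `(major, minor)` tuples (in practice they could come from `torch.cuda.get_device_capability(i)`); `gencode_flags` and its PTX policy are hypothetical, not PyTorch's or DeepSpeed's actual builder code:

```python
from typing import Iterable, List, Tuple

Capability = Tuple[int, int]  # (major, minor), e.g. (8, 6) for an RTX A6000


def gencode_flags(capabilities: Iterable[Capability], embed_ptx: bool = True) -> List[str]:
    """Build nvcc -gencode flags for the given GPUs (hypothetical helper).

    Each distinct capability gets native SASS; optionally the newest one also
    gets PTX so the driver can JIT-compile it for later architectures.
    """
    archs = sorted(set(capabilities))  # tuple order, so (10, 0) sorts after (8, 6)
    flags = [f"-gencode=arch=compute_{ma}{mi},code=sm_{ma}{mi}" for ma, mi in archs]
    if embed_ptx and archs:
        ma, mi = archs[-1]
        flags.append(f"-gencode=arch=compute_{ma}{mi},code=compute_{ma}{mi}")
    return flags


# On a single-A6000 runner this yields sm_86 SASS plus compute_86 PTX only.
print(gencode_flags([(8, 6)]))
```

Sorting tuples rather than strings keeps `(10, 0)` after `(8, 6)`, which a plain string sort of `"10.0"` and `"8.6"` would get wrong.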

.github/workflows/nv-pre-compile-ops.yml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ jobs:
           #python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
       - name: Compile DeepSpeed Ops
         run: |
-          TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0" DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_EVOFORMER_ATTN=0 pip3 install .
+          TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0" DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_RAGGED_DEVICE_OPS=0 DS_BUILD_EVOFORMER_ATTN=0 pip3 install .
       - name: DS Report
         run: |
           ds_report
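
The updated command keeps `DS_BUILD_OPS=1` as a global switch and opts the new CUTLASS and ragged device ops out with per-op `DS_BUILD_<NAME>=0` variables, as it already did for sparse and Evoformer attention. A sketch of that override rule, assuming a per-op variable takes precedence over the global one; `should_build` and the op names are illustrative, not DeepSpeed's `setup.py`:

```python
from typing import Mapping


def should_build(op_name: str, env: Mapping[str, str]) -> bool:
    """Decide whether to prebuild an op (hypothetical helper).

    Assumed rule: DS_BUILD_<OP_NAME> wins when set; otherwise fall back to
    the global DS_BUILD_OPS switch. Values follow the workflow's 0/1 convention.
    """
    per_op = env.get(f"DS_BUILD_{op_name.upper()}")
    if per_op is not None:
        return bool(int(per_op))
    return bool(int(env.get("DS_BUILD_OPS", "0")))


# Environment from the updated "Compile DeepSpeed Ops" step.
ci_env = {
    "DS_BUILD_OPS": "1",
    "DS_BUILD_SPARSE_ATTN": "0",
    "DS_BUILD_CUTLASS_OPS": "0",
    "DS_BUILD_RAGGED_DEVICE_OPS": "0",
    "DS_BUILD_EVOFORMER_ATTN": "0",
}
for op in ("fused_adam", "sparse_attn", "cutlass_ops", "ragged_device_ops"):
    print(op, should_build(op, ci_env))
```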

.gitmodules

Whitespace-only changes.

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ repos:
         entry: ./scripts/check-license.py
         language: python
         files: \.(py|c|cpp|cu|cc|h|hpp|cuh|hip|tr)$
+        exclude: ^(deepspeed/inference/v2/kernels/ragged_ops/blocked_flash|deepspeed/inference/v2/kernels/cutlass_ops/grouped_gemm)
 
 -   repo: https://github.com/codespell-project/codespell
     rev: v2.1.0
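
pre-commit treats `files` and `exclude` as Python regular expressions searched against each repository-relative path: a hook runs on a file only when `files` matches and `exclude` does not. The sketch applies the two patterns from this hunk to a few paths; the file names under the excluded directories are hypothetical.

```python
import re

# Patterns copied from the check-license hook in .pre-commit-config.yaml.
FILES = re.compile(r"\.(py|c|cpp|cu|cc|h|hpp|cuh|hip|tr)$")
EXCLUDE = re.compile(
    r"^(deepspeed/inference/v2/kernels/ragged_ops/blocked_flash"
    r"|deepspeed/inference/v2/kernels/cutlass_ops/grouped_gemm)"
)


def hook_applies(path: str) -> bool:
    """Mirror pre-commit's filter: selected by `files`, not matched by `exclude`."""
    return bool(FILES.search(path)) and not EXCLUDE.search(path)


# Illustrative paths (hypothetical file names).
for path in [
    "deepspeed/inference/v2/kernels/ragged_ops/blocked_flash/flash.h",
    "deepspeed/inference/v2/kernels/cutlass_ops/grouped_gemm/gemm.cu",
    "deepspeed/inference/v2/kernels/ragged_ops/example.cpp",
    "README.md",
]:
    print(f"{path}: {hook_applies(path)}")
```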

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
 include *.txt README.md
+include deepspeed/inference/v2/kernels/ragged_ops/libs/*.so
+include deepspeed/inference/v2/kernels/cutlass_ops/libs/*.so
 recursive-include requirements *.txt
 recursive-include deepspeed *.cpp *.h *.cu *.hip *.tr *.cuh *.cc *.json
 recursive-include csrc *.cpp *.h *.cu *.tr *.cuh *.cc

accelerator/cuda_accelerator.py

Lines changed: 18 additions & 1 deletion
@@ -153,9 +153,26 @@ def max_memory_reserved(self, device_index=None):
     def total_memory(self, device_index=None):
         return torch.cuda.get_device_properties(device_index).total_memory
 
+    def _get_nvml_gpu_id(self, torch_gpu_id):
+        """
+        credit: https://discuss.pytorch.org/t/making-pynvml-match-torch-device-ids-cuda-visible-devices/103020
+
+        Remap torch device id to nvml device id, respecting CUDA_VISIBLE_DEVICES.
+
+        If the latter isn't set return the same id
+        """
+        # if CUDA_VISIBLE_DEVICES is used automagically remap the id since pynvml ignores this env var
+        if "CUDA_VISIBLE_DEVICES" in os.environ:
+            ids = list(map(int, os.environ.get("CUDA_VISIBLE_DEVICES", "").split(",")))
+            return ids[torch_gpu_id] # remap
+        else:
+            return torch_gpu_id
+
     def available_memory(self, device_index=None):
         if pynvml:
-            handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
+            if device_index is None:
+                device_index = self.current_device()
+            handle = pynvml.nvmlDeviceGetHandleByIndex(self._get_nvml_gpu_id(device_index))
             info = pynvml.nvmlDeviceGetMemoryInfo(handle)
             return info.free
         else:
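
pynvml enumerates every physical GPU and ignores `CUDA_VISIBLE_DEVICES`, while torch device indices count only the visible ones, so the same index can name different GPUs. The hunk also defaults `device_index` to the current device, since `pynvml.nvmlDeviceGetHandleByIndex` expects an integer. A standalone sketch of the remap, with the environment passed in so it runs without a GPU; `nvml_gpu_id` is a hypothetical free-function version of `_get_nvml_gpu_id`, and like the committed code it assumes integer indices (UUID-style entries would raise `ValueError`):

```python
import os
from typing import Mapping, Optional


def nvml_gpu_id(torch_gpu_id: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Map a torch device index to the NVML index of the same physical GPU.

    Hypothetical sketch of the committed logic; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return torch_gpu_id  # no masking: torch and NVML indices agree
    ids = [int(tok) for tok in visible.split(",")]
    return ids[torch_gpu_id]  # torch index i is the i-th visible device


# With CUDA_VISIBLE_DEVICES=2,3, torch's cuda:0 is physical GPU 2.
print(nvml_gpu_id(0, {"CUDA_VISIBLE_DEVICES": "2,3"}))  # 2
print(nvml_gpu_id(1, {}))  # 1
```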

csrc/includes/activation_type.h

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+// Copyright (c) Microsoft Corporation.
+// SPDX-License-Identifier: Apache-2.0
+
+// DeepSpeed Team
+
+#pragma once
+
+enum ActivationType {
+    GELU = 0,
+    RELU = 1,
+    SILU = 2,
+    GEGLU = 3,
+    ReGLU = 4,
+    SiGLU = 5,
+    IDENTITY = 6,
+    InvalidType = -1
+};
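
Python code that selects an activation for these kernels has to use the same integer values. A hypothetical mirror with `enum.IntEnum` (the class and the `is_gated` helper are illustrative, not DeepSpeed's own Python definitions):

```python
from enum import IntEnum


class ActivationType(IntEnum):
    """Hypothetical mirror of csrc/includes/activation_type.h; values must match the C++ enum."""

    GELU = 0
    RELU = 1
    SILU = 2
    GEGLU = 3
    ReGLU = 4
    SiGLU = 5
    IDENTITY = 6
    InvalidType = -1


# GLU-family variants: the activation is applied to a gate that scales a second projection.
GATED = {ActivationType.GEGLU, ActivationType.ReGLU, ActivationType.SiGLU}


def is_gated(act: ActivationType) -> bool:
    return act in GATED


# IntEnum members are ints, so they can be passed where the C++ side expects the enum value.
print(int(ActivationType.SiGLU), is_gated(ActivationType.SiGLU))  # 5 True
```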

csrc/includes/ds_kernel_utils.h

Lines changed: 5 additions & 0 deletions
@@ -11,6 +11,11 @@ used throughout the codebase.
 #pragma once
 
 #include <cuda.h>
+#include <cuda_fp16.h>
+
+#ifdef BF16_AVAILABLE
+#include <cuda_bf16.h>
+#endif
 
 #define DS_HD_INLINE __host__ __device__ __forceinline__
 #define DS_D_INLINE __device__ __forceinline__
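
Since `cuda_bf16.h` is included only when `BF16_AVAILABLE` is defined, the build must decide whether to pass that macro to nvcc. A sketch of one plausible policy, assuming bf16 requires CUDA 11 or newer (the release that added `cuda_bf16.h`) and an sm_80 or newer target for every architecture; `bf16_define` is hypothetical and DeepSpeed's op builder may apply different rules:

```python
from typing import Iterable, List, Tuple


def bf16_define(cuda_version: Tuple[int, int], archs: Iterable[Tuple[int, int]]) -> List[str]:
    """Return ['-DBF16_AVAILABLE'] only under the assumed policy, else []."""
    archs = list(archs)
    if cuda_version[0] < 11 or not archs:
        return []  # toolkit too old for cuda_bf16.h, or nothing to compile for
    if all(arch >= (8, 0) for arch in archs):  # tuple compare: (8, 6) >= (8, 0)
        return ["-DBF16_AVAILABLE"]
    return []  # at least one pre-Ampere target: leave the bf16 path disabled


print(bf16_define((12, 1), [(8, 6)]))
print(bf16_define((12, 1), [(7, 0), (8, 0)]))
```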
