Releases · deepspeedai/DeepSpeed
v0.16.6 Patch Release
What's Changed
- Update version.txt after 0.16.5 release by @loadams in #7180
- Cross layer overlapping for domino by @hwchen2017 in #7178
- async tp allreduce by @inkcherry in #7115
- Fix issue #5242 grad_norm and loss is nan by @Glaceon-Hyy in #7171
- Add qwen3 autotp support by @Yejing-Lai in #7187
- Update to new torch grad hook API: BF16Optimizer and Stage2 by @deepcharm in #7189
- Reland perf fix for nan inf check by @nelyahu in #7184
- Update to fix pydantic warning by @loadams in #7193
- update dependencies version info by @inkcherry in #7206
- Fix HPU accelerator memory mapping broken by torch filling uninitialized memory by @oelayan7 in #7209
- Support complicated use cases with TiedLayerSpec by @limjcst in #7208
- Add defence for offload_states and reload_states w/o optimizer by @HollowMan6 in #7211 (see the sketch after this list)
- DeepCompile for enhanced compiler integration by @tohtana in #7154
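The `offload_states`/`reload_states` pair hardened in #7211 lets a training script temporarily evict ZeRO-3 states from GPU memory. A minimal sketch, assuming the keyword names from the public API introduced in #6011 and run under the `deepspeed` launcher (verify exact signatures in your version's docs):

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

# Park ZeRO-3 parameters, gradients, and optimizer state in (pinned) host
# memory between training phases, then restore them before the next step.
engine.offload_states(pin_memory=True, non_blocking=False)
engine.reload_states()
```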
New Contributors
- @Glaceon-Hyy made their first contribution in #7171
- @limjcst made their first contribution in #7208
Full Changelog: v0.16.5...v0.16.6
v0.16.5 Patch Release
What's Changed
- Update version.txt after 0.16.4 release by @loadams in #7063
- fix an outdated doc wrt CUDA_VISIBLE_DEVICES by @stas00 in #7058
- Tecorigin sdaa accelerator by @siqi654321 in #6903
- Handle special case of libuv for Windows by @loadams in #7064
- Bug Fix for offload_states API by @U-rara in #7050
- Update README with info on newest accelerator by @loadams in #7065
- Fix TOCTOU issues, switch to fstat by @loadams in #7067
- config torch to avoid graph breaks caused by logger by @ShellyNR in #6999
- Fix meta load tensor incompatible issue by @Yejing-Lai in #7073
- Replace calls to `python setup.py sdist` with `python -m build --sdist` by @loadams in #7069
- Revert "Handle special case of libuv for Windows (#7064)" by @loadams in #7076
- Add DeepseekV3 AutoTP. by @Yejing-Lai in #7045
- Improve inference tutorial docs by @loadams in #7083
- Pin transformers version on tests that use latest. by @loadams in #7085
- Update README.md with ICS '23 MoE paper link by @siddharth9820 in #7087
- Update parallelism for nv-torch-latest/nightly tests due to more GPUs/runner by @loadams in #7086
- Remove workflows for very old torch versions by @loadams in #7090
- Use new dlpack api; Formatting fixes by @tjruwase in #7101
- Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx by @deepcharm in #7081
- Avoid graph breaks in torch.compile caused by inner classes in the backward hooks by @deepcharm in #7062
- Only run pre-commit on the changes by @hwchen2017 in #7106
- Avoid graph break due to unsupported frozenset by @deepcharm in #7105
- Fix fused_qkv print model ValueError by @Yejing-Lai in #7109
- Update references to new X/Twitter handle by @loadams in #7110
- Update gaudi2 nightly/CI to latest 1.20.0 build by @raza-sikander in #7093
- fix keep_module_on_host by @inkcherry in #7112
- Add sequential pytest mark to TestNVMeCheckpointing to resolve pytest forked hangs by @loadams in #7131
- Training multiple models by @tjruwase in #7018 (see the sketch after this list)
- Update CONTRIBUTING.md to reflect changes from CLA to DCO by @loadams in #7135
- Avoid missing attr error by @tjruwase in #7133
- Add conditional expression by @A-transformer in #7119
- Unpin transformers version for most workflows by @loadams in #7139
- Conditionally quote env vars by @saurabhkoshatwar in #7071
- Correct the BACKWARD_PREFETCH_SUBMIT mismatch by @A-transformer in #7120
- Enhance Gaudi2 CI/Nightly Coverage with Model Parallelism and Linear Tests by @raza-sikander in #7146
- Update container version that runs on A6000 tests. by @loadams in #7153
- hf tp+zero training doc. by @inkcherry in #7151
- Avoid graph break by removing redundant requires_grad attr change by @deepcharm in #7158
- Add destroy to tests to free memory by @tohtana in #7160
- [NFC] Typo fix in SP layer. by @c8ef in #7152
- Link AutoTP blog in the front page by @hwchen2017 in #7167
- Fix `seq_parallel_communication_data_type` constant by @stas00 in #7175
- Fix typos in GDS blog by @loadams in #7177
- Variable batch size and LR scheduler by @bm-synth in #7104
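The multiple-models support in #7018 targets workflows such as RLHF, where several engines coexist in one process. A minimal sketch, assuming one `deepspeed.initialize` call per model (the `actor`/`critic` names and config values are illustrative):

```python
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

actor = torch.nn.Linear(512, 512)
critic = torch.nn.Linear(512, 1)

# One engine per model; each carries its own optimizer and LR schedule.
actor_engine, _, _, _ = deepspeed.initialize(
    model=actor, model_parameters=actor.parameters(), config=ds_config)
critic_engine, _, _, _ = deepspeed.initialize(
    model=critic, model_parameters=critic.parameters(), config=ds_config)
```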
New Contributors
- @siqi654321 made their first contribution in #6903
- @A-transformer made their first contribution in #7119
- @saurabhkoshatwar made their first contribution in #7071
- @c8ef made their first contribution in #7152
Full Changelog: v0.16.4...v0.16.5
v0.16.4 Patch Release
What's Changed
- Update version.txt after 0.16.3 release by @loadams in #6965
- Precisely track nvme optimizer offload by @tjruwase in #6963
- Update build_win.bat script to exclude GDS op as it lacks Windows support by @loadams in #6971
- Add CUDA 12.8 support and comment on CUDA 12.7 by @loadams in #6975
- Update cpu torch latest to use torch 2.6 by @loadams in #6977
- generalize deepspeed linear and implement it for non-CUDA systems by @oelayan7 in #6932
- Update recommended Windows whl building versions by @loadams in #6983
- Fix setup_env_ranks to Properly Set Environment Variables Instead of Raising Error by @fabiosanger in #6979
- Specify torchvision in nv-ds-chat workflow (prevents errors with torch 2.6) by @loadams in #6982
- Remove assumption that padding only occurs on last rank by @xylian86 in #6974
- Use ds-specific module id to avoid conflicts by @tjruwase in #6847
- Update A6000 workflows to use newer docker container - 24.09 vs 24.03 by @loadams in #6967
- Allow NVIDIA Blackwell by @fabiendupont in #6991
- Update GH org references by @tjruwase in #6998
- [XPU] max1100 workflow update for docker and software by @Liangliang-Ma in #7003
- autotp training (fix DCO) by @inkcherry in #7004
- import triton files when triton is supported and installed by @oelayan7 in #6989
- Update A6000 tests transformers version by @loadams in #7016
- Fix ds-chat CI regression by @tjruwase in #7015
- [Ulysses tutorial] typos by @stas00 in #7024
- fix hostname -I for macOS #6497 by @fitzjalen in #6990
- Update workflows to cuda 12.4 by @loadams in #7000
- [ROCm] Enable fp_quantizer on ROCm by @rraminen in #7027
- add gds chinese blog by @GuanhuaWang in #7034
- Add chinese blog for deepspeed windows, and fix format by @hwchen2017 in #7035
- AIO on ROCM by @jomayeri in #7023
- Control trace cache warnings by @tjruwase in #7039
- Update CUDA compute capability to support Blackwell by @hwchen2017 in #7047
- Update setup.py handling of ROCm cupy by @loadams in #7051
- nv-ds-chat breaks with latest transformers by @loadams in #7052
- Rename aio_thread_count to intra_op_parallelism by @tjruwase in #7056 (see the config sketch after this list)
- add autoTP training zero2 tests by @inkcherry in #7049
- Fix bf16 optimizer: remove duplicate loop by @wukong1992 in #7054
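The `intra_op_parallelism` rename in #7056 lives in the `aio` config block. A hedged sketch of what that section could look like; the surrounding keys follow the DeepNVMe docs and the values are placeholders, not tuned recommendations:

```python
ds_config = {
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "single_submit": False,
        "overlap_events": True,
        "intra_op_parallelism": 1,  # formerly the aio thread-count setting
    }
}
```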
New Contributors
- @fabiosanger made their first contribution in #6979
- @fabiendupont made their first contribution in #6991
- @fitzjalen made their first contribution in #6990
- @wukong1992 made their first contribution in #7054
Full Changelog: v0.16.3...v0.16.4
v0.16.3 Patch Release
What's Changed
- Update version.txt after 0.16.2 release by @loadams in #6893
- Allow compiling collectives for PT > 2.3 by @NirSonnenschein in #6899
- Zero2: avoid graph breaks in torch.compile by using param_idx by @nelyahu in #6803
- hpu_accelerator: use torch.use_deterministic_algorithms by @nelyahu in #6897
- Fix error caused by all_reduce call in domino by @hwchen2017 in #6880
- Update Gaudi2 jobs to latest 1.19 build by @raza-sikander in #6905
- Change compile for pipeline module torch.compile by @NirSonnenschein in #6478
- Stage3: Use new torch grad accumulation hooks API by @deepcharm in #6773
- Cleanup ops/transformer/inference tests by @loadams in #6830
- Fix `checkpointable_layers` logic by @Quentin-Anthony in #6881
- [BUG FIX]: fix get torch.version.cuda error when cuda is None in rocm by @hj-wei in #6909
- Add fp8_gemm fallback for non-triton systems by @oelayan7 in #6916
- Reduce the device bubble introduced by heavy loop synchronization in coalesced fetch/release(z3_leaf_module) by @inkcherry in #6694
- Cleanup ops/transformer/inference tests by @loadams in #6925
- Check transformers version in BLOOM for inference v1 by @lekurile in #6766
- inference: remove unused _validate_args function by @nelyahu in #5505
- Use `torch.log1p` by @kit1980 in #6930 (see the sketch after this list)
- Update python version classifiers by @loadams in #6933
- Fix building on Windows with presence of Triton by @woct0rdho in #6749
- Fix windows blog examples by @loadams in #6934
- Add deepseek autotp by @Yejing-Lai in #6937
- Add position_ids arg to OPTEmbedding forward function by @lekurile in #6939
- Add information on security expectations with this software by @loadams in #6941
- Support pure meta model lm_head tp by @Yejing-Lai in #6812
- Remove op compilation flags due to perf issue by @NirSonnenschein in #6944
- Pin nv-a6000 workflow by @loadams in #6938
- [inf] Add config var to enable keeping module on host by @oelayan7 in #6846
- `warn` to `warning` by @qgallouedec in #6952
- Add extra_repr to Linear classes for debugging purpose by @Xia-Weiwen in #6954
- Update import for torchvision.transformers by @loadams in #6958
- Remove Duplicate Declaration of pandas in `Dockerfile` by @Zerohertz in #6959
- Add the missing view operations from sequence parallel (async) by @inkcherry in #6750
- Update `torch.norm` to `torch.linalg.norm` and `torch.linalg.vector_norm` by @loadams in #6931 (see the sketch after this list)
- Using explicit GPU upcast for ZeRO-Offload by @xylian86 in #6962
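The two PyTorch modernizations above are easy to see side by side; a quick sketch:

```python
import torch

# torch.log1p(x) keeps precision where log(1 + x) loses it for small x.
x = torch.tensor([1e-8], dtype=torch.float32)
print(torch.log(1 + x))  # tensor([0.]) -- 1 + 1e-8 rounds to 1 in fp32
print(torch.log1p(x))    # ~1e-08 -- the small value survives

# torch.linalg.vector_norm is the maintained replacement for torch.norm on
# flattened inputs (torch.linalg.norm covers matrix norms).
g = torch.randn(4, 4)
assert torch.allclose(torch.norm(g), torch.linalg.vector_norm(g))
```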
New Contributors
- @hj-wei made their first contribution in #6909
- @kit1980 made their first contribution in #6930
- @woct0rdho made their first contribution in #6749
- @Xia-Weiwen made their first contribution in #6954
- @Zerohertz made their first contribution in #6959
Full Changelog: v0.16.2...v0.16.3
v0.16.2 Patch Release
What's Changed
- Update pre-commit version by @loadams in #6821
- Update version.txt after 0.16.1 release by @loadams in #6826
- Pin HPU tests by @loadams in #6831
- Flops profiler: support einops.einsum by @lvhoaa in #6755 (see the sketch after this list)
- Pin pytest-subtests version for accelerate tests by @loadams in #6842
- Inference UTs check for triton support from accelerator by @raza-sikander in #6782
- Unpin pytest-subtests now that 0.14.1 is released by @loadams in #6844
- Merge LoCo with Zero++ by @XingyuXie in #6730
- Fix type error in `ZeROOrderedDict` by @oraluben in #6794
- Fix uneven head sequence parallelism bug (#6774) by @Eugene29 in #6797
- Fix nv-torch-nightly test by pinning transformers by @loadams in #6849
- Remove broken links to non-active site by @kaiksi-bb in #6854
- Avoid poisoning process with CUDA calls as soon as importing by @HollowMan6 in #6810
- Fix xpu tests workflow failure by changing pip index url by @Liangliang-Ma in #6864
- Domino updates by @GuanhuaWang in #6861
- add domino navigation by @GuanhuaWang in #6866
- Update TSC by @tjruwase in #6867
- Remove warnings from autodoc and sphinx by @loadams in #6788
- Update real_accelerator.py by @keiwoo in #6845
- Fix assertion for offloading states by @tohtana in #6855
- Remove pin from transformers version and fix Processing/Threading issues in tests by @loadams in #6822
- Add MLP/lm_head tp grain size setting. by @Yejing-Lai in #6828
- Fix --enable_each_rank_log when used with PDSH multi-node runner by @akeshet in #6863
- Update transformers ops unit tests to use `requried_torch_version` by @loadams in #6884
- Don't error out when cpu accelerator doesn't have torch (as default for whl building) by @loadams in #6886
- Add arctic model support by adding w2 to all_reduce by @pi314ever in #6856
- Update code owners by @tjruwase in #6890
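With #6755, `einops.einsum` calls are counted by the flops profiler. A short example of such a call (requires einops >= 0.5; shapes are illustrative):

```python
import torch
import einops

a = torch.randn(8, 64, 32)
b = torch.randn(8, 32, 16)
# einops.einsum takes the operands first and the axis pattern last,
# unlike torch.einsum.
c = einops.einsum(a, b, "batch i j, batch j k -> batch i k")
print(c.shape)  # torch.Size([8, 64, 16])
```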
New Contributors
- @lvhoaa made their first contribution in #6755
- @XingyuXie made their first contribution in #6730
- @Eugene29 made their first contribution in #6797
- @kaiksi-bb made their first contribution in #6854
- @HollowMan6 made their first contribution in #6810
- @keiwoo made their first contribution in #6845
- @akeshet made their first contribution in #6863
- @pi314ever made their first contribution in #6856
Full Changelog: v0.16.1...v0.16.2
v0.16.1 Patch Release
What's Changed
- Update version.txt after 0.16.0 release by @loadams in #6786
- Domino news update on readme.md by @GuanhuaWang in #6815
- Fix zero checkpoint by @xu-song in #6792
- Update python version but now we need to include setuptools on our own by @loadams in #6787
- Adding the new feature of FPDT by @YJHMITWEB in #6462
- Pin transformers to avoid errors with latest version by @loadams in #6820
- Ulyssess offload blog by @samadejacobs in #6814
- add FPDT tutorial by @samadejacobs in #6813
- Update README.md by @samadejacobs in #6824
- Update README.md by @samadejacobs in #6825
- Pin transformers version in cpu-torch-latest due to multiprocessing error. by @loadams in #6823
Full Changelog: v0.16.0...v0.16.1
DeepSpeed v0.16.0
What's Changed
- Update version.txt after 0.15.4 release by @loadams in #6731
- Update GH hosted workflows to 24.04 by @loadams in #6717
- Add COMMITTER file by @tjruwase in #6741
- Update AMD apex version by @loadams in #6739
- Fix Type Name Inconsistency & Typo in cpu_adam by @xylian86 in #6732
- Add Domino code by @zhangsmallshark in #6733
- Add data type check for bf16 by @hwchen2017 in #6742
- add zero3 `module_granularity_threshold` to zero optimization by @inkcherry in #6649
- AIO File Offsets by @jomayeri in #6641
- Update path for BingBertSquad from DeepSpeedExamples by @loadams in #6746
- Sanitize inputs to eval() by @loadams in #6745
- Adding the governance doc by @minjiazhang in #6748
- Add no_sync context manager by @tjruwase in #6675
- Gaudi2 Nightly job for daily check by @raza-sikander in #6753
- Disable failing python tests by @loadams in #6758
- A faster and more memory-efficient implementation of `zero_to_fp32` by @xu-song in #6658
- Pin transformers version to work around latest torch requirements by @loadams in #6759
- make xpu ops compatible with oneapi 2025.0 by @baodii in #6760
- Add explicit parameters for torch.load by @loadams in #6751
- Fix setup.py bash cmd generation to correctly extract git info by @nelyahu in #6762
- Use `json_schema_extra` instead of extra keyword in `Field` by @qgallouedec in #6764 (see the sketch after this list)
- Fix potential memory issues when using DeepSpeed Z3 by @wenbinc-Bin in #6726
- Removes unnecessary cloning by @swigls in #6761
- Enable torch compile on _allgather_params by @deepcharm in #6769
- Unpin with latest transformers fixes by @loadams in #6763
- docs: fix HF links by @imba-tjd in #6780
- Fix Doc Error: ZeRO Stage 2 gradient partitioning by @yewentao256 in #6775
- Cleanup code docs warnings by @loadams in #6783
- Domino Blog by @GuanhuaWang in #6776
- Update version.txt before release by @loadams in #6784
- Revert release workflow by @loadams in #6785
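The Pydantic change in #6764 follows the v2 deprecation of arbitrary extra keyword arguments to `Field()`. A minimal sketch; the `new_param` metadata key is illustrative:

```python
from pydantic import BaseModel, Field

class ExampleConfig(BaseModel):
    # Deprecated under Pydantic v2 (emits a warning):
    #   stage: int = Field(0, new_param="zero_optimization.stage")
    # Supported home for such metadata:
    stage: int = Field(0, json_schema_extra={"new_param": "zero_optimization.stage"})

print(ExampleConfig().stage)  # 0
```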
New Contributors
- @zhangsmallshark made their first contribution in #6733
- @hwchen2017 made their first contribution in #6742
- @minjiazhang made their first contribution in #6748
- @qgallouedec made their first contribution in #6764
- @wenbinc-Bin made their first contribution in #6726
- @swigls made their first contribution in #6761
- @imba-tjd made their first contribution in #6780
- @yewentao256 made their first contribution in #6775
Full Changelog: v0.15.4...v0.16.0
v0.15.4 Patch Release
What's Changed
- Update version.txt after 0.15.3 release by @loadams in #6652
- Fix expert grad scaling problem with ZeRO optimizer by @wyooyw in #6546
- Add attribute check for language_model when replace last linear module by @Yejing-Lai in #6650
- fix init_device_mesh for torch 2.4 by @Lzhang-hub in #6614
- Fix dynamo issue by @oraluben in #6527
- sequence parallel for uneven heads by @inkcherry in #6392
- Add fallback for is_compiling by @tohtana in #6663 (see the sketch after this list)
- Update profiler registration check by @loadams in #6668
- Add support for H100/sm_90 arch compilation by @loadams in #6669
- Update Gaudi2 docker image by @loadams in #6677
- Update gaudi2 docker version to latest release (1.18) by @raza-sikander in #6648
- Update base docker image for A6000 GPU tests by @loadams in #6681
- Remove packages that no longer need to be updated in the latest container by @loadams in #6682
- Fix training of pipeline-based PEFT LoRA models by @xuanhua in #5477
- Update checkout action to latest version by @loadams in #5021
- Add attribute check to support git-base autotp by @Yejing-Lai in #6688
- fix memcpy issue on backward for zero-infinity by @xylian86 in #6670
- Free memory in universal checkpointing tests by @tohtana in #6693
- Explicitly set device when reusing dist env by @tohtana in #6696
- Update URL in README Pipeline Status for Huawei Ascend NPU by @xuedinge233 in #6706
- Pin transformers to 4.45.2 in nv-ds-chat workflow by @loadams in #6710
- [Bug Fix] Support threads_per_head < 64 for wavefront size of 64 by @jagadish-amd in #6622
- Use one param coordinator for both train/inference scenarios by @tohtana in #6662
- Update yapf version by @loadams in #6721
- Update flake8 version by @loadams in #6722
- Switch what versions of python are supported by @loadams in #5676
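The `is_compiling` fallback from #6663 guards against older torch builds that lack the public API. A hedged sketch of the pattern (DeepSpeed's exact placement may differ):

```python
import torch

def is_compiling() -> bool:
    """True inside torch.compile tracing, across torch versions."""
    if hasattr(torch, "compiler") and hasattr(torch.compiler, "is_compiling"):
        return torch.compiler.is_compiling()  # newer torch releases
    if hasattr(torch, "_dynamo"):
        return torch._dynamo.is_compiling()   # older private fallback
    return False                              # no dynamo support at all

print(is_compiling())  # False in eager mode
```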
Full Changelog: v0.15.3...v0.15.4
v0.15.3 Patch Release
What's Changed
- Update version.txt after 0.15.2 release by @loadams in #6615
- Clean up prefetched parameters by @tohtana in #6557
- AIO CPU Locked Tensor by @jomayeri in #6592
- reduce setting global variables to reduce torch compile graph breaks by @NirSonnenschein in #6541
- Add API to get devices of offload states by @tohtana in #6586
- Ignore reuse_dist_env by @tohtana in #6623
- Add API for updating ZeRO gradients by @tjruwase in #6590 (see the sketch after this list)
- [compile] Show breakdown of graph break by @delock in #6601
- Accept btl_tcp_if_include option through launcher_args by @diskkid in #6613
- Add first Step in LR Schedulers by @jomayeri in #6597
- Support safetensors export by @xu-song in #6579
- add option to disable logger while compiling to avoid graph breaks by @ShellyNR in #6496
- Lock cache file of HF model list by @tohtana in #6628
- Add README Pipeline Status for Huawei Ascend NPU by @xuedinge233 in #6588
- Update torch version in workflows by @tohtana in #6631
- Use file store for tests by @tohtana in #6632
- Fix Memory Leak In AIO by @jomayeri in #6630
- [XPU] upgrade xpu max1100 CI workflow to pytorch2.3 by @Liangliang-Ma in #6646
- [XPU] host timer check version from Torch 2.5 to Torch 2.6 by @YizhouZ in #6633
- [XPU] [DeepNVMe] use same cpu_op_desc_t with cuda by @Liangliang-Ma in #6645
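The gradient-update API from #6590 complements the documented `safe_get_full_grad` getter. In this sketch the setter name `safe_set_full_grad` is an assumption matching the PR description; verify it against `deepspeed.utils` for your version (`engine` is a ZeRO engine after `backward()`):

```python
from deepspeed.utils import safe_get_full_grad, safe_set_full_grad

for _, param in engine.module.named_parameters():
    grad = safe_get_full_grad(param)           # gathers the full gradient
    if grad is not None:
        safe_set_full_grad(param, grad * 0.5)  # e.g. custom gradient scaling
```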
Full Changelog: v0.15.2...v0.15.3
v0.15.2 Patch Release
What's Changed
- Update version.txt after 0.15.1 release by @loadams in #6493
- HPU: add required ENV vars to accelerator init by @nelyahu in #6495
- Op_builder->is_compatible: quiet warning by @terry-for-github in #6093
- fix pipeline eval_batch micro_batches argument for schedule by @nelyahu in #6484
- Fix the broken url link by @rogerxfeng8 in #6500
- fix environment variable export bug for MultiNodeRunner by @TideDra in #5878
- Revert "BF16 optimizer: Clear lp grads after updating hp grads in hook" by @nelyahu in #6508
- wrap include cuda_bf16.h with ifdef BF16_AVAILABLE by @oelayan7 in #6520
- Avoid security issues of subprocess shell by @tjruwase in #6498
- Add conditional on torch version for scaled_dot_product_attention by @loadams in #6517
- Added Intel Gaudi to Accelerator Setup Guide by @ShifaAbu in #6543
- Skip failing newly added tests in accelerate by @loadams in #6574
- Use msgpack for p2p comm by @tohtana in #6547
- DeepNVMe perf tuning by @tjruwase in #6560
- [Accelerator] Cambricon MLU support by @Andy666G in #6472
- Fix gradient accumulation for Z2+offload by @tohtana in #6550
- fix errors when setting zero3 leaf modules with torch.compile by @NirSonnenschein in #6564
- [XPU] Support DeepNVMe new code structure by @Liangliang-Ma in #6532
- Add APIs to offload states of model, optimizer, and engine by @tohtana in #6011
- add bfloat16 to inference support dtypes by @nelyahu in #6528
- [COMPILE] workflow for deepspeed + torch.compile by @YizhouZ in #6570
- Fixes on the accelerate side mean we do not need to skip this test by @loadams in #6583
- Fix torch include in `op_builder/mlu/fused_adam.py` and update no-torch workflow triggers by @loadams in #6584
- [ROCm] Fix subprocess error by @jagadish-amd in #6587
- Cleanup CODEOWNERS file to be valid by @loadams in #6603
- Add SSF Best practices badge by @loadams in #6604
- Move V100 workflows from cuda 11.1/11.7 to 12.1 by @loadams in #6607
- Fix SD workflow by @loadams in #6609
- Pin accelerate to fix CI failures/issues by @loadams in #6610
- Add llama3.2 vision autotp by @Yejing-Lai in #6577
- Improve DS logging control by @tjruwase in #6602
- Fix device selection using CUDA_VISIBLE_DEVICES by @tohtana in #6530
- Handle when `backend` is also in compile_kwargs by @oraluben in #6502
- Rearrange inference OPS and stop using builder.load by @oelayan7 in #5490
- Unpin accelerate tests, update lightning with node16 removal. by @loadams in #6611
- Enabled Qwen2-MoE Tensor Parallelism (TP) inference by @gyou2021 in #6551 (see the sketch below)
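A hedged sketch of the AutoTP inference path these PRs extend: kernel injection stays off and DeepSpeed shards supported layers automatically. The checkpoint name and keyword arguments are illustrative and vary across versions:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-MoE-A2.7B", torch_dtype=torch.bfloat16)
model = deepspeed.init_inference(
    model,
    dtype=torch.bfloat16,
    tensor_parallel={"tp_size": 2},    # shard supported layers over 2 GPUs
    replace_with_kernel_inject=False,  # AutoTP instead of kernel injection
)
```

Launched with, e.g., `deepspeed --num_gpus 2 infer.py`.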
New Contributors
- @TideDra made their first contribution in #5878
- @ShifaAbu made their first contribution in #6543
- @jagadish-amd made their first contribution in #6587
- @gyou2021 made their first contribution in #6551
Full Changelog: v0.15.1...v0.15.2