Releases · deepspeedai/DeepSpeed

19 Sep 19:56

loadams

v0.17.6

e4f6da9

v0.17.6 Patch Release (Latest)

What's Changed

Update version.txt after 0.17.5 release by @loadams in #7502
Support DeepSpeed offload and reload states with ZeRO1 and ZeRO2 by @LYMDLUT in #7421
CI funding shout out to modal.com by @stas00 in #7503
Fix assert when 'pp_int' object has no attribute 'custom_print_str' by @aeeeeeep in #7507
Update TSC Committers by @PKUWZP in #7517
Enabling Muon Optimizer in DeepSpeed by @PKUWZP in #7509
Enable non-ZeRO mode by @sfc-gh-truwase in #7515
Update README with ZenFlow release blog featured by PyTorch. by @Antlera in #7520
Add riscv64 cpu support in deepspeed_shm_comm op by @heyujiao99 in #7519
ZeRO3: Improve mismatch detection by @sfc-gh-truwase in #7525
fix typo s/1014 /1024 by @digger-yu in #7528
undo the revert by @stas00 in #7536
[logging] less startup noise by @stas00 in #7526
[doc] fixing moe tutorial by @stas00 in #7538
docs typo: lrrt.md, reference to cycle_min_lr should be cycle_max_lr by @jakehemmerle in #7530
fixed DeepSpeedCPULion with ZeRO-Offload bug by @qibin0506 in #7531
Fix scaling and allgather with torch.autocast by @tohtana in #7534
Fix zenflow_torch_adam.py by @stas00 in #7544
Relax restrictions of torch.autocast integration by @tohtana in #7543
Autotune ZenFlow affinity by @delock in #7506
fix get_cuda_compile_flag by @mingjielu in #7521
avoid setting device_id to init_process_group by @kaixuanliu in #7542
Improve error message and reduce validation in autocast test by @tohtana in #7547
Revert "Add index to HPU devices (#7497)" by @deepcharm in #7545
[ALST tutorial] support bs>1 by @sfc-gh-sbekman in #7550
[MoE] Fix misuse of num_experts as expert parallel group size (ep_size) by @Flakes342 in #7551
Limit random seed range in tests by @tohtana in #7553
Fix gradient buffer access for DeepCompile Z1/2 by @tohtana in #7548
Move modal tests to tests/v1 by @tohtana in #7557
Add dependency for deepcompile test by @tohtana in #7558
deepcompile: Create dummy inputs using empty_strided by @eternalNight in #7564
deepcompile: Record graph order using OrderedDict by @eternalNight in #7563
deepcompile: Create a full list of no-copy ops by @eternalNight in #7562
fix npu device_id AttributeError issue by @we1sper in #7560
Make Muon optimizer easier to enable by @delock in #7555
scripts: Check .is_cuda only in non-C++ files by @eternalNight in #7561
[bugfix] fix partition context unpatch by @hjh0119 in #7566

New Contributors

@LYMDLUT made their first contribution in #7421
@aeeeeeep made their first contribution in #7507
@heyujiao99 made their first contribution in #7519
@jakehemmerle made their first contribution in #7530
@qibin0506 made their first contribution in #7531
@mingjielu made their first contribution in #7521
@kaixuanliu made their first contribution in #7542
@sfc-gh-sbekman made their first contribution in #7550
@Flakes342 made their first contribution in #7551
@we1sper made their first contribution in #7560
@hjh0119 made their first contribution in #7566

Full Changelog: v0.17.5...v0.17.6

Contributors

eternalNight, qibin0506, and 19 other contributors

20 Aug 20:28

sfc-gh-truwase

v0.17.5

047a759

v0.17.5 Patch Release

What's Changed

Update version.txt after v0.17.4 release by @loadams in #7460
Update README.md by @PKUWZP in #7465
Add getter APIs for TP/PP/DP ranks in DeepSpeedEngine by @WoosungMyung in #7427
fix issues raised by Coverity scans by @NirSonnenschein in #7431
Fix all-gather duplicate params and wrong dtype by @eternalNight in #7462
fix #7188 by @lpnpcs in #7371
add --bind_cores_to_rank to zero offload tutorial by @delock in #7474
Add blog for ZenFlow by @Antlera in #7463
Fix cpu CI by @sfc-gh-truwase in #7481
fix deepspeed --venv_script by @stas00 in #7469
Modal CI by @sfc-gh-truwase in #7289
[UlyssesSPDataLoaderAdapter] fix iterator reset by @stas00 in #7472
[TiledFusedLogitsLoss] support inference by @stas00 in #7477
Fix pre-compile on cpu-only machines by @AlongWY in #7168
Enable forked PRs by @sfc-gh-truwase in #7486
fix xpu device_id AttributeError issue by @yao-matrix in #7488
Add Zenflow code for Stage 1 & 2 by @Antlera in #7391
Fix invalid f-strings by @cyyever in #7457
Fix DeepCompile for PyTorch v2.8 by @tohtana in #7496
Reduce performance impact of compiler.enable decorator by @deepcharm in #7498
Add index to HPU devices by @deepcharm in #7497

New Contributors

@WoosungMyung made their first contribution in #7427
@eternalNight made their first contribution in #7462
@lpnpcs made their first contribution in #7371
@Antlera made their first contribution in #7463
@AlongWY made their first contribution in #7168
@yao-matrix made their first contribution in #7488
@cyyever made their first contribution in #7457

Full Changelog: v0.17.4...v0.17.5

Contributors

eternalNight, yao-matrix, and 13 other contributors

31 Jul 20:47

loadams

v0.17.4

c4b1a8c

v0.17.4 Patch Release

What's Changed

Update version.txt after 0.17.3 release. by @loadams in #7455
Fix: UnboundLocalError for variable 'dim' about issue by @weeknan in #7449
adding TiledFusedLogitsLoss by @stas00 in #7437
TiledFusedLogitsLoss bug fix by @stas00 in #7459

New Contributors

@weeknan made their first contribution in #7449

Full Changelog: v0.17.3...v0.17.4

Contributors

stas00, weeknan, and loadams

28 Jul 18:20

loadams

v0.17.3

092625c

v0.17.3 Patch Release

What's Changed

[TiledMLP]: fix for bs>1 by @stas00 in #7412
Update version.txt after v0.17.2 release. by @loadams in #7417
Enable torch version dependent compilation of record_module and iter_params by @deepcharm in #7362
[BUGFIX] Reset bucket.elements after reduction in ZeRO Stage 3 by @rahul713rk in #7418
Align missing argument in AllReduceCoalescedHandle by @deepcharm in #7414
Improvements to Communication Logger by @alexk101 in #7404
trying to fix nv-accelerate-v100.yml CI job by @stas00 in #7424
fix: Propagate strip_tensor_paddings by @saforem2 in #7426
Use past_key_value when provided by @deepcharm in #7428
set device_id in torch's init_process_group by @stas00 in #7266
[Ulysses-ALST] add FA3 support by @stas00 in #7430
TiledMLP + SequenceTiledCompute: improve the bs>1 use-case by @stas00 in #7422
Remove unused yaml test configurations and update README by @loadams in #7441
[ALST] fix typo in the url by @stas00 in #7444
[ALST] fix typo in the url part2 by @stas00 in #7446
Remove additional unused tests (human-eval) by @loadams in #7445
Fix: Adapt Llama injection policy for newer transformers versions by @huanyuqu in #7443

New Contributors

@rahul713rk made their first contribution in #7418
@huanyuqu made their first contribution in #7443

Full Changelog: v0.17.2...v0.17.3

Contributors

saforem2, stas00, and 5 other contributors

07 Jul 18:13

loadams

v0.17.2

15f054d

v0.17.2 Patch Release

What's Changed

Update version after 0.17.1 release by @loadams in #7345
s/UlyssesPlus/Arctic Long Sequence Training (ALST)/ by @stas00 in #7348
Don't break set_start_method by @tjruwase in #7349
Fix error of <glog/logging.h> by @Freed-Wu in #7351
Improve padding util for compile by @tohtana in #7355
Fix 404s by @tjruwase in #7363
Fix tutorial title by @stas00 in #7365
Restore real inputs for recompilation by @tohtana in #7356
Fix(scheduler): WarmupLR inherits optimizer lr when not specified by @Flink-ddd in #7360
sequence parallel default dtype by @stas00 in #7364
Enable torch.autocast with ZeRO by @tohtana in #6993
add Arctic Long Sequence Training paper reference by @stas00 in #7372
Flops profiler support for F.interpolate by @sfc-gh-truwase in #7353
Relax tolerances for FP8 unit test only for ROCm + FP16 by @rraminen in #7373
Update latest news with DeepNVMe by @loadams in #7375
Fix release of IPG buffer by @tohtana in #7376
fix wandb.log() call by removing sync kwarg by @ned2 in #7383
Fix dtype mismatch in TestParamPartitioningSkipInit by @tohtana in #7377
Add support for ws=1 scenario by @NirSonnenschein in #7379
fix(inference): Add missing dtype attribute to ParameterBase setter by @Flink-ddd in #7378
add blog link by @stas00 in #7385
fix broken url by @stas00 in #7390
add support for CUDAtk12.9 by @loscrossos in #7394
Fix unbound local error for return_val by @HollowMan6 in #7395
Fix ZeRO stage 1 and add stage 2 support with DeepCompile by @tohtana in #7366
Improve coverage of DeepCompile by @tohtana in #7386
Added device detection to communication logging by @alexk101 in #7398
fix: Add csrc/compile to include paths for DeepCompile builder by @HollowMan6 in #7401
fix: DeepCompile for torch 2.8 by @HollowMan6 in #7402
fix(comm): Expose GradBucket in deepspeed.comm API by @Flink-ddd in #7400
fix: fix FileNotFoundError for build_win.bat by @gjj2828 in #7399
fix: engine initializes optimizer attributes at the beginning by @HollowMan6 in #7410

New Contributors

@Freed-Wu made their first contribution in #7351
@Flink-ddd made their first contribution in #7360
@ned2 made their first contribution in #7383
@alexk101 made their first contribution in #7398
@gjj2828 made their first contribution in #7399

Full Changelog: v0.17.1...v0.17.2

Contributors

ned2, tjruwase, and 12 other contributors

09 Jun 22:52

loadams

v0.17.1

2ce5505

v0.17.1 Patch Release

What's Changed

Update version.txt after v0.17.0 release by @loadams in #7326
Ulysses Plus Docs by @stas00 in #7331
UlyssesPlus Docs take 2 by @stas00 in #7332
Improve Ulysses Plus Docs by @cynricfu in #7335
Update config_utils.py by @qgallouedec in #7333
Fix pytest version to 8.3.5 in hpu-gaudi actions by @raza-sikander in #7337
Fix issue with symint input by @tohtana in #7243
fp16 optimizer timers fix - TypeError: 'NoneType' object is not callable by @rraminen in #7330
DeepNVMe update by @tjruwase in #7215
fixed: Modified the topkgating function and modified the test_moe file for testing by @xiongjyu in #7163
Fix LoRA arxiv reference by @emmanuel-ferdman in #7340
Update folder name by @sfc-gh-truwase in #7343
Improve overflow handling in ZeRO by @tjruwase in #6976
Fix docs that are rendering Incorrectly by @felixgondwe in #7344
Move pytest pinning from individual tests to requirements-dev.txt until fixed. by @loadams in #7327

New Contributors

@cynricfu made their first contribution in #7335
@xiongjyu made their first contribution in #7163
@sfc-gh-truwase made their first contribution in #7343
@felixgondwe made their first contribution in #7344

Full Changelog: v0.17.0...v0.17.1

Contributors

felixgondwe, tjruwase, and 10 other contributors

02 Jun 23:02

loadams

v0.17.0

720787e

DeepSpeed v0.17.0

What's Changed

Update next version in version.txt after 0.16.9 release. by @loadams in #7306
Update COMMITTERS.md by @PKUWZP in #7305
Fix AutoTP gathering replaced layer params when bias is not None by @HollowMan6 in #7257
Fix the GPU memory usage of ZeRO-Offload (only update stage_1_and_2.py) by @arminzhu in #7309
Fix: Update grad norm calculation for CPU offload by @therealnaveenkamal in #7302
CI: prefer bf16 over fp16 by @stas00 in #7304
tests/conftest.py: automatically add local deepspeed repo when running tests by @stas00 in #7317
Update gaudi2 nightly,ci to latest 1.21.0 build by @raza-sikander in #7313
anchor transformers version by @stas00 in #7316
fix asymmetric in dequantize by @pencil-hub in #7283
Ulysses SP for HF Integration by @stas00 in #7268
Fix ci hang in torch2.7& improve ut by @inkcherry in #7321
Bump to v0.17.0 by @sfc-gh-mwyatt in #7324

New Contributors

@PKUWZP made their first contribution in #7305
@arminzhu made their first contribution in #7309
@therealnaveenkamal made their first contribution in #7302
@pencil-hub made their first contribution in #7283
@sfc-gh-mwyatt made their first contribution in #7324

Full Changelog: v0.16.9...v0.17.0

Contributors

PKUWZP, stas00, and 8 other contributors

22 May 21:56

loadams

v0.16.9

bdba823

v0.16.9 Patch Release

What's Changed

Update patch version after 0.16.8 release by @loadams in #7296
Avoid graph break by removing another redundant requires grad false by @deepcharm in #7263
Add qwen3 meta loading for AutoTP by @delock in #7293
Modernize system executable detection across components by @emmanuel-ferdman in #7290
Enable ZeRO set/get APIs for NVMe offload by @tjruwase in #7046
Add qwen3moe meta loading for AutoTP by @ranzhejiang in #7297
disable license check until the new license situation has been sorted… by @stas00 in #7301
Fix extra_repr_str when weight is None / in zero-3 by @HollowMan6 in #7254
[XPU] Support XCCL on deepspeed side by @ys950902 in #7299

New Contributors

@emmanuel-ferdman made their first contribution in #7290

Full Changelog: v0.16.8...v0.16.9

Contributors

tjruwase, stas00, and 7 other contributors

19 May 16:16

loadams

v0.16.8

f459502

v0.16.8 Patch Release

What's Changed

Update version.txt after 0.16.7 release by @loadams in #7232
Recommend using latest by @tohtana in #7233
[NFC] Fix comment related to SP group by @c8ef in #7234
Add cpu accelerator fp16 dtype support by @Yejing-Lai in #7207
Update CPU torch version to 2.7 by @loadams in #7241
Update README.md by @jizhang02 in #7246
Fix compile error for nv_bloat162 by @loscrossos in #7248
add Makefile to ease maintenance by @stas00 in #7267
Fix fp8 gemm by @RezaYazdaniAminabadi in #7265
[XPU] update xpu-max1100 CI workflow to torch 2.7 by @Liangliang-Ma in #7284
Fix issues XPU tests hit with extra-index-url by @loadams in #7291
Temporarily skip AIO tests due to an issue with runners by @loadams in #7288
rollback #6726 by @delock in #7258

New Contributors

@jizhang02 made their first contribution in #7246
@loscrossos made their first contribution in #7248

Full Changelog: v0.16.7...v0.16.8

Contributors

stas00, Liangliang-Ma, and 8 other contributors

18 Apr 15:36

loadams

v0.16.7

c66fdaf

v0.16.7 Patch Release

What's Changed

Update version.txt after 0.16.6 release by @loadams in #7218
Fix release links by @tjruwase in #7219
Fix pass for z3 and profiler by @tohtana in #7222
Fix build on AMD GPUs (related to DeepCompile) by @HollowMan6 in #7224
Add defence for DeepCompile w/o optimizer by @HollowMan6 in #7225
Pass with_cuda arg for jit_load in OpBuilder by @HollowMan6 in #7226
Make sure it's not None before offloading contiguous_grad_buffer by @HollowMan6 in #7227

Full Changelog: v0.16.6...v0.16.7

Contributors

tjruwase, HollowMan6, and 2 other contributors
