Releases: deepspeedai/DeepSpeed
v0.9.3: Patch release
What's Changed
- Enable auto TP policy for llama model by @jianan-gu in #3170
- Allow users to use mis-matched CUDA versions by @mrwyattii in #3436
- Hybrid Engine Refactor and Llama Inference Support by @cmikeh2 in #3425
- add sharded checkpoint loading for AutoTP path to reduce the peak mem… by @sywangyi in #3102
- launcher/multinode_runner.py: mapping env variables by @YizhouZ in #3372
- Update automatic-tensor-parallelism.md by @sywangyi in #3198
- Build: Update license in setup by @PabloEmidio in #3484
- Doc corrections by @goodship1 in #3435
- Fix spelling errors in comments and documents by @digger-yu in #3486
- Fix spelling error in function GetMaxTokenLength() by @luliyucoordinate in #3482
- Fix a type error on bf16+Pipeline Parallelism by @ys950902 in #3441
- Fix spelling errors in DeepSpeed codebase by @digger-yu in #3494
- fix spelling error with docs/index.md by @digger-yu in #3443
- delete the line to keep user_zero_stages by @MrZhengXin in #3473
- Update Inference Engine checkpoint loading + meta tensor assertions by @lekurile in #2940
- fix regression in shard checkpoint loading in AutoTP Path caused by qkv_copy() is deleted and add UT case for shard checkpoint loading in AutoTP by @sywangyi in #3457
- Add snip_momentum structured pruning which supports higher sparse ratio by @ftian1 in #3300
- Update README.md by @goodship1 in #3504
- Hybrid Engine Fix Llama by @lekurile in #3505
- fix spelling error with deepspeed/runtime/ by @digger-yu in #3509
- Skip autoTP if tp_size is 1 by @molly-smith in #3449
- Changing monitor loss to aggregate loss over gradient accumulation steps by @jomayeri in #3428
- change actions/checkout@v2 to v3 by @digger-yu in #3526
- fix typo with docs/ by @digger-yu in #3523
- Doc updates by @goodship1 in #3520
- Fix bug in Hybrid Engine by @mrwyattii in #3497
- Fix wrong passing of offload_optimizer_config to DeepSpeedZeRoOffload by @mmhab in #3420
- Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 by @YizhouZ in #2999
- share inflight registry between PartitionedParameterCoordinators by @HeyangQin in #3462
- Syncing FusedAdam with new Apex features by @jomayeri in #3434
- fix typo in comments with deepspeed/ by @digger-yu in #3537
- [ROCm] Hip headers fix by @rraminen in #3532
- [CPU] Support Intel CPU inference by @delock in #3041
- Clone tensors to avoid torch.save bloat by @tjruwase in #3348
- Fix attribute error when loading FusedAdamBuilder() by @rraminen in #3527
- fix typo by @inkcherry in #3559
- Fixing bf16 test by @jomayeri in #3551
- Fix Hybrid Engine for BLOOM by @lekurile in #3580
- Fix op_builder against PyTorch nightly by @malfet in #3596
- data efficiency bug fix, avoid invalid range step size by @conglongli in #3609
- DS init should not broadcast or move zero.Init models by @tjruwase in #3611
- Expose Consecutive Hysteresis to Users by @Quentin-Anthony in #3553
- Align InferenceEngine to store ms in _model_times by @HolyFalafel in #3501
- AISC launcher fixes by @jeffra in #3637
- stage3.py: do not scale if gradient_predivide_factor is 1.0 by @guoyejun in #3630
- Add Ascend NPU accelerator support by @CurryRice233 in #3595
- Skip tests on docs-only changes by @mrwyattii in #3651
- Update megatron.md by @wjessup in #3641
- Typo Correction by @MicahZoltu in #3621
- deepspeed/comm/comm.py: fix typo of warning message by @guoyejun in #3636
- Fix RuntimeError when using ZeRO Stage3 with mpu: #3564 by @eggiter in #3565
- Allow dict datatype for checkpoints (inference) by @mrwyattii in #3007
- fix typo with deepspeed/ by @digger-yu in #3547
- flops_profiler: add option recompute_fwd_factor for the case of activation c… by @guoyejun in #3362
- fix typo deepspeed/runtime by @digger-yu in #3663
- Refactor check_enabled root validator in DeepSpeedMonitorConfig by @bgr8 in #3616
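Several entries above touch the dynamic loss scaler, notably "Expose Consecutive Hysteresis to Users" (#3553). The sketch below is a standalone illustration of the hysteresis idea, not DeepSpeed's implementation; all names are hypothetical. With `consecutive_hysteresis=True` the hysteresis budget is replenished on every non-overflow step, so only back-to-back overflows can shrink the scale.

```python
# Minimal sketch of dynamic loss scaling with hysteresis (hypothetical names;
# not DeepSpeed's actual loss-scaler code).
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 hysteresis=2, consecutive_hysteresis=False):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window          # good steps before growing scale
        self.hysteresis = hysteresis              # overflows tolerated before shrinking
        self.consecutive_hysteresis = consecutive_hysteresis
        self._hysteresis_left = hysteresis
        self._good_steps = 0

    def update(self, overflow: bool) -> None:
        if overflow:
            self._hysteresis_left -= 1
            self._good_steps = 0
            if self._hysteresis_left <= 0:
                self.scale = max(1.0, self.scale / self.scale_factor)
                self._hysteresis_left = self.hysteresis
        else:
            if self.consecutive_hysteresis:
                # replenish the budget on every good step
                self._hysteresis_left = self.hysteresis
            self._good_steps += 1
            if self._good_steps % self.scale_window == 0:
                self.scale *= self.scale_factor
```

With `hysteresis=2` and `consecutive_hysteresis=True`, two overflows separated by a good step leave the scale unchanged, while the same two overflows back-to-back halve it.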
New Contributors
- @jianan-gu made their first contribution in #3170
- @YizhouZ made their first contribution in #3372
- @PabloEmidio made their first contribution in #3484
- @luliyucoordinate made their first contribution in #3482
- @ys950902 made their first contribution in #3441
- @MrZhengXin made their first contribution in #3473
- @ftian1 made their first contribution in #3300
- @mmhab made their first contribution in #3420
- @malfet made their first contribution in #3596
- @HolyFalafel made their first contribution in #3501
- @CurryRice233 made their first contribution in #3595
- @wjessup made their first contribution in #3641
- @MicahZoltu made their first contribution in #3621
- @eggiter made their first contribution in #3565
- @bgr8 made their first contribution in #3616
Full Changelog: v0.9.2...v0.9.3
v0.9.2: Patch release
What's Changed
- MiCS implementation by @zarzen in #2964
- Fix formatting by @mrwyattii in #3343
- [ROCm] Hipify cooperative_groups headers by @rraminen in #3323
- Diffusers 0.15.0 bug fix by @molly-smith in #3345
- Print default values for DeepSpeed --help by @mrwyattii in #3347
- add bf16 cuda kernel support by @dc3671 in #3092
- README.md: Update MosaicML docs link by @kobindra in #3344
- hybrid_engine: check tuple size when fusing lora params by @adammoody in #3311
- fix mpich launcher issue in multi-node by @sywangyi in #3078
- Update DS-Chat issue template by @mrwyattii in #3368
- add deepspeed chat blog links, add tags by @conglongli in #3369
- Fix redundant shared_params in zero_to_fp32.py by @ShijieZZZZ in #3149
- fixing default communication_data_type for bfloat16_enabled and docs by @clumsy in #3370
- Auto TP Tutorial with T5 Example by @molly-smith in #2962
- stage_1_and_2.py: do gradient scale only for fp16 by @guoyejun in #3166
- Fix memory leak in zero2 contiguous gradients by @hablb in #3306
- remove megatron-lm, no longer pip installable by @jeffra in #3389
- Fix pipeline module evaluation when contiguous activation checkpoin… by @hablb in #3005
- doc updates by @goodship1 in #3415
- Save tensors in context of memory_efficient_linear by @tohtana in #3413
- Add HE support for the rest of model containers by @RezaYazdaniAminabadi in #3191
- Update PyTorch Lightning/DeepSpeed examples links by @loadams in #3424
- Fix `PipelineEngine.eval_batch` result by @nrailgun in #3316
- OPT Activation Function Hotfix by @cmikeh2 in #3400
- Add ZeRO 1 support to PP for BF16. by @jomayeri in #3399
- [zero_to_fp32] fix shared param recovery by @stas00 in #3407
- Adagrad support in ZeRO by @jomayeri in #3401
- Update 2020-09-09-sparse-attention.md by @goodship1 in #3432
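The fix "fixing default communication_data_type for bfloat16_enabled and docs" (#3370) concerns which dtype gradients are reduced in when training in bf16. As an illustration, a config along these lines pins reductions to fp32; the key names follow the DeepSpeed JSON config as I understand it, so treat this as a sketch rather than an authoritative reference:

```json
{
  "bf16": { "enabled": true },
  "communication_data_type": "fp32"
}
```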
New Contributors
- @dc3671 made their first contribution in #3092
- @kobindra made their first contribution in #3344
- @hablb made their first contribution in #3306
- @nrailgun made their first contribution in #3316
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Patch release
What's Changed
- Update DS-Chat docs for v0.9.0 by @mrwyattii in #3216
- Update DeepSpeed-Chat docs with latest changes to scripts by @mrwyattii in #3219
- Nested zero.Init() and dynamically defined model class by @tohtana in #2989
- Update torch version check in building sparse_attn by @loadams in #3152
- Fix for Stable Diffusion by @mrwyattii in #3218
- [update] reference in cifar-10 by @dtunai in #3212
- [fp16/doc] correct initial_scale_power default value by @stas00 in #3275
- update link to PL docs by @Borda in #3237
- fix typo in autotuner.py by @eltociear in #3269
- improving int4 asymmetric quantization accuracy by @HeyangQin in #3190
- Update install.sh by @digger-yu in #3270
- Fix cupy install version detection by @mrwyattii in #3276
- [ROCm] temporary workaround till __double2half support enabled in HIP by @bmedishe in #3236
- Fix pydantic and autodoc_pydantic version to <2.0.0 until support is added. by @loadams in #3290
- Add contribution images to readme by @digger-yu in #3282
- remove `torch.cuda.is_available()` check when compiling ops by @jinzhen-lin in #3085
- Update MI200 workflow to install apex with changes from pip by @loadams in #3294
- Add pre-compiling ops test by @loadams in #3277
- Update README.md by @digger-yu in #3315
- Update Dockerfile to use python 3.6 specifically by @bobowwb in #3298
- zero3 checkpoint frozen params by @tjruwase in #3205
- Fix for dist not being initialized when constructing main config by @mrwyattii in #3324
- Fix missing scale attributes for GPTJ by @cmikeh2 in #3256
- Explicitly check for OPT activation function by @cmikeh2 in #3278
New Contributors
- @dtunai made their first contribution in #3212
- @Borda made their first contribution in #3237
- @digger-yu made their first contribution in #3270
- @bmedishe made their first contribution in #3236
- @jinzhen-lin made their first contribution in #3085
- @bobowwb made their first contribution in #3298
Full Changelog: v0.9.0...v0.9.1
DeepSpeed v0.9.0
New features
What's Changed
- [docs] add MCR-DL paper to readme/docs by @Quentin-Anthony in #3066
- Several fixes to unblock CI by @loadams in #3047
- Assert mp_size is factor of model dimensions by @molly-smith in #2891
- [CI] follow-up fixes by @jeffra in #3072
- fix return prev key and value , added strides to from_blob by @mzusman in #2828
- Remove bf16 from inference config dtype enum by @molly-smith in #3010
- Softmax Scheduling Cleanup by @cmikeh2 in #3046
- Fix nebula in save_16bit_model issue by @FreyaRao in #3023
- Allow lists by @satpalsr in #3042
- Goodbye Torch 1.8 by @mrwyattii in #3082
- Empty ZeRO3 partition cache by @tjruwase in #3060
- pre-commit check for torch.cuda in code by @delock in #2981
- Move cuda check into utils by @loadams in #3074
- update yapf version and style settings by @jeffra in #3098
- Fix comms benchmark import issues and support MPI/slurm launching by @Quentin-Anthony in #2932
- Disable Stage 1&2 CPUAdam pathways by @mrwyattii in #3097
- ♻️ replace deprecated functions for communication by @mayank31398 in #2995
- Make fp32 default communication data type by @tjruwase in #2970
- Update DeepSpeed copyright license to Apache 2.0 by @mrwyattii in #3111
- Add Full Apache License by @mrwyattii in #3119
- VL MoE Blog by @yaozhewei in #3120
- Update SD triton version in requirements-sd.txt by @lekurile in #3135
- Fix launch issue by @tjruwase in #3137
- Fix CI badges by @mrwyattii in #3138
- Optimize Softmax Kernel by @molly-smith in #3112
- Use generic O_DIRECT by @tjruwase in #3115
- Enable autoTP for bloom by @sywangyi in #3035
- [cleanup] remove `pass` calls where they aren't needed by @stas00 in #2826
- [ci] `nv-transformers-v100` - use the same torch version as transformers CI by @stas00 in #3096
- Fixes code and tests skipping/asserting incorrectly on torch 2+. by @loadams in #3136
- fix example symlink about DeepSpeed+AzureML by @EeyoreLee in #3127
- Remove Extra Bracket by @VHellendoorn in #3101
- Recover shared parameters by @ShijieZZZZ in #3033
- Fix for Diffusers 0.14.0 by @molly-smith in #3142
- Fix copyright check, add copyright replace script by @mrwyattii in #3141
- Update curriculum-learning.md by @goodship1 in #3031
- Remove benchmark code by @mrwyattii in #3157
- fixing a bug in CPU Adam and Adagrad by @xiexbing in #3109
- op_builder: conditionally compute relative path for hip compiled files by @adammoody in #3095
- zero.Init() should pin params in GPU memory as requested by @tjruwase in #2953
- deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache by @guoyejun in #2803
- Add DeepSpeed-Chat Blogpost by @awan-10 in #3185
- [docs] add run command for 13b by @awan-10 in #3187
- add news item. by @awan-10 in #3188
- DeepSpeed Chat by @tjruwase in #3186
- Fix references to figures by @tohtana in #3189
- Fix typo by @zhouzaida in #3183
- Fix typo by @dawei-wang in #3164
- Chatgpt chinese blog by @yaozhewei in #3193
- Add Japanese version of ChatGPT-like pipeline blog by @tohtana in #3194
- fix hero figure by @conglongli in #3199
- feat: Add support for `NamedTuple` when sharding parameters [#3029] by @alexandervaneck in #3037
- fix license badge by @conglongli in #3200
- Update AMD workflows by @loadams in #3179
- [CPU support] Optionally bind each rank to different cores on host by @delock in #2881
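One entry above, "Assert mp_size is factor of model dimensions" (#2891), reflects a basic tensor-parallel constraint: weight matrices are sharded across `mp_size` ranks, so the sharded dimensions must divide evenly. A hypothetical standalone check (not DeepSpeed's code) might look like:

```python
# Illustrative validation only; helper name and signature are made up.
def validate_mp_size(hidden_size: int, num_attention_heads: int, mp_size: int) -> None:
    # Each rank holds hidden_size / mp_size columns and
    # num_attention_heads / mp_size heads, so both must divide evenly.
    if hidden_size % mp_size != 0:
        raise ValueError(f"hidden_size={hidden_size} not divisible by mp_size={mp_size}")
    if num_attention_heads % mp_size != 0:
        raise ValueError(f"num_attention_heads={num_attention_heads} not divisible by mp_size={mp_size}")
```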
New Contributors
- @mzusman made their first contribution in #2828
- @FreyaRao made their first contribution in #3023
- @sywangyi made their first contribution in #3035
- @EeyoreLee made their first contribution in #3127
- @VHellendoorn made their first contribution in #3101
- @goodship1 made their first contribution in #3031
- @zhouzaida made their first contribution in #3183
- @dawei-wang made their first contribution in #3164
- @alexandervaneck made their first contribution in #3037
Full Changelog: v0.8.3...v0.9.0
v0.8.3: Patch release
What's Changed
- [deepspeed/autotuner] Bug fix for skipping mbs on gas by @rahilbathwal5 in #2171
- Fix issue between our abstract accelerator and colossalai's version of op_builder by @jeffra in #2963
- [zero] prevent poor configs from running w. zero-offload by @jeffra in #2971
- Fix Meta Tensor checkpoint load for OPT models by @lekurile in #2990
- ckpt: create directories in checkpoint_engine by @adammoody in #2988
- Fix buffer size for pipeline parallel and communication schedule by @tohtana in #2862
- [docs] add new paper to readme/docs by @jeffra in #3018
- fix language by @stas00 in #3019
- BF Optimizer Attribute Checks by @jomayeri in #3022
- [logger] implement `logger.warning_once` by @stas00 in #3021
- Convert model parameters from generator to list. by @jomayeri in #3017
- Improve loss overflow logs by @Quentin-Anthony in #3008
- Fix Broken Links by @satpalsr in #3048
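The `logger.warning_once` entry (#3021) adds deduplicated warnings so a hot code path does not spam the log. A minimal standalone sketch of the idea (not DeepSpeed's implementation) memoizes on the message text:

```python
import logging
from functools import lru_cache

logger = logging.getLogger("example")

# Sketch of a warning_once helper: lru_cache ensures each distinct
# message is logged at most once per process.
@lru_cache(maxsize=None)
def warning_once(msg: str) -> None:
    logger.warning(msg)
```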
Full Changelog: v0.8.2...v0.8.3
v0.8.2: Patch release
What's Changed
- add auto-generated PR workflow by @mrwyattii in #2822
- Fix typo in auto-sync workflow by @mrwyattii in #2850
- Fix example command for building wheel with dev version specified. by @loadams in #2815
- Create tensor parallelism blog/tutorial by @molly-smith in #2766
- Data efficiency library update by @conglongli in #2866
- Make z3 respect comm dtype by @tjruwase in #2807
- Automatic Tensor Parallelism Blog Links by @molly-smith in #2877
- Check device count before running dist tests by @HeyangQin in #2799
- AutoTP tutorial web formatting and news by @molly-smith in #2883
- Remove deprecated `torch._six` imports by @yasyf in #2863
- Reduce I/O size by @tjruwase in #2814
- add missing license info to top of all source code by @jeffra in #2889
- Enable tensor fragments for zero 2 & 3 by @tjruwase in #2727
- better eval sampler for val or test dataset by @mayank31398 in #2907
- using container when loading inference checkpoints by @HeyangQin in #2875
- Fix CPUAdam for when `vendor_id_raw` is not provided by @FarzanT in #2836
- Fixes `AttributeError` in #2853 by @saforem2 in #2854
- Add MPICH Multinode Runner by @inkcherry in #2839
- TP unsupported models and assertions by @molly-smith in #2810
- AutoTP Assert Kernel Injection Support by @molly-smith in #2939
- Check for local CUDA graphs when enable_cuda_graph=True by @lekurile in #2941
- Improve overflow handling by @tjruwase in #2944
- [RFC] add device abstraction to allow other device than CUDA be used by @delock in #2221
- deepspeed.init_distributed() support for TCP protocols by @noabauma in #2905
New Contributors
- @HeyangQin made their first contribution in #2799
- @yasyf made their first contribution in #2863
- @mayank31398 made their first contribution in #2907
- @FarzanT made their first contribution in #2836
- @saforem2 made their first contribution in #2854
- @noabauma made their first contribution in #2905
Full Changelog: v0.8.1...v0.8.2
v0.8.1: Patch release
What's Changed
- CUDA optional deepspeed ops by @tjruwase in #2507
- Remove CI trigger for push to master by @mrwyattii in #2712
- [install] only add deepspeed pkg at install by @jeffra in #2714
- Fix nightly tests for new lm-eval release by @mrwyattii in #2713
- BF16 optimizer for BF16+ZeRO Stage 1 by @jomayeri in #2706
- Fix typo in diffusers transformer block by @mrwyattii in #2718
- Inference Refactor (replace_with_policy, model_implementations) by @awan-10 in #2554
- Change zero_grad() argument to match pytorch by @loadams in #2741
- Automatic tensor parallelism v2 by @molly-smith in #2670
- Fixing Optimizer Sanity Check by @jomayeri in #2742
- [GatheredParameters] fix memory leak by @stas00 in #2665
- Abstract accelerator (step 3) by @delock in #2677
- Fix autotuning so that it records Floating Point Operations per second, not microsecond by @dashstander in #2711
- fix a misspelled attribute by @stas00 in #2750
- [zero] remove misleading dtype log by @jeffra in #2732
- Fix softmax backward by @RezaYazdaniAminabadi in #2709
- Skip test_bias_gelu unit test if torch < 1.12 by @lekurile in #2754
- Conditionally Make Op Building More Verbose by @cmikeh2 in #2759
- Bing/formatting correction by @xiexbing in #2764
- Add links to new azureML examples by @cassieesvelt in #2756
- Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. by @loadams in #2743
- Refactor/Pydantify monitoring config by @mrwyattii in #2640
- Pin minimum `packaging` requirement by @carmocca in #2771
- Fix for diffusers v0.12.0 by @mrwyattii in #2753
- some fix in flops_profiler by @lucasleesw in #2068
- fix upsample flops compute by skipping unused kargs by @cli99 in #2773
- Fix broken kernel inject bug by @molly-smith in #2776
- Fix Checkpoint-loading with Meta-tensor by @RezaYazdaniAminabadi in #2781
- Add hjson support for user configs by @mrwyattii in #2783
- Reset KV-cache at the beginning of text-generation by @RezaYazdaniAminabadi in #2669
- Container param cleanup + remove qkv_merging by @lekurile in #2780
- Common location to install libaio-dev by @tjruwase in #2779
- Fixing broken link to azureml-examples recipes by @rtanase in #2795
- remove outdated comment by @stas00 in #2786
- Enable page-locked tensors without CUDA by @tjruwase in #2775
- Add container load checkpoint error reporting + refactor by @lekurile in #2792
- Add user defined launcher args for PDSH launcher by @loadams in #2804
- Fix Slurm launcher user args by @loadams in #2806
- Handle hanged tests in CI by @mrwyattii in #2808
- Fix inference CI device error by @mrwyattii in #2824
- Fix permissions issue with pip upgrade by @mrwyattii in #2823
- Fix cpu-only CI hangs by @mrwyattii in #2825
- Fix Pipeline Parallel resize unit test by @mrwyattii in #2833
- Fix auto TP for duplicate modules with different gems by @molly-smith in #2784
- Refactor DS inference API. No longer need replace_method. by @awan-10 in #2831
- Port Reza's INT8-quantization fix to container architecture by @lekurile in #2725
- Fix gpt-Neox rotary embedding implementation by @RezaYazdaniAminabadi in #2782
- Fix for CI failure on system upgrade by @mrwyattii in #2849
New Contributors
- @loadams made their first contribution in #2741
- @xiexbing made their first contribution in #2764
- @carmocca made their first contribution in #2771
- @lucasleesw made their first contribution in #2068
- @rtanase made their first contribution in #2795
Full Changelog: v0.8.0...v0.8.1
DeepSpeed v0.8.0
New features
- DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality
- DeepSpeed Data Efficiency Library by @conglongli in #2585
What's Changed
- fix blog link by @conglongli in #2600
- Migrate ops tests to new inference_ops marker by @cmikeh2 in #2599
- Move layer norm to new schedule by @lokoppakmsft in #2590
- [deepspeed/autotuner] Bug fix for binary search for batch size by @rahilbathwal5 in #2162
- Fix for older versions of pydantic by @mrwyattii in #2611
- Use rocm/pytorch:latest for ROCm Dockerfile by @jithunnair-amd in #2613
- skip torch.zeros and tensor.copy_ when model parallel is not used by @guoyejun in #2479
- call empty_cache to really free up GPU memory as described in comment by @guoyejun in #2620
- Remove GatheredParameters context from replace_with_policy by @lekurile in #2591
- fixes #2498 by @clumsy in #2603
- Update AVX512 Detection by @cmikeh2 in #2621
- Add Megatron CI workflow by @mrwyattii in #2614
- [inference] check for unsupported model generate args by @jeffra in #2627
- [launcher] parse hostfile via regex and added error checks by @jeffra in #2626
- Unit tests setup own venv by @mrwyattii in #2628
- Fix #2409: add enable_each_rank_log to deepspeed/launcher/runner.py by @inkcherry in #2571
- Fix typo in autotuner.py by @eltociear in #2639
- [zero-3] Handle forward parameter return correctly in nested cases by @samyam in #2642
- [inference] ds-attention refactor w.r.t. ops by @jeffra in #2623
- Fix issue w. bloom int8 when changing tp size by @jeffra in #2645
- fix assertion error in zero stage 3 by @GuanhuaWang in #2647
- tweaks to ds-attn, distilbert policy, and mup by @jeffra in #2649
- [doc] fix `min_loss_scale` default by @stas00 in #2660
- [launcher] fail gracefully if hostname -i doesn't work as expected by @jeffra in #2631
- Fix Opt injection by @RezaYazdaniAminabadi in #2541
- Abstract accelerator (step 2) by @delock in #2560
- Remove unnecessary device synchronization for stage 2 by @li-yi-dong in #2500
- [Bug Fixed] torch.cuda.is_available -> torch.cuda.is_available() by @wkcn in #2661
- [fp16] lower `initial_scale_power` to `16` by @stas00 in #2663
- fix Tensor contiguous bug in model_compression by @xiaoxiawu-microsoft in #2671
- [inference] ds-mlp refactor w.r.t. ops by @jeffra in #2668
- real_accelerator validation check for both accelerator and deepspeed accelerator path by @delock in #2685
- fix typo and remove duplicated code in ZeRO stage 1 and 2 by @wkcn in #2655
- Add mlflow logging for aml by @cassieesvelt in #2495
- Fix import error of op_builder by @tohtana in #2687
- Pass training flag to forward call from module config by @lokoppakmsft in #2604
- Extend quantization utils features by @lokoppakmsft in #2683
- [GatheredParameters] add support for any iterable by @stas00 in #2664
- Fix for latest diffusers by @mrwyattii in #2699
- exclude benchmarks during install by @jeffra in #2698
- Correct loss scale in ZeRO step by @jomayeri in #2695
- [ZeRO] non-MoE stage 1 requires CG disabled by @jeffra in #2703
- remove print side effect from importing deepspeed by @jeffra in #2704
- ZeRO3 handling frozen weights by @tjruwase in #2653
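The fp16 change above (#2663) lowers the default `initial_scale_power` to 16, i.e. an initial loss scale of 2^16. A sketch of an fp16 config section with that default made explicit, using key names from the DeepSpeed JSON config as I recall them:

```json
{
  "fp16": {
    "enabled": true,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  }
}
```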
New Contributors
- @eltociear made their first contribution in #2639
- @li-yi-dong made their first contribution in #2500
- @wkcn made their first contribution in #2661
- @xiaoxiawu-microsoft made their first contribution in #2671
- @cassieesvelt made their first contribution in #2495
- @tohtana made their first contribution in #2687
Full Changelog: v0.7.7...v0.8.0
v0.7.7: Patch release
What's Changed
- Update the locator for Megatron-LM by @rapsealk in #2564
- use get_global_rank if available by @jeffra in #2567
- Add Determined to open-source DL frameworks by @sirredbeard in #2573
- Support fp32 gradaccum for bf16 model by @delock in #2566
- Drop Maxwell Support by @cmikeh2 in #2574
- Fix quantized-inference & Add generic support of checkpoint loading by @RezaYazdaniAminabadi in #2547
- Fix MegatronLayerPolicy to have megatron_v2=True by @lekurile in #2579
- Update barrier and reduce_scatter_base to conform to PyTorch signatures by @Quentin-Anthony in #2570
- Support N-dimension input in quantization kernel by @lokoppakmsft in #2575
- Add checkpoint sharding unit tests by @mrwyattii in #2561
- Updating docs README by @jomayeri in #2587
- Updating API docs by @jomayeri in #2586
- Fix issues w. python 3.6 + add py-version checks to CI by @jeffra in #2589
- [benchmarks] get mask token from tokenizer by @jeffra in #2592
New Contributors
- @rapsealk made their first contribution in #2564
- @sirredbeard made their first contribution in #2573
Full Changelog: v0.7.6...v0.7.7
v0.7.6: Patch release
What's Changed
- DeepSpeed inference config. (#2459) by @awan-10 in #2472
- Update docs to autogenerate pydantic config model docs by @mrwyattii in #2509
- Add max_tokens alias to max_out_tokens arg to maintain backwards compatibility by @lekurile in #2508
- Deepspeed quantization library v0.1 by @lokoppakmsft in #2450
- Fix backward compatibility for InferenceConfig by @mrwyattii in #2516
- Add missing Inference sub-configs by @mrwyattii in #2518
- Add note about nvcc/hipcc requirement by @jeffra in #2519
- Update codeowners by @jeffra in #2525
- Dequantization Utils Library by @cmikeh2 in #2521
- Fixes for torch 1.14 due to new torch.numel return type by @jeffra in #2522
- Ensure MOE is initialized for SD by @cmikeh2 in #2534
- Make DS-Inference config readable from JSON by @mrwyattii in #2537
- Add MII tests by @mrwyattii in #2533
- Remove mutable default parameter in `init_inference()` by @aphedges in #2540
- Change Where DS/Triton is Used in Stable Diffusion by @cmikeh2 in #2536
- Pass down the new DS inference config to replace_transformer_layer. by @awan-10 in #2539
- Adding Gradient Accumulation Data Type Config by @jomayeri in #2512
- Report progress at gradient accumulation boundary by @ShijieZZZZ in #2553
- encoded ds config into command line argument when launching child processes in autotuning by @cli99 in #2524
- Add missing MoE fields to inference config for backward compatibility by @mrwyattii in #2556
- Abstract accelerator (step 1) by @delock in #2504
- Fix invalid check of recorded parameter orders in zero stage3. by @inkcherry in #2550
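"Adding Gradient Accumulation Data Type Config" (#2512) lets gradients be accumulated in a different dtype than the model, e.g. fp32 accumulation under bf16 training. A sketch of such a config, assuming the `data_types.grad_accum_dtype` key introduced by that PR:

```json
{
  "bf16": { "enabled": true },
  "data_types": { "grad_accum_dtype": "fp32" }
}
```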
New Contributors
- @ShijieZZZZ made their first contribution in #2553
- @delock made their first contribution in #2504
- @inkcherry made their first contribution in #2550
Full Changelog: v0.7.5...v0.7.6