Releases: deepspeedai/DeepSpeed
v0.9.3: Patch release
What's Changed
- Enable auto TP policy for llama model by @jianan-gu in #3170
- Allow users to use mis-matched CUDA versions by @mrwyattii in #3436
- Hybrid Engine Refactor and Llama Inference Support by @cmikeh2 in #3425
- add sharded checkpoint loading for AutoTP path to reduce the peak mem… by @sywangyi in #3102
- launcher/multinode_runner.py: mapping env variables by @YizhouZ in #3372
- Update automatic-tensor-parallelism.md by @sywangyi in #3198
- Build: Update license in setup by @PabloEmidio in #3484
- Doc corrections by @goodship1 in #3435
- Fix spelling errors in comments and documents by @digger-yu in #3486
- Fix spelling error in function GetMaxTokenLength() by @luliyucoordinate in #3482
- Fix a type error on bf16+Pipeline Parallelism by @ys950902 in #3441
- Fix spelling errors in DeepSpeed codebase by @digger-yu in #3494
- fix spelling error with docs/index.md by @digger-yu in #3443
- delete the line to keep user_zero_stages by @MrZhengXin in #3473
- Update Inference Engine checkpoint loading + meta tensor assertions by @lekurile in #2940
- fix regression in shard checkpoint loading in AutoTP Path caused by qkv_copy() is deleted and add UT case for shard checkpoint loading in AutoTP by @sywangyi in #3457
- Add snip_momentum structured pruning which supports higher sparse ratio by @ftian1 in #3300
- Update README.md by @goodship1 in #3504
- Hybrid Engine Fix Llama by @lekurile in #3505
- fix spelling error with deepspeed/runtime/ by @digger-yu in #3509
- Skip autoTP if tp_size is 1 by @molly-smith in #3449
- Changing monitor loss to aggregate loss over gradient accumulation steps by @jomayeri in #3428
- change actions/checkout@v2 to v3 by @digger-yu in #3526
- fix typo with docs/ by @digger-yu in #3523
- Doc updates by @goodship1 in #3520
- Fix bug in Hybrid Engine by @mrwyattii in #3497
- Fix wrong passing of offload_optimizer_config to DeepSpeedZeRoOffload by @mmhab in #3420
- Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 by @YizhouZ in #2999
- share inflight registry between PartitionedParameterCoordinators by @HeyangQin in #3462
- Syncing FusedAdam with new Apex features by @jomayeri in #3434
- fix typo in comments with deepspeed/ by @digger-yu in #3537
- [ROCm] Hip headers fix by @rraminen in #3532
- [CPU] Support Intel CPU inference by @delock in #3041
- Clone tensors to avoid torch.save bloat by @tjruwase in #3348
- Fix attribute error when loading FusedAdamBuilder() by @rraminen in #3527
- fix typo by @inkcherry in #3559
- Fixing bf16 test by @jomayeri in #3551
- Fix Hybrid Engine for BLOOM by @lekurile in #3580
- Fix op_builder against PyTorch nightly by @malfet in #3596
- data efficiency bug fix, avoid invalid range step size by @conglongli in #3609
- DS init should not broadcast or move zero.Init models by @tjruwase in #3611
- Expose Consecutive Hysteresis to Users by @Quentin-Anthony in #3553
- Align InferenceEngine to store ms in _model_times by @HolyFalafel in #3501
- AISC launcher fixes by @jeffra in #3637
- stage3.py: do not scale if gradient_predivide_factor is 1.0 by @guoyejun in #3630
- Add Ascend NPU accelerator support by @CurryRice233 in #3595
- Skip tests on docs-only changes by @mrwyattii in #3651
- Update megatron.md by @wjessup in #3641
- Typo Correction by @MicahZoltu in #3621
- deepspeed/comm/comm.py: fix typo of warning message by @guoyejun in #3636
- Fix RuntimeError when using ZeRO Stage3 with mpu: #3564 by @eggiter in #3565
- Allow dict datatype for checkpoints (inference) by @mrwyattii in #3007
- fix typo with deepspeed/ by @digger-yu in #3547
- flops_profiler: add option recompute_fwd_factor for the case of activation c… by @guoyejun in #3362
- fix typo deepspeed/runtime by @digger-yu in #3663
- Refactor check_enabled root validator in DeepSpeedMonitorConfig by @bgr8 in #3616
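Several entries above touch the dynamic loss scaler, notably "Expose Consecutive Hysteresis to Users" (#3553). The sketch below is a standalone illustration of the hysteresis idea, not DeepSpeed's implementation; all names are hypothetical. With `consecutive_hysteresis=True` the hysteresis budget is replenished on every non-overflow step, so only back-to-back overflows can shrink the scale.

```python
# Minimal sketch of dynamic loss scaling with hysteresis (hypothetical names;
# not DeepSpeed's actual loss-scaler code).
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 hysteresis=2, consecutive_hysteresis=False):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window          # good steps before growing scale
        self.hysteresis = hysteresis              # overflows tolerated before shrinking
        self.consecutive_hysteresis = consecutive_hysteresis
        self._hysteresis_left = hysteresis
        self._good_steps = 0

    def update(self, overflow: bool) -> None:
        if overflow:
            self._hysteresis_left -= 1
            self._good_steps = 0
            if self._hysteresis_left <= 0:
                self.scale = max(1.0, self.scale / self.scale_factor)
                self._hysteresis_left = self.hysteresis
        else:
            if self.consecutive_hysteresis:
                # replenish the budget on every good step
                self._hysteresis_left = self.hysteresis
            self._good_steps += 1
            if self._good_steps % self.scale_window == 0:
                self.scale *= self.scale_factor
```

With `hysteresis=2` and `consecutive_hysteresis=True`, two overflows separated by a good step leave the scale unchanged, while the same two overflows back-to-back halve it.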
New Contributors
- @jianan-gu made their first contribution in #3170
- @YizhouZ made their first contribution in #3372
- @PabloEmidio made their first contribution in #3484
- @luliyucoordinate made their first contribution in #3482
- @ys950902 made their first contribution in #3441
- @MrZhengXin made their first contribution in #3473
- @ftian1 made their first contribution in #3300
- @mmhab made their first contribution in #3420
- @malfet made their first contribution in #3596
- @HolyFalafel made their first contribution in #3501
- @CurryRice233 made their first contribution in #3595
- @wjessup made their first contribution in #3641
- @MicahZoltu made their first contribution in #3621
- @eggiter made their first contribution in #3565
- @bgr8 made their first contribution in #3616
Full Changelog: v0.9.2...v0.9.3
v0.9.2: Patch release
What's Changed
- MiCS implementation by @zarzen in #2964
- Fix formatting by @mrwyattii in #3343
- [ROCm] Hipify cooperative_groups headers by @rraminen in #3323
- Diffusers 0.15.0 bug fix by @molly-smith in #3345
- Print default values for DeepSpeed --help by @mrwyattii in #3347
- add bf16 cuda kernel support by @dc3671 in #3092
- README.md: Update MosaicML docs link by @kobindra in #3344
- hybrid_engine: check tuple size when fusing lora params by @adammoody in #3311
- fix mpich launcher issue in multi-node by @sywangyi in #3078
- Update DS-Chat issue template by @mrwyattii in #3368
- add deepspeed chat blog links, add tags by @conglongli in #3369
- Fix redundant shared_params in zero_to_fp32.py by @ShijieZZZZ in #3149
- fixing default communication_data_type for bfloat16_enabled and docs by @clumsy in #3370
- Auto TP Tutorial with T5 Example by @molly-smith in #2962
- stage_1_and_2.py: do gradient scale only for fp16 by @guoyejun in #3166
- Fix memory leak in zero2 contiguous gradients by @hablb in #3306
- remove megatron-lm, no longer pip installable by @jeffra in #3389
- Fix pipeline module evaluation when contiguous activation checkpoin… by @hablb in #3005
- doc updates by @goodship1 in #3415
- Save tensors in context of memory_efficient_linear by @tohtana in #3413
- Add HE support for the rest of model containers by @RezaYazdaniAminabadi in #3191
- Update PyTorch Lightning/DeepSpeed examples links by @loadams in #3424
- Fix `PipelineEngine.eval_batch` result by @nrailgun in #3316
- OPT Activation Function Hotfix by @cmikeh2 in #3400
- Add ZeRO 1 support to PP for BF16. by @jomayeri in #3399
- [zero_to_fp32] fix shared param recovery by @stas00 in #3407
- Adagrad support in ZeRO by @jomayeri in #3401
- Update 2020-09-09-sparse-attention.md by @goodship1 in #3432
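The fix "fixing default communication_data_type for bfloat16_enabled and docs" (#3370) concerns which dtype gradients are reduced in when training in bf16. As an illustration, a config along these lines pins reductions to fp32; the key names follow the DeepSpeed JSON config as I understand it, so treat this as a sketch rather than an authoritative reference:

```json
{
  "bf16": { "enabled": true },
  "communication_data_type": "fp32"
}
```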
New Contributors
- @dc3671 made their first contribution in #3092
- @kobindra made their first contribution in #3344
- @hablb made their first contribution in #3306
- @nrailgun made their first contribution in #3316
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Patch release
What's Changed
- Update DS-Chat docs for v0.9.0 by @mrwyattii in #3216
- Update DeepSpeed-Chat docs with latest changes to scripts by @mrwyattii in #3219
- Nested zero.Init() and dynamically defined model class by @tohtana in #2989
- Update torch version check in building sparse_attn by @loadams in #3152
- Fix for Stable Diffusion by @mrwyattii in #3218
- [update] reference in cifar-10 by @dtunai in #3212
- [fp16/doc] correct initial_scale_power default value by @stas00 in #3275
- update link to PL docs by @Borda in #3237
- fix typo in autotuner.py by @eltociear in #3269
- improving int4 asymmetric quantization accuracy by @HeyangQin in #3190
- Update install.sh by @digger-yu in #3270
- Fix cupy install version detection by @mrwyattii in #3276
- [ROCm] temporary workaround till __double2half support enabled in HIP by @bmedishe in #3236
- Fix pydantic and autodoc_pydantic version to <2.0.0 until support is added. by @loadams in #3290
- Add contribution images to readme by @digger-yu in #3282
- remove `torch.cuda.is_available()` check when compiling ops by @jinzhen-lin in #3085
- Update MI200 workflow to install apex with changes from pip by @loadams in #3294
- Add pre-compiling ops test by @loadams in #3277
- Update README.md by @digger-yu in #3315
- Update Dockerfile to use python 3.6 specifically by @bobowwb in #3298
- zero3 checkpoint frozen params by @tjruwase in #3205
- Fix for dist not being initialized when constructing main config by @mrwyattii in #3324
- Fix missing scale attributes for GPTJ by @cmikeh2 in #3256
- Explicitly check for OPT activation function by @cmikeh2 in #3278
New Contributors
- @dtunai made their first contribution in #3212
- @Borda made their first contribution in #3237
- @digger-yu made their first contribution in #3270
- @bmedishe made their first contribution in #3236
- @jinzhen-lin made their first contribution in #3085
- @bobowwb made their first contribution in #3298
Full Changelog: v0.9.0...v0.9.1
DeepSpeed v0.9.0
New features
What's Changed
- [docs] add MCR-DL paper to readme/docs by @Quentin-Anthony in #3066
- Several fixes to unblock CI by @loadams in #3047
- Assert mp_size is factor of model dimensions by @molly-smith in #2891
- [CI] follow-up fixes by @jeffra in #3072
- fix return prev key and value , added strides to from_blob by @mzusman in #2828
- Remove bf16 from inference config dtype enum by @molly-smith in #3010
- Softmax Scheduling Cleanup by @cmikeh2 in #3046
- Fix nebula in save_16bit_model issue by @FreyaRao in #3023
- Allow lists by @satpalsr in #3042
- Goodbye Torch 1.8 by @mrwyattii in #3082
- Empty ZeRO3 partition cache by @tjruwase in #3060
- pre-commit check for torch.cuda in code by @delock in #2981
- Move cuda check into utils by @loadams in #3074
- update yapf version and style settings by @jeffra in #3098
- Fix comms benchmark import issues and support MPI/slurm launching by @Quentin-Anthony in #2932
- Disable Stage 1&2 CPUAdam pathways by @mrwyattii in #3097
- ♻️ replace deprecated functions for communication by @mayank31398 in #2995
- Make fp32 default communication data type by @tjruwase in #2970
- Update DeepSpeed copyright license to Apache 2.0 by @mrwyattii in #3111
- Add Full Apache License by @mrwyattii in #3119
- VL MoE Blog by @yaozhewei in #3120
- Update SD triton version in requirements-sd.txt by @lekurile in #3135
- Fix launch issue by @tjruwase in #3137
- Fix CI badges by @mrwyattii in #3138
- Optimize Softmax Kernel by @molly-smith in #3112
- Use generic O_DIRECT by @tjruwase in #3115
- Enable autoTP for bloom by @sywangyi in #3035
- [cleanup] remove `pass` calls where they aren't needed by @stas00 in #2826
- [ci] `nv-transformers-v100` - use the same torch version as transformers CI by @stas00 in #3096
- Fixes code and tests skipping/asserting incorrectly on torch 2+. by @loadams in #3136
- fix example symlink about DeepSpeed+AzureML by @EeyoreLee in #3127
- Remove Extra Bracket by @VHellendoorn in #3101
- Recover shared parameters by @ShijieZZZZ in #3033
- Fix for Diffusers 0.14.0 by @molly-smith in #3142
- Fix copyright check, add copyright replace script by @mrwyattii in #3141
- Update curriculum-learning.md by @goodship1 in #3031
- Remove benchmark code by @mrwyattii in #3157
- fixing a bug in CPU Adam and Adagrad by @xiexbing in #3109
- op_builder: conditionally compute relative path for hip compiled files by @adammoody in #3095
- zero.Init() should pin params in GPU memory as requested by @tjruwase in #2953
- deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache by @guoyejun in #2803
- Add DeepSpeed-Chat Blogpost by @awan-10 in #3185
- [docs] add run command for 13b by @awan-10 in #3187
- add news item. by @awan-10 in #3188
- DeepSpeed Chat by @tjruwase in #3186
- Fix references to figures by @tohtana in #3189
- Fix typo by @zhouzaida in #3183
- Fix typo by @dawei-wang in #3164
- Chatgpt chinese blog by @yaozhewei in #3193
- Add Japanese version of ChatGPT-like pipeline blog by @tohtana in #3194
- fix hero figure by @conglongli in #3199
- feat: Add support for `NamedTuple` when sharding parameters [#3029] by @alexandervaneck in #3037
- fix license badge by @conglongli in #3200
- Update AMD workflows by @loadams in #3179
- [CPU support] Optionally bind each rank to different cores on host by @delock in #2881
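One entry above, "Assert mp_size is factor of model dimensions" (#2891), reflects a basic tensor-parallel constraint: weight matrices are sharded across `mp_size` ranks, so the sharded dimensions must divide evenly. A hypothetical standalone check (not DeepSpeed's code) might look like:

```python
# Illustrative validation only; helper name and signature are made up.
def validate_mp_size(hidden_size: int, num_attention_heads: int, mp_size: int) -> None:
    # Each rank holds hidden_size / mp_size columns and
    # num_attention_heads / mp_size heads, so both must divide evenly.
    if hidden_size % mp_size != 0:
        raise ValueError(f"hidden_size={hidden_size} not divisible by mp_size={mp_size}")
    if num_attention_heads % mp_size != 0:
        raise ValueError(f"num_attention_heads={num_attention_heads} not divisible by mp_size={mp_size}")
```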
New Contributors
- @mzusman made their first contribution in #2828
- @FreyaRao made their first contribution in #3023
- @sywangyi made their first contribution in #3035
- @EeyoreLee made their first contribution in #3127
- @VHellendoorn made their first contribution in #3101
- @goodship1 made their first contribution in #3031
- @zhouzaida made their first contribution in #3183
- @dawei-wang made their first contribution in #3164
- @alexandervaneck made their first contribution in #3037
Full Changelog: v0.8.3...v0.9.0
v0.8.3: Patch release
What's Changed
- [deepspeed/autotuner] Bug fix for skipping mbs on gas by @rahilbathwal5 in #2171
- Fix issue between our abstract accelerator and colossalai's version of op_builder by @jeffra in #2963
- [zero] prevent poor configs from running w. zero-offload by @jeffra in #2971
- Fix Meta Tensor checkpoint load for OPT models by @lekurile in #2990
- ckpt: create directories in checkpoint_engine by @adammoody in #2988
- Fix buffer size for pipeline parallel and communication schedule by @tohtana in #2862
- [docs] add new paper to readme/docs by @jeffra in #3018
- fix language by @stas00 in #3019
- BF Optimizer Attribute Checks by @jomayeri in #3022
- [logger] implement `logger.warning_once` by @stas00 in #3021
- Convert model parameters from generator to list. by @jomayeri in #3017
- Improve loss overflow logs by @Quentin-Anthony in #3008
- Fix Broken Links by @satpalsr in #3048
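The `logger.warning_once` entry (#3021) adds deduplicated warnings so a hot code path does not spam the log. A minimal standalone sketch of the idea (not DeepSpeed's implementation) memoizes on the message text:

```python
import logging
from functools import lru_cache

logger = logging.getLogger("example")

# Sketch of a warning_once helper: lru_cache ensures each distinct
# message is logged at most once per process.
@lru_cache(maxsize=None)
def warning_once(msg: str) -> None:
    logger.warning(msg)
```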
Full Changelog: v0.8.2...v0.8.3
v0.8.2: Patch release
What's Changed
- add auto-generated PR workflow by @mrwyattii in #2822
- Fix typo in auto-sync workflow by @mrwyattii in #2850
- Fix example command for building wheel with dev version specified. by @loadams in #2815
- Create tensor parallelism blog/tutorial by @molly-smith in #2766
- Data efficiency library update by @conglongli in #2866
- Make z3 respect comm dtype by @tjruwase in #2807
- Automatic Tensor Parallelism Blog Links by @molly-smith in #2877
- Check device count before running dist tests by @HeyangQin in #2799
- AutoTP tutorial web formatting and news by @molly-smith in #2883
- Remove deprecated `torch._six` imports by @yasyf in #2863
- Reduce I/O size by @tjruwase in #2814
- add missing license info to top of all source code by @jeffra in #2889
- Enable tensor fragments for zero 2 & 3 by @tjruwase in #2727
- better eval sampler for val or test dataset by @mayank31398 in #2907
- using container when loading inference checkpoints by @HeyangQin in #2875
- Fix CPUAdam for when `vendor_id_raw` is not provided by @FarzanT in #2836
- Fixes `AttributeError` in #2853 by @saforem2 in #2854
- Add MPICH Multinode Runner by @inkcherry in #2839
- TP unsupported models and assertions by @molly-smith in #2810
- AutoTP Assert Kernel Injection Support by @molly-smith in #2939
- Check for local CUDA graphs when enable_cuda_graph=True by @lekurile in #2941
- Improve overflow handling by @tjruwase in #2944
- [RFC] add device abstraction to allow other device than CUDA be used by @delock in #2221
- deepspeed.init_distributed() support for TCP protocols by @noabauma in #2905
New Contributors
- @HeyangQin made their first contribution in #2799
- @yasyf made their first contribution in #2863
- @mayank31398 made their first contribution in #2907
- @FarzanT made their first contribution in #2836
- @saforem2 made their first contribution in #2854
- @noabauma made their first contribution in #2905
Full Changelog: v0.8.1...v0.8.2
v0.8.1: Patch release
What's Changed
- CUDA optional deepspeed ops by @tjruwase in #2507
- Remove CI trigger for push to master by @mrwyattii in #2712
- [install] only add deepspeed pkg at install by @jeffra in #2714
- Fix nightly tests for new lm-eval release by @mrwyattii in #2713
- BF16 optimizer for BF16+ZeRO Stage 1 by @jomayeri in #2706
- Fix typo in diffusers transformer block by @mrwyattii in #2718
- Inference Refactor (replace_with_policy, model_implementations) by @awan-10 in #2554
- Change zero_grad() argument to match pytorch by @loadams in #2741
- Automatic tensor parallelism v2 by @molly-smith in #2670
- Fixing Optimizer Sanity Check by @jomayeri in #2742
- [GatheredParameters] fix memory leak by @stas00 in #2665
- Abstract accelerator (step 3) by @delock in #2677
- Fix autotuning so that it records Floating Point Operations per second, not microsecond by @dashstander in #2711
- fix a misspelled attribute by @stas00 in #2750
- [zero] remove misleading dtype log by @jeffra in #2732
- Fix softmax backward by @RezaYazdaniAminabadi in #2709
- Skip test_bias_gelu unit test if torch < 1.12 by @lekurile in #2754
- Conditionally Make Op Building More Verbose by @cmikeh2 in #2759
- Bing/formatting correction by @xiexbing in #2764
- Add links to new azureML examples by @cassieesvelt in #2756
- Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. by @loadams in #2743
- Refactor/Pydantify monitoring config by @mrwyattii in #2640
- Pin minimum `packaging` requirement by @carmocca in #2771
- Fix for diffusers v0.12.0 by @mrwyattii in #2753
- some fix in flops_profiler by @lucasleesw in #2068
- fix upsample flops compute by skipping unused kargs by @cli99 in #2773
- Fix broken kernel inject bug by @molly-smith in #2776
- Fix Checkpoint-loading with Meta-tensor by @RezaYazdaniAminabadi in #2781
- Add hjson support for user configs by @mrwyattii in #2783
- Reset KV-cache at the beginning of text-generation by @RezaYazdaniAminabadi in #2669
- Container param cleanup + remove qkv_merging by @lekurile in #2780
- Common location to install libaio-dev by @tjruwase in #2779
- Fixing broken link to azureml-examples recipes by @rtanase in #2795
- remove outdated comment by @stas00 in #2786
- Enable page-locked tensors without CUDA by @tjruwase in #2775
- Add container load checkpoint error reporting + refactor by @lekurile in #2792
- Add user defined launcher args for PDSH launcher by @loadams in #2804
- Fix Slurm launcher user args by @loadams in #2806
- Handle hanged tests in CI by @mrwyattii in #2808
- Fix inference CI device error by @mrwyattii in #2824
- Fix permissions issue with pip upgrade by @mrwyattii in #2823
- Fix cpu-only CI hangs by @mrwyattii in #2825
- Fix Pipeline Parallel resize unit test by @mrwyattii in #2833
- Fix auto TP for duplicate modules with different gems by @molly-smith in #2784
- Refactor DS inference API. No longer need replace_method. by @awan-10 in #2831
- Port Reza's INT8-quantization fix to container architecture by @lekurile in #2725
- Fix gpt-Neox rotary embedding implementation by @RezaYazdaniAminabadi in #2782
- Fix for CI failure on system upgrade by @mrwyattii in #2849
New Contributors
- @loadams made their first contribution in #2741
- @xiexbing made their first contribution in #2764
- @carmocca made their first contribution in #2771
- @lucasleesw made their first contribution in #2068
- @rtanase made their first contribution in #2795
Full Changelog: v0.8.0...v0.8.1
DeepSpeed v0.8.0
New features
- DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality
- DeepSpeed Data Efficiency Library by @conglongli in #2585
What's Changed
- fix blog link by @conglongli in #2600
- Migrate ops tests to new inference_ops marker by @cmikeh2 in #2599
- Move layer norm to new schedule by @lokoppakmsft in #2590
- [deepspeed/autotuner] Bug fix for binary search for batch size by @rahilbathwal5 in #2162
- Fix for older versions of pydantic by @mrwyattii in #2611
- Use rocm/pytorch:latest for ROCm Dockerfile by @jithunnair-amd in #2613
- skip torch.zeros and tensor.copy_ when model parallel is not used by @guoyejun in #2479
- call empty_cache to really free up GPU memory as described in comment by @guoyejun in #2620
- Remove GatheredParameters context from replace_with_policy by @lekurile in #2591
- fixes #2498 by @clumsy in #2603
- Update AVX512 Detection by @cmikeh2 in #2621
- Add Megatron CI workflow by @mrwyattii in #2614
- [inference] check for unsupported model generate args by @jeffra in #2627
- [launcher] parse hostfile via regex and added error checks by @jeffra in #2626
- Unit tests setup own venv by @mrwyattii in #2628
- Fix #2409: add enable_each_rank_log to deepspeed/launcher/runner.py by @inkcherry in #2571
- Fix typo in autotuner.py by @eltociear in #2639
- [zero-3] Handle forward parameter return correctly in nested cases by @samyam in #2642
- [inference] ds-attention refactor w.r.t. ops by @jeffra in #2623
- Fix issue w. bloom int8 when changing tp size by @jeffra in #2645
- fix assertion error in zero stage 3 by @GuanhuaWang in #2647
- tweaks to ds-attn, distilbert policy, and mup by @jeffra in #2649
- [doc] fix `min_loss_scale` default by @stas00 in #2660
- [launcher] fail gracefully if hostname -i doesn't work as expected by @jeffra in #2631
- Fix Opt injection by @RezaYazdaniAminabadi in #2541
- Abstract accelerator (step 2) by @delock in #2560
- Remove unnecessary device synchronization for stage 2 by @li-yi-dong in #2500
- [Bug Fixed] torch.cuda.is_available -> torch.cuda.is_available() by @wkcn in #2661
- [fp16] lower `initial_scale_power` to `16` by @stas00 in #2663
- fix Tensor contiguous bug in model_compression by @xiaoxiawu-microsoft in #2671
- [inference] ds-mlp refactor w.r.t. ops by @jeffra in #2668
- real_accelerator validation check for both accelerator and deepspeed accelerator path by @delock in #2685
- fix typo and remove duplicated code in ZeRO stage 1 and 2 by @wkcn in #2655
- Add mlflow logging for aml by @cassieesvelt in #2495
- Fix import error of op_builder by @tohtana in #2687
- Pass training flag to forward call from module config by @lokoppakmsft in #2604
- Extend quantization utils features by @lokoppakmsft in #2683
- [GatheredParameters] add support for any iterable by @stas00 in #2664
- Fix for latest diffusers by @mrwyattii in #2699
- exclude benchmarks during install by @jeffra in #2698
- Correct loss scale in ZeRO step by @jomayeri in #2695
- [ZeRO] non-MoE stage 1 requires CG disabled by @jeffra in #2703
- remove print side effect from importing deepspeed by @jeffra in #2704
- ZeRO3 handling frozen weights by @tjruwase in #2653
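The fp16 change above (#2663) lowers the default `initial_scale_power` to 16, i.e. an initial loss scale of 2^16. A sketch of an fp16 config section with that default made explicit, using key names from the DeepSpeed JSON config as I recall them:

```json
{
  "fp16": {
    "enabled": true,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  }
}
```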
New Contributors
- @eltociear made their first contribution in #2639
- @li-yi-dong made their first contribution in #2500
- @wkcn made their first contribution in #2661
- @xiaoxiawu-microsoft made their first contribution in #2671
- @cassieesvelt made their first contribution in #2495
- @tohtana made their first contribution in #2687
Full Changelog: v0.7.7...v0.8.0
v0.7.7: Patch release
What's Changed
- Update the locator for Megatron-LM by @rapsealk in #2564
- use get_global_rank if available by @jeffra in #2567
- Add Determined to open-source DL frameworks by @sirredbeard in #2573
- Support fp32 gradaccum for bf16 model by @delock in #2566
- Drop Maxwell Support by @cmikeh2 in #2574
- Fix quantized-inference & Add generic support of checkpoint loading by @RezaYazdaniAminabadi in #2547
- Fix MegatronLayerPolicy to have megatron_v2=True by @lekurile in #2579
- Update barrier and reduce_scatter_base to conform to PyTorch signatures by @Quentin-Anthony in #2570
- Support N-dimension input in quantization kernel by @lokoppakmsft in #2575
- Add checkpoint sharding unit tests by @mrwyattii in #2561
- Updating docs README by @jomayeri in #2587
- Updating API docs by @jomayeri in #2586
- Fix issues w. python 3.6 + add py-version checks to CI by @jeffra in #2589
- [benchmarks] get mask token from tokenizer by @jeffra in #2592
New Contributors
- @rapsealk made their first contribution in #2564
- @sirredbeard made their first contribution in #2573
Full Changelog: v0.7.6...v0.7.7
v0.7.6: Patch release
What's Changed
- DeepSpeed inference config. (#2459) by @awan-10 in #2472
- Update docs to autogenerate pydantic config model docs by @mrwyattii in #2509
- Add max_tokens alias to max_out_tokens arg to maintain backwards compatibility by @lekurile in #2508
- Deepspeed quantization library v0.1 by @lokoppakmsft in #2450
- Fix backward compatibility for InferenceConfig by @mrwyattii in #2516
- Add missing Inference sub-configs by @mrwyattii in #2518
- Add note about nvcc/hipcc requirement by @jeffra in #2519
- Update codeowners by @jeffra in #2525
- Dequantization Utils Library by @cmikeh2 in #2521
- Fixes for torch 1.14 due to new torch.numel return type by @jeffra in #2522
- Ensure MOE is initialized for SD by @cmikeh2 in #2534
- Make DS-Inference config readable from JSON by @mrwyattii in #2537
- Add MII tests by @mrwyattii in #2533
- Remove mutable default parameter in `init_inference()` by @aphedges in #2540
- Change Where DS/Triton is Used in Stable Diffusion by @cmikeh2 in #2536
- Pass down the new DS inference config to replace_transformer_layer. by @awan-10 in #2539
- Adding Gradient Accumulation Data Type Config by @jomayeri in #2512
- Report progress at gradient accumulation boundary by @ShijieZZZZ in #2553
- encoded ds config into command line argument when launching child processes in autotuning by @cli99 in #2524
- Add missing MoE fields to inference config for backward compatibility by @mrwyattii in #2556
- Abstract accelerator (step 1) by @delock in #2504
- Fix invalid check of recorded parameter orders in zero stage3. by @inkcherry in #2550
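"Adding Gradient Accumulation Data Type Config" (#2512) lets gradients be accumulated in a different dtype than the model, e.g. fp32 accumulation under bf16 training. A sketch of such a config, assuming the `data_types.grad_accum_dtype` key introduced by that PR:

```json
{
  "bf16": { "enabled": true },
  "data_types": { "grad_accum_dtype": "fp32" }
}
```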
New Contributors
- @ShijieZZZZ made their first contribution in #2553
- @delock made their first contribution in #2504
- @inkcherry made their first contribution in #2550
Full Changelog: v0.7.5...v0.7.6