Releases: NVIDIA/recsys-examples

v25.11

10 Dec 02:32
7492d4b

What's Changed

Features & Enhancements

  • Counter table interface and ScoredHashTable Implementation by @jiashuy in #229
  • Embedding admission strategy by @z52527 in #236
  • Optimize memory waste in segmented_unique by @z52527 in #244

Bug Fixes

  • Fix dtype mismatch of offset and table_range. by @jiashuy in #227
  • fix preprocessor local() error in python 3.10 by @shijieliu in #228
  • Add new handler despite the existing ones by @JacoCheung in #233
  • Fix LFU test failed in incremental_dump by @jiashuy in #242
  • Fix default parameter initialization of KVCounter by @jiashuy in #253

Misc

Full Changelog: v25.10...v25.11

v25.10

11 Nov 02:01
d043535

What's Changed

Features & Enhancements

Bug Fixes

  • Fix LFU mode frequency count bug by @z52527 in #176
  • Fix config bug when using torchrec's STBE in benchmark by @jiashuy in #193
  • Fix IMA in incremental dump and test the dumped embeddings by @jiashuy in #211
  • Fix rab num heads by @JacoCheung in #222
  • Fix IMA caused by wrong worker id for device of which max threads is … by @jiashuy in #220

Misc

Full Changelog: v25.09...v25.10

v25.09

20 Oct 10:00
e065874

What's Changed

Features & Enhancements

Bug Fixes

  • fix DynamicEmbDump - handle long strings in broadcast_string by @fshhr46 in #164
  • fix: consider mask when calc hstu attn flops by @shijieliu in #177
  • export fix hstu ima when num_candidates = seqlen by @shijieliu in #183

Misc

New Contributors

Full Changelog: v25.08...v25.09

v25.08

08 Sep 01:39
2947e15

What's Changed

Features & Enhancements

  • Refactor dynamicemb with Cache&Storage. by @jiashuy in #128
  • Support Kuairand dataset inference with alignment to training by @geoffreyQiu in #122
  • Support eval mode for dynamicemb and move insert in backward to forward for use_index_dedup=True by @shijieliu in #136
  • export hstu arbitrary mask by @shijieliu in #148
  • Optimize TP HSTU layer by @JacoCheung in #132

Bug Fixes

Misc

New Contributors

Full Changelog: v25.07...v25.08

v25.07

01 Aug 09:35
6a5be94

What's Changed

Features & Enhancements

Bug Fixes

  • fix noncontiguous input for dynamicemb by @shijieliu in #99
  • Fix dynamicemb example's local rank bug on multi-node by @z52527 in #95
  • [Fix] retrieval shifting prediction embedding bug by @shijieliu in #114

Full Changelog: v25.06...v25.07

v25.06

04 Jul 14:24
5652241

What's Changed

Features & Enhancements

LFU Eviction Strategy for Dynamic Embeddings
Added a new Least Frequently Used (LFU) eviction strategy to the dynamicemb module, improving memory management and embedding efficiency.
(Contributed by @z52527 — (#52))
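
For readers unfamiliar with the policy, here is a minimal pure-Python sketch of LFU eviction on a fixed-capacity table. It is not the dynamicemb implementation or its API; the class `LFUTable` and its methods are hypothetical names used only to illustrate the idea that the key with the lowest access count is evicted when the table is full.

```python
from typing import Dict, Hashable, List, Optional


class LFUTable:
    """Toy fixed-capacity table with least-frequently-used eviction.

    Hypothetical illustration only; not the dynamicemb API.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.values: Dict[Hashable, List[float]] = {}
        self.counts: Dict[Hashable, int] = {}

    def lookup(self, key: Hashable) -> Optional[List[float]]:
        """Return the stored embedding and bump its frequency, or None."""
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def insert(self, key: Hashable, value: List[float]) -> Optional[Hashable]:
        """Insert or update a key; return the evicted key, if any."""
        evicted = None
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the key with the smallest access count.
            # Ties go to the key that was inserted earliest.
            evicted = min(self.counts, key=self.counts.get)
            del self.values[evicted]
            del self.counts[evicted]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
        return evicted
```

In this sketch an insert counts as one access, so a key that is inserted and then looked up several times outlives a key that was only inserted once.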

LayerNorm Recomputation for Fused HSTU Layer
Added support for recomputing LayerNorm in the fused HSTU layer, reducing memory usage during training.
(Contributed by @JacoCheung — (#59))

Embedding and Optimizer State Insertion to HKV During Backward Pass
When use_index_dedup is enabled, embeddings and optimizer states are now inserted into the HKV during the backward pass, improving training efficiency.
(Contributed by @jiashuy — (#62))

Support for Non-Contiguous Input/Output in HSTU MHA and SiLU Recomputation
Enabled handling of non-contiguous tensors for multi-head attention and SiLU recomputation within HSTU layers.
(Contributed by @JacoCheung — (#64))

Customized CUDA Operation for Concatenating 2D Jagged Tensors
Introduced a new CUDA operator concat_2d_jagged_tensors to efficiently concatenate jagged tensors in 2D.
(Contributed by @z52527 — (#42))
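
As a reference for what such a concatenation computes, the following pure-Python sketch joins two jagged tensors sequence by sequence. It assumes the common values-plus-offsets layout, where sequence `i` owns rows `offsets[i]` to `offsets[i + 1]`; the function name `concat_jagged_reference` and its signature are hypothetical and do not describe the actual CUDA operator's interface.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def concat_jagged_reference(
    values_a: Sequence[Row],
    offsets_a: Sequence[int],
    values_b: Sequence[Row],
    offsets_b: Sequence[int],
) -> Tuple[List[Row], List[int]]:
    """Concatenate two jagged tensors along the sequence dimension.

    For each batch item i, the output sequence is the rows of a[i]
    followed by the rows of b[i]. Hypothetical reference code only.
    """
    if len(offsets_a) != len(offsets_b):
        raise ValueError("both inputs must have the same batch size")

    out_values: List[Row] = []
    out_offsets: List[int] = [0]
    for i in range(len(offsets_a) - 1):
        out_values.extend(values_a[offsets_a[i]:offsets_a[i + 1]])
        out_values.extend(values_b[offsets_b[i]:offsets_b[i + 1]])
        # The new offset marks the end of the merged sequence i.
        out_offsets.append(len(out_values))
    return out_values, out_offsets
```

A GPU kernel can parallelize this work across sequences and rows; the loop above only fixes the expected output layout.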

Support for Training Pipeline
Added support for a streamlined training pipeline to facilitate easier model training and experimentation.
(Contributed by @JacoCheung — (#68))

Bug Fixes

Fixed HSTU Preprocess and Postprocess CI Issues
Resolved continuous integration issues related to HSTU preprocessing and postprocessing steps.
(Contributed by @shijieliu — (#76))

Documentation

Updated HSTU Installation Instructions
Clarified and expanded the README installation guide for the HSTU module to improve user onboarding.
(Contributed by @z52527 — (#84))

Dependency Updates

Stable Dependency Upgrades
Updated key dependencies to stable versions:
torchrec updated to 1.2.0
fbgemm_gpu updated to 1.2.0
mcore updated to 0.12.1
(Contributed by @shijieliu and @JacoCheung — (#74), (#75))
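
To check a local environment against these versions, a small script along the following lines may help. It is a sketch under assumptions: the distribution names `fbgemm-gpu` and `megatron-core` are guesses at how `fbgemm_gpu` and `mcore` are published on the package index, and `find_mismatches` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed in the release notes. The distribution names for
# fbgemm_gpu and mcore are assumptions; adjust them to your install.
EXPECTED: Dict[str, str] = {
    "torchrec": "1.2.0",
    "fbgemm-gpu": "1.2.0",
    "megatron-core": "0.12.1",
}


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Return {name: installed_version_or_None} for packages that differ."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore a local build suffix such as "+cu126" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for pkg, found in find_mismatches(EXPECTED).items():
        print(f"{pkg}: expected {EXPECTED[pkg]}, found {found}")
```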

v25.05

29 May 13:07
a247bd4

Changelog

Dynamicemb example #16 #31 #58
EmbeddingBagCollection support in Dynamicemb #20
Dynamicemb functionality enhancement #45 #46 #53

HSTU cutlass kernel support contextual features in hopper backward #51

Decouple sharding and model definition in hstu example #37
Fused hstu layer #43
Fix kuairand dataset convergence issue #34
Doc enhancement #39

Full Changelog: https://github.com/NVIDIA/recsys-examples/commits/v25.05