Releases · ModelEngine-Group/unified-cache-management

01 Apr 10:08

yuanzhg078

v0.5.0rc1

82467ce

v0.5.0rc1 Pre-release

Pre-release

Highlight

UCM adopts a more advanced core model architecture, with expanded support for GLM-4.x, GLM-5, and Minimax 2.5, available on both CUDA and Ascend platforms (See Latest Feature and Model Support Matrix)
Added garbage collection support for POSIX store, improving storage lifecycle management and resource utilization. (#777)
Optimized GSA on-device execution by fusing operators and resolving multiple performance-related issues, leading to better runtime efficiency. (#861, #862)
Store now supports configurable CPU affinity, enabling more efficient KVCache dump and load operations. (#852)
Improved layer-wise KV loading by introducing sequential per-layer scheduling, allowing better overlap between data loading and forward execution for enhanced throughput. (#783)

What's Changed

[Feat] Adding an environment variable input by @Menglths in #837
[opt] Add parallel size check in CI by @flesher0813 in #844
[fix]Add timeout in PosixStore working thread by @qyh111 in #849
[feat] Ascend: add mmap-based Host memory for O_DIRECT support by @mag1c-h in #850
[opt] Remove redundant step(Install GTest) from the workflow. by @mag1c-h in #851
[opt] Cancel the Load/Dump task proactively after it times out. by @mag1c-h in #853
[Feat] Add mindie-llm support by @nrj868 in #848
[opt] Cancel all timed-out tasks at once. by @mag1c-h in #859
[bugfix] Update seq_lens_list only on NPU path by @wangwenxin0312 in #860
[feat] Store provides the ability to configure CPU affinity. by @mag1c-h in #852
[opt] Optimize GSA by fusing operators by @Fengli5355 in #861
[feat] Support kvcsstore by @ayaka836 in #855
[feat] Add NUMA-aware CPU core split for vllm worker and store threads by @wangwenxin0312 in #854
[Bugfix] fix offline patch by @Infinite666 in #864
[fix] Update patch to track mindie changes by @nrj868 in #863
[CI] Add gsa online test by @dante159753 in #857
[opt] stack-protector on EmptyStore by @mag1c-h in #866
[opt] Enhance NPU CPU affinity resolution with NUMA fallback by @wangwenxin0312 in #865
[bugfix] Get all CPUs for the device's local socket by @wangwenxin0312 in #869
add timeout for PR gate pipeline by @dante159753 in #862
[refactor] Remove NVML-based CPU affinity setting by @wangwenxin0312 in #870
[feat] Add tensor parallel size support and update GPU memory utilization for online inference tests by @hmy98213 in #874
[fix]Submit layerwise KV load tasks one layer at a time by @qyh111 in #783
[feat] log with rate limit by @Lijiachen1018 in #821
[Feat] adapt for DSA model on CUDA platform by @sumingZero in #871
[Build] Update UCM Dockerfiles for vLLM/vLLM-Ascend v0.17.0 by @yuanzhg078 in #876
[Usage] Move use layerwise and hit ratio into config file by @harrisonyhq in #784
[test] multi-processor test on AIO and Shm by @mag1c-h in #873
[Feature] Garbage Collection by @UESTC-AHao in #777
[CI] add test set config by @dante159753 in #877
[bugfix] Fix MLA block_table row mapping by @wangwenxin0312 in #882
[doc] Update support matrix. by @yuanzhg078 in #880
release 0.5.0rc1 by @yuanzhg078 in #881
[doc] Update framework version compatibility notes by @yuanzhg078 in #885

New Contributors

@nrj868 made their first contribution in #848
@hmy98213 made their first contribution in #874

Full Changelog: v0.4.0...v0.5.0rc1

Contributors

dante159753, Infinite666, and 14 other contributors

Assets 2

20 Mar 10:13

sumingZero

v0.4.0

bff8e9b

v0.4.0 Latest

Latest

Highlight

Support SGLang

UCM is now integrated with SGLang, enabling prefix cache offloading to Posix Store to reduce redundant computation and lower TTFT
(see Quickstart-SGLang: https://ucm.readthedocs.io/en/latest/getting-started/quickstart_sglang.html)
(#757)

Refactor PipelineStore for Scalability and Performance

Refactor PipelineStore into a modular, plugin-based architecture with automatic registration and runtime loading (#689)
Improve overall performance through optimized Store implementations (e.g., cache store, posix store) and execution flow (#722, #744, #787)

UCM Connector

UCM now additionally supports advanced parallel paradigms, including PCP / DCP and PP, enabling more flexible and scalable distributed execution (#750)
Improve UCM connector performance by introducing optional event synchronization control (#768)

Inference Enhancement Features

GSAOnDevice sparse attention algorithm has been upgraded with improved performance and accuracy, now fully supporting vLLM / vLLM-Ascend 0.11.0 (#659, #746, #729)
Add support for Rerope in vLLM version 0.11.0 (#686)
Enhance UCM logger compatibility (#760)

Document

Add feature and model support matrix
Extend UCM Store with scalable storage, persistence, and efficient data handling
(see: https://ucm.readthedocs.io/en/latest/developer-guide/extending_store.html)

What's Changed

[fix] Adapt ESA to the LayerWiseConnector by @wangwenxin0312 in #681
[doc] Add Code of Conduct by @yuanzhg078 in #684
[Opt] GsaOnDevice cuda bugfix & optimization by @wangwenxin0312 in #659
[CI] Modify pull request template by @yuanzhg078 in #687
[Feature] rewrite logger module by @Lijiachen1018 in #608
[refactor] Rename global rank and remove broadcast function by @harrisonyhq in #685
[fix] clean code by @Lijiachen1018 in #688
[opt] Refactor PipelineStore for Enhanced Scalability by @mag1c-h in #689
[fix] Fix logger by @Lijiachen1018 in #690
[doc] How to extend UCM Store by @mag1c-h in #692
[CI] logger use zlibstatic by @Lijiachen1018 in #698
[Bugfix] Cherry-pick modify worker_id to distinguish diff workers(#691) by @flesher0813 in #701
[bugfix] rm unavailable lib and fix doc and update patch by @wuhuxiao in #699
[feat] rerope feature for vllm0.11.0 by @xinSky00 in #686
[perf] Reduce directory lock conflicts during batch dumps in PosixStore by @mag1c-h in #707
[bugfix] fix debug log printing by @Lijiachen1018 in #706
[bugfix] Fixed the issue of invalid LocalBuffer pointers in PCStore by @mag1c-h in #715
[bugfix] rerope feature for vllm0.9.2 and git apply merging by @xinSky00 in #708
[CICD] run e2e test in docker by @dante159753 in #712
[Feature] Add readme and dataset in performance and evaluation test by @zzycode1005 in #721
[bugfix] Adaptive modification of llmperf by @Menglths in #719
[Feat] sparse patch for vllm-ascend v0.11.0 by @Infinite666 in #718
[Bugfix]Fix garbled output when tp > 1 by @qyh111 in #716
[perf] Copy Bandwidth Optimize: Multi-Stream parallelism supported in CacheStore by @mag1c-h in #722
[Feat] sparse patch for gsa on device(GQA) va0.11.0rc1 by @Infinite666 in #726
[Feat]Add layerwise and log_path config in run.sh by @qyh111 in #724
[opt] Default depth of the waiting queue needs to be increased by nShard times for layer-wise by @mag1c-h in #731
[Feat] Reuse-aware layer skipping under dynamic KV sparsification by @tedi20 in #725
[opt] Increase the default running queue depth to support greater concurrent requests. by @qyh111 in #733
[Feat]: Monkey patch framework for vllm 0.11.0, fix graph mode + UCM bugs by @NaganooMei in #735
[Feat] Add csrc/ascend NPU custom ops for GSA by @leideng in #729
[feat] Variable length IO supported in CacheStore by @mag1c-h in #734
[Opt] Enable concurrent prefix lookup for posixstore by @sumingZero in #739
[CI] refine docker file to use in yellow field by @dante159753 in #741
[Feat]: Implement load failure recovery via monkey patch by @NaganooMei in #738
[Opt]Split the thread pool into separate load and dump pools to prevent them from interfering with each other. by @qyh111 in #744
[opt] Print TaskId in the CacheStore Error Log by @mag1c-h in #742
[opt]Adapt variable io size by @qyh111 in #745
[Opt] Add log timestamp in run_vllm.sh by @qyh111 in #747
[bugfix & opt] gsaOnDevice for CUDA Graph mode by @wangwenxin0312 in #732
[test & bugfix] fix low dump performance in posixstore e2e test by @NaganooMei in #751
[Fix] Modify the config files of gsaondevice. by @AooooooA-C in #749
[Test] Remove memory manager abstraction in PosixStore e2e test by @NaganooMei in #753
[opt] CUDA Hamming Distance Kernel Optimization for GQA by @wangwenxin0312 in #755
[fix] fix zlib gitcode url by @Lijiachen1018 in #758
[Feature] Integrate UnifiedCache (UCM) into SGLang for Multi-Level Caching System by @pyxyzc in #757
chore(test): Ensure that unnecessary import failures do not affect test execution by @Potterluo in #754
[feat] GSAOnDevice for MLA Models Like DeepSeek V2/V3 in Ascend NPU by @leideng in #746
[Feat] sparse patch for gsa on device(MLA) va0.11.0 by @Infinite666 in #761
[Fix] fix save_speed core dump and loaded blocks num when task failed by @flesher0813 in #763
Fix batch_size_for_hamming bug when slice is disabled (vllm-ascend 0.11.0) by @leideng in #765
[Feat] adapt dcp&pcp by @flesher0813 in #750
[Fix] Add init.py for rerope. by @AooooooA-C in #769
[Refactor]monkey patch sparse feature in v0.11.0 by @ayaka836 in #743
[Opt] update deepseek r1 config by @leideng in #770
[feat] Introduce platform-specific sparse trigger thresholds for GPU and NPU by @wangwenxin0312 in #762
[opt] Define UCM_ROOT_DIR to ensure safety when used UCM as a sub-repository by @mag1c-h in #772
[opt] enable Ascend register pin optimization by @mag1c-h in #775
[fix] remove imports that specific to platform by @dante159753 in #771
[opt] supports lo...

Contributors

leideng, dante159753, and 23 other contributors

Assets 2

30 Jan 08:47

flesher0813

v0.3.0

8dd98d1

v0.3.0

HighLights

Refinement of PipelineStore Architecture and Enhancement of Core Capabilities #653 #711
Now supports 3FS for scalable and efficient storage backends #622
Features the new GSAOnDevice sparse attention algorithm, enabling high-performance HBM utilization across both CUDA and Ascend platforms.#647 #638
Aligned CacheBlend with the new UCM storage and sparse engine updates to support vLLM 0.9.2. #664

Known Issues

Layerwise is not supported when using vllm 0.11.0
- Currently, installing with pip install uc-manager does not support using vllm 0.11.0.
- If you need to use vLLM 0.11.0+ with UCM layerwise, please refer to vllm-project/vllm#26675 for modifications.

What's Changed

[bugfix]cherry-pick from 0.2.0release Fix KeyError by @qyh111 in #573
[bugfix] cherry-pick from 0.2.0release patch update by @wangwenxin0312 in #574
[fix]cherry pick from 0.2.0-release fix monitor issue (#572) by @qyh111 in #575
[bugfix] build hamming dist by @wangwenxin0312 in #577
[feat]Update data file layout to adapt to garbage collection by @qyh111 in #579
[bugfix]cherry pick from 0.2.0-release sparse patch & cmake by @wangwenxin0312 in #581
[bugfix] kvcomp config by @wangwenxin0312 in #584
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #589
feature for triton rerope by @xinSky00 in #497
[bugfix] kvcomp for qwen by @wangwenxin0312 in #594
[bugfix] share buffer used out (cherry-picked from #592) by @mag1c-h in #598
[fix]cherry-pick clean code and set local_rank_size to tp_size (#596) by @qyh111 in #600
[misc] split dependency preparation logic into individual dependency files for enhanced configuration flexibility by @mag1c-h in #597
[fix]fix clean code (#601) by @qyh111 in #602
Modify blend and rerope docs by @xinSky00 in #593
[docs] Modify blend introduction by @wuhuxiao in #605
add qiongwu as codeownner by @Infinite666 in #610
KVComp in NPU -- HBM version by @leideng in #599
[bugfix] bugfix in PCStore, cherry-pick from release by @mag1c-h in #609
[docs]Add doc for pipeline store by @qyh111 in #607
[fix] remove request_succeed_dumped_blocks() in monkey patch by @xinSky00 in #613
[fix]Sync changes from the release branch to develop. including docs、version and dockerfile by @qyh111 in #621
[feat] Cherry-pick updates from 0.2.0-release to develop (patches and docs) by @wangwenxin0312 in #623
[bugfix] ] Cherry-pick updates from 0.2.0-release (hamming compile) by @wangwenxin0312 in #625
[doc]rename pipline_store to pipeline_store by @qyh111 in #626
[bugfix] fix register_kv_caches patch by @Clarence-1103 in #629
Unify xSA name as GSA by @leideng in #631
[Feature] 3FS Store by @UESTC-AHao in #622
[optimize]Optimized LLMPerf Test Cases by @Potterluo in #634
[Doc] 3FS Document by @UESTC-AHao in #637
[Feat] Basic scripts for deployment best practices by @sumingZero in #556
[feature]Add LLM connection base components and OpenAI connector by @Potterluo in #636
[Bugfix] Fix 3FS by @UESTC-AHao in #650
[feat] PipelineStore Architecture Refresh and Capability Enhancement by @mag1c-h in #653
[doc] Add contributing guide by @yuanzhg078 in #648
[doc]Implement the function of a kv cache calculator html in User Guide by @Potterluo in #652
[Opt] New gsa config by @leideng in #646
[Feat] Support C++/Python to use same metrics singleton within a process by @flesher0813 in #654
[feat]Add Layerwise Connector by @qyh111 in #656
[Fix] Modify ucm_connector to adapt metrics by @flesher0813 in #658
[doc] Update quickstart section in README_zh by @yuanzhg078 in #663
[Feat] Update sparse method patches for vllm 0.11.0 by @AooooooA-C in #638
[CI] add pr gate workflow by @dante159753 in #662
[Opt] Gsa npu performance optimize by @leideng in #647
[misc] Reduce gpu utilization to 6GB in test for 1.5B model by @dante159753 in #665
[feat] add monkey patch for gsa on device v0.9.2 by @Clarence-1103 in #618
[Fix] coredump if add new c++ metrics by @flesher0813 in #666
[opt] adapt cache blend for store and sparse's new version by @wuhuxiao in #664
[Doc] Update documents related to sparse. by @AooooooA-C in #672
[CI] use requirements file to prepare test env by @dante159753 in #673
[test]Evaluate model performance and accuracy with UCM by @ayaka836 in #642
[Fix] Failed to start vLLM service using multi-node launch scripts under CUDA data parallelism by @sumingZero in #670
[CI] remove logger, check branch up-to-date, fast fail e2e test by @dante159753 in #674
release 0.3.0 by @flesher0813 in #677
[bugfix] Fix compilation error due to missing atomic include by @harrisonyhq in #693
[Bugfix] Modify worker_id set to separate different worker by @flesher0813 in #691
[bugfix] rm unavailable lib and fix doc and update patch by @wuhuxiao in #700
[perf] Reduce directory lock conflicts during batch dumps in PosixStore by @mag1c-h in #711

New Contributors

@Infinite666 made their first contribution in #610
@dante159753 made their first contribution in #662
@ayaka836 made their first contribution in #642

Full Changelog: v0.2.0...v0.3.0

Contributors

leideng, dante159753, and 15 other contributors

Assets 2

05 Jan 12:28

qyh111

v0.2.0

39d46c7

v0.2.0

Hightlights

Support Model Window Extrapolation:Rectified Rotary Position Embeddings (ReRoPE)(#497)
Support sparse attention algorithms in HBM on both CUDA GPUs and Ascend NPUs. It sparsifies attention by hashing KV states and using Hamming distance Top-K selection.(#559)
Add Pipeline Store composed of Cache Store and POSIX Store(#553).
Improved KV cache transfer performance for NfsStore.(#393)

Known Issues

Sparse is not supported when installing via pip
- Currently, installing with pip install uc-manager does not support Sparse.
- Before installing via pip, please make sure to set the platform explicitly:
```
export PLATFORM=xxx
```
- To use Sparse, please install via the Docker image or build from source.

What's Changed

[Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
[Feature] Added environment pre-check by @Menglths in #498
[docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
[build] rewrite setup.py by @ygwpz in #501
[bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
[bugfix] fix pip install -e no so by @ygwpz in #508
[Feature] Cache Blend by @wuhuxiao in #467
merge Feature_store_next to develop by @qyh111 in #518
[bugfix]fix setup.py by @qyh111 in #520
[bugfix]fix setup.py (#520) by @qyh111 in #521
feat(test): Add PostgreSQL support and optimize database write logic by @Potterluo in #507
[fix] move init to intergration/vllm directory by @Lijiachen1018 in #535
[Fix]Add PLATFORM reminder by @zhou-haitao in #526
cherry-pick from 0.1.0-release by @Lijiachen1018 in #552
[Feat] New Store Impl: CacheStore - PosixStore - PipelineStore by @mag1c-h in #553
[Perf] parallel block-existence checks + timeout exception by @mag1c-h in #550
[feat] Shard block files into subdirs by hash prefix, with opt-out switch by @mag1c-h in #561
[feat]use numpy to calculate addrs by @qyh111 in #564
[Bugfix] use-after-free in LookupBatch by @mag1c-h in #565
[Bugfix] skip fresh shm files to avoid race between multiple instances by @mag1c-h in #566
[Bugfix] Fix incorrect fallback in GetHostBuffer: use MakeHostBuffer instead of MakeDeviceBuffer by @mag1c-h in #568
[feat] kvcomp on device by @wangwenxin0312 in #559
[fix]Add exception handling by @qyh111 in #569
[bugfix]Fix KeyError when VLLM_HASH_ATTENTION environment variable is not set by @qyh111 in #570
[bugfix] patch update by @wangwenxin0312 in #571
[fix]fix monitor issue by @qyh111 in #572
[bugfix] build hamming dist by @wangwenxin0312 in #578
[feat] Update data file layout to adapt to garbage collection by @qyh111 in #576
[bugfix] sparse patch & cmake by @wangwenxin0312 in #580
[build]fix spdlog use ext fmt by @Lijiachen1018 in #585
[bugfix] kvcomp fix by @wangwenxin0312 in #586
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #588
[bugfix] share buffer used out by @mag1c-h in #592
[bugfix] kvcomp for qwen by @wangwenxin0312 in #595
[fix]clean code and set local_rank_size to tp_size by @qyh111 in #596
[fix]fix clean code by @qyh111 in #601
[Bugfix] update block dir permission & double-free fix by @mag1c-h in #603
[bugfix] double-release shared-block while make reader failed by @mag1c-h in #604
[docs]add doc for pipeline store by @qyh111 in #612
[feat] cherry-pick to 0.2.0-release to add rerope by @xinSky00 in #614
fix ascend patch and change version by @qyh111 in #615
add patch in dokerfile-npu by @qyh111 in #617
[feat] cherry-pick KVComp in NPU -- HBM version into the 0.2.0-release branch by @wangwenxin0312 in #619
[feat] update all patch and docs by @wangwenxin0312 in #620
[bugfix] hamming compile by @wangwenxin0312 in #624

New Contributors

@zzycode1005 made their first contribution in #462

Full Changelog: v0.1.2...v0.2.0

Contributors

mag1c-h, Lijiachen1018, and 9 other contributors

Assets 2

13 Dec 13:43

qyh111

v0.2.0rc1

bad9354

v0.2.0rc1 Pre-release

Pre-release

Hightlights

Improved Prefix Cache offload/load performance.
Support Cache Blend.

Core:

Support Cache Blend in (#467)
Add V1 Store Interface in (#510) and (#518)

Known Issues

When using the Ascend platform:
- Broadcasting is not supported.
- load_only_first_rank must be set to false in the configuration.
When compiling from source, make sure to set the PLATFORM environment variable.

What's Changed

[Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
[Feature] Added environment pre-check by @Menglths in #498
[docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
[build] rewrite setup.py by @ygwpz in #501
[bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
[bugfix] fix pip install -e no so by @ygwpz in #508
[Feature] Cache Blend by @wuhuxiao in #467
merge Feature_store_next to develop by @qyh111 in #518
[bugfix]fix setup.py by @qyh111 in #520

New Contributors

@zzycode1005 made their first contribution in #462
@wuhuxiao made their first contribution in #467

Full Changelog: v0.1.2...v0.2.0rc1

Contributors

Lijiachen1018, ygwpz, and 5 other contributors

Assets 2

10 Dec 07:56

Lijiachen1018

v0.1.2

aa31619

v0.1.2

Some small fixes in this release.

[Docs] Documents are now easier to read.
[Docs] PD disaggregation documentation update : Update the PD disaggregation documentation to remove the --enforce-eager argument when starting the vllm service, so that graph mode is enabled by default at startup.
[Feat] Completely remove UCconnector, please use UCMConnector from now on.
[Feat] UCM supports recovery form load failure：Implement the get_block_ids_with_load_errors interface in the KVConnectorBase_V1 class, enabling vLLM to reexecute inference for requests whose KV cache failed to load from UCM.
[Build] Use pip install uc-manager==0.1.2 and the install will build from source for both vllm and vllm-ascend.
[Build] Sparse module are now built and used only if set environment variable export ENABLE_SPARSE=TRUE.

What's Changed

[cleancode]rm video by @Lijiachen1018 in #459
[fix] pick fixes from Release to develop by @Lijiachen1018 in #465
[cleancode]remove uc connector by @Lijiachen1018 in #460
[build] project docs for pypi by @Lijiachen1018 in #466
[build]build sparse only if enabled by @Lijiachen1018 in #470
[Misc] fetch dependence from gitcode as backup by @mag1c-h in #469
[docs] renew docs by @Lijiachen1018 in #476
release v0.1.1 by @Lijiachen1018 in #478
feat: add MetaX MACA device support for PC by @simshi in #387
[Docs] PD disaggregation documentation update by @sumingZero in #479
[Feat] UCM supports recovery form load failure by @sumingZero in #477
[feat]Add configurable scattergatter by @qyh111 in #483
[bugfix]add synchronize on ascend platform by @qyh111 in #485
[build] fix build by source distribution by @Lijiachen1018 in #484
release v0.1.2 by @Lijiachen1018 in #491
develop merge into main by @ygwpz in #492
[docs] fix links in docs and add clarifications by @Lijiachen1018 in #499

New Contributors

@simshi made their first contribution in #387

Full Changelog: v0.1.0...v0.1.2

Contributors

simshi, mag1c-h, and 4 other contributors

Assets 2

02 Dec 08:42

Lijiachen1018

v0.1.0

5ba2684

v0.1.0

We are excited to announce the first official release of Unified Cache Manager.

Hightlights

Offload Prefix Cache to storage.
Homogeneous/ Heterogeneos PD disaggregation.
Training-Free sparsity in accelerating inference.（vllm==0.9.2, vllm-ascend==0.9.2rc1）in #199, #335, #190, #451

Core:

Garbage collection for store in #315 and #312
Adapt to vllm and vllm-ascend in #13, #292, #415 and #362
UCM supports metrics display online via Grafana and Promethues in #414 and docs in #416

Known Issues

If using Ascend platform, please be mind of

not compatible with broadcast
load_only_first_rank: false in config

Others

Update documents
Tools for performance tuning, hyperparameter optimization in #418

What's Changed

[opt] Share Infra implementation and unify status codes by @mag1c-h in #399
[bugfix] Fix ESA to be compatible with the latest NFSStore. by @wangwenxin0312 in #401
release v0.1.0rc4 by @Lijiachen1018 in #402
[opt] Remove unused cc impl of dramstore by @mag1c-h in #406
[Fix]remove dram docs and modify quick-start doc by @hero0307 in #411
[Feature] Added performance testing tool based on the PyTest testing framework by @Menglths in #295
[Misc] Add cpp-linter.yml by @mag1c-h in #422
[docs]add metrics doc by @hero0307 in #416
[perf] Modify CUDA SIMD and add Triton hash encoder by @Clarence-1103 in #408
[bugfix] batch trans on cuda with SM return 700 error by @mag1c-h in #434
[Misc] set default logger backend to spdlog by @mag1c-h in #440
[rebase]Dev-ucm-v1 rebase to develop by @Lijiachen1018 in #453
[cleancode] remove dramstore by @Lijiachen1018 in #455
Fix metrics by @Lijiachen1018 in #456

New Contributors

@Menglths made their first contribution in #295

Full Changelog: v0.1.0rc4...v0.1.0

Contributors

mag1c-h, Lijiachen1018, and 4 other contributors

Assets 4

22 Nov 10:16

Lijiachen1018

v0.1.0rc4

5779ce9

v0.1.0rc4 Pre-release

Pre-release

What's Changed

[feat] ucmtrans: Unify API for Device-Host Memory Transfers by @mag1c-h in #379
[feat] Add support for Ascend device memory transfers by @mag1c-h in #382
[Fix] fix build, fix no save kv layer by @Lijiachen1018 in #390
[feat] Add pcstore for enhanced PrefixCache performance by @FangRun2 in #393
[fix] fix ascend attention by @Lijiachen1018 in #394
release v0.1.0rc3 by @Lijiachen1018 in #395
[fix] fix sparse attention by @Lijiachen1018 in #397

New Contributors

@FangRun2 made their first contribution in #393

Full Changelog: v0.1.0rc2...v0.1.0rc4

Contributors

mag1c-h, Lijiachen1018, and FangRun2

Assets 3

19 Nov 08:01

Lijiachen1018

v0.1.0rc2

16ed5da

v0.1.0rc2 Pre-release

Pre-release

What's Changed

[docs] update docs for v0.1.0rc1 by @Lijiachen1018 in #365
[bug fix] Dev patch fix for sparse by @Lijiachen1018 in #371
[build] auto patch for ascend by @Lijiachen1018 in #372
feat: add Mthreads MUSA device support -stage 1 by @superleo in #370
release v0.1.0rc2 by @Lijiachen1018 in #373
prefetch bug by @zbb200819 in #360
[Feat]Adapt to vllm-ascend0.9.1 and vllm-ascend0.11.0 by @hero0307 in #362
[bugfix] add cmake option to bypass NUMA binding by @Clarence-1103 in #368
[Feat] Update the data items saved by trace replay by @sumingZero in #366

New Contributors

@superleo made their first contribution in #370

Full Changelog: v0.1.0rc1...v0.1.0rc2

Contributors

superleo, zbb200819, and 4 other contributors

Assets 3

17 Nov 12:21

Lijiachen1018

v0.1.0rc1

754f7ba

v0.1.0rc1 Pre-release

Pre-release

Support Features

Prefix Cache
Sparse Attention
Sparse Attention Offload
PD Disaggregation

What's Changed

remove impl by @flesher0813 in #11
adapt vllm v0.9.2 by @flesher0813 in #13
[Doc] Outline of the document by @ygwpz in #15
remove impl test and add uc connector test by @flesher0813 in #14
[Doc] Installation of ucm by @flesher0813 in #17
[Feature] Add DRAM Connector for uc_connector by @harrisonyhq in #18
[doc] add readme and license by @ygwpz in #24
[Feature] Add Dockerfiles by @flesher0813 in #20
[Feature]Nfsstore by @propanone1006 in #23
[doc] change docs outline by @ygwpz in #32
[Feature] Add Cmake build command in setup.py by @harrisonyhq in #34
[fixbug] fix issue#25 issue#31 and issue#33 by @flesher0813 in #30
[Fix][Docs] Make example runnable and add performance data (closes #37 #29 #42) by @harrisonyhq in #41
[Feat] Move kv_block_size to config by @harrisonyhq in #43
[feature][docs]finish nfs store and add docs by @qyh111 in #44
[doc] Add export of device type in installation;[Fix] fix version invalid#45 #46 by @harrisonyhq in #47
add perf data in readme by @ygwpz in #49
[Feat] Merge 0.0.1 back into develop by @flesher0813 in #50
[bugfix] fix issue#26 and issue#36 by @ygwpz in #55
[Doc] Add vllm institution by @flesher0813 in #61
[CI][Fix] update issue and pr template, fix issue #57, cherry-pick main by @flesher0813 in #65
[Doc] update install doc using patch to build from source code by @flesher0813 in #68
[Feat] Merge 0.0.1 back into develop by @ygwpz in #72
[Style] Fix codestyle problems and typo in develop by @harrisonyhq in #75
[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework by @hek14 in #79
[Fix] Fix cant find cmake error when using pip install -e . by @harrisonyhq in #80
Revert "[Feature] add ucm_sparse v1.0: unified sparse attention algorithm framework " by @ygwpz in #82
[Feature] add Mooncake Store by @propanone1006 in #86
[Fix bug] Simplify docker build and installation.md by @flesher0813 in #87
[BUG]adapt deepseek by @qyh111 in #89
[Feature][P/D] add example for disaggregated prefill by @flesher0813 in #90
[Perf] Pipelined ucmnfsstore by @mag1c-h in #97
Revert "[Feature] add Mooncake Store" by @ygwpz in #98
[Fix bug] fix uc_connector ut and change hash generation method by @hero0307 in #101
[Fix] Fix .so build error by @harrisonyhq in #104
[Fix] Fix ascend compile error by @mag1c-h in #106
[Perf]Modify start_load_kv by @qyh111 in #103
[Fix] Fix duplicate create/commit errors upon preemption by @flesher0813 in #109
[Feat] Adapt for vllm 0.9.1 by @sumingZero in #113
[Feature] [Doc] UCMSparse framework by @hek14 in #112
[fix] remove redundant code and files/rename file names by @NaganooMei in #118
[Fix] Fix spelling issues with PR templates by @propanone1006 in #119
remove load_tasks by @NaganooMei in #121
[bugfix] bugfix in ucmnfsstore by @mag1c-h in #123
[doc]Add config parameter by @UESTC-AHao in #130
[bugfix]fix rank handing in multi-node pp setup by @qyh111 in #129
[Feat]Support UCM Sparse on cuda by @harrisonyhq in #126
[Feature] Add mooncake store by @hufumans in #117
[bugfix]modify mla dump by @zhou-haitao in #128
[feature] non-blocking interfaces are provided to check whether the transmission task is completed by @mag1c-h in #139
[feature] return error if block exists while batch creation. by @mag1c-h in #138
[feature]modify create interface by @hufumans in #145
[Doc] change logo and rearange docs by @flesher0813 in #156
0.0.2 release merge develop by @ygwpz in #158
[doc][feature] change code directory by @ygwpz in #161
[fix] modify patch and workflow by @NaganooMei in #163
[Feat] Support load async by @flesher0813 in #166
[Feat]Support load async and load failure by @flesher0813 in #165
[Feature]refactor ucconnector by @qyh111 in #167
[feature] upload retake codes by @truthstriver in #172
[bugfix]Resolve the issue of the first-round commit failure under dsv2 by @zhou-haitao in #186
[Feat] Add KVComp sparse attention implementation in UCM by @leideng in #182
[perf]prepare offset in advance by @qyh111 in #188
[feature] GSA by @HaoLi980405 in #190
[bugfix]fix pp problem and remove err logs when duplicate create by @qyh111 in #191
[Fix] Fix bug: check task returns -50005 during async load by @sumingZero in #192
[bugfix]gsa fix reslotmapping bug by @HaoLi980405 in #194
[bugfix]gsa fix running reqs exceed 30 bug by @HaoLi980405 in #195
[doc] design doc directory by @ygwpz in #197
[Perf]kv_block_size as well as transferIoSize are calculated rather than configured by @UESTC-AHao in #196
[Feat] add cuda topk and gsa descriptions by @HaoLi980405 in #198
[Fix] Fix workflow image space error in action by @harrisonyhq in #203
[bugfix]roll back dataoffset by @qyh111 in #201
[bugfix] fix whl install gsa error and gsa kpre reslotmapping out of range by @HaoLi980405 in #204
[Fix][Doc] Modify sparse docs by @flesher0813 in https://gi...

Contributors

leideng, yxkyong, and 25 other contributors

Assets 3

Releases: ModelEngine-Group/unified-cache-management

v0.5.0rc1

Highlight

What's Changed

New Contributors

Contributors

Uh oh!

v0.4.0

Highlight

Document

What's Changed

Contributors

Uh oh!

v0.3.0

HighLights

Known Issues

What's Changed

New Contributors

Contributors

Uh oh!

v0.2.0

Hightlights

Known Issues

What's Changed

New Contributors

Contributors

Uh oh!

v0.2.0rc1

Hightlights

Core:

Known Issues

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.2

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.0

Hightlights

Core:

Known Issues

Others

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.0rc4

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.0rc2

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.0rc1

Support Features

What's Changed

Contributors

Uh oh!