LMCache · marcobarlo · May 28, 2026 · May 29, 2026 · Jun 5, 2026 · Jun 8, 2026
@@ -116,7 +116,7 @@ jobs:
 
           echo ">>> Installing LMCache Core..."
           # TODO (gingfung): use strategy matrix for different version tests
-          pip install lmcache==0.4.4
+          pip install lmcache==0.4.5
 
           echo ">>> Building LMCache-Ascend..."
           export CPLUS_INCLUDE_PATH=/usr/include/c++/12:/usr/include/c++/12/backward:/usr/include/c++/12/`uname -i`-openEuler-linux/:$CPLUS_INCLUDE_PATH
@@ -329,7 +329,7 @@ jobs:
 
           echo ">>> Installing LMCache Core..."
           # TODO (gingfung): use strategy matrix for different version tests
-          pip install lmcache==0.4.4
+          pip install lmcache==0.4.5
 
           echo ">>> Building LMCache-Ascend..."
           export CPLUS_INCLUDE_PATH=/usr/include/c++/12:/usr/include/c++/12/backward:/usr/include/c++/12/`uname -i`-openEuler-linux/:$CPLUS_INCLUDE_PATH
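Both jobs pin the same LMCache release. The README's compatibility matrix pairs `main` with LMCache v0.4.5. Until the TODO's strategy matrix lands, a small consistency check can catch a pin that was bumped in one job but not the other. The following is a sketch only. `find_lmcache_pins` and `check_single_pin` are hypothetical helpers, and the check assumes every pin is written exactly as `pip install lmcache==X.Y.Z`.

```python
import re

# Matches pins written as `pip install lmcache==X.Y.Z` (assumed format).
PIN_RE = re.compile(r"pip install lmcache==(\d+\.\d+\.\d+)")


def find_lmcache_pins(workflow_text: str) -> list[str]:
    """Return every pinned LMCache version found in a workflow file (hypothetical helper)."""
    return PIN_RE.findall(workflow_text)


def check_single_pin(workflow_text: str) -> str:
    """Raise if the workflow pins more than one LMCache version; otherwise return the pin."""
    pins = set(find_lmcache_pins(workflow_text))
    if len(pins) != 1:
        raise ValueError(f"expected exactly one LMCache pin, found {sorted(pins)}")
    return pins.pop()


if __name__ == "__main__":
    sample = """
          pip install lmcache==0.4.5
          pip install lmcache==0.4.5
    """
    print(check_single_pin(sample))
```

Run against the workflow file, the check fails as soon as the two jobs disagree. That is the failure mode a manual two-place bump like this one invites.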

@@ -31,14 +31,14 @@ LMCache-Ascend is a community maintained plugin for running LMCache on the Ascen
 
 To use LMCache-Ascend on the NPU hardware, please make sure the following prerequisites are satisfied.
 
-- **Hardware**: Atlas 800I A2 Inference series. (A3 Inference/Training and 300I Duo are experimental).
+- **Hardware**: Atlas A2/A3 series (300I Duo support is experimental).
 - **OS**: Linux-based.
 - **Software**:
   - **Python**: >= 3.10
   - **CANN Toolkit**: >= 8.2.RC1
   - **Ascend Driver**: >= 24.1.0
   - **PyTorch**: >= 2.7.1
-  - **vLLM**: >=v0.11.0 & **vLLM-Ascend**: >=v0.11.0
+  - **vLLM-Ascend**: >= v0.11.0
 
 ### Compatibility Matrix
 
@@ -47,31 +47,27 @@ Please ensure your environment matches the versions below.
 #### For PyTorch / vLLM
 | LMCache-Ascend | LMCache | vLLM Version |
 | :--- | :--- | :--- |
-| **main** | **v0.4.4** | **>=v0.14.0** |
-| **v0.4.3** | **v0.4.3** | **>=v0.14.0** |
+| **main** | **v0.4.5** | **>=v0.14.0** |
+| **v0.4.4** | **v0.4.4** | **>=v0.14.0** |
 
 #### For PyTorch / SGLang
 | LMCache-Ascend | LMCache | SGLang Version |
 | :--- | :--- | :--- |
-| **main** | **v0.4.4** | **0.5.8** |
-| **v0.4.3** | **v0.4.3** | **0.5.8** |
+| **main** | **v0.4.5** | **0.5.8** |
+| **v0.4.4** | **v0.4.4** | **0.5.8** |
 
 #### For MindSpore
 | LMCache-Ascend | LMCache | vLLM Version |
 | :--- | :--- | :--- |
-| **main** | **v0.4.4** | **v0.11.0** |
-| **v0.4.3** | **v0.4.3** | **v0.11.0** |
-
-> **Note**: If you require legacy support for vLLM 0.9.2, you must use PyTorch 2.5.1. See the [Compatibility Matrix](#compatibility-matrix) above.
+| **main** | **v0.4.5** | **v0.11.0** |
+| **v0.4.4** | **v0.4.4** | **v0.11.0** |
 
 
 ## Getting Started
 
 ### for vLLM-Ascend
 
-You can choose `Manual Installation` or `Build Docker Image`.
-
-#### Manual Installation
+#### Installation
 1. Prepare Base Environment
 
 It is recommended to use the official [vLLM-Ascend image](https://quay.io/repository/ascend/vllm-ascend?tab=tags) as a base:
@@ -107,52 +103,17 @@ quay.io/ascend/vllm-ascend:v0.18.0
 
 - from pip
 ```bash
-NO_CUDA_EXT=1 pip install lmcache==0.4.3
+NO_CUDA_EXT=1 pip install lmcache==0.4.4
 ```
 
 3. Install LMCache-Ascend Repo
 
 ```bash
-git clone --recurse-submodules -b v0.4.3 https://github.com/LMCache/LMCache-Ascend.git
+git clone --recurse-submodules -b v0.4.4 https://github.com/LMCache/LMCache-Ascend.git
 cd LMCache-Ascend
 pip install -v --no-build-isolation -e .
 ```
 
-#### Build Docker Image
-
-Build the image using the provided Dockerfile:
-```bash
-git clone --recurse-submodules -b v0.4.3 https://github.com/LMCache/LMCache-Ascend.git
-cd LMCache-Ascend
-docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.4.3-vllm-ascend-v0.18.0-openeuler .
-```
-
-Once that is built, run it with the following cmd
-```bash
-docker run -it \
---shm-size=200g --privileged --net=host \
---cap-add=SYS_RESOURCE \
---cap-add=IPC_LOCK \
---device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
---device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
---device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
--v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
--v /etc/hccn.conf:/etc/hccn.conf \
--v /usr/bin/hccn_tool:/usr/bin/hccn_tool \
--v /var/log/npu:/var/log/npu \
--v /usr/local/dcmi:/usr/local/dcmi \
--v /etc/localtime:/etc/localtime \
--v /etc/ascend_install.info:/etc/ascend_install.info \
--v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
--v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--v /usr/src/kernels:/usr/src/kernels:ro \
--v /data:/data \
---name lmcache-ascend-test \
---entrypoint /bin/bash \
-lmcache-ascend:v0.4.3-vllm-ascend-v0.18.0-openeuler
-
-```
-
 For further info about deployment notes, please refer to the [guide about deployment](docs/deployment.md)
 
 #### Usage
@@ -169,24 +130,23 @@ vllm serve /data/models/Qwen/Qwen3-32B \
 --max-num-batched-tokens 32768 \
 --host 0.0.0.0 \
 --port 8100 \
---kv-transfer-config '{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both","kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
+--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
 
 ```
 
 ##### Offline
 ```python
 ktc = KVTransferConfig(
-        kv_connector="LMCacheAscendConnectorV1Dynamic",
+        kv_connector="LMCacheAscendConnector",
         kv_role="kv_both",
-        kv_connector_module_path="lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"
     )
 ```
 
 > **Note**: For vllm-ascend versions >=0.17.0rc1, you can specify `--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'`
 
 ### for SGLang
 
-#### Manual Installation
+#### Installation
 1. Prepare Base Environment
 
 It is recommended to use the official [Ascend SGLang image](https://quay.io/repository/ascend/sglang?tab=tags) as a base:
@@ -201,13 +161,13 @@ docker run -it --privileged --net=host --name lmcache-sglang-dev quay.io/ascend/
 
 - from pip
 ```bash
-NO_CUDA_EXT=1 pip install lmcache==0.4.3
+NO_CUDA_EXT=1 pip install lmcache==0.4.4
 ```
 
 3. Install LMCache-Ascend Repo
 
 ```bash
-git clone --recurse-submodules -b v0.4.3 https://github.com/LMCache/LMCache-Ascend.git
+git clone --recurse-submodules -b v0.4.4 https://github.com/LMCache/LMCache-Ascend.git
 cd LMCache-Ascend
 pip install -v --no-build-isolation -e .
 ```
@@ -229,53 +189,9 @@ python \
     --port 8100
 ```
 
-## Getting Started With MindSpore
-
-### Docker
+### for vLLM-MindSpore
 
-1. Clone LMCache-Ascend Repo
-Our repo contains a kvcache ops submodule for ease of maintenance, therefore we recommend cloning the repo with submodules.
-
-```bash
-cd /workspace
-git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
-```
-
-2. Build Docker Image
-```bash
-cd /workspace/LMCache-Ascend
-docker build -f docker/mindspore/Dockerfile.a2.openEuler -t lmcache-ascend:v0.4.3-mindspore2.7.1.post1-openeuler .
-```
-
-3. Start Container
-Once that is built, run it with the following cmd
-```bash
-docker run -itd \
-    --shm-size 200g --privileged \
-    --net=host \
-    --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
-    --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
-    --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /var/log/npu/:/var/log/npu \
-    -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-    -v /lib/modules:/lib/modules:ro \
-    -v /usr/src/kernels:/usr/src/kernels:ro \
-    -v /mnt/storage1/data:/data \
-    -v /home:/home \
-    --name lmcache-ascend-ms \
-    --entrypoint /bin/bash \
-    lmcache-ascend:v0.4.3-mindspore2.7.1.post1-openeuler
-
-docker exec -it -u root lmcache-ascend-ms bash
-```
-
-For further info about deployment notes, please refer to the [guide about deployment](docs/deployment.md)
-
-### Manual Installation
+#### Installation
 
 1. Start the base container
 ```bash
@@ -305,22 +221,22 @@ docker exec -it -u root lmcache-ascend-ms bash
 2. Install LMCache
 
 ```bash
-NO_CUDA_EXT=1 pip install lmcache==0.4.3 --no-deps
+NO_CUDA_EXT=1 pip install lmcache==0.4.4 --no-deps
 ```
 
 3. Install LMCache-Ascend
 
 ```bash
-git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
+git clone --recurse-submodules -b v0.4.4 https://github.com/LMCache/LMCache-Ascend.git
 cd LMCache-Ascend
 USE_MINDSPORE=1 pip install -r requirement_ms.txt --no-build-isolation -v -e .
 ```
 
-### Usage
+#### Usage
 
-We introduce a dynamic KVConnector via LMCacheAscendConnectorV1Dynamic, therefore LMCache-Ascend Connector can be used via the kv transfer config in the two following setting.
+LMCache-Ascend uses the in-process `LMCacheAscendConnector` from vllm-ascend, which is patched when `lmcache_ascend` is imported. Configure KV transfer as follows.
 
-#### Online serving
+##### Online serving
 ```bash
 python \
     -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server \
@@ -329,15 +245,14 @@ python \
     --trust-remote-code \
     --disable-log-requests \
     --block-size 128 \
-    --kv-transfer-config '{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both", "kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
+    --kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
 ```
 
-#### Offline
+##### Offline
 ```python
 ktc = KVTransferConfig(
-        kv_connector="LMCacheAscendConnectorV1Dynamic",
+        kv_connector="LMCacheAscendConnector",
         kv_role="kv_both",
-        kv_connector_module_path="lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"
     )
 ```
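Existing deployments may still name `LMCacheAscendConnectorV1Dynamic` together with a `kv_connector_module_path`. The sketch below assumes configs are plain JSON objects, as in the examples above, and rewrites an old config into the new form. `migrate_kv_transfer_config` is a hypothetical helper and is not part of LMCache-Ascend.

```python
import json

OLD_CONNECTOR = "LMCacheAscendConnectorV1Dynamic"
NEW_CONNECTOR = "LMCacheAscendConnector"


def migrate_kv_transfer_config(config_json: str) -> str:
    """Rewrite an old dynamic-connector config into the in-process form (hypothetical helper)."""
    config = json.loads(config_json)
    if config.get("kv_connector") == OLD_CONNECTOR:
        # Reassigning an existing key keeps its position in the dict.
        config["kv_connector"] = NEW_CONNECTOR
        # The new examples omit the module path, so drop it for the renamed connector.
        config.pop("kv_connector_module_path", None)
    # Compact separators reproduce the spacing used on the command line above.
    return json.dumps(config, separators=(",", ":"))


old = ('{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both", '
       '"kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}')
print(migrate_kv_transfer_config(old))
# {"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}
```

The output string can be passed to `--kv-transfer-config` or assigned to the benchmark script's `KV_TRANSFER_CONFIG` variable. Configs that already use another connector are returned unchanged apart from whitespace.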
 

@@ -192,7 +192,7 @@ case "$MODE" in
     "both")
         echo "Sequentially running all benchmarks..." >&2
 
-        KV_TRANSFER_CONFIG='{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both", "kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
+        KV_TRANSFER_CONFIG='{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
 
         current_port=$BASE_PORT
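This change renames the connector in the README, the benchmark script, and the offline-serving helper. A quick scan for the retired identifiers can confirm that no references were missed. The sketch below assumes the files are read as UTF-8 text. `find_stale_refs` and `scan_paths` are hypothetical helpers, and the identifier list is taken from the lines this diff removes. A hit on `kv_connector_module_path` alone may be legitimate if it belongs to a different connector.

```python
from pathlib import Path

# Identifiers removed by this change; any remaining occurrence is likely stale.
RETIRED = (
    "LMCacheAscendConnectorV1Dynamic",
    "kv_connector_module_path",
    "lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1",
)


def find_stale_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, identifier) pairs for every retired identifier in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in RETIRED:
            if name in line:
                hits.append((lineno, name))
    return hits


def scan_paths(paths: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Scan each file and report only the files that still contain retired identifiers."""
    report = {}
    for p in paths:
        hits = find_stale_refs(Path(p).read_text(encoding="utf-8"))
        if hits:
            report[p] = hits
    return report
```

Substring matching keeps the check simple. Because `LMCacheAscendConnector` is a prefix of the retired name, only the full `LMCacheAscendConnectorV1Dynamic` spelling is flagged.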
 

@@ -58,13 +58,11 @@
 def build_llm_with_lmcache(model: str, max_model_len: int = 32000, blend: bool = True):
     """Build LLM with LMCache for offline serving"""
 
-    LMCACHE_CONNECTOR = "LMCacheAscendConnectorV1Dynamic"
-    CONNECTOR_PATH = "lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"
+    LMCACHE_CONNECTOR = "LMCacheAscendConnector"
 
     ktc = KVTransferConfig(
         kv_connector=LMCACHE_CONNECTOR,
         kv_role="kv_both",
-        kv_connector_module_path=CONNECTOR_PATH,
     )
 
     llm_args = EngineArgs(