GitHub - marcobarlo/LMCache-Ascend: LMCache on Ascend

LMCache-Ascend Plugin

About Ascend | LMCache Blog | Documentation | Slack

Overview

LMCache-Ascend is a community-maintained plugin for running LMCache on Ascend NPUs.

Prerequisites

To use LMCache-Ascend on NPU hardware, make sure the following prerequisites are met:

Hardware: Atlas 800I A2 Inference series. Other series, such as A3 Inference/Training and 300I Duo, are experimental.
OS: Linux-based.
Software:
- Python: >= 3.10, <= 3.11
- CANN Toolkit: >= 8.2rc1
- Ascend Driver: >= 24.1
- PyTorch: == 2.5.1, Torch-npu: == 2.5.1.post1.dev20250619
- vLLM: v0.9.2 & vLLM-Ascend: v0.9.2rc1
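The software constraints above can be checked before installing anything else. The sketch below is a minimal, hypothetical helper (the function names are not part of LMCache-Ascend); it assumes the pinned packages are published under the distribution names `torch` and `torch-npu` and reads installed versions with `importlib.metadata`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Supported interpreter range from the prerequisites list (3.10 to 3.11 inclusive).
MIN_PY = (3, 10)
MAX_PY = (3, 11)

# Exact pins from the prerequisites list (distribution names are assumptions).
PINNED = {
    "torch": "2.5.1",
    "torch-npu": "2.5.1.post1.dev20250619",
}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_PY <= major_minor <= MAX_PY


def installed_versions(names=PINNED) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed: dict) -> dict:
    """Return {package: (expected, found)} for every pin that does not match exactly."""
    return {
        name: (expected, installed.get(name))
        for name, expected in PINNED.items()
        if installed.get(name) != expected
    }
```

For example, `pin_mismatches(installed_versions())` returns an empty dict when both pins are satisfied. The comparison is an exact string match, so a local build tag such as `2.5.1+cpu` would also be reported as a mismatch.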

Getting Started

Clone LMCache-Ascend Repo

Our repo contains a kvcache ops submodule for ease of maintenance, so we recommend cloning the repo with submodules.

cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git

Docker

cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.3-vllm-ascend-v0.9.2rc1-910b-cann-8.2rc1-py3.11-openeuler-22.03 .
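The image tag encodes the component versions (LMCache v0.3.3, vLLM-Ascend v0.9.2rc1, 910b, CANN 8.2rc1, Python 3.11, openEuler 22.03), and the same string must be repeated in the `docker run` command. A small hypothetical helper like the one below, assuming you script the build from Python, keeps both in sync; the defaults are copied from the command above.

```python
def image_tag(lmcache="v0.3.3", vllm_ascend="v0.9.2rc1", soc="910b",
              cann="8.2rc1", python="3.11", distro="openeuler-22.03") -> str:
    """Compose the image name used in this README from its component versions."""
    return (f"lmcache-ascend:{lmcache}-vllm-ascend-{vllm_ascend}-{soc}"
            f"-cann-{cann}-py{python}-{distro}")


def docker_build_args(dockerfile="docker/Dockerfile.a2.openEuler", **versions) -> list:
    """Return the docker build argv; run it from the repository root."""
    return ["docker", "build", "-f", dockerfile, "-t", image_tag(**versions), "."]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting altogether.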

Once the image is built, run it with the following command:

DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
    --privileged \
    --cap-add=SYS_RESOURCE \
    --cap-add=IPC_LOCK \
    -p 8000:8000 \
    -p 8001:8001 \
    --name lmcache-ascend-dev \
    -e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
    -e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
    -e ASCEND_TOTAL_MEMORY_GB=32 \
    -e VLLM_TARGET_DEVICE=npu \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /etc/localtime:/etc/localtime \
    -v /var/log/npu:/var/log/npu \
    -v /dev/davinci_manager:/dev/davinci_manager \
    -v /dev/devmm_svm:/dev/devmm_svm \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /etc/hccn.conf:/etc/hccn.conf \
    lmcache-ascend:v0.3.3-vllm-ascend-v0.9.2rc1-910b-cann-8.2rc1-py3.11-openeuler-22.03 \
    /bin/bash
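When only some NPUs should be exposed, the device list has to be changed consistently; the command above sets `ASCEND_VISIBLE_DEVICES` and `ASCEND_RT_VISIBLE_DEVICES` to the same list. The following is a minimal sketch that builds the same `docker run` arguments from Python. The function and its parameters are hypothetical; the flags, environment variables, and mounts are copied from the command above, with ports published via `-p`.

```python
# Host paths mounted into the container, copied from the docker run command above.
MOUNTS = [
    "/usr/local/Ascend/driver:/usr/local/Ascend/driver",
    "/etc/localtime:/etc/localtime",
    "/var/log/npu:/var/log/npu",
    "/dev/davinci_manager:/dev/davinci_manager",
    "/dev/devmm_svm:/dev/devmm_svm",
    "/etc/ascend_install.info:/etc/ascend_install.info",
    "/etc/hccn.conf:/etc/hccn.conf",
]


def docker_run_args(image, devices=range(8), ports=(8000, 8001),
                    total_memory_gb=32, name="lmcache-ascend-dev") -> list:
    """Build the docker run argv for an interactive LMCache-Ascend container."""
    device_list = ",".join(str(d) for d in devices)
    args = ["docker", "run", "-it", "--privileged",
            "--cap-add=SYS_RESOURCE", "--cap-add=IPC_LOCK"]
    for port in ports:
        # docker publishes ports with -p/--publish (host:container)
        args += ["-p", f"{port}:{port}"]
    args += ["--name", name]
    env = {
        # Both variables receive the same device list, as in the command above.
        "ASCEND_VISIBLE_DEVICES": device_list,
        "ASCEND_RT_VISIBLE_DEVICES": device_list,
        "ASCEND_TOTAL_MEMORY_GB": str(total_memory_gb),
        "VLLM_TARGET_DEVICE": "npu",
    }
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    for mount in MOUNTS:
        args += ["-v", mount]
    return args + [image, "/bin/bash"]
```

For example, `docker_run_args(tag, devices=[0, 1, 2, 3])` exposes only the first four NPUs; pass the result to `subprocess.run`, or print it with `shlex.join` to get a copy-pasteable command.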

For further deployment notes, please refer to the deployment guide.

Manual Installation

The steps below assume your working directory is /workspace.

Clone and Install vLLM Repo

VLLM_REPO=https://github.com/vllm-project/vllm.git
VLLM_TAG=v0.9.2
git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /workspace/vllm
# NOTE: An Ascend build of Triton exists, but it is not properly supported yet, so triton is uninstalled.
VLLM_TARGET_DEVICE="empty" python3 -m pip install -e /workspace/vllm/ --extra-index-url https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip uninstall -y triton

Clone and Install vLLM Ascend Repo

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

VLLM_ASCEND_REPO=https://github.com/vllm-project/vllm-ascend.git
VLLM_ASCEND_TAG=v0.9.2rc1
git clone --depth 1 $VLLM_ASCEND_REPO --branch $VLLM_ASCEND_TAG /workspace/vllm-ascend
# apply patch to v0.9.2rc1
cd /workspace/vllm-ascend && \
    git apply -p1 /workspace/LMCache-Ascend/docker/kv-connector-v1.diff

export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib && \
python3 -m pip install -v -e /workspace/vllm-ascend/ --extra-index-url https://download.pytorch.org/whl/cpu/
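The `export` above appends an architecture-specific `devlib` directory of the CANN toolkit to `LD_LIBRARY_PATH`. A minimal sketch of the same path logic, assuming the toolkit lives at the default location shown above; the helper names are hypothetical. It uses `platform.machine()`, which reports the same value as `uname -m` on Linux (for example `aarch64` on Arm servers).

```python
import os
import platform

# Toolkit root used in the export command above.
TOOLKIT_ROOT = "/usr/local/Ascend/ascend-toolkit/latest"


def devlib_dir(machine=None) -> str:
    """Return the architecture-specific devlib directory."""
    machine = machine or platform.machine()  # same value as `uname -m` on Linux
    return f"{TOOLKIT_ROOT}/{machine}-linux/devlib"


def extended_ld_library_path(current=None, machine=None) -> str:
    """Append devlib to LD_LIBRARY_PATH, dropping empty entries."""
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    entries = [entry for entry in current.split(":") if entry]
    entries.append(devlib_dir(machine))
    return ":".join(entries)
```

Dropping empty entries is a deliberate choice: when `LD_LIBRARY_PATH` is initially unset, the shell form produces a leading colon, and the dynamic linker treats an empty entry as the current directory.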

Clone and Install LMCache Repo

LMCACHE_REPO=https://github.com/LMCache/LMCache.git
LMCACHE_TAG=v0.3.3
git clone --depth 1 $LMCACHE_REPO --branch $LMCACHE_TAG /workspace/LMCache
# our build targets arm64, so restrict the infinistore requirement to x86_64 hosts
sed -i "s/^infinistore$/infinistore; platform_machine == 'x86_64'/" /workspace/LMCache/requirements/common.txt
export NO_CUDA_EXT=1 && python3 -m pip install -v -e /workspace/LMCache
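The `sed` command adds a PEP 508 environment marker so that pip installs `infinistore` only on x86_64 machines. A minimal Python equivalent, assuming you prefer to patch the requirements file from a script; the function names are hypothetical. Like the `sed` expression, it only rewrites a line that consists of exactly `infinistore`.

```python
import re
from pathlib import Path

MARKER = "; platform_machine == 'x86_64'"


def restrict_to_x86(text: str, package: str = "infinistore") -> str:
    """Append the x86_64-only marker to a bare requirement line for `package`."""
    pattern = re.compile(rf"^{re.escape(package)}$", re.MULTILINE)
    # A function replacement avoids any backslash interpretation in the marker.
    return pattern.sub(lambda _match: package + MARKER, text)


def patch_requirements(path) -> None:
    """Rewrite a requirements file in place."""
    file = Path(path)
    file.write_text(restrict_to_x86(file.read_text()))
```

Usage: `patch_requirements("/workspace/LMCache/requirements/common.txt")`. Running it twice is harmless, because the patched line no longer matches the anchored pattern.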

Install LMCache-Ascend Repo

cd /workspace/LMCache-Ascend
python3 -m pip install -v --no-build-isolation -e .
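After the manual steps, a quick sanity check can confirm that the four packages are importable and that `triton` was removed as in the vLLM step. The sketch below is a hypothetical helper; it assumes the top-level module names `vllm`, `vllm_ascend`, `lmcache`, and `lmcache_ascend`, and uses `importlib.util.find_spec`, which locates top-level modules without importing them (so it does not touch the NPU).

```python
from importlib.util import find_spec

# Top-level packages installed by the manual steps above (module names assumed).
REQUIRED = ("vllm", "vllm_ascend", "lmcache", "lmcache_ascend")
# The vLLM step uninstalls triton, so it should not be importable.
ABSENT = ("triton",)


def check_environment(required=REQUIRED, absent=ABSENT):
    """Return (missing, unexpected) module names without importing them."""
    missing = [name for name in required if find_spec(name) is None]
    unexpected = [name for name in absent if find_spec(name) is not None]
    return missing, unexpected
```

Run `print(check_environment())` inside the environment; `([], [])` means every package was found and `triton` is absent.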

Usage

LMCache-Ascend provides a dynamic KV connector, LMCacheAscendConnectorV1Dynamic, which vLLM loads from the module named in the KV transfer config. The connector can be used in the following two settings.
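Both settings select the connector through the same three fields: `kv_connector`, `kv_role`, and `kv_connector_module_path`. Hand-writing the JSON inside shell quotes is error-prone, so a small hypothetical helper can generate it. The values are copied from this README; only `kv_both` is shown here because it is the only role used in these examples.

```python
import json
import shlex

# Values copied from the kv-transfer-config used in this README.
CONNECTOR = "LMCacheAscendConnectorV1Dynamic"
MODULE_PATH = "lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"


def kv_transfer_config_json() -> str:
    """Serialize the connector settings as compact JSON for --kv-transfer-config."""
    config = {
        "kv_connector": CONNECTOR,
        "kv_role": "kv_both",
        "kv_connector_module_path": MODULE_PATH,
    }
    return json.dumps(config, separators=(",", ":"))


def shell_flag() -> str:
    """Return the CLI flag with the JSON safely quoted for a POSIX shell."""
    return "--kv-transfer-config " + shlex.quote(kv_transfer_config_json())
```

`shell_flag()` can be pasted into a shell command; when launching from Python with an argument list, pass `kv_transfer_config_json()` as the value directly.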

Online serving

python \
    -m vllm.entrypoints.openai.api_server \
    --port 8100 \
    --model /data/models/Qwen/Qwen3-32B \
    --trust-remote-code \
    --disable-log-requests \
    --block-size 128 \
    --kv-transfer-config '{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both", "kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
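Once the server is up, it can be exercised through vLLM's OpenAI-compatible `/v1/completions` endpoint. The following is a minimal standard-library sketch; the helper name and the default `max_tokens` are hypothetical, and it assumes the server runs locally on port 8100 with the model path above as its served model name (vLLM's default when `--served-model-name` is not set).

```python
import json
import urllib.request


def completion_request(prompt, model="/data/models/Qwen/Qwen3-32B",
                       host="localhost", port=8100, max_tokens=64):
    """Build (but do not send) a POST request to the completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"http://{host}:{port}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it: `with urllib.request.urlopen(completion_request("Hello")) as resp: print(json.load(resp)["choices"][0]["text"])`.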

Offline

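from vllm import LLM
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="LMCacheAscendConnectorV1Dynamic",
    kv_role="kv_both",
    kv_connector_module_path="lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1",
)
# Pass the config to the offline engine; model path and block size mirror the online example
llm = LLM(model="/data/models/Qwen/Qwen3-32B", kv_transfer_config=ktc, block_size=128)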

FAQ

Why do I get a HostRegisterError?

If you encounter a HostRegisterError inside a container, make sure the container was started with the IPC_LOCK capability (--cap-add=IPC_LOCK).
Otherwise, check that your driver version is >= 24.0.
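Both FAQ conditions can be checked from inside the container. The following is a minimal sketch with hypothetical helper names. It reads the effective capability mask (`CapEff`) from `/proc/self/status`, where `CAP_IPC_LOCK` is capability number 14, and it assumes the driver version is available as a string whose first `major.minor` pair is the driver release (for example `24.1.rc2`).

```python
import re

CAP_IPC_LOCK = 14  # capability number from linux/capability.h


def has_cap_ipc_lock(status_text: str) -> bool:
    """Check the effective capability mask in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool((mask >> CAP_IPC_LOCK) & 1)
    return False


def driver_version_ok(version_text: str, minimum=(24, 0)) -> bool:
    """Compare the leading major.minor of a driver version string with a minimum."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

Usage: `with open("/proc/self/status") as f: print(has_cap_ipc_lock(f.read()))`. A container started without `--cap-add=IPC_LOCK` (and without `--privileged`) typically reports `False`.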

