4 changes: 2 additions & 2 deletions .github/workflows/build-and-test.yml
@@ -116,7 +116,7 @@ jobs:

echo ">>> Installing LMCache Core..."
# TODO (gingfung): use strategy matrix for different version tests
pip install lmcache==0.4.4
pip install lmcache==0.4.5

echo ">>> Building LMCache-Ascend..."
export CPLUS_INCLUDE_PATH=/usr/include/c++/12:/usr/include/c++/12/backward:/usr/include/c++/12/`uname -i`-openEuler-linux/:$CPLUS_INCLUDE_PATH
@@ -329,7 +329,7 @@ jobs:

echo ">>> Installing LMCache Core..."
# TODO (gingfung): use strategy matrix for different version tests
pip install lmcache==0.4.4
pip install lmcache==0.4.5

echo ">>> Building LMCache-Ascend..."
export CPLUS_INCLUDE_PATH=/usr/include/c++/12:/usr/include/c++/12/backward:/usr/include/c++/12/`uname -i`-openEuler-linux/:$CPLUS_INCLUDE_PATH
137 changes: 26 additions & 111 deletions README.md

Review comment (Collaborator): We should probably remove the references to the Dynamic Connector once support has landed in the in-process one.
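One way to follow up on this comment is to list every file that still mentions the old connector name. The sketch below is a hypothetical helper and not part of this PR: the function name `find_stale_connector_refs` and the set of scanned file suffixes are assumptions, and only the connector string itself comes from the diff.

```python
from pathlib import Path

# Connector name that this PR replaces (taken from the diff).
OLD_CONNECTOR = "LMCacheAscendConnectorV1Dynamic"

# File types worth scanning; this set is an assumption, adjust as needed.
SUFFIXES = {".py", ".sh", ".md", ".yml", ".yaml"}


def find_stale_connector_refs(root: str, needle: str = OLD_CONNECTOR) -> list[tuple[str, int]]:
    """Return (relative path, line number) for each line that contains `needle`."""
    hits = []
    base = Path(root)
    for path in sorted(base.rglob("*")):
        # Skip directories and file types outside the assumed suffix set.
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(base).as_posix(), lineno))
    return hits
```

Calling `find_stale_connector_refs(".")` from the repository root returns the remaining occurrences, which could then be cleaned up in a later change.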

@@ -31,14 +31,14 @@ LMCache-Ascend is a community maintained plugin for running LMCache on the Ascen

To use LMCache-Ascend on the NPU hardware, please make sure the following prerequisites are satisfied.

- **Hardware**: Atlas 800I A2 Inference series. (A3 Inference/Training and 300I Duo are experimental).
- **Hardware**: Atlas A2/A3 series. (300I Duo are experimental).
- **OS**: Linux-based.
- **Software**:
- **Python**: >= 3.10
- **CANN Toolkit**: >= 8.2.RC1
- **Ascend Driver**: >= 24.1.0
- **PyTorch**: >= 2.7.1
- **vLLM**: >=v0.11.0 & **vLLM-Ascend**: >=v0.11.0
- **vLLM-Ascend**: >= v0.11.0

### Compatibility Matrix

@@ -47,31 +47,27 @@ Please ensure your environment matches the versions below.
#### For PyTorch / vLLM
| LMCache-Ascend | LMCache | vLLM Version |
| :--- | :--- | :--- |
| **main** | **v0.4.4** | **>=v0.14.0** |
| **v0.4.3** | **v0.4.3** | **>=v0.14.0** |
| **main** | **v0.4.5** | **>=v0.14.0** |
| **v0.4.4** | **v0.4.4** | **>=v0.14.0** |

#### For PyTorch / SGLang
| LMCache-Ascend | LMCache | SGLang Version |
| :--- | :--- | :--- |
| **main** | **v0.4.4** | **0.5.8** |
| **v0.4.3** | **v0.4.3** | **0.5.8** |
| **main** | **v0.4.5** | **0.5.8** |
| **v0.4.4** | **v0.4.4** | **0.5.8** |

#### For MindSpore
| LMCache-Ascend | LMCache | vLLM Version |
| :--- | :--- | :--- |
| **main** | **v0.4.4** | **v0.11.0** |
| **v0.4.3** | **v0.4.3** | **v0.11.0** |

> **Note**: If you require legacy support for vLLM 0.9.2, you must use PyTorch 2.5.1. See the [Compatibility Matrix](#compatibility-matrix) above.
| **main** | **v0.4.5** | **v0.11.0** |
| **v0.4.4** | **v0.4.4** | **v0.11.0** |


## Getting Started

### for vLLM-Ascend

You can choose `Manual Installation` or `Build Docker Image`.

#### Manual Installation
#### Installation
1. Prepare Base Environment

It is recommended to use the official [vLLM-Ascend image](https://quay.io/repository/ascend/vllm-ascend?tab=tags) as a base:
@@ -107,52 +103,17 @@ quay.io/ascend/vllm-ascend:v0.18.0

- from pip
```bash
NO_CUDA_EXT=1 pip install lmcache==0.4.3
NO_CUDA_EXT=1 pip install lmcache==0.4.4
```

3. Install LMCache-Ascend Repo

```bash
git clone --recurse-submodules -b v0.4.3 https://github.com/LMCache/LMCache-Ascend.git
git clone --recurse-submodules -b v0.4.4 https://github.com/LMCache/LMCache-Ascend.git
cd LMCache-Ascend
pip install -v --no-build-isolation -e .
```

#### Build Docker Image

Build the image using the provided Dockerfile:
```bash
git clone --recurse-submodules -b v0.4.3 https://github.com/LMCache/LMCache-Ascend.git
cd LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.4.3-vllm-ascend-v0.18.0-openeuler .
```

Once that is built, run it with the following cmd
```bash
docker run -it \
--shm-size=200g --privileged --net=host \
--cap-add=SYS_RESOURCE \
--cap-add=IPC_LOCK \
--device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
--device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
--device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /usr/bin/hccn_tool:/usr/bin/hccn_tool \
-v /var/log/npu:/var/log/npu \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /etc/localtime:/etc/localtime \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /usr/src/kernels:/usr/src/kernels:ro \
-v /data:/data \
--name lmcache-ascend-test \
--entrypoint /bin/bash \
lmcache-ascend:v0.4.3-vllm-ascend-v0.18.0-openeuler

```

For further info about deployment notes, please refer to the [guide about deployment](docs/deployment.md)

#### Usage
@@ -169,24 +130,23 @@ vllm serve /data/models/Qwen/Qwen3-32B \
--max-num-batched-tokens 32768 \
--host 0.0.0.0 \
--port 8100 \
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both","kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'

```

##### Offline
```python
ktc = KVTransferConfig(
kv_connector="LMCacheAscendConnectorV1Dynamic",
kv_connector="LMCacheAscendConnector",
kv_role="kv_both",
kv_connector_module_path="lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"
)
```

> **Note**: For vllm-ascend versions >=0.17.0rc1, you can specify `--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'`

### for SGLang

#### Manual Installation
#### Installation
1. Prepare Base Environment

It is recommended to use the official [Ascend SGLang image](https://quay.io/repository/ascend/sglang?tab=tags) as a base:
@@ -201,13 +161,13 @@ docker run -it --privileged --net=host --name lmcache-sglang-dev quay.io/ascend/

- from pip
```bash
NO_CUDA_EXT=1 pip install lmcache==0.4.3
NO_CUDA_EXT=1 pip install lmcache==0.4.4
```

3. Install LMCache-Ascend Repo

```bash
git clone --recurse-submodules -b v0.4.3 https://github.com/LMCache/LMCache-Ascend.git
git clone --recurse-submodules -b v0.4.4 https://github.com/LMCache/LMCache-Ascend.git
cd LMCache-Ascend
pip install -v --no-build-isolation -e .
```
@@ -229,53 +189,9 @@ python \
--port 8100
```

## Getting Started With MindSpore

### Docker
### for vLLM-MindSpore

1. Clone LMCache-Ascend Repo
Our repo contains a kvcache ops submodule for ease of maintenance, therefore we recommend cloning the repo with submodules.

```bash
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
```

2. Build Docker Image
```bash
cd /workspace/LMCache-Ascend
docker build -f docker/mindspore/Dockerfile.a2.openEuler -t lmcache-ascend:v0.4.3-mindspore2.7.1.post1-openeuler .
```

3. Start Container
Once that is built, run it with the following cmd
```bash
docker run -itd \
--shm-size 200g --privileged \
--net=host \
--device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
--device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
--device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /var/log/npu/:/var/log/npu \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /lib/modules:/lib/modules:ro \
-v /usr/src/kernels:/usr/src/kernels:ro \
-v /mnt/storage1/data:/data \
-v /home:/home \
--name lmcache-ascend-ms \
--entrypoint /bin/bash \
lmcache-ascend:v0.4.3-mindspore2.7.1.post1-openeuler

docker exec -it -u root lmcache-ascend-ms bash
```

For further info about deployment notes, please refer to the [guide about deployment](docs/deployment.md)

### Manual Installation
#### Installation

1. Start the base container
```bash
@@ -305,22 +221,22 @@ docker exec -it -u root lmcache-ascend-ms bash
2. Install LMCache

```bash
NO_CUDA_EXT=1 pip install lmcache==0.4.3 --no-deps
NO_CUDA_EXT=1 pip install lmcache==0.4.4 --no-deps
```

3. Install LMCache-Ascend

```bash
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
git clone --recurse-submodules -b v0.4.4 https://github.com/LMCache/LMCache-Ascend.git
cd LMCache-Ascend
USE_MINDSPORE=1 pip install -r requirement_ms.txt --no-build-isolation -v -e .
```

### Usage
#### Usage

We introduce a dynamic KVConnector via LMCacheAscendConnectorV1Dynamic, therefore LMCache-Ascend Connector can be used via the kv transfer config in the two following setting.
We use the in-process ``LMCacheAscendConnector`` from vllm-ascend (patched at ``import lmcache_ascend``). Configure KV transfer as follows.

#### Online serving
##### Online serving
```bash
python \
-m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server \
@@ -329,15 +245,14 @@ python \
--trust-remote-code \
--disable-log-requests \
--block-size 128 \
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both", "kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
```

#### Offline
##### Offline
```python
ktc = KVTransferConfig(
kv_connector="LMCacheAscendConnectorV1Dynamic",
kv_connector="LMCacheAscendConnector",
kv_role="kv_both",
kv_connector_module_path="lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"
)
```

2 changes: 1 addition & 1 deletion benchmark/v1/rag/online.sh
@@ -192,7 +192,7 @@ case "$MODE" in
"both")
echo "Sequentially running all benchmarks..." >&2

KV_TRANSFER_CONFIG='{"kv_connector":"LMCacheAscendConnectorV1Dynamic","kv_role":"kv_both", "kv_connector_module_path":"lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"}'
KV_TRANSFER_CONFIG='{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'

current_port=$BASE_PORT

4 changes: 1 addition & 3 deletions benchmark/v1/rag/rag.py
@@ -58,13 +58,11 @@
def build_llm_with_lmcache(model: str, max_model_len: int = 32000, blend: bool = True):
"""Build LLM with LMCache for offline serving"""

LMCACHE_CONNECTOR = "LMCacheAscendConnectorV1Dynamic"
CONNECTOR_PATH = "lmcache_ascend.integration.vllm.lmcache_ascend_connector_v1"
LMCACHE_CONNECTOR = "LMCacheAscendConnector"

ktc = KVTransferConfig(
kv_connector=LMCACHE_CONNECTOR,
kv_role="kv_both",
kv_connector_module_path=CONNECTOR_PATH,
)

llm_args = EngineArgs(
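The benchmark changes above reduce the KV transfer config to two keys, `kv_connector` and `kv_role`. As a minimal sketch, the value passed to `--kv-transfer-config` could be generated with `json.dumps` instead of being hand-quoted. The helper name `kv_transfer_config_arg` is hypothetical and not part of this PR; only the two keys and their values are taken from the diff.

```python
import json
import shlex


def kv_transfer_config_arg(connector: str = "LMCacheAscendConnector",
                           role: str = "kv_both") -> str:
    """Serialize the two-key KV transfer config as compact JSON."""
    config = {"kv_connector": connector, "kv_role": role}
    # Compact separators reproduce the spacing used in the diff.
    return json.dumps(config, separators=(",", ":"))


value = kv_transfer_config_arg()
# shlex.quote adds the shell quoting needed to pass the JSON as one argument.
print("--kv-transfer-config " + shlex.quote(value))
```

Generating the string this way keeps the shell script and the Python entry point in agreement on the exact keys, which matters here because `kv_connector_module_path` is no longer passed.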