Milvus GPU vector index performance falls short of expectations; tests do not reach the advertised speedup #33873

shenguoquan · 2024-06-14T07:40:33Z

shenguoquan
Jun 14, 2024

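Test background
Milvus 2.4 supports several GPU index types to accelerate search, especially in high-throughput, low-latency, high-recall scenarios. GPU acceleration can greatly improve Milvus search performance and efficiency. The supported GPU index types are: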
● GPU_CAGRA
● GPU_IVF_FLAT

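Test objective
To verify the GPU acceleration published on the Milvus website for index building and index search, this test uses the Cohere vector dataset (1M vectors, 768 dimensions) and, at the same recall, measures the build time, QPS, and related metrics of CPU HNSW, CPU IVF_FLAT, GPU IVF_FLAT, and GPU CAGRA.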
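
Comparing indexes "at the same recall" presupposes a recall measurement. A minimal sketch of recall@k follows, assuming that for each query the benchmark yields the IDs returned by the index and the exact nearest-neighbor IDs. The function name `recall_at_k` and the toy IDs are illustrative only, and the ID-overlap definition used here may differ from how the benchmark tool computes recall.

```python
from typing import Sequence


def recall_at_k(returned: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]],
                k: int) -> float:
    """Mean fraction of the true top-k neighbors found in the returned top-k."""
    if len(returned) != len(ground_truth):
        raise ValueError("one result list is required per query")
    if not returned or k <= 0:
        raise ValueError("need at least one query and k > 0")
    hits = 0
    for got, truth in zip(returned, ground_truth):
        # Count the IDs that appear in both top-k lists for this query.
        hits += len(set(got[:k]) & set(truth[:k]))
    return hits / (k * len(returned))


# Toy example: two queries, k = 3 (hypothetical IDs).
approx = [[1, 2, 9], [4, 5, 6]]
exact = [[1, 2, 3], [4, 5, 6]]
print(recall_at_k(approx, exact, k=3))
```

Search parameters such as ef, nprobe, or itopk_size would then be tuned per index until the measured recall matches before QPS is compared.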

Test machines

Hardware type	ECS instance type	CPU cores	Memory	GPU memory	Price (pay-as-you-go, per hour)
T4	ecs.gn6i-c8g1.2xlarge	8 cores	31GB	16GB	10.506636
CPU	ecs.g7.2xlarge	8 cores	32GB	---	3.570646

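Test environment
Milvus standalone CPU: single-node deployment with 8 cores and 32GB of memory
Milvus standalone GPU: single-node deployment with 8 cores, 31GB of memory, and one T4 GPU
Both test environments were brought up with docker compose up -d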

Test tool
ann-benchmarks

Test plan
1. Index build

Index type	Index parameters	Index build time
CPU HNSW	M: 16, efConstruction: 100, metric: L2	280.653s
CPU IVF_FLAT	nlist: 128, metric: L2	185.640s
GPU CAGRA	intermediate_graph_degree: 64, graph_degree: 32, build_algo: NN_DESCENT, metric: L2	259.978s
GPU IVF_FLAT	nlist: 128, metric: L2	89.418s

Index query:
Multi-concurrency test
Dataset: Cohere
● 1M vectors, 768 dimensions
● Milvus CPU HNSW index build parameters: M: 16, efConstruction: 100
● Milvus GPU CAGRA index build parameters: intermediate_graph_degree: 64, graph_degree: 32, build_algo: NN_DESCENT

CPU HNSW: M = 16, efConstruction = 100, ef = 400. Results:

Concurrency	QPS
1	240.788
5	583.788
10	597.017
15	610.527

CPU IVF_FLAT: nprobe: 32. Results:

Concurrency	QPS
1	32.224
5	60.729
10	58.702

GPU CAGRA: intermediate_graph_degree: 64, graph_degree: 32, build_algo: NN_DESCENT, itopk_size: 128, search_width: 16. Results:

Concurrency	QPS
1	388.268
5	1388.945
10	1534.215
15	1692.114

GPU IVF_FLAT: nprobe: 32. Results:

Concurrency	QPS
1	190.606
5	310.934
10	321.725
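
For reference, the four configurations tested here can be written down in the dictionary shape that pymilvus uses for index and search parameters. This is only a sketch: the values come from this post, while the `CONFIGS` name and the assumption that the benchmark harness passes the parameters to Milvus in this form are not from the original test.

```python
# Index and search parameters from this post, in the dict shape that
# pymilvus accepts. How the benchmark harness actually passes them to
# Milvus is an assumption, and CONFIGS is a hypothetical name.
CONFIGS = {
    "CPU HNSW": {
        "index": {"index_type": "HNSW", "metric_type": "L2",
                  "params": {"M": 16, "efConstruction": 100}},
        "search": {"metric_type": "L2", "params": {"ef": 400}},
    },
    "CPU IVF_FLAT": {
        "index": {"index_type": "IVF_FLAT", "metric_type": "L2",
                  "params": {"nlist": 128}},
        "search": {"metric_type": "L2", "params": {"nprobe": 32}},
    },
    "GPU CAGRA": {
        "index": {"index_type": "GPU_CAGRA", "metric_type": "L2",
                  "params": {"intermediate_graph_degree": 64,
                             "graph_degree": 32,
                             "build_algo": "NN_DESCENT"}},
        "search": {"metric_type": "L2",
                   "params": {"itopk_size": 128, "search_width": 16}},
    },
    "GPU IVF_FLAT": {
        "index": {"index_type": "GPU_IVF_FLAT", "metric_type": "L2",
                  "params": {"nlist": 128}},
        "search": {"metric_type": "L2", "params": {"nprobe": 32}},
    },
}

# Print a one-line summary per configuration.
for name, cfg in CONFIGS.items():
    print(name, cfg["index"]["index_type"], cfg["search"]["params"])
```

Keeping build and search parameters side by side makes it easier to check that the CPU and GPU runs being compared really target the same metric and a comparable recall.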

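Test conclusions:
Index build speed in Milvus: GPU CAGRA is faster than CPU HNSW, but the speedup of CAGRA over HNSW is very limited (259.978s versus 280.653s).
Looking at query QPS alone, GPU CAGRA outperforms HNSW. In the current test data, GPU CAGRA delivers about 2.5 times the QPS of HNSW, which is also far short of the speedup claimed in the promotional material.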
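
The ratios behind these conclusions can be recomputed directly from the tables in this post. A small sketch follows; the numbers are copied from the results above, and the variable and function names are illustrative.

```python
# Figures copied from the tables in this post.
build_seconds = {
    "CPU HNSW": 280.653,
    "CPU IVF_FLAT": 185.640,
    "GPU CAGRA": 259.978,
    "GPU IVF_FLAT": 89.418,
}
qps = {
    "CPU HNSW": {1: 240.788, 5: 583.788, 10: 597.017, 15: 610.527},
    "CPU IVF_FLAT": {1: 32.224, 5: 60.729, 10: 58.702},
    "GPU CAGRA": {1: 388.268, 5: 1388.945, 10: 1534.215, 15: 1692.114},
    "GPU IVF_FLAT": {1: 190.606, 5: 310.934, 10: 321.725},
}


def build_speedup(cpu: str, gpu: str) -> float:
    """How many times faster the GPU index builds than the CPU index."""
    return build_seconds[cpu] / build_seconds[gpu]


def qps_speedup(cpu: str, gpu: str) -> dict:
    """GPU/CPU QPS ratio at every concurrency level both runs share."""
    shared = sorted(set(qps[cpu]) & set(qps[gpu]))
    return {c: qps[gpu][c] / qps[cpu][c] for c in shared}


for cpu, gpu in (("CPU HNSW", "GPU CAGRA"), ("CPU IVF_FLAT", "GPU IVF_FLAT")):
    print(f"{gpu} vs {cpu}: build x{build_speedup(cpu, gpu):.2f}")
    for level, ratio in qps_speedup(cpu, gpu).items():
        print(f"  concurrency {level}: QPS x{ratio:.2f}")
```

Reporting the ratio per concurrency level shows how the gap between GPU and CPU changes as more clients are added, which a single headline figure hides.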

Presburger · 2024-06-14T09:00:47Z

Presburger
Jun 14, 2024

Is the number of queries equal to 1?

1 reply

shenguoquan Jun 16, 2024
Author

Do you mean the search batch size?

liliu-z · 2024-06-14T09:07:40Z

liliu-z
Jun 14, 2024

The bottleneck in your index build measurement is Milvus disk I/O, not the index construction itself, and the acceleration described in the blog is about pure index building.
The GPU needs higher concurrency; try raising it to 500.

Making the segment size smaller can also help the performance.

6 replies

shenguoquan Jun 18, 2024
Author

The speedup mentioned in your blog compares the index construction time between the GPU and the CPU. Does it include the data transfer time from memory to GPU memory and back from GPU memory to main memory?

shenguoquan Jun 18, 2024
Author

When your blog compares GPU query QPS with CPU QPS, does it record only the GPU kernel query time, or the end-to-end time?

xiaofan-luan Jun 19, 2024
Maintainer

It is end to end. If you are a user, we are glad to help you tune the performance. If you are a competitor trying to analyze Milvus, you may need to figure this out by yourself. Thanks.

shenguoquan Jun 19, 2024
Author

I am currently a customer using vector databases and I'm interested in setting up an open-source Milvus vector database, particularly because Milvus offers GPU acceleration, which is highly appealing to me. However, despite following the setup instructions on your official website, I am not able to achieve the performance you have advertised. Therefore, I am seeking your assistance.

xiaofan-luan Jun 19, 2024
Maintainer

Please send an email to [email protected] and we'd be glad to help offline.

Presburger · 2024-06-14T09:08:44Z

Presburger
Jun 14, 2024

For the GPU, which is a high-latency, high-throughput device, we first need to align our recall requirements. Then, we need to increase the level of concurrency. Based on our experience, the concurrency needs to be increased to 64 or even higher.

2 replies

shenguoquan Jun 16, 2024
Author

How do I configure that? Could you give me some guidance?

xiaofan-luan Jun 16, 2024
Maintainer

You just need more clients issuing requests.
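
In practice, "more clients" means more search requests in flight at the same time. A minimal sketch of a closed-loop load generator built on a thread pool follows. `measure_qps` and `fake_search` are hypothetical names: the stand-in only sleeps, and a real run would replace it with an actual search request against the collection under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_qps(search_fn: Callable[[], object],
                concurrency: int,
                duration_s: float) -> float:
    """Call search_fn from `concurrency` threads for duration_s; return QPS."""
    deadline = time.perf_counter() + duration_s
    counts = [0] * concurrency  # one counter per worker, so no lock is needed

    def worker(slot: int) -> None:
        # Closed loop: each worker sends its next request as soon as the
        # previous one returns, until the deadline passes.
        while time.perf_counter() < deadline:
            search_fn()
            counts[slot] += 1

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(worker, range(concurrency)))
    elapsed = time.perf_counter() - start
    return sum(counts) / elapsed


def fake_search() -> None:
    # Stand-in for a real request (for example a pymilvus search call);
    # it only sleeps to simulate a 5 ms round trip.
    time.sleep(0.005)


for clients in (1, 16, 64):
    print(clients, round(measure_qps(fake_search, clients, duration_s=0.5)))
```

Threads are enough while each request spends most of its time waiting on the network. At very high request rates the Python client itself can become the bottleneck, in which case several client processes or machines may be needed.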

xiaofan-luan · 2024-06-16T13:11:43Z

xiaofan-luan
Jun 16, 2024
Maintainer

May we know more details about your use case? We'd definitely like to offer more help with the setup.

1 reply

shenguoquan Jun 18, 2024
Author

Milvus standalone CPU

docker compose up -d

docker-compose.yml is as follows:
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /root/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - /root/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.1
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /root/volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    deploy:
      resources:
        limits:
          cpus: '8'
          memory: '32g'
    depends_on:
      - "etcd"
      - "minio"

networks:
  default:
    name: milvus

Milvus standalone GPU:

docker compose up -d

docker-compose.yml:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /root/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - /root/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.1-gpu
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /root/volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    deploy:
      resources:
        limits:
          cpus: '8'
          memory: '32g'
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["gpu"]
              device_ids: ["0"]
    depends_on:
      - "etcd"
      - "minio"

networks:
  default:
    name: milvus

ann-benchmarks was installed as follows:
yum install -y python311
python3.11 -m ensurepip --upgrade
python3.11 -m pip --version
pip3.11 install milvus-cli==0.4.2
cd ann-benchmarks
pip3.11 install -r requirements.txt

The tests were run as follows:
python3.11 run.py --algorithm milvus-hnsw --dataset cohere-768-euclidean --runs 1 --timeout 990000 --local -k 100
python3.11 run.py --algorithm milvus-cpu-ivfflat --dataset cohere-768-euclidean --runs 1 --timeout 990000 --local -k 100
python3.11 run.py --algorithm milvus-cagra --dataset cohere-768-euclidean --runs 1 --timeout 990000 --local -k 100
python3.11 run.py --algorithm milvus-gpu-ivfflat --dataset cohere-768-euclidean --runs 1 --timeout 990000 --local -k 100
