feat: speedup task runtime with cuML #190

valenzuelaomar · 2025-05-02T21:53:47Z

Resolves #238

This pull request introduces GPU acceleration support for tasks using cuML, along with related updates to the codebase and documentation. The changes include adding installation instructions, updating dependencies, and enabling GPU acceleration if available.

GPU Acceleration Support:

Documentation Update: Added a section in README.md explaining how to install and enable GPU acceleration using cuML for improved performance. Includes installation steps for both source and PyPI installations.
Optional Dependency: Added a new optional dependency group gpu in pyproject.toml that includes cuml-cu12==25.4.*.
GPU Initialization: Introduced _enable_gpu_acceleration() in src/czbenchmarks/__init__.py to initialize GPU acceleration with cuML if available, logging the status of GPU support.
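
As a rough illustration of the initialization step described above, here is a minimal sketch. It is not the code from this PR: the function name enable_gpu_acceleration and the log messages are hypothetical. It assumes cuML's programmatic entry point cuml.accel.install(), which the cuML documentation describes calling before scikit-learn is imported. Catching OSError in addition to ImportError is also an assumption, motivated by the libcudart.so failure reported later in this thread.

```python
import logging

logger = logging.getLogger(__name__)


def enable_gpu_acceleration() -> bool:
    """Try to switch on cuML's zero-code-change accelerator.

    Returns True if the accelerator was installed, False otherwise.
    (Hypothetical helper; the PR's actual function may differ.)
    """
    try:
        # Importing cuml can fail with ImportError (package not installed)
        # or OSError (a CUDA shared library such as libcudart.so is missing).
        import cuml.accel

        cuml.accel.install()
    except (ImportError, OSError) as exc:
        logger.info("GPU acceleration unavailable, falling back to CPU: %s", exc)
        return False

    logger.info("GPU acceleration enabled via cuML.")
    return True
```

Returning a boolean instead of raising keeps the package importable on machines without a GPU, which matches the "if available" behavior described above.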

Benchmarks (regular sklearn vs gpu-accelerated sklearn)

image is from: https://developer.nvidia.com/blog/nvidia-cuml-brings-zero-code-change-acceleration-to-scikit-learn/#benchmarks

How this works

Tests

Ran with regular installation

uv pip install -e .
czbenchmarks run --models SCGPT --scgpt-model-variant human --datasets tsv2_bladder --tasks clustering --clustering-task-label-key cell_type

Ran with regular installation + GPU acceleration

uv pip install -e ".[gpu]"
czbenchmarks run --models SCGPT --scgpt-model-variant human --datasets tsv2_bladder --tasks clustering --clustering-task-label-key cell_type

Known Limitations

cuML automatically accelerates compatible components on NVIDIA GPUs and falls back to CPU execution for unsupported operations.
See https://docs.rapids.ai/api/cuml/stable/zero-code-change-limitations/ for details.

steveherrin · 2025-05-06T15:44:28Z

For what it's worth, this did not work for me. It errored out on the import cuml.accel line with an OSError because it could not find libcudart.so (the code only catches ImportError). I had to run sudo apt install nvidia-cuda-toolkit (on Ubuntu) to get it to work properly.
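
One way to surface the missing-library problem described above before importing cuML is a quick preflight check. The sketch below is only a heuristic, and the helper name cuda_runtime_visible is hypothetical. It asks the standard library's ctypes.util.find_library whether the system linker can locate a CUDA runtime. A False result does not prove the import will fail, since the library may live somewhere the system search does not cover.

```python
import ctypes.util


def cuda_runtime_visible() -> bool:
    """Heuristic check: can the system linker locate a CUDA runtime library?

    find_library returns a library name/path string when found, else None.
    """
    return ctypes.util.find_library("cudart") is not None


if __name__ == "__main__":
    if cuda_runtime_visible():
        print("CUDA runtime found on the system library path.")
    else:
        print("CUDA runtime not found; installing nvidia-cuda-toolkit may help.")
```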

valenzuelaomar · 2025-05-06T16:13:06Z

Thanks for reporting! The README.md does say to install nvidia-cuda-toolkit, but the PR description does not, so I apologize for that.

steveherrin · 2025-05-06T16:37:30Z

Oops, I see it now. I somehow missed it 😓

steveherrin · 2025-05-06T17:55:58Z

Running the labeling task using cached embeddings for one model (UCE 4l) on one tissue and with --set-baseline, on a g4dn.8xlarge instance, with the acceleration:

real    47m9.206s
user    43m26.473s
sys     4m3.370s

without the acceleration (so without these changes):

real    40m43.173s
user    198m47.306s
sys     1m51.516s

so no real speedup
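
To make the comparison above concrete, the following sketch converts the reported timings to seconds and derives two ratios: the wall-clock ratio between the runs, and (user + sys) / real as a rough estimate of how many CPU cores were busy on average. The helper names are hypothetical; the timing strings are copied from the two runs above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '47m9.206s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def busy_cores(run: dict) -> float:
    """Average number of busy CPU cores: (user + sys) / real."""
    return (to_seconds(run["user"]) + to_seconds(run["sys"])) / to_seconds(run["real"])


# Timings copied from the two runs reported above.
with_accel = {"real": "47m9.206s", "user": "43m26.473s", "sys": "4m3.370s"}
without_accel = {"real": "40m43.173s", "user": "198m47.306s", "sys": "1m51.516s"}

# Values above 1.0 mean the accelerated run took longer in wall-clock time.
wall_clock_ratio = to_seconds(with_accel["real"]) / to_seconds(without_accel["real"])

if __name__ == "__main__":
    print(f"wall-clock ratio (with / without): {wall_clock_ratio:.2f}")
    print(f"busy cores with acceleration:      {busy_cores(with_accel):.2f}")
    print(f"busy cores without acceleration:   {busy_cores(without_accel):.2f}")
```

By this arithmetic the accelerated run took roughly 16% longer in wall-clock time while keeping about one CPU core busy on average, versus about five cores without acceleration. That is consistent with work moving off the CPU without the run getting faster overall, though the numbers alone do not show where the time went.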

mlgill · 2025-05-07T20:14:33Z

I haven't tried the cuML accelerator yet, so I don't know the specifics. But in general, there needs to be a large amount of data to offset the time spent moving data on and off the GPU. My guess is that UCE-4l embeddings aren't large enough; UCE-33l embeddings might show acceleration with a GPU.

There is also some dependence on the GPU and on the specific ML algorithm.

@valenzuelaomar Maybe the documentation should be updated to indicate that GPU acceleration can vary based on the amount of data, the algorithm, and the type of GPU?
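
One way to check whether a given workload is large enough to benefit is to time the same task at several input sizes, once in a plain process and once in a process where cuML acceleration is enabled, and compare the two sweeps. The harness below is a generic sketch using only the standard library. The names best_time and sweep are hypothetical, and the sorting workload is a stand-in, not a czbenchmarks task.

```python
import time
from typing import Callable, Dict, Iterable


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def sweep(
    make_task: Callable[[int], Callable[[], object]],
    sizes: Iterable[int],
) -> Dict[int, float]:
    """Time the task built by make_task(n) for each input size n."""
    return {n: best_time(make_task(n)) for n in sizes}


if __name__ == "__main__":
    # Stand-in workload (hypothetical): sort a reversed list of n integers.
    def make_sort_task(n: int) -> Callable[[], object]:
        data = list(range(n, 0, -1))
        return lambda: sorted(data)

    for n, seconds in sweep(make_sort_task, [1_000, 10_000, 100_000]).items():
        print(f"n={n:>7}: {seconds:.6f} s")
```

If the accelerated sweep only overtakes the CPU sweep above some input size, that size is a rough estimate of where the data-transfer overhead starts to pay off for that algorithm and GPU.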

valenzuelaomar · 2025-05-07T21:18:25Z

@mlgill you're spot on about there needing to be a large amount of data to see the benefits of GPU acceleration. I think updating the documentation to mention that is a good idea.

valenzuelaomar added 2 commits May 2, 2025 12:17

install and use cuML

8644db5

update readme and logging

fee0dee

valenzuelaomar changed the title ~~feat: speedup tasks by >50x with cuML~~ feat: speedup tasks runtime with cuML May 2, 2025

valenzuelaomar added 3 commits May 2, 2025 15:26

move to tasks/__init__.py

a9e924c

move to __init__.py

483dcef

fix

b35f26b

valenzuelaomar changed the title ~~feat: speedup tasks runtime with cuML~~ feat: speedup task runtime with cuML May 2, 2025

valenzuelaomar commented May 2, 2025

src/czbenchmarks/__init__.py (resolved)

atolopko-czi reviewed May 5, 2025

README.md (resolved)

valenzuelaomar added 2 commits May 5, 2025 17:06

pr feedback

878a56d

Update README.md

2c8cfec

valenzuelaomar requested a review from atolopko-czi May 6, 2025 00:08

Merge branch 'main' into add-gpu-acceleration

1204a88

atolopko-czi approved these changes May 6, 2025
