ROCm Sparse Marlin Kernels #1206

Draft
wants to merge 20 commits into
base: main

Collaborator

@petrex petrex commented Oct 31, 2024

[WIP] Built on top of #1201. This pull request adds ROCm (Radeon Open Compute) support to the sparse Marlin kernel alongside CUDA, enabling the code to run on AMD GPUs.

The main changes involve conditional compilation to handle differences between CUDA and ROCm, as well as adding ROCm-specific intrinsics for MI300x.

Co-author: @lcskrishna


Key changes include:

ROCm Support in setup.py:

  • HIP kernel generation

Conditional Compilation in CUDA Source Files:

  • Added conditional compilation directives to exclude certain code for ROCm and include ROCm-specific implementations.

ROCm-specific Implementations:

  • Implemented ROCm-specific versions of functions and macros that are different from their CUDA counterparts, ensuring compatibility and performance on AMD GPUs.
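
As a rough illustration of the conditional-compilation pattern described in the list above, here is a minimal host-compilable sketch. It is not taken from this PR: it assumes the build defines a `USE_ROCM` macro when targeting AMD GPUs, and `kWarpSize`, `ceil_div`, and `warps_per_block` are hypothetical names chosen for the example. The only hardware fact it relies on is that MI300X runs 64-wide wavefronts while NVIDIA GPUs run 32-wide warps.

```cpp
// Assumption: the build system defines USE_ROCM when compiling for AMD GPUs.
#if defined(USE_ROCM)
// MI300X executes 64-wide wavefronts.
constexpr int kWarpSize = 64;
#else
// NVIDIA GPUs execute 32-wide warps.
constexpr int kWarpSize = 32;
#endif

// Hypothetical helper shared by both backends: integer ceiling division.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Number of warps (or wavefronts) needed to cover `threads` threads.
constexpr int warps_per_block(int threads) {
  return ceil_div(threads, kWarpSize);
}

// Compile-time sanity check that holds on either backend.
static_assert(warps_per_block(256) * kWarpSize >= 256,
              "a block must be fully covered by its warps");
```

Only the constant is guarded; the helpers that depend on it are written once and compile unchanged for either backend.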

Next:

  • Validation and benchmarking across workloads on MIxxx GPUs


pytorch-bot bot commented Oct 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1206

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures

As of commit 00bc94d with merge base ce4822b (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 31, 2024
@msaroufim msaroufim requested review from msaroufim and removed request for msaroufim November 2, 2024 22:51
@@ -46,9 +46,11 @@ def read_version(file_path="version.txt"):
CUDAExtension,
Member

I think you might enjoy stack based PR development https://github.com/modularml/stack-pr

@@ -19,6 +19,28 @@
#include "base.h"

namespace torchao {

Member

@xw285cornell looking for some quick advice: do you recommend we support AMD by adding conditional compilation flags to our existing CUDA kernels, or would you be OK with some more copy-paste?

Member

Chatted offline and indeed ifdefs are the way to go
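
To make the trade-off concrete, the ifdef approach can keep a single shared body and confine the backend difference to a small guarded shim, rather than duplicating the file. The sketch below is illustrative only and not code from this PR: it assumes a `USE_ROCM` build macro and assumes lane masks are 64-bit on AMD wavefronts and 32-bit on NVIDIA warps; `lane_mask_t`, `full_lane_mask`, `lane_count`, and `lane_active` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Backend-specific shim: the only part that needs a guard.
// Assumption: USE_ROCM is defined by the build for AMD targets.
#if defined(USE_ROCM)
using lane_mask_t = std::uint64_t;  // one bit per lane of a 64-wide wavefront
#else
using lane_mask_t = std::uint32_t;  // one bit per lane of a 32-wide warp
#endif

// Shared code below this line is written once for both backends.

// Mask with every lane marked active.
constexpr lane_mask_t full_lane_mask() {
  return std::numeric_limits<lane_mask_t>::max();
}

// Number of lanes a mask can describe.
constexpr int lane_count() {
  return static_cast<int>(sizeof(lane_mask_t) * 8);
}

// True if bit `lane` is set in `mask`.
constexpr bool lane_active(lane_mask_t mask, int lane) {
  return ((mask >> lane) & lane_mask_t{1}) != 0;
}
```

With copy-paste, each of the shared functions would exist twice and could drift apart; here a fix to the shared body applies to both backends at once.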

@msaroufim
Member

Do you have performance numbers by any chance relative to fp16? wanna make sure the performance improvements are competitive with CUDA

@petrex
Collaborator Author

petrex commented Nov 5, 2024

> Do you have performance numbers by any chance relative to fp16? wanna make sure the performance improvements are competitive with CUDA

still WIP, but would you share the benchmark you guys are using? will try that on mi300x when the PR is ready.

@msaroufim
Member

Ok holler at me again whenever you need a review. Really excited to see this land

@drisspg
Contributor

drisspg commented Nov 5, 2024

Benchmarking is a little ad hoc right now; the best place to verify this today would be: https://github.com/pytorch/ao/blob/main/torchao/_models/llama/generate.py

@jcaip jcaip mentioned this pull request Nov 11, 2024
1 task
Labels: CLA Signed, module: rocm
5 participants