
[CIR][CUDA] Generate device stubs #1332

Merged · 1 commit merged into llvm:main on Feb 12, 2025
Conversation

AdUhTkJm (Contributor):

Now we're able to generate device stubs.

A simple explanation:

We first store the function arguments in a void *args[] array, which is later passed to cudaLaunchKernel.

Then we retrieve the launch configuration with __cudaPopCallConfiguration, popping the config pushed at the call site. (We can't generate calls to kernels yet.)

Now we have all the arguments we need: invoke cudaLaunchKernel and we're done.
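A minimal sketch of this call sequence in plain CUDA C++, assuming a hypothetical kernel global_fn(int, int, int) (the PR emits the equivalent CIR, not this source; the __cudaPopCallConfiguration signature and the use of the stub's own address as the kernel handle follow Clang's OG CodeGen and are assumptions here):

```cpp
#include <cuda_runtime.h>

// Runtime hook used by Clang-style CUDA codegen (assumed signature).
extern "C" unsigned __cudaPopCallConfiguration(dim3 *gridDim, dim3 *blockDim,
                                               size_t *sharedMem, void *stream);

// Roughly what the generated device stub for global_fn does:
void __device_stub__global_fn(int a, int b, int c) {
  // 1. Pack the kernel arguments into a void *args[] for cudaLaunchKernel.
  void *args[] = {&a, &b, &c};

  // 2. Pop the launch configuration pushed at the call site by
  //    __cudaPushCallConfiguration (i.e. the <<<...>>> arguments).
  dim3 gridDim, blockDim;
  size_t sharedMem;
  cudaStream_t stream;
  __cudaPopCallConfiguration(&gridDim, &blockDim, &sharedMem, &stream);

  // 3. Launch via the runtime; the first argument is the handle registered
  //    for the kernel (here assumed to be the stub's own address).
  cudaLaunchKernel((void *)__device_stub__global_fn, gridDim, blockDim, args,
                   sharedMem, stream);
}
```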

Resolved review threads (outdated) on:
clang/lib/CIR/CodeGen/CIRGenFunction.h
clang/lib/CIR/CodeGen/CIRGenFunction.cpp (two threads)
Comment on a diff excerpt:

```cpp
// Now emit the call to cudaLaunchKernel
// cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim, dim3 blockDim,
//                              void **args, size_t sharedMem,
//                              cudaStream_t stream);
```
Member:

Seems like we could have a ... = cir.cuda.setup_device_stub <name>, args, dim3_ty = <some_type>, stream_ty = <some_type2> ... that would hide both the __cudaPopCallConfiguration and cudaLaunchKernel calls. This would then be expanded into those calls in LoweringPrepare (so we don't have to postpone it to LLVM lowering).

However, I'd rather see you land this as-is first (after the other comment about OG is addressed); in a follow-up PR we can raise the representation and move it to LoweringPrepare.

Contributor Author:

Similar things also happen at the call site. Shall we also generate a cir.cuda.call_kernel for that and expand it in LoweringPrepare?

Member:

Which call site do you mean? Not the call happening in the device stub? If not in the device stub, doesn't it call __cudaPopCallConfiguration to retrieve the dims? Perhaps we should do a bit more direct CIRGen first, to get a better grasp of how these internal functions are used before we raise them.

Contributor Author:

I mean the place where we invoke the kernel: for example, in main we can write global_fn<<<1, 1>>>(a, b, c). This is where we call __cudaPushCallConfiguration for the device stub to pop (sketched below). I guess I'll generate these calls directly first and adjust according to review.
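Continuing the sketch above, a rough approximation of what the triple-chevron call site lowers to (the __cudaPushCallConfiguration signature and the comparison against 0 follow Clang's usual <<<...>>> lowering and are assumptions here):

```cpp
// Runtime hook that pushes the <<<...>>> configuration (assumed signature).
extern "C" unsigned __cudaPushCallConfiguration(dim3 gridDim, dim3 blockDim,
                                                size_t sharedMem = 0,
                                                void *stream = nullptr);

void host_caller(int a, int b, int c) {
  // global_fn<<<1, 1>>>(a, b, c) becomes roughly:
  if (__cudaPushCallConfiguration(dim3(1), dim3(1)) == 0)
    __device_stub__global_fn(a, b, c); // pops the config and launches
}
```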

Resolved review thread on clang/test/CIR/CodeGen/CUDA/simple.cu

github-actions bot commented Feb 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

bcardosolopes merged commit e342308 into llvm:main on Feb 12, 2025
6 checks passed