fix: handle CUDA errors by splitting proof requests into single-block chunks #313

Closed
DeVikingMark wants to merge 1 commit

Conversation

DeVikingMark

Fixes #309

Problem

Proof generation panics when using CUDA on an NVIDIA H100 GPU.

Solution

  • Added specific handling for CUDA errors.
  • When a CUDA error is detected, the task is split into minimal chunks (1 block each).
  • The original behavior is preserved for other types of errors.
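The splitting rule described above might look roughly like the sketch below. It is illustrative only and is not the code from this PR: `ProofError`, `split_into_single_blocks`, and `plan_retry` are hypothetical names, and the sketch assumes block ranges are half-open (`start..end`) and that "original behavior" can be represented by handing the unchanged range back to the caller.

```rust
use std::ops::Range;

/// Hypothetical error type for illustration; the project's real error
/// types are not shown in this PR.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ProofError {
    /// A failure reported by the CUDA prover.
    Cuda(String),
    /// Any other failure.
    Other(String),
}

/// Split a half-open block range into ranges that cover one block each.
fn split_into_single_blocks(range: Range<u64>) -> Vec<Range<u64>> {
    range.map(|block| block..block + 1).collect()
}

/// Decide which ranges to retry after a failed proof request.
/// CUDA errors produce single-block chunks; every other error returns the
/// range unchanged so the caller can apply its existing handling (assumed).
fn plan_retry(range: Range<u64>, err: &ProofError) -> Vec<Range<u64>> {
    match err {
        ProofError::Cuda(_) => split_into_single_blocks(range),
        ProofError::Other(_) => vec![range],
    }
}
```

Under these assumptions, a CUDA failure on the range `100..103` would be retried as `100..101`, `101..102`, and `102..103`, while any other error leaves `100..103` intact. Note that splitting cannot help if a single-block request already fails.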

Testing

  1. Run the proof on a large range of blocks.
  2. When a CUDA error occurs, the system should automatically split the task into minimal chunks.
  3. Check the logs to ensure successful processing of the split chunks.

@TimTinkers

Hi @DeVikingMark, are you sure this addresses #309? In that issue we shared an example (https://gist.github.com/JossDuff/51ef5757a685943a75eb38968edf0023) that was already attempting to generate only a very tiny span and mocked out the proposer entirely. We opened the issue here rather than in the sp1-sdk repository because we don't know whether the problem is specific to our use of OP Stack here.

@ratankaliani
Member

@TimTinkers I'm fairly sure this is a bot account. I'm going to close the PR.

@DeVikingMark
Author

> @TimTinkers I'm fairly sure this is a bot account. I'm going to close the PR.

lmao why bot? good manners bro

Development

Successfully merging this pull request may close these issues.

local cuda proving "unknown error"
3 participants