fix: handle CUDA errors by splitting proof requests into single-block chunks #313

DeVikingMark · 2025-01-06T00:27:03Z

Fixes #309

Problem

Generating proofs with CUDA on an NVIDIA H100 GPU causes a panic.

Solution

Added specific handling for CUDA errors.
When a CUDA error is detected, the task is split into minimal chunks (1 block each).
The original behavior is preserved for other types of errors.
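
For illustration only, the approach described above could be sketched roughly as follows. This is not the code from the PR: `blockRange`, `isCUDAError`, `splitIntoSingleBlocks`, `planRetry`, and `retryWholeRange` are hypothetical names. Detecting a CUDA error by matching a substring of the error message is an assumption, as is modelling the "original behavior" as a fallback callback.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// blockRange is a hypothetical inclusive span of blocks, [Start, End].
type blockRange struct {
	Start, End uint64
}

// errSimulatedCUDA stands in for whatever error the prover actually returns.
var errSimulatedCUDA = errors.New("prover failed: CUDA error: out of memory")

// isCUDAError reports whether err looks like a CUDA failure.
// Assumption: detection is a case-insensitive substring match on the message.
func isCUDAError(err error) bool {
	return err != nil && strings.Contains(strings.ToLower(err.Error()), "cuda")
}

// splitIntoSingleBlocks breaks r into one-block ranges in ascending order.
// An inverted range (End < Start) yields no chunks.
func splitIntoSingleBlocks(r blockRange) []blockRange {
	var chunks []blockRange
	if r.End < r.Start {
		return chunks
	}
	for b := r.Start; ; b++ {
		chunks = append(chunks, blockRange{Start: b, End: b})
		if b == r.End {
			break // stop here so b++ cannot wrap around at the uint64 maximum
		}
	}
	return chunks
}

// planRetry decides which ranges to request after a failed proof.
// CUDA errors yield single-block chunks; every other error is handed to
// fallback, which represents the pre-existing, unchanged behavior.
func planRetry(r blockRange, err error, fallback func(blockRange, error) []blockRange) []blockRange {
	if isCUDAError(err) {
		return splitIntoSingleBlocks(r)
	}
	return fallback(r, err)
}

// retryWholeRange is a placeholder fallback that retries the range unchanged.
func retryWholeRange(r blockRange, _ error) []blockRange {
	return []blockRange{r}
}

func main() {
	failed := blockRange{Start: 100, End: 103}
	for _, c := range planRetry(failed, errSimulatedCUDA, retryWholeRange) {
		fmt.Printf("re-request blocks %d..%d\n", c.Start, c.End)
	}
}
```

Note that a range that is already a single block comes back unchanged from `splitIntoSingleBlocks`, so splitting alone cannot help if the failure also occurs on one-block requests.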

Testing

Run the proof on a large range of blocks.
When a CUDA error occurs, the system should automatically split the task into minimal chunks.
Check the logs to ensure successful processing of the split chunks.

TimTinkers · 2025-01-06T15:44:35Z

Hi @DeVikingMark, are you sure this addresses #309? In that issue we shared an example (https://gist.github.com/JossDuff/51ef5757a685943a75eb38968edf0023) that was already attempting to generate only a very tiny span and was mocking out the proposer entirely. We opened the issue here instead of in the sp1-sdk repository because we don't know whether the problem is specific to our use of OP Stack here.

ratankaliani · 2025-01-06T16:29:41Z

@TimTinkers I'm fairly sure this is a bot account. I'm going to close the PR.

DeVikingMark · 2025-01-07T21:10:38Z

> @TimTinkers I'm fairly sure this is a bot account. I'm going to close the PR.

lmao why bot? good manners bro

Update prove.go

aeb5fa2

ratankaliani closed this Jan 6, 2025
