
dealing with data too large for a single buffer #6138

Open · wants to merge 15 commits into trunk
Conversation


@alphastrata alphastrata commented Aug 20, 2024

Connections
Discussion thread on Matrix.

Description
The aim of this new example is to demonstrate taking a large input dataset, splitting it into chunks so it can be moved onto the GPU, and then treating those chunks as a single contiguous data structure once they are on the GPU.
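As a rough sketch of that idea (not the PR's actual code): assuming a `device: &wgpu::Device`, the `bytemuck` crate, and a hypothetical `MAX_BUFFER_SIZE` chunk size, the upload side could look something like this, with the shader then indexing across the consecutive bindings as if they were one array:

```rust
use wgpu::util::DeviceExt; // brings `create_buffer_init` into scope

// Hypothetical chunk size; the real example keeps a similar `const` in
// `src/big_compute_buffers/mod.rs`.
const MAX_BUFFER_SIZE: u64 = 1 << 27; // 134_217_728 bytes

/// Split a large `f32` dataset into several storage buffers, each no larger
/// than `MAX_BUFFER_SIZE`, so that no single allocation exceeds the limit.
fn upload_in_chunks(device: &wgpu::Device, data: &[f32]) -> Vec<wgpu::Buffer> {
    let elems_per_chunk = MAX_BUFFER_SIZE as usize / std::mem::size_of::<f32>();
    data.chunks(elems_per_chunk)
        .map(|chunk| {
            device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
                label: Some("input chunk"),
                contents: bytemuck::cast_slice(chunk),
                usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_SRC,
            })
        })
        .collect()
}
```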

Testing
Explain how this change is tested.

Checklist

  • Run cargo fmt.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
    • --target wasm32-unknown-emscripten
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@alphastrata alphastrata marked this pull request as ready for review August 20, 2024 22:28
@alphastrata alphastrata requested a review from a team as a code owner August 20, 2024 22:28
@alphastrata alphastrata changed the title from "DRAFT: dealing with data too large for a single buffer" to "dealing with data too large for a single buffer" on Aug 22, 2024
@cwfitzgerald cwfitzgerald (Member) left a comment

Sorry for the long wait time for a review!

Frankly, as it exists right now, we cannot accept this example. While it physically shows one strategy for dealing with large data sets, after reading it the user doesn't get a good idea of why that strategy should be used and what problems they are avoiding, compared to the more naive strategy of using larger and larger buffers. Through inline code comments and verbiage in the readme, a reader who has no idea about any of these topics (or even the details of memory allocation) should be able to understand why this is an effective strategy to utilize.

Some things I think it should touch on:

  • Large buffers may fail to allocate due to fragmentation
  • Growing/shrinking a dataset held in a single buffer requires copying the entire buffer contents, whereas paginated data just requires rebuilding a bind group (see the sketch just below).

I'm not going to close this, as I do think this can be transformed into something that would be great to have.

Added a few incidental comments.
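
To make the second bullet concrete, here is a minimal sketch of what "just rebuild the bind group" could look like for chunked data, assuming the chunk buffers already exist and the bind group layout declares one storage-buffer binding per chunk (the function and binding scheme are illustrative, not the PR's actual code):

```rust
/// Growing the dataset with chunked ("paginated") buffers: allocate only the
/// new chunk, then rebuild the bind group over all chunks. Nothing already on
/// the GPU gets copied. With one giant buffer you would instead have to
/// allocate a bigger buffer and copy the old contents into it.
fn rebuild_bind_group(
    device: &wgpu::Device,
    layout: &wgpu::BindGroupLayout,
    chunks: &[wgpu::Buffer],
) -> wgpu::BindGroup {
    let entries: Vec<wgpu::BindGroupEntry> = chunks
        .iter()
        .enumerate()
        .map(|(i, buffer)| wgpu::BindGroupEntry {
            binding: i as u32,
            resource: buffer.as_entire_binding(),
        })
        .collect();

    device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: Some("chunked data bind group"),
        layout,
        entries: &entries,
    })
}
```

Note that the layout (and the shader's binding declarations) still has to accommodate the number of chunks, for example by being built for a fixed maximum chunk count.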

@@ -0,0 +1 @@


Empty file? This example definitely needs tests

Comment on lines +14 to +18
As the maximum supported buffer size varies wildly per system, this will likely fail when you first run it; in that case, read the error and update these `const`s accordingly:
>`src/big_compute_buffers/mod.rs`
```rust
const MAX_BUFFER_SIZE: u64 = 1 << 27; // 134_217_728 bytes (~134 MB)
const MAX_DISPATCH_SIZE: u32 = (1 << 16) - 1; // 65_535
```

These defaults should work everywhere; they're the minimums required by WebGPU.
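
One way to avoid hand-editing those `const`s would be to read the limits off the device at runtime; a minimal sketch, assuming a `device: &wgpu::Device` is already in hand (the function name is illustrative):

```rust
/// Derive the chunk size and dispatch bound from the device instead of
/// hard-coded constants. `wgpu::Limits::downlevel_defaults()` gives a
/// conservative baseline if fixed values are preferred.
fn chunking_limits(device: &wgpu::Device) -> (u64, u32) {
    let limits = device.limits();
    // Largest storage-buffer binding the device accepts; the WebGPU default
    // is 134_217_728 bytes, matching the example's `1 << 27`.
    let max_chunk_bytes = limits.max_storage_buffer_binding_size as u64;
    // Upper bound on workgroups per dispatch dimension; the default is 65_535.
    let max_dispatch = limits.max_compute_workgroups_per_dimension;
    (max_chunk_bytes, max_dispatch)
}
```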

@alphastrata (Author)

Cheers, I'll keep working on it.
