MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /usr/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory
@rjzamora Any idea what could be happening here? I know you've been putting some work into Categorify. I think this is happening during the computation of all uniques, which we may want to allow as an input to the op, since it's a relatively straightforward piece of information to pull from a data lake.
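As a rough illustration of what "uniques as an input" could look like from the user side, here is a minimal sketch that feeds pre-computed vocabularies into Categorify. It assumes the installed NVTabular version exposes a `vocabs`-style argument; the column names and parquet paths are placeholders.

```python
# Minimal sketch, assuming a `vocabs`-style argument on Categorify.
# Column names and parquet paths are placeholders for uniques that were
# pulled directly from the data lake ahead of time.
import cudf
import nvtabular as nvt

# Pre-computed uniques for each categorical column (hypothetical files).
vocabs = {
    "user_id": cudf.read_parquet("uniques_user_id.parquet")["user_id"],
    "item_id": cudf.read_parquet("uniques_item_id.parquet")["item_id"],
}

# With vocabularies supplied, fit would not need to re-compute all uniques.
cat_features = ["user_id", "item_id"] >> nvt.ops.Categorify(vocabs=vocabs)
workflow = nvt.Workflow(cat_features)
```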
I suppose there are many possibilities, depending on whether the failure happens in the fit or the transform. For example, #1692 explains two reasons why the fit could be a problem with the current implementation: the lack of a "proper" tree reduction, and the requirement to write all uniques for a given column to disk at once.
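For context, this is a minimal sketch (not NVTabular's actual implementation) of a pairwise/tree reduction over per-partition uniques. The point is that no single step has to hold the concatenated uniques of every partition at once, which is one of the fit-time memory issues described above. Function names below are illustrative.

```python
# Illustrative tree reduction of per-partition uniques with dask.delayed.
import dask
import dask.dataframe as dd
import pandas as pd


@dask.delayed
def merge_uniques(a: pd.Series, b: pd.Series) -> pd.Series:
    # Each merge only ever holds two intermediate unique sets in memory.
    return pd.concat([a, b]).drop_duplicates()


def tree_uniques(ddf: dd.DataFrame, col: str):
    # One delayed "compute uniques" task per partition.
    tasks = [
        dask.delayed(lambda df: df[col].drop_duplicates())(part)
        for part in ddf.to_delayed()
    ]
    # Pairwise (tree) reduction until a single task remains, instead of
    # one big concat over all partitions.
    while len(tasks) > 1:
        tasks = [
            merge_uniques(tasks[i], tasks[i + 1]) if i + 1 < len(tasks) else tasks[i]
            for i in range(0, len(tasks), 2)
        ]
    return tasks[0]  # delayed Series of all uniques for `col`


# Example usage:
# ddf = dd.read_parquet("data/*.parquet")
# uniques = tree_uniques(ddf, "user_id").compute()
```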
@bschifferer - I'd like to explore whether #1692 (or some variation of it) can help with this. Can you share details about the system you are running on and a representative/toy dataset where you are seeing issues? (Feel free to contact me offline about the dataset.)
Describe the bug
I tried multiple workflows and ran into different issues when running NVTabular workflows on large datasets on a multi-GPU setup (a sketch of a representative setup is included after the error summaries below).
Error 1: Workers just die one after another
Characteristic:
Error 2: Runs into OOM
Workflow:
Characteristics:
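For reference, this is a hedged sketch of the kind of multi-GPU setup being described: a LocalCUDACluster with a device-memory limit so workers spill to host before running out of GPU memory, and a Categorify workflow fit on a partitioned parquet dataset. The paths, worker count, memory limit, and part size are placeholders, and passing `client` to `nvt.Workflow` assumes an NVTabular version that accepts it.

```python
# Representative multi-GPU NVTabular workflow sketch (values are placeholders).
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import nvtabular as nvt

cluster = LocalCUDACluster(
    n_workers=4,                    # one worker per GPU
    device_memory_limit="24GB",     # spill to host memory before OOM
)
client = Client(cluster)

dataset = nvt.Dataset("data/*.parquet", engine="parquet", part_size="128MB")

cat_features = ["user_id", "item_id"] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cat_features, client=client)

workflow.fit(dataset)
workflow.transform(dataset).to_parquet("processed/")
```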