
Dask-Cuda running out of memory with Cupy Sobel #1250

Open · a1234jehad opened this issue Sep 28, 2023 · 8 comments
@a1234jehad

Greetings, I have the following problem. I know that Dask-CUDA is still experimental, and I tried looking at the docs to solve the issue, but nothing seems to work.

https://stackoverflow.com/questions/77186393/dask-cuda-running-out-of-memory-with-cupy-sobel

Thanks in advance.

@wence- (Contributor) commented Sep 28, 2023

from_array in your example has to build the full 40GiB array on the client side, that is, as a single allocation, and then move it to the workers. Since your GPU has 24GiB of RAM, this fails. Where do your input data come from? Best practice is to generate the chunked array directly on the workers.

@a1234jehad (Author) commented Sep 28, 2023

> from_array in your example has to build the full 40GiB array on the client side, that is, as a single allocation, and then move it to the workers. Since your GPU has 24GiB of RAM, this fails. Where do your input data come from? Best practice is to generate the chunked array directly on the workers.

The data comes from a file as a NumPy array, which I then convert to a CuPy array with cp.asarray(file). The program works fine with smaller datasets, and with larger-than-memory data on the CPU, but not with larger-than-memory data on the GPU. Regarding "Best practice is to generate the chunked array directly on the workers": can you give an example of how to do that?

@wence- (Contributor) commented Sep 28, 2023

"Best practice is to generate the chunked array directly on the workers" can you give an example of how to do that?

Use one of the creation mechanisms that don't materialise the data on the client. For example, if the data are in an HDF5 file, open the HDF5 file and use from_array on the dataset. Or, if they are in a format zarr understands, use from_zarr.

Can you post a minimal failing example here so we can see what a likely route might be?
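
For instance, a rough sketch of the HDF5 route (the file name, dataset key, and chunk shape here are placeholders, not from your setup):

import h5py
import dask.array as da

# Open the HDF5 file lazily; Dask reads each chunk on demand instead of
# materialising the whole array on the client.
f = h5py.File("cube.h5", mode="r")
x = da.from_array(f["/data"], chunks=(256, 256, 256))

# Or, if the data are stored in zarr format:
# x = da.from_zarr("cube.zarr")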

@a1234jehad (Author) commented Sep 28, 2023

"Best practice is to generate the chunked array directly on the workers" can you give an example of how to do that?

Use one of the creation mechanisms that don't materialise the data on the client. e.g. if the data are in an HDF5 file, open the HDF5 file and use from_array on that. Or if they are in a format zarr understands, use from_zarr.

Can you post a minimal failing example here so we can see what a likely route might be.

Sure, here is a minimal example:


import segyio
import dask.array as da
import dask.distributed
import numpy as np
import cupy as cp
from cupyx.scipy.ndimage import sobel
from dask_cuda import LocalCUDACluster

def edge_detection_sobel_gpu(data):
    sobel_x = sobel(data, axis=0)
    sobel_y = sobel(data, axis=1)
    sobel_z = sobel(data, axis=2)
    edge_magnitude = cp.sqrt(sobel_x**2 + sobel_y**2 + sobel_z**2)
    return edge_magnitude

def load_segy_data(seis_file):
    with segyio.open(seis_file) as f:
        data = cp.asarray(segyio.tools.cube(f))  # materialises the whole cube in GPU memory
        return da.from_array(data, chunks=(256, 256, 256))  # create a Dask array from the CuPy array

# Cluster setup
cluster = LocalCUDACluster(device_memory_limit=0.7)
client = dask.distributed.Client(cluster)
client.run(cp.cuda.set_allocator)  # runs cp.cuda.set_allocator() on every worker

seis_file = "example_file.sgy"
seis_data_40gb = load_segy_data(seis_file)  # data shape is around (1719, 2527, 1276)

with dask.config.set({"array.backend": "cupy"}):
    # scatter added to work around "ValueError: bytes object is too large"
    array_on_workers = client.scatter(seis_data_40gb)

    edge_magnitude = da.map_blocks(edge_detection_sobel_gpu,
                                   array_on_workers, dtype=cp.float32)

    edge_magnitude = edge_magnitude.persist()
    dask.distributed.wait(edge_magnitude)
    final_data = edge_magnitude.compute()

@wence- (Contributor) commented Sep 28, 2023

OK, in this case, I recommend using segysak to load the SEG-Y files and then interface with dask. They have an example doing just this in their documentation, which should also work with LocalCUDACluster and your approach of setting the array backend to cupy.
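
Roughly along these lines (an untested sketch from memory; check the segysak docs for the exact entry points and the seisnc dimension names):

import xarray as xr
from segysak.segy import segy_converter  # SEG-Y -> seisnc (NetCDF) conversion on disk

# One-off conversion; nothing is materialised on the client.
segy_converter("example_file.sgy", "example_file.seisnc")

# Open lazily with dask-backed chunks; each chunk is read on demand,
# so it can flow through your map_blocks pipeline chunk by chunk.
ds = xr.open_dataset("example_file.seisnc", chunks={"iline": 256, "xline": 256, "twt": 256})
seis = ds["data"].data  # the underlying dask array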

@a1234jehad (Author) commented Oct 2, 2023

Greetings, sadly we are not allowed to use SEGYSAK in our org... I also tried the data in a .npy file, but I still have the same issue. I suspect that
array_on_workers = client.scatter(seis_data)
is causing some issues. However, if I remove it I get "ValueError: bytes object is too large".

Here is the updated code:

seis_np_file = "binary_segy.npy"
seis_data_40gb = da.from_array(
    np.memmap(seis_np_file, dtype=np.float32, shape=(1719, 2527, 1276)),
    chunks=(256, 256, 256),
    asarray=cp.asarray,
)

The full code now:

import dask.array as da
import dask.distributed
import numpy as np
import cupy as cp
from cupyx.scipy.ndimage import sobel
from dask_cuda import LocalCUDACluster

def edge_detection_sobel_gpu(data):
    sobel_x = sobel(data, axis=0)
    sobel_y = sobel(data, axis=1)
    sobel_z = sobel(data, axis=2)
    edge_magnitude = cp.sqrt(sobel_x**2 + sobel_y**2 + sobel_z**2)
    return edge_magnitude

# Cluster setup
cluster = LocalCUDACluster(device_memory_limit=0.7)
client = dask.distributed.Client(cluster)
client.run(cp.cuda.set_allocator)  # runs cp.cuda.set_allocator() on every worker

seis_np_file = "binary_segy.npy"
seis_data_40gb = da.from_array(
    np.memmap(seis_np_file, dtype=np.float32, shape=(1719, 2527, 1276)),
    chunks=(256, 256, 256),
    asarray=cp.asarray,
)

with dask.config.set({"array.backend": "cupy"}):
    # scatter added to work around "ValueError: bytes object is too large"
    array_on_workers = client.scatter(seis_data_40gb)

    edge_magnitude = da.map_blocks(edge_detection_sobel_gpu,
                                   array_on_workers, dtype=cp.float32)

    final_data = edge_magnitude.compute()

@wence- (Contributor) commented Oct 2, 2023

I am still a little confused: da.from_array delivers an array on the workers, no? Why are you then doing client.scatter with it?
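
For reference, the shape I would expect (a sketch, untested on your data, reusing the definitions from your snippet; it keeps chunks as NumPy in the graph and moves them to the GPU inside the tasks):

x = da.from_array(
    np.memmap(seis_np_file, dtype=np.float32, shape=(1719, 2527, 1276)),
    chunks=(256, 256, 256),
)
x_gpu = x.map_blocks(cp.asarray)  # host-to-device transfer happens per chunk on the workers
edge_magnitude = x_gpu.map_blocks(edge_detection_sobel_gpu, dtype=cp.float32)
final_data = edge_magnitude.compute()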

@a1234jehad (Author) commented Oct 2, 2023

Sorry for the confusion, I thought so too... but, as I mentioned earlier, I kept getting the following error:

  File "msgpack/_packer.pyx", line 286, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 292, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 289, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 258, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 225, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 196, in msgpack._cmsgpack.Packer._pack
ValueError: bytes object is too large

I found a workaround online using client.scatter, but then I hit the other issue, running out of memory. The previous code was:

with dask.config.set({"array.backend": "cupy"}):
    edge_magnitude = seis_data_40gb.map_blocks(edge_detection_sobel_gpu, dtype=cp.float32)
    final_data = edge_magnitude.compute()
