random_walk_cuda is causing an illegal memory access #176

Open
ProfDoof opened this issue May 12, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@ProfDoof

ProfDoof commented May 12, 2023

Hi,

When I run the following code on the graph defined below, I get an illegal memory access error. I am not sure why, and I do not understand the algorithm or the C++ code well enough to track it down. The error does not occur when I set device to 'cpu'.

I'm using the nightly build of pyg installed through a locally built conda package, and version 1.6.1 of PyTorch-cluster.

from torch_geometric.data import Data
from torch_geometric.utils import to_networkx
from networkx.drawing.nx_agraph import write_dot
import torch
new_node_ids = [x for x in range(7)]
sources = [
    0, 1, 2, 2, 4, 5,
]

targets = [
    1, 2, 3, 4, 5, 2,
]

data = Data(torch.tensor(new_node_ids), torch.tensor([sources, targets]))
data.num_nodes = 7

write_dot(to_networkx(data), 'test_test.dot')

device = 'cuda'
rowptr, col, perm = data.to(device).csr()
rowptr, col = rowptr[None], col[None]

print(rowptr, col)
start_indices = torch.arange(0, data.num_nodes, dtype=torch.long).flatten().to(device)

print(torch.ops.torch_cluster.random_walk(rowptr, col, start_indices,
                                 10, 2, 4))

EDIT:

Here's the error I get

tensor([0, 1, 2, 4, 4, 5, 6, 6], device='cuda:0') tensor([1, 2, 3, 4, 5, 2], device='cuda:0')
Traceback (most recent call last):
  File "/home/john/Research/EmbeddingGraphs/cfg2vec/gnn/test.py", line 25, in <module>
    print(torch.ops.torch_cluster.random_walk(rowptr, col, start_indices,
  File "/home/john/mambaforge/envs/gnn/lib/python3.9/site-packages/torch/_ops.py", line 503, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
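
CUDA kernels launch asynchronously, so the Python line named in a traceback like the one above is not always the call that actually faulted. A minimal sketch for narrowing this down, assuming the environment variable is set before any CUDA work happens (placing it before `import torch` is the safe choice):

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an illegal
# memory access is reported at the call that caused it rather than at a
# later, unrelated CUDA call. It only affects debugging, not the bug itself.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and run the reproduction script after this point
```

This does not fix anything; it only makes the reported location more trustworthy while debugging.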
@rusty1s
Owner

rusty1s commented May 12, 2023

This seems to be currently failing because node 6 is an isolated node, so data.num_nodes = 6 should fix this.
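
To see which nodes have no outgoing edges, the `rowptr` printed in the report can be rebuilt and inspected in plain Python. This is only an illustrative sketch: `build_rowptr` and `nodes_without_out_edges` are hypothetical helper names, not part of torch_cluster or PyG.

```python
def build_rowptr(sources, num_nodes):
    """Return CSR row pointers for an edge list, counting edges per source node."""
    counts = [0] * num_nodes
    for s in sources:
        counts[s] += 1
    rowptr = [0]
    for c in counts:
        rowptr.append(rowptr[-1] + c)
    return rowptr


def nodes_without_out_edges(rowptr):
    """Nodes whose CSR row is empty, i.e. rowptr[i] == rowptr[i + 1]."""
    return [i for i in range(len(rowptr) - 1) if rowptr[i] == rowptr[i + 1]]


sources = [0, 1, 2, 2, 4, 5]  # edge sources from the report above
rowptr = build_rowptr(sources, num_nodes=7)
dead_ends = nodes_without_out_edges(rowptr)
print(rowptr)     # [0, 1, 2, 4, 4, 5, 6, 6]
print(dead_ends)  # [3, 6]
```

Note that node 3 also has an empty row (it has an incoming edge but no outgoing one), in addition to the fully isolated node 6. Which of these triggers the crash is not established in this thread.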

@ProfDoof
Author

This is a minimal example; the actual graph is more complicated, and I can't remove the isolated nodes. Also, it doesn't fail for any other values of p or q, and it only happens in the CUDA version, not the CPU version. That said, I'm not sure what exactly is going on.
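
When the nodes without outgoing edges have to stay in the graph, one possible stopgap is to give each of them a self-loop before building the CSR representation, so a walk that reaches such a node simply stays there. This is an untested assumption, not a fix confirmed by the maintainers, and `add_self_loops_for_dead_ends` is a hypothetical helper name:

```python
def add_self_loops_for_dead_ends(sources, targets, num_nodes):
    """Append a self-loop (v, v) for every node v that has no outgoing edge."""
    has_out_edge = [False] * num_nodes
    for s in sources:
        has_out_edge[s] = True

    new_sources, new_targets = list(sources), list(targets)
    for node in range(num_nodes):
        if not has_out_edge[node]:
            new_sources.append(node)
            new_targets.append(node)
    return new_sources, new_targets


sources = [0, 1, 2, 2, 4, 5]
targets = [1, 2, 3, 4, 5, 2]
patched_sources, patched_targets = add_self_loops_for_dead_ends(
    sources, targets, num_nodes=7
)
print(patched_sources)  # [0, 1, 2, 2, 4, 5, 3, 6]
print(patched_targets)  # [1, 2, 3, 4, 5, 2, 3, 6]
```

The patched lists can be passed to `torch.tensor([sources, targets])` exactly as in the original script. The trade-off is that the graph now contains edges that were not in the data, which may matter for downstream use of the walks.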

@ProfDoof
Author

@rusty1s just wanted to check if you had the chance to see this yet this evening.

@rusty1s
Owner

rusty1s commented May 13, 2023

Will take a look soon.

@jsun57

jsun57 commented Jul 23, 2023

Wondering if there are any updates on this issue.

@rusty1s
Owner

rusty1s commented Jul 24, 2023

Not yet, sorry for the delay.


This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

github-actions bot added the stale label on Jan 21, 2024
rusty1s added the bug label and removed the stale label on Jan 21, 2024