
fps, Trying to create tensor with negative dimension #125

Open
etaoxing opened this issue Mar 25, 2022 · 6 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

etaoxing commented Mar 25, 2022
I'm training a PointNet++ model based on https://github.com/pyg-team/pytorch_geometric/blob/master/examples/pointnet2_classification.py, and I'm getting an error similar to the one in this issue:

File "(..)/train_pointnet2.py", line 33, in forward
  idx = fps(pos, batch, ratio=self.ratio)
File "(..)/python3.7/site-packages/torch_cluster/fps.py", line 70, in fps
  return torch.ops.torch_cluster.fps(src, ptr, r, random_start)
RuntimeError: Trying to create tensor with negative dimension -2334829709134659584: [-2334829709134659584]

As a temporary fix, I've wrapped the fps(...) call in a try/except retry loop.
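For context, the farthest point sampling that fps performs can be sketched in pure Python. This is a hypothetical minimal version for illustration only (the names farthest_point_sample and start_idx are my own), not torch_cluster's batched CUDA kernel:

```python
import math

def farthest_point_sample(points, ratio, start_idx=0):
    """Greedy farthest point sampling over a list of coordinate tuples.

    Keeps ceil(ratio * n) point indices, always seeding from start_idx --
    the deterministic analogue of fps(..., random_start=False).
    """
    n = len(points)
    k = math.ceil(ratio * n)
    selected = [start_idx]
    # Distance from every point to its nearest selected point so far.
    dist = [math.dist(p, points[start_idx]) for p in points]
    while len(selected) < k:
        # Pick the point farthest from the current sample set...
        far = max(range(n), key=lambda i: dist[i])
        selected.append(far)
        # ...then tighten each nearest-selected distance.
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[far]))
    return selected

pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (1.0, 1.0)]
print(farthest_point_sample(pts, 0.5))  # spreads the sample across the cloud
```

With random_start=True, the only difference in principle is that start_idx would be drawn at random per batch element, which is why a bad random start index could plausibly corrupt the output size.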

Environment

  • OS: Docker, Ubuntu 18.04
  • Python version: 3.7.12
  • PyTorch version: 1.9.1+cu111
  • pytorch_geometric version: 2.0.4
  • pytorch_cluster version: 1.5.9

rusty1s commented Mar 25, 2022

Thanks for reporting. How does the try/except block resolve this issue? Do you make use of a custom dataset?

@rusty1s rusty1s added the help wanted Extra attention is needed label Mar 25, 2022
@rusty1s rusty1s self-assigned this Mar 25, 2022

etaoxing commented Mar 25, 2022

How does the try/except block resolve this issue?

I added:

        import pickle

        n_tries = 0
        while n_tries < 3:
            try:
                idx = fps(pos, batch, ratio=self.ratio)
                break
            except Exception as e:
                # Dump the offending inputs for later inspection.
                with open(f"fps_err_{n_tries}.pkl", "wb") as f:
                    pickle.dump((x, pos, batch), f)
                print(e)
            n_tries += 1

Tensors are on the GPU. My guess is that fps(..., random_start=True) is causing this issue.

Do you make use of a custom dataset?

Yes, I am training reinforcement learning algos, so not using data/ModelNet10. This error occurs sporadically when training. I'm able to run pytorch3d.ops.sample_farthest_points(...) elsewhere in the code on the same point cloud data, so I don't think there's an issue with the point cloud. Attaching fps_err_0.pkl.zip.


rusty1s commented Mar 26, 2022

The following code works fine for me, super weird :(

import pickle

from torch_cluster import fps

with open('fps_err_0.pkl', 'rb') as f:
    data = pickle.load(f)
x, pos, batch = data
print(pos.shape, batch.shape)

for random_start in [True, False]:
    for ratio in [0.2, 0.5, 0.8]:
        for _ in range(10000):
            out = fps(pos, batch, ratio, random_start)
            print(out.shape, out.device)

etaoxing commented

Yeah, it's a strange issue. Btw pytorch3d.ops.sample_farthest_points(random_start_point=False) is their default setting. I'll try running with fps(random_start=False) when I get the chance, and see if the error occurs.
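That fallback could be sketched as a small wrapper. This is a hypothetical helper of my own (fps_fn is injected to keep the sketch library-agnostic; with torch_cluster you would pass torch_cluster.fps), not part of any library:

```python
def fps_with_fallback(fps_fn, pos, batch, ratio, retries=3):
    """Try fps with random_start=True a few times; if it keeps raising,
    fall back to the deterministic start (random_start=False)."""
    for _ in range(retries):
        try:
            return fps_fn(pos, batch, ratio, True)
        except RuntimeError:
            # e.g. "Trying to create tensor with negative dimension ..."
            continue
    # Deterministic start as a last resort; let any error propagate.
    return fps_fn(pos, batch, ratio, False)
```

This keeps the randomized behaviour on the happy path while avoiding a hard crash when the sporadic failure strikes.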

etaoxing commented

Update: I did a few runs with fps(random_start=False) and did not encounter the exception.

Interestingly, a colleague started a job on the same GPU that I was using, and CUDA out-of-memory errors popped up alongside this negative-dimension bug. The retry loop still caught the exception, so the run did not crash. The 3090 in use also still had a few GB of memory free.

For now, though, the workaround seems fine, so feel free to close.

@rusty1s rusty1s added the bug Something isn't working label Mar 28, 2022

rusty1s commented Mar 28, 2022

Thanks for sharing. I will do some further digging on my end as well.
