Unable to resize error #39

Open
leedh0124 opened this issue May 10, 2021 · 3 comments
Comments

@leedh0124

Hi. Many thanks for making your work public. It's been a pleasure reading your paper.

I tried running the code on Spyder. It works fine until at one point, it hits the following runtime error.

Start train epoch 12, lr=0.0001 for run run_20210510T145253
Evaluating baseline on dataset...
100%|██████████| 10/10 [00:00<00:00, 22.94it/s]
100%|██████████| 10/10 [00:03<00:00, 3.04it/s]
100%|██████████| 1/1 [00:00<00:00, 23.44it/s]
100%|██████████| 1/1 [00:00<00:00, 22.40it/s]
Finished epoch 12, took 00:00:03 s
Saving model and state...
Validating...
Validation overall avg_cost: -7.61328125 +- 0.06633966416120529
Evaluating candidate model on evaluation dataset
Epoch 12 candidate mean -7.60546875, baseline epoch 11 mean -7.64453125, difference 0.0390625
Start train epoch 13, lr=0.0001 for run run_20210510T145253
30%|███ | 3/10 [00:00<00:00, 22.74it/s]Evaluating baseline on dataset...
100%|██████████| 10/10 [00:00<00:00, 22.78it/s]
0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\user\anaconda3\envs\attentionVRP\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\user\anaconda3\envs\attentionVRP\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Users\user\anaconda3\envs\attentionVRP\lib\site-packages\torch\multiprocessing\reductions.py", line 88, in rebuild_tensor
    t = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
  File "C:\Users\user\anaconda3\envs\attentionVRP\lib\site-packages\torch\_utils.py", line 133, in _rebuild_tensor
    return t.set_(storage, storage_offset, size, stride)
RuntimeError: Trying to resize storage that is not resizable at ..\aten\src\TH\THStorageFunctions.cpp:87

The problem is op with the const data distribution. To keep things simple, I set graph_size to 20, batch_size to 512, epoch_size to 5120, and eval_batch_size to 512, and trained for 100 epochs. All other parameters are unchanged.

Any idea how to tackle this problem?

Thanks in advance!

@wouterkool
Owner

Hi! Thanks for reporting. This sounds like a very weird issue which I have not seen before. Does it occur consistently, i.e. is it reproducible? Can you try to isolate the issue?

From your trace and a quick Google search, it seems related to a zero-dimensional tensor being pickled. Maybe you can investigate if/why this happens?
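One way to look into the zero-dimensional tensor hypothesis is to walk a single dataset sample and report every array-like leaf whose `ndim` is 0. The following is a minimal sketch, not code from the repository: `find_zero_dim`, `FakeArray`, and the keys `coords` and `limit` are all made up for illustration. It relies only on the `ndim` attribute, which both `torch.Tensor` and NumPy arrays expose, so a real sample from the training dataset could be passed in place of the stand-in.

```python
def find_zero_dim(obj, path="sample"):
    """Return the paths of all array-like leaves whose ndim is 0."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            hits += find_zero_dim(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            hits += find_zero_dim(value, f"{path}[{i}]")
    elif getattr(obj, "ndim", None) == 0:
        # Scalars such as torch.tensor(2.0) report ndim == 0.
        hits.append(path)
    return hits


class FakeArray:
    """Hypothetical stand-in for a tensor; only carries an ndim attribute."""

    def __init__(self, ndim):
        self.ndim = ndim


# Hypothetical sample layout; the real keys depend on the problem definition.
sample = {"coords": FakeArray(2), "limit": FakeArray(0)}
zero_dim_paths = find_zero_dim(sample)
print(zero_dim_paths)
```

If this reports any paths for a real sample, those entries would be the first candidates to check, although that alone would not confirm they cause the error.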

@leedh0124
Author

Hi! Thanks for responding quickly. It occurs at random: sometimes training runs through all 100 epochs, and sometimes it fails with this error. It seems to happen right before line 84 of train.py, shown below.

# Put model in train mode!
model.train()
set_decode_type(model, "sampling")

for batch_id, batch in enumerate(tqdm(training_dataloader, disable=opts.no_progress_bar)):  # <- here

I haven't made any changes to the existing code. Could you tell me which exact versions of PyTorch (>1.7.0) and Python (>3.8) this code is based on? I'm using PyTorch 1.7.1 and Python 3.8.3, btw.

@TimD3

TimD3 commented Jul 29, 2021

Did you find a fix? I'm running into the same error, and it seems to be on the same line. I don't understand where it would even come from. I don't see where in this part of the code zero-dimensional tensors could appear or what gets pickled there.

Python 3.8.3 and PyTorch 1.8.1 btw.

Edit: I figured out that it happens when enumerate(training_dataloader) is called, and that it can be avoided by setting the number of workers of the data loader to 0. However, I am unsure why that happens.
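The workaround above could be applied conditionally. The traceback passes through `multiprocessing\spawn.py`, so the workers here are started with the `spawn` method, which sends objects to each worker by pickling them. A minimal sketch, assuming the error is tied to that start method (this is not confirmed in the thread): the helper `pick_num_workers` is hypothetical and not part of the repository. It returns 0 under `spawn` and the requested count otherwise.

```python
import multiprocessing as mp


def pick_num_workers(requested, start_method=None):
    """Return 0 under the 'spawn' start method, else the requested count.

    Hypothetical helper: with 0 workers the data loader iterates in the
    main process, so nothing has to be pickled and sent to a worker.
    """
    if start_method is None:
        # Default start method of the running interpreter and platform.
        start_method = mp.get_start_method()
    return 0 if start_method == "spawn" else requested


num_workers = pick_num_workers(1)
print(num_workers)
```

The result would then be passed as the `num_workers` argument wherever the training `DataLoader` is constructed. This only sidesteps the error rather than explaining it, and loading data in the main process may be slower.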
