Some operational issues #3

Open
lilucy11 opened this issue May 10, 2022 · 13 comments

@lilucy11

Hello, HowieMa,
I encountered an error when running the training program on Windows; the full traceback is below. I am not sure what is wrong. Can you help me solve this problem? Thank you very much. @HowieMa

=> no checkpoint found at output\multiview_h36m\multiview_transpose_50\256_fusion_enc3_GPE\checkpoint.pth.tar
before filter 1559752
after filter 1559752
before filter 550644
after filter 532192
Traceback (most recent call last):
File "run/pose2d/train.py", line 200, in <module>
main()
File "run/pose2d/train.py", line 171, in main
train(config, train_loader, model, criterion, optimizer, epoch,
File "D:\TransFusion-Pose\run\pose2d\..\..\lib\core\function.py", line 98, in train
for i, (inputs, targets, weights, metas) in enumerate(data):
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
return self._get_iterator()
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
w.start()
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
MemoryError
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

@HowieMa
Owner

HowieMa commented May 10, 2022

This does not appear to be a bug in my code, but rather an issue with PyTorch on Windows. Here are some references:

pytorch/pytorch#12085
https://discuss.pytorch.org/t/eoferror-ran-out-of-input-when-enumerating-the-train-loader/22692

I am sorry, but I don't have a Windows computer, so I can't help you debug this. Besides, all of my code was developed on Ubuntu.
Switching to Ubuntu may be a better choice for you.
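
For readers hitting the same Windows traceback: the failing frame is `reduction.dump(process_obj, to_child)`, where `multiprocessing` pickles the worker process object for a spawned child. With the spawn start method used on Windows, everything a worker needs, including the dataset object, is serialized once per worker. The sketch below uses only the standard library to estimate how large such a pickle gets for an in-memory annotation list. `make_toy_db` and `pickled_size_bytes` are hypothetical names made up for illustration; they are not part of TransFusion-Pose or PyTorch, and the record layout is an assumption.

```python
import pickle


def make_toy_db(num_samples):
    """Build a hypothetical in-memory annotation list, one small dict per sample."""
    return [
        {"image": "img_%06d.jpg" % i, "joints": [0.0] * 51}
        for i in range(num_samples)
    ]


def pickled_size_bytes(obj):
    """Return the number of bytes pickle needs to serialize obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    # Print how the serialized size grows with the number of samples.
    for n in (1000, 10000):
        size_mb = pickled_size_bytes(make_toy_db(n)) / 1e6
        print("%d samples -> %.2f MB pickled per spawned worker" % (n, size_mb))
```

If the real dataset object turns out to be very large, one workaround often tried for this class of error is passing `num_workers=0` to the `DataLoader`, which loads data in the main process and skips the pickling step. Whether that is fast enough for this training setup is not something this thread confirms.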

@lilucy11
Author

Thank you very much for your quick reply. I'll try Ubuntu later.
I compared my output with the "256_fusion_geometry_3d_emb_2021-06-20-14-02_train" file you provided and found that mine does not have "(conv1x1)" after "(final_layer)". I also received "no checkpoint found at output\multiview_h36m\multiview_transpose_50\256_fusion_enc3_GPE\checkpoint.pth.tar". Should I rename the "model_best.pth.tar" file to "checkpoint"? Or is this also a system problem?
Thanks again for your reply.

@lilucy11
Author

Hello, HowieMa,
I followed your advice and ran the training code on Ubuntu 20.04. Unfortunately, although I avoided the errors above, I got another error, shown below. Can you give me some help with this? Thank you. @HowieMa

Traceback (most recent call last):
File "run/pose2d/train.py", line 200, in <module>
main()
File "run/pose2d/train.py", line 171, in main
train(config, train_loader, model, criterion, optimizer, epoch,
File "/home/lilucy/TransFusion-Pose/run/pose2d/../../lib/core/function.py", line 98, in train
for i, (inputs, targets, weights, metas) in enumerate(data):
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/lilucy/TransFusion-Pose/run/pose2d/../../lib/dataset/multiview_h36m.py", line 113, in __getitem__
i, t, w, m = super().__getitem__(item)
File "/home/lilucy/TransFusion-Pose/run/pose2d/../../lib/dataset/joints_dataset.py", line 130, in __getitem__
data_numpy = data_numpy[:1000] # According to ET
TypeError: 'NoneType' object is not subscriptable

@HowieMa
Owner

HowieMa commented May 11, 2022

It seems that the image was not loaded successfully.
Have you strictly followed the preprocessing steps for the dataset?
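
A quick way to test this diagnosis is to check, before training, that every image path the dataset will request actually exists on disk. The sketch below assumes the loader behaves like OpenCV's `cv2.imread`, which returns `None` for a missing or unreadable file instead of raising, so the failure only surfaces later at `data_numpy[:1000]`. The function name `find_missing_images` and the file names in the demo are hypothetical; the real path layout comes from the H36M-Toolbox preprocessing.

```python
import os
import tempfile


def find_missing_images(image_root, relative_paths):
    """Return the relative paths that do not exist as files under image_root."""
    missing = []
    for rel in relative_paths:
        if not os.path.isfile(os.path.join(image_root, rel)):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Self-contained demo: a temporary folder with one real file and one absent file.
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "present.jpg"), "wb").close()
        for rel in find_missing_images(root, ["present.jpg", "absent.jpg"]):
            print("missing:", rel)
```

In practice, `relative_paths` would be the image entries from the annotation file and `image_root` the configured data directory; a non-empty result points to a path or preprocessing problem rather than a PyTorch one.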

@lilucy11
Author

After processing the data strictly according to the H36M-Toolbox requirements, the problem went away. Thank you for your suggestions.

@HowieMa
Owner

HowieMa commented May 16, 2022

👍

@Cxz-dev

Cxz-dev commented Oct 26, 2022

After processing the data strictly according to the H36M-Toolbox requirements, the problem went away. Thank you for your suggestions.

Hi lilucy11, can you give me a download link for the H36M dataset? Thank you very much.

@Gordon4629

Hello HowieMa
TypeError: Caught TypeError in DataLoader worker process 0.
TypeError: 'NoneType' object is not subscriptable
I am also getting these two errors. How can I solve them?
Thanks, @HowieMa.
If your error was resolved, how did you resolve it?
Thanks, @lilucy11.

@HowieMa
Owner

HowieMa commented Mar 27, 2023

I am sorry, but all I can tell so far is that this error is related to the dataset or data loader. Without more detail, I can't tell what is happening on your side. It is possible that the data path is not set correctly, or that your data is not organized in the expected layout.

@Gordon4629

Hello HowieMa
I set up the data exactly according to your steps. The full error output is below. Is it related to the environment configuration? My Python version is 3.6.13 and my Torch version is 1.10.1. Can you give me some advice? Thanks.
(final_layer): Conv2d(256, 20, kernel_size=(1, 1), stride=(1, 1))
)
=> no checkpoint found at output/multiview_h36m/multiview_transpose_50/256_fusion_enc3_GPE/checkpoint.pth.tar
before filter 1559752
after filter 1559752
before filter 550644
after filter 532192
/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "run/pose2d/train.py", line 200, in <module>
main()
File "run/pose2d/train.py", line 172, in main
final_output_dir, writer_dict)
File "/home/jsj_21_05/TransFusion-Pose/run/pose2d/../../lib/core/function.py", line 98, in train
for i, (inputs, targets, weights, metas) in enumerate(data):
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jsj_21_05/TransFusion-Pose/run/pose2d/../../lib/dataset/multiview_h36m.py", line 113, in __getitem__
i, t, w, m = super().__getitem__(item)
File "/home/jsj_21_05/TransFusion-Pose/run/pose2d/../../lib/dataset/joints_dataset.py", line 130, in __getitem__
data_numpy = data_numpy[:1000] # According to ET
TypeError: 'NoneType' object is not subscriptable

Thank you @HowieMa

@HowieMa
Owner

HowieMa commented Mar 27, 2023

It seems that the variable "data_numpy" is None, which means the image was not loaded correctly. Make sure your images are not stored in a Zip file. In general, this is not a bug in my code. Setting some breakpoints in the data loader code may help you track down the problem.

If you believe the cause is the version of PyTorch or Python, it may be better to report the issue to the PyTorch team. Thanks!
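
As a lighter alternative to breakpoints, the loading step can be wrapped so that a `None` result fails immediately with the offending path in the message. This is a minimal sketch, not the repository's code: `load_image_or_raise` and its `reader` argument are hypothetical, and `reader` stands for whatever function the dataset uses to read an image (assumed here to return `None` on failure).

```python
def load_image_or_raise(path, reader):
    """Call reader(path) and raise a descriptive error if it returns None."""
    data = reader(path)
    if data is None:
        raise FileNotFoundError(
            "image could not be loaded (missing, unreadable, or wrong root?): %r"
            % (path,)
        )
    return data


if __name__ == "__main__":
    # Fake reader backed by a dict; it stands in for the real image reader.
    fake_store = {"ok.jpg": [[0, 0, 0]]}
    print(load_image_or_raise("ok.jpg", fake_store.get))
    try:
        load_image_or_raise("missing.jpg", fake_store.get)
    except FileNotFoundError as exc:
        print("caught:", exc)
```

With a check like this in place, the traceback names the file that failed instead of ending at `'NoneType' object is not subscriptable`.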

@Gordon4629

Thank you. I'll try again.

@nyyy13

nyyy13 commented Aug 11, 2023

After processing the data strictly according to the H36M-Toolbox requirements, the problem went away. Thank you for your suggestions.

Did you fix the issue? I am running into the same problem.
