Some operational issues #3

lilucy11 · 2022-05-10T08:34:35Z

Hello HowieMa,
I encountered an error when running the training program on Windows; the full output is below. I am not sure what is wrong. Can you help me solve this problem? Thank you very much. @HowieMa

=> no checkpoint found at output\multiview_h36m\multiview_transpose_50\256_fusion_enc3_GPE\checkpoint.pth.tar
before filter 1559752
after filter 1559752
before filter 550644
after filter 532192
Traceback (most recent call last):
  File "run/pose2d/train.py", line 200, in <module>
    main()
  File "run/pose2d/train.py", line 171, in main
    train(config, train_loader, model, criterion, optimizer, epoch,
  File "D:\TransFusion-Pose\run\pose2d\..\..\lib\core\function.py", line 98, in train
    for i, (inputs, targets, weights, metas) in enumerate(data):
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\procedure_for_study\Anaconda3\envs\transpose\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

HowieMa · 2022-05-10T08:41:58Z

It seems this is not a bug in my code but a PyTorch issue on Windows. Here are some references:

pytorch/pytorch#12085
https://discuss.pytorch.org/t/eoferror-ran-out-of-input-when-enumerating-the-train-loader/22692

I am sorry, but I don't have a Windows computer, so I can't help you debug. Besides, all of my code was developed on Ubuntu.
Switching to Ubuntu may be a better choice for you.
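
For readers who have to stay on Windows, a workaround often reported for this kind of error is to load data in the main process by passing `num_workers=0` to the `DataLoader`. The traceback above fails while the process object is being pickled for a spawned worker, so avoiding worker processes sidesteps that step. The sketch below is only an illustration under that assumption: `pick_num_workers` is a hypothetical helper, not part of this repository, and it has not been tested against this code base.

```python
import platform


def pick_num_workers(requested: int, system: str = "") -> int:
    """Return a DataLoader worker count that avoids worker processes on Windows.

    `pick_num_workers` is a hypothetical helper introduced for illustration.
    """
    system = system or platform.system()
    # Windows has no fork(); worker processes are spawned and the process
    # object is pickled for each of them, which is the step that raised
    # MemoryError in the traceback above.
    if system == "Windows":
        return 0  # load batches in the main process instead
    return max(0, requested)


# Hypothetical usage; torch.utils.data.DataLoader accepts num_workers:
# loader = torch.utils.data.DataLoader(dataset, batch_size=8,
#                                      num_workers=pick_num_workers(4))
```

Loading in the main process is slower, so this trades throughput for getting training to start at all.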

lilucy11 · 2022-05-10T09:07:11Z

Thank you very much for your quick reply. I'll try Ubuntu later.
I compared my log with the "256_fusion_geometry_3d_emb_2021-06-20-14-02_train" file you provided and found that mine does not have "(conv1x1)" after "(final_layer)". I also received "no checkpoint found at output\multiview_h36m\multiview_transpose_50\256_fusion_enc3_GPE\checkpoint.pth.tar". Should I rename the "model_best.pth.tar" file to "checkpoint.pth.tar"? Or is this also a system problem?
Thanks again for your reply.

lilucy11 · 2022-05-11T05:37:06Z

Hello HowieMa,
I followed your advice and ran the training code on Ubuntu 20.04. That avoided the errors above, but I now get a different error, shown below. Could you give me some help with it? Thank you. @HowieMa

Traceback (most recent call last):
  File "run/pose2d/train.py", line 200, in <module>
    main()
  File "run/pose2d/train.py", line 171, in main
    train(config, train_loader, model, criterion, optimizer, epoch,
  File "/home/lilucy/TransFusion-Pose/run/pose2d/../../lib/core/function.py", line 98, in train
    for i, (inputs, targets, weights, metas) in enumerate(data):
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/lilucy/anaconda3/envs/transpose/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/lilucy/TransFusion-Pose/run/pose2d/../../lib/dataset/multiview_h36m.py", line 113, in __getitem__
    i, t, w, m = super().__getitem__(item)
  File "/home/lilucy/TransFusion-Pose/run/pose2d/../../lib/dataset/joints_dataset.py", line 130, in __getitem__
    data_numpy = data_numpy[:1000] # According to ET
TypeError: 'NoneType' object is not subscriptable

HowieMa · 2022-05-11T05:42:53Z

It seems that you didn't load the image successfully.
Have you strictly followed the preprocessing step of the dataset?
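
To make the failure easier to read: the traceback shows `data_numpy` being sliced while it is `None`, so the loader returned nothing without raising. OpenCV's `cv2.imread`, for example, returns `None` for a missing or unreadable file instead of raising an exception. Whether this repository reads images with `cv2.imread` is an assumption here. A minimal sketch of a guard that turns the silent `None` into an explicit error, where `load_or_raise` and its `reader` parameter are hypothetical names:

```python
from typing import Any, Callable


def load_or_raise(path: str, reader: Callable[[str], Any]) -> Any:
    """Call reader(path) and raise a descriptive error if it returns None.

    `reader` stands in for whatever image-loading function the dataset uses
    (assumed, not confirmed, to behave like cv2.imread).
    """
    image = reader(path)
    if image is None:
        # Report the offending path instead of failing later on a slice.
        raise FileNotFoundError(f"Image could not be read: {path}")
    return image
```

With such a guard, the error message names the file that is missing, which usually points straight at the preprocessing step that was skipped or the data root that is wrong.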

lilucy11 · 2022-05-16T14:08:05Z

After processing the data strictly according to the H36M-Toolbox requirements, I got past this problem. Thank you for your ideas.

HowieMa · 2022-05-16T15:58:01Z

👍

Cxz-dev · 2022-10-26T10:16:56Z

After processing the data strictly according to the H36M-Toolbox requirements, I got past this problem. Thank you for your ideas.

Hi lilucy11, can you give me a download link for the H36M dataset? Thank you very much.

Gordon4629 · 2023-03-27T05:41:34Z

Hello HowieMa,
TypeError: Caught TypeError in DataLoader worker process 0.
TypeError: 'NoneType' object is not subscriptable
I also get these two errors. How can I solve them?
Thanks @HowieMa
If your error was resolved, how was it resolved?
Thanks @lilucy11

HowieMa · 2023-03-27T05:43:54Z

I am sorry, but all I can tell so far is that this error is related to the dataset or data loader. Without more detail, I can't tell what is happening on your side. It is possible that the data path is not set correctly, or that your data is not organized in the expected way.
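
One way to check the two possibilities above before starting training is to verify that the image files the annotations refer to actually exist under the data root. The sketch below assumes the annotations can be reduced to a list of image paths relative to some root directory; `find_missing_images`, `root`, and `relative_paths` are hypothetical names, and the real annotation format of this repository is not shown in this thread.

```python
import os
from typing import Iterable, List


def find_missing_images(root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as files under root."""
    missing = []
    for rel in relative_paths:
        # A wrong data root or a different folder layout shows up here
        # as a long list of missing files.
        if not os.path.isfile(os.path.join(root, rel)):
            missing.append(rel)
    return missing
```

If nearly every path is reported missing, the data root is probably wrong; if only some are, the dataset was likely extracted or preprocessed incompletely.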

Gordon4629 · 2023-03-27T07:37:07Z

Hello HowieMa,
I set up the data exactly according to your steps. The full error output is below. Is it related to the environment configuration? My Python version is 3.6.13 and my Torch version is 1.10.1. Can you give me some advice? Thanks.
(final_layer): Conv2d(256, 20, kernel_size=(1, 1), stride=(1, 1))
)
=> no checkpoint found at output/multiview_h36m/multiview_transpose_50/256_fusion_enc3_GPE/checkpoint.pth.tar
before filter 1559752
after filter 1559752
before filter 550644
after filter 532192
/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
  File "run/pose2d/train.py", line 200, in <module>
    main()
  File "run/pose2d/train.py", line 172, in main
    final_output_dir, writer_dict)
  File "/home/jsj_21_05/TransFusion-Pose/run/pose2d/../../lib/core/function.py", line 98, in train
    for i, (inputs, targets, weights, metas) in enumerate(data):
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jsj_21_05/anaconda3/envs/sine/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jsj_21_05/TransFusion-Pose/run/pose2d/../../lib/dataset/multiview_h36m.py", line 113, in __getitem__
    i, t, w, m = super().__getitem__(item)
  File "/home/jsj_21_05/TransFusion-Pose/run/pose2d/../../lib/dataset/joints_dataset.py", line 130, in __getitem__
    data_numpy = data_numpy[:1000] # According to ET
TypeError: 'NoneType' object is not subscriptable

Thank you @HowieMa

HowieMa · 2023-03-27T07:45:49Z

It seems that the variable "data_numpy" is None, which means the image was not loaded correctly. Make sure your images are not stored in a Zip file. In general, this is not a bug in my code. Setting some breakpoints in the DataLoader code may help you understand the problem.
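
Breakpoints are awkward to use inside DataLoader worker processes, so one option is to index the dataset object directly in the main process and find the first sample that fails. The sketch below assumes only that the dataset supports `len()` and integer indexing, as PyTorch map-style datasets do; `first_failing_index` is a hypothetical helper, not something from this repository.

```python
from typing import Any, Optional, Sequence, Tuple


def first_failing_index(
    dataset: Sequence[Any], limit: int = 100
) -> Optional[Tuple[int, Exception]]:
    """Index the dataset in the main process and report the first failure.

    Returns (index, exception) for the first sample that raises, or None
    if the first `limit` samples all load.
    """
    for idx in range(min(limit, len(dataset))):
        try:
            dataset[idx]
        except Exception as exc:  # broad on purpose: diagnostic helper
            return idx, exc
    return None
```

Once the failing index is known, a breakpoint placed in the dataset's `__getitem__` and triggered with that index shows which image path came back as None.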

If you believe the cause is the PyTorch or Python version, it may be better to report the issue to the PyTorch team. Thanks!

Gordon4629 · 2023-03-27T07:49:34Z

Thank you. I'll try again.

nyyy13 · 2023-08-11T12:34:36Z

After processing the data strictly according to the H36M-Toolbox requirements, I got past this problem. Thank you for your ideas.

Did you fix the issue? I have run into the same problem.
