IndexError in DataLoader Worker Process with Custom Dataset #48

Open
yulrio opened this issue Aug 16, 2024 · 8 comments

Comments

@yulrio

yulrio commented Aug 16, 2024

Hello,

I'm currently using your code from the repository [insert repository name] with my own dataset, but I'm encountering an IndexError during the training phase. Below is the traceback I received:

[ Fri Aug 16 10:18:36 2024 ] Parameters:
{'work_dir': './work_dir/baseline_res18/', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '3', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'QSLR2024', 'dataset_info': {'dataset_root': './dataset/QSLR2024', 'dict_path': './preprocess/QSLR2024/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'QSLR2024-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/QSLR2024', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 2, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 30}

0%| | 0/162 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/raid/data/m33221012/VAC_CSLR_QSLR/main.py", line 213, in <module>
    processor.start()
  File "/raid/data/m33221012/VAC_CSLR_QSLR/main.py", line 44, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/raid/data/m33221012/VAC_CSLR_QSLR/seq_scripts.py", line 18, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    return self._process_data(data)
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index) # type: ignore[possibly-undefined]
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 48, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 80, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

It seems the issue occurs within the video_augmentation.py script when accessing clip[0]. I suspect it might be related to the data augmentation process or the input data structure.
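An `IndexError` on `clip[0]` means the augmentation received an empty list of frames, so a likely cause is that no frames were found for the sample. Below is a minimal sketch of a guard that fails early with a clearer message. It assumes frames are stored as image files in one folder per video. The helper name `load_frame_paths` and the `*.png` pattern are hypothetical and not taken from the repository.

```python
import glob
import os


def load_frame_paths(video_dir, pattern="*.png"):
    """Return sorted frame paths, failing loudly if the folder has no frames."""
    # Hypothetical helper: the repository may locate frames differently.
    paths = sorted(glob.glob(os.path.join(video_dir, pattern)))
    if not paths:
        raise FileNotFoundError(
            f"No frames matching {pattern!r} in {video_dir!r}; "
            "check the dataset path and folder layout."
        )
    return paths
```

Calling a check like this before the augmentation step would point at the missing folder rather than at `clip[0]`.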

Since I'm using my own dataset, could you please let me know what specific adjustments or preprocessing steps are necessary to ensure compatibility with your code? Additionally, is there a possibility that this error is related to hardware settings, such as GPU configuration or memory limitations?

Any advice on how to resolve this error and properly integrate my dataset would be greatly appreciated.

Thank you in advance for your help!

@RafaelAmauri

Did you run the preprocessing script on your training data before training? I had the same issue with a custom dataset, and it went away after I ran the preprocessing script.

@yulrio

yulrio commented Sep 29, 2024

Thank you for replying to my question.
May I know the configuration of the .yaml file?
Thanks in advance.

@RafaelAmauri

I am using the default values. I haven't changed any configs.

@Onestringlab

I just ran the following command:

!python main.py --load-weights resnet18_baseline_dev_23.80_epoch25_model.pt --phase test --device 0

and got the following result:

Loading model finished.
Loading data
train 5671
Apply training transform.

train 5671
Apply testing transform.

dev 540
Apply testing transform.

test 629
Apply testing transform.

Loading data finished.
Working tree is dirty. Patch:
diff --git a/.gitignore b/.gitignore
old mode 100755
new mode 100644

[ Tue Oct  8 22:35:41 2024 ] Model: slr_network.SLRModel.
[ Tue Oct  8 22:35:42 2024 ] Weights: /content/drive/MyDrive/MyResearch/pretrain/resnet18_baseline_dev_23.80_epoch25_model.pt.
100% 68/68 [1:09:24<00:00, 61.24s/it]
/content/drive/MyDrive/MyResearch/VAC_CSLR_ORI_OSL
preprocess.sh ./work_dir/baseline_res18/output-hypothesis-dev-conv.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
Tue Oct 8 11:45:07 PM UTC 2024
Preprocess Finished.
Unexpected error: <class 'AttributeError'>
[ Tue Oct  8 23:45:07 2024 ] Epoch 6667, dev 100.00%
100% 79/79 [1:15:47<00:00, 57.56s/it]
/content/drive/MyDrive/MyResearch/VAC_CSLR_ORI_OSL
preprocess.sh ./work_dir/baseline_res18/output-hypothesis-test-conv.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
Wed Oct 9 01:00:55 AM UTC 2024
Preprocess Finished.
Unexpected error: <class 'AttributeError'>
[ Wed Oct  9 01:00:55 2024 ] Epoch 6667, test 100.00%
[ Wed Oct  9 01:00:55 2024 ] Evaluation Done.

Can you explain why the error Unexpected error: <class 'AttributeError'> occurred and which part of the code needs to be corrected?

Also, why did I get 100% for both dev and test?

Thanks in advance!
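The message `Unexpected error: <class 'AttributeError'>` shows only the exception class, which is what printing `sys.exc_info()[0]` produces. Assuming the evaluation step is wrapped in a broad `except` that prints this and falls back to a score of 100.00 (an assumption based on the log above, not verified against the repository), the sketch below shows how to print the full traceback so the failing line becomes visible. `evaluate_safely` is a hypothetical helper, not a function from the codebase.

```python
import traceback


def evaluate_safely(evaluate_fn, *args, **kwargs):
    """Run evaluate_fn; on failure print the full traceback and return 100.0."""
    try:
        return evaluate_fn(*args, **kwargs)
    except Exception:
        # Prints the exception type, message, and the file/line that raised it.
        traceback.print_exc()
        # Assumed fallback value, mirroring the 100.00% seen in the log.
        return 100.0
```

With the traceback in hand, it should be possible to tell which attribute access failed during evaluation.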

@RafaelAmauri

  File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 48, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 80, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

In case anyone else runs into this: the error happens when the dataloader fails to load any data. I hit it again because my dataset was laid out as dataset/features/{train,test,dev}. I had forgotten the 'fullFrame-256x256px' folder directly under features, so the dataloader could not find the train/test/dev folders. It is hard-coded to look for a fullFrame-256x256px folder, and when that folder is missing, nothing is loaded.

In short, make sure the folder structure of your custom dataset exactly matches the one used by phoenix2014. Any deviation could break the training script.
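The layout described above can be checked before training. This is a minimal sketch that assumes the expected structure is `<dataset_root>/features/fullFrame-256x256px/{train,dev,test}`, as described in this comment. The function name `missing_split_dirs` is hypothetical.

```python
from pathlib import Path

EXPECTED_SPLITS = ("train", "dev", "test")


def missing_split_dirs(dataset_root, frame_dir="fullFrame-256x256px"):
    """List the expected split folders that do not exist under dataset_root."""
    base = Path(dataset_root) / "features" / frame_dir
    return [str(base / split) for split in EXPECTED_SPLITS
            if not (base / split).is_dir()]
```

For example, `missing_split_dirs("./dataset/QSLR2024")` returns an empty list when all three split folders are in place, and otherwise lists the paths that are missing.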

@RafaelAmauri

RafaelAmauri commented Oct 30, 2024

> Can you explain why the error Unexpected error: <class 'AttributeError'> occurred and which part of the code needs to be corrected?
>
> Also, why did I get 100% for both dev and test?

I don't know how to fix the AttributeError, but you get 100% WER on the dev and test splits when the 'evaluation' folder is missing from the folder that contains the main VAC code. This evaluation folder must contain the .stm files with the ground truth for the dev and test splits.

Luckily, the preprocessing step generates these automatically. After you run it, you should see a new folder inside the preprocess folder, named after your dataset. There you will find the .stm files with the ground truth.

The phoenix dataset ships with this evaluation folder, which contains several other files besides the .stm files, so I don't know whether only the .stm files are needed or the rest as well. What I did was copy the entire 'evaluation' folder from phoenix and replace its .stm files with the ones the preprocessing script generated for my custom dataset.
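A quick way to confirm the ground-truth files are where the evaluation expects them is sketched below. It assumes the files are named `<evaluation_prefix>-<split>.stm` inside `evaluation_dir`, using the `evaluation_dir` and `evaluation_prefix` values from the configuration in the first post. That naming pattern is an assumption, and `missing_stm_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_stm_files(evaluation_dir, evaluation_prefix, splits=("dev", "test")):
    """Return the ground-truth .stm files that are absent or empty."""
    missing = []
    for split in splits:
        # Assumed naming pattern: <prefix>-<split>.stm
        stm = Path(evaluation_dir) / f"{evaluation_prefix}-{split}.stm"
        if not stm.is_file() or stm.stat().st_size == 0:
            missing.append(str(stm))
    return missing
```

For the configuration shown earlier, this would be called as `missing_stm_files("./evaluation/slr_eval", "QSLR2024-groundtruth")`.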

Good luck!

@Onestringlab

Thank you for the answer.

Could you let me know which version of PyTorch you used for these experiments?

Thanks again!

@RafaelAmauri

> Could you let me know which version of PyTorch you used for these experiments?

I'm using Python 3.8.10 and PyTorch 1.13.1.
