Training error in dpt_vit-b16 #77
Comments
a227799770055 changed the title from "dpt_vit-b16 training error" to "Training error in dpt_vit-b16" on Feb 24, 2023
You should download the pre-trained models. Please refer to the DPT markdown file.
@Z-chocking thank you!
@zhyever
Traceback (most recent call last):
  File "./tools/train.py", line 168, in <module>
    main()
  File "./tools/train.py", line 157, in main
    train_depther(
  File "/home/insign/work/Monocular-Depth-Estimation-Toolbox/depth/apis/train.py", line 121, in train_depther
    runner.run(data_loaders, cfg.workflow)
  File "/home/insign/.local/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/insign/.local/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 49, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/insign/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/insign/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/insign/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/insign/.local/lib/python3.8/site-packages/torch/_utils.py", line 542, in reraise
    raise RuntimeError(msg) from None
RuntimeError: Caught UnicodeDecodeError in DataLoader worker process 0.
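The traceback above does not show which file failed to decode. One way to narrow it down, assuming the error comes from a text file in the custom dataset (for example a split or annotation list) being read as UTF-8, is to scan the dataset folder for files that are not valid UTF-8. This is only a sketch: the function name `find_non_utf8`, the suffix list, and the `data/my_dataset` path are hypothetical and not part of the toolbox.

```python
from pathlib import Path


def find_non_utf8(root, suffixes=(".txt", ".json")):
    """Return (path, byte_offset) for each matching file that is not valid UTF-8.

    `root` and `suffixes` are illustrative assumptions; adjust them to
    wherever the custom dataset keeps its split/annotation files.
    """
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first undecodable byte
                bad.append((path, exc.start))
    return bad


if __name__ == "__main__":
    # Hypothetical dataset location
    for path, offset in find_non_utf8("data/my_dataset"):
        print(f"{path}: invalid UTF-8 at byte {offset}")
```

If this reports nothing, the decode error probably comes from somewhere else, for instance a non-text file being opened in text mode.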
Hi @zhyever
I want to train the model on my custom dataset using dpt_vit-b16_kitti, and I encounter the error below.
It seems that the pretrained file nfs/checkpoints/jx_vit_base_p16_224-80ecf9dd.pth cannot be found. Where can I download the file, and in which path should I put it?
Traceback (most recent call last):
  File "./tools/train.py", line 168, in <module>
    main()
  File "./tools/train.py", line 135, in main
    model.init_weights()
  File "/home/insign2/.local/lib/python3.8/site-packages/mmcv/runner/base_module.py", line 117, in init_weights
    m.init_weights()
  File "/home/insign2/work/Monocular-Depth-Estimation-Toolbox/depth/models/backbones/vit.py", line 282, in init_weights
    checkpoint = CheckpointLoader.load_checkpoint(
  File "/home/insign2/.local/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 314, in load_checkpoint
    return checkpoint_loader(filename, map_location)  # type: ignore
  File "/home/insign2/.local/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 333, in load_from_local
    raise FileNotFoundError(f'{filename} can not be found.')
FileNotFoundError: nfs/checkpoints/jx_vit_base_p16_224-80ecf9dd.pth can not be found.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 151529) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/insign2/.local/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/insign2/.local/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/insign2/.local/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/insign2/.local/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/insign2/.local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/insign2/.local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
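The FileNotFoundError above reports a relative path, so where it points depends on the directory the training command is launched from. A minimal sketch for checking where the path resolves before starting a run is shown here. The helper name `resolve_checkpoint` is hypothetical and not part of the toolbox or mmcv; it only mirrors a plain "is this a file?" check.

```python
from pathlib import Path


def resolve_checkpoint(path_str, base_dir=None):
    """Return (absolute_path, exists) for a possibly relative checkpoint path.

    `base_dir` defaults to the current working directory, which is what a
    relative path is resolved against when the training script runs.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = Path(path_str).expanduser()
    if not path.is_absolute():
        path = base / path
    return path, path.is_file()


if __name__ == "__main__":
    # Path taken from the error message above
    ckpt, exists = resolve_checkpoint(
        "nfs/checkpoints/jx_vit_base_p16_224-80ecf9dd.pth"
    )
    print(f"{ckpt} -> {'found' if exists else 'missing'}")
```

If the file is reported missing, either place the downloaded checkpoint at the printed location or change the pretrained path in the config to wherever the file actually is.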