Error about "Unexpected key(s) in state_dict: 'w_junc', 'w_heatmap'" #85

Open
GGshmily opened this issue Jul 11, 2023 · 7 comments


@GGshmily

Hi, I completed Step 1 and got my trained model. For Step 2, I commented out gt_source_train and gt_source_test and disabled the photometric and homographic augmentations, but an error occurred:

python -m sold2.experiment --exp_name wireframe_train --mode export --resume_path experiments/sold2_synth/ --model_config sold2/config/train_detector.yaml --dataset_config sold2/config/wireframe_dataset.yaml --checkpoint_name checkpoint-epoch199-end.tar --export_dataset_mode train --export_batch_size 4
[Info] Export mode
Output path: ./datasets/export_datasets/wireframe_train
[Info] Export predictions with homography adaptation.
Initializing dataset and dataloader
[Info] Initializing wireframe dataset...
Found filename cache wireframe_train_cache.pkl at ./datasets/wireframe
Load filename cache...
[Info] Successfully initialized dataset
Name: wireframe
Mode: train
Gt: /media/cqw/KESU/SOLD2/datasets/synthetic_shapes/synthetic_shape_train.h5
Counts: 20000

     Successfully intialized dataset and dataloader.
    --------Initializing model----------
    Model architecture: simple
    Backbone: lcnn
    Junction decoder: superpoint_decoder
    Heatmap decoder: pixel_shuffle
    -------------------------------------

Traceback (most recent call last):
  File "/media/cqw/KESU/SOLD2/sold2/export.py", line 23, in restore_weights
    model.load_state_dict(state_dict)
  File "/home/cqw/anaconda3/envs/SOLD2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SOLD2Net:
    Unexpected key(s) in state_dict: "w_junc", "w_heatmap".

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cqw/anaconda3/envs/SOLD2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cqw/anaconda3/envs/SOLD2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/cqw/KESU/SOLD2/sold2/experiment.py", line 227, in <module>
    export_dataset_mode=args.export_dataset_mode, device=device)
  File "/media/cqw/KESU/SOLD2/sold2/experiment.py", line 116, in main
    export(args, dataset_cfg, model_cfg, output_path, export_dataset_mode, device=device)
  File "/media/cqw/KESU/SOLD2/sold2/experiment.py", line 92, in export
    export_dataset_mode, device)
  File "/media/cqw/KESU/SOLD2/sold2/export.py", line 158, in export_homograpy_adaptation
    model = restore_weights(model, checkpoint["model_state_dict"])
  File "/media/cqw/KESU/SOLD2/sold2/export.py", line 27, in restore_weights
    missing_keys = err.missing_keys
AttributeError: 'NoneType' object has no attribute 'missing_keys'

I don't know how to solve it. Can you give me some suggestions?

@rpautrat
Member

Hi, when running Step 1, did you change any parameter in the config/train_detector.yaml file? In particular, did you keep the 'dynamic' policy for the junction and heatmap losses?

@GGshmily GGshmily reopened this Jul 13, 2023
@GGshmily
Author

Hi, I didn't change any parameters in config/train_detector.yaml. Here are the parameters in my train_detector.yaml:

# [Model parameters]
model_name: "lcnn_simple"
model_architecture: "simple"

# Backbone related config
backbone: "lcnn"
backbone_cfg:
  input_channel: 1 # Use RGB images or grayscale images.
  depth: 4
  num_stacks: 2
  num_blocks: 1
  num_classes: 5

# Junction decoder related config
junction_decoder: "superpoint_decoder"
junc_decoder_cfg:

# Heatmap decoder related config
heatmap_decoder: "pixel_shuffle"
heatmap_decoder_cfg:

# Shared configurations
grid_size: 8
keep_border_valid: True

# Threshold of junction detection
detection_thresh: 0.0153846 # 1/65

# Threshold of heatmap detection
prob_thresh: 0.5

# [Loss parameters]
weighting_policy: "dynamic"

# [Heatmap loss]
w_heatmap: 0.
w_heatmap_class: 1
heatmap_loss_func: "cross_entropy"
heatmap_loss_cfg:
  policy: "dynamic"

# [Junction loss]
w_junc: 0.
junction_loss_func: "superpoint"
junction_loss_cfg:
  policy: "dynamic"

# [Training parameters]
learning_rate: 0.0005
epochs: 200
train:
  batch_size: 6
  num_workers: 8
test:
  batch_size: 6
  num_workers: 8
disp_freq: 100
summary_freq: 200
max_ckpt: 150

It seems that the policy for both the junction and heatmap losses is 'dynamic'.

@rpautrat
Member

What is your torch version? It might just be a compatibility issue.

Note that you can probably solve this issue with a quick fix: replace lines 22 to 36 of sold2/export.py with this single line: model.load_state_dict(state_dict, strict=False).
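If you would rather see which keys are being dropped than silence the check entirely, the checkpoint can also be filtered before loading. Below is a minimal sketch using a hypothetical helper, `split_state_dict`, which is not part of SOLD2. It works on any dict, and apart from "w_junc" and "w_heatmap" (taken from the error above), the key names in the demo are made up.

```python
from typing import Dict, Iterable, List, Tuple, TypeVar

V = TypeVar("V")


def split_state_dict(
    state_dict: Dict[str, V], expected_keys: Iterable[str]
) -> Tuple[Dict[str, V], List[str], List[str]]:
    """Keep only the entries the model defines; report missing and unexpected keys.

    Hypothetical helper: with PyTorch, `expected_keys` would typically be
    `model.state_dict().keys()`.
    """
    expected = set(expected_keys)
    # Keys stored in the checkpoint that the model does not define (e.g. "w_junc").
    unexpected = sorted(k for k in state_dict if k not in expected)
    # Keys the model needs but the checkpoint lacks; they would keep their init values.
    missing = sorted(k for k in expected if k not in state_dict)
    filtered = {k: v for k, v in state_dict.items() if k in expected}
    return filtered, missing, unexpected


if __name__ == "__main__":
    # Toy stand-in for checkpoint["model_state_dict"]; real values would be tensors.
    checkpoint_state = {
        "backbone.conv.weight": 1.0,           # made-up key name
        "junction_decoder.conv.weight": 2.0,   # made-up key name
        "w_junc": 0.0,
        "w_heatmap": 0.0,
    }
    model_keys = ["backbone.conv.weight", "junction_decoder.conv.weight"]
    filtered, missing, unexpected = split_state_dict(checkpoint_state, model_keys)
    print("missing:", missing)        # missing: []
    print("unexpected:", unexpected)  # unexpected: ['w_heatmap', 'w_junc']
```

If `missing` comes back empty, `model.load_state_dict(filtered)` should succeed even with the default `strict=True`. Unlike a bare `strict=False`, this makes it easy to confirm that no backbone or decoder weights were silently left at their initial values.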

@GGshmily
Author

Hi, thank you for your reply. I retrained the model and finished Step 1, and the model seems to work. But when running Step 2, an error occurs:
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/jimlee/anaconda3/envs/DeepLSD/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jimlee/anaconda3/envs/DeepLSD/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jimlee/anaconda3/envs/DeepLSD/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/jimlee/65F33762C14D581B/SOLD2/sold2/dataset/wireframe_dataset.py", line 953, in __getitem__
    exported_label = parse_h5_data(f[file_key])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/jimlee/anaconda3/envs/DeepLSD/lib/python3.7/site-packages/h5py/_hl/group.py", line 305, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object '000000' doesn't exist)"

I used the wireframe dataset you provided and checked './datasets/wireframe/train': there is no image named '000000'. Can you give me some suggestions? Sorry to bother you.

@rpautrat
Member

Hi, did you keep the fields 'gt_source_train' and 'gt_source_test' commented in config/wireframe_dataset.yaml as requested in the ReadMe? This is necessary to export the pseudo ground truth.
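One way to double-check this before launching the export is to scan the dataset config for those two fields. The sketch below uses a hypothetical helper, `active_gt_sources`, and assumes each field sits on its own `key: value` line in config/wireframe_dataset.yaml. It is a plain line scan, not a YAML parser, and the .h5 file names in the demo are placeholders.

```python
import re
from typing import List

# Field names as they appear in this thread.
GT_SOURCE_FIELDS = ("gt_source_train", "gt_source_test")


def active_gt_sources(yaml_text: str) -> List[str]:
    """Return the gt_source_* fields that are still set (not commented out).

    Hypothetical helper: a line-by-line scan, not a full YAML parser.
    """
    active = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and YAML comments; a leading '#' disables the field.
        if not stripped or stripped.startswith("#"):
            continue
        match = re.match(r"(\w+)\s*:", stripped)
        if match and match.group(1) in GT_SOURCE_FIELDS:
            active.append(match.group(1))
    return active


if __name__ == "__main__":
    # Toy config; the .h5 names are placeholders, not real SOLD2 paths.
    toy_config = (
        '# gt_source_train: "placeholder_train.h5"\n'
        'gt_source_test: "placeholder_test.h5"\n'
    )
    leftover = active_gt_sources(toy_config)
    if leftover:
        # Prints: Comment out before exporting: ['gt_source_test']
        print("Comment out before exporting:", leftover)
```

An empty result means both fields are commented out, which is what the export step expects.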

@GGshmily
Author

Hi, thank you for your help. I finished Step 2, but when running Step 3, an error occurs:

Traceback (most recent call last):
  File "/home/jimlee/anaconda3/envs/DeepLSD/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jimlee/anaconda3/envs/DeepLSD/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/jimlee/65F33762C14D581B/SOLD2/sold2/postprocess/convert_homography_results.py", line 122, in <module>
    junctions_pred_raw, heatmap_pred, device=device)
  File "/media/jimlee/65F33762C14D581B/SOLD2/sold2/model/line_detection.py", line 101, in detect
    self.heatmap_refine_cfg["valid_thresh"]
  File "/media/jimlee/65F33762C14D581B/SOLD2/sold2/model/line_detection.py", line 262, in refine_heatmap_local
    heatmap_output = torch.clamp((heatmap_output / count_map).float(),
RuntimeError: expected device cuda:0 and dtype Float but got device cuda:0 and dtype Int

My PyTorch version is 1.4.0. Sorry to bother you again; can you give me some suggestions?

@rpautrat
Member

Hi, I just pushed a small fix. Can you try again with the latest version of the code?
