train parameters size mismatch #69

Open
kebijuelun opened this issue Dec 28, 2021 · 37 comments

Comments

@kebijuelun

kebijuelun commented Dec 28, 2021

  • I followed the testing steps but hit the following error. It seems that the saved model parameters do not match the model definition.
/data/github_code/ai-imu-dr/src/main_kitti.py in launch(args)
     29 
     30     if args.test_filter:
---> 31         test_filter(args, dataset)
     32 
     33     if args.results_filter:

/data/github_code/ai-imu-dr/src/main_kitti.py in test_filter(args, dataset)
    427     from IPython import embed; embed()
    428 
--> 429     torch_iekf.load(args, dataset)
    430     iekf.set_learned_covariance(torch_iekf)
    431 

/data/github_code/ai-imu-dr/src/utils_torch_filter.py in load(self, args, dataset)
    461         if os.path.isfile(path_iekf):
    462             mondict = torch.load(path_iekf)
--> 463             self.load_state_dict(mondict)
    464             cprint("IEKF nets loaded", 'green')
    465         else:

~/miniconda3/envs/dfvo/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    775         if len(error_msgs) > 0:
    776             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 777                                self.__class__.__name__, "\n\t".join(error_msgs)))
    778         return _IncompatibleKeys(missing_keys, unexpected_keys)
    779 

RuntimeError: Error(s) in loading state_dict for TORCHIEKF:
        Unexpected key(s) in state_dict: "mes_net.cov_net.8.weight", "mes_net.cov_net.8.bias", "mes_net.cov_net.12.weight", "mes_net.cov_net.12.bias", "mes_net.cov_net.16.weight", "mes_net.cov_net.16.bias". 
        size mismatch for mes_net.cov_net.4.weight: copying a param with shape torch.Size([64, 32, 5]) from checkpoint, the shape in current model is torch.Size([32, 32, 5]).
        size mismatch for mes_net.cov_net.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
  • torch version
torch                              1.1.0     
torchvision                        0.3.0 
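
For anyone hitting the same message, here is a minimal sketch of the comparison that load_state_dict reports, written on plain {name: shape} mappings so it runs without PyTorch. The function name diff_state_dicts is made up for illustration. With PyTorch, the two mappings could be built as {k: tuple(v.shape) for k, v in torch.load(path_iekf).items()} and {k: tuple(v.shape) for k, v in torch_iekf.state_dict().items()}. The shapes below are copied from the error message; None marks shapes the message does not report.

```python
def diff_state_dicts(checkpoint_shapes, model_shapes):
    """Compare two {parameter_name: shape} mappings.

    Returns (unexpected, missing, mismatched), loosely mirroring the three
    kinds of problems that load_state_dict lists in its error message.
    """
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    mismatched = {
        k: (checkpoint_shapes[k], model_shapes[k])
        for k in checkpoint_shapes
        if k in model_shapes and checkpoint_shapes[k] != model_shapes[k]
    }
    return unexpected, missing, mismatched


# Shapes taken from the error message; None = not reported there.
checkpoint = {
    "mes_net.cov_net.4.weight": (64, 32, 5),
    "mes_net.cov_net.4.bias": (64,),
    "mes_net.cov_net.8.weight": None,
    "mes_net.cov_net.8.bias": None,
    "mes_net.cov_net.12.weight": None,
    "mes_net.cov_net.12.bias": None,
    "mes_net.cov_net.16.weight": None,
    "mes_net.cov_net.16.bias": None,
}
model = {
    "mes_net.cov_net.4.weight": (32, 32, 5),
    "mes_net.cov_net.4.bias": (32,),
}

unexpected, missing, mismatched = diff_state_dicts(checkpoint, model)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
print("unexpected keys:", unexpected)
```

Read this way, the checkpoint appears to describe a deeper and wider cov_net than the one defined in the current code.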
@kimms74

kimms74 commented Jan 3, 2022

Set continue_training to False in class KITTIArgs() of main_kitti.py.

@hmf21

hmf21 commented Jan 10, 2022

> Set continue_training to False in class KITTIArgs() of main_kitti.py.

I still get the same parameter mismatch error after setting continue_training to False. Do you have any other ideas about this problem?

@lumyus

lumyus commented Jan 10, 2022

Same issue here

@scott81321

I also get something very similar:
RuntimeError: Error(s) in loading state_dict for TORCHIEKF:
Unexpected key(s) in state_dict: "mes_net.cov_net.8.weight", "mes_net.cov_net.8.bias", "mes_net.cov_net.12.weight", "mes_net.cov_net.12.bias", "mes_net.cov_net.16.weight", "mes_net.cov_net.16.bias".
size mismatch for mes_net.cov_net.4.weight: copying a param with shape torch.Size([64, 32, 5]) from checkpoint, the shape in current model is torch.Size([32, 32, 5]).
size mismatch for mes_net.cov_net.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).

Part of the problem goes away if you adjust the layer sizes in MesNet, but so far I cannot find the size adjustments that make the problem go away completely.

=> This happens if path_iekf finds the file ../temp/iekfnets.p. However, if the file is not there, the program carries on and I still get the beautiful plot shown on GitHub, namely the route segment of file 2011_09_30_drive_0028_extract.

@lumyus

lumyus commented Jan 10, 2022

Same here. If you set "train_filter = 1" and then go back to "train_filter = 0", the IEKF will be loaded; however, that is not the trained model specified at the URL in the Readme. Changing the network parameters (shape) did not work for me either. Any idea what's going on here, @mbrossar?

@scott81321

scott81321 commented Jan 10, 2022

I am still working with the original default at train_filter = 0 and test_filter=1. I cannot complain because the curves obtained are BEAUTIFUL (Merci Martin) but I realize that some training had to be used to get that BEAUTIFUL curve. The code reads in ../temp/normalize_factors.p and needs them

I suppose an adjacent question is: how to get the curves with pure IEKF and no help from the AI or CNN part?

@hmf21

hmf21 commented Jan 10, 2022

Following @scott81321's suggestion, I deleted ../temp/iekfnets.p and also got some curves, which seem to be generated by the mesnet with randomly initialized parameters.
Also, according to the paper, mesnet is composed of two Conv layers, but iekfnets.p gives a model with five layers. Is there anything wrong with the implementation?

@scott81321

Hello @hmf17, can you read the contents of iekfnets.p? What little I know is that it contains the CNN (mes_net). I can get a picture of it using netron, but can you give me the Python instructions to read the contents?

CNN_model

@hmf21

hmf21 commented Jan 10, 2022

Hi @scott81321, I used torch.load() to read the contents of iekfnets.p, and the result is shown in the picture below. Although the picture is not intuitive, the structure seems to differ from your picture, which contains only two conv layers. How did you get this diagram? It is very beautiful.

image

@scott81321

scott81321 commented Jan 10, 2022

Hello @hmf17 Thank you. To get that picture of the CNN, I use a relatively new tool called netron. You can use it online at https://netron.app/ or download it from GitHub at https://github.com/lutzroeder/netron. You have to create a .pt file inside __init__ in class TORCHIEKF. After the instruction self.mes_net = MesNet(), save the CNN model with
PATH = "...../CNN_model.pt"
torch.save(self.mes_net, PATH)
Once you have that, load it into netron.

I do see something weird in the picture you just showed me: dimension indices as high as 128? Your picture is beautiful too. I used torch.load() but followed it with a print statement, which gives too many details. How did you get the tensor dimensions up front?

@hmf21

hmf21 commented Jan 10, 2022

Hi @scott81321, thank you for pointing out this powerful software. I simply use PyCharm to inspect iekfnets.p; you can see the parameter states in the Variables toolbar. The maximum output feature dimension is 128 in this model, which is quite different from the description in the paper.
I have still made no progress in running this program. Do you have any good ideas?

@scott81321

scott81321 commented Jan 10, 2022

Oh! Just use the code as originally downloaded and remove iekfnets.p from the temp sub-directory [just put iekfnets.p elsewhere]. If the code cannot find the file, it prints a message [look for cprint("IEKF nets NOT loaded", 'yellow') in utils_torch_filter.py] but carries on nonetheless. The original version that you can download only uses normalize_factors.p [make sure train_filter=0]. I got the code working on the test files, producing 10 ensembles of graphs. What I would like to know is how to get the results without the training, i.e. pure IEKF, because ironically, even though I am clearly NOT loading iekfnets.p, the picture I get for 2011_09_30_drive_0028_extract, i.e. file position_xy.png, looks like the result enhanced with AI (CNN), not the raw IEKF result.

Please, can you give me the specific Python command(s) to print out the contents of iekfnets.p ??
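
A small sketch of the "put iekfnets.p elsewhere" step, for those who prefer to script it. The helper name set_aside and the .bak suffix are my own choices, not part of the repository. The demonstration runs in a temporary directory, so no real checkpoint is touched.

```python
import os
import shutil
import tempfile


def set_aside(path, suffix=".bak"):
    """Rename `path` to `path + suffix` if the file exists.

    Returns the backup path, or None when there was nothing to move.
    """
    if not os.path.isfile(path):
        return None
    backup = path + suffix
    shutil.move(path, backup)
    return backup


# Demonstration on a placeholder file inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    fake_checkpoint = os.path.join(tmp, "iekfnets.p")
    with open(fake_checkpoint, "wb") as f:
        f.write(b"placeholder")
    moved_to = set_aside(fake_checkpoint)
    still_there = os.path.isfile(fake_checkpoint)
    backup_exists = os.path.isfile(moved_to)

nothing_moved = set_aside(os.path.join("no", "such", "dir", "iekfnets.p"))
```

On a real checkout, the call would be something like set_aside("../temp/iekfnets.p"). After that the loader should take the "IEKF nets NOT loaded" branch, and the file can be restored by renaming it back.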

@hmf21

hmf21 commented Jan 10, 2022

Hi @scott81321 , I just use some simple commands :
path_iekf = './temp/iekfnets.p'
mondict = torch.load(path_iekf)
then I can see the content of the loaded model in Variables toolbar on the right.
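
For those without an IDE variable viewer, a minimal sketch that prints one line per entry of the loaded dictionary. The helper summarize_state_dict is hypothetical, and it only assumes that each value has a .shape attribute, as PyTorch tensors do. A stand-in class is used below so the snippet runs without PyTorch; the real call is shown in the comment.

```python
def summarize_state_dict(state):
    """Return one 'name: shape' line per entry of a state-dict-like mapping."""
    lines = []
    for name, value in state.items():
        # Tensors expose .shape; anything else is reported with an empty shape.
        shape = tuple(getattr(value, "shape", ()))
        lines.append(f"{name}: {shape}")
    return lines


# With PyTorch installed, the mapping would come from the checkpoint:
#   import torch
#   mondict = torch.load(path_iekf, map_location="cpu")
#   print("\n".join(summarize_state_dict(mondict)))


class _FakeTensor:
    """Stand-in with a .shape attribute so the sketch runs without PyTorch."""

    def __init__(self, *shape):
        self.shape = shape


demo = {
    "mes_net.cov_net.4.weight": _FakeTensor(64, 32, 5),
    "mes_net.cov_net.4.bias": _FakeTensor(64),
}
summary = summarize_state_dict(demo)
print("\n".join(summary))
```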

@scott81321

Thx. Here is what netron gives for iekfnets.pt (note as a pt file)
iekfnets

@hmf21

hmf21 commented Jan 11, 2022

great! @scott81321

@lumyus

lumyus commented Jan 11, 2022

So did anyone get it to work? I mean actually use your own data to get results? The plots seem to be generated no matter what model is used..

@scott81321

I got it to work for the datasets downloaded from github. Not on my own data yet. I need to better understand his code. E.g. how to switch on the neural network and not use it i.e. pure IEKF.

@lumyus

lumyus commented Jan 11, 2022

Nice! What did you change? Running the model which is provided by the author does not work..

@Hazeline2018

@scott81321 @hmf17
Hi, I wonder how you guys got the program working with training (train_filter = 1), even with the KITTI datasets that Martin originally used? When I read in the datasets, and start training, I got the following error that I have no clue about:

Sequence name : 2011_09_30_drive_0028_sync

Sequence name : 2011_09_30_drive_0033_sync
Dataset is too short (15.94 s)

Sequence name : 2011_09_30_drive_0034_sync
Dataset is too short (12.24 s)

Sequence name : 2011_09_30_drive_0072_sync
Dataset is too short (0.05 s)

Total dataset duration : 825.41 s
IEKF nets NOT loaded
Traceback (most recent call last):
  File "main_kitti.py", line 484, in <module>
    launch(KITTIArgs)
  File "main_kitti.py", line 28, in launch
    train_filter(args, dataset)
  File "/home/terryl/projects/AI-IMU-DR/ai-imu-dr/src/train_torch_filter.py", line 61, in train_filter
    prepare_loss_data(args, dataset)
  File "/home/terryl/projects/AI-IMU-DR/ai-imu-dr/src/train_torch_filter.py", line 108, in prepare_loss_data
    Rot_gt = torch.zeros(Ns[1], 3, 3)
TypeError: zeros() received an invalid combination of arguments - got (NoneType, int, int), but expected one of:

  • (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
  • (tuple of ints size, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

Hopefully you guys can give me some advice on how to get past this error and, most importantly, get the training program working first. I intend to tailor the program to my application by training the model with my own datasets if at all possible.

I'm using PyTorch 1.0.0 with GPU version

Thanks in advance!
Terry

@saltrack

saltrack commented Mar 24, 2022

@kebijuelun @scott81321 @lumyus @hmf17 did you guys make any progress on resolving this problem?

The plots seem pretty good even with randomly initialized parameters.

I've modified the sizes of the layers of the Mesnet which resolved some of the errors but this error continues to persist.

"RuntimeError: mat1 and mat2 shapes cannot be multiplied (47945x64 and 32x2)"

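A note on reading that message, as a hedged sketch rather than a diagnosis of the repository code: a PyTorch linear layer multiplies its input by the transpose of its weight, so "32x2" corresponds to a layer built with 32 input features and 2 outputs, while the input has 64 features per row. That would be consistent with the conv layers having been widened to 64 channels while the final linear layer still expects 32. The helper matmul_shape below is made up for illustration and only reproduces the shape rule.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of a 2-D matrix product, or raise if the inner sizes differ."""
    rows, inner_a = a_shape
    inner_b, cols = b_shape
    if inner_a != inner_b:
        raise ValueError(
            f"shapes cannot be multiplied ({rows}x{inner_a} and {inner_b}x{cols})"
        )
    return (rows, cols)


# Shapes from the error: 47945 rows of 64 features against a 32x2 matrix.
try:
    matmul_shape((47945, 64), (32, 2))
    error = None
except ValueError as exc:
    error = str(exc)

# If the last layer took 64 input features, the product would be well defined.
fixed = matmul_shape((47945, 64), (64, 2))
print(error)
print(fixed)
```

Under that reading, the input size of the last linear layer would need to change together with the conv channel count.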
@nothing371442

Hi, I also hit the MesNet size mismatch problem. When I deleted iekfnets.p and ran the code without the CNN, the result looked good. I wonder how I can run the code with the CNN? Also, why is the result without the CNN adapter already so good? Thanks a lot :)

@nothing371442

> Hi, I also hit the MesNet size mismatch problem. When I deleted iekfnets.p and ran the code without the CNN, the result looked good. I wonder how I can run the code with the CNN? Also, why is the result without the CNN adapter already so good? Thanks a lot :)

The mismatch problem can be solved by turning on the train option (set it to 1), which generates a new iekfnets.p that can be used for the test filter.

@Rajat-Arora

@nothing371442 didn't you get any errors while training as mentioned in #72?

Did you make any changes to get the train option (set to 1) working on the existing dataset provided by the author? Could you help me out with it?

@nothing371442

> @nothing371442 didn't you get any errors while training as mentioned in #72?
>
> Did you make any changes to get the train option (set to 1) working on the existing dataset provided by the author? Could you help me out with it?

Did you delete the iekfnets.p file first? I first deleted the iekfnets.p file and then ran with the train option (set to 1), which generates a new .p file.

@Rajat-Arora

Rajat-Arora commented Nov 19, 2022

> > @nothing371442 didn't you get any errors while training as mentioned in #72?
> > Did you make any changes to get the train option (set to 1) working on the existing dataset provided by the author? Could you help me out with it?
>
> Did you delete the iekfnets.p file first? I first deleted the iekfnets.p file and then ran with the train option (set to 1), which generates a new .p file.

Yes, I have deleted this file and set the train option (set to 1), but it gives me an error similar to #72.

image
image

@scott81321

Hi guys. As far as I can tell, there is a format mismatch between the file iekfnets.p and the CNN defined in the code. Notice that Brossard's default is test mode, not train mode. I saw discrepancies between the noise covariance values in his thesis and the ones he encoded for the OXTS data files of his test data. This suggests to me that he hardwired these numbers to get the best results for his test cases and pragmatically set the training aspect aside. These noise covariances are the initial ones in main_kitti.py and, less importantly, in utils_numpy_filter.py. I had to modify the ones in main_kitti.py to get the best results for the data given to me.

So I would like to ask all of you: what does iekfnets.p contain? Is it only noise covariances? If so, which ones?

@nothing371442

> > > @nothing371442 didn't you get any errors while training as mentioned in #72?
> > > Did you make any changes to get the train option (set to 1) working on the existing dataset provided by the author? Could you help me out with it?
> >
> > Did you delete the iekfnets.p file first? I first deleted the iekfnets.p file and then ran with the train option (set to 1), which generates a new .p file.
>
> Yes, I have deleted this file and set the train option (set to 1), but it gives me an error similar to #72.
>
> image image

Hi, did you download the provided delta_p.p file first?

@nothing371442

> Hi guys. As far as I can tell, there is a format mismatch between the file iekfnets.p and the CNN defined in the code. Notice that Brossard's default is test mode, not train mode. I saw discrepancies between the noise covariance values in his thesis and the ones he encoded for the OXTS data files of his test data. This suggests to me that he hardwired these numbers to get the best results for his test cases and pragmatically set the training aspect aside. These noise covariances are the initial ones in main_kitti.py and, less importantly, in utils_numpy_filter.py. I had to modify the ones in main_kitti.py to get the best results for the data given to me.
>
> So I would like to ask all of you: what does iekfnets.p contain? Is it only noise covariances? If so, which ones?

I think it contains the network parameters, as in the picture below.
net_para

@Rajat-Arora

> > > > @nothing371442 didn't you get any errors while training as mentioned in #72?
> > > > Did you make any changes to get the train option (set to 1) working on the existing dataset provided by the author? Could you help me out with it?
> > >
> > > Did you delete the iekfnets.p file first? I first deleted the iekfnets.p file and then ran with the train option (set to 1), which generates a new .p file.
> >
> > Yes, I have deleted this file and set the train option (set to 1), but it gives me an error similar to #72.
> > image image
>
> Hi, did you download the provided delta_p.p file first?

I was able to figure it out and train the model; there were some issues with the version of PyTorch that I was using.

@Rajat-Arora

> Hi guys. As far as I can tell, there is a format mismatch between the file iekfnets.p and the CNN defined in the code. Notice that Brossard's default is test mode, not train mode. I saw discrepancies between the noise covariance values in his thesis and the ones he encoded for the OXTS data files of his test data. This suggests to me that he hardwired these numbers to get the best results for his test cases and pragmatically set the training aspect aside. These noise covariances are the initial ones in main_kitti.py and, less importantly, in utils_numpy_filter.py. I had to modify the ones in main_kitti.py to get the best results for the data given to me.
>
> So I would like to ask all of you: what does iekfnets.p contain? Is it only noise covariances? If so, which ones?

Hi @scott81321, could you please describe in more detail what modifications you made in main_kitti.py to get the best results?
Also, you mentioned data given to you: are you talking about the dataset provided by the author or your own dataset?

@scott81321

> > Hi guys. As far as I can tell, there is a format mismatch between the file iekfnets.p and the CNN defined in the code. Notice that Brossard's default is test mode, not train mode. I saw discrepancies between the noise covariance values in his thesis and the ones he encoded for the OXTS data files of his test data. This suggests to me that he hardwired these numbers to get the best results for his test cases and pragmatically set the training aspect aside. These noise covariances are the initial ones in main_kitti.py and, less importantly, in utils_numpy_filter.py. I had to modify the ones in main_kitti.py to get the best results for the data given to me.
> > So I would like to ask all of you: what does iekfnets.p contain? Is it only noise covariances? If so, which ones?
>
> Hi @scott81321, could you please describe in more detail what modifications you made in main_kitti.py to get the best results? Also, you mentioned data given to you: are you talking about the dataset provided by the author or your own dataset?

The data is proprietary and I cannot tell you where it came from. It's not OXTS data. That much I can tell you. The IMU sensor is not as high quality. As I said, to get the best results I had to change the noise covariances - variables starting with cov_ in the Python files I mentioned. I cannot and will not tell what settings I used, only point out that I had to increase them. To find the best results, I tried many simulations on the same data until I found a range that worked well.
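
The "try many simulations until a range works" procedure can be organized as a simple sweep. This is only a sketch: sweep_scale and its evaluate callback are hypothetical names, and toy_error is a placeholder curve, not a filter run. In practice evaluate would run the filter on a sequence with the cov_ values multiplied by the candidate scale and return some position error against ground truth.

```python
def sweep_scale(scales, evaluate):
    """Evaluate each candidate scale; return (best_scale, best_error, all_results)."""
    results = [(scale, evaluate(scale)) for scale in scales]
    best_scale, best_error = min(results, key=lambda item: item[1])
    return best_scale, best_error, results


def toy_error(scale):
    """Placeholder error curve with its minimum at scale 4; NOT a real filter run."""
    return (scale - 4.0) ** 2 + 1.0


# Candidate multipliers for the initial noise covariances (illustrative values).
candidates = [1.0, 2.0, 4.0, 8.0, 16.0]
best_scale, best_error, results = sweep_scale(candidates, toy_error)
for scale, err in results:
    print(f"scale {scale:5.1f} -> error {err:7.2f}")
print("best:", best_scale, best_error)
```

A geometric grid like this one is a common starting point when the right order of magnitude is unknown; a finer grid around the best value can follow.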

@ajay1606

ajay1606 commented Dec 5, 2022

@scott81321 Thank you for your input on most of the queries posted here. Every single comment you posted here is useful in understanding this work. With your support, I was able to get the following result on a custom dataset.

A few parameters still need tuning to get a better result. Has anyone come across a similar situation? I would appreciate any response.

I am also trying to port it to ROS so that we can test with real-time sensor input. I will share it once I have completed that.

XY PLOT
image
ALIGNED XY PLOT
image

@scott81321

scott81321 commented Dec 5, 2022

@ajay1606 What are you asking for? How to improve your results? With all due respect, the aligned picture looks pretty good in terms of agreement. What sensor are you using? Is it high quality? Also, what is the resolution of your lat-longs, i.e. position? If it's GPS, the accuracy is limited by the number of decimal places. E.g. 5 decimal places of latitude/longitude give about 1.1 meters of resolution; 4 decimal places give only about 11.1 meters. It seems to me this result is pretty good. The only thing I can think of to improve it would be a slight adjustment of the initial noise covariances (variables cov_*) in main_kitti.py. There is also the issue of the INITIAL CONDITIONS, i.e. initial velocity and especially initial RPY. This program is VERY sensitive to initial RPY. E.g. if you're driving a vehicle on a horizontal flat surface, you have to worry about initial yaw. Roll and pitch should be about zero in this case.
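
The resolution figures above follow from the length of one degree of latitude, which is roughly 111 km. A quick check, assuming 111,320 m per degree (an approximation; longitude steps are shorter by a factor of cos(latitude)):

```python
# Approximate length of one degree of latitude in meters (assumption for this sketch).
METERS_PER_DEGREE_LATITUDE = 111_320.0


def latitude_resolution_m(decimal_places):
    """Smallest latitude step, in meters, representable with the given decimal places."""
    return METERS_PER_DEGREE_LATITUDE * 10.0 ** (-decimal_places)


resolution_by_places = {n: latitude_resolution_m(n) for n in (4, 5, 6, 7)}
for places, meters in resolution_by_places.items():
    print(f"{places} decimal places -> about {meters:.4f} m")
```

So logging positions with fewer than 6 or 7 decimal places can by itself add meter-level quantization to the ground truth.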

@ajay1606

ajay1606 commented Dec 6, 2022

@scott81321 Thank you so much for your quick response. I am currently testing with a NOVATEL RTK GNSS and an Epson G320N MEMS IMU. Thank you for your confirmation; I will try tuning the initial noise covariances as you suggested. I agree with you completely: the program is very sensitive to initial RPY.

Thank you so much.

@kartikeya13

Hello,
Apologies for the newbie question, but can anyone tell me what the difference is between the XY plot and the aligned XY plot?
Thanks

@scott81321

As far as I know, the aligned plot is one which tries to align the IEKF solution computed from IMU data with the ground truth (usually GPS values). The XY plot is the plot without that alignment. This alignment is done in utils_plot.py.
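
To illustrate what such an alignment can look like, here is a generic least-squares rigid fit in 2D (rotation plus translation, no scaling). This is a common technique, offered as an assumption-laden sketch; it is not necessarily what utils_plot.py does, and the name align_2d is made up.

```python
import math


def align_2d(estimate, reference):
    """Rigidly align `estimate` onto `reference` (equal-length lists of (x, y)).

    Finds the rotation about the centroids that minimizes the summed squared
    distance, then moves the estimate centroid onto the reference centroid.
    """
    n = len(estimate)
    ex = sum(x for x, _ in estimate) / n
    ey = sum(y for _, y in estimate) / n
    rx = sum(x for x, _ in reference) / n
    ry = sum(y for _, y in reference) / n
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(estimate, reference):
        px, py, qx, qy = px - ex, py - ey, qx - rx, qy - ry
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)  # optimal rotation angle in closed form
    c, s = math.cos(theta), math.sin(theta)
    return [
        (c * (x - ex) - s * (y - ey) + rx, s * (x - ex) + c * (y - ey) + ry)
        for x, y in estimate
    ]


# Toy data: the "estimate" is the reference rotated by -30 degrees and shifted.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
angle = math.radians(-30.0)
estimate = [
    (math.cos(angle) * x - math.sin(angle) * y + 2.0,
     math.sin(angle) * x + math.cos(angle) * y - 1.0)
    for x, y in reference
]
aligned = align_2d(estimate, reference)
max_error = max(
    math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(aligned, reference)
)
```

An aligned plot removes a constant heading and position offset, so it shows the shape agreement of the trajectory rather than its absolute error.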

@Akudavale

> > > > > @nothing371442 didn't you get any errors while training as mentioned in #72?
> > > > > Did you make any changes to get the train option (set to 1) working on the existing dataset provided by the author? Could you help me out with it?
> > > >
> > > > Did you delete the iekfnets.p file first? I first deleted the iekfnets.p file and then ran with the train option (set to 1), which generates a new .p file.
> > >
> > > Yes, I have deleted this file and set the train option (set to 1), but it gives me an error similar to #72.
> > > image image
> >
> > Hi, did you download the provided delta_p.p file first?
>
> I was able to figure it out and train the model; there were some issues with the version of PyTorch that I was using.

@Rajat-Arora hey, can you explain what you did to solve this issue?
