
Why 352*1216? #69

Open
ciwei123 opened this issue Nov 18, 2022 · 5 comments

Comments

@ciwei123

@zhyever Thanks for your sharing. The model performs best only when the input is 352×1216, and the performance becomes worse with other sizes. Could you tell me why the output size is 352×1216? Thank you very much!

@zhyever
Owner

zhyever commented Nov 29, 2022

Could you please explain your question more specifically? Do you mean that you resize the images/GT during the inference stage but get inferior performance?

@ciwei123
Author

ciwei123 commented Nov 30, 2022

@zhyever Thanks for your reply. I have some questions.

  1. Why is the image first cropped to 352×1216 during training and then randomly cropped to 352×704, instead of randomly cropping the whole raw image?
  2. During testing, the metrics are best only when the model output is 352×1216 and garg_crop is used for evaluation, i.e., model output size: 352×1216, eval size: 206×1129 (garg_crop).
    I tested some other combinations, and the metrics got worse, such as:
    model output size: raw_img, eval size: raw_img
    model output size: raw_img, eval size: 206×1129 (garg_crop)
    model output size: raw_img, eval size: 352×1216 (no garg_crop)
    model output size: 352×1216, eval size: 352×1216 (no garg_crop)
    model output size: 302×1216 (cropped by me), eval size: 206×1129 (garg_crop)

So I'm wondering why the metrics are best only in that one case? Thank you very much!
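For context, the 206×1129 eval size above is what the standard Garg et al. crop fractions give on a 352×1216 prediction. A minimal sketch, assuming the usual constants (the exact values in this repo's eval code may differ slightly):

```python
import numpy as np

# Standard Garg et al. evaluation crop fractions (top, bottom, left, right),
# as used by most KITTI eval scripts. Assumed here, not copied from this repo.
GARG_CROP = (0.40810811, 0.99189189, 0.03594771, 0.96405229)

def garg_eval_mask(h=352, w=1216):
    """Boolean mask of the region that is actually scored during evaluation."""
    top, bottom, left, right = GARG_CROP
    mask = np.zeros((h, w), dtype=bool)
    mask[int(top * h):int(bottom * h), int(left * w):int(right * w)] = True
    return mask

mask = garg_eval_mask()
print(mask.sum(axis=0).max(), mask.sum(axis=1).max())  # -> 206 1129
```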

@zhyever
Owner

zhyever commented Jan 31, 2023

Hi, I'm sorry for this late reply.

  1. The logic is that we first crop images to 352×1216 following the previous KBCrop. It is a center crop whose aim is to remove uninformative fringe areas. I can't remember which paper first adopted this, but many implementations followed it. Then we come to the random crop of 352×704, whose aim is data augmentation and reducing the memory cost caused by high-resolution input images.
  2. That's an interesting question. I may not be able to answer it with certainty, but I hope my words provide some useful hints. I remember the evaluation pipeline is raw image -> pred depth -> KBCrop -> evaluation. As mentioned in point 1, the KBCrop removes the fringe of the images, in which large depth errors may appear. Hence <model output size: raw_img, eval size: 352×1216 (no garg_crop)> can get better performance than <model output size: raw_img, eval size: raw_img>. An eval size of 206×1129 covers yet another area of the images, so the results can differ again.
    I have to say that <model output size: 352×1216, eval size: 352×1216 (no garg_crop)> is also reasonable. On the NYU dataset the problem is more intuitive: the fringe of the GT depth map has no valid values, yet the evaluation pipeline is still image -> pred depth -> NYUCrop -> evaluation. So, why not crop the images first and then predict depth, i.e., image -> NYUCrop -> pred depth -> evaluation? I tested it many months ago and got better evaluation results. That could be reasonable, but I have to follow previous works to achieve a fair comparison.
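A minimal sketch of the two-stage cropping described in point 1, assuming a BTS-style KB crop (bottom 352 rows, centered 1216 columns of a roughly 375×1242 raw KITTI frame); the exact implementation in this toolbox may differ:

```python
import random
import numpy as np

def kb_crop(image, depth):
    """BTS-style KB crop: keep the bottom 352 rows and the centered 1216
    columns of a raw KITTI frame, dropping the sky and border regions
    where GT depth is sparse or missing."""
    h, w = image.shape[:2]
    top, left = h - 352, (w - 1216) // 2
    return (image[top:top + 352, left:left + 1216],
            depth[top:top + 352, left:left + 1216])

def random_crop(image, depth, crop_h=352, crop_w=704):
    """Training-only random crop for augmentation and lower memory cost;
    the same window is applied to the image and the GT depth."""
    h, w = image.shape[:2]
    y = random.randint(0, h - crop_h)  # 0 here, since h is already 352
    x = random.randint(0, w - crop_w)
    return (image[y:y + crop_h, x:x + crop_w],
            depth[y:y + crop_h, x:x + crop_w])
```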

@ciwei123
Author

@zhyever Thanks for your reply. I think garg_crop is a trick, not a universal rule. And I can understand image -> NYUCrop -> pred depth -> evaluation, but the most interesting thing is why it is 352×1216 instead of other values (maybe 320×1184). In other words, when we get a new dataset, we don't know how large the crop should be.

@zhyever
Owner

zhyever commented Jan 31, 2023

I think the raw resolution is better if there are no invalid GT values lying along the fringe of the images. If we have to crop, I guess we can compute some error statistics and then select a relatively accurate area (as large as possible). I have to say this is hard, but I recommend visualizing the GT maps of KITTI or NYU; the invalid fringe is so universal that you can easily set the crop size from it. Another consideration is the requirement of the model: many models require inputs with certain resolutions (a multiple of 2, 4, 8, ...), which you may also want to consider when setting the crop size.
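A rough sketch of that suggestion; the helper name, the validity threshold, and the multiple of 32 are illustrative assumptions, not part of the toolbox:

```python
import numpy as np

def suggest_crop_size(gt_depths, min_valid_ratio=0.01, multiple=32):
    """Hypothetical helper: estimate how large a fringe-free crop can be by
    checking where valid GT pixels actually occur across the dataset, then
    round down to a multiple that the network accepts.

    gt_depths: iterable of HxW arrays of the same shape, with 0 marking
    invalid pixels (as in KITTI/NYU GT maps)."""
    freq = np.mean([d > 0 for d in gt_depths], axis=0)       # per-pixel validity frequency
    rows = np.where(freq.mean(axis=1) > min_valid_ratio)[0]  # rows with enough valid GT
    cols = np.where(freq.mean(axis=0) > min_valid_ratio)[0]  # columns with enough valid GT
    crop_h = (rows[-1] - rows[0] + 1) // multiple * multiple
    crop_w = (cols[-1] - cols[0] + 1) // multiple * multiple
    return crop_h, crop_w
```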
