
Why 352*1216? #69

Open
ciwei123 opened this issue Nov 18, 2022 · 5 comments

Comments

@ciwei123

@zhyever Thanks for your sharing. The model performs best only when the input is 352×1216, and the performance becomes worse with other sizes. Could you tell me why the output size is 352×1216? Thank you very much!

@zhyever
Owner

zhyever commented Nov 29, 2022

Could you please explain your question more specifically? Do you mean that you resize the images/GT during the inference stage but get inferior performance?

@ciwei123
Author

ciwei123 commented Nov 30, 2022

@zhyever Thanks for your reply. I have some questions.

  1. Why is the image first cropped to 352×1216 during training and then randomly cropped to 352×704, instead of randomly cropping the whole raw image?
  2. During testing, the metrics are best only when the model output is 352×1216 and garg_crop is used for evaluation, i.e., model output size: 352×1216, eval size: 206×1129 (garg_crop).
    I tested some other combinations, and the metrics got worse, such as:
    model output size: raw_img, eval size: raw_img
    model output size: raw_img, eval size: 206×1129 (garg_crop)
    model output size: raw_img, eval size: 352×1216 (no garg_crop)
    model output size: 352×1216, eval size: 352×1216 (no garg_crop)
    model output size: 302×1216 (cropped by me), eval size: 206×1129 (garg_crop)

So I'm wondering why the metrics are best only in that one case? Thank you very much!
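For context, the 206×1129 eval size above is what the standard Garg et al. crop fractions give on a 352×1216 prediction. A minimal sketch, assuming the usual constants (the exact values in this repo's eval code may differ slightly):

```python
import numpy as np

# Standard Garg et al. evaluation crop fractions (top, bottom, left, right),
# as used by most KITTI eval scripts. Assumed here, not copied from this repo.
GARG_CROP = (0.40810811, 0.99189189, 0.03594771, 0.96405229)

def garg_eval_mask(h=352, w=1216):
    """Boolean mask of the region that is actually scored during evaluation."""
    top, bottom, left, right = GARG_CROP
    mask = np.zeros((h, w), dtype=bool)
    mask[int(top * h):int(bottom * h), int(left * w):int(right * w)] = True
    return mask

mask = garg_eval_mask()
print(mask.sum(axis=0).max(), mask.sum(axis=1).max())  # -> 206 1129
```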

@zhyever
Owner

zhyever commented Jan 31, 2023

Hi, I'm sorry for this late reply.

  1. The logic is that we first crop images to 352×1216 following the previous KBCrop. It is a center crop whose aim is to remove uninformative fringe areas. I can't remember which paper first adopted this, but many implementations followed it. Then we come to the random crop of 352×704, whose aim is data augmentation and reducing the memory cost caused by high-resolution input images.
  2. That's an interesting question. I may not be able to answer it with certainty, but I hope my words provide some useful hints. I remember the evaluation pipeline is raw image -> pred depth -> KBCrop -> evaluation. As mentioned in point 1, the KBCrop removes the fringe of the images, in which large depth errors may appear. Hence <model output size: raw_img, eval size: 352×1216 (no garg_crop)> can get better performance than <model output size: raw_img, eval size: raw_img>. An eval size of 206×1129 covers yet another area of the images, so the results can differ again.
    I have to say that <model output size: 352×1216, eval size: 352×1216 (no garg_crop)> is also reasonable. On the NYU dataset the problem is more intuitive: the fringe of the GT depth map has no valid values, yet the evaluation pipeline is still image -> pred depth -> NYUCrop -> evaluation. So, why not crop the images first and then predict depth, i.e., image -> NYUCrop -> pred depth -> evaluation? I tested it many months ago and got better evaluation results. That could be reasonable, but I have to follow previous works to achieve a fair comparison.
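A minimal sketch of the two-stage cropping described in point 1, assuming a BTS-style KB crop (bottom 352 rows, centered 1216 columns of a roughly 375×1242 raw KITTI frame); the exact implementation in this toolbox may differ:

```python
import random
import numpy as np

def kb_crop(image, depth):
    """BTS-style KB crop: keep the bottom 352 rows and the centered 1216
    columns of a raw KITTI frame, dropping the sky and border regions
    where GT depth is sparse or missing."""
    h, w = image.shape[:2]
    top, left = h - 352, (w - 1216) // 2
    return (image[top:top + 352, left:left + 1216],
            depth[top:top + 352, left:left + 1216])

def random_crop(image, depth, crop_h=352, crop_w=704):
    """Training-only random crop for augmentation and lower memory cost;
    the same window is applied to the image and the GT depth."""
    h, w = image.shape[:2]
    y = random.randint(0, h - crop_h)  # 0 here, since h is already 352
    x = random.randint(0, w - crop_w)
    return (image[y:y + crop_h, x:x + crop_w],
            depth[y:y + crop_h, x:x + crop_w])
```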

@ciwei123
Author

@zhyever Thanks for your reply. I think garg_crop is a trick, not a universal rule. And I can understand image -> NYUCrop -> pred depth -> evaluation, but the most interesting thing is why it is 352×1216 instead of other values (maybe 320×1184). In other words, when we get a new dataset, we don't know how large the crop should be.

@zhyever
Owner

zhyever commented Jan 31, 2023

I think the raw resolution is better if there are no invalid GT values lying along the fringe of the images. If we have to crop, I guess we can compute some error statistics and then select a relatively accurate area (as large as possible). I have to say this is hard, but I recommend visualizing the GT maps of KITTI or NYU; the invalid fringe is so universal that you can easily set the crop size from it. Another consideration is the requirement of the model: many models require inputs with certain resolutions (a multiple of 2, 4, 8, ...), which you may also want to consider when setting the crop size.
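A rough sketch of that suggestion; the helper name, the validity threshold, and the multiple of 32 are illustrative assumptions, not part of the toolbox:

```python
import numpy as np

def suggest_crop_size(gt_depths, min_valid_ratio=0.01, multiple=32):
    """Hypothetical helper: estimate how large a fringe-free crop can be by
    checking where valid GT pixels actually occur across the dataset, then
    round down to a multiple that the network accepts.

    gt_depths: iterable of HxW arrays of the same shape, with 0 marking
    invalid pixels (as in KITTI/NYU GT maps)."""
    freq = np.mean([d > 0 for d in gt_depths], axis=0)       # per-pixel validity frequency
    rows = np.where(freq.mean(axis=1) > min_valid_ratio)[0]  # rows with enough valid GT
    cols = np.where(freq.mean(axis=0) > min_valid_ratio)[0]  # columns with enough valid GT
    crop_h = (rows[-1] - rows[0] + 1) // multiple * multiple
    crop_w = (cols[-1] - cols[0] + 1) // multiple * multiple
    return crop_h, crop_w
```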
