Question about input depth image #5

Open
gujiaqivadin opened this issue Apr 10, 2020 · 4 comments

@gujiaqivadin

Hello, valgur!
Thanks for sharing your code for computing surface normals. I have a question about the input depth image. I know it comes from the KITTI depth completion dataset, but I don't know whether it should be the sparse input depth map or the dense ground-truth depth map. Will the sparsity of the depth map affect the quality of the surface normals?
Also, a second question: when training our model we use cropped depth images. Can the code compute correct surface normals for a 256x512 depth image rather than at the full-scale size?

@valgur
Owner

valgur commented Apr 11, 2020

the sparse input depth map or the dense ground-truth depth map. Will the sparsity of the depth map affect the quality of the surface normals?

Since you probably want to use the normal images as the ground truth for a model, you want them to be as high-quality as possible. In general, the denser, aggregated KITTI depth completion ground truth images will work much better for this. The surface normal estimation algorithm estimates a local plane in an n x n window (the default window size is 15 px), so with a denser input you will have more points in the window for a more accurate estimate and will more easily exceed the minimum of 3 points required within that window. The only situation where the sparser single-scan depth maps seem to be more accurate is in the presence of dynamic objects, where the noise from the imperfect point cloud aggregation in the denser depth map results in some "wobbliness" in the normals of the dynamic objects. Also, the sparse depth images have not been filtered to exclude points that should be occluded but overlap with closer ones due to being transformed into the camera frame.
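For reference, the local plane fit can be sketched roughly as follows in NumPy. This is only an illustration of the idea described above, not the repository's actual implementation, and the function and parameter names (`normals_from_depth`, `window`, `min_points`) are made up for the example:

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy, window=15, min_points=3):
    """Estimate per-pixel surface normals from a depth map by fitting a local
    plane to the valid points in a window x window neighbourhood."""
    h, w = depth.shape
    # Back-project every pixel to 3D camera coordinates (z = depth).
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64)
    pts = np.stack([(u - cx) / fx * z, (v - cy) / fy * z, z], axis=-1)
    valid = depth > 0

    normals = np.zeros((h, w, 3), dtype=np.float32)
    r = window // 2
    for i in range(h):
        for j in range(w):
            if not valid[i, j]:
                continue
            # Collect the valid 3D points inside the local window.
            sl = np.s_[max(i - r, 0):i + r + 1, max(j - r, 0):j + r + 1]
            p = pts[sl][valid[sl]]
            if len(p) < min_points:
                continue  # a plane needs at least 3 points
            # Total least-squares plane fit: the normal is the singular vector
            # belonging to the smallest singular value of the centred points.
            _, _, vt = np.linalg.svd(p - p.mean(axis=0), full_matrices=False)
            n = vt[-1]
            # Orient the normal towards the camera (located at the origin).
            normals[i, j] = -n if np.dot(n, pts[i, j]) > 0 else n
    return normals
```

With a sparse depth map, many windows contain only a handful of points, so the fitted plane (and hence the normal) becomes noisy or cannot be estimated at all; that is the sparsity effect described above.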

Can the code compute correct surface normals for a 256x512 depth image rather than at the full-scale size?

If you are asking whether the model trained on cropped images can also process full-size images, then yes, the DeepLidar model is a convolutional model and is not limited to a fixed image size.
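To illustrate the point (this is just a toy fully convolutional stack, not DeepLidar itself): a model built only from convolutions produces an output whose spatial size follows the input, so the same weights work on both cropped and full-size images.

```python
import torch
import torch.nn as nn

# A toy fully convolutional stack (NOT DeepLidar) -- the output spatial size
# simply follows the input, so the same weights handle crops and full frames.
net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),  # e.g. 3-channel normal map
)

crop = torch.randn(1, 1, 256, 512)    # cropped training size
full = torch.randn(1, 1, 352, 1216)   # full-size KITTI depth completion frame
print(net(crop).shape)  # torch.Size([1, 3, 256, 512])
print(net(full).shape)  # torch.Size([1, 3, 352, 1216])
```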

@gujiaqivadin
Author

gujiaqivadin commented Apr 12, 2020

1st question: Thanks for your detailed answer. If I want to supervise surface normals for my output depth map, I will probably use (sparse + GT) to generate the surface normals, to get a more precise surface normal ground truth.
2nd question: Yes, I understand your point about the model input size, but my question meant something else. If I want to supervise depth and surface normals in one pipeline, I need to generate the surface normals from a cropped 256x512 image (because we use this size in the depth supervision pipeline).
I see that there are cx, cy, f arguments among the surface-normal function's inputs, but in a cropped image these arguments lose their meaning, because we don't know where the cropped area is.
So I would like to know: if I need to generate surface normals from a cropped depth image, do I need to store the crop offsets (th, tw) relative to the full image size in the model pipeline?
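For what it's worth, if the crop's top-left corner is known, the intrinsics of the cropped image are simply the full-image intrinsics with the principal point shifted by that offset; the focal length is unchanged. A minimal sketch (the function name, the call, and the numbers are illustrative, not this repository's API):

```python
def cropped_intrinsics(cx, cy, f, tw, th):
    """Intrinsics for a crop whose top-left corner is at column tw, row th of
    the full image: the principal point shifts, the focal length stays."""
    return cx - tw, cy - th, f

# Illustrative, roughly KITTI-like numbers, and a hypothetical call:
cx_c, cy_c, f_c = cropped_intrinsics(cx=609.6, cy=172.9, f=721.5, tw=352, th=96)
# normals = compute_normals(depth[96:96 + 256, 352:352 + 512], cx_c, cy_c, f_c)
```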

@valgur
Owner

valgur commented Apr 12, 2020

  1. Looking at a concrete sparse vs dense depth input example for surface normal estimation:
    [example images: normals_sparse, normals_dense]
    The sparse normals look more accurate to me and have better spatial coverage in some places, so in that sense they might work better as GT. Using just the sparse normals or combining sparse + dense might work quite well, but the overlapping occluded points are still a likely issue that might need to be corrected for.

  2. I agree, trying to predict normals in a cropped image without knowledge of the offset from the camera center and the focal length is rather questionable. The neural net will definitely learn to guess these values to some degree, but either

  • providing the angular offset from the image center in some form, or
  • modifying the definition of the normal values

  might be a better approach, perhaps. By the latter I mean possibly changing the coordinate frame in which the normal direction coordinates are provided, so that the z-direction of the frame points towards the 3D location of the normal instead of using the same camera frame for all points (see the sketch below). The perspective distortion at the image edges might still cause problems with this approach, though, I guess.
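As an illustration of that second idea (a rough sketch only, with made-up names, not code from this repository): rotate each camera-frame normal into a per-pixel frame whose z-axis is the viewing ray through that pixel.

```python
import numpy as np

def normal_in_view_ray_frame(normal_cam, u, v, fx, fy, cx, cy):
    """Re-express a camera-frame normal in a per-pixel frame whose z-axis is
    the viewing ray through pixel (u, v); the camera sits at the origin."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    z_axis = ray / np.linalg.norm(ray)
    # Any orthonormal basis around the ray works; this choice degenerates only
    # for rays parallel to the y-axis, which cannot happen here (ray z = 1).
    x_axis = np.cross([0.0, 1.0, 0.0], z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    R = np.stack([x_axis, y_axis, z_axis])  # rows are the new basis vectors
    return R @ normal_cam                   # normal expressed in the ray frame
```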

@anthcolange

Hi, how does one access these dense depth GT images? I've only been able to find the sparse ones shown first in the above post. Thanks!
