July 2019
tl;dr: Estimate the intrinsics in addition to the extrinsics of the camera from any video.
This work eliminates the assumption that the camera intrinsics are available, which opens up a whole lot of possibilities to learn from a much wider range of videos.
The network regresses depth, ego-motion, object motion, and camera intrinsics from monocular videos.
- Estimate each of the intrinsic parameters (focal lengths and principal point) from the video itself; see the intrinsics sketch after this list.
- Occlusion-aware loss: only the most foreground (un-occluded) pixels are used when calculating the photometric loss; see the loss sketch after this list.
- A foreground (possible-mobility) mask is used to exclude possibly moving objects from the loss.
- Use randomized layer normalization (this is quite weird); see the normalization sketch after this list.
- Sometimes a single overall supervision signal is shared by two tightly coupled parameters, and it is not enough to get an accurate estimate for each parameter individually (cf. Deep3Dbox).
- In detail, how was the lens (distortion) correction regressed?
- See the interview with the CEO of isee on this paper.
- Q: Can we project the intermediate representation (3D points) to BEV instead of back to the camera plane for the loss calculation? This would eliminate the need for the occlusion-aware loss.
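
Below is a minimal sketch of how a head on the motion network could regress per-frame intrinsics and assemble them into a camera matrix K. This is not the paper's architecture; the module name, layer sizes, and activations are my assumptions.

```python
# Hypothetical intrinsics head: softplus keeps focal lengths positive,
# sigmoid keeps the principal point inside the image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntrinsicsHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.fc_focal = nn.Linear(feat_dim, 2)   # -> (fx, fy)
        self.fc_center = nn.Linear(feat_dim, 2)  # -> (cx, cy)

    def forward(self, feat, img_w, img_h):
        size = torch.tensor([img_w, img_h], dtype=feat.dtype, device=feat.device)
        focal = F.softplus(self.fc_focal(feat)) * size
        center = torch.sigmoid(self.fc_center(feat)) * size
        K = torch.zeros(feat.shape[0], 3, 3, dtype=feat.dtype, device=feat.device)
        K[:, 0, 0], K[:, 1, 1] = focal[:, 0], focal[:, 1]
        K[:, 0, 2], K[:, 1, 2] = center[:, 0], center[:, 1]
        K[:, 2, 2] = 1.0
        return K
```

Lens distortion would need an additional output on top of this; how exactly the paper regresses that correction is the open question above.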
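A minimal sketch of how I read the occlusion-aware photometric loss together with the possible-motion mask (tensor names and shapes are assumptions, not the paper's code): the loss is accumulated only where the surface warped from the source frame is the most foreground one, and pixels flagged as possibly moving are dropped.

```python
import torch

def occlusion_aware_photometric_loss(warped_src, warped_depth, tgt_img, tgt_depth,
                                     mobile_mask=None, eps=1e-3):
    """warped_src/tgt_img: (B, 3, H, W); warped_depth/tgt_depth/mobile_mask: (B, 1, H, W)."""
    # Keep only pixels where the warped surface lies in front of (or on) the
    # target frame's surface, i.e. the most foreground, un-occluded pixels.
    visible = (warped_depth <= tgt_depth + eps).float()
    if mobile_mask is not None:
        # Drop pixels that may belong to moving objects (mask == 1 means "may move").
        visible = visible * (1.0 - mobile_mask)
    diff = (warped_src - tgt_img).abs()
    return (diff * visible).sum() / (visible.sum() * diff.shape[1] + 1e-8)
```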
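My understanding of randomized layer normalization, sketched below: plain layer normalization whose statistics are multiplied by Gaussian noise during training, so that some of batch norm's regularizing stochasticity is kept without depending on the batch. The normalization axes and the noise scale are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class RandomizedLayerNorm(nn.Module):
    """Layer normalization with multiplicative Gaussian noise on the statistics
    at training time. Axes and sigma are assumptions for illustration."""

    def __init__(self, num_channels, sigma=0.5, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.sigma = sigma
        self.eps = eps

    def forward(self, x):
        # Per-sample, per-channel statistics over the spatial dimensions,
        # so nothing depends on the batch (unlike batch norm).
        mean = x.mean(dim=(2, 3), keepdim=True)
        var = x.var(dim=(2, 3), keepdim=True)
        if self.training:
            # Multiplicative Gaussian noise on the statistics acts as a
            # regularizer in place of batch norm's batch-level stochasticity.
            mean = mean * (1.0 + torch.randn_like(mean) * self.sigma)
            var = var * (1.0 + torch.randn_like(var) * self.sigma)
        # clamp guards against the noise making the variance negative.
        return self.gamma * (x - mean) / torch.sqrt(var.clamp(min=0.0) + self.eps) + self.beta
```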