From video / pose data? #1

Open
antithing opened this issue Sep 24, 2022 · 1 comment

@antithing

Hi! This code looks great, thanks for making it available!

If I have an RGBD camera with pose info (tx, ty, tz, qx, qy, qz, qw) and corresponding RGB / RGBD frames, could I run them through and get a global mesh from a video sequence?

@weigert
Owner

weigert commented Sep 25, 2022

That's exactly what I use it for in one of the examples in the main README (single image). It is not yet ready for multi-image use, and it is also not packaged as a simple CLI meshing tool. Sorry! There is still some development required:

A few considerations (single image):

  • In the triangulation, a single 2D vertex might correspond to two (or more!) vertices in 3D space, i.e. it sits on an occlusion boundary. To recognise this and spawn multiple vertices, a statistical argument has to be made based on the plane goodness of fit of the surrounding triangles (see the sketch after this list). This leads to non-watertight meshes, which is the correct behaviour where occlusion occurs.
  • RGB-D data can be sparse and have holes. In some cases it therefore makes sense not to triangulate areas without depth information and to focus only on the areas that have it. I have almost completed adapting the triangulation algorithm to allow this, but it is very tricky. It would allow non-rectangular boundaries, holes, etc. in the adaptive triangulation, so that regions without data are simply not meshed.
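
To illustrate the first point, roughly the kind of goodness-of-fit test I have in mind, as a minimal numpy sketch: `ring` would be the back-projected 3D positions of a vertex and its one-ring neighbours, and the fixed threshold is a placeholder that a proper statistical test against the depth noise would replace.

```python
import numpy as np

def is_occlusion_vertex(ring: np.ndarray, threshold: float = 0.05) -> bool:
    """Return True if the one-ring of a vertex is poorly explained by a
    single plane, i.e. the vertex likely straddles a depth discontinuity
    and should be split into multiple 3D vertices."""
    centered = ring - ring.mean(axis=0)          # ring: (N, 3) points
    # Best-fit plane normal = right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    # RMS distance of the ring points to the fitted plane.
    rms = np.sqrt(np.mean((centered @ normal) ** 2))
    return rms > threshold
```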

But this definitely works with single RGB-D images right now, as you can see in the README.
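
For reference, taking one of your RGB-D frames with its pose into world coordinates would look roughly like this. This is only a numpy/scipy sketch under some assumptions: `K` is your 3x3 intrinsic matrix, the quaternion is scalar-last (qx, qy, qz, qw), and the pose is camera-to-world.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def backproject(depth: np.ndarray, K: np.ndarray,
                t: np.ndarray, q: np.ndarray) -> np.ndarray:
    """depth: HxW metric depth image, K: 3x3 intrinsics,
    t: (tx, ty, tz), q: (qx, qy, qz, qw). Returns Nx3 world-space points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                    # skip holes in the depth map
    z = depth[valid]
    # Pixel -> camera-frame point, scaled by depth (pinhole model).
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=-1)
    # Camera frame -> world frame with the given rotation and translation.
    R = Rotation.from_quat(q).as_matrix()
    return pts_cam @ R.T + np.asarray(t)
```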

Multiple images are much trickier:

  • A long video sequence can take an arbitrary path through space, so we can't trivially build a single global stitched image panorama.
  • The images can be rectified using homographies given the current pose in the sequence (see the homography sketch after this list). The stitched, rectified image could then be used to form a large triangulation (with a non-rectangular boundary!) on a sphere, which serves as our mesh guess at a given position.
  • We then adaptively warp this mesh through the rectified image spheres as we move through the sequence. Note that the topological warping is not yet implemented either, but it is necessary for long image sequences; a very tricky subject.
  • Finally, we have our camera positions, rectified image spheres at every position, and warped meshes at every position. Using a registered point cloud, we can then fit the mesh to the data and use the images as the texture!
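
To make the rectification point concrete: the homography induced by a scene plane between two posed views is the standard H = K (R - t nᵀ / d) K⁻¹. Below is a rough numpy/OpenCV sketch; the relative pose (R, t), the plane (n, d) in the reference camera frame, and the plane sign convention are all things you would have to supply from your own trajectory and scene.

```python
import numpy as np
import cv2

def plane_homography(K, R, t, n, d):
    """Homography mapping reference-view pixels to the second view for
    points on the plane n.X + d = 0 (reference camera frame, unit n)."""
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

def rectify_into_reference(image2, K, R, t, n, d):
    """Resample the second image into the reference view."""
    H = plane_homography(K, R, t, n, d)
    h, w = image2.shape[:2]
    # H maps reference pixels to second-view pixels, so warp with its inverse.
    return cv2.warpPerspective(image2, np.linalg.inv(H), (w, h))
```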

So overall the process would be:

  • For every pose, build rectified image spheres using homographies.
  • Triangulate the spheres and warp them adaptively to make sure that you get a single, dynamic "mesh".
  • Point cloud registration on the RGB-D data for a global 3D model (see the sketch after this list). Downsample as much as you like.
  • Fit the mesh to the planes and apply the statistical test to duplicate vertices in 3D where needed.
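
For the registration step, something like the following would be a starting point. This is just a sketch using Open3D as an example library (the voxel size and ICP distance threshold are made-up parameters): it refines the pose-based alignment of consecutive frames with point-to-plane ICP and fuses everything into one downsampled global cloud.

```python
import numpy as np
import open3d as o3d

def register_frames(clouds, voxel=0.02):
    """clouds: list of Nx3 numpy arrays, one per frame, already placed in
    approximate world coordinates using the camera poses. Returns a single
    fused, downsampled point cloud."""
    fused = o3d.geometry.PointCloud()
    prev = None
    for pts in clouds:
        pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
        pcd = pcd.voxel_down_sample(voxel)
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=4 * voxel, max_nn=30))
        if prev is not None:
            # Refine the pose-based alignment against the previous frame.
            result = o3d.pipelines.registration.registration_icp(
                pcd, prev, 2 * voxel, np.eye(4),
                o3d.pipelines.registration.TransformationEstimationPointToPlane())
            pcd.transform(result.transformation)
        fused += pcd
        prev = pcd
    return fused.voxel_down_sample(voxel)
```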

I currently don't have time to work on these aspects myself, but if you (or anyone else) are interested in helping develop this concept, feel free to let me know.
