Georeferenced datasets processed on Metashape are loaded incorrectly/can't be loaded using splatfacto. #3255

gaigc commented Jun 25, 2024

Describe the bug
I've noticed that when I try to use a dataset that I've aligned with GPS reference in Metashape, it either fails to load or loads but produces no results. This has been an issue since nerfstudio implemented loading point clouds (.ply) for splat seeding.

I thought @simonbethke might have reported this problem when it was first being tested in pull #3122. When @jb-ye asked for sample data, I assumed it had been shared somewhere, but I couldn't find any related issue, so I'm opening one here.

On older nerfstudio and gsplat versions (from about a month ago), I was getting this error:
Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
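
For reference, this is the generic error PyTorch raises when backward() is called on a tensor that has no grad_fn (e.g. a loss that no trainable parameter contributed to). A minimal standalone reproduction of the same failure mode, not nerfstudio code:

import torch

# Calling backward() on a constant tensor (requires_grad=False, no grad_fn)
# raises exactly the RuntimeError quoted above.
loss = torch.tensor(0.0)
loss.backward()  # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn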

On the ODM_mygla dataset on current versions, I'm getting this error after roughly 1000 iterations:
CONSOLE.log(f"Splitting {split_mask.sum().item()/self.num_points} gaussians: {n_splits}/{self.num_points}")
ZeroDivisionError: division by zero
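
My guess (pure speculation) is that self.num_points has dropped to 0 by the time that log line runs, i.e. every gaussian has been culled. A trivial standalone illustration of the failing pattern, with hypothetical values rather than nerfstudio's actual state:

# Hypothetical values: if all gaussians have been culled, num_points is 0 and
# the division inside the logging f-string raises before anything else runs.
num_points = 0
n_splits = 0
print(f"Splitting {n_splits / num_points} gaussians: {n_splits}/{num_points}")  # ZeroDivisionError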

If I process the data without the GPS reference, it loads correctly.

I'm able to use this data with Inria's implementation of Gaussian Splatting and with Postshot when I export it using the Gaussian Splatting Metashape script, which exports in COLMAP format.

Why use GPS reference

I've found that using GPS reference in Metashape helps with alignment (speed and accuracy), and it usually results in a consistent scale, orientation, and ground plane.

I'm not asking for GPS data to somehow be implemented in splats/nerfstudio-data, just for data created with it to be accepted.

To Reproduce
My workflow usually consists of:

  1. Importing dataset(s) into Metashape in separate chunks. GPS data is automatically picked up in the reference panel from EXIF.
  2. Aligning images, usually at high/highest accuracy, with a high key point limit and unlimited tie points.
  3. Exporting camera positions and the sparse/tie point cloud using a batch job (also tried manually).
  4. Processing the data into nerfstudio with the following command: ns-process-data metashape --data "e:/3D Datasets/[Dataset Name]" --xml "e:/3D Datasets/[Dataset Name]/db.xml" --ply "e:/3D Datasets/[Dataset Name]/PointCloud.ply" --output-dir e:/NerfStudio/data/[Dataset Name]/out
  5. Running Splatfacto with ns-train splatfacto --output-dir ./outputs/ODMlogs nerfstudio-data --data ./data/ODMlogs/out
  6. After undistorting, it either loads incorrectly or gives an error.

I've tried exporting the camera positions/point cloud with both local coordinates and WGS84 as the coordinate system; both give the same error.
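
Not part of my workflow, but a quick check for whether the georeferencing leaves very large absolute coordinates in the processed output (which could explain the scale/loading problems). The path is a placeholder and the transforms.json layout is assumed to be the standard nerfstudio one:

import json
import numpy as np

# Inspect the camera translations written by ns-process-data. Georeferenced
# exports (e.g. WGS84/UTM) can place cameras hundreds of thousands of units
# from the origin, which would dwarf the scene itself.
with open("e:/NerfStudio/data/ODMlogs/out/transforms.json") as f:  # placeholder path
    transforms = json.load(f)

positions = np.array([np.array(fr["transform_matrix"])[:3, 3] for fr in transforms["frames"]])
print("per-axis min:", positions.min(axis=0))
print("per-axis max:", positions.max(axis=0))
print("scene extent:", positions.max(axis=0) - positions.min(axis=0))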

Expected behavior
To be able to load data that has been georeferenced.

Alternative solution
Using the data from the COLMAP export script to create the nerfstudio data. However, this is only an option for people who have Metashape Pro, and in my opinion it's a workaround (though maybe still useful for data used in other programs).

I tried to simply copy and paste this data into the COLMAP structure in nerfstudio, but I failed. I'm sure there is a command/script to convert this into something nerfstudio can run, but I wasn't able to get it working (see the sketch after the folder listing below).

Here is how the folder is formatted in case it helps:

[Dataset]/
├─ node_modules/
├─ images/
│ ├─ Image1.jpg
│ ├─ Image2.jpg
│ ├─ Image3.jpg
├─ sparse/
│ ├─ 0/
│ │ ├─ cameras.bin
│ │ ├─ images.bin
│ │ ├─ points3D.bin
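
For completeness, here is roughly what I tried, as a sketch. It assumes nerfstudio's colmap dataparser looks for <data>/colmap/sparse/0 and <data>/images by default; that layout and the paths are assumptions on my part:

import shutil
from pathlib import Path

# Copy the COLMAP-format export into the folder layout assumed above:
# <data>/colmap/sparse/0/*.bin plus <data>/images/.
src = Path("e:/3D Datasets/[Dataset Name]")      # the exported folder shown above
dst = Path("e:/NerfStudio/data/[Dataset Name]")  # folder to pass to --data

shutil.copytree(src / "sparse", dst / "colmap" / "sparse", dirs_exist_ok=True)
shutil.copytree(src / "images", dst / "images", dirs_exist_ok=True)

Training would then presumably be started with something like ns-train splatfacto colmap --data e:/NerfStudio/data/[Dataset Name] (the colmap dataparser name is again an assumption); even with this layout I didn't get a working result.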

Screenshots
ODM_mygla processed using splatfacto georeferenced before it crashes:
[screenshot]
The only notable detail is a small white dot at the bottom of the scene, possibly at infinite distance. ODM_helenenschacht also displays similar results, but doesn't crash. The cameras are sometimes obscured by the scene, so I need to disable composite depth to use the camera positions as reference.

The same ODM_mygla dataset processed using nerfacto:
[screenshot]

Additional context
Machine specs and info:
PC 1: R9 5900X, 128 GB RAM, 3060 12 GB, data on SSD. Windows 10, Anaconda, Nerfstudio 1.1.2, gsplat 1.0.0
PC 2: i7 7700HQ, 16 GB RAM, 1060 6 GB, data on SSD. Windows 10, Anaconda, Nerfstudio 1.1.0, gsplat 0.1.12
Latest NVIDIA drivers on both; also tested with drivers from 4 months ago.
(I'm aware that PC 2 won't be able to run future gsplat versions; I still included this info since the old version was giving different errors that might help narrow down the problem.)

Data processed using Metashape 2.x

I capture my own data using a Mavic Air 2S; it often aligns a few meters below ground, but even when adjusting for that, there are errors. I'm not sure how to share my own dataset, so here are some datasets I've tested that display the same errors:

Datasets for reference:
ODM_mygla: 41 images, ~5 MB each. Captured on a DJI Phantom 3.

Here are the export files as .txt; you'll need to change the extension:
db.xml.txt
PointCloud.ply.txt

ODM_helenenschacht: 176 images, ~12 MB each. Captured on an Autel Evo II Pro RTK.
I've also tested this one, but the point cloud is too big to attach. Here are the camera positions:
db.xml.txt

This is my first submitted issue, so apologies for any missing info or bad etiquette.

Dump of Logs

Console error from trying to run splatfacto with the ODM_mygla dataset on nerfstudio 1.1.2 (PC 1):

890 (2.97%)         8.359 ms             4 m, 3 s             78.09 M
----------------------------------------------------------------------------------------------------   splatfacto.py:
Viewer running locally at: http://localhost:7007 (listening on 0.0.0.0)
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0155
VanillaPipeline.get_train_loss_dict: 0.0118
Traceback (most recent call last):
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[User]\.conda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 265, in train
    callback.run_callback_at_location(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\callbacks.py", line 115, in run_callback_at_location
    self.run_callback(step=step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\callbacks.py", line 100, in run_callback
    self.func(*self.args, **self.kwargs, step=step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\models\splatfacto.py", line 456, in refinement_after
    split_params = self.split_gaussians(splits, nsamps)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\models\splatfacto.py", line 543, in split_gaussians
    CONSOLE.log(f"Splitting {split_mask.sum().item()/self.num_points} gaussians: {n_splits}/{self.num_points}")
ZeroDivisionError: division by zero 

Console error from trying to run splatfacto with nerfstudio 1.1.0 (PC 2):

 [17:24:14] Caching / undistorting train images                                            full_images_datamanager.py:183
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 8.2725
VanillaPipeline.get_train_loss_dict: 8.2715
Traceback (most recent call last):
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[User]\.conda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 261, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 498, in train_iteration
    self.grad_scaler.scale(loss).backward()  # type: ignore
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\torch\_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\torch\autograd\__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn