The current implementation supports two execution modes: batch, which requires a folder of image sequences and a text file of time-stamped inertial measurements, or online, which uses a ROS bag. The format of the image sequences and text files is described here. The format of the ROS messages is described here (imu, image).
We provide several sequences recorded by a Tango platform. The images are recorded at 30 Hz in VGA (640x480) size, and the inertial measurements are recorded at 200 Hz. You can download the sequences here as compressed files (`.tar.gz`) or ROS bags (`.bag`).
Once you have downloaded the data as compressed `.tar.gz` files, uncompress them into the directory of your choice and set the environment variable `$DATAROOT` to that directory as follows:

```
export DATAROOT=/DIRECTORY/OF/YOUR/CHOICE
```

Note: the `$DATAROOT` directory should contain datasets structured in the format described here.
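As a quick sanity check before running the estimator, you can verify that `$DATAROOT` points at a directory with one sub-directory per sequence. The sketch below is illustrative only: `check_dataroot` is a hypothetical helper, and it makes no assumptions about the file names inside each sequence (consult the dataset format description for those).

```python
import os
import tempfile

def check_dataroot(root):
    """Return the sequence directories found under $DATAROOT.

    Each sequence is expected to be a sub-directory holding the image
    frames and the time-stamped IMU measurement text file; the exact
    files inside a sequence are described in the dataset format docs.
    """
    if not os.path.isdir(root):
        raise FileNotFoundError(f"$DATAROOT does not exist: {root}")
    seqs = sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d))
    )
    if not seqs:
        raise RuntimeError(f"no sequence directories found under {root}")
    return seqs

# Demo against a mock layout; in practice pass os.environ["DATAROOT"].
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data9_workbench"))
    print(check_dataroot(root))  # ['data9_workbench']
```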
From the project root directory, run the estimator with the configuration specified by the option `-cfg cfg/vio.json` on the `data9_workbench` sequence (option `-seq data9_workbench`), which resides in the directory `$DATAROOT` (option `-root $DATAROOT`), as follows:

```
bin/vio -cfg cfg/vio.json -root $DATAROOT -seq data9_workbench -out out_state -dataset xivo
```

The estimated states are saved to the output file `out_state`.
For detailed usage of the application, see the flags defined at the beginning of the application source file `src/app/vio.cpp`.
We use glog for system logging. If you want to log runtime information to a text file for later inspection, first create a directory to hold the logs:

```
mkdir /YOUR/LOG/DIRECTORY/HERE
```

and then run the estimator with a prefix specifying the log directory:

```
GLOG_log_dir=/YOUR/LOG/DIRECTORY/HERE bin/vio -cfg cfg/vio.json -root $DATAROOT -seq data9_workbench -out out_state -dataset xivo
```

Note: by default, logging is suppressed for runtime speed via `add_definitions(-DGOOGLE_STRIP_LOG=1)` in `src/CMakeLists.txt`. To enable logging, simply comment out that line and rebuild.

For more details on glog, see the tutorial here.
XIVO's dependencies include OpenCV, Eigen, g2o, gflags, glog, googletest, gperftools, imu_tk, jsoncpp, Pangolin, pybind11, and stb. You will need to install OpenCV 3.4.x and the accompanying OpenCV Contrib library. The rest of the dependencies are bundled with XIVO and compiled automatically by the build script.
To build, run `./build.sh` in XIVO's root directory.

By default, the ROS node is not built, in case your operating system does not have ROS installed. To use the system with ROS support, first install ROS (see instructions here), then build with both ROS and g2o support:

```
./build.sh ros g2o
```
- In the project root directory, `source build/devel/setup.zsh`. If another shell, such as bash or sh, is used, source the corresponding script (`setup.bash`/`setup.sh`) instead.
- Run `roslaunch node/launch/xivo.launch` to launch the ROS node, which subscribes to the `/imu0` and `/cam0/image_raw` topics.
- In another terminal, play back the ROS bag of your choice, e.g., `rosbag play /PATH/TO/YOUR/ROSBAG`.
A simple Python binding is provided by wrapping some public interfaces of the estimator via pybind11. Check out `pybind11/pyxivo.cpp` for the interfaces available in Python. With pybind11, it is relatively easy to expose more of the C++ implementation to Python.

An example of using the Python binding is available in `scripts/pyxivo.py`, which demonstrates estimator creation, data loading, and visualization in Python.
To run the demo, execute:

```
python scripts/pyxivo.py -cfg cfg/phab.json -root $DATAROOT -seq data9_workbench -dataset xivo -use_viewer
```

in the project root directory. The command-line options are largely the same as those of the C++ executable; for detailed usage, see the options defined at the beginning of `scripts/pyxivo.py`. Note that you might need to install some Python dependencies first by executing the following in the project root directory:

```
pip install -r requirements.txt
```
We benchmarked the performance of our system in terms of ATE (Absolute Trajectory Error), RPE (Relative Pose Error), and computational cost against other top-performing open-source implementations, namely OKVIS [Leutenegger et al.], VINS-Mono [Qin et al.], and ROVIO [Bloesch et al.], on publicly available datasets. Our implementation achieves comparable accuracy at a fraction of the computational cost. On a desktop PC equipped with an Intel Core i7 CPU @ 3.6 GHz, our system runs at around 140 Hz with low CPU usage. In comparison, OKVIS and VINS-Mono run at around 20 Hz, and ROVIO runs at around 60 Hz. The runtime of our system can be further improved by better utilizing the CPU cache and memory.
OKVIS and VINS-Mono are optimization-based: they operate on keyframes in an iterative manner, which in general yields more accurate pose estimates at the price of higher latency and computational cost. ROVIO and XIVO are filtering-based: they are causal and much cheaper in terms of computational cost, yet they produce pose estimates comparable to those of the optimization-based methods.
In addition, OKVIS runs on stereo images, whereas the other three methods use only monocular images.
We benchmarked the runtime of OKVIS, VINS-Mono, ROVIO and XIVO on a desktop machine equipped with an Intel Core i7 CPU @ 3.6 GHz. The table below shows the runtime of the feature processing and state update modules.
Module | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
---|---|---|---|---|
Feature detection & matching | 15ms | 20ms | 1ms* | 3ms |
State update | 42ms | 50ms | 13ms | 4ms |
* ROVIO is a 'direct' method that skips the feature matching step and directly uses the photometric error as the innovation term in the EKF update step. Since it uses an Iterated Extended Kalman Filter (IEKF) for the state update, it is slower than our EKF-based method.
OKVIS and VINS-Mono (marked with Keyframe) perform iterative nonlinear least squares on keyframes for state estimation, and are thus much slower in the state update step.
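As a back-of-the-envelope consistency check, the per-frame cost implied by the table (feature processing plus state update, assuming these two modules dominate and run sequentially) can be converted into a throughput estimate:

```python
# Per-frame module runtimes in milliseconds, taken from the table above.
runtimes_ms = {
    "OKVIS": 15 + 42,      # feature detection & matching + state update
    "VINS-Mono": 20 + 50,
    "ROVIO": 1 + 13,
    "XIVO": 3 + 4,
}

for name, total in runtimes_ms.items():
    print(f"{name}: {total} ms/frame -> ~{1000 / total:.0f} Hz")
```

XIVO's 7 ms/frame corresponds to roughly 143 Hz, in line with the ~140 Hz figure quoted earlier; for the other systems the estimate is of the same order as their quoted rates, with the remaining differences due to per-frame work outside these two modules.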
We compared the performance of our system in terms of ATE and RPE on two publicly available datasets: TUM-VI and EuRoC. We achieve comparable pose estimation accuracy at a fraction of the computational cost of the top-performing open-source implementations.
The following table shows the performance on six indoor sequences for which ground-truth poses are available. The numbers for OKVIS, VINS-Mono, and ROVIO are taken from the TUM-VI benchmark paper. The evaluation script for XIVO can be found in `misc/run_all.sh`.
Sequence | Length | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
---|---|---|---|---|---|
room1 | 156m | 0.06m | 0.07m | 0.16m | 0.12m |
room2 | 142m | 0.11m | 0.07m | 0.33m | 0.08m |
room3 | 135m | 0.07m | 0.11m | 0.15m | 0.13m |
room4 | 68m | 0.03m | 0.04m | 0.09m | 0.08m |
room5 | 131m | 0.07m | 0.20m | 0.12m | 0.09m |
room6 | 67m | 0.04m | 0.08m | 0.05m | 0.10m |
Table 1. RMSE ATE in meters. Methods marked with Keyframe are keyframe-based, others are recursive approaches.
Sequence | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
---|---|---|---|---|
room1 | 0.013m/0.43° | 0.015m/0.44° | 0.029m/0.53° | 0.022m/0.53° |
room2 | 0.015m/0.62° | 0.017m/0.63° | 0.030m/0.67° | 0.037m/0.72° |
room3 | 0.012m/0.64° | 0.023m/0.63° | 0.027m/0.66° | 0.061m/0.74° |
room4 | 0.012m/0.57° | 0.015m/0.41° | 0.022m/0.61° | 0.024m/0.64° |
room5 | 0.012m/0.47° | 0.026m/0.47° | 0.031m/0.60° | 0.027m/0.57° |
room6 | 0.012m/0.49° | 0.014m/0.44° | 0.019m/0.50° | 0.025m/0.54° |
Table 2. RMSE RPE in translation (meters) and rotation (degrees). Methods marked with Keyframe are keyframe-based, others are recursive approaches.
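For reference, the RMSE ATE reported in Table 1 is the root-mean-square of the per-frame translational error between the estimated and ground-truth positions. The sketch below assumes the two trajectories are already time-associated and spatially aligned (alignment, e.g. via the Umeyama method, is not shown); `rmse_ate` is an illustrative helper, not part of the XIVO evaluation scripts.

```python
import numpy as np

def rmse_ate(est, gt):
    """RMSE Absolute Trajectory Error in meters.

    est, gt: (N, 3) arrays of time-associated, spatially aligned
    positions.
    """
    err = np.linalg.norm(est - gt, axis=1)  # per-frame translation error
    return float(np.sqrt(np.mean(err ** 2)))

# Toy example: every estimate is off by 0.1 m along some axis.
gt = np.zeros((4, 3))
est = np.array([[0.1, 0, 0], [0, 0.1, 0], [0, 0, 0.1], [0.1, 0, 0]])
print(rmse_ate(est, gt))  # ≈ 0.1
```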
Benchmark results on the EuRoC dataset will be available soon.