
Dynamic Attention-based Visual Odometry


Dynamic Attention-based Visual Odometry (DAVO) is a learning-based VO method for estimating the ego-motion of a monocular camera. DAVO dynamically adjusts the attention weights on different semantic categories for different motion scenarios based on optical flow maps. These weighted semantic categories can then be used to generate attention maps that highlight the relative importance of different semantic regions in the input frames for pose estimation. To examine the proposed DAVO, we perform a number of experiments on the KITTI Visual Odometry and SLAM benchmark suite to quantitatively and qualitatively inspect the impact of the dynamically adjusted weights on the accuracy of the evaluated trajectories. Moreover, we design a set of ablation analyses to justify each of our design choices and to validate the effectiveness as well as the advantages of DAVO. Our experiments on the KITTI dataset show that the proposed DAVO framework provides satisfactory performance in ego-motion estimation and delivers competitive performance compared with contemporary VO methods.

Environment

This codebase is tested on Ubuntu 16.04 with Tensorflow 1.13.1 and CUDA 10.0 (w/ cuDNN 7.5).
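
To quickly confirm that your setup matches these versions, you can check them from the command line (a sketch; it only prints version strings and assumes python and nvcc are on your PATH):

python -c "import tensorflow as tf; print(tf.__version__)"     # expect 1.13.1
nvcc --version                                                  # expect release 10.0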

Installation

1. Clone this repository.

git clone https://github.com/BassyKuo/DAVO.git

2. Check the Python version

Make sure the Python version you use is Python 3.6.

python -V                           # should be python 3.6

3. Requirements

Install the necessary packages from the requirements file.

pip install -r requirements.txt

4. (Optional) Use a customized CUDA path

If you do not have CUDA 10.0, please download the CUDA 10.0 toolkit from the official website or here, and set $CUDA_HOME to the path where you installed it.

export CUDA_HOME="<your_cuda_path>"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
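
To confirm which toolkit will actually be picked up after setting these variables, you can query the compiler under $CUDA_HOME (assuming the standard toolkit layout with a bin/ directory):

"$CUDA_HOME/bin/nvcc" --version     # should report release 10.0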

Quickstart

Use the following scripts to get started quickly once you have set up the environment and the dataset.

Training

version="v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"

# Train the model w/ flipping images.
./run_training.sh ${version}

# Train the model w/ augmented images (including flipping, brightness, contrast, saturation, hue).
./run_training.sh ${version} --data_aug
➡️ Check here to see more available ${version} names.

Inference

export ckpt_dir="ckpt_dir/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh/"
export version="v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
export seq_name="03"
export ckpt_step="1500000"
export output_root="test_DAVO"

# Run the script to generate predicted poses of sequence ${seq_name}.
./run_inference.sh ${ckpt_dir} ${version} ${seq_name} ${ckpt_step} ${output_root}

# The results will be saved in ${output_root}/${version}--model-${ckpt_step}
➡️ To start quickly, download the pretrained model from our google drive and set $ckpt_dir to the downloaded pretrain-ckpt directory.
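
For example, if the pretrained checkpoint is extracted to ./pretrain-ckpt/ (this extraction path is only an assumption for illustration; adjust it to match the downloaded folder layout):

# Point ckpt_dir at the extracted pretrained checkpoint instead of a freshly trained one.
export ckpt_dir="./pretrain-ckpt/${version}/"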

Evaluation

cd kitti_benchmark/

export test_output_dir="../${output_root}/${version}--model-${ckpt_step}"
export save_name="${version}--model-${ckpt_step}"

# Use pose_kitti_eval.sh to run the KITTI Benchmark.
./pose_kitti_eval.sh ${test_output_dir} ${save_name}
➡️ Check here to see more information.

Prepare Training/Testing Data

There are three types of inputs used in DAVO:

  1. RGB frames
  2. Optical flows
  3. Semantic segmentations

Please check here to see how to prepare them for DAVO.
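
As a quick sanity check after preparing the data, you can list what was dumped for each sequence. The dump directory name below matches the training/testing commands later in this README; the assumption that it contains one subfolder per sequence is ours, not from the preparation guide:

# List a few of the dumped files for each KITTI odometry training sequence.
for seq in 00 01 02 03 04 05 06 07 08 09 10 ; do
    echo "sequence ${seq}:"
    ls "./kitti_odom-dump/${seq}" 2>/dev/null | head -n 5
done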

Training

Once the data are formatted following the above instructions, you can train the model with the following command:

python train.py \
    --dataset_dir=$kitti_odom_dump \
    --img_width=416 \
    --img_height=128 \
    --batch_size=4 \
    --seq_length=3 \
    --max_steps=310000 \
    --save_freq=25000 \
    --learning_rate=0.001 \
    --pose_weight=0.1 \
    --checkpoint_dir=./ckpt/${version} \
    --version=${version}

Example:

python train.py \
    --dataset_dir=./kitti_odom-dump/ --img_width=416 --img_height=128 --batch_size=4 \
    --seq_length=3 --max_steps=310000 --save_freq=25000 --learning_rate=0.001  --pose_weight=0.1 \
    --checkpoint_dir=./ckpt/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh \
    --version=v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh

Note that all available ${version} names are defined here.
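
If the training script writes TensorFlow summaries into the checkpoint directory (not verified here), the run could be monitored with TensorBoard:

tensorboard --logdir=./ckpt/${version}    # only useful if summary events are written alongside the checkpoints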

Evaluation

To evaluate the pose estimation performance in this work, first use the following command to produce estimated poses:

python test_kitti_pose.py \
    --test_seq=$seq \
    --concat_img_dir=./kitti_odom-dump/ \
    --ckpt_file=${ckpt_file} \
    --version=${version} \
    --output_dir=${output_dir}
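
To produce predictions for all reference sequences at once, the same command can be wrapped in a loop (a sketch, assuming ${ckpt_file}, ${version}, and ${output_dir} are already set as above):

for seq in $(seq 0 10) ; do
    python test_kitti_pose.py \
        --test_seq=${seq} \
        --concat_img_dir=./kitti_odom-dump/ \
        --ckpt_file=${ckpt_file} \
        --version=${version} \
        --output_dir=${output_dir}
done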

Example:

export VERSION="v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
export SAVE_NAME="$VERSION--model-10"

python test_kitti_pose.py \
    --test_seq=3 \
    --concat_img_dir=./kitti_odom-dump/ \
    --ckpt_file=./ckpt/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh/model-10 \
    --version=$VERSION \
    --output_dir=./test_DAVO/$SAVE_NAME

Then copy the prediction files to the kitti_benchmark/results/$SAVE_NAME/data/ folder and execute ./test_odometry_all $SAVE_NAME (this manual route is sketched below). Please check here to see how to do that, or use the quick script:

cd kitti_benchmark/
./pose_kitti_eval.sh ../test_DAVO/$SAVE_NAME $SAVE_NAME
➡️ Check here to see the evaluation results.
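
The manual route mentioned above might look like the following (an illustration only; whether the prediction files need renaming for the devkit is covered by the linked instructions):

cd kitti_benchmark/
mkdir -p results/$SAVE_NAME/data/
# Copy the predicted pose files produced by test_kitti_pose.py into the results folder.
cp ../test_DAVO/$SAVE_NAME/*-pred_kitti_pose.txt results/$SAVE_NAME/data/
./test_odometry_all $SAVE_NAME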

Visualization

In our work, we use the evo tool to visualize trajectories against the ground-truth poses of reference sequences 00 to 10:

for seq in {00..10..1} ; do 
    evo_traj kitti \
        --ref kitti_benchmark/data/odometry/poses/${seq}.txt \
        ./test_DAVO/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh--model-10/${seq}-pred_kitti_pose.txt \
        -p --plot_mode xz --save_plot plots/${seq}.png
done

Because ground-truth poses are not available for testing sequences 11 to 21, we compare our predictions against trajectories generated by ORB-SLAM2-S (the stereo version of ORB-SLAM2):

for seq in {11..21..1} ; do 
    evo_traj kitti \
        --ref kitti_benchmark/data/odometry/poses_from_ORBSLAM2-S/${seq}-ORB-SLAM2-S.txt \
        ./test_DAVO/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh--model-10/${seq}-pred_kitti_pose.txt \
        -p --plot_mode xz --save_plot plots/${seq}.png
done

[NOTE]

Before you use the --save_plot argument to save the plot as a PNG file, change the export format first:

evo_config set plot_export_format png

You can also change the trajectory colors:

evo_config set plot_seaborn_palette Dark2
➡️ Execute evo_config show to see more configurations.

Contact

Please feel free to contact us if you have any questions. 😄

Acknowledgements

We appreciate the great works/repos along this direction, such as SfMLearner, GeoNet, and DeepMatchVO, as well as the evaluation tools such as the KITTI VO/SLAM devkit and evo for KITTI full-sequence evaluation.
