This is the PyTorch implementation of our paper, a hierarchical scene coordinate prediction approach for one-shot RGB camera relocalization:
Hierarchical Scene Coordinate Classification and Regression for Visual Localization, CVPR 2020
Xiaotian Li, Shuzhe Wang, Yi Zhao, Jakob Verbeek, Juho Kannala
Python3 and the following packages are required:
cython
numpy
pytorch
opencv
tqdm
imgaug
It is recommended to use a conda environment:
- Install anaconda or miniconda.
- Create the environment:
conda env create -f environment.yml
- Activate the environment:
conda activate hscnet
To run the evaluation script, you will need to build the cython module:
cd ./pnpransac
python setup.py build_ext --inplace
We currently support 7-Scenes, 12-Scenes, Cambridge Landmarks, and the three combined scenes (i7S, i12S, i19S) used in the paper. The code for the Aachen Day-Night dataset experiments is not included yet; we will upload it.
You will need to download the datasets from their websites. We also provide a data package that contains the other files needed to reproduce our results. Note that for the Cambridge Landmarks dataset, you will also need to rename the files according to the train/test.txt files and put them in the train/test folders. The depth maps we used for this dataset are from DSAC++. The provided label maps are obtained by running k-means hierarchically on the 3D points.
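To illustrate what "running k-means hierarchically" can look like, here is a minimal two-level sketch in NumPy. It is not the script used to produce the provided label maps: the function names (`assign`, `kmeans`, `hierarchical_kmeans`), the cluster counts, and the plain Lloyd iterations are all assumptions made for illustration. The points are first split into coarse clusters, and each coarse cluster is then split again, so every 3D point ends up with a (coarse, fine) label pair.

```python
import numpy as np


def assign(points, centers):
    """Return the index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (illustrative, not optimized)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers from k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of its members; keep it if empty.
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers, assign(points, centers)


def hierarchical_kmeans(points, k_coarse, k_fine, seed=0):
    """Two-level clustering: coarse clusters, then sub-clusters inside each."""
    points = np.asarray(points, dtype=np.float64)
    _, coarse = kmeans(points, k_coarse, seed=seed)
    fine = np.zeros(len(points), dtype=np.int64)
    for c in range(k_coarse):
        idx = np.flatnonzero(coarse == c)
        if len(idx) == 0:
            continue
        # A coarse cluster may hold fewer points than k_fine.
        k = min(k_fine, len(idx))
        _, sub = kmeans(points[idx], k, seed=seed)
        fine[idx] = sub  # fine labels are local to their coarse cluster
    return coarse, fine


if __name__ == "__main__":
    # Hypothetical stand-in for the scene's 3D points.
    pts = np.random.default_rng(1).normal(size=(500, 3))
    coarse, fine = hierarchical_kmeans(pts, k_coarse=4, k_fine=3)
    print(coarse[:10], fine[:10])
```

The distance computation here materializes an N x k x 3 array, which is fine for a small example but would need batching or a library implementation for the point counts of a real scene.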
The trained models for the main experiments in the paper can be downloaded here.
To evaluate on a scene from a dataset:
python eval.py \
--model [hscnet|scrnet] \
--dataset [7S|12S|Cambridge|i7S|i12S|i19S] \
--scene scene_name \
--checkpoint /path/to/saved/model/ \
--data_path /path/to/data/
You can train the hierarchical scene coordinate network or the baseline regression network by running the following command (--scene is not required for the combined scenes):
python train.py \
--model [hscnet|scrnet] \
--dataset [7S|12S|Cambridge|i7S|i12S|i19S] \
--scene scene_name \
--n_iter number_of_training_iterations \
--data_path /path/to/data/
Copyright (c) 2020 AaltoVision.
This code is released under the MIT License.
The PnP-RANSAC pose solver builds on DSAC++. The sensor calibration file and the normalization translation files for the 7-Scenes dataset are from DSAC. The rendered depth images for the Cambridge Landmarks dataset are from DSAC++.
Please consider citing our paper if you find this code useful for your research:
@inproceedings{li2020hscnet,
title = {Hierarchical Scene Coordinate Classification and Regression for Visual Localization},
author = {Li, Xiaotian and Wang, Shuzhe and Zhao, Yi and Verbeek, Jakob and Kannala, Juho},
booktitle = {CVPR},
year = {2020}
}