This repository contains the code for the paper "EVAL: Explainable Video Anomaly Localization" by Ashish Singh, Michael Jones, and Erik Learned-Miller.
Abstract:
We develop a novel framework for single-scene video anomaly localization that allows for human-understandable reasons for the decisions the system makes. We first learn general representations of objects and their motions (using deep networks) and then use these representations to build a high-level, location-dependent model of any particular scene. This model can be used to detect anomalies in new videos of the same scene. Importantly, our approach is explainable. Our high-level appearance and motion features can provide human-understandable reasons for why any part of a video is classified as normal or anomalous. We conduct experiments on standard video anomaly detection datasets (Street Scene, CUHK Avenue, ShanghaiTech and UCSD Ped1, Ped2) and show significant improvements over the previous state-of-the-art.
After cloning this repository, the following commands can be run to create a virtual conda environment with the required packages installed.
conda create -n EVAL python=3.8
conda activate EVAL
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install joblib
pip install scipy
pip install more-itertools
pip install scikit-image
pip install scikit-learn
The implementation of EVAL is built on PyTorch.
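A quick sanity check that the environment was created correctly is to import the pinned packages and confirm that a GPU is visible (this is just a convenience check, not part of the EVAL pipeline):

# quick sanity check of the conda environment (not part of the EVAL pipeline)
import torch
import torchvision

print("torch:", torch.__version__)               # should report 1.12.1
print("torchvision:", torchvision.__version__)   # should report 0.13.1
print("CUDA available:", torch.cuda.is_available())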
Executing EVAL is divided into four stages:
- feature extraction - uses the pretrained appearance and motion deep network models to extract feature vectors for all video volumes in a video dataset
- exemplar selection - builds the exemplar-based model for the training portion of the dataset, given the features computed in stage 1
- anomaly detection - computes anomaly scores for all regions of all frames of the testing videos, given the features computed in stage 1 and the exemplar model computed in stage 2
- evaluation - evaluates anomaly detection accuracy using the region-based detection criterion, the track-based detection criterion, or the frame-level criterion
The following explains how to run each stage on a small subset of the Street Scene dataset, which is included in this repository.
There are config files for the Street Scene dataset already included in EVAL/config/. To run on a different dataset, these config files (for the training and testing sets) would need to be edited so that the paths to the training and testing frames are correct and the window size is appropriate for the new dataset.
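If you are unsure which entries need to change, Python's standard configparser can list every section and key of a config file before you edit it; a minimal sketch, assuming it is run from one of the EVAL subdirectories (e.g. EVAL/feature_extraction) so the relative path resolves:

# print every section and key of a config file before editing it
import configparser

config = configparser.ConfigParser()
config.read("../config/config_SS_train.ini")
for section in config.sections():
    print("[" + section + "]")
    for key, value in config[section].items():
        print(" ", key, "=", value)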
To run feature extraction on a GPU machine, cd to EVAL/feature_extraction and run:
python main_feature_extraction.py --config ../config/config_SS_train.ini --idx N
where N is in [0, 1] and indicates which of the 2 training sequences (included in this small subset of Street Scene) to run on.
This command needs to be run for each training sequence and each test sequence. A shell script for use with sbatch can also be set up to make this easier when there are many training/testing sequences.
This should result in feature vector files in EVAL/feature_extraction/Results/SS/Train/TrainNNN.joblib and EVAL/feature_extraction/Results/SS/Test/TestNNN.joblib.
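If you are not using sbatch, a minimal Python driver can loop over the sequence indices on a single GPU machine; the sketch below covers the two training sequences of this subset (the test sequences are handled the same way with the test config file):

# run from EVAL/feature_extraction; one feature-extraction run per training sequence
import subprocess

for idx in range(2):  # this subset of Street Scene has 2 training sequences
    subprocess.run(
        ["python", "main_feature_extraction.py",
         "--config", "../config/config_SS_train.ini",
         "--idx", str(idx)],
        check=True,
    )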
After all of the features have been precomputed for the training and testing videos, the next step is to build a model from the training sequences. This is done by exemplar selection.
Exemplar selection must be run on each training video sequence separately, and then the individual exemplar files are merged into a final exemplar file. To run exemplar selection on a single training sequence, use the following command from the EVAL/exemplar_selection directory:
python select_exemplars.py --dataset ss --version appearance_motion --id N
where N is in [0, 1] and indicates the training sequence to use.
This will result in an exemplar joblib file (and a count joblib file) for each training sequence in EVAL/exemplar_selection/results/SS/clusters_appearance_motion/.
To merge these into a final exemplar file, run:
python merge_exemplars.py --dataset ss --version appearance_motion
This will output to the screen a list of every spatial region and the total number of exemplars selected for the region. It also creates the final exemplar file in EVAL/exemplar_selection/results/SS/exemplars_appearance_motion_merged_ss.joblib.
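The same pattern works for exemplar selection; the sketch below runs select_exemplars.py on both training sequences of the subset and then merges the results, using exactly the commands shown above (run it from EVAL/exemplar_selection):

# run from EVAL/exemplar_selection
import subprocess

# one exemplar-selection run per training sequence
for seq_id in range(2):
    subprocess.run(
        ["python", "select_exemplars.py",
         "--dataset", "ss", "--version", "appearance_motion",
         "--id", str(seq_id)],
        check=True,
    )

# merge the per-sequence exemplar files into the final model
subprocess.run(
    ["python", "merge_exemplars.py",
     "--dataset", "ss", "--version", "appearance_motion"],
    check=True,
)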
After the model of normal training video has been created, the final step is to run anomaly detection on each testing video sequence. To do this, change to the EVAL/anomaly_detection directory and run the anomaly detection command on each testing sequence:
python anomaly.py --dataset ss --version appearance_motion --idx N
where N indicates the testing sequence to use. In this example using the subset of Street Scene, only one testing sequence is included and so N should be 0.
This results in anomaly score files (.npy) in EVAL/anomaly_detection/results/SS/anomaly_scores_appearance_motion/TestNNN.npy.
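The anomaly score files are ordinary NumPy arrays, so they can be inspected directly; the sketch below simply loads each file and prints its shape and score range (the exact layout of the array, e.g. frames versus spatial regions, depends on the implementation):

# run from EVAL/anomaly_detection; inspect the anomaly score files produced above
import glob
import numpy as np

for path in sorted(glob.glob("results/SS/anomaly_scores_appearance_motion/Test*.npy")):
    scores = np.load(path)
    print(path, scores.shape, "min %.3f" % scores.min(), "max %.3f" % scores.max())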
To visualize the results, you can use the visualize_anomaly.py program as follows:
python visualize_anomaly.py --dataset ss --version appearance_motion --id 0 --thresh 2.0
Output frames for the test sequence specified by --id will be written to results/SS/viz/appearance_motion/TestNNN, where NNN depends on --id.
To evaluate the anomaly detection results, cd to EVAL/evaluation and, if necessary, edit the config file EVAL/evaluation/config/config_ss.ini (no edits should be needed when using the code as is with Street Scene). Then run eval_main.py:
python eval_main.py --config config/config_ss.ini
This will create ROC curve files in EVAL/evaluation/results/SS/anomaly_scores_appearance_motion/ for region-based and track-based evaluation criteria.
To compute the area under the ROC curve for both of these evaluation criteria, run:
python compute_AUC.py results/SS/anomaly_scores_appearance_motion/region_bbox_fpr.npy results/SS/anomaly_scores_appearance_motion/region_bbox_tpr.npy
python compute_AUC.py results/SS/anomaly_scores_appearance_motion/track_bbox_fpr.npy results/SS/anomaly_scores_appearance_motion/track_bbox_tpr.npy
This will print the area under the curve (AUC) for each criterion to the screen. The AUC ranges from 0.0 (worst) to 1.0 (perfect).
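If you want to cross-check the numbers or plot the curves yourself, the fpr/tpr files can be loaded directly; the sketch below integrates the region-based curve with NumPy's trapezoid rule (a simple cross-check, not necessarily the exact computation performed by compute_AUC.py):

# run from EVAL/evaluation; cross-check the region-based AUC with the trapezoid rule
import numpy as np

fpr = np.load("results/SS/anomaly_scores_appearance_motion/region_bbox_fpr.npy")
tpr = np.load("results/SS/anomaly_scores_appearance_motion/region_bbox_tpr.npy")

order = np.argsort(fpr)  # integrate with false positive rate increasing left to right
auc = np.trapz(tpr[order], fpr[order])
print("region-based AUC: %.4f" % auc)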
To evaluate the anomaly detection results using the frame level criterion (which does not evaluate spatial localization accuracy, only temporal), run:
python compute_frame_roc.py --td ../datasets/StreetScene/Test --ad ../anomaly_detection/results/SS/anomaly_scores_appearance_motion --rd results/SS/anomaly_scores_appearance_motion
Then, the area under the ROC curve is computed by:
python compute_AUC.py results/SS/anomaly_scores_appearance_motion/anomaly_scores_appearance_motion_frame_fpr.npy results/SS/anomaly_scores_appearance_motion/anomaly_scores_appearance_motion_frame_tpr.npy
If you use the software, please cite the following CVPR paper.
@inproceedings{SinghEVAL2023,
author = {Singh, Ashish and Jones, Michael J. and Learned-Miller, Erik G.},
title = {EVAL: Explainable Video Anomaly Localization},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2023},
url = {https://openaccess.thecvf.com/content/CVPR2023/papers/Singh_EVAL_Explainable_Video_Anomaly_Localization_CVPR_2023_paper.pdf}
}
Contact: Mike Jones, [email protected]
See CONTRIBUTING.md for our policy on contributions.
Released under the AGPL-3.0-or-later license, as found in the LICENSE.md file.
All files (except those in the datasets directory):
Copyright (c) 2023-2024 Mitsubishi Electric Research Laboratories (MERL).
SPDX-License-Identifier: AGPL-3.0-or-later
All files in the datasets directory, which contain part of the Street Scene dataset, are licensed under the CC-BY-SA-4.0 license:
Copyright (c) 2023 Mitsubishi Electric Research Laboratories (MERL).
SPDX-License-Identifier: CC-BY-SA-4.0