Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval
This is the official implementation of the paper Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval (WACV 2024).
- Install Python packages.
$ pip install -r requirements.txt
- (Optional) Install the extension that speeds up the mAP calculation.
$ python setup.py install
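As an optional sanity check (not part of the official scripts), the snippet below verifies that the interpreter can see a GPU. It assumes PyTorch is among the pinned requirements, which this README does not state explicitly:

# Optional environment check; assumes PyTorch was installed via requirements.txt
# (an assumption, not something this README guarantees).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())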
- The datasets can be downloaded from our OneDrive.
- The folder structure should be like this:
./data
├── ActivityNet
│   ├── C3D
│   │   └── activitynet_v1-3_c3d.hdf5
│   ├── I3D
│   │   ├── v_00Dk03Jr70M.npy
│   │   ├── v_00KMCm2oGhk.npy
│   │   ├── ...
│   │   └── ...
│   ├── multi_test.json
│   ├── test.json
│   ├── train.json
│   └── val.json
├── CharadesSTA
│   ├── VGG
│   │   └── vgg_rgb_features.hdf5
│   ├── C3D
│   │   └── Charades_C3D.hdf5
│   ├── I3D
│   │   ├── 001YG.npy
│   │   ├── 003WS.npy
│   │   ├── ...
│   │   └── ...
│   ├── multi_test.json
│   ├── test.json
│   └── train.json
└── QVHighlights
    ├── features
    │   ├── clip_features
    │   ├── clip_text_features
    │   └── slowfast_features
    ├── test.json
    ├── train.json
    └── val.json
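If you want to confirm the download landed in the right place, a minimal layout check such as the following can help. It is only a sketch that tests a handful of the paths listed in the tree above and makes no assumption about the annotation format:

# Minimal check of the ./data layout shown above (not an official script).
# Only verifies that a few of the listed files exist on disk.
import os

expected = [
    "./data/ActivityNet/C3D/activitynet_v1-3_c3d.hdf5",
    "./data/ActivityNet/multi_test.json",
    "./data/CharadesSTA/VGG/vgg_rgb_features.hdf5",
    "./data/CharadesSTA/multi_test.json",
    "./data/QVHighlights/train.json",
]

for path in expected:
    if os.path.exists(path):
        print(f"OK      {path} ({os.path.getsize(path) / 2**20:.1f} MB)")
    else:
        print(f"MISSING {path}")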
All our training configuration files are in the ./configs folder. The training command is as follows:
- Single-GPU training.
$ python main.py --config path/to/config.json --logdir path/to/log/dir
For example, to train the model on the CharadesSTA dataset with the VGG backbone:
$ python main.py --config ./configs/charades-VGG.json --logdir ./logs/charades-VGG-log
- Multi-GPU training. For example, to train the model on the CharadesSTA dataset with the VGG backbone on 4 GPUs:
$ CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config ./configs/charades-VGG.json --logdir ./logs/charades-VGG-log
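To queue several configurations back to back, the documented flags can also be driven from a small launcher script. This is only a sketch: it globs ./configs and derives each log directory from the config file name, mirroring the charades-VGG example above:

# Hypothetical launcher that chains the documented
# "python main.py --config ... --logdir ..." command over all configs.
import glob
import os
import subprocess

for config in sorted(glob.glob("./configs/*.json")):
    name = os.path.splitext(os.path.basename(config))[0]
    logdir = os.path.join("./logs", f"{name}-log")  # e.g. ./logs/charades-VGG-log
    print("Training", name)
    subprocess.run(
        ["python", "main.py", "--config", config, "--logdir", logdir],
        check=True,
    )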
The pretrained models used in our paper can be downloaded from OneDrive.
- It is recommended to put the pretrained models in the ./logs folder:
./logs
├── activity-C3D-log
│   └── ...
├── activity-I3D-log
│   └── ...
├── charades-C3D-log
│   └── ...
├── charades-I3D-log
│   └── ...
├── charades-VGG-log
│   ├── best.pth
│   └── config.json
└── qv-log
    └── ...
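Before evaluating, you can peek inside a downloaded checkpoint. The sketch below assumes best.pth is an ordinary PyTorch checkpoint file; the keys it prints are simply whatever the training code stored and are not documented in this README:

# Inspect a downloaded checkpoint (not an official script).
# Assumes best.pth is a regular PyTorch checkpoint; stored keys are not documented here.
import torch

ckpt = torch.load("./logs/charades-VGG-log/best.pth", map_location="cpu")
if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        info = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(key, info)
else:
    print("Checkpoint object of type:", type(ckpt).__name__)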
- Reproduce the results in our paper. Take the CharadesSTA dataset as an example:
$ python main.py --test_only --config ./logs/charades-VGG-log/config.json --logdir ./logs/charades-VGG-log
If you find this code useful for your research, please cite our paper:
@InProceedings{Huang_2024_WACV,
author = {Huang, Cheng and Wu, Yi-Lun and Shuai, Hong-Han and Huang, Ching-Chun},
title = {Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2024}
}