This repo is the official PyTorch implementation for the DICTA paper Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization (Best Award), and the journal paper Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization accepted by CVIU.
To use the LAV-DF dataset, you must agree to the terms and conditions.
Download link: OneDrive, Google Drive, HuggingFace.
Method | AP@0.5 | AP@0.75 | AP@0.95 | AR@100 | AR@50 | AR@20 | AR@10 |
---|---|---|---|---|---|---|---|
BA-TFD | 79.15 | 38.57 | 00.24 | 67.03 | 64.18 | 60.89 | 58.51 |
BA-TFD+ | 96.30 | 84.96 | 04.44 | 81.62 | 80.48 | 79.40 | 78.75 |
Note that the BA-TFD result here is slightly better than the one reported in the paper, because this repository uses better hyperparameters.
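For readers unfamiliar with the column names: in temporal localization benchmarks, the number after AP@ is usually the temporal IoU threshold a predicted segment must reach to count as correct, and the number after AR@ is the number of proposals considered. The sketch below illustrates that reading only. The function names are hypothetical and this is not the evaluation code used in this repository.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) segments, in seconds."""
    pred_start, pred_end = pred
    true_start, true_end = truth
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, true_end) - max(pred_start, true_start))
    union = (pred_end - pred_start) + (true_end - true_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold


# Predicted 2.0-4.0 s against a fake segment at 2.5-4.0 s:
# 1.5 s of overlap over a 2.0 s union.
print(temporal_iou((2.0, 4.0), (2.5, 4.0)))            # 0.75
print(is_true_positive((2.0, 4.0), (2.5, 4.0), 0.95))  # False
```

Under this reading, the same prediction is a hit at thresholds 0.5 and 0.75 but a miss at 0.95, which is consistent with AP dropping sharply in the strictest column.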
The main dependency versions are:
- Python >= 3.7, < 3.11
- PyTorch >= 1.13
- torchvision >= 0.14
- pytorch_lightning == 1.7.*
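The constraints above can be checked before installing or training. The following is a minimal sketch using only the standard library. The helper names are hypothetical and not part of this repository, and the version strings passed in are assumed to come from each package's `__version__` attribute.

```python
import sys


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into a tuple of ints, (1, 13, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'a0'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def python_ok(version_info=sys.version_info):
    """Python >= 3.7, < 3.11."""
    return (3, 7) <= tuple(version_info[:2]) < (3, 11)


def torch_ok(version):
    """PyTorch >= 1.13."""
    return parse_version(version) >= (1, 13)


def torchvision_ok(version):
    """torchvision >= 0.14."""
    return parse_version(version) >= (0, 14)


def lightning_ok(version):
    """pytorch_lightning == 1.7.*"""
    return parse_version(version)[:2] == (1, 7)


print(python_ok((3, 10, 4)), torch_ok("1.13.1+cu117"), lightning_ok("1.8.0"))
# True True False
```

In an environment where the packages are already installed, the same helpers could be called as, for example, `torch_ok(torch.__version__)`.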
Run the following command to install the required packages.
pip install -r requirements.txt
Train BA-TFD, introduced in the paper Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization, with the default hyperparameters on the LAV-DF dataset.
python train.py \
--config ./config/batfd_default.toml \
--data_root <DATASET_PATH> \
--batch_size 4 --num_workers 8 --gpus 1 --precision 16
The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.
Train BA-TFD+, introduced in the paper Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization, with the default hyperparameters on the LAV-DF dataset.
python train.py \
--config ./config/batfd_plus_default.toml \
--data_root <DATASET_PATH> \
--batch_size 4 --num_workers 8 --gpus 2 --precision 32
Please use FP32 for training BA-TFD+, as FP16 will cause inf and nan.
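The README does not say where the inf and nan values arise in BA-TFD+. As general background only, FP16 has a much narrower range than FP32: values above 65504 overflow, and very small values flush to zero. The sketch below emulates FP16 rounding with the standard library's `struct` module. The helper `to_fp16` is hypothetical and maps Python's overflow error to inf, which is an assumption chosen to mimic typical FP16 arithmetic.

```python
import math
import struct


def to_fp16(value):
    """Round a Python float to the nearest FP16 value, returned as a float."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses out-of-range values; FP16 arithmetic would give inf.
        return math.copysign(math.inf, value)


largest = to_fp16(65504.0)   # largest finite FP16 value, stored exactly
overflow = to_fp16(70000.0)  # out of range, becomes inf
tiny = to_fp16(1e-8)         # below the smallest subnormal, becomes 0.0

print(largest, overflow, tiny)  # 65504.0 inf 0.0
print(overflow - overflow)      # nan, because inf - inf is undefined
```

Once a single inf appears in an activation or loss, later operations such as subtraction or normalization can turn it into nan, which then propagates through the gradients.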
The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.
Run the following command to evaluate the model with a checkpoint saved in the ckpt directory.
Alternatively, you can download the BA-TFD and BA-TFD+ pretrained models and evaluate those.
python evaluate.py \
--config <CONFIG_PATH> \
--data_root <DATASET_PATH> \
--checkpoint <CHECKPOINT_PATH> \
--batch_size 1 --num_workers 4
The script writes the temporal inference results to the output directory and prints the AP and AR scores to the console.
Note: make sure only one GPU is visible to the evaluation script.
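One common way to expose a single GPU to a process is the `CUDA_VISIBLE_DEVICES` environment variable. The following is a minimal sketch of launching the evaluation command that way from Python. The helper names are hypothetical and not part of this repository, and the flags are copied from the evaluation command above.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def build_eval_command(config_path, data_root, checkpoint_path):
    """Assemble the evaluate.py invocation with the flags shown in this README."""
    return [
        sys.executable, "evaluate.py",
        "--config", config_path,
        "--data_root", data_root,
        "--checkpoint", checkpoint_path,
        "--batch_size", "1",
        "--num_workers", "4",
    ]


def run_evaluation(config_path, data_root, checkpoint_path, gpu_index=0):
    """Run the evaluation in a subprocess that sees only the chosen GPU."""
    command = build_eval_command(config_path, data_root, checkpoint_path)
    return subprocess.run(command, env=single_gpu_env(gpu_index), check=True)
```

Setting the variable on a copy of the environment keeps the restriction local to the evaluation subprocess, so other jobs started from the same session are unaffected.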
This project is under the CC BY-NC 4.0 license. See LICENSE for details.
If you find this work useful in your research, please cite the following papers.
The conference paper:
@inproceedings{cai2022you,
title = {Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization},
author = {Cai, Zhixi and Stefanov, Kalin and Dhall, Abhinav and Hayat, Munawar},
booktitle = {2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)},
year = {2022},
doi = {10.1109/DICTA56598.2022.10034605},
pages = {1--10},
address = {Sydney, Australia},
}
The extended journal version, accepted by CVIU:
@article{cai2023glitch,
title = {Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization},
author = {Cai, Zhixi and Ghosh, Shreya and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin and Hayat, Munawar},
journal = {Computer Vision and Image Understanding},
year = {2023},
volume = {236},
pages = {103818},
issn = {1077-3142},
doi = {10.1016/j.cviu.2023.103818},
}
Some code related to the boundary matching mechanism is borrowed from JJBOY/BMN-Boundary-Matching-Network and xxcheng0708/BSNPlusPlus-boundary-sensitive-network.