Localized Audio Visual DeepFake Dataset (LAV-DF)

This repository is the official PyTorch implementation of the DICTA paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" (Best Award) and of the journal paper "Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization", accepted by CVIU.

LAV-DF Dataset

Download

To use the LAV-DF dataset, you must agree to the terms and conditions.

Download link: OneDrive, Google Drive, HuggingFace.

Baseline Benchmark

Method	AP@0.5	AP@0.75	AP@0.95	AR@100	AR@50	AR@20	AR@10
BA-TFD	79.15	38.57	00.24	67.03	64.18	60.89	58.51
BA-TFD+	96.30	84.96	04.44	81.62	80.48	79.40	78.75

Note that the BA-TFD result here is slightly better than the one reported in the paper, because this repository uses better hyperparameters.
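The AP columns are thresholded on the temporal overlap between a predicted segment and a ground-truth fake segment. The following is a minimal sketch of that overlap measure (temporal intersection over union), not the code in metrics.py. The function names `temporal_iou` and `is_hit`, the segment values, and the choice to count a match when the IoU reaches the threshold are all assumptions made for illustration.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) segments, in seconds."""
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_hit(pred, truth, threshold):
    """Assumed convention: a prediction matches when IoU >= threshold."""
    return temporal_iou(pred, truth) >= threshold


# Hypothetical segments: the prediction starts one second too early.
prediction = (0.0, 5.0)
ground_truth = (1.0, 5.0)

for threshold in (0.5, 0.75, 0.95):
    print(threshold, is_hit(prediction, ground_truth, threshold))
```

In this example the intersection is 4 seconds and the union is 5 seconds, so the prediction counts at the 0.5 and 0.75 thresholds but not at 0.95. That illustrates why scores at the strictest threshold tend to be much lower than at the looser ones.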

Baseline Models

Requirements

The main version requirements are:

Python >= 3.7, < 3.11
PyTorch >= 1.13
torchvision >= 0.14
pytorch_lightning == 1.7.*
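Before installing, it can help to confirm that the interpreter falls inside the supported Python range listed above. This is a small sketch using only the standard library; the helper name `python_supported` is hypothetical and is not part of this repository.

```python
import sys


def python_supported(version=None):
    """Return True when (major, minor) lies in the range [3.7, 3.11)."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (3, 7) <= (major, minor) < (3, 11)


if __name__ == "__main__":
    if python_supported():
        print("Python version is within the supported range.")
    else:
        print("Python version is outside the supported range.")
```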

Run the following command to install the required packages.

pip install -r requirements.txt

Training BA-TFD

Train BA-TFD, introduced in the paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization", with the default hyperparameters on the LAV-DF dataset.

python train.py \
  --config ./config/batfd_default.toml \
  --data_root <DATASET_PATH> \
  --batch_size 4 --num_workers 8 --gpus 1 --precision 16

The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.

Training BA-TFD+

Train BA-TFD+, introduced in the paper "Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization", with the default hyperparameters on the LAV-DF dataset.

python train.py \
  --config ./config/batfd_plus_default.toml \
  --data_root <DATASET_PATH> \
  --batch_size 4 --num_workers 8 --gpus 2 --precision 32

Use FP32 when training BA-TFD+, as FP16 causes inf and nan values.
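As general background, half precision has a much narrower range than single precision: the largest finite FP16 value is 65504. The sketch below uses only the standard library to show that limit and how an infinity can turn into nan. It illustrates one common source of inf and nan under FP16 in general; whether this is the exact cause in BA-TFD+ is an assumption, and the helper name `fits_in_fp16` is hypothetical.

```python
import math
import struct


def fits_in_fp16(value):
    """Return True if value can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


print(fits_in_fp16(65504.0))  # largest finite FP16 value
print(fits_in_fp16(1.0e5))    # beyond the FP16 range

# Once a value has overflowed to infinity, later arithmetic can yield nan.
overflowed = float("inf")
print(math.isnan(overflowed - overflowed))
```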

The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.

Evaluation

Run the following command to evaluate a model using a checkpoint saved in the ckpt directory.

Alternatively, you can download the pretrained BA-TFD and BA-TFD+ models and evaluate those.

python evaluate.py \
  --config <CONFIG_PATH> \
  --data_root <DATASET_PATH> \
  --checkpoint <CHECKPOINT_PATH> \
  --batch_size 1 --num_workers 4

The script writes the temporal inference results to the output directory and prints the AP and AR scores to the console.

Note: make sure that only one GPU is visible to the evaluation script.
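One common way to expose a single GPU is the CUDA_VISIBLE_DEVICES environment variable, set before the process starts. The sketch below builds such an environment and the evaluation command shown above, so the script can be launched from Python. The helper names `single_gpu_env` and `build_eval_command` are hypothetical, and the angle-bracket placeholders must be replaced with real paths.

```python
import os
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Copy the environment and expose only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def build_eval_command(config, data_root, checkpoint):
    """Assemble the evaluate.py invocation with the README's arguments."""
    return [
        sys.executable, "evaluate.py",
        "--config", config,
        "--data_root", data_root,
        "--checkpoint", checkpoint,
        "--batch_size", "1",
        "--num_workers", "4",
    ]


# Example launch (not executed here; fill in the placeholders first):
#   import subprocess
#   cmd = build_eval_command("<CONFIG_PATH>", "<DATASET_PATH>", "<CHECKPOINT_PATH>")
#   subprocess.run(cmd, env=single_gpu_env(0), check=True)
```

Passing the environment to the child process leaves the parent shell untouched, which is convenient on shared multi-GPU machines.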

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.

References

If you find this work useful in your research, please cite the following papers.

The conference paper:

@inproceedings{cai2022you,
  title = {Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization},
  author = {Cai, Zhixi and Stefanov, Kalin and Dhall, Abhinav and Hayat, Munawar},
  booktitle = {2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)},
  year = {2022},
  doi = {10.1109/DICTA56598.2022.10034605},
  pages = {1--10},
  address = {Sydney, Australia},
}

The extended journal version, accepted by CVIU:

@article{cai2023glitch,
  title = {Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization},
  author = {Cai, Zhixi and Ghosh, Shreya and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin and Hayat, Munawar},
  journal = {Computer Vision and Image Understanding},
  year = {2023},
  volume = {236},
  pages = {103818},
  issn = {1077-3142},
  doi = {10.1016/j.cviu.2023.103818},
}

Acknowledgements

Some code related to the boundary matching mechanism is borrowed from JJBOY/BMN-Boundary-Matching-Network and xxcheng0708/BSNPlusPlus-boundary-sensitive-network.
