# Pre-Processing of Annotated Music Video Datasets
Requirements • How to Use • How to Cite
## Requirements

Tested with Python 2.7 and Ubuntu 16.04:

```bash
pip install -r requirements.txt
sudo apt-get install -y sox
```
## How to Use

### COGNIMUSE dataset

1. Download the COGNIMUSE dataset:
    - Download the annotations:
        - Emotion: 2D (valence-arousal), with values in the range [-1, 1]
        - 2 emotion classes = {Neg: 0, Pos: 1}
        - 4 emotion classes = {NegHigh: 0, NegLow: 1, PosLow: 2, PosHigh: 3} (a label-mapping sketch follows this list)
    - Download the videos, extract the last 30 minutes of each video, and copy them to `data/`
    - The final directory structure should be as follows:

    ```
    data
    +-- BMI
    |   +-- emotion
    |   |   +-- intended_1.dat
    |   +-- text
    |   |   +-- subtitle.srt
    |   +-- video.mp4
    ...
    ```
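The annotation files store continuous valence scores; the sketch below shows one way the class mappings above could be applied to a score `v` in [-1, 1]. The threshold at 0 follows from the Neg/Pos split, but the 0.5 magnitude cut separating High from Low is an assumption for illustration, not taken from the repository's scripts.

```python
def to_2_classes(v):
    """Map a valence score v in [-1, 1] to {Neg: 0, Pos: 1}."""
    return 0 if v < 0 else 1


def to_4_classes(v):
    """Map v to {NegHigh: 0, NegLow: 1, PosLow: 2, PosHigh: 3}.

    The 0.5 split between High and Low is an assumed example threshold.
    """
    if v < -0.5:
        return 0  # NegHigh
    elif v < 0:
        return 1  # NegLow
    elif v < 0.5:
        return 2  # PosLow
    return 3      # PosHigh
```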
2. Splice each full video (with its subtitle information) into segments of S seconds each -> video, emotion, audio, and text splices. Run:

    ```
    python video2splice.py
    ```

    Output (a sketch of the underlying splicing idea follows this list):

    ```
    data_test
    +-- BMI
    |   +-- audio_splices_Xsecs
    |   |   0.wav
    |   |   ...
    |   |   N.wav
    |   +-- emotion
    |   |   +-- intended_1.dat
    |   |   +-- intended_1_[1D/2D].csv
    |   |   +-- intended_1_[1D/2D]_splices_Xsecs.csv
    |   +-- text
    |   |   +-- subtitle.srt
    |   |   +-- text.csv
    |   |   +-- text_splices_Xsecs.csv
    |   +-- video_splices_Xsecs
    |   |   0.mp4
    |   |   ...
    |   |   N.mp4
    ...
    ```
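Conceptually, the video splicing amounts to repeated fixed-length cuts. A minimal sketch of that idea, assuming `ffmpeg` is on the PATH; this is an illustration, not the actual `video2splice.py` implementation:

```python
import subprocess


def splice_video(video_path, out_dir, duration_secs, n_splices):
    """Cut video_path into n_splices consecutive segments of duration_secs seconds."""
    for i in range(n_splices):
        subprocess.check_call([
            'ffmpeg', '-y',
            '-ss', str(i * duration_secs),  # start time of this segment
            '-i', video_path,
            '-t', str(duration_secs),       # segment length
            '{0}/{1}.mp4'.format(out_dir, i),
        ])

# Example: a 30-minute video spliced into 360 segments of 5 seconds each
# splice_video('data/BMI/video.mp4', 'data_test/BMI/video_splices_5secs', 5, 360)
```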
3. (Optional) Transform the audio into instrumental piano audio (this step requires Python 2.7). Run:

    ```
    python audio2piano.py
    ```
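A common recipe for this kind of transformation is to transcribe the audio to MIDI and re-synthesize it with a piano program. The sketch below shows only the re-synthesis half, assumes `pretty_midi` and `scipy` are installed, and uses a hypothetical file `0.mid` produced by a separate transcription step; it is not the actual `audio2piano.py` pipeline.

```python
import pretty_midi
from scipy.io import wavfile

# Assumed input: a MIDI transcription of one audio splice (hypothetical name).
midi = pretty_midi.PrettyMIDI('0.mid')

# Force every track to Acoustic Grand Piano (General MIDI program 0).
for instrument in midi.instruments:
    instrument.program = 0
    instrument.is_drum = False

# Synthesize the piano version back to a waveform and save it.
fs = 16000
wavfile.write('0_piano.wav', fs, midi.synthesize(fs=fs))
```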
4. Save the spliced data in NumPy's npz format. Run this after the full videos have been spliced as described above:

    ```
    python splices2npz.py
    ```

    Full dataset: 7 annotated music videos, divided into splices of S seconds and stored in `data_test/`.

The result is a train and a test dataset with the `.npz` extension, written to the same root directory that contains the data folders.
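For reference, a minimal sketch of how such archives are written and read with NumPy; the array shapes and the keys `video`, `audio`, and `emotion` are made-up examples, while the real keys are defined by `splices2npz.py`:

```python
import numpy as np

# Saving: bundle several named arrays into one compressed archive.
np.savez_compressed('train_data.npz',
                    video=np.zeros((10, 16, 64, 64, 3)),  # hypothetical shapes
                    audio=np.zeros((10, 80000)),
                    emotion=np.zeros(10))

# Loading: each array is accessed back by its key.
data = np.load('train_data.npz')
print(data.files)            # ['video', 'audio', 'emotion']
print(data['video'].shape)   # (10, 16, 64, 64, 3)
```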
### DEAP dataset

1. Download the DEAP dataset (you need to sign the EULA form):
    - Train data:
        - Choose a video option: highlights (1-minute videos) or raw video (original music videos of varying lengths)
        - Convert the `$DEAP_DATA/Video/highlights/*.wmv` files to mp4 (one way to do this is sketched after this list)
        - Copy the videos to `./data/deap/mp4/`
        - Open `$DEAP_DATA/metadata_xls/participatn_ratings.XLS`, save it as `$DEAP_DATA/metadata_xls/participatn_ratings.CSV`, and copy it to `./data/deap/`
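A minimal sketch of the wmv-to-mp4 conversion, assuming `ffmpeg` is installed and the `DEAP_DATA` environment variable points at the downloaded dataset; this helper script is not part of the repository:

```python
import glob
import os
import subprocess

src_dir = os.path.join(os.environ['DEAP_DATA'], 'Video', 'highlights')
dst_dir = './data/deap/mp4'
if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

# Convert every highlight video from .wmv to .mp4 with ffmpeg.
for wmv in glob.glob(os.path.join(src_dir, '*.wmv')):
    name = os.path.splitext(os.path.basename(wmv))[0] + '.mp4'
    subprocess.check_call(['ffmpeg', '-y', '-i', wmv, os.path.join(dst_dir, name)])
```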
    - Test data (the same for highlights or raw video):
        - Extract the first 11 seconds of each train video (a sketch follows the directory tree below)
        - Copy them to `./data/deap/test_data/`
    - The final directory structure should be as follows:

    ```
    data/deap/
    +-- mp4
    |   +-- 1.mp4
    |   +-- 2.mp4
    |   ...
    +-- participatn_ratings.csv
    +-- test_data
    |   +-- 1.mp4
    |   +-- 2.mp4
    |   ...
    ...
    ```
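Extracting the first 11 seconds can again be done with `ffmpeg`; a minimal sketch (the 11-second length comes from the step above, the rest is an assumed illustration):

```python
import glob
import os
import subprocess

dst_dir = './data/deap/test_data'
if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

# Keep only the first 11 seconds of every training video as test data.
for mp4 in glob.glob('./data/deap/mp4/*.mp4'):
    out = os.path.join(dst_dir, os.path.basename(mp4))
    subprocess.check_call(['ffmpeg', '-y', '-i', mp4, '-t', '11', '-c', 'copy', out])
```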
2. Get the average of the emotion scores. Run:

    ```
    python deap_1_average_emotion_scores.py
    ```
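Conceptually, this step averages each video's ratings over all participants. A minimal pandas sketch of that idea; the column names `Experiment_id`, `Valence`, and `Arousal` are assumptions about the ratings file, not verified against it:

```python
import pandas as pd

# Average valence/arousal ratings per video across all participants.
ratings = pd.read_csv('./data/deap/participatn_ratings.csv')
mean_scores = ratings.groupby('Experiment_id')[['Valence', 'Arousal']].mean()
print(mean_scores.head())
```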
3. Splice the video, audio, emotion, and dummy text files. The dummy text is necessary to keep the data compatible with the COGNIMUSE script (a sketch of such a placeholder follows below). Run:

    ```
    python deap_2_video2splice.py
    ```
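For illustration, a dummy subtitle can be as simple as a single placeholder cue per video; the file name and cue content below are hypothetical, and the actual dummy files are generated by `deap_2_video2splice.py`:

```python
# Write a minimal placeholder .srt so the text modality exists for every video.
dummy_srt = "1\n00:00:00,000 --> 00:01:00,000\n.\n"
with open('./data/deap/dummy_subtitle.srt', 'w') as f:
    f.write(dummy_srt)
```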
4. (Optional) Transform the audio into instrumental piano audio (same idea as the COGNIMUSE step above). Run:

    ```
    python deap_3_audio2piano.py
    ```
5. Save the spliced data in NumPy's npz format. Run:

    ```
    python deap_4_splices2npz.py
    ```
## How to Cite

Please star or fork this repository if the code was useful to you. If you use it in a paper, please cite it as:

```bibtex
@software{gwena_cunha_2020_3910918,
  author    = {Gwenaelle Cunha Sergio},
  title     = {{gcunhase/AnnotatedMV-PreProcessing: Pre-Processing
               of Annotated Music Video Corpora (COGNIMUSE and DEAP)}},
  month     = jun,
  year      = 2020,
  publisher = {Zenodo},
  version   = {v2.0},
  doi       = {10.5281/zenodo.3910918},
  url       = {https://doi.org/10.5281/zenodo.3910918}
}
```
If you use the COGNIMUSE database, please also cite:

```bibtex
@article{zlatintsi2017cognimuse,
  title     = {{COGNIMUSE}: A multimodal video database annotated with saliency, events, semantics and emotion with application to summarization},
  author    = {Zlatintsi, Athanasia and Koutras, Petros and Evangelopoulos, Georgios and Malandrakis, Nikolaos and Efthymiou, Niki and Pastra, Katerina and Potamianos, Alexandros and Maragos, Petros},
  journal   = {EURASIP Journal on Image and Video Processing},
  volume    = {2017},
  number    = {1},
  pages     = {54},
  year      = {2017},
  publisher = {Springer}
}
```
If you use the DEAP database, please also cite:

```bibtex
@article{koelstra2011deap,
  title     = {{DEAP}: A database for emotion analysis; using physiological signals},
  author    = {Koelstra, Sander and Muhl, Christian and Soleymani, Mohammad and Lee, Jong-Seok and Yazdani, Ashkan and Ebrahimi, Touradj and Pun, Thierry and Nijholt, Anton and Patras, Ioannis},
  journal   = {IEEE Transactions on Affective Computing},
  volume    = {3},
  number    = {1},
  pages     = {18--31},
  year      = {2011},
  publisher = {IEEE}
}
```