This repo hosts the code and models of "MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection" [Accepted by Interspeech 2024].
- The STARSS22 and STARSS23 datasets can be downloaded from the link.
- The official Synthetic dataset can be downloaded from the link.
Download and unzip the datasets; the dataset directory should look like this:
```
./dataset
│
├── STARSS22
│   ├── foa_dev
│   ├── foa_eval
│   └── metadata_dev
│
├── STARSS23
│   ├── foa_dev
│   ├── foa_eval
│   └── metadata_dev
│
└── synth_dataset
    └── official
        ├── foa
        └── metadata
```
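As a quick sanity check, the sketch below verifies that the required folders exist (paths are taken from the tree above; skip the STARSS23 entries if you only use STARSS22):

```bash
# Minimal layout check -- paths assumed from the directory tree above.
for d in STARSS22/foa_dev STARSS22/foa_eval STARSS22/metadata_dev \
         synth_dataset/official/foa synth_dataset/official/metadata; do
  [ -d "./dataset/$d" ] && echo "ok: $d" || echo "missing: $d"
done
```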
Use the provided `environment.yaml`. Change the last line of `environment.yaml` to point to your own Anaconda envs folder, then create the environment:

```
conda env create -f environment.yaml
```

Then activate the environment:

```
conda activate mff-einv2
```
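For reference, the last line of a conda-exported `environment.yaml` is typically a `prefix` entry; a hypothetical example (replace the path with your own):

```yaml
# Hypothetical example -- point this at your own Anaconda envs folder.
prefix: /home/<user>/anaconda3/envs/mff-einv2
```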
Hyper-parameters are stored in `./configs/ein_seld/seld.yaml`. You need to set `dataset_dir` to your own dataset directory.
The default setting uses only the STARSS22 and official Synthetic datasets. If you want to use the STARSS23 and official Synthetic datasets instead, uncomment the commented code in `./scripts/preprocess.sh` and change the `dataset` parameter to `dcase2023task3` in `./configs/ein_seld/seld.yaml` (see the sketch below).
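A sketch of the two settings mentioned above; the exact placement of these keys inside `seld.yaml` may differ, and the directory path is a placeholder:

```yaml
# Excerpt from ./configs/ein_seld/seld.yaml (key placement may differ).
dataset_dir: /path/to/your/dataset   # set to your own dataset directory
dataset: dcase2023task3              # only when using STARSS23; the default targets STARSS22
```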
After downloading the datasets, run:

```
bash ./scripts/preprocess.sh
```
You can modify the hyper-parameters in `./configs/ein_seld/seld.yaml`, then run:

```
bash ./scripts/train.sh
```
You can find the training results in the directory `./results/out_train`.
To run inference:

```
bash ./scripts/infer.sh
```

The prediction results and model outputs will be saved in the `./results/out_infer` folder.
Evaluate the generated submission results by running:

```
python3 code/compute_seld_metrics.py --dataset='STARSS22'
```

or

```
python3 code/compute_seld_metrics.py --dataset='STARSS23'
```
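If you have produced submissions for both datasets, a trivial shell loop (a hypothetical convenience, not part of the provided scripts) evaluates them in one go:

```bash
# Hypothetical convenience loop: compute metrics for both datasets.
for ds in STARSS22 STARSS23; do
  python3 code/compute_seld_metrics.py --dataset="$ds"
done
```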
If you find our work useful, please cite:

```
@article{mu2024mffeinv2,
  title={MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection},
  author={Da Mu and Zhicheng Zhang and Haobo Yue},
  journal={arXiv preprint arXiv:2406.08771},
  year={2024},
}
```
The code is based on Jinbo Hu's repo.
```
@inproceedings{hu2022,
  author={Hu, Jinbo and Cao, Yin and Wu, Ming and Kong, Qiuqiang and Yang, Feiran and Plumbley, Mark D. and Yang, Jun},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection},
  year={2022},
  pages={9196-9200},
  doi={10.1109/ICASSP43922.2022.9747283}
}
```