This repo hosts the code and models of "MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection" [Accepted by Interspeech 2024].
- The STARSS22 and STARSS23 datasets can be downloaded from the link.
- The official Synthetic dataset can be downloaded from the link.
Download and unzip the datasets; the dataset directory should look like this:
```
./dataset
│
├── STARSS22
│   ├── foa_dev
│   ├── foa_eval
│   └── metadata_dev
│
├── STARSS23
│   ├── foa_dev
│   ├── foa_eval
│   └── metadata_dev
│
└── synth_dataset
    └── official
        ├── foa
        └── metadata
```
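As a quick sanity check, the sketch below verifies that the required folders exist (paths are taken from the tree above; skip the STARSS23 entries if you only use STARSS22):

```bash
# Minimal layout check -- paths assumed from the directory tree above.
for d in STARSS22/foa_dev STARSS22/foa_eval STARSS22/metadata_dev \
         synth_dataset/official/foa synth_dataset/official/metadata; do
  [ -d "./dataset/$d" ] && echo "ok: $d" || echo "missing: $d"
done
```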
Use the provided `environment.yaml`. Change the last line of `environment.yaml` to point to your own Anaconda envs folder, then create the environment:

```
conda env create -f environment.yaml
```

Then activate the environment:

```
conda activate mff-einv2
```
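For reference, the last line of a conda-exported `environment.yaml` is typically a `prefix` entry; a hypothetical example (replace the path with your own):

```yaml
# Hypothetical example -- point this at your own Anaconda envs folder.
prefix: /home/<user>/anaconda3/envs/mff-einv2
```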
Hyper-parameters are stored in `./configs/ein_seld/seld.yaml`. You need to set `dataset_dir` to your own dataset directory.
The default setting uses only the STARSS22 and official Synthetic datasets. If you want to use the STARSS23 and official Synthetic datasets instead, uncomment the commented code in `./scripts/preprocess.sh` and change the `dataset` parameter to `dcase2023task3` in `./configs/ein_seld/seld.yaml` (see the sketch below).
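A sketch of the two settings mentioned above; the exact placement of these keys inside `seld.yaml` may differ, and the directory path is a placeholder:

```yaml
# Excerpt from ./configs/ein_seld/seld.yaml (key placement may differ).
dataset_dir: /path/to/your/dataset   # set to your own dataset directory
dataset: dcase2023task3              # only when using STARSS23; the default targets STARSS22
```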
After downloading the datasets, run:

```
bash ./scripts/preprocess.sh
```
You can modify the hyper-parameters in `./configs/ein_seld/seld.yaml`, then run:

```
bash ./scripts/train.sh
```
You can find the training results in the directory `./results/out_train`.
To run inference:

```
bash ./scripts/infer.sh
```

The prediction results and model outputs will be saved in the `./results/out_infer` folder.
Evaluate the generated submission results by running:

```
python3 code/compute_seld_metrics.py --dataset='STARSS22'
```

or

```
python3 code/compute_seld_metrics.py --dataset='STARSS23'
```
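If you have produced submissions for both datasets, a trivial shell loop (a hypothetical convenience, not part of the provided scripts) evaluates them in one go:

```bash
# Hypothetical convenience loop: compute metrics for both datasets.
for ds in STARSS22 STARSS23; do
  python3 code/compute_seld_metrics.py --dataset="$ds"
done
```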
If you find our work useful, please cite:

```
@article{mu2024mffeinv2,
  title={MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection},
  author={Da Mu and Zhicheng Zhang and Haobo Yue},
  journal={arXiv preprint arXiv:2406.08771},
  year={2024},
}
```
The code is based on Jinbo Hu's repo.
```
@inproceedings{hu2022,
  author={Hu, Jinbo and Cao, Yin and Wu, Ming and Kong, Qiuqiang and Yang, Feiran and Plumbley, Mark D. and Yang, Jun},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection},
  year={2022},
  pages={9196-9200},
  doi={10.1109/ICASSP43922.2022.9747283}
}
```