Sound Source Localization is All About Alignment (ICCV’23)

Official PyTorch implementation of the following papers:

Sound Source Localization is All About Cross-Modal Alignment

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung (* Equal Contribution)

ICCV 2023

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung (* Equal Contribution)

arXiv 2024

Index

Overview
Interactive Synthetic Sound Source (IS3) Dataset
Model Checkpoints
Demo
Citation

Overview

Interactive Synthetic Sound Source (IS3) Dataset

The IS3 dataset is available here, or you can simply run download_is3.sh.

The IS3 data is organized as follows:

Note that in the IS3 dataset, each annotation is saved as a separate file. For example, the sample image accordion_baby_10467 contains two annotations, one for the accordion object and one for the baby object. These annotations are saved as accordion_baby_10467_accordion and accordion_baby_10467_baby for straightforward use. You can always project the bounding boxes or segmentation maps onto the original image to see them all at once.

The images and audio_waw folders contain all the image and audio files, respectively.

The IS3_annotation.json file contains the ground-truth bounding box and category information for each annotation.

The gt_segmentation folder contains a segmentation map in binary image format for each annotation. You can look up the file name in IS3_annotation.json to get the semantic category of each segmentation map.
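The naming convention above can be used to regroup per-object annotations under their source image. The following is a minimal sketch, not code from this repository: the helper names split_annotation_name and group_by_image are hypothetical, and it assumes every sample name ends with a numeric ID followed by an underscore and the category, as in the accordion_baby_10467 example. It does not read IS3_annotation.json, whose exact schema is not described here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_annotation_name(annotation_name: str) -> Tuple[str, str]:
    """Split an annotation name into (image name, category).

    Assumption: the image name ends with a numeric ID and the category
    follows after one underscore, e.g. 'accordion_baby_10467_accordion'.
    """
    match = re.fullmatch(r"(.+_\d+)_(.+)", annotation_name)
    if match is None:
        raise ValueError(f"Unexpected annotation name: {annotation_name!r}")
    return match.group(1), match.group(2)


def group_by_image(annotation_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each image name to the categories annotated in it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in annotation_names:
        image_name, category = split_annotation_name(name)
        groups[image_name].append(category)
    return dict(groups)
```

With the grouping in hand, all boxes or masks belonging to one image can be drawn onto it together.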

Model Checkpoints

The model checkpoints are available for the following experiments:

Training Set	Test Set	Model Type	Performance (cIoU)	Checkpoint
VGGSound-144K	VGG-SS	NN w/ Sup. Pre. Enc.	39.94	Link
VGGSound-144K	VGG-SS	NN w/ Self-Sup. Pre. Enc.	39.16	Link
VGGSound-144K	VGG-SS	NN w/ Sup. Pre. Enc. Pre-trained Vision	41.42	Link
Flickr-SoundNet-144K	Flickr-SoundNet	NN w/ Sup. Pre. Enc.	85.20	Link
Flickr-SoundNet-144K	Flickr-SoundNet	NN w/ Self-Sup. Pre. Enc.	84.80	Link
Flickr-SoundNet-144K	Flickr-SoundNet	NN w/ Sup. Pre. Enc. Pre-trained Vision	86.00	Link
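As a rough illustration of how a score like those in the table can be read, the sketch below computes the IoU between binary masks and the percentage of samples that reach a threshold. It is a simplified stand-in and not the evaluation code of this repository: it assumes the convention, common in the sound source localization literature, of reporting cIoU as a success rate at a 0.5 threshold, and it uses plain binary ground truth rather than consensus-weighted maps. The function names mask_iou and success_rate are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 values


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Two empty masks have no union; report 0.0 instead of dividing by zero.
    return intersection / union if union else 0.0


def success_rate(ious: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage of samples whose IoU is at least the threshold (assumed 0.5)."""
    if not ious:
        return 0.0
    hits = sum(1 for value in ious if value >= threshold)
    return 100.0 * hits / len(ious)
```

Under this reading, a table entry of 39.94 would mean that roughly 40% of test samples were localized with an overlap at or above the threshold.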

Demo

We provide a zip file that contains model checkpoints and a few data samples from VGGSound.

https://mm.kaist.ac.kr/share/kccv_tutorial.zip
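If you prefer to fetch the archive from Python rather than a browser, something like the following should work. This is a sketch using only the standard library; the helper name fetch_and_extract and the destination folder demo_data are illustrative choices, not part of this repository, and the layout of the extracted files is not assumed.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

DEMO_URL = "https://mm.kaist.ac.kr/share/kccv_tutorial.zip"


def fetch_and_extract(url: str, dest_dir: str) -> List[str]:
    """Download a zip archive into dest_dir, unpack it, and list its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path component of the URL.
    archive_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call (hypothetical destination folder):
# fetch_and_extract(DEMO_URL, "demo_data")
```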

Download the dataset and set up the environment as described below.

sh environments.sh
sh download_is3.sh

Now enjoy the Sound Localization Demo.ipynb!

Citation

If you find this code useful, please consider giving a star ⭐ and citing us:

@inproceedings{senocak2023sound,
  title={Sound source localization is all about cross-modal alignment},
  author={Senocak, Arda and Ryu, Hyeonggon and Kim, Junsik and Oh, Tae-Hyun and Pfister, Hanspeter and Chung, Joon Son},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7777--7787},
  year={2023}
}

If you use this dataset, please consider giving a star ⭐ and citing us:

@article{senocak2024align,
  title={Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment},
  author={Senocak, Arda and Ryu, Hyeonggon and Kim, Junsik and Oh, Tae-Hyun and Pfister, Hanspeter and Chung, Joon Son},
  journal={arXiv preprint arXiv:2407.13676},
  year={2024}
}
