Pytorch implementation of librosa

This codebase provides PyTorch implementation of some librosa functions. The functions can run on GPU. For example, users can extract log mel spectrogram on GPU. The numerical difference between this codebase and librosa is less than 1e-6.

Install

$ pip install torchlibrosa

Examples

Here are examples of extracting spectrogram, log mel spectrogram, STFT and ISTFT using torchlibrosa.

import torch
import torchlibrosa as tl

# Data
x = torch.zeros(1, 22050)	# (batch_size, samples_num)

# Spectrogram
spectrogram_extractor = tl.stft.Spectrogram(n_fft=2048, hop_length=512)
sp = spectrogram_extractor.forward(x)	# (batch_size, 1, time_steps, freq_bins)

# Log mel spectrogram
logmel_extractor = tl.stft.LogmelFilterBank(sr=22050, n_fft=2048, n_mels=128)
logmel = logmel_extractor.forward(sp)	# (batch_size, 1, time_steps, freq_bins)

# STFT
stft_extractor = tl.stft.STFT(n_fft=2048, hop_length=512)
(real, imag) = stft_extractor.forward(x)
# real: (batch_size, 1, time_steps, freq_bins), imag: (batch_size, 1, time_steps, freq_bins) #

# ISTFT
istft_extractor = tl.stft.ISTFT(n_fft=2048, hop_length=512)
y = istft_extractor.forward(real, imag, x.shape[-1])	# (batch_size, samples_num)

More examples

python3 torchlibrosa/stft.py

Compability to librosa functions

If one you previously used for training cpu-extracted features from librosa, but want to add GPU acceleration during i.e., evaluation, then note that the following code will provide identical features to standard mel spectrograms:

## Librosa implementation
import torch
import torchlibrosa as tl

sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

raw_audio = torch.empty(sample_rate).uniform_(-1, 1) #Float32 input with normalized scale (-1, 1)

#Torchlibrosa feature extractor similar to librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.stft.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ), tl.stft.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False, #Default is true
    ))
feature = feature_extractor(raw_audio.unsqueeze(0)) # Shape is (Batch, 1, T, N_Mels)

Cite

[1] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2880-2894.

Name	Name	Last commit message	Last commit date
Latest commit yinkalario 0.0.6, added channel dimension and more features Jan 24, 2021 b8d0ac4 · Jan 24, 2021 History 28 Commits
examples	examples	0.0.6 add channel dimension	Jan 24, 2021
torchaudio_stft	torchaudio_stft	0.0.6 add channel dimension	Jan 24, 2021
torchlibrosa	torchlibrosa	0.0.6, added channel dimension and more features	Jan 24, 2021
.gitignore	.gitignore	new	Jan 24, 2021
LICENSE.md	LICENSE.md	add all	Jul 7, 2019
README.md	README.md	change hyper-parameters to same as librosa	Jan 24, 2021
setup.py	setup.py	0.0.6	Jan 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pytorch implementation of librosa

Install

Examples

More examples

Compability to librosa functions

Cite

About

Releases

Packages

Languages

License

yinkalario/torchlibrosa

Folders and files

Latest commit

History

Repository files navigation

Pytorch implementation of librosa

Install

Examples

More examples

Compability to librosa functions

Cite

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages