 <center><img src="https://user-images.githubusercontent.com/15963413/218609148-881e39df-33af-4af9-ab95-1427c4ebf062.png" width="800"></center>
 
 ## News
+- **Sep 2024 (v2.4):**
+  - We have updated the pretrained checkpoints with versions trained for 5M steps. This is the final release of the BigVGAN-v2 checkpoints.
+
 - **Jul 2024 (v2.3):**
   - General refactor and code improvements for improved readability.
   - Fully fused CUDA kernel of anti-aliased activation (upsampling + activation + downsampling) with inference speed benchmark.
@@ -185,11 +188,11 @@ One can download the checkpoints of the generator weight (named `bigvgan_generat
 
 | Model Name | Sampling Rate | Mel band | fmax | Upsampling Ratio | Params | Dataset | Steps | Fine-Tuned |
 |:----------:|:-------------:|:--------:|:----:|:----------------:|:------:|:-------:|:-----:|:----------:|
-| [bigvgan_v2_44khz_128band_512x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x) | 44 kHz | 128 | 22050 | 512 | 122M | Large-scale Compilation | 3M | No |
-| [bigvgan_v2_44khz_128band_256x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_256x) | 44 kHz | 128 | 22050 | 256 | 112M | Large-scale Compilation | 3M | No |
-| [bigvgan_v2_24khz_100band_256x](https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x) | 24 kHz | 100 | 12000 | 256 | 112M | Large-scale Compilation | 3M | No |
-| [bigvgan_v2_22khz_80band_256x](https://huggingface.co/nvidia/bigvgan_v2_22khz_80band_256x) | 22 kHz | 80 | 11025 | 256 | 112M | Large-scale Compilation | 3M | No |
-| [bigvgan_v2_22khz_80band_fmax8k_256x](https://huggingface.co/nvidia/bigvgan_v2_22khz_80band_fmax8k_256x) | 22 kHz | 80 | 8000 | 256 | 112M | Large-scale Compilation | 3M | No |
+| [bigvgan_v2_44khz_128band_512x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x) | 44 kHz | 128 | 22050 | 512 | 122M | Large-scale Compilation | 5M | No |
+| [bigvgan_v2_44khz_128band_256x](https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_256x) | 44 kHz | 128 | 22050 | 256 | 112M | Large-scale Compilation | 5M | No |
+| [bigvgan_v2_24khz_100band_256x](https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x) | 24 kHz | 100 | 12000 | 256 | 112M | Large-scale Compilation | 5M | No |
+| [bigvgan_v2_22khz_80band_256x](https://huggingface.co/nvidia/bigvgan_v2_22khz_80band_256x) | 22 kHz | 80 | 11025 | 256 | 112M | Large-scale Compilation | 5M | No |
+| [bigvgan_v2_22khz_80band_fmax8k_256x](https://huggingface.co/nvidia/bigvgan_v2_22khz_80band_fmax8k_256x) | 22 kHz | 80 | 8000 | 256 | 112M | Large-scale Compilation | 5M | No |
 | [bigvgan_24khz_100band](https://huggingface.co/nvidia/bigvgan_24khz_100band) | 24 kHz | 100 | 12000 | 256 | 112M | LibriTTS | 5M | No |
 | [bigvgan_base_24khz_100band](https://huggingface.co/nvidia/bigvgan_base_24khz_100band) | 24 kHz | 100 | 12000 | 256 | 14M | LibriTTS | 5M | No |
 | [bigvgan_22khz_80band](https://huggingface.co/nvidia/bigvgan_22khz_80band) | 22 kHz | 80 | 8000 | 256 | 112M | LibriTTS + VCTK + LJSpeech | 5M | No |
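As a reading aid for the Sampling Rate and Upsampling Ratio columns, the sketch below derives the mel frame rate each checkpoint implies. It assumes the upsampling ratio is the number of waveform samples generated per mel frame, and that the nominal rates map to 44100, 24000, and 22050 Hz (consistent with the `fmax` values 22050 and 11025 being the Nyquist frequencies). The dictionary and helper names are hypothetical and not part of the repository.

```python
# Hypothetical lookup built from the checkpoint table:
# name -> (sample rate in Hz, upsampling ratio).
# The exact Hz values are an assumption inferred from the fmax column.
CHECKPOINTS = {
    "bigvgan_v2_44khz_128band_512x": (44100, 512),
    "bigvgan_v2_44khz_128band_256x": (44100, 256),
    "bigvgan_v2_24khz_100band_256x": (24000, 256),
    "bigvgan_v2_22khz_80band_256x": (22050, 256),
}


def frames_per_second(sample_rate_hz: int, upsampling_ratio: int) -> float:
    """Mel frames the vocoder consumes per second of generated audio."""
    return sample_rate_hz / upsampling_ratio


def output_samples(num_mel_frames: int, upsampling_ratio: int) -> int:
    """Waveform samples produced for a given number of mel frames."""
    return num_mel_frames * upsampling_ratio


for name, (rate, ratio) in CHECKPOINTS.items():
    print(f"{name}: {frames_per_second(rate, ratio):.2f} mel frames/s")
```

Under these assumptions, the 512x checkpoint consumes half as many mel frames per second as the 256x checkpoint at the same sampling rate, which matters when pairing the vocoder with an acoustic model that emits mel frames at a fixed hop size.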
@@ -216,11 +219,12 @@ When training BigVGAN-v2 from scratch with small batch size, it can potentially
 
 Below are the objective results of the 24kHz model (`bigvgan_v2_24khz_100band_256x`) obtained from the LibriTTS `dev` sets. BigVGAN-v2 shows noticeable improvements across the metrics. The model also exhibits reduced perceptual artifacts, especially for non-speech audio.
 
-| Model | Dataset | Steps | PESQ(↑) | M-STFT(↓) | MCD(↓) | Periodicity(↓) | V/UV F1(↑) |
-|:----------:|:-----------------------:|:-----:|:---------:|:----------:|:------:|:--------------:|:----------:|
-| BigVGAN | LibriTTS | 1M | 4.027 | 0.7997 | 0.3745 | 0.1018 | 0.9598 |
-| BigVGAN | LibriTTS | 5M | 4.256 | 0.7409 | 0.2988 | 0.0809 | 0.9698 |
-| BigVGAN-v2 | Large-scale Compilation | 3M | **4.359** | **0.7134** | 0.3060 | **0.0621** | **0.9777** |
+| Model | Dataset | Steps | PESQ(↑) | M-STFT(↓) | MCD(↓) | Periodicity(↓) | V/UV F1(↑) |
+|:----------:|:-----------------------:|:-----:|:---------:|:----------:|:----------:|:--------------:|:----------:|
+| BigVGAN | LibriTTS | 1M | 4.027 | 0.7997 | 0.3745 | 0.1018 | 0.9598 |
+| BigVGAN | LibriTTS | 5M | 4.256 | 0.7409 | 0.2988 | 0.0809 | 0.9698 |
+| BigVGAN-v2 | Large-scale Compilation | 3M | 4.359 | 0.7134 | 0.3060 | 0.0621 | 0.9777 |
+| BigVGAN-v2 | Large-scale Compilation | 5M | **4.362** | **0.7026** | **0.2903** | **0.0593** | **0.9793** |
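To make the size of the 3M to 5M change concrete, the following sketch computes the relative change of each metric for BigVGAN-v2 using only the values in the results table. The dictionary layout and function names are hypothetical illustrations, not code from the repository.

```python
# Values copied from the BigVGAN-v2 rows of the results table:
# metric -> (3M-step value, 5M-step value, higher_is_better).
METRICS = {
    "PESQ": (4.359, 4.362, True),
    "M-STFT": (0.7134, 0.7026, False),
    "MCD": (0.3060, 0.2903, False),
    "Periodicity": (0.0621, 0.0593, False),
    "V/UV F1": (0.9777, 0.9793, True),
}


def relative_change_percent(old: float, new: float) -> float:
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def improved(old: float, new: float, higher_is_better: bool) -> bool:
    """True if the metric moved in its preferred direction."""
    return new > old if higher_is_better else new < old


for name, (v3m, v5m, higher_is_better) in METRICS.items():
    change = relative_change_percent(v3m, v5m)
    status = "improved" if improved(v3m, v5m, higher_is_better) else "regressed"
    print(f"{name}: {change:+.2f}% ({status})")
```

The table also shows that the 5M-step MCD (0.2903) is lower than the 0.2988 of the original BigVGAN at 5M steps, whereas the 3M-step MCD (0.3060) was not, which is why the 5M row is bold in every column.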
 
 ## Speed Benchmark