10 changes: 5 additions & 5 deletions README.md
@@ -1,7 +1,7 @@
<div align="center">


# MindAudio
# MindSpore AUDIO

[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/mindspore-lab/mindaudio/ut_test.yaml)
![GitHub issues](https://img.shields.io/github/issues/mindspore-lab/mindaudio)
@@ -20,7 +20,7 @@ English | [中文](README_CN.md)

## Introduction

MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data enhancement, and feature extraction, so that users can preprocess data conveniently. It also provides examples that show how to build audio deep learning models with mindaudio.
MindSpore AUDIO is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data enhancement, and feature extraction, so that users can preprocess data conveniently. It also provides examples that show how to build audio deep learning models with mindaudio.

The following table lists the `mindaudio` versions and the `mindspore` versions each one supports.

@@ -46,15 +46,15 @@ The following is the corresponding `mindaudio` versions and supported `mindspore

### Install with PyPI

The released version of MindAudio can be installed via `PyPI` as follows:
The released version of MindSpore AUDIO can be installed via `PyPI` as follows:

```shell
pip install mindaudio
```
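
After installing, one way to confirm that the package is visible to the current interpreter is to query its installed metadata. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and is not part of mindaudio.

```python
from importlib import metadata


def installed_version(package: str = "mindaudio"):
    """Return the installed version string, or None if the package is absent.

    `installed_version` is a hypothetical helper written for this example.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if `pip install mindaudio` succeeded, else None.
print(installed_version())
```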

### Install from Source

The latest version of MindAudio can be installed as follows:
The latest version of MindSpore AUDIO can be installed as follows:

```shell
git clone https://github.com/mindspore-lab/mindaudio.git
@@ -67,7 +67,7 @@ python setup.py install

###

MindAudio provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.
MindSpore AUDIO provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.

```python
>>> import mindaudio.data.io as io
16 changes: 8 additions & 8 deletions README_CN.md
@@ -1,7 +1,7 @@
<div align="center">


# MindAudio
# MindSpore AUDIO

[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/mindspore-lab/mindaudio/ut_test.yaml)
![GitHub issues](https://img.shields.io/github/issues/mindspore-lab/mindaudio)
@@ -18,7 +18,7 @@
</div>

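## Introduction
MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data enhancement, and feature extraction, so that users can preprocess data conveniently. It also provides examples that show how to build audio deep learning models with mindaudio.
MindSpore AUDIO is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data enhancement, and feature extraction, so that users can preprocess data conveniently. It also provides examples that show how to build audio deep learning models with mindaudio.

The following table shows the `mindaudio` versions and the supported `mindspore` versions.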

@@ -44,14 +44,14 @@ MindAudio 是基于 [MindSpore](https://www.mindspore.cn/) 的音频模型和算

### Install with PyPI

The released version of MindAudio can be installed via `PyPI`:
The released version of MindSpore AUDIO can be installed via `PyPI`:

```shell
pip install mindaudio
```

### Install from Source
The latest version of MindAudio can be installed as follows:
The latest version of MindSpore AUDIO can be installed as follows:

```shell
git clone https://github.com/mindspore-lab/mindaudio.git
@@ -64,7 +64,7 @@ python setup.py install

###

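MindAudio provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.
MindSpore AUDIO provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.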

```python
>>> import mindaudio.data.io as io
Expand Down Expand Up @@ -93,16 +93,16 @@ MindAudio 提供了一系列常用的音频数据处理 APIs,可以轻松调


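## Contributing
We appreciate all contributions from developers and users to make MindAudio better together.
We appreciate all contributions from developers and users to make MindSpore AUDIO better together.
Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contribution guidelines.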

## License

MindAudio is released under the [Apache License 2.0](LICENSE).
MindSpore AUDIO is released under the [Apache License 2.0](LICENSE).

## Citation

If you find MindAudio helpful for your project, please consider citing:
If you find MindSpore AUDIO helpful for your project, please consider citing:

```latex
@misc{MindSpore Audio 2022,
2 changes: 2 additions & 0 deletions examples/ECAPA-TDNN/speaker_verification_cosine.py
@@ -1,3 +1,5 @@
# ECAPA_TDNN in mindspore.
# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/speaker_verification_cosine.py
"""
Recipe for training a speaker verification system based on cosine distance.
"""
2 changes: 2 additions & 0 deletions examples/ECAPA-TDNN/train_speaker_embeddings.py
@@ -1,3 +1,5 @@
# ECAPA_TDNN in mindspore.
# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/train_speaker_embeddings.py
"""
Recipe for training speaker embeddings using the VoxCeleb Dataset.
"""
2 changes: 2 additions & 0 deletions examples/ECAPA-TDNN/voxceleb_prepare.py
@@ -1,3 +1,5 @@
# ECAPA_TDNN in mindspore.
# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/voxceleb_prepare.py
"""
Data preparation, from mindaudio VoxCeleb recipe.
"""
2 changes: 2 additions & 0 deletions examples/conformer/asr_model.py
@@ -1,3 +1,5 @@
# Conformer in mindspore.
# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer/asr_model.py
"""Definition of ASR model."""

import mindspore
29 changes: 2 additions & 27 deletions examples/conv_tasnet/data.py
@@ -1,3 +1,5 @@
# AudioDataLoader in mindspore.
# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/data.py
"""
Logic:
1. AudioDataLoader generates a minibatch from AudioDataset, the size of this
@@ -16,14 +18,11 @@
Each target's shape is B x C x T
"""

import argparse
import json
import math
import os

import mindspore.dataset as ds
import numpy as np
from mindspore import context

import mindaudio.data.io as io

Expand Down Expand Up @@ -176,27 +175,3 @@ def sort_and_pad(self, batch):

        sources_pad = sources_pad.transpose((0, 2, 1))
        return mixtures_pad, ilens, sources_pad


if __name__ == "__main__":
    context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=4)
    args = parser.parse_args()
    print(args)
    tr_dataset = DatasetGenerator(
        args.train_dir,
        args.batch_size,
        sample_rate=args.sample_rate,
        segment=args.segment,
    )
    dataset = ds.GeneratorDataset(
        tr_dataset, ["mixture", "lens", "sources"], shuffle=False
    )
    dataset = dataset.batch(batch_size=5)
    iter_per_epoch = dataset.get_dataset_size()
    print(iter_per_epoch)
    h = 0
    for data in dataset.create_dict_iterator():
        h += 1
        print(data["mixture"])
        print(data["lens"])
        print(data["sources"])
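
The tail of `sort_and_pad` in this file transposes the padded sources with `(0, 2, 1)`, which matches the `B x C x T` target shape stated in the module docstring. A minimal NumPy sketch of that pad-then-transpose step follows. It assumes each source arrives as a `T x C` array and is zero-padded to the longest item; the helper name `pad_and_transpose` is illustrative and does not come from the file.

```python
import numpy as np


def pad_and_transpose(sources, pad_value=0.0):
    """Pad variable-length (T_i x C) arrays and return a (B x C x T_max) batch.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    # True lengths of each item before padding.
    lengths = np.array([s.shape[0] for s in sources], dtype=np.int32)
    max_len = int(lengths.max())
    n_src = sources[0].shape[1]

    # Allocate B x T_max x C filled with the pad value, then copy each item in.
    padded = np.full((len(sources), max_len, n_src), pad_value, dtype=np.float32)
    for i, s in enumerate(sources):
        padded[i, : s.shape[0], :] = s

    # Swap the time and source axes: B x T x C -> B x C x T.
    return padded.transpose((0, 2, 1)), lengths


batch = [np.ones((5, 2), dtype=np.float32), np.ones((3, 2), dtype=np.float32)]
sources_pad, ilens = pad_and_transpose(batch)
print(sources_pad.shape, ilens)
```

Returning the original lengths alongside the padded batch lets a loss function ignore the padded frames.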
2 changes: 2 additions & 0 deletions examples/conv_tasnet/eval.py
@@ -1,3 +1,5 @@
# Evaluation of Conv-TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/evaluate.py
import mindspore
import mindspore.dataset as ds
import mindspore.ops as ops
2 changes: 2 additions & 0 deletions examples/conv_tasnet/preprocess.py
@@ -1,3 +1,5 @@
# Preprocess of Conv-TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/preprocess.py
""" Convert the relevant information in the audio wav file to a json file """

import argparse
2 changes: 2 additions & 0 deletions examples/conv_tasnet/train.py
@@ -1,3 +1,5 @@
# Train of Conv-TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/train.py
import os

import mindspore.dataset as ds
2 changes: 2 additions & 0 deletions examples/deepspeech2/eval.py
@@ -1,3 +1,5 @@
# Evaluation of deepspeech2 in mindspore.
# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/deepspeech_pytorch/validation.py
"""
Eval DeepSpeech2
"""
2 changes: 2 additions & 0 deletions examples/deepspeech2/train.py
@@ -1,3 +1,5 @@
# Train of deepspeech2 in mindspore.
# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/deepspeech_pytorch/training.py
"""train_criteo."""

import os
2 changes: 2 additions & 0 deletions examples/fastspeech2/dataset.py
@@ -1,3 +1,5 @@
# LJSpeech dataloader in mindspore.
# Adapted from https://github.com/ming024/FastSpeech2/blob/master/dataset.py
import os
import sys
from multiprocessing import cpu_count
2 changes: 2 additions & 0 deletions examples/fastspeech2/generate.py
@@ -1,3 +1,5 @@
# Synthesize in mindspore.
# Adapted from https://github.com/ming024/FastSpeech2/blob/master/synthesize.py
import argparse
import os
import re
2 changes: 2 additions & 0 deletions examples/fastspeech2/ljspeech.py
@@ -1,3 +1,5 @@
# LJSpeech dataloader in mindspore.
# Adapted from https://github.com/ming024/FastSpeech2/blob/master/preprocessor/ljspeech.py
import csv
import os

1 change: 1 addition & 0 deletions examples/fastspeech2/preprocess.py
@@ -1,5 +1,6 @@
# Given the path to ljspeech/wavs,
# this script converts wav files to .npy features used for training.
# Adapted from https://github.com/ming024/FastSpeech2/blob/master/preprocessor/preprocessor.py

import argparse
import os
2 changes: 1 addition & 1 deletion examples/fastspeech2/text/__init__.py
@@ -1,4 +1,4 @@
""" from https://github.com/keithito/tacotron """
# Copied from https://github.com/keithito/tacotron
import re

from text import cleaners
2 changes: 1 addition & 1 deletion examples/fastspeech2/text/cleaners.py
@@ -1,4 +1,4 @@
""" from https://github.com/keithito/tacotron """
# Copied from https://github.com/keithito/tacotron

"""
Cleaners are transformations that run over the input text at both training and eval time.
2 changes: 1 addition & 1 deletion examples/fastspeech2/text/cmudict.py
@@ -1,4 +1,4 @@
""" from https://github.com/keithito/tacotron """
# Copied from https://github.com/keithito/tacotron

import re

2 changes: 1 addition & 1 deletion examples/fastspeech2/text/numbers.py
@@ -1,4 +1,4 @@
""" from https://github.com/keithito/tacotron """
# Copied from https://github.com/keithito/tacotron

import re

1 change: 1 addition & 0 deletions examples/fastspeech2/text/pinyin.py
@@ -1,3 +1,4 @@
# Copied from https://github.com/ming024/FastSpeech2/blob/master/text/pinyin.py
initials = [
    "b",
    "c",
1 change: 1 addition & 0 deletions examples/fastspeech2/text/symbols.py
@@ -1,3 +1,4 @@
# Copied from https://github.com/ming024/FastSpeech2/blob/master/text/symbols.py
from text import pinyin

valid_symbols = [
2 changes: 2 additions & 0 deletions examples/fastspeech2/train.py
@@ -1,3 +1,5 @@
# Train in mindspore.
# Adapted from https://github.com/ming024/FastSpeech2/blob/master/train.py
import argparse
import ast
import os
2 changes: 2 additions & 0 deletions examples/tasnet/data.py
@@ -1,3 +1,5 @@
# AudioDataLoader in mindspore.
# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/train.py
""" data """
import json
import os
4 changes: 2 additions & 2 deletions examples/tasnet/eval.py
@@ -1,3 +1,5 @@
# Evaluation of TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/evaluate.py
import argparse
import json
import os
@@ -7,8 +9,6 @@
import mindspore.ops as ops
from data import DatasetGenerator
from mindspore import (
    Parameter,
    Tensor,
    context,
    load_checkpoint,
    load_param_into_net,
2 changes: 2 additions & 0 deletions examples/tasnet/preprocess.py
@@ -1,3 +1,5 @@
# Preprocess of TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/preprocess.py
""" Convert the relevant information in the audio wav file to a json file """

import argparse
2 changes: 2 additions & 0 deletions examples/tasnet/train.py
@@ -1,3 +1,5 @@
# Train of TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/train.py
""" Train """
import argparse
import json
2 changes: 2 additions & 0 deletions examples/wavegrad/dataset.py
@@ -1,3 +1,5 @@
# AudioDataLoader in mindspore.
# Adapted from https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/dataset.py
from multiprocessing import cpu_count

import numpy as np
2 changes: 2 additions & 0 deletions examples/wavegrad/ljspeech.py
@@ -1,3 +1,5 @@
# LJSpeech dataloader in mindspore.
# Adapted from https://github.com/ming024/FastSpeech2/blob/master/preprocessor/ljspeech.py
import csv
import os

2 changes: 2 additions & 0 deletions examples/wavegrad/preprocess.py
@@ -1,3 +1,5 @@
# Preprocess in mindspore.
# Adapted from https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/preprocess.py
import sys
from multiprocessing import Pool, cpu_count

2 changes: 2 additions & 0 deletions mindaudio/data/aishell.py
@@ -1,3 +1,5 @@
# AISHELL dataloader in mindspore.
# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/AISHELL-1/aishell_prepare.py
import argparse
import csv
import glob
2 changes: 2 additions & 0 deletions mindaudio/data/librispeech.py
@@ -1,3 +1,5 @@
# LibriSpeech dataloader in mindspore.
# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/data/librispeech.py
import argparse
import json
import os
2 changes: 2 additions & 0 deletions mindaudio/data/voxceleb.py
@@ -1,3 +1,5 @@
# Voxceleb dataloader in mindspore.
# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/voxceleb_prepare.py
"""
Data preparation, from mindaudio VoxCeleb recipe.
"""
2 changes: 2 additions & 0 deletions mindaudio/loss/AdditiveAngularMargin.py
@@ -1,3 +1,5 @@
# AdditiveAngularMargin in mindspore.
# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/nnet/losses.py
import math

import mindspore as ms
2 changes: 2 additions & 0 deletions mindaudio/loss/ctc_loss.py
@@ -1,3 +1,5 @@
# CTC in mindspore.
# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer/ctc.py
"""CTC layer."""

import mindspore
2 changes: 2 additions & 0 deletions mindaudio/loss/label_smoothing_loss.py
@@ -1,3 +1,5 @@
# Label_smoothing_loss in mindspore.
# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer/label_smoothing_loss.py
"""Label smoothing module."""

import mindspore
2 changes: 2 additions & 0 deletions mindaudio/loss/separation_loss.py
@@ -1,3 +1,5 @@
# Separation_loss in mindspore.
# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/pit_criterion.py
""" Loss """
from itertools import permutations

2 changes: 2 additions & 0 deletions mindaudio/metric/snr.py
@@ -1,3 +1,5 @@
# SNR in mindspore.
# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/evaluate.py
import numpy as np
from mir_eval.separation import bss_eval_sources

4 changes: 3 additions & 1 deletion mindaudio/models/conformer.py
@@ -1,4 +1,6 @@
"""Definition of ASR model."""
# Conformer in mindspore.
# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer
"""Definition of conformer model."""

from typing import Optional, Tuple

2 changes: 2 additions & 0 deletions mindaudio/models/conv_tasnet.py
@@ -1,3 +1,5 @@
# Conv-TasNet in mindspore.
# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/conv_tasnet.py
import argparse
import math

2 changes: 2 additions & 0 deletions mindaudio/models/decoders/greedydecoder.py
@@ -1,3 +1,5 @@
# Greedydecoder of deepspeech2 in mindspore.
# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/src/deepspeech_pytorch/decoder.py
import Levenshtein as Lev
import numpy as np
from six.moves import xrange