mindspore-lab · LiTingyu1997 · May 15, 2025 · Jul 10, 2025
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 <div align="center">
 
 
-# MindAudio
+# MindSpore AUDIO
 
 [![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/mindspore-lab/mindaudio/ut_test.yaml)
 ![GitHub issues](https://img.shields.io/github/issues/mindspore-lab/mindaudio)
@@ -20,7 +20,7 @@ English | [中文](README_CN.md)
 
 ## Introduction
 
-MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of API for common audio data processing,data enhancement,feature extraction, so that users can preprocess data conveniently. Also provides examples to show how to build audio deep learning models with mindaudio.
+MindSpore AUDIO is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data augmentation, and feature extraction, so that users can preprocess data conveniently. It also provides examples showing how to build audio deep learning models with mindaudio.
 
 The following is the corresponding `mindaudio` versions and supported `mindspore` versions.
 
@@ -46,15 +46,15 @@ The following is the corresponding `mindaudio` versions and supported `mindspore
 
 ### Install with PyPI
 
-The released version of MindAudio can be installed via `PyPI` as follows:
+The released version of MindSpore AUDIO can be installed via `PyPI` as follows:
 
 ```shell
 pip install mindaudio
 ```
 
 ### Install from Source
 
-The latest version of MindAudio can be installed as follows:
+The latest version of MindSpore AUDIO can be installed as follows:
 
 ```shell
 git clone https://github.com/mindspore-lab/mindaudio.git
@@ -67,7 +67,7 @@ python setup.py install
 
 ###
 
-MindAudio provides a series of commonly used audio data processing apis, which can be easily invoked for data analysis and feature extraction.
+MindSpore AUDIO provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.
 
 ```python
 >>> import mindaudio.data.io as io

diff --git a/README_CN.md b/README_CN.md
@@ -1,7 +1,7 @@
 <div align="center">
 
 
-# MindAudio
+# MindSpore AUDIO
 
 [![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/mindspore-lab/mindaudio/ut_test.yaml)
 ![GitHub issues](https://img.shields.io/github/issues/mindspore-lab/mindaudio)
@@ -18,7 +18,7 @@
 </div>
 
 ## Introduction
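-MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data augmentation, and feature extraction, making it easy for users to preprocess data. It also provides examples showing how to build audio deep learning models with mindaudio.
+MindSpore AUDIO is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/). It provides a series of APIs for common audio data processing, data augmentation, and feature extraction, making it easy for users to preprocess data. It also provides examples showing how to build audio deep learning models with mindaudio.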
 
 The table below shows the corresponding `mindaudio` versions and the supported `mindspore` versions.
 
@@ -44,14 +44,14 @@ MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https://www.mindspore.cn/)
 
 ### Install with PyPI
 
-The released version of MindAudio can be installed via `PyPI`:
+The released version of MindSpore AUDIO can be installed via `PyPI`:
 
 ```shell
 pip install mindaudio
 ```
 
 ### Install from Source
-The latest version of MindAudio can be installed as follows:
+The latest version of MindSpore AUDIO can be installed as follows:
 
 ```shell
 git clone https://github.com/mindspore-lab/mindaudio.git
@@ -64,7 +64,7 @@ python setup.py install
 
 ###
 
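-MindAudio provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.
+MindSpore AUDIO provides a series of commonly used audio data processing APIs, which can be easily invoked for data analysis and feature extraction.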
 
 ```python
 >>> import mindaudio.data.io as io
@@ -93,16 +93,16 @@ MindAudio provides a series of commonly used audio data processing APIs, which can be easily invoked
 
 
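 ## Contributing
-We appreciate all contributions from developers and users that help make MindAudio better.
+We appreciate all contributions from developers and users that help make MindSpore AUDIO better.
 Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contribution guidelines.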
 
 ## License
 
-MindAudio is released under the [Apache License 2.0](LICENSE).
+MindSpore AUDIO is released under the [Apache License 2.0](LICENSE).
 
 ## Citation
 
-If you find MindAudio helpful for your project, please consider citing it:
+If you find MindSpore AUDIO helpful for your project, please consider citing it:
 
 ```latex
 @misc{MindSpore Audio 2022,

diff --git a/examples/ECAPA-TDNN/speaker_verification_cosine.py b/examples/ECAPA-TDNN/speaker_verification_cosine.py
@@ -1,3 +1,5 @@
+# ECAPA_TDNN in mindspore.
+# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/speaker_verification_cosine.py
 """
 Recipe for training a speaker verification system based on cosine distance.
 """

diff --git a/examples/ECAPA-TDNN/train_speaker_embeddings.py b/examples/ECAPA-TDNN/train_speaker_embeddings.py
@@ -1,3 +1,5 @@
+# ECAPA_TDNN in mindspore.
+# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/train_speaker_embeddings.py
 """
 Recipe for training speaker embeddings using the VoxCeleb Dataset.
 """

diff --git a/examples/ECAPA-TDNN/voxceleb_prepare.py b/examples/ECAPA-TDNN/voxceleb_prepare.py
@@ -1,3 +1,5 @@
+# ECAPA_TDNN in mindspore.
+# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/voxceleb_prepare.py
 """
 Data preparation, from mindaudio VoxCeleb recipe.
 """

diff --git a/examples/conformer/asr_model.py b/examples/conformer/asr_model.py
@@ -1,3 +1,5 @@
+# Conformer in mindspore.
+# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer/asr_model.py
 """Definition of ASR model."""
 
 import mindspore

diff --git a/examples/conv_tasnet/data.py b/examples/conv_tasnet/data.py
@@ -1,3 +1,5 @@
+# AudioDataLoader in mindspore.
+# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/data.py
 """
 Logic:
 1. AudioDataLoader generate a minibatch from AudioDataset, the size of this
@@ -16,14 +18,11 @@
     Each targets's shape is B x C x T
 """
 
-import argparse
 import json
 import math
 import os
 
-import mindspore.dataset as ds
 import numpy as np
-from mindspore import context
 
 import mindaudio.data.io as io
 
@@ -176,27 +175,3 @@ def sort_and_pad(self, batch):
 
         sources_pad = sources_pad.transpose((0, 2, 1))
         return mixtures_pad, ilens, sources_pad
-
-
-if __name__ == "__main__":
-    context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=4)
-    args = parser.parse_args()
-    print(args)
-    tr_dataset = DatasetGenerator(
-        args.train_dir,
-        args.batch_size,
-        sample_rate=args.sample_rate,
-        segment=args.segment,
-    )
-    dataset = ds.GeneratorDataset(
-        tr_dataset, ["mixture", "lens", "sources"], shuffle=False
-    )
-    dataset = dataset.batch(batch_size=5)
-    iter_per_epoch = dataset.get_dataset_size()
-    print(iter_per_epoch)
-    h = 0
-    for data in dataset.create_dict_iterator():
-        h += 1
-        print(data["mixture"])
-        print(data["lens"])
-        print(data["sources"])
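The `conv_tasnet/data.py` docstring describes the batching logic: each minibatch of mixtures is padded to the longest length (B x T), the true lengths are kept as `ilens`, and the targets end up as B x C x T. The NumPy sketch below illustrates that padding step under stated assumptions. `pad_batch` is a hypothetical helper, not the `sort_and_pad` method in this file. The per-example source layout (T_i x C) and the longest-first sort are assumptions; the sort is suggested only by that method's name.

```python
import numpy as np


def pad_batch(mixtures, sources, pad_value=0.0):
    """Pad variable-length examples into dense arrays (illustrative sketch).

    mixtures: list of 1-D arrays, shape (T_i,)
    sources:  list of 2-D arrays, shape (T_i, C), time-major with C speakers (assumed)
    Returns mixtures_pad (B, T), ilens (B,), sources_pad (B, C, T).
    """
    # Assumption: sort longest first; Python's sort is stable for equal lengths
    order = sorted(range(len(mixtures)), key=lambda i: mixtures[i].shape[0], reverse=True)
    mixtures = [mixtures[i] for i in order]
    sources = [sources[i] for i in order]

    ilens = np.array([m.shape[0] for m in mixtures], dtype=np.int32)
    max_len = int(ilens.max())
    num_spk = sources[0].shape[1]

    mixtures_pad = np.full((len(mixtures), max_len), pad_value, dtype=np.float32)
    sources_pad = np.full((len(mixtures), max_len, num_spk), pad_value, dtype=np.float32)
    for i, (mix, src) in enumerate(zip(mixtures, sources)):
        # Copy real samples; everything past T_i stays at pad_value
        mixtures_pad[i, : mix.shape[0]] = mix
        sources_pad[i, : src.shape[0], :] = src

    # B x T x C -> B x C x T, the target layout given in the docstring
    sources_pad = sources_pad.transpose((0, 2, 1))
    return mixtures_pad, ilens, sources_pad
```

Keeping `ilens` alongside the padded arrays lets a loss or metric ignore the padded tail of each example.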
diff --git a/examples/conv_tasnet/eval.py b/examples/conv_tasnet/eval.py
@@ -1,3 +1,5 @@
+# Evaluation of Conv-TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/evaluate.py
 import mindspore
 import mindspore.dataset as ds
 import mindspore.ops as ops

diff --git a/examples/conv_tasnet/preprocess.py b/examples/conv_tasnet/preprocess.py
@@ -1,3 +1,5 @@
+# Preprocessing for Conv-TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/preprocess.py
 """ Convert the relevant information in the audio wav file to a json file """
 
 import argparse

diff --git a/examples/conv_tasnet/train.py b/examples/conv_tasnet/train.py
@@ -1,3 +1,5 @@
+# Training of Conv-TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/train.py
 import os
 
 import mindspore.dataset as ds

diff --git a/examples/deepspeech2/eval.py b/examples/deepspeech2/eval.py
@@ -1,3 +1,5 @@
+# Evaluation of deepspeech2 in mindspore.
+# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/deepspeech_pytorch/validation.py
 """
 Eval DeepSpeech2
 """

diff --git a/examples/deepspeech2/train.py b/examples/deepspeech2/train.py
@@ -1,3 +1,5 @@
+# Training of deepspeech2 in mindspore.
+# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/deepspeech_pytorch/training.py
 """train_criteo."""
 
 import os

diff --git a/examples/fastspeech2/dataset.py b/examples/fastspeech2/dataset.py
@@ -1,3 +1,5 @@
+# LJSpeech dataloader in mindspore.
+# Adapted from https://github.com/ming024/FastSpeech2/blob/master/dataset.py
 import os
 import sys
 from multiprocessing import cpu_count

diff --git a/examples/fastspeech2/generate.py b/examples/fastspeech2/generate.py
@@ -1,3 +1,5 @@
+# FastSpeech2 synthesis in mindspore.
+# Adapted from https://github.com/ming024/FastSpeech2/blob/master/synthesize.py
 import argparse
 import os
 import re

diff --git a/examples/fastspeech2/ljspeech.py b/examples/fastspeech2/ljspeech.py
@@ -1,3 +1,5 @@
+# LJSpeech dataloader in mindspore.
+# Adapted from https://github.com/ming024/FastSpeech2/blob/master/preprocessor/ljspeech.py
 import csv
 import os
 

diff --git a/examples/fastspeech2/preprocess.py b/examples/fastspeech2/preprocess.py
@@ -1,5 +1,6 @@
 # Given the path to ljspeech/wavs,
 # this script converts wav files to .npy features used for training.
+# Adapted from https://github.com/ming024/FastSpeech2/blob/master/preprocessor/preprocessor.py
 
 import argparse
 import os

diff --git a/examples/fastspeech2/text/__init__.py b/examples/fastspeech2/text/__init__.py
@@ -1,4 +1,4 @@
-""" from https://github.com/keithito/tacotron """
+# Copied from https://github.com/keithito/tacotron
 import re
 
 from text import cleaners

diff --git a/examples/fastspeech2/text/cleaners.py b/examples/fastspeech2/text/cleaners.py
@@ -1,4 +1,4 @@
-""" from https://github.com/keithito/tacotron """
+# Copied from https://github.com/keithito/tacotron
 
 """
 Cleaners are transformations that run over the input text at both training and eval time.

diff --git a/examples/fastspeech2/text/cmudict.py b/examples/fastspeech2/text/cmudict.py
@@ -1,4 +1,4 @@
-""" from https://github.com/keithito/tacotron """
+# Copied from https://github.com/keithito/tacotron
 
 import re
 

diff --git a/examples/fastspeech2/text/numbers.py b/examples/fastspeech2/text/numbers.py
@@ -1,4 +1,4 @@
-""" from https://github.com/keithito/tacotron """
+# Copied from https://github.com/keithito/tacotron
 
 import re
 

diff --git a/examples/fastspeech2/text/pinyin.py b/examples/fastspeech2/text/pinyin.py
@@ -1,3 +1,4 @@
+# Copied from https://github.com/ming024/FastSpeech2/blob/master/text/pinyin.py
 initials = [
     "b",
     "c",

diff --git a/examples/fastspeech2/text/symbols.py b/examples/fastspeech2/text/symbols.py
@@ -1,3 +1,4 @@
+# Copied from https://github.com/ming024/FastSpeech2/blob/master/text/symbols.py
 from text import pinyin
 
 valid_symbols = [

diff --git a/examples/fastspeech2/train.py b/examples/fastspeech2/train.py
@@ -1,3 +1,5 @@
+# FastSpeech2 training in mindspore.
+# Adapted from https://github.com/ming024/FastSpeech2/blob/master/train.py
 import argparse
 import ast
 import os

diff --git a/examples/tasnet/data.py b/examples/tasnet/data.py
@@ -1,3 +1,5 @@
+# AudioDataLoader in mindspore.
+# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/train.py
 """ data """
 import json
 import os

diff --git a/examples/tasnet/eval.py b/examples/tasnet/eval.py
@@ -1,3 +1,5 @@
+# Evaluation of TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/evaluate.py
 import argparse
 import json
 import os
@@ -7,8 +9,6 @@
 import mindspore.ops as ops
 from data import DatasetGenerator
 from mindspore import (
-    Parameter,
-    Tensor,
     context,
     load_checkpoint,
     load_param_into_net,

diff --git a/examples/tasnet/preprocess.py b/examples/tasnet/preprocess.py
@@ -1,3 +1,5 @@
+# Preprocessing for TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/preprocess.py
 """ Convert the relevant information in the audio wav file to a json file """
 
 import argparse

diff --git a/examples/tasnet/train.py b/examples/tasnet/train.py
@@ -1,3 +1,5 @@
+# Training of TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/train.py
 """ Train """
 import argparse
 import json

diff --git a/examples/wavegrad/dataset.py b/examples/wavegrad/dataset.py
@@ -1,3 +1,5 @@
+# AudioDataLoader in mindspore.
+# Adapted from https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/dataset.py
 from multiprocessing import cpu_count
 
 import numpy as np

diff --git a/examples/wavegrad/ljspeech.py b/examples/wavegrad/ljspeech.py
@@ -1,3 +1,5 @@
+# LJSpeech dataloader in mindspore.
+# Adapted from https://github.com/ming024/FastSpeech2/blob/master/preprocessor/ljspeech.py
 import csv
 import os
 

diff --git a/examples/wavegrad/preprocess.py b/examples/wavegrad/preprocess.py
@@ -1,3 +1,5 @@
+# WaveGrad preprocessing in mindspore.
+# Adapted from https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/preprocess.py
 import sys
 from multiprocessing import Pool, cpu_count
 

diff --git a/mindaudio/data/aishell.py b/mindaudio/data/aishell.py
@@ -1,3 +1,5 @@
+# AISHELL dataloader in mindspore.
+# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/AISHELL-1/aishell_prepare.py
 import argparse
 import csv
 import glob

diff --git a/mindaudio/data/librispeech.py b/mindaudio/data/librispeech.py
@@ -1,3 +1,5 @@
+# LibriSpeech dataloader in mindspore.
+# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/data/librispeech.py
 import argparse
 import json
 import os

diff --git a/mindaudio/data/voxceleb.py b/mindaudio/data/voxceleb.py
@@ -1,3 +1,5 @@
+# Voxceleb dataloader in mindspore.
+# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/voxceleb_prepare.py
 """
 Data preparation, from mindaudio VoxCeleb recipe.
 """

diff --git a/mindaudio/loss/AdditiveAngularMargin.py b/mindaudio/loss/AdditiveAngularMargin.py
@@ -1,3 +1,5 @@
+# AdditiveAngularMargin in mindspore.
+# Adapted from https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/nnet/losses.py
 import math
 
 import mindspore as ms
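`AdditiveAngularMargin.py` is adapted from SpeechBrain's losses module. As a rough illustration, assuming the standard additive angular margin (ArcFace-style) formulation, the target-class cosine cos θ is replaced by cos(θ + m) and all logits are multiplied by a scale s. The function `additive_angular_margin` and its default `margin`/`scale` values are illustrative assumptions, not this file's API. Practical implementations usually also guard the case where θ + m exceeds π, which this sketch omits.

```python
import numpy as np


def additive_angular_margin(cosine, targets, margin=0.2, scale=30.0):
    """Apply an additive angular margin to cosine similarities (sketch).

    cosine:  (N, K) cosine similarities between embeddings and class weights
    targets: (N, K) one-hot labels
    Returns scaled logits where only the target class uses cos(theta + margin).
    """
    cosine = np.clip(cosine, -1.0, 1.0)
    # theta lies in [0, pi], so sin(theta) is non-negative
    sine = np.sqrt(1.0 - cosine ** 2)
    # cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m)
    phi = cosine * np.cos(margin) - sine * np.sin(margin)
    # Non-target classes keep the plain cosine
    logits = targets * phi + (1.0 - targets) * cosine
    return scale * logits
```

The scaled logits are then fed to an ordinary softmax cross-entropy; the margin makes the target class harder to satisfy, which tightens speaker clusters.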

diff --git a/mindaudio/loss/ctc_loss.py b/mindaudio/loss/ctc_loss.py
@@ -1,3 +1,5 @@
+# CTC in mindspore.
+# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer/ctc.py
 """CTC layer."""
 
 import mindspore

diff --git a/mindaudio/loss/label_smoothing_loss.py b/mindaudio/loss/label_smoothing_loss.py
@@ -1,3 +1,5 @@
+# Label_smoothing_loss in mindspore.
+# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer/label_smoothing_loss.py
 """Label smoothing module."""
 
 import mindspore
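The label smoothing module is adapted from WeNet. Below is a minimal NumPy sketch of the usual formulation, assuming a KL-divergence objective: the target class receives probability 1 - ε, the remaining ε is spread evenly over the other V - 1 classes, and positions marked with `ignore_id` are skipped. The function names, the `ignore_id=-1` convention, and normalising by the number of kept tokens are assumptions for illustration, not a statement of what `label_smoothing_loss.py` does.

```python
import numpy as np


def log_softmax(x):
    # Numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def label_smoothing_loss(logits, target, smoothing=0.1, ignore_id=-1):
    """KL(smoothed one-hot || softmax(logits)), averaged over kept positions.

    logits: (N, V) unnormalised scores; target: (N,) class ids.
    """
    n, vocab = logits.shape
    keep = target != ignore_id
    safe_target = np.where(keep, target, 0)  # placeholder id for ignored rows
    # Spread `smoothing` over non-target classes, put 1 - smoothing on the target
    true_dist = np.full((n, vocab), smoothing / (vocab - 1))
    true_dist[np.arange(n), safe_target] = 1.0 - smoothing
    logp = log_softmax(logits)
    # p * log p, defined as 0 where p == 0 (clip avoids log(0) warnings)
    plogp = np.where(true_dist > 0, true_dist * np.log(np.clip(true_dist, 1e-12, None)), 0.0)
    kl = (plogp - true_dist * logp).sum(axis=-1)
    return float(kl[keep].sum() / max(int(keep.sum()), 1))
```

With `smoothing=0.0` the loss reduces to ordinary cross-entropy on the kept positions.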

diff --git a/mindaudio/loss/separation_loss.py b/mindaudio/loss/separation_loss.py
@@ -1,3 +1,5 @@
+# Separation_loss in mindspore.
+# Adapted from https://github.com/kaituoxu/TasNet/blob/master/src/pit_criterion.py
 """ Loss """
 from itertools import permutations
 

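`separation_loss.py` is adapted from `pit_criterion.py` and imports `itertools.permutations`, which points to permutation-invariant training (PIT): the loss is evaluated under every assignment of estimated sources to reference sources, and the best assignment is kept. The sketch below assumes a negative SI-SNR objective, as is common for TasNet-style models. `si_snr` and `pit_si_snr_loss` are hypothetical names, and the code handles a single example rather than a batch.

```python
from itertools import permutations

import numpy as np


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps))


def pit_si_snr_loss(estimates, references):
    """Negative mean SI-SNR under the best source assignment.

    estimates, references: (C, T) arrays. Returns (loss, best_perm), where
    best_perm[c] is the index of the estimate matched to reference c.
    """
    num_src = references.shape[0]
    best_score, best_perm = -np.inf, None
    for perm in permutations(range(num_src)):
        score = np.mean([si_snr(estimates[p], references[c]) for c, p in enumerate(perm)])
        if score > best_score:
            best_score, best_perm = score, perm
    return -best_score, best_perm
```

Enumerating all C! permutations is cheap for the two- or three-speaker mixtures these recipes target.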
diff --git a/mindaudio/metric/snr.py b/mindaudio/metric/snr.py
@@ -1,3 +1,5 @@
+# SNR in mindspore.
+# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/evaluate.py
 import numpy as np
 from mir_eval.separation import bss_eval_sources
 

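`snr.py` is adapted from Conv-TasNet's evaluation script and imports `mir_eval`'s `bss_eval_sources` for SDR. To show what an improvement metric measures without that dependency, here is a plain-SNR sketch: the score of the separated estimate minus the score of the unprocessed mixture, both measured against the same reference. The helpers `snr` and `snr_improvement` are hypothetical and are not the functions defined in this file.

```python
import numpy as np


def snr(est, ref, eps=1e-8):
    """Signal-to-noise ratio in dB of `est` with respect to `ref`."""
    noise = est - ref
    return 10.0 * np.log10((np.dot(ref, ref) + eps) / (np.dot(noise, noise) + eps))


def snr_improvement(est, ref, mix):
    """SNR gain of the separated estimate over the input mixture."""
    return snr(est, ref) - snr(mix, ref)
```

For example, if the estimate leaves one tenth of the mixture's noise amplitude, the residual energy drops by a factor of 100 and the improvement is about 20 dB.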
diff --git a/mindaudio/models/conformer.py b/mindaudio/models/conformer.py
@@ -1,4 +1,6 @@
-"""Definition of ASR model."""
+# Conformer in mindspore.
+# Adapted from https://github.com/wenet-e2e/wenet/blob/main/wenet/transformer
+"""Definition of conformer model."""
 
 from typing import Optional, Tuple
 

diff --git a/mindaudio/models/conv_tasnet.py b/mindaudio/models/conv_tasnet.py
@@ -1,3 +1,5 @@
+# Conv-TasNet in mindspore.
+# Adapted from https://github.com/kaituoxu/Conv-TasNet/blob/master/src/conv_tasnet.py
 import argparse
 import math
 

diff --git a/mindaudio/models/decoders/greedydecoder.py b/mindaudio/models/decoders/greedydecoder.py
@@ -1,3 +1,5 @@
+# Greedy decoder of deepspeech2 in mindspore.
+# Adapted from https://github.com/SeanNaren/deepspeech.pytorch/blob/master/src/deepspeech_pytorch/decoder.py
 import Levenshtein as Lev
 import numpy as np
 from six.moves import xrange
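The greedy decoder is adapted from deepspeech.pytorch. Assuming the usual best-path CTC decoding, it takes the arg-max label per frame, collapses consecutive repeats, and drops the blank symbol. The `Levenshtein` import is typically used to score the result as an edit distance. The sketch below is dependency-free: `greedy_ctc_decode`, `edit_distance`, and `blank_index=0` are illustrative assumptions, and the pure-Python edit distance stands in for `Levenshtein.distance`.

```python
import numpy as np


def greedy_ctc_decode(probs, labels, blank_index=0):
    """Best-path CTC decoding.

    probs:  (T, V) per-frame scores (probabilities or logits)
    labels: sequence mapping index -> symbol; labels[blank_index] is the blank
    """
    best = np.argmax(probs, axis=-1)
    out = []
    prev = None
    for idx in best:
        idx = int(idx)
        # Emit only when the label changes and is not the blank
        if idx != prev and idx != blank_index:
            out.append(labels[idx])
        prev = idx
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,               # deletion
                           row[j - 1] + 1,                # insertion
                           prev_row[j - 1] + (ca != cb))) # substitution
        prev_row = row
    return prev_row[-1]
```

A blank between two identical labels is what allows repeated characters to survive decoding: the frame sequence `a a _ a b b _` decodes to `aab`.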