This plugin uses the Kaldi-based Montreal Forced Aligner to phonetically segment audio data based on corresponding text files. It uses MaryTTS to predict the text pronunciation.
In your project directory, place the audio and text files under your build
directory like this:
build
├── text
│   ├── utt0001.txt
│   ├── utt0002.txt
│   ├── utt0003.txt
│   ├── utt0004.txt
│   └── utt0005.txt
└── wav
    ├── utt0001.wav
    ├── utt0002.wav
    ├── utt0003.wav
    ├── utt0004.wav
    └── utt0005.wav
If you are on a Linux system, ensure that the libcblas.so.3
file is on the library search path, since it is not bundled with the Linux release of Montreal Forced Aligner.
For details, see the installation notes.
We hope that this issue is resolved in a future release; until then you can install the missing library by running
sudo apt-get install libatlas3-base
on Debian-based distributions (such as Ubuntu).
On Windows, an installation of Visual Studio is required.
Please see the instructions at https://plugins.gradle.org/plugin/de.dfki.mary.voicebuilding.marytts-kaldi-mfa
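As a minimal sketch, applying the plugin through the plugins DSL in build.gradle could look like the following. The plugin ID is taken from the portal URL above; the version string is a placeholder (an assumption, not a real release number) and must be replaced with a version listed on that page.

```groovy
plugins {
    // Plugin ID as given in the Gradle Plugin Portal URL above.
    // 'REPLACE_WITH_RELEASED_VERSION' is a placeholder, not an actual version.
    id 'de.dfki.mary.voicebuilding.marytts-kaldi-mfa' version 'REPLACE_WITH_RELEASED_VERSION'
}
```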
Applying this plugin to a project adds several tasks, which are configured as follows.
convertTextToMaryXml: Converts text files to MaryXML for pronunciation prediction (G2P)
- locale, default: Locale.US
- srcDir, default: layout.buildDirectory.dir('text')
- destDir, default: layout.buildDirectory.dir('maryxml')
processMaryXml: Extracts text input files from MaryXML and generates a custom dictionary for MFA
- srcDir, default: convertTextToMaryXml.destDir
- destDir, default: layout.buildDirectory.dir('mfaLab')
- dictFile, default: layout.buildDirectory.file('dict.txt')
prepareForcedAlignment: Collects audio and text input files and the custom dictionary for MFA
- wavDir, default: layout.buildDirectory.dir('wav')
- mfaLabDir, default: processMaryXml.destDir
- dictFile, default: processMaryXml.dictFile
- destDir, default: layout.buildDirectory.dir('forcedAlignment')
unpackMFA: Downloads and unpacks MFA
- destinationDir, default: layout.buildDirectory.dir('mfa')
runForcedAlignment: Runs MFA to generate Praat TextGrids (for details on some properties, see the CLI documentation)
- unpackMFA
- srcDir, default: prepareForcedAlignment.destDir
- speakerChars, default: 0
- fast, default: false
- numJobs, default: gradle.startParameter.maxWorkerCount
- noDict, default: false
- clean, default: false
- debug, default: false
- ignoreExceptions, default: false
- modelDir, default: layout.buildDirectory.dir('kaldiModels')
- destDir, default: layout.buildDirectory.dir('TextGrid')
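As an illustration, some of the properties listed above could be overridden in build.gradle roughly as follows. This is a sketch that assumes the properties accept plain assignment in the Groovy DSL; the values are arbitrary examples, and the effect of each option is described in the MFA CLI documentation.

```groovy
runForcedAlignment {
    // Illustrative values only; the defaults are listed above.
    numJobs = 2      // instead of gradle.startParameter.maxWorkerCount
    clean = true     // default: false
    debug = true     // default: false
}
```

Because numJobs defaults to gradle.startParameter.maxWorkerCount, passing Gradle's --max-workers option on the command line should change the default without touching the build script.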
convertTextGridToXLab: Converts Praat TextGrids to XWaves lab format (with label mapping)
- srcDir, default: runForcedAlignment.destDir
- tierName, default: 'phones'
- labelMapping, default: [sil: '_', sp: '_']
- destDir, default: layout.buildDirectory.dir('lab')
To customize the directories configured as input for the forced alignment, you can override them like this in the project's build.gradle:
convertTextToMaryXml {
    srcDir = file("some/other/text/directory")
}
prepareForcedAlignment {
    wavDir = file("$buildDir/wav")
    dictFile = file("someDict.txt")
}
Run
./gradlew runForcedAlignment
after which the resulting TextGrid files will be under build/TextGrid.
To use MaryTTS to predict the pronunciation for languages other than US English, two things must be configured:
- The locale property of the convertTextToMaryXml task must be set to the target locale, e.g.,
    convertTextToMaryXml {
        locale = Locale.GERMAN
    }
  for German, or
    convertTextToMaryXml {
        locale = Locale.forLanguageTag("pl-PL")
    }
  for Polish.
- The required MaryTTS language component must be added as a dependency to the marytts configuration, e.g.,
    dependencies {
        marytts group: 'de.dfki.mary', name: 'marytts-lang-de', version: '5.2.1'
    }
  for German, or
    repositories {
        maven {
            url 'https://oss.jfrog.org/artifactory/oss-snapshot-local/'
        }
    }
    dependencies {
        marytts group: 'de.dfki.mary', name: 'marytts-lang-pl', version: '0.1.0-SNAPSHOT'
    }
  to resolve a proof-of-concept language component for Polish from OJO.
To use the TextGrids for voicebuilding with MaryTTS, you have to replace the sil and sp labels with _ and convert the TextGrids to lab files in XWaves format.
This can be done simply by running
./gradlew convertTextGridToXLab
Note that you can also override the label mapping:
convertTextGridToXLab {
    labelMapping = [sil: 'pau', a: 'ah'] // etc.
}
If you want to run convertTextGridToXLab on its own, you may want to override the default TextGrid directory, which is build/TextGrid/forcedAlignment:
convertTextGridToXLab.tgDir = file("$buildDir/TextGrid")