License: LGPL v3

Gradle MaryTTS Kaldi MFA plugin

This plugin uses the Kaldi-based Montreal Forced Aligner (MFA) to phonetically segment audio data based on corresponding text files. It uses MaryTTS to predict the pronunciation of the text.


Place the audio and text files under your project's build directory like this:

├── text
│   ├── utt0001.txt
│   ├── utt0002.txt
│   ├── utt0003.txt
│   ├── utt0004.txt
│   └── utt0005.txt
└── wav
    ├── utt0001.wav
    ├── utt0002.wav
    ├── utt0003.wav
    ├── utt0004.wav
    └── utt0005.wav
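
Since the aligner pairs each audio file with the text file of the same basename, it can help to check the layout before running anything. The following is a minimal sketch of such a check for build.gradle; the task name checkAlignmentInputs is hypothetical and not part of this plugin, and the sketch assumes the default text and wav directories shown above.

```groovy
// Hypothetical helper task (not provided by the plugin): fails the build
// if an utterance has a text file without a wav file, or vice versa.
tasks.register('checkAlignmentInputs') {
    def textDir = layout.buildDirectory.dir('text')
    def wavDir = layout.buildDirectory.dir('wav')
    doLast {
        // Collect basenames such as "utt0001" for all files with the given extension
        def basenames = { dir, ext ->
            (dir.get().asFile.listFiles() ?: [])
                .findAll { it.name.endsWith(ext) }
                .collect { it.name.substring(0, it.name.length() - ext.length()) }
                .toSet()
        }
        def texts = basenames(textDir, '.txt')
        def wavs = basenames(wavDir, '.wav')
        // Basenames present in only one of the two directories
        def unmatched = (texts + wavs) - texts.intersect(wavs)
        if (unmatched) {
            throw new GradleException("Utterances missing a text or wav file: ${unmatched.sort()}")
        }
    }
}
```

Running ./gradlew checkAlignmentInputs would then report any unpaired utterances.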

Linux note

If you are on a Linux system, ensure that the file is on the library search path, since it is not bundled with the Linux release of Montreal Forced Aligner. For details, see the installation notes.

We hope that this issue is resolved in a future release; until then you can install the missing library by running

sudo apt-get install libatlas3-base

on Debian-based distributions (such as Ubuntu).

Windows note

On Windows, an installation of Visual Studio is required.

How to apply this plugin

Please see the instructions at

MFA Tasks

Applying this plugin to a project adds several tasks, which are configured as follows.


convertTextToMaryXml

Converts text files to MaryXML for pronunciation prediction (G2P)

Inputs

  • locale, default: Locale.US
  • srcDir, default: layout.buildDirectory.dir('text')

Outputs

  • destDir, default: layout.buildDirectory.dir('maryxml')


processMaryXml

Extracts text input files from MaryXML and generates a custom dictionary for MFA

Inputs

  • srcDir, default: convertTextToMaryXml.destDir

Outputs

  • destDir, default: layout.buildDirectory.dir('mfaLab')
  • dictFile, default: layout.buildDirectory.file('dict.txt')


prepareForcedAlignment

Collects audio and text input files and the custom dictionary for MFA

Inputs

  • wavDir, default: layout.buildDirectory.dir('wav')
  • mfaLabDir, default: processMaryXml.destDir
  • dictFile, default: processMaryXml.dictFile

Outputs

  • destDir, default: layout.buildDirectory.dir('forcedAlignment')


unpackMFA

Downloads and unpacks MFA

Outputs

  • destinationDir, default: layout.buildDirectory.dir('mfa')


runForcedAlignment

Runs MFA to generate Praat TextGrids (for details on some properties, see the CLI documentation)

Inputs

  • unpackMFA
  • srcDir, default: prepareForcedAlignment.destDir
  • speakerChars, default: 0
  • fast, default: false
  • numJobs, default: gradle.startParameter.maxWorkerCount
  • noDict, default: false
  • clean, default: false
  • debug, default: false
  • ignoreExceptions, default: false

Outputs

  • modelDir, default: layout.buildDirectory.dir('kaldiModels')
  • destDir, default: layout.buildDirectory.dir('TextGrid')
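
As an illustration, the alignment properties listed above could be adjusted in build.gradle as sketched below. The property names are taken from the list; the values are arbitrary examples, and the sketch assumes that these properties can be assigned with = in the same way as the directory properties. What each option does is described in the MFA CLI documentation.

```groovy
// A minimal sketch with example values; see the MFA CLI documentation
// for the meaning of each option.
runForcedAlignment {
    fast = true     // default: false
    numJobs = 4     // default: gradle.startParameter.maxWorkerCount
    clean = true    // default: false
}
```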


convertTextGridToXLab

Converts Praat TextGrids to XWaves lab format (with label mapping)

Inputs

  • srcDir, default: runForcedAlignment.destDir
  • tierName, default: 'phones'
  • labelMapping, default: [sil: '_', sp: '_']

Outputs

  • destDir, default: layout.buildDirectory.dir('lab')

To customize the directories configured as input for the forced alignment, you can override them like this in the project's build.gradle:

convertTextToMaryXml {
    srcDir = file("some/other/text/directory")
}

prepareForcedAlignment {
    wavDir = file("$buildDir/wav")
    dictFile = file("someDict.txt")
}

How to run the forced alignment

Run

./gradlew runForcedAlignment

after which the resulting TextGrid files will be under build/TextGrid.
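
Because the TextGrid files live under the build directory, they are removed whenever the build directory is cleaned. One way to keep a copy is a standard Gradle Copy task, sketched below. The task name exportTextGrids and the target directory alignments are hypothetical, and the sketch assumes that runForcedAlignment.destDir can be passed directly to from.

```groovy
// Hypothetical task (not provided by the plugin): copies the generated
// TextGrids to a directory outside the build directory.
tasks.register('exportTextGrids', Copy) {
    dependsOn runForcedAlignment
    from runForcedAlignment.destDir
    into layout.projectDirectory.dir('alignments')
}
```

With this in place, ./gradlew exportTextGrids would run the alignment first if needed and then copy the results.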

G2P for other languages

To use MaryTTS to predict the pronunciation for languages other than US English, two things must be configured:

  1. The locale property of the convertTextToMaryXml task must be set to the target locale, e.g.,
    convertTextToMaryXml {
        locale = Locale.GERMAN
    }
    for German, or
    convertTextToMaryXml {
        locale = Locale.forLanguageTag("pl-PL")
    }
    for Polish.
  2. The required MaryTTS language component must be added as a dependency to the marytts configuration, e.g.,
    dependencies {
        marytts group: 'de.dfki.mary', name: 'marytts-lang-de', version: '5.2.1'
    }
    for German, or
    repositories {
        maven {
            url ''
        }
    }
    dependencies {
        marytts group: 'de.dfki.mary', name: 'marytts-lang-pl', version: '0.1.0-SNAPSHOT'
    }
    to resolve a proof-of-concept language component for Polish from OJO.

Optional post-processing

To use the TextGrids for voicebuilding with MaryTTS, the sil and sp labels must be replaced with _ and the TextGrids converted to lab files in XWaves format.

This can be done simply by running

./gradlew convertTextGridToXLab

Note that you can also override the label mapping:

convertTextGridToXLab {
    labelMapping = [sil: 'pau', a: 'ah'] // etc.
}

Using convertTextGridToXLab alone

If you want to use convertTextGridToXLab alone, you may want to override its srcDir property, which defaults to runForcedAlignment.destDir (build/TextGrid):

convertTextGridToXLab {
    srcDir = file("some/other/TextGrid/directory")
}