voice2json is a collection of command-line tools for offline speech/intent recognition on Linux. It is free, open source (MIT), and supports 17 human languages.
From the command line:

```sh
$ voice2json transcribe-wav \
    < turn-on-the-light.wav | \
    voice2json recognize-intent | \
    jq .
```
produces a JSON event like:
```json
{
  "text": "turn on the light",
  "intent": {
    "name": "LightState"
  },
  "slots": {
    "state": "on"
  }
}
```
when trained with this template:
```ini
[LightState]
states = (on | off)
turn (<states>){state} [the] light
```
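To make the template semantics concrete, here is a toy Python expander for only the syntax used above: alternatives `( a | b )`, optional words `[ ... ]`, rule references `<name>`, and tags `{name}` that store the matched words as a slot value. It is a hypothetical illustration (the `expand` function and its `rules` argument are invented for this sketch), not voice2json's actual parser, which supports more of the template language. It shows that one template line describes a finite set of sentences, each paired with its slot values.

```python
import itertools
import re

# Tokens: brackets, alternation bar, <rule> references, {tag} names, plain words.
TOKEN = re.compile(r"[()\[\]|]|<[^>]+>|\{[^}]+\}|[^\s()\[\]|<>{}]+")


def expand(template, rules):
    """Return every (words, slots) pair a template line can produce.

    Toy expander for a small subset of the template syntax (hypothetical,
    not voice2json's parser): ( a | b ), [ optional ], <rule>, {tag}.
    """
    tokens = TOKEN.findall(template)
    pos = 0

    def parse_alternatives(closer):
        # One or more sequences separated by "|"; their expansions are concatenated.
        nonlocal pos
        expansions = parse_sequence(closer)
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            expansions += parse_sequence(closer)
        return expansions

    def parse_sequence(closer):
        # Items up to "|" or the closing bracket; combine them as a cross product.
        nonlocal pos
        parts = []
        while pos < len(tokens) and tokens[pos] not in ("|", closer):
            parts.append(parse_item())
        results = []
        for combo in itertools.product(*parts):
            words, slots = [], {}
            for item_words, item_slots in combo:
                words.extend(item_words)
                slots.update(item_slots)
            results.append((words, slots))
        return results

    def parse_item():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            expansions = parse_alternatives(")")
            pos += 1  # skip ")"
        elif tok == "[":
            # Optional: the bracketed alternatives, or nothing at all.
            expansions = parse_alternatives("]") + [([], {})]
            pos += 1  # skip "]"
        elif tok.startswith("<"):
            # Rule reference: expand the named rule's body recursively.
            expansions = expand(rules[tok[1:-1]], rules)
        else:
            expansions = [([tok], {})]
        # A trailing {tag} stores the matched words as a slot value.
        if pos < len(tokens) and tokens[pos].startswith("{"):
            name = tokens[pos][1:-1]
            pos += 1
            expansions = [(w, {**s, name: " ".join(w)}) for w, s in expansions]
        return expansions

    return parse_alternatives(None)


if __name__ == "__main__":
    rules = {"states": "(on | off)"}
    for words, slots in expand("turn (<states>){state} [the] light", rules):
        print(" ".join(words), slots)
```

For the template above, this sketch yields four sentences ("turn on the light", "turn on light", "turn off the light", "turn off light"), each with `state` set to `on` or `off`. Listing sentences explicitly is fine for a toy grammar, but the count grows multiplicatively with every alternative and optional word, so this approach does not scale to very large grammars.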
voice2json is optimized for:
- Sets of voice commands that are described well by a grammar
- Commands with uncommon words or pronunciations
- Commands or intents that can vary at runtime
It can be used to:
- Add voice commands to existing applications or Unix-style workflows
- Provide basic voice assistant functionality completely offline on modest hardware
- Bootstrap more sophisticated speech/intent recognition systems
Supported speech-to-text systems include:
- CMU's pocketsphinx
- Dan Povey's Kaldi
- Mozilla's DeepSpeech 0.6
- Kyoto University's Julius
voice2json is more than just a wrapper around open source speech-to-text systems!
- Training produces both a speech recognizer and an intent recognizer. By describing your voice commands with voice2json's templating language, you get intent recognition, not just transcriptions, for free.
- Re-training is fast enough to be done at runtime (usually under 5 seconds), even with millions of possible voice commands. This means you can change referenced slot values or add/remove intents on the fly.
- All of the available commands are designed to work well in Unix pipelines, typically consuming/emitting plaintext or newline-delimited JSON. Audio input/output is file-based, so you can receive audio from any source.
Available commands:
- print-profile - Print profile settings
- train-profile - Generate speech/intent artifacts
- transcribe-wav - Transcribe WAV file to text
- transcribe-stream - Transcribe live audio stream to text
- recognize-intent - Recognize intent from JSON or text
- wait-wake - Listen to live audio stream for wake word
- record-command - Record voice command from live audio stream
- pronounce-word - Look up or guess how a word is pronounced
- generate-examples - Generate random intents
- record-examples - Generate and record speech examples
- test-examples - Test recorded speech examples
- show-documentation - Run HTTP server locally with documentation
- print-downloads - Print profile file download information
- print-files - Print user profile files for backup
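Because recognize-intent emits newline-delimited JSON, its output can be consumed by any program that reads standard input. The following is a minimal Python sketch of such a consumer. It assumes each event carries the `intent.name` and `slots` fields shown in the example event above; the `handle_event` and `process_lines` functions, the action strings, and the `handle_intents.py` filename are all hypothetical.

```python
import json
import sys


def handle_event(event):
    """Turn one recognize-intent event into an action string (hypothetical)."""
    intent = event.get("intent", {}).get("name", "")
    slots = event.get("slots", {})
    if intent == "LightState":
        # A real handler would switch a light here; this sketch only reports.
        return f"light -> {slots.get('state', '?')}"
    return f"unhandled intent: {intent or '(none)'}"


def process_lines(lines):
    """Yield one action per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield handle_event(json.loads(line))


if __name__ == "__main__" and not sys.stdin.isatty():
    # Read events only when something is piped in (e.g. recognize-intent output).
    for action in process_lines(sys.stdin):
        print(action, flush=True)
```

It could sit at the end of a pipeline such as `voice2json transcribe-wav < turn-on-the-light.wav | voice2json recognize-intent | python3 handle_intents.py`. Leave out `jq .` in this case: it pretty-prints each event across several lines, which breaks the one-event-per-line assumption (`jq -c .` keeps each event on a single line).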