Vosk api: allow selecting different models and automatic model download #657

Open. Wants to merge 1 commit into base: master.
11 changes: 7 additions & 4 deletions README.rst
@@ -57,8 +57,6 @@ The `library reference <https://github.com/Uberi/speech_recognition/blob/master/

 See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.

-You have to install Vosk models for using Vosk. `Here <https://alphacephei.com/vosk/models>`__ are models avaiable. You have to place them in models folder of your project, like "your-project-folder/models/your-vosk-model"
-
 Examples
 --------

@@ -143,9 +141,14 @@ Vosk API is **required if and only if you want to use Vosk recognizer** (``recog

 You can install it with ``python3 -m pip install vosk``.

-You also have to install Vosk Models:
+Languages can be selected with the ``language`` parameter, e.g. ``recognizer_instance.recognize_vosk(language='de')``.
+Vosk will attempt to download the respective model from https://alphacephei.com/vosk/models automatically.
+The language defaults to English (``'en-us'``).
+
+It is also possible to download a model manually and place it in a directory in your project folder.
+Reference this folder with the ``model`` parameter, e.g. ``model='folder-name'``. This takes precedence over the ``language`` parameter.

-`Here <https://alphacephei.com/vosk/models>`__ are models avaiable for download. You have to place them in models folder of your project, like "your-project-folder/models/your-vosk-model"
+Models are available for download `here <https://alphacephei.com/vosk/models>`__.
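
The precedence described in the added README text can be sketched as follows. This is only an illustration: the helper name ``vosk_kwargs`` is hypothetical and not part of this PR, and it assumes the ``model`` and ``language`` parameters behave as the diff describes.

```python
def vosk_kwargs(model_dir="", language="en-us"):
    """Build keyword arguments for recognize_vosk as proposed in this PR.

    Hypothetical helper for illustration only. A non-empty model_dir is
    passed as `model` and wins over `language`; otherwise the language
    code is passed so that the model can be downloaded automatically.
    """
    if model_dir:
        # A local model directory takes precedence over the language code.
        return {"model": model_dir}
    return {"language": language}
```

With the PR applied, a call might then look like ``recognizer_instance.recognize_vosk(audio, **vosk_kwargs(language="de"))``, where ``audio`` is an ``AudioData`` instance.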

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
29 changes: 22 additions & 7 deletions speech_recognition/__init__.py
@@ -1684,16 +1684,31 @@ def recognize_whisper(self, audio_data, model="base", show_dict=False, load_opti
         return result["text"]


-    def recognize_vosk(self, audio_data, language='en'):
+    def recognize_vosk(self, audio_data, model='', language='en-us'):
         from vosk import Model, KaldiRecognizer

         assert isinstance(audio_data, AudioData), "Data must be audio data"

         if not hasattr(self, 'vosk_model'):
-            if not os.path.exists("model"):
-                return "Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as 'model' in the current folder."
-                exit (1)
-            self.vosk_model = Model("model")
+            if model:
+                if not os.path.exists(model):
+                    raise RequestError(f"Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as '{model}' in the current folder.")
+                self.vosk_model = Model(model)
+            else:
+                try:
+                    import requests
+                except ImportError:
+                    raise RequestError("requests module is required to download model data")
+                # verify this language is available via api
+                response = requests.get('https://alphacephei.com/vosk/models/model-list.json', timeout=10)
+                # raise error if bad response
+                response.raise_for_status()
+
+                models = response.json()
+                languages = { m["lang"] for m in models }
+                if language not in languages:
+                    raise RequestError(f"Language '{language}' not available. Available language codes are: {languages}")
+                self.vosk_model = Model(lang=language)

         rec = KaldiRecognizer(self.vosk_model, 16000);
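
The language check in the new ``else`` branch can be tried in isolation. The following is a minimal sketch under stated assumptions: the function name ``check_language`` is hypothetical, the sample entries are made up (the diff fetches the real list over HTTP), and ``ValueError`` stands in for the library's ``RequestError`` so the snippet runs without ``speech_recognition`` installed.

```python
def check_language(language, models):
    """Return the sorted available language codes, or raise if `language` is missing.

    `models` is assumed to be a list of dicts that each carry a "lang" key,
    which is the only field the diff reads from model-list.json.
    """
    # Several models can share one language, so collect the codes in a set.
    languages = {m["lang"] for m in models}
    if language not in languages:
        # ValueError is a stand-in for RequestError in this sketch.
        raise ValueError(
            f"Language '{language}' not available. "
            f"Available language codes are: {sorted(languages)}"
        )
    return sorted(languages)


# Made-up sample entries for illustration, not real model names.
sample_models = [
    {"lang": "en-us", "name": "example-small-en-us"},
    {"lang": "de", "name": "example-small-de"},
    {"lang": "en-us", "name": "example-large-en-us"},
]
```

Sorting the codes before formatting keeps the error message stable between runs, whereas printing the set directly, as the diff does, gives no ordering guarantee.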
