
Async Functionality in Cordova #6

Open
esgraham opened this issue Jan 14, 2020 · 2 comments

@esgraham (Collaborator) commented Jan 14, 2020

Description

As an architect, I want to understand how Cordova implements async functionality, so that I can determine whether the plugin should implement the Speech SDK's async functions.

Acceptance Criteria

  • Find documentation that details how Cordova handles async functionality so that we can review it and make a decision.
  • Find documentation that details how the Cognitive Services Speech SDK implements async services so that we can review it and decide whether to use those functions in the plugin.
  • Update README.md with the final decision on whether and how to implement the async functions from the Speech SDK.
@esgraham added the documentation (Improvements or additions to documentation) and design (User Case around a design discussion) labels on Jan 14, 2020
@rozele self-assigned this on Jan 21, 2020
@rozele (Collaborator) commented Jan 21, 2020

Android

Recognize from microphone

Recommendation: recognizeOnceAsync

  • Cordova plugins are invoked on the WebCore thread on Android, not the main UI thread; see the threading section of this Android overview.
  • The Java Speech SDK only has a recognizeOnceAsync method that returns a Future<T>.
  • Calling the get() method on the future will block the current thread.
  • We don't want to block the WebCore thread, so we should create an ExecutorService to get the results from the Future on a background thread. The ExecutorService used in the Cognitive Services samples is the cached thread pool (Executors.newCachedThreadPool()). A sketch of this pattern follows this list.
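
Below is a minimal sketch of that pattern, assuming a hypothetical helper class owned by the Cordova plugin and a Cordova CallbackContext; the class and method names are illustrative, not part of the SDK:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.cordova.CallbackContext;

import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechRecognitionResult;
import com.microsoft.cognitiveservices.speech.SpeechRecognizer;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

public class SpeechRecognitionHandler {
    // Cached thread pool, as used in the Cognitive Services samples.
    private final ExecutorService executorService = Executors.newCachedThreadPool();

    // Called from CordovaPlugin.execute(), i.e. from the WebCore thread.
    public void recognizeOnce(SpeechConfig speechConfig, CallbackContext callbackContext) {
        SpeechRecognizer recognizer = new SpeechRecognizer(
                speechConfig, AudioConfig.fromDefaultMicrophoneInput());
        Future<SpeechRecognitionResult> future = recognizer.recognizeOnceAsync();

        // Resolve the Future on a background thread so the WebCore thread never blocks.
        executorService.submit(() -> {
            try {
                SpeechRecognitionResult result = future.get(); // blocks only this worker thread
                callbackContext.success(result.getText());
            } catch (Exception e) {
                callbackContext.error(e.getMessage());
            } finally {
                recognizer.close();
            }
        });
    }
}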

Stop recognizing from microphone

Recommendation: stopContinuousRecognitionAsync

  • Similar to the other methods, leverage the ExecutorService set up for speech recognition to "await" the get() call on the Future returned by stopContinuousRecognitionAsync (see the sketch below).
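
As a sketch, assuming the same hypothetical handler class, executorService, and CallbackContext as above, stopping recognition might look like this:

// Reuse the executor created for recognition so the WebCore thread is never blocked.
public void stopRecognition(SpeechRecognizer recognizer, CallbackContext callbackContext) {
    Future<Void> future = recognizer.stopContinuousRecognitionAsync();
    executorService.submit(() -> {
        try {
            future.get(); // "await" completion on the background thread
            callbackContext.success();
        } catch (Exception e) {
            callbackContext.error(e.getMessage());
        }
    });
}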

Play text to speech audio

Recommendation: SpeakTextAsync and SpeakSsmlAsync

  • The Java Speech SDK has 8 methods for speaking text:
    • Return before audio played vs. return after audio completed
    • Async vs. sync
    • Text vs. SSML
  • We should use the async variants that return after the audio has completed and expose one for text and one for SSML. We can revisit later whether to also expose the variants that return before the audio is played.
  • We should leverage the same ExecutorService that we set up for speech recognition for resolving the Future (see the sketch after this list).
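
A sketch of the text case, assuming the same hypothetical handler, speechConfig, and executorService as in the recognition example plus the corresponding SpeechSynthesizer imports (the SSML case would mirror it with SpeakSsmlAsync):

// Synthesize to the default speaker; per the recommendation above, the Future
// completes after the audio has finished playing.
public void speakText(String text, CallbackContext callbackContext) {
    SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig);
    Future<SpeechSynthesisResult> future = synthesizer.SpeakTextAsync(text);

    executorService.submit(() -> {
        try {
            SpeechSynthesisResult result = future.get();
            if (result.getReason() == ResultReason.SynthesizingAudioCompleted) {
                callbackContext.success();
            } else {
                callbackContext.error(result.getReason().toString());
            }
        } catch (Exception e) {
            callbackContext.error(e.getMessage());
        } finally {
            synthesizer.close();
        }
    });
}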

Stop playing text to speech audio

Recommendation: AudioTrack

  • TODO: We need to investigate whether calling cancel(true) on the Future<T> returned by the SpeakTextAsync method will stop audio that is already playing; a sketch of such a probe follows this list.
  • If it does not, we'll need to set up a custom playback mechanism on Android (similar to what was done for iOS) so that playback can be cancelled.
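
One possible way to run that investigation (purely illustrative; the method and log tag names are made up):

// Start synthesis, give playback a moment to begin, then cancel the Future and
// listen for whether the speaker actually goes silent.
public void probeCancelBehavior(SpeechSynthesizer synthesizer) throws Exception {
    Future<SpeechSynthesisResult> future =
            synthesizer.SpeakTextAsync("A sentence long enough that we can try to interrupt it.");
    Thread.sleep(500);                     // let playback start
    boolean cancelled = future.cancel(true);
    // If audio keeps playing even though cancelled is true, the SDK Future does not
    // stop playback and we need the custom AudioTrack path discussed later in this issue.
    android.util.Log.d("SpeechPlugin", "cancel(true) returned " + cancelled);
}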

iOS

Recognize from microphone

Recommendation: recognizeOnce

  • Cordova plugins run on the main UI thread, so long-running tasks should be invoked on a background thread (see this iOS overview).
  • Using either recognizeOnceAsync or recognizeOnce seems to block the calling thread until speech recognition is complete, so the simplest option is to just use recognizeOnce and run it on a background thread.

Stop recognizing from microphone

Recommendation: stopContinuousRecognition

  • stopContinuousRecognition will block the calling thread until it completes. I believe this is the desired behavior, so we may not need to move this call onto a background thread, but it could lead to a poor UX if the stop/cancel operation takes too long.

Play text to speech audio

Recommendation: speakText and speakSsml

  • iOS does not have an async variant for these methods. Same recommendation as for Android.

Stop playing text to speech audio

Recommendation: AVAudioPlayer.stop

  • There currently is no cancellation option that can be invoked from the Speech SDK.
  • We have experimented with AVAudioPlayer, on which we can call the stop method to cancel any active audio.
  • It may be fine to do this from the main thread.

@rozele (Collaborator) commented Jan 22, 2020

A minor modification to the above: I actually recommend we not use the Speech SDK for text-to-speech playback, because it does not support cancellation. Using the Speech SDK to fetch the audio data and then play it on something like AVAudioPlayer (iOS) or AudioTrack (Android) is also not as efficient as piping the bytes directly from a REST call to Cognitive Services.

I believe we can still use the Speech SDK to stream audio efficiently if we leverage the PushAudioOutputStream and route the bytes written to the output stream directly into the AVAudioPlayer / AudioTrack. The configuration would look something like this (a sketch of the callback follows the snippet):

// Subscription key and region elided.
SpeechConfig speechConfig = SpeechConfig.fromSubscription(...);
// Custom callback that receives the synthesized audio bytes as they are produced.
CustomPushAudioOutputStreamCallback callback = new CustomPushAudioOutputStreamCallback();
PushAudioOutputStream outputStream = PushAudioOutputStream.create(callback);
// Route synthesizer output to the push stream instead of the default speaker.
AudioConfig audioConfig = AudioConfig.fromStreamOutput(outputStream);
SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
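
For Android, a rough sketch of what the hypothetical CustomPushAudioOutputStreamCallback could look like, assuming the synthesizer is configured for a raw 16 kHz / 16-bit / mono PCM output format (e.g. via speechConfig.setSpeechSynthesisOutputFormat) so the bytes can be written straight into an AudioTrack; the custom class and its stopPlayback method are illustrative:

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

import com.microsoft.cognitiveservices.speech.audio.PushAudioOutputStreamCallback;

// Forwards synthesized PCM bytes from the Speech SDK directly to an AudioTrack.
public class CustomPushAudioOutputStreamCallback extends PushAudioOutputStreamCallback {
    private static final int SAMPLE_RATE = 16000;

    private final AudioTrack audioTrack = new AudioTrack(
            AudioManager.STREAM_MUSIC,
            SAMPLE_RATE,
            AudioFormat.CHANNEL_OUT_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            AudioTrack.getMinBufferSize(
                    SAMPLE_RATE, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT),
            AudioTrack.MODE_STREAM);

    public CustomPushAudioOutputStreamCallback() {
        audioTrack.play();
    }

    @Override
    public int write(byte[] dataBuffer) {
        // Push each chunk into the audio hardware buffer as it arrives.
        return audioTrack.write(dataBuffer, 0, dataBuffer.length);
    }

    @Override
    public void close() {
        audioTrack.stop();
        audioTrack.release();
    }

    // Exposed so the plugin can implement "stop playing text to speech audio".
    public void stopPlayback() {
        audioTrack.stop();
    }
}

With something like this in place, cancelling playback on Android becomes a call to stopPlayback() (i.e. AudioTrack.stop()) rather than an attempt to cancel the SDK's Future, and the same idea maps to AVAudioPlayer on iOS.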
