Hey @nickvergessen, we are about halfway through implementing the live transcription exApp.
The app already connects to the signaling server, authenticates, receives the list of participants, gets the audio streams, transcodes them to feed the transcription engine, produces transcriptions and sends them as Talk chat messages (for now).
We are still working on making the app react to people joining and leaving calls, managing all the parallel transcription subprocesses, etc.
We now have a clearer idea of how Talk and the live transcription app can interact.
When a participant joins a call
On the UI side, users could choose whether they want to receive/see the transcriptions via a checkbox somewhere, maybe in the "media settings" modal, like the call recording consent. As you wish.
If the user wants to see transcriptions, Talk can send a request to the live_transcription exApp on its /transcribeCall endpoint with these parameters:
- roomToken
- sessionId
Here is an example of how to make a request to an exApp endpoint from NC's backend: https://github.com/nextcloud/context_chat/blob/2ea9768bec56d0ea3dbe1551d3680b77f6ea48f4/lib/Service/LangRopeService.php#L123-L130
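To make the request contract above concrete, here is a minimal sketch of how the exApp side could validate the two parameters. It assumes they arrive as a JSON object in the request body, which is not decided in this issue; `TranscribeCallRequest` and `parse_transcribe_call_request` are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscribeCallRequest:
    """Parameters Talk would send to /transcribeCall (hypothetical container)."""
    room_token: str
    session_id: str


def parse_transcribe_call_request(body: str) -> TranscribeCallRequest:
    """Validate a request body, assuming the parameters are sent as JSON."""
    data = json.loads(body)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    # Both parameters listed in this issue are treated as required strings.
    for key in ("roomToken", "sessionId"):
        value = data.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or empty parameter: {key}")
    return TranscribeCallRequest(data["roomToken"], data["sessionId"])
```

Rejecting incomplete requests early would keep the exApp from starting a transcription subprocess for a session it cannot identify.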
The transcription messages
The exApp will send signaling messages to all participants who requested the transcriptions.
We are flexible on the format of those messages. The simplest would be:
{
  "sessionId": "blabla",
  "transcriptionMessage": "there you go"
}

We have not fully decided whether to send the intermediate transcriptions or only the ones considered "definitive". Let's assume for now that the app will only send definitive transcriptions. If we change that, we can always add a type (intermediate/definitive) attribute to the signaling messages.
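As an illustration of the message format discussed above, here is a small sketch of how the exApp could build the payload, with the optional type attribute left out unless it is explicitly requested. `build_transcription_message` and `ALLOWED_TYPES` are hypothetical names, and the final format is still open.

```python
import json
from typing import Optional

# The two values mentioned in this issue for a possible "type" attribute.
ALLOWED_TYPES = ("intermediate", "definitive")


def build_transcription_message(
    session_id: str, text: str, kind: Optional[str] = None
) -> str:
    """Serialize a transcription payload in the simple format proposed here."""
    message = {"sessionId": session_id, "transcriptionMessage": text}
    if kind is not None:
        # "type" is only added if we later decide to send intermediate results.
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unknown transcription type: {kind}")
        message["type"] = kind
    return json.dumps(message)
```

Keeping the attribute optional means clients written against the simple two-field format would keep working if the type is added later.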
Questions
- Are you fine with all that?
- Did we forget something?
- Are there any decisions we should make together?