
Conversation

@danxuliu (Member) commented Aug 20, 2025

Description

TODO

How to test

  • Install the live_transcription app (you need to use the master branch rather than the current latest release, as there have been API changes since then)
    • Ensure that cert_pem, cert_key and rsa_private_key are commented out in the Janus configuration (the live_transcription app requires an ECDSA certificate for DTLS; this will be documented later)
  • Alternatively, you can fake having the live_transcription app installed by manually sending signaling messages (see the console snippet below)
  • Create a new conversation
  • In the conversation settings, set the language spoken in calls
  • Start a call
  • In a private window, join as another participant
  • If using a real transcription, just say something to be transcribed
  • If faking the transcription, execute the following in the browser console:
```js
// Fake a transcript signaling message, addressed to one of the peers in the call.
OCA.Talk.SimpleWebRTC.connection.sendCallMessage({
    type: 'transcript',
    message: 'This is a test',
    langId: 'en',
    speakerSessionId: OCA.Talk.SimpleWebRTC.connection.getSessionId(),
    to: OCA.Talk.SimpleWebRTC.webrtc.peers[1].id,
})
```
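
If there is more than one other participant, the same fake message can be sent to every peer. A minimal sketch building on the snippet above; only the loop over `webrtc.peers` is new, the message fields are unchanged:

```js
// Sketch: fake a transcript line towards every connected peer, reusing
// the same globals and message fields as the snippet above.
const connection = OCA.Talk.SimpleWebRTC.connection
OCA.Talk.SimpleWebRTC.webrtc.peers.forEach((peer) => {
    connection.sendCallMessage({
        type: 'transcript',
        message: 'This is a test',
        langId: 'en',
        speakerSessionId: connection.getSessionId(),
        to: peer.id,
    })
})
```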

🖌️ UI Checklist

🖼️ Screenshots / Screencasts

[Screencasts: LiveTranscription, LiveTranscription-RTL]

🚧 Tasks (or follow ups)

  • Use the metadata from the language for separators and text direction
  • Disable transcription after leaving call
  • Enable transcription on call reconnections
  • Use TypeScript for the new JavaScript files
  • Ignore the transcript signaling message if it does not come from an internal client
  • Convert the transcription button to a split button so that moderators can set the language during a call without having to open the full conversation settings?
  • Remove no longer visible lines in a transcript block; currently the no longer visible transcript blocks are removed, but if a participant speaks without interruption for a long time all the lines will still be there, which at some point may affect performance when getting the line boundaries (or it may not, but in any case they are unneeded); see the sketch after this list
  • Tests
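
For the line-pruning follow-up, a rough sketch of the idea; the element lookup and class name are assumptions about the component markup, not the actual code:

```js
// Hypothetical sketch for the "remove no longer visible lines" task:
// drop transcript chunks that have scrolled fully above the visible
// area of a block. The '.transcript-chunk' class is made up.
function pruneHiddenChunks(transcriptBlock) {
    const visibleTop = transcriptBlock.getBoundingClientRect().top
    for (const chunk of transcriptBlock.querySelectorAll('.transcript-chunk')) {
        // Chunks appear in document order, so stop at the first one
        // that is still (partially) visible.
        if (chunk.getBoundingClientRect().bottom >= visibleTop) {
            break
        }
        // In the real component the scroll position would likely need
        // to be adjusted after removing nodes.
        chunk.remove()
    }
}
```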

🏁 Checklist

  • 🌏 Tested with different browsers / clients:
    • Chromium (Chrome / Edge / Opera / Brave)
    • Firefox
    • Safari
    • Talk Desktop
    • Integrations with Files sidebar and other apps
    • Not risky to browser differences / client
  • 🖌️ Design was reviewed, approved or inspired by the design team
  • ⛑️ Tests are included or not possible
  • 📗 User documentation in https://github.com/nextcloud/documentation/tree/master/user_manual/talk has been updated or is not required

🛠️ API Checklist

🚧 Tasks (or follow ups)

  • Adjust capability? Is it right as a configuration capability? Should it also check whether the external signaling server is used, or should the clients check that instead, along with the MCU feature being set in the signaling server? Should the Talk hash change when the live_transcription app is enabled or disabled (as that affects whether the live_transcription related elements are shown in the UI)? See the hedged sketch after this list.
  • Use default_language and force_language settings when transcriptions are enabled on a conversation without an explicit language set
  • Add the liveTranscriptionLanguageId property to the list of properties that trigger a signaling message to update the room, and update the value in the clients
  • Add a system message when the language is changed? It does not seem to be worth it, but if it is, the message should only include the id of the language and not resolve the actual name, as that would require a call to the live_transcription app that could delay getting the messages. The name should be resolved in the clients when showing the message (but I do not know whether that could have other drawbacks).
  • Find a way to handle incompatible changes in the live_transcription app (for example, if the API changed in a certain version). Maybe the app should provide capabilities?
  • Tests
  • Federated calls
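
Purely as an illustration of the client-side check discussed in the first item; both the capability key and its exact location are open questions above, so they are assumptions:

```js
// Assumption-heavy sketch: gate the live transcription UI on a
// capability. Neither the 'live-transcription-enabled' key nor its
// exact location under the 'spreed' capabilities is settled.
import { getCapabilities } from '@nextcloud/capabilities'

const callConfig = getCapabilities()?.spreed?.config?.call ?? {}
const supportsLiveTranscription = Boolean(callConfig['live-transcription-enabled'])
```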

🏁 Checklist

  • ⛑️ Tests (unit and/or integration) are included or not possible
  • 📘 API documentation in docs/ has been updated or is not required
  • 🔖 Capability is added or not needed

@danxuliu added this to the 🪺 Next Major (32) milestone on Aug 20, 2025
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch 5 times, most recently from f138d7b to 86adf2e on August 21, 2025 12:42
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch from 86adf2e to 52c8792 on August 27, 2025 11:53
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch 2 times, most recently from 4f08422 to e445862 on August 28, 2025 13:31
@danxuliu marked this pull request as ready for review on August 28, 2025 14:22
@danxuliu (Member, author) commented

Integration test failures should be unrelated.

Note that there was no version bump, as I assumed it would be better to keep the beta version number matching the actual beta release. Due to that, when testing this pull request the migration needs to be applied manually!

@danxuliu requested a review from DorraJaouad on August 28, 2025 14:26
@Antreesy (Contributor) commented Aug 28, 2025

Are we always going to have the transcription history on screen, or is there some expiration planned?
Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

I checked the faking example; I still feel the avatars are a bit redundant and we should stick to display names only, but otherwise it looks fine.

@DorraJaouad (Contributor) commented

Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

As discussed, we will do the migration.

  • Transcription should have a shown-time interval of 3 seconds or similar after the last signaling message received
  • I noticed that some participants can receive the signaling and others not, whereas all of them receive the raise hand signaling; I am not sure if it is simulation noise or something in the processing of the signaling.

But overall, it is awesome 🔥

@nickvergessen (Member) commented

Use default_locale and force_language settings when transcriptions are enabled on a conversation without an explicit language set

I hope you meant default language, not default locale? I mean, a default locale would also make sense if it helps with writing numbers, dates and things, but using the default_language would be a good fallback if there is no force_language.

@nickvergessen (Member) commented

Note that there was no version bump, as I assumed that it would be better to keep the beta version number matching the actual beta release.

You can do …beta.2.1 btw; I also did that in the past already: 5dc0b98

@nickvergessen (Member) left a comment

Okay from PHP side

@Antreesy (Contributor) left a comment

Frontend-wise OK

```js
    this.pendingScrollToBottomLineByLine = undefined
    this.scrollToBottomLineByLine()
}, 2000)
```
A contributor commented on the 2000 ms delay:

Felt like too much during the faking tests, but maybe it will be noticeable on live ones.

@danxuliu (Member, author) replied:

Originally I used 1 second, but it felt too quick 🤷 Maybe it could be proportional to the length of the last line or something like that; I intended to implement that, but in the end I went for a simpler approach (strange for me :-P)
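
For reference, the proportional variant could look something like this; a sketch of the discarded idea, not code from this pull request, and the constants are made up:

```js
// Sketch: scale the scroll delay with the length of the last shown line
// instead of using a fixed 2000 ms. Constants would need tuning.
const BASE_DELAY_MS = 1000
const MS_PER_CHARACTER = 40

function scrollDelayFor(lastLineText) {
    return BASE_DELAY_MS + lastLineText.length * MS_PER_CHARACTER
}
```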

@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch from e445862 to fa18e0c on August 29, 2025 07:17
@danxuliu (Member, author) commented

Are we always going to have the transcription history on screen, or is there some expiration planned?

Do you mean removing the transcript after a few seconds without anybody speaking? I would just leave it on screen, but I do not have strong feelings about it.

Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

I tried to, but unfortunately it was taking me way longer than expected, so... follow-up I guess :-)

I checked the faking example; I still feel the avatars are a bit redundant and we should stick to display names only, but otherwise it looks fine.

I used them for consistency with the chat, and I also think they are good when someone speaks for a long time, as in that case the name would be hidden by the text and the avatar still provides context.

@DorraJaouad (Contributor) commented

Tiny note: the icon should be https://pictogrammers.com/library/mdi/icon/subtitles-outline/

@danxuliu (Member, author) commented

Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

As discussed, we will do the migration.

👍

* Transcription should have a shown-time interval of 3 seconds or similar after the last signaling message received

So increase from the current two seconds to three seconds when a new line is shown and before scrolling to the next one if more lines appear? Or what do you mean?

* I noticed that some participants can receive the signaling and others not, whereas all of them receive the raise hand signaling; I am not sure if it is simulation noise or something in the processing of the signaling.

That is unexpected 🤔 But let's blame it on the simulation unless you find reproducible steps :-)

But overall, it is awesome 🔥

❤️

@danxuliu (Member, author) commented Aug 29, 2025

Use default_locale and force_language settings when transcriptions are enabled on a conversation without an explicit language set

I hope you meant default language, not default locale? I mean, a default locale would also make sense if it helps with writing numbers, dates and things, but using the default_language would be a good fallback if there is no force_language.

Oops, definitely, typo fixed.

Edit: now really fixed. It helps to press the button to update the text after checking the preview...

Note that there was no version bump, as I assumed that it would be better to keep the beta version number matching the actual beta release.

You can do …beta.2.1 btw; I also did that in the past already: 5dc0b98

Done.

@danxuliu (Member, author) commented

Tiny note: the icon should be https://pictogrammers.com/library/mdi/icon/subtitles-outline/

Done.

Commit messages from the branch:

Live transcriptions are an optional feature that is only available if the external app "live_transcription" is installed.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The endpoint just forwards the request to the external app "live_transcription", but using a standard Talk endpoint makes it possible to abstract that from the clients.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The transcripts of each participant are shown in their own block with the avatar and name of the participant, and whenever a transcript for a different participant arrives a new block is shown.

The transcript area shows four lines of text, which may include the
participant name; the name will be hidden once four or more lines of
text for the same participant are added. When a new line is added the
whole text is immediately scrolled to show the new line.

Using a separate span for each transcript chunk is not strictly needed,
but it will be used in following commits.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Rather than just directly showing the next line when a new transcript arrives, the transcript now scrolls smoothly to the new line. The scrolling continues after a small delay if there are more lines, until the last one is reached.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
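
A minimal sketch of that behaviour, assuming a scrollable container and a fixed line height; the actual component code may differ:

```js
// Sketch: smoothly scroll one line, then keep going after a small delay
// while more lines are pending below the visible area.
function scrollToNextLine(container, lineHeight, delayMs = 2000) {
    container.scrollBy({ top: lineHeight, behavior: 'smooth' })
    setTimeout(() => {
        const remaining = container.scrollHeight - container.clientHeight - container.scrollTop
        if (remaining > 1) {
            scrollToNextLine(container, lineHeight, delayMs)
        }
    }, delayMs)
}
```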
Once a transcript block is no longer visible it is no longer needed, so
it is now removed.

Note that it would still be necessary to remove no longer visible lines
inside the same transcript block, for example, for long speeches, but
this is something for the future.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
If set, the language of the room is now used when starting live
transcriptions.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This may influence how the browser renders the transcript chunks, for
example, due to specific rules for capitalization.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Different languages use different separators (for example, Chinese does not use any between characters). This detail is included in the language metadata provided by the "live_transcription" app, so the separator added between transcript chunks now respects the language; in case of a language switch a space is always added.

The language metadata is explicitly loaded before the live transcription is enabled, if it was not available yet, to ensure that it will be available when the transcript is shown.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
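
A sketch of that separator logic; the shape of the language metadata, in particular a separator field, is an assumption about the data provided by the live_transcription app, not a documented format:

```js
// Sketch: pick the separator to insert before a new transcript chunk.
// A space is always used on a language switch, as described above.
function separatorBefore(previousChunk, newChunk, languageMetadata) {
    if (!previousChunk) {
        return ''
    }
    if (previousChunk.langId !== newChunk.langId) {
        return ' '
    }
    return languageMetadata[newChunk.langId]?.separator ?? ' '
}
```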
The text direction of a language is included in the metadata provided by the "live_transcription" app, so the transcripts are now shown in the right direction (the characters themselves were already shown in the right direction due to Unicode bidi support, but they could be aligned to the wrong side depending on the main text direction of the UI).

Note that the whole transcript block is affected, so the name and avatar of the author will also be affected by the text direction. Due to that, a new block is now added when the text direction changes.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
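
A sketch of applying that direction, again assuming a direction field in the language metadata; in the real component this would presumably be a template binding rather than an imperative call:

```js
// Sketch: apply the language text direction to a whole transcript block,
// so the name and avatar of the author follow it too.
function applyTextDirection(transcriptBlock, langId, languageMetadata) {
    const direction = languageMetadata[langId]?.direction === 'rtl' ? 'rtl' : 'ltr'
    transcriptBlock.setAttribute('dir', direction)
}
```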
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch from b02f384 to 24a3df4 on August 29, 2025 09:34
@nickvergessen merged commit 0704808 into main on Aug 29, 2025
82 checks passed
@nickvergessen deleted the add-support-for-live-transcriptions-in-calls branch on August 29, 2025 09:55
