
Conversation

@danxuliu (Member) commented Aug 20, 2025

Description

TODO

How to test

  • Install the live_transcription app (you need to use the master branch rather than the current latest release, as there have been API changes since then)
    • Ensure that cert_pem, cert_key and rsa_private_key are commented out in the Janus configuration (the live_transcription app requires an ECDSA certificate for DTLS; this will be documented later)
  • Alternatively, you can fake having the live_transcription app installed by manually sending signaling messages (see the console snippet below)
  • Create a new conversation
  • In the conversation settings, set the language spoken in calls
  • Start a call
  • In a private window, join as another participant
  • If using a real transcription, just say something to be transcribed
  • If faking the transcription, execute the following in the browser console:
```js
// Fake a transcript signaling message, addressed to one of the peers in the call.
OCA.Talk.SimpleWebRTC.connection.sendCallMessage({
    type: 'transcript',
    message: 'This is a test',
    langId: 'en',
    speakerSessionId: OCA.Talk.SimpleWebRTC.connection.getSessionId(),
    to: OCA.Talk.SimpleWebRTC.webrtc.peers[1].id,
})
```
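
If there is more than one other participant, the same fake message can be sent to every peer. A minimal sketch building on the snippet above; only the loop over `webrtc.peers` is new, the message fields are unchanged:

```js
// Sketch: fake a transcript line towards every connected peer, reusing
// the same globals and message fields as the snippet above.
const connection = OCA.Talk.SimpleWebRTC.connection
OCA.Talk.SimpleWebRTC.webrtc.peers.forEach((peer) => {
    connection.sendCallMessage({
        type: 'transcript',
        message: 'This is a test',
        langId: 'en',
        speakerSessionId: connection.getSessionId(),
        to: peer.id,
    })
})
```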

🖌️ UI Checklist

🖼️ Screenshots / Screencasts

[Screencasts: LiveTranscription, LiveTranscription-RTL]

🚧 Tasks (or follow ups)

  • Use the metadata from the language for separators and text direction
  • Disable transcription after leaving call
  • Enable transcription on call reconnections
  • Use TypeScript for the new JavaScript files
  • Ignore the transcript signaling message if it does not come from an internal client
  • Convert the transcription button to a split button so that moderators can set the language during a call without having to open the full conversation settings?
  • Remove no longer visible lines in a transcript block; currently the no longer visible transcript blocks are removed, but if a participant speaks without interruption for a long time all the lines will still be there, which at some point may affect performance when getting the line boundaries (or it may not, but in any case they are unneeded); see the sketch after this list
  • Tests
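
For the line-pruning follow-up, a rough sketch of the idea; the element lookup and class name are assumptions about the component markup, not the actual code:

```js
// Hypothetical sketch for the "remove no longer visible lines" task:
// drop transcript chunks that have scrolled fully above the visible
// area of a block. The '.transcript-chunk' class is made up.
function pruneHiddenChunks(transcriptBlock) {
    const visibleTop = transcriptBlock.getBoundingClientRect().top
    for (const chunk of transcriptBlock.querySelectorAll('.transcript-chunk')) {
        // Chunks appear in document order, so stop at the first one
        // that is still (partially) visible.
        if (chunk.getBoundingClientRect().bottom >= visibleTop) {
            break
        }
        // In the real component the scroll position would likely need
        // to be adjusted after removing nodes.
        chunk.remove()
    }
}
```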

🏁 Checklist

  • 🌏 Tested with different browsers / clients:
    • Chromium (Chrome / Edge / Opera / Brave)
    • Firefox
    • Safari
    • Talk Desktop
    • Integrations with Files sidebar and other apps
    • Not risky to browser differences / client
  • 🖌️ Design was reviewed, approved or inspired by the design team
  • ⛑️ Tests are included or not possible
  • 📗 User documentation in https://github.com/nextcloud/documentation/tree/master/user_manual/talk has been updated or is not required

🛠️ API Checklist

🚧 Tasks (or follow ups)

  • Adjust capability? Is it right as a configuration capability? Should it also check whether the external signaling server is used, or should the clients check that instead, along with the MCU feature being set in the signaling server? Should the Talk hash change when the live_transcription app is enabled or disabled (as that affects whether the live_transcription related elements are shown in the UI)? See the hedged sketch after this list.
  • Use default_language and force_language settings when transcriptions are enabled on a conversation without an explicit language set
  • Add the liveTranscriptionLanguageId property to the list of properties that trigger a signaling message to update the room, and update the value in the clients
  • Add a system message when the language is changed? It does not seem to be worth it, but if it is, the message should only include the id of the language and not resolve the actual name, as that would require a call to the live_transcription app that could delay getting the messages. The name should be resolved in the clients when showing the message (but I do not know whether that could have other drawbacks).
  • Find a way to handle incompatible changes in the live_transcription app (for example, if the API changed in a certain version). Maybe the app should provide capabilities?
  • Tests
  • Federated calls
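
Purely as an illustration of the client-side check discussed in the first item; both the capability key and its exact location are open questions above, so they are assumptions:

```js
// Assumption-heavy sketch: gate the live transcription UI on a
// capability. Neither the 'live-transcription-enabled' key nor its
// exact location under the 'spreed' capabilities is settled.
import { getCapabilities } from '@nextcloud/capabilities'

const callConfig = getCapabilities()?.spreed?.config?.call ?? {}
const supportsLiveTranscription = Boolean(callConfig['live-transcription-enabled'])
```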

🏁 Checklist

  • ⛑️ Tests (unit and/or integration) are included or not possible
  • 📘 API documentation in docs/ has been updated or is not required
  • 🔖 Capability is added or not needed

@danxuliu added this to the 🪺 Next Major (32) milestone on Aug 20, 2025
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch 5 times, most recently from f138d7b to 86adf2e on August 21, 2025 12:42
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch from 86adf2e to 52c8792 on August 27, 2025 11:53
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch 2 times, most recently from 4f08422 to e445862 on August 28, 2025 13:31
@danxuliu marked this pull request as ready for review on August 28, 2025 14:22
@danxuliu (Member, author) commented

Integration test failures should be unrelated.

Note that there was no version bump, as I assumed it would be better to keep the beta version number matching the actual beta release. Due to that, when testing this pull request the migration needs to be applied manually!

@danxuliu requested a review from DorraJaouad on August 28, 2025 14:26
@Antreesy (Contributor) commented Aug 28, 2025

Are we always going to have the transcription history on screen, or is there some expiration planned?
Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

I checked the faking example; I still feel the avatars are a bit redundant and we should stick to display names only, but otherwise it looks fine.

@DorraJaouad (Contributor) commented

Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

As discussed, we will do the migration.

  • Transcription should have a shown-time interval of 3 seconds or similar after the last signaling message received
  • I noticed that some participants can receive the signaling and others not, whereas all of them receive the raise hand signaling; I am not sure if it is simulation noise or something in the processing of the signaling.

But overall, it is awesome 🔥

@nickvergessen (Member) commented

Use default_locale and force_language settings when transcriptions are enabled on a conversation without an explicit language set

I hope you meant default language, not default locale? I mean, a default locale would also make sense if it helps with writing numbers, dates and things, but using the default_language would be a good fallback if there is no force_language.

@nickvergessen (Member) commented

Note that there was no version bump, as I assumed that it would be better to keep the beta version number matching the actual beta release.

You can do …beta.2.1 btw; I also did that in the past already: 5dc0b98

@nickvergessen (Member) left a comment

Okay from PHP side

@Antreesy (Contributor) left a comment

Frontend-wise OK

```js
    this.pendingScrollToBottomLineByLine = undefined
    this.scrollToBottomLineByLine()
}, 2000)
```
A contributor commented on the 2000 ms delay:

Felt like too much during the faking tests, but maybe it will be noticeable on live ones.

@danxuliu (Member, author) replied:

Originally I used 1 second, but it felt too quick 🤷 Maybe it could be proportional to the length of the last line or something like that; I intended to implement that, but in the end I went for a simpler approach (strange for me :-P)
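
For reference, the proportional variant could look something like this; a sketch of the discarded idea, not code from this pull request, and the constants are made up:

```js
// Sketch: scale the scroll delay with the length of the last shown line
// instead of using a fixed 2000 ms. Constants would need tuning.
const BASE_DELAY_MS = 1000
const MS_PER_CHARACTER = 40

function scrollDelayFor(lastLineText) {
    return BASE_DELAY_MS + lastLineText.length * MS_PER_CHARACTER
}
```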

@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch from e445862 to fa18e0c on August 29, 2025 07:17
@danxuliu (Member, author) commented

Are we always going to have the transcription history on screen, or is there some expiration planned?

Do you mean removing the transcript after a few seconds without anybody speaking? I would just leave it on screen, but I do not have strong feelings about it.

Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

I tried to, but unfortunately it was taking me way longer than expected, so... follow-up I guess :-)

I checked the faking example; I still feel the avatars are a bit redundant and we should stick to display names only, but otherwise it looks fine.

I used them for consistency with the chat, and I also think they are good when someone speaks for a long time, as in that case the name would be hidden by the text and the avatar still provides context.

@DorraJaouad (Contributor) commented

Tiny note: the icon should be https://pictogrammers.com/library/mdi/icon/subtitles-outline/

@danxuliu (Member, author) commented

Also, it would be lovely to write the new components with script setup, but we can do that in a follow-up.

As discussed, we will do the migration.

👍

* Transcription should have a shown-time interval of 3 seconds or similar after the last signaling message received

So increase from the current two seconds to three seconds when a new line is shown and before scrolling to the next one if more lines appear? Or what do you mean?

* I noticed that some participants can receive the signaling and others not, whereas all of them receive the raise hand signaling; I am not sure if it is simulation noise or something in the processing of the signaling.

That is unexpected 🤔 But let's blame it on the simulation unless you find reproducible steps :-)

But overall, it is awesome 🔥

❤️

@danxuliu (Member, author) commented Aug 29, 2025

Use default_locale and force_language settings when transcriptions are enabled on a conversation without an explicit language set

I hope you meant default language, not default locale? I mean, a default locale would also make sense if it helps with writing numbers, dates and things, but using the default_language would be a good fallback if there is no force_language.

Oops, definitely, typo fixed.

Edit: now really fixed. It helps to press the button to update the text after checking the preview...

Note that there was no version bump, as I assumed that it would be better to keep the beta version number matching the actual beta release.

You can do …beta.2.1 btw; I also did that in the past already: 5dc0b98

Done.

@danxuliu (Member, author) commented

Tiny note: the icon should be https://pictogrammers.com/library/mdi/icon/subtitles-outline/

Done.

Commit messages from the branch:

Live transcriptions are an optional feature that is only available if the external app "live_transcription" is installed.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The endpoint just forwards the request to the external app "live_transcription", but using a standard Talk endpoint makes it possible to abstract that from the clients.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The transcripts of each participant are shown in their own block with the avatar and name of the participant, and whenever a transcript for a different participant arrives a new block is shown.

The transcript area shows four lines of text, which may include the
participant name; the name will be hidden once four or more lines of
text for the same participant are added. When a new line is added the
whole text is immediately scrolled to show the new line.

Using a separate span for each transcript chunk is not strictly needed,
but it will be used in following commits.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Rather than just directly showing the next line when a new transcript arrives, the transcript now scrolls smoothly to the new line. The scrolling continues after a small delay if there are more lines, until the last one is reached.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
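
A minimal sketch of that behaviour, assuming a scrollable container and a fixed line height; the actual component code may differ:

```js
// Sketch: smoothly scroll one line, then keep going after a small delay
// while more lines are pending below the visible area.
function scrollToNextLine(container, lineHeight, delayMs = 2000) {
    container.scrollBy({ top: lineHeight, behavior: 'smooth' })
    setTimeout(() => {
        const remaining = container.scrollHeight - container.clientHeight - container.scrollTop
        if (remaining > 1) {
            scrollToNextLine(container, lineHeight, delayMs)
        }
    }, delayMs)
}
```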
Once a transcript block is no longer visible it is no longer needed, so
it is now removed.

Note that it would still be necessary to remove no longer visible lines
inside the same transcript block, for example, for long speeches, but
this is something for the future.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
If set, the language of the room is now used when starting live
transcriptions.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This may influence how the browser renders the transcript chunks, for
example, due to specific rules for capitalization.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Different languages use different separators (for example, Chinese does not use any between characters). This detail is included in the language metadata provided by the "live_transcription" app, so the separator added between transcript chunks now respects the language; in case of a language switch a space is always added.

The language metadata is explicitly loaded before the live transcription is enabled, if it was not available yet, to ensure that it will be available when the transcript is shown.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
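
A sketch of that separator logic; the shape of the language metadata, in particular a separator field, is an assumption about the data provided by the live_transcription app, not a documented format:

```js
// Sketch: pick the separator to insert before a new transcript chunk.
// A space is always used on a language switch, as described above.
function separatorBefore(previousChunk, newChunk, languageMetadata) {
    if (!previousChunk) {
        return ''
    }
    if (previousChunk.langId !== newChunk.langId) {
        return ' '
    }
    return languageMetadata[newChunk.langId]?.separator ?? ' '
}
```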
The text direction of a language is included in the metadata provided by the "live_transcription" app, so the transcripts are now shown in the right direction (the characters themselves were already shown in the right direction due to Unicode bidi support, but they could be aligned to the wrong side depending on the main text direction of the UI).

Note that the whole transcript block is affected, so the name and avatar of the author will also be affected by the text direction. Due to that, a new block is now added when the text direction changes.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
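
A sketch of applying that direction, again assuming a direction field in the language metadata; in the real component this would presumably be a template binding rather than an imperative call:

```js
// Sketch: apply the language text direction to a whole transcript block,
// so the name and avatar of the author follow it too.
function applyTextDirection(transcriptBlock, langId, languageMetadata) {
    const direction = languageMetadata[langId]?.direction === 'rtl' ? 'rtl' : 'ltr'
    transcriptBlock.setAttribute('dir', direction)
}
```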
@danxuliu force-pushed the add-support-for-live-transcriptions-in-calls branch from b02f384 to 24a3df4 on August 29, 2025 09:34
@nickvergessen merged commit 0704808 into main on Aug 29, 2025
82 checks passed
@nickvergessen deleted the add-support-for-live-transcriptions-in-calls branch on August 29, 2025 09:55
