M.E.AI.Abstractions - Speech to Text Abstraction #5838
base: main
Quick observation: as you bring the `Transcribe` signature back, will the `ISpeechToTextClient` interface also have `Translate___Async` signatures? Is that the rationale? My original thinking behind the `Response` naming was to accommodate those two functionalities in the same method, given the change from `IAudioTranscriptionClient` to `ISpeechToTextClient`.
Given audio in and text out, what's the difference between transcribe and translate? Isn't the latter still transcription, doing speech recognition to go from audio to text, "just" with a possibly different target language than the audio content?
By definition those are different, and I opted to avoid any confusion where possible.
Yes, currently this is the main difference, although translation may also produce multiple outputs (translations into several different languages from the same input).
Having a dedicated interface for each has its benefits, but the underlying functionality is largely the same.
From AI, this was the answer I got.
Yes, there is a difference between speech transcription and translation, and while they are related concepts, one is not necessarily a subset of the other. Let me break it down:
Speech Transcription
Speech transcription involves converting spoken language (audio) into written text in the same language. For example, if someone speaks in English, transcription would produce a written English version of what was said. The focus is on accurately capturing the words, and sometimes additional details like tone, pauses, or speaker identification (e.g., in a multi-speaker setting like a podcast or interview). It’s about representing the spoken content in a textual form without changing the language.
Translation
Translation, on the other hand, involves converting text or speech from one language to another. For example, translating spoken English into written Spanish or spoken French into written English. The goal is to preserve the meaning and intent of the original content while adapting it to a different language, which often requires cultural and linguistic adjustments beyond just word-for-word conversion.
Key Differences
Language Change: Transcription stays within the same language; translation shifts between languages.
Process: Transcription is about capturing what’s said as text, while translation involves interpreting and rephrasing meaning in another language.
Purpose: Transcription is often used for documentation (e.g., court records, subtitles), while translation is used to make content accessible to speakers of other languages.
Can One Be a Subset of the Other?
Not exactly, but they can overlap or be part of a broader process:
Transcription as a Step in Translation: In some workflows, speech is first transcribed into text in the original language, and then that text is translated into another language. For example, a Spanish speech might be transcribed into Spanish text and then translated into English. Here, transcription is a precursor to translation, but it’s not a subset—it’s a distinct step.
Real-Time Speech Translation: Modern technology (like AI-powered interpreters) can combine transcription and translation into a seamless process, where spoken words in one language are directly converted to text or speech in another. In this case, transcription might happen internally as part of the translation pipeline, but they remain separate functions conceptually.
Conclusion
Transcription and translation serve different purposes and operate at different levels of language processing. While they can work together (e.g., transcribe then translate), neither is inherently a subset of the other—they’re distinct tools in the language toolkit.
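The "transcribe as a precursor to translate" framing above also suggests how a single method could accommodate both, which was the original motivation for the neutral naming. The following is a hypothetical sketch only: the `GetTextAsync` method and the `SpeechLanguage`/`TextLanguage` option names are assumptions for illustration, not part of the PR.

```csharp
// Hypothetical: one method serves both modes, with the target text language
// deciding whether the operation is transcription or translation.
var options = new SpeechToTextOptions
{
    SpeechLanguage = "es", // language spoken in the audio (assumed option name)
    TextLanguage = "en",   // same as SpeechLanguage => transcription;
                           // different => translation (assumed option name)
};

SpeechToTextResponse response =
    await client.GetTextAsync(audioStream, options, cancellationToken);
```

The trade-off is the one debated in the thread: a single method keeps the interface small, while dedicated methods make the semantic distinction explicit in the API shape.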
Are there any scenarios where an implementation is expected to mutate this? With chat, this is expected to be a history, but with speech-to-text, presumably it's generally more of a one-and-done kind of thing? Maybe this should be an IEnumerable instead of an IList?
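The question above, expressed as the two candidate signatures. Both are hypothetical sketches of the parameter under review, with the method name and surrounding types assumed for illustration:

```csharp
// As proposed in the PR: a mutable list, implying implementations might add,
// remove, or reorder entries — appropriate for chat history, but likely
// unnecessary for one-and-done speech-to-text input.
Task<SpeechToTextResponse> TranscribeAsync(
    IList<IAsyncEnumerable<DataContent>> speechContents,
    SpeechToTextOptions? options = null,
    CancellationToken cancellationToken = default);

// The suggested alternative: a read-only sequence, signaling that the client
// only consumes the inputs and never mutates the collection.
Task<SpeechToTextResponse> TranscribeAsync(
    IEnumerable<IAsyncEnumerable<DataContent>> speechContents,
    SpeechToTextOptions? options = null,
    CancellationToken cancellationToken = default);
```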
Wait, I just noticed: this is an `IList<IAsyncEnumerable<DataContent>>` rather than an `IAsyncEnumerable<DataContent>`? The intent here is that this handles multiple inputs, each of which is an asynchronously produced sequence of content?
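If that reading is right, consuming the parameter would involve two levels of iteration. A minimal sketch of what the nesting implies (the consuming method and `ProcessAudioChunk` are hypothetical; only the parameter type comes from the PR):

```csharp
// Illustration of the questioned nesting: a list of audio inputs, where each
// input is itself an asynchronously produced stream of audio chunks.
async Task ConsumeAsync(IList<IAsyncEnumerable<DataContent>> speechContents)
{
    foreach (IAsyncEnumerable<DataContent> input in speechContents) // one entry per audio source
    {
        await foreach (DataContent chunk in input) // chunks arrive asynchronously
        {
            ProcessAudioChunk(chunk); // hypothetical per-chunk handler
        }
    }
}
```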