
new-contrib: Audio Whisper API with Local Device Microphones #1271

Open · wants to merge 14 commits into main

Conversation


@CarlKho-Minerva CarlKho-Minerva commented Jul 6, 2024

Summary

This PR adds a new notebook that demonstrates how to use the Whisper API to transcribe text from your device's microphone. The notebook includes steps to record audio, transcribe it using the Whisper API, and copy the transcription to the clipboard. It aims to provide a practical guide for users who want to integrate speech-to-text functionality into their applications.
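The record, transcribe, and copy steps described above can be sketched as a small pipeline. This is a hedged illustration, not the notebook's actual code: the function names are invented, and the microphone, Whisper call, and clipboard steps are injected as callables so the flow is visible without hardware or an API key (in the real notebook these would wrap something like PyAudio, `client.audio.transcriptions.create`, and `pyperclip`).

```python
from typing import Callable

def speech_to_clipboard(
    record: Callable[[], bytes],          # capture raw audio from the microphone
    transcribe: Callable[[bytes], str],   # e.g. a thin wrapper around the Whisper API
    copy_to_clipboard: Callable[[str], None],
) -> str:
    """Record audio, transcribe it, and place the text on the clipboard."""
    audio = record()
    text = transcribe(audio)
    copy_to_clipboard(text)
    return text

# Stub demo (no microphone or API key required):
clipboard = []
result = speech_to_clipboard(
    record=lambda: b"\x00\x01",              # fake audio bytes
    transcribe=lambda audio: "hello world",  # fake Whisper response
    copy_to_clipboard=clipboard.append,
)
```

Injecting the three stages also makes each one independently swappable and testable, which mirrors the modular structure the notebook aims for.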

*This pull request description was written by ChatGPT and reviewed by a human. The article itself, however, was written by a human.

Motivation

This tutorial was created because the functionality to transcribe speech to text from a microphone is not well-documented. I found the mic speech-to-text option in the ChatGPT apps (not websites) extremely helpful for day-to-day operations and wanted to save others from having to learn about different audio processing modules.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines (my previous PR message elaborated on every one of these 😅):
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct, and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

@CarlKho-Minerva (Author)

My previous PR message, from before I updated. It's mostly justification for each criterion.

Introduction:

This contribution introduces a practical guide on using the Whisper API to transcribe audio from a device's microphone. The notebook includes steps to record audio, transcribe it using the Whisper API, and copy the transcription to the clipboard, providing an accessible and useful resource for AI builders.

Justification:

1. Relevance:
This guide is relevant as it utilizes OpenAI's Whisper API, allowing users to transcribe audio directly from their devices. This functionality aligns with OpenAI's mission to provide practical applications of AI technologies.

2. Usefulness:
The contribution is highly useful for developers who need a reliable method to convert speech to text. It can be used in various applications, such as real-time transcription services, voice command systems, and accessibility tools for individuals with hearing impairments. I found the mic speech-to-text option in the ChatGPT apps (not websites) very helpful for day-to-day operations and wanted to extend this functionality.

3. Uniqueness:
While there are existing examples of using the Whisper API, this notebook uniquely combines multiple functionalities—recording audio, transcribing it, and copying the transcription to the clipboard—into one cohesive guide. This integration simplifies the process for users and provides a complete solution in a single resource. Given that this functionality isn't extensively documented yet, I believe this tutorial can fill an important gap.

4. Clarity:
The notebook is written in clear, easy-to-understand language, with step-by-step instructions and code snippets. It includes detailed comments and explanations, making it accessible even to beginners.

5. Correctness:
The code has been tested and verified for accuracy. It includes all necessary imports and setup instructions, ensuring that users can replicate the process without errors.

6. Conciseness:
The guide is concise yet thorough, covering all essential steps without unnecessary information. It is structured to provide maximum value in a compact format.

7. Completeness:
The contribution is complete, covering everything from setting up microphone permissions to troubleshooting common issues. It provides all necessary context and resources, ensuring users have a comprehensive understanding of the process.

8. Grammar:
The notebook is free from grammatical and spelling errors, ensuring professional quality and readability.

@CarlKho-Minerva (Author)

Gently bumping this up. I'm willing to revise, and I've learned a lot about handling feedback since submitting my hackathon entry for the Gemini API ^_^

@ibigio (Collaborator) left a comment:

| Criteria | Description | Score |
| --- | --- | --- |
| Relevance | Is the content related to building with OpenAI technologies? Is it useful to others? | 4 |
| Uniqueness | Does the content offer new insights or unique information compared to existing documentation? | 4 |
| Clarity | Is the language easy to understand? Are things well-explained? Is the title clear? | 4 |
| Correctness | Are the facts, code snippets, and examples correct and reliable? Does everything execute correctly? | 2 |
| Conciseness | Is the content concise? Are all details necessary? Can it be made shorter? | 4 |
| Completeness | Is the content thorough and detailed? Are there things that weren't explained fully? | 4 |
| Grammar | Are there grammatical or spelling errors present? | 4 |

Really solid contribution, thank you! Motivation is clear, steps are broken down well, and the sections make sense. Caught a few mistakes here and there (mostly to do with using the SDK the old way), but once you correct them you're all set to merge!

examples/Whisper_transcribe_device_microphone.ipynb (8 resolved review threads, outdated)
@CarlKho-Minerva (Author)

CarlKho-Minerva commented Aug 24, 2024

Changelog

Hi @ibigio. Heavily revised my article now that I'm a month wiser. :)

Updated Image:

  • Added whisper_onChatGPTApp_cvk.gif to the images/ directory.

Content Structure:

  • Removed duplicate table of contents.
  • Replaced bolded numbered lists with H3 headers for cleaner formatting.

Code Improvements:

  • Added translation functionality alongside transcription.
  • Refactored helper logic into separate functions for better modularity.
  • Implemented .env for secure API key management.
  • Specified data types for function parameters.
  • Updated main function to include is_english parameter for language selection.
  • Added timed recording option with timed_recording and record_seconds parameters.
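
As a sketch of what the timed recording option might look like (the names, defaults, and frame arithmetic below are assumptions for illustration, not the notebook's actual code), a timed recording reads a fixed number of fixed-size chunks:

```python
def num_chunks(sample_rate: int, chunk_size: int, record_seconds: float) -> int:
    """Number of fixed-size buffers needed to cover the requested duration."""
    return int(sample_rate / chunk_size * record_seconds)

def record_timed(read_chunk, sample_rate: int = 44_100, chunk_size: int = 1024,
                 record_seconds: float = 5.0) -> bytes:
    """Timed recording loop; read_chunk is injected (e.g. a PyAudio stream.read)
    so this sketch runs without a real audio stream."""
    frames = [read_chunk(chunk_size)
              for _ in range(num_chunks(sample_rate, chunk_size, record_seconds))]
    return b"".join(frames)
```

An untimed mode would instead loop until a stop signal (e.g. a keypress), which is exactly the case a notebook can struggle with if the loop never terminates.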

OpenAI API Updates:

  • Updated OpenAI library usage.
  • Implemented client.audio.translations.create for non-English audio.

Documentation:

  • Updated docstrings to reflect new functionality.
  • Added additional demos for transcription and translation.
  • Updated troubleshooting section and FAQ to cover new features.

Terminology:

  • Updated text to reflect both "transcribe" and "translate" where appropriate.

Aesthetic Improvements:

  • Enhanced overall formatting for better readability in VS Code.
  • Created a more engaging recording when showcasing ChatGPT's Whisper Button interface.

@CarlKho-Minerva CarlKho-Minerva requested a review from ibigio August 24, 2024 10:54
@QWolfp3 QWolfp3 mentioned this pull request Aug 25, 2024
@CarlKho-Minerva (Author)

Hope everything is well, @ibigio. Is there anything else you'd want me to modify for this PR? Also, hope you saw my SWE internship application too. 🤭

@CarlKho-Minerva (Author)

@gabor-openai @ericning-o @danielin-openai @ray-openai hello folks!

Perhaps @ibigio is busy amidst all the good work he's doing for OAI.

Gently tagging you so you can give the green light to publish this, should it be satisfactory.

Thank you very much for your hard work!

@ibigio (Collaborator) left a comment:

The code now seems to run mostly free of errors; I left a couple of comments around code clarity and correctness.

Collaborator review comment:

Unless the reader speaks Filipino, they can't test this part out – how about translating from a more common second language, like Spanish?

Also, an indefinite recording makes many notebooks crash – set a 5–10 second limit as well.

# Demo: Transcribe lengthy Filipino speech and translate into English with proper grammar and punctuation
result = transcribe_audio(
    debug=False,
    prompt="Filipino spoken. Proper grammar and punctuation. Skip fillers.",
    timed_recording=False,
    record_seconds=0,
    is_english=False,
)

print("\nTranscription/Translation:", result)

Collaborator review comment:

Combining transcribing and translating in this one function is a bit odd, and it also drops the prompt param for translations. (The prompt should be in English for translation, and in the language of choice for transcription.) I'd split this out into two clear helper functions, translate and transcribe.

def process_audio(file_name, is_english=True, prompt=""):
    with open(file_name, "rb") as audio_file:
        if is_english:
            response = client.audio.transcriptions.create(
                model="whisper-1", file=audio_file, prompt=prompt
            )
        else:
            response = client.audio.translations.create(
                model="whisper-1", file=audio_file
            )

        return response.text.strip()
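
Following that suggestion, the split might look roughly like this. This is a sketch, not the final notebook code; the helper names are invented, and it assumes an openai-python v1 client (an `openai.OpenAI()` instance) is passed in by the caller:

```python
def transcribe_audio_file(client, file_name: str, prompt: str = "") -> str:
    """Same-language transcription; the prompt should be in the spoken language."""
    with open(file_name, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file, prompt=prompt
        )
    return response.text.strip()

def translate_audio_file(client, file_name: str, prompt: str = "") -> str:
    """Translation into English; the prompt (if any) should be in English."""
    with open(file_name, "rb") as audio_file:
        response = client.audio.translations.create(
            model="whisper-1", file=audio_file, prompt=prompt
        )
    return response.text.strip()
```

Keeping the two endpoints in separate helpers also makes the prompt rule easy to enforce: English prompts for translation, same-language prompts for transcription.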

Collaborator review comment:

I don't think this is how we intend for the prompt parameter to be used – looking at our docs, it is more of an example (or examples) than an instruction.
(Screenshot of the prompt parameter docs, dated Nov 26 2024, omitted.)
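
To make that concrete (a hedged illustration; these strings are invented examples, not taken from the notebook), the documented use of prompt is to supply sample text in the desired style, or containing tricky spellings, rather than to give commands:

```python
# Instruction-style prompt: Whisper is not documented to follow commands,
# so directives like this are unlikely to behave as intended.
instruction_style = "Proper grammar and punctuation. Skip fillers."

# Example-style prompt: a snippet written the way you want the output to look,
# including spellings/terms you want matched (closer to the documented intent).
example_style = "Hello, and welcome to this talk about OpenAI, GPT-4, and DALL·E."
```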

@CarlKho-Minerva (Author)

Thanks @ibigio! Will get these fixed within an hour.
