Issue with load_audio function from UploadFile input #2482
Replies: 1 comment
The issue is most likely that you're passing the raw bytes from FastAPI's UploadFile to load_audio, which accepts either:

    str    # file path
    bytes  # complete audio file contents

and FFmpeg is failing to decode the data from stdin.

First thing to check

Print:

    audio = await file.read()
    print(type(audio))
    print(len(audio))
    print(file.filename)
    print(file.content_type)

You should see a bytes object with a non-zero length, plus the filename and content type of the upload.

Simplest solution: save to a temporary file

Instead of piping bytes directly to FFmpeg, write the upload to disk and let Whisper load it normally. Example:

    from tempfile import NamedTemporaryFile

    @app.post("/transcribe")
    async def transcribe(file: UploadFile):
        # Write the upload to disk; delete=False keeps the file once the block closes it.
        with NamedTemporaryFile(delete=False, suffix=".m4a") as tmp:
            tmp.write(await file.read())
            tmp_path = tmp.name
        # Transcribe from the path so Whisper's own loader handles the decoding.
        result = model.transcribe(tmp_path)
        return {"text": result["text"]}

This is by far the easiest and most reliable approach. Because of delete=False, remove the file yourself (for example with os.remove) once transcription is done.

Why your code may fail

You're doing:

    audio = await file.read()
    audio_array = load_audio(audio)

which enters:

    -i pipe:

and then:

    out.communicate(input=file)

This only works if FFmpeg can correctly detect the format from the incoming byte stream. Some formats decode well from stdin, while others are less reliable depending on the container and FFmpeg build, especially when metadata or seeking is involved.

Improve the error logging

Right now you're suppressing useful FFmpeg output:

    "-loglevel", "error"

Try temporarily changing it to:

    "-loglevel", "debug"

or remove it completely:

    cmd = [
        "ffmpeg",
        "-nostdin",
        "-i",
        "pipe:",
        ...
    ]

Then capture stderr:

    process = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, stderr = process.communicate(input=file)
    print(stderr.decode("utf-8", errors="ignore"))

The FFmpeg message will usually reveal the exact problem.

Verify FFmpeg installation

From a terminal run:

    ffmpeg -version

You should see the FFmpeg version and build information. If the command fails, Whisper won't be able to decode audio.

Alternative: use Whisper's built-in loader

OpenAI Whisper already has:

    import whisper
    audio = whisper.load_audio("file.m4a")

which is well-tested. If possible, avoid reimplementing it.

Recommended FastAPI pattern

    from tempfile import NamedTemporaryFile

    @app.post("/transcribe")
    async def transcribe(file: UploadFile):
        with NamedTemporaryFile(delete=False, suffix=".m4a") as tmp:
            tmp.write(await file.read())
            tmp_path = tmp.name
        result = model.transcribe(tmp_path)
        return {"text": result["text"]}

This avoids the stdin decoding problems described above and is how many production FastAPI + Whisper deployments handle uploads.

To diagnose further

Please share the actual FFmpeg error message shown in the screenshot (the image isn't included in the discussion text). The exact stderr output from FFmpeg will usually identify the root cause immediately.
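
The temporary-file approach leaves the file on disk because of delete=False. A minimal sketch of one way to add cleanup, assuming a helper of your own: `run_on_temp_file` and its `fn` argument are hypothetical names for illustration, not part of Whisper or FastAPI.

```python
import os
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_on_temp_file(data: bytes, filename: Optional[str], fn: Callable[[str], T]) -> T:
    """Write `data` to a temporary file, call `fn(path)`, then delete the file.

    Hypothetical helper: `fn` stands in for whatever consumes a file path,
    for example a transcribe call.
    """
    # Reuse the upload's extension (e.g. ".m4a"); empty if there is no filename.
    suffix = Path(filename or "").suffix
    # delete=False lets us close the file before anything else opens it by path.
    with NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    try:
        return fn(tmp_path)
    finally:
        # Remove the file even if fn raises.
        os.remove(tmp_path)
```

Inside the endpoint this might be called as run_on_temp_file(await file.read(), file.filename, model.transcribe), assuming model is the loaded Whisper model from the handler.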
Hello everyone.
I am using Whisper with FastAPI and trying to pass my uploaded file (an .m4a audio file) to the transcribe function.
As suggested in this discussion https://github.com/openai/whisper/discussions/380, I understand that I need to use the load_audio function and pass its result to the transcribe function, because there is preprocessing by ffmpeg behind it.
But for a reason that I don't understand, I get an error when the ffmpeg command is executed:

Here's my code:
For this one, I took inspiration from the discussion above and the beautiful work of @jianfch, which gives more logs: https://github.com/jianfch/stable-ts/blob/main/stable_whisper/audio/utils.py
Hope someone can help me with it!
I'm a newbie to Python and this is my first discussion.
Sorry if I'm mistaken about some things.