Issue with load_audio function from UploadFile input #2482

Waligator97420 · 2025-01-03T20:27:39Z

Hello everyone.

I am using Whisper with FastAPI and trying to pass my uploaded file (an .m4a audio file) to the transcribe function.

As suggested in this discussion, https://github.com/openai/whisper/discussions/380, I understand that I need to call load_audio and pass its result to transcribe, because ffmpeg preprocesses the audio behind the scenes.

But for a reason that I don't understand, I get an error when the ffmpeg command is executed.

Here's my code:

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    audio = await file.read()

    audio_array = load_audio(audio)
    if audio_array is not None and len(audio_array) > 0:
        result = model.transcribe(audio_array)
        return {"text": result["text"]}
    else:
        return {"text": "none"}

For load_audio, I took inspiration from the discussion above and from the great work of @jianfch, whose version gives more logging: https://github.com/jianfch/stable-ts/blob/main/stable_whisper/audio/utils.py

def load_audio(file: Union[str, bytes], sr: int = 16000):
    """
    Open an audio file and read as mono waveform, resampling as necessary

    Parameters
    ----------
    file: (str, bytes)
        The audio file to open or bytes of audio file

    sr: int
        The sample rate to resample the audio if necessary

    Returns
    -------
    A NumPy array containing the audio waveform, in float32 dtype.
    """

    # if isinstance(file, bytes):
    #     inp = file
    #     file = 'pipe:'
    # else:
    #     inp = None

    try:
        # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
        # Requires the ffmpeg CLI package to be installed.
        cmd = [
            "ffmpeg",
            "-nostdin",
            "-threads", "0",
            "-i", file if isinstance(file, str) else "pipe:",
            "-f", "s16le",
            "-ac", "1",
            "-acodec", "pcm_s16le",
            "-ar", str(sr),
            "-"
        ]
        if isinstance(file, str):
            out = subprocess.run(cmd, capture_output=True, check=True).stdout
        else:
            cmd = cmd[:1] + ["-loglevel", "error"] + cmd[1:]
            stdin = subprocess.PIPE if isinstance(file, bytes) else file
            out = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=stdin)
            out = out.communicate(input=file if isinstance(file, bytes) else None)[0]
            if not out:
                raise RuntimeError(f"FFmpeg failed to load audio from bytes ({len(file)}).")
    except (subprocess.CalledProcessError, subprocess.SubprocessError) as e:
        raise RuntimeError(f"FFmpeg failed to load audio: {e.stderr.decode()}") from e

    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0

Hope someone can help me on it!

I'm a newbie to Python and this is my first discussion.
Sorry if I'm mistaken on some things.

Advait251206 · 2026-06-24T18:35:19Z

The issue is most likely in the bytes path of load_audio(). The function accepts either:

str  # file path

or

bytes  # complete audio file contents

Passing the raw bytes from FastAPI's UploadFile takes the second path, which pipes the data to FFmpeg on stdin, and FFmpeg is failing to decode it there.

First thing to check

Print:

audio = await file.read()

print(type(audio))
print(len(audio))
print(file.filename)
print(file.content_type)

You should see something like:

<class 'bytes'>
123456
audio.m4a
audio/mp4

If len(audio) is very small or zero, the upload itself may be the problem.
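
The same check can be wrapped in a small helper, sketched here assuming the upload has already been read into memory. describe_upload is a hypothetical name, not part of FastAPI or Whisper. The ftyp test relies on MP4-family files (including .m4a) normally starting with an ftyp box, so treat it as a hint rather than real validation.

```python
def describe_upload(data: bytes) -> dict:
    """Summarise uploaded bytes before handing them to FFmpeg (hypothetical helper)."""
    return {
        "size": len(data),
        "empty": len(data) == 0,
        # MP4-family files normally carry the box type b"ftyp" in bytes 4..8.
        # This is a heuristic sniff, not a full container parser.
        "looks_like_mp4": len(data) >= 8 and data[4:8] == b"ftyp",
    }
```

If "empty" is True, or "looks_like_mp4" is False for a file that should be .m4a, the problem is probably upstream of FFmpeg.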

Simplest solution: save to a temporary file

Instead of piping bytes directly to FFmpeg, write the upload to disk and let Whisper load it normally.

Example:

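import os
from tempfile import NamedTemporaryFile

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Write the upload to disk so FFmpeg reads a regular, seekable file.
    with NamedTemporaryFile(delete=False, suffix=".m4a") as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        result = model.transcribe(tmp_path)
    finally:
        # delete=False leaves the file behind, so remove it explicitly.
        os.remove(tmp_path)

    return {"text": result["text"]}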

This is by far the easiest and most reliable approach.

Why your code may fail

You're doing:

audio = await file.read()

audio_array = load_audio(audio)

which enters:

-i pipe:

and then:

out.communicate(input=file)

This only works if FFmpeg can correctly detect the format from the incoming byte stream.

Some formats work well via stdin:

wav
flac
mp3

Others are less reliable depending on the container and FFmpeg build:

m4a
mp4
aac

especially when FFmpeg has to seek to find the container's index, which a pipe does not allow.
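
For .m4a specifically, a likely culprit is where the file's index lives. MP4-family files are a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type. The moov box holds the index and mdat holds the audio data. When moov comes after mdat, FFmpeg has to seek to the end of the input to read it, and it cannot seek on stdin. The sketch below only lists top-level boxes so you can see the order in your own upload. The function names are hypothetical, and it assumes a well-formed file.

```python
import struct


def top_level_boxes(data: bytes) -> list:
    """Return the top-level MP4 box types in file order (illustrative parser)."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < 8:
            break  # malformed box; stop rather than loop forever
        boxes.append(box_type.decode("ascii", errors="replace"))
        offset += size
    return boxes


def index_before_media(data: bytes) -> bool:
    """True when the 'moov' index precedes the 'mdat' media data."""
    boxes = top_level_boxes(data)
    return (
        "moov" in boxes
        and "mdat" in boxes
        and boxes.index("moov") < boxes.index("mdat")
    )
```

If index_before_media(audio) returns False for your upload, decoding it from a pipe is likely to fail while decoding the same bytes from a file on disk works.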

Improve the error logging

Right now the bytes branch restricts FFmpeg to error-level output:

"-loglevel", "error"

Try temporarily changing it to:

"-loglevel", "debug"

or remove it completely:

cmd = [
    "ffmpeg",
    "-nostdin",
    "-i",
    "pipe:",
    ...
]

Then capture stderr:

process = subprocess.Popen(
    cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

stdout, stderr = process.communicate(input=file)

print(stderr.decode("utf-8", errors="ignore"))

The FFmpeg message will usually reveal the exact problem.
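
To make that capture reusable, here is a minimal sketch using subprocess.run, which accepts the input bytes directly and collects stdout and stderr in one call. run_with_input is a hypothetical helper name. It works for any command, so the FFmpeg argument list from load_audio can be passed in unchanged.

```python
import subprocess


def run_with_input(cmd: list, data: bytes) -> bytes:
    """Run cmd with data on stdin and return stdout; raise with stderr on failure."""
    proc = subprocess.run(cmd, input=data, capture_output=True)
    if proc.returncode != 0:
        # Surface the tool's own error text instead of a generic failure.
        message = proc.stderr.decode("utf-8", errors="ignore").strip()
        raise RuntimeError(f"{cmd[0]} exited with code {proc.returncode}: {message}")
    return proc.stdout
```

Inside load_audio, the bytes branch could then become out = run_with_input(cmd, file), and a decoding failure would raise with FFmpeg's message attached.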

Verify FFmpeg installation

From a terminal run:

ffmpeg -version

If this fails, Whisper won't be able to decode audio.

You should see something like:

ffmpeg version ...
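
The same check can run from Python when the server starts, so a missing FFmpeg fails early instead of on the first request. This is a sketch built on shutil.which from the standard library. require_executable is a hypothetical helper name.

```python
import shutil


def require_executable(name: str = "ffmpeg") -> str:
    """Return the full path of name on PATH, or raise if it is not installed."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} was not found on PATH; install it before starting the server.")
    return path
```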

Alternative: use Whisper's built-in loader

OpenAI Whisper already has:

import whisper

audio = whisper.load_audio("file.m4a")

which is well-tested.

If possible, avoid reimplementing load_audio() unless you specifically need in-memory processing.

Recommended FastAPI pattern

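import os
from tempfile import NamedTemporaryFile

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Write the upload to disk so FFmpeg reads a regular, seekable file.
    with NamedTemporaryFile(delete=False, suffix=".m4a") as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        result = model.transcribe(tmp_path)
    finally:
        # delete=False leaves the file behind, so remove it explicitly.
        os.remove(tmp_path)

    return {"text": result["text"]}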

This avoids:

stdin piping issues
FFmpeg format detection problems
byte-stream edge cases

and is a common way to handle uploads in FastAPI + Whisper services.
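
If you have more than one endpoint, the temporary-file handling can be packaged once. The sketch below is one way to do it under assumptions: temp_audio_path is a hypothetical helper that writes the bytes to disk, yields the path, and removes the file afterwards. It keeps the upload's own extension instead of a fixed .m4a and falls back to a default when the client sends no filename.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Optional


@contextmanager
def temp_audio_path(data: bytes, filename: Optional[str] = None, default_suffix: str = ".m4a"):
    """Write data to a temporary file, yield its path, and delete it afterwards."""
    # Reuse the upload's extension when there is one; otherwise use the default.
    suffix = Path(filename or "").suffix or default_suffix
    with NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        yield path
    finally:
        # Runs even if the caller raises, so no file is left behind.
        os.remove(path)


# Possible use inside the handler (model and file come from your own app):
#     with temp_audio_path(await file.read(), file.filename) as path:
#         result = model.transcribe(path)
```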

To diagnose further

Please share the actual FFmpeg error message shown in the screenshot (the image isn't included in the discussion text). The exact stderr output from FFmpeg will usually identify the root cause immediately.
