Limit max request size #2155

Open
Kludex opened this issue May 26, 2023 Discussed in #1516 · 13 comments · May be fixed by #2328
Labels
feature New feature or request
Milestone
Version 1.x

Comments

@Kludex
Member

Kludex commented May 26, 2023

Discussed in #1516

Originally posted by aviramha April 5, 2020
As discussed on Gitter, my opinion is that Starlette should provide a default limit for request size.
The main reason is that without it, any Starlette application is vulnerable to a very easy DoS attack.
For example, a newbie like me can write a program as follows:

from starlette.requests import Request
from starlette.responses import Response


async def app(scope, receive, send):
    assert scope['type'] == 'http'
    request = Request(scope, receive)
    data = await request.json()  # parses the entire request body in memory
    response = Response(str(data), media_type='text/plain')
    await response(scope, receive, send)

As a malicious user, I could send a 30 GB JSON body and cause the process to run out of memory.
Other frameworks, such as Django and Quart, also support this.
My proposal is to add a default limit that can be overridden in the app configuration.

@adriangb
Member

If we're going to make it a configurable middleware, it might also make sense to have some sort of timeout for the connection and for each chunk: maybe infinite by default, but definitely tunable.

Another thing to keep in mind is that this is likely something users want to control on a per-endpoint basis. If I have an app with an upload feature that expects 1 GB files, it's likely a single endpoint that expects those files, so I'd want to bump up the limit just for that endpoint. That makes me think the best strategy may be a per-endpoint middleware with a companion middleware that just tweaks the config by changing it in the scope. That would allow layering and overriding of these settings. This would be similar to #2026.
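
For illustration, the companion middleware that only tweaks the config could be as small as this (hypothetical sketch; the class name and the "max_body_size" scope key are made up for the example):

from starlette.types import ASGIApp, Receive, Scope, Send


class BodySizeLimitConfig:
    """Per-endpoint companion middleware: only overrides the limit in the ASGI scope."""

    def __init__(self, app: ASGIApp, max_body_size: int) -> None:
        self.app = app
        self.max_body_size = max_body_size

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "http":
            # An enforcing middleware elsewhere in the stack would consult
            # scope.get("max_body_size", default) when counting received bytes.
            scope["max_body_size"] = self.max_body_size
        await self.app(scope, receive, send)

Wrapping only the upload endpoint's ASGI app with BodySizeLimitConfig(endpoint, max_body_size=1024 ** 3) would then raise the limit for that endpoint alone.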

@alex-oleshkevich
Member

This is a good one! I also agree that we need a global setting and per-route settings (Route + Mount).
Use case: a global limit of 1 MB, with a 10 MB limit on the photo upload endpoint.

We could add a max_body_size parameter to the request.form(), request.json(), and request.stream() functions.

@Kludex Kludex added the feature New feature or request label May 31, 2023
@Kludex Kludex added this to the Version 1.x milestone May 31, 2023
@Kludex
Member Author

Kludex commented Jun 24, 2023

Why should the ASGI application be the one to set this instead of the server?

@alex-oleshkevich
Copy link
Member

alex-oleshkevich commented Jun 24, 2023

Why should the ASGI application be the one to set this instead of the server?

Example: the global POST limit is 1 MB, but selected endpoints that upload files need 100 MB.
Setting this at the server level is global and leaves no way to override it per endpoint.

@abersheeran
Member

Adding a LimitRequestSizeMiddleware is the simplest and most forward-compatible way.
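
Roughly, such a middleware might look like the sketch below (illustration only, not a settled design; the "max_body_size" scope key for per-endpoint overrides and the error behaviour are assumptions):

from starlette.types import ASGIApp, Message, Receive, Scope, Send


class LimitRequestSizeMiddleware:
    def __init__(self, app: ASGIApp, max_body_size: int = 1024 * 1024) -> None:
        self.app = app
        self.max_body_size = max_body_size

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        received = 0

        async def limited_receive() -> Message:
            nonlocal received
            message = await receive()
            if message["type"] == "http.request":
                received += len(message.get("body", b""))
                # Read the limit lazily so a per-endpoint override written into the
                # scope by an inner middleware is honoured.
                if received > scope.get("max_body_size", self.max_body_size):
                    # How this should surface (exception type, 413 response) is one
                    # of the details to settle in the PR.
                    raise RuntimeError("Request body exceeds the configured limit")
            return message

        await self.app(scope, limited_receive, send)

It could then be installed globally with app.add_middleware(LimitRequestSizeMiddleware, max_body_size=1024 * 1024).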

@Kludex
Member Author

Kludex commented Nov 6, 2023

Adding a LimitRequestSizeMiddleware is the simplest and most forward-compatible way.

Yeah. Shall we follow this path?

@adriangb
Member

adriangb commented Nov 6, 2023

Yes, I think someone should make a PR and we can discuss the details (override vs. min/max, whether there should be a default, etc.) there.

@adriangb
Member

adriangb commented Nov 8, 2023

Yes, I think someone should make a PR

I am someone, I made a PR 😆 : #2328

@defnull

defnull commented Nov 12, 2024

The PR was closed, but the idea is still on the table, so here are my two cents:

  • Global request size limits do not work for applications that actually want to accept file uploads on some routes. Those uploads are usually spooled to temporary files and are not limited by available memory. JSON, though, is parsed into in-memory structures. Enforcing the same limit on both types of data is not sensible.
  • Not having reasonable default limits is an invitation for developers to forget about this aspect and write vulnerable applications.
  • How do others do it? That should not matter much, but: Bottle, Django and probably many others do have size limits. Not for the request body, but for what is loaded into memory by functions like Request.json(). Werkzeug/Flask says that calling Request.get_data() without checking the request size first is a bad idea, but does it anyway when parsing JSON. Not the best role model, perhaps.
  • Frameworks that parse the request body before calling the request handler function (e.g. FastAPI) make it extra hard to be safe. You cannot check the request body size before the parsing step is triggered.
  • Request.json() is a function; adding a size limit parameter would be backwards compatible (a rough sketch follows below).
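
As an illustration of that last point, a size-limited json() could look like the sketch below (shown as a subclass for the example only; the max_size parameter is hypothetical, and None keeps today's unlimited behaviour):

import json
import typing

from starlette.requests import Request


class LimitedJSONRequest(Request):
    async def json(self, max_size: int | None = None) -> typing.Any:
        if not hasattr(self, "_json"):
            if max_size is None:
                body = await self.body()  # unchanged default behaviour
            else:
                # Enforce the limit on bytes actually read, not on Content-Length.
                received = 0
                chunks: list[bytes] = []
                async for chunk in self.stream():
                    received += len(chunk)
                    if received > max_size:
                        raise ValueError("JSON body exceeds max_size")
                    chunks.append(chunk)
                body = b"".join(chunks)
            self._json = json.loads(body)
        return self._json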

@raceychan

raceychan commented Mar 27, 2025

Hi guys,

are you still interested in this? I just scanned through your discussion and came up with something like this:

class Request(HTTPConnection):
    _form: FormData | None

    def __init__(
        self,
        scope: Scope,
        receive: Receive = empty_receive,
        send: Send = empty_send,
        max_content_length: int | None = None,
    ):
        super().__init__(scope)
        assert scope["type"] == "http"
        self._receive = receive
        self._send = send
        self._stream_consumed = False
        self._is_disconnected = False
        self._form = None
        if max_content_length is not None:
            assert max_content_length > 0
            content_length = self.headers.get("content-length")
            # Compare as an int; the raw header value is a string.
            if content_length is not None and int(content_length) > max_content_length:
                raise ValueError("Body too large")
        self._max_content_length = max_content_length

    @property
    def method(self) -> str:
        return typing.cast(str, self.scope["method"])

    @property
    def receive(self) -> Receive:
        return self._receive

    async def stream(self, chunk_size: int | None = None) -> typing.AsyncGenerator[bytes, None]:
        if hasattr(self, "_body"):
            yield self._body
            yield b""
            return
        if self._stream_consumed:
            raise RuntimeError("Stream consumed")
        buffer = bytearray()
        while not self._stream_consumed:
            message = await self._receive()
            if message["type"] == "http.request":
                body = message.get("body", b"")
                buffer.extend(body)  # Append new data to buffer
                if chunk_size:
                    while len(buffer) >= chunk_size:
                        yield buffer[:chunk_size]  # Yield chunk
                        del buffer[:chunk_size]  # Remove yielded data
                if not message.get("more_body", False):
                    self._stream_consumed = True
                    if buffer:
                        yield bytes(buffer)  # Yield remaining buffer data
            elif message["type"] == "http.disconnect":  # pragma: no branch
                self._is_disconnected = True
                raise ClientDisconnect()
        yield b""

    async def body(self, chunk_size: int | None = None) -> bytes:
        if not hasattr(self, "_body"):
            chunks: list[bytes] = []
            async for chunk in self.stream(chunk_size):
                chunks.append(chunk)
            self._body = b"".join(chunks)
        return self._body

    async def json(self) -> typing.Any:
        if not hasattr(self, "_json"):  # pragma: no branch
            body = await self.body()
            self._json = json.loads(body)
        return self._json

    async def _get_form(
        self,
        *,
        max_files: int | float = 1000,
        max_fields: int | float = 1000,
        max_part_size: int = 1024 * 1024,
        chunk_size: int | None = None
    ) -> FormData:
        if self._form is None:  # pragma: no branch
            assert (
                parse_options_header is not None
            ), "The `python-multipart` library must be installed to use form parsing."
            content_type_header = self.headers.get("Content-Type")
            content_type: bytes
            content_type, _ = parse_options_header(content_type_header)
            if content_type == b"multipart/form-data":
                try:
                    multipart_parser = MultiPartParser(
                        self.headers,
                        self.stream(chunk_size),
                        max_files=max_files,
                        max_fields=max_fields,
                        max_part_size=max_part_size,
                    )
                    self._form = await multipart_parser.parse()
                except MultiPartException as exc:
                    if "app" in self.scope:
                        raise HTTPException(status_code=400, detail=exc.message)
                    raise exc
            elif content_type == b"application/x-www-form-urlencoded":
                form_parser = FormParser(self.headers, self.stream(chunk_size))
                self._form = await form_parser.parse()
            else:
                self._form = FormData()
        return self._form

    def form(
        self,
        *,
        max_files: int | float = 1000,
        max_fields: int | float = 1000,
        max_part_size: int = 1024 * 1024,
        chunk_size: int | None = None
    ) -> AwaitableOrContextManager[FormData]:
        return AwaitableOrContextManagerWrapper(
            self._get_form(
                max_files=max_files, max_fields=max_fields, max_part_size=max_part_size, chunk_size=chunk_size
            )
        )

    async def close(self) -> None:
        if self._form is not None:  # pragma: no branch
            await self._form.close()

    async def is_disconnected(self) -> bool:
        if not self._is_disconnected:
            message: Message = {}

            # If message isn't immediately available, move on
            with anyio.CancelScope() as cs:
                cs.cancel()
                message = await self._receive()

            if message.get("type") == "http.disconnect":
                self._is_disconnected = True

        return self._is_disconnected

    async def send_push_promise(self, path: str) -> None:
        if "http.response.push" in self.scope.get("extensions", {}):
            raw_headers: list[tuple[bytes, bytes]] = []
            for name in SERVER_PUSH_HEADERS_TO_COPY:
                for value in self.headers.getlist(name):
                    raw_headers.append(
                        (name.encode("latin-1"), value.encode("latin-1"))
                    )
            await self._send(
                {"type": "http.response.push", "path": path, "headers": raw_headers}
            )
  • This assumes that a request won't have a body larger than what is claimed in Content-Length; last time I checked, either httptools or uvicorn verifies this.

  • Checking Content-Length is very cheap and easy to do, so if the goal is to defend against malicious users, it can be quite effective. We might send a fancier error response within request_response:

def request_response(
    func: typing.Callable[[Request], typing.Awaitable[Response] | Response],
) -> ASGIApp:
    """
    Takes a function or coroutine `func(request) -> response`,
    and returns an ASGI application.
    """
    f: typing.Callable[[Request], typing.Awaitable[Response]] = (
        func if is_async_callable(func) else functools.partial(run_in_threadpool, func)  # type:ignore
    )

    async def app(scope: Scope, receive: Receive, send: Send) -> None:
        try:
            request = Request(scope, receive, send)
        except ValueError:  # we might want something more specific, like RequestBodyOverSizedError
            await app_that_send_error_message(scope, receive, send)
            return

        async def inner_app(scope: Scope, receive: Receive, send: Send) -> None:
            response = await f(request)
            await response(scope, receive, send)

        await wrap_app_handling_exceptions(inner_app, request)(scope, receive, send)

    return app
  • This should be backward compatible

@alex-oleshkevich
Member

are you still interested in this? I just scanned through your discussion and came up with something like this: […]

A user can put any value into that header and make the server unreliable, so the proposed variant is not optimal.

@raceychan

raceychan commented Mar 27, 2025

@alex-oleshkevich

Yeah, then we might:

  1. maintain a received_bytes_num counter inside Request.stream and compare it to self.max_content_length
  2. raise an error if it exceeds the limit (a rough sketch follows below)

The thing is, though: if we assume the server is not reliable and receive might return an arbitrarily large body in a single message, then there is nothing we can do, since once we await receive() it is already in our memory, unless we implement a server that is reliable, right?
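
A rough sketch of that byte-counting idea (subclassing Request just for illustration; max_content_length is assumed to be set on the instance, e.g. by the __init__ shown earlier):

from starlette.requests import ClientDisconnect, Request


class SizeLimitedRequest(Request):
    max_content_length: int | None = None  # assumed to be set in __init__, as above

    async def stream(self):
        if hasattr(self, "_body"):
            yield self._body
            yield b""
            return
        if self._stream_consumed:
            raise RuntimeError("Stream consumed")
        self._stream_consumed = True
        received_bytes_num = 0
        while True:
            message = await self._receive()
            if message["type"] == "http.request":
                body = message.get("body", b"")
                received_bytes_num += len(body)
                if self.max_content_length is not None and received_bytes_num > self.max_content_length:
                    # Enforced on bytes actually received, not on the client-sent header.
                    raise ValueError("Request body exceeds max_content_length")
                if body:
                    yield body
                if not message.get("more_body", False):
                    break
            elif message["type"] == "http.disconnect":
                self._is_disconnected = True
                raise ClientDisconnect()
        yield b""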

@defnull

defnull commented Mar 27, 2025

if we assume the server is not reliable

WSGI defines that servers "should not" pass more bytes to the application than specified in the Content-Length header, if present. I skimmed the ASGI spec and it seems to be totally silent about this detail. Which means IMHO that servers cannot be trusted to actually enforce Content-Length. They should, but not doing so is not a bug.

But does that really matter for this discussion? Content-Length is optional anyway. If it's missing, then the ASGI app cannot know the content size in advance. The request may be Transfer-Encoding: chunked or terminated HTTP/1.0-style by closing half the socket. HTTP allows arbitrarily large uploads of unknown size.

A safeguard to prevent OOMs or other resource exhaustion attacks should never depend on a client specified header or undefined server behavior. The only reliable way to enforce such a limit is to count received bytes.
