
[RFC] Replace HTTP+SSE with new "Streamable HTTP" transport #206

Merged Mar 24, 2025 (24 commits)

Conversation

@jspahrsummers (Member) commented Mar 17, 2025

This PR introduces the Streamable HTTP transport for MCP, addressing key limitations of the current HTTP+SSE transport while maintaining its advantages.

Our deep appreciation to @atesgoral and @topherbullock (Shopify), @samuelcolvin and @Kludex (Pydantic), @calclavia, Cloudflare, LangChain, Vercel, the Anthropic team, and many others in the MCP community for their thoughts and input! This proposal was only possible thanks to the valuable feedback received in the GitHub Discussion.

TL;DR

As compared with the current HTTP+SSE transport:

  1. We remove the /sse endpoint
  2. All client → server messages go through the /message (or similar) endpoint
  3. The server may upgrade any client → server request to an SSE response, and use that stream to send notifications/requests
  4. Servers can choose to establish a session ID to maintain state
  5. Clients can initiate an SSE stream with an empty GET to /message

This approach can be implemented backwards compatibly, and allows servers to be fully stateless if desired.
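To make the shape of this concrete, here is a rough client-side sketch in Python. The endpoint name, port, header values, and JSON-RPC method string are assumptions for illustration, not normative spec text:

```python
# Illustrative client-side sketch only; endpoint name, port, and method names
# are assumptions for the example, not part of the spec.
import json
import requests

ENDPOINT = "http://localhost:8080/message"

# POST a JSON-RPC request. The server may answer with a plain JSON body,
# or upgrade the response to an SSE stream and deliver messages over it.
resp = requests.post(
    ENDPOINT,
    json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
    headers={"Accept": "application/json, text/event-stream"},
    stream=True,
)
if resp.headers.get("Content-Type", "").startswith("text/event-stream"):
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data:"):
            print(json.loads(line[len("data:"):]))
else:
    print(resp.json())

# Separately, a client can open a standalone SSE stream with an empty GET,
# which the server can use for unsolicited notifications and requests.
stream = requests.get(ENDPOINT, headers={"Accept": "text/event-stream"}, stream=True)
```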

Motivation

Remote MCP currently works over an HTTP+SSE transport, which:

  • Does not support resumability
  • Requires the server to maintain a long-lived connection with high availability
  • Can only deliver server messages over SSE

Benefits

  • Stateless servers are now possible—eliminating the requirement for high availability long-lived connections
  • Plain HTTP implementation—MCP can be implemented in a plain HTTP server without requiring SSE
  • Infrastructure compatibility—it's "just HTTP," ensuring compatibility with middleware and infrastructure
  • Backwards compatibility—this is an incremental evolution of our current transport
  • Flexible upgrade path—servers can choose to use SSE for streaming responses when needed

Example use cases

Stateless server

A completely stateless server, without support for long-lived connections, can be implemented under this proposal.

For example, a server that just offers LLM tools and utilizes no other features could be implemented like so (a rough sketch follows the list):

  1. Always acknowledge initialization (but no need to persist any state from it)
  2. Respond to any incoming ToolListRequest with a single JSON-RPC response
  3. Handle any CallToolRequest by executing the tool, waiting for it to complete, then sending a single CallToolResponse as the HTTP response body
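A minimal sketch of what that could look like, using Flask. The /message endpoint name, the JSON-RPC method strings ("tools/list", "tools/call"), and the echo tool are assumptions for illustration; this is not the official SDK or final spec wording:

```python
# Minimal stateless-server sketch (illustrative only, not the official SDK).
from flask import Flask, jsonify, request

app = Flask(__name__)

TOOLS = [{"name": "echo", "description": "Echo back the provided text"}]

@app.post("/message")
def message():
    req = request.get_json()
    method = req.get("method")

    if method == "initialize":
        # Acknowledge initialization without persisting any state from it.
        return jsonify({"jsonrpc": "2.0", "id": req["id"], "result": {}})
    if method == "tools/list":
        return jsonify({"jsonrpc": "2.0", "id": req["id"],
                        "result": {"tools": TOOLS}})
    if method == "tools/call":
        # Execute the tool synchronously and return a single JSON-RPC response.
        text = req["params"]["arguments"]["text"]
        return jsonify({"jsonrpc": "2.0", "id": req["id"],
                        "result": {"content": [{"type": "text", "text": text}]}})

    # Notifications (no response expected) are simply accepted.
    return ("", 202)
```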

Stateless server with streaming

A server that is fully stateless and does not support long-lived connections can still take advantage of streaming in this design.

For example, to issue progress notifications during a tool call (see the sketch after this list):

  1. When the incoming POST request is a CallToolRequest, server indicates the response will be SSE
  2. Server starts executing the tool
  3. Server sends any number of ProgressNotifications over SSE while the tool is executing
  4. When the tool execution completes, the server sends a CallToolResponse over SSE
  5. Server closes the SSE stream
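As a rough illustration (again assuming Flask, the /message endpoint, and illustrative method/notification names rather than final spec wording), the upgrade to SSE is simply a matter of the Content-Type on the response:

```python
# Illustrative sketch of a stateless server that streams progress over SSE.
import json
import time
from flask import Flask, Response, request

app = Flask(__name__)

@app.post("/message")
def message():
    req = request.get_json()
    if req.get("method") != "tools/call":
        return ("", 202)

    def event_stream():
        # Emit progress notifications while the tool is "executing".
        for step in range(1, 4):
            time.sleep(1)  # stand-in for real tool work
            progress = {"jsonrpc": "2.0", "method": "notifications/progress",
                        "params": {"progress": step, "total": 3}}
            yield f"data: {json.dumps(progress)}\n\n"
        # Finish with the tool call's JSON-RPC response, then the stream closes.
        result = {"jsonrpc": "2.0", "id": req["id"],
                  "result": {"content": [{"type": "text", "text": "done"}]}}
        yield f"data: {json.dumps(result)}\n\n"

    # Returning text/event-stream signals that this response is an SSE stream.
    return Response(event_stream(), mimetype="text/event-stream")
```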

Stateful server

A stateful server would be implemented very similarly to today. The main difference is that the server will need to generate a session ID, and the client will need to pass that back with every request.

The server can then use the session ID for sticky routing or routing messages on a message bus—that is, a POST message can arrive at any server node in a horizontally-scaled deployment, so must be routed to the existing session using a broker like Redis.
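A rough sketch of that session handshake follows. The Mcp-Session-Id header name and the in-memory store are assumptions for the example; a horizontally scaled deployment would back this with something like Redis so any node can resolve the session:

```python
# Illustrative stateful-server sketch: the server mints the session ID at
# initialization and expects it back on every subsequent request.
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
SESSIONS = {}  # session_id -> per-session state (in-memory for the sketch)

@app.post("/message")
def message():
    req = request.get_json()

    if req.get("method") == "initialize":
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {}
        resp = jsonify({"jsonrpc": "2.0", "id": req["id"], "result": {}})
        resp.headers["Mcp-Session-Id"] = session_id  # header name is illustrative
        return resp

    session_id = request.headers.get("Mcp-Session-Id")
    if session_id not in SESSIONS:
        return ("Unknown or missing session", 404)

    # ...dispatch to whichever node or worker owns this session's state...
    return ("", 202)
```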

Why not WebSocket?

The core team thoroughly discussed making WebSocket the primary remote transport (instead of SSE), and applying similar work to it to make it disconnectable and resumable. We ultimately decided not to pursue WS right now because:

  1. Wanting to use MCP in an "RPC-like" way (e.g., a stateless MCP server that just exposes basic tools) would incur a lot of unnecessary operational and network overhead if a WebSocket is required for each call.
  2. From a browser, there is no way to attach headers (like Authorization), and unlike SSE, third-party libraries cannot reimplement WebSocket from scratch in the browser.
  3. Only GET requests can be transparently upgraded to WebSocket (other HTTP methods are not supported for upgrading), meaning that some kind of two-step upgrade process would be required on a POST endpoint, introducing complexity and latency.

We're also avoiding making WebSocket an additional option in the spec, because we want to limit the number of transports officially specified for MCP, to avoid a combinatorial compatibility problem between clients and servers. (Although this does not prevent community adoption of a non-standard WebSocket transport.)

The proposal in this doc does not preclude further exploration of WebSocket in future, if we conclude that SSE has not worked well.

To do

  • Move session ID responsibility to server
    • Define acceptable space of session IDs
    • Ensure session IDs are introspectable by middleware/WAF
  • Make cancellation explicit
  • Require centralized SSE GET for server -> client requests and notifications
  • Convert resumability into a per-stream concept
  • Design a way to proactively "end session"
  • "if the client has an auth token, it should include it in every MCP request"

Follow ups

  • Standardize support for JSON-RPC batching
  • Support for streaming request bodies?
  • Put some recommendations about timeouts into the spec, and maybe codify conventions like "issuing a progress notification should reset default timeouts."

@jspahrsummers marked this pull request as ready for review March 17, 2025 10:15
@jspahrsummers moved this to Consulting in Standards Track Mar 17, 2025
@daviddenton

Firstly - thanks for the effort in driving this forward 🙃

Client provides session ID in headers; server can pay attention to this if needed
This feels very unnatural and pretty insecure to me - what was the thinking about the client generating this as opposed to the server generating and signing it (possibly based on the client identity which is determined from authentication credentials)?

An alternative would be for the header to be set on the first response after the initialise request and then for the client to reuse/share it in whatever way they deem appropriate for their use-case.

@gunta commented Mar 17, 2025

We're also avoiding making WebSocket an additional option in the spec

I completely agree with this decision.

While WebSocket and other transports will certainly be needed for some use cases, perhaps a separate "Extended" working group could be officially created and maintained by the community to address these needs - similar to how we could have Core and Extended working groups in the future.

@mitsuhiko

Related to the point discussed above about session IDs: I think it would be reasonable to ensure that session IDs are either communicated in a way that makes routing on a basic load balancer possible, or that a separate header is added to enable that.

(That’s for folks who do not have fancy-pantsy durable objects ;))


`text/event-stream` as supported content types.
- The server **MUST** either return `Content-Type: text/event-stream`, to initiate an
SSE stream, or `Content-Type: application/json`, to return a single JSON-RPC
_response_. The client **MUST** support both these cases.
@halter73 commented Mar 17, 2025

Given that EventSource already does not support POST requests, meaning that fetch will have to be used by browser-based clients, why not go all the way and allow more than one JSON-RPC message in a POST request's streaming request body? That's certainly where my mind goes when renaming the transport from "HTTP with SSE" to "Streamable HTTP".

While this wouldn't solve the resumability issue by itself, it would vastly simplify the transport. It would be much closer to the stdio transport, and it could potentially support binary data.

And I think it would help with the resumability issue. It greatly simplifies resumability to only have one logical stream per connection like the stdio transport does. That way, you're not stuck managing multiple last-message-ids on the server which seems like a pain.

If a core design principle is that clients should handle complexity where it exists, I'd suggest forcing the client to only have one resumable server-to-client message stream at a time.


JSON-RPC supports batching. You can pass an array of requests as a JSON-RPC body.
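For illustration only (endpoint name assumed), a batch is a single POST whose body is an array of JSON-RPC request objects:

```python
# Illustrative only: posting a JSON-RPC batch as one request body.
import requests

batch = [
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "echo", "arguments": {"text": "hi"}}},
]
resp = requests.post("http://localhost:8080/message", json=batch)
print(resp.json())  # the server may likewise reply with an array of responses
```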


FWIW - in OpenID Provider Commands we are proposing a POST that returns either application/json or text/event-stream https://openid.github.io/openid-provider-commands/main.html

@jspahrsummers (Member Author):

Supporting batched client -> server messages makes sense! I honestly had forgotten that aspect of JSON-RPC, since we ignored it for so long. 😬

I would like to extend support to streaming request bodies, but I think we should kick the can on this a bit, as it will probably involve significant discussion of its own.

@jspahrsummers (Member Author) commented Mar 18, 2025

I'll punt on batching in this PR as well, as it also affects the stdio transport and has some wider-ranging implications, but basically I agree we should support it.

@colombod

So this would be bringing HTTP transport and the QUIC protocol to the mix?


@ahmadawais (Contributor) commented Mar 28, 2025

Super excited to see this through.

P.S. @jirispilka The schema.ts link in the release notes is broken.

Sending in a PR #248

@hemanth commented Mar 30, 2025

Came here looking for this! Awaiting the SDKs to enable Streamable HTTP 🤓

@daviddenton

Came here looking for this! Awaiting the SDKs to enable Streamable HTTP 🤓

I'm sure the official ones will be along soon, but if you're working with the JVM, you might find the http4k SDK useful - it offers a functional approach with strong testing capabilities. 😉

@CaliViking commented Mar 31, 2025

It’s hard to fully follow these threads, but the new proposal still uses HTTP and SSE—just in a revised structure.

The arguments in the “Why not WebSocket?” section seem weak. The first three technical points are factually incorrect or misleading. Basing architecture on protocol misunderstandings risks designing the wrong solution.
❌ “RPC-like use of WebSocket adds overhead” – Actually, WebSocket reduces overhead in high-frequency RPC-like interactions by keeping a persistent connection.
❌ “Can’t attach headers in browser WebSockets” – True, but there are standard workarounds (e.g., query params, subprotocols, cookies).
❌ “Only GET supports upgrade” – Technically true, but irrelevant. WebSocket upgrades over GET are well-supported and not a real-world issue.

Choose the right transport for the job:
1️⃣ HTTP = request-response
2️⃣ SSE = one-way server push
3️⃣ WebSocket = two-way full-duplex
Align transport choices with actual communication needs—especially if MCP aims to support diverse interaction patterns.

@QuantGeekDev

@CaliViking I've been thinking about this a lot lately. It does give that impression at times. To have a more constructive conversation, it would be useful if someone implemented WS (or the one in the SDK) alongside the new HTTP specification and performed extensive stress testing in an environment that approximates production. We should have real data and numbers to argue what the overhead would be, and what tradeoffs it would imply.

@ChrisLally

Shipped https://inspect.mcp.garden for anyone interested in trying out HTTP+SSE (or just SSE) without needing to run the MCP Inspector locally!


@apryiomka commented Apr 7, 2025

HTTP = request-response

We need #1 for an agent-based turn pattern. Does the SDK have any pre-release version that we can try?

@QuantGeekDev commented Apr 8, 2025

Shipped https://inspect.mcp.garden for anyone interested in trying out HTTP+SSE (or just SSE) without needing to run the MCP Inspector locally!


Glad you were able to deploy mcp-debug :) Super cool to see my fork in the wild. Did you like the stats page? I'm thinking of creating a PR for that to the inspector. Good luck with the mcp garden project 🚀

@apryiomka

Shipped https://inspect.mcp.garden for anyone interested in trying out HTTP+SSE (or just SSE) without needing to run the MCP Inspector locally!


This is not in main - what branch are you running?

@apryiomka

Does anyone have a Python SDK version for the server and client for the new HTTP protocol?
