RESUMABLE: Clarify client recovery behavior for 409 Conflict responses #3275

@danielresnick

This issue is a spin-off from the discussion in #3193 regarding chunk granularity, as suggested by @guoye-zhang.

The core issue is that the current specification is ambiguous about whether clients need to handle a 409 Conflict response. This ambiguity can lead to upload failures in recoverable scenarios, undermining the spec's goals.

Problem

A fully spec-compliant client can receive a 409 Conflict response due to factors outside its direct control, e.g. a network interruption that leaves the client and server desynchronized on the current Upload-Offset (the client thinks a PATCH request has failed, but it actually succeeded).

The Concurrency section of the spec discusses this and RECOMMENDS an approach that avoids ever serving clients a 409 Conflict response (the latest request wins; previous requests are cancelled). However, this is a non-binding recommendation, and it is acknowledged to be onerous and potentially impractical for servers to implement. A more realistic approach for large-scale servers to meet the requirements of the Concurrency section would be to use a strongly consistent database to maintain session state (current offset, etc.) along with optimistic concurrency control to version each update.
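To make the database-backed approach concrete, here is a minimal sketch of what it might look like on the server. Everything in it is an assumption for illustration: `SessionStore` is an in-memory stand-in for a strongly consistent database, `handle_patch` is a hypothetical handler, and 204 is only a placeholder success status. The point is that a server built this way rejects a stale append with a 409 Conflict instead of cancelling the earlier request.

```python
import threading
from dataclasses import dataclass


@dataclass
class UploadSession:
    offset: int = 0   # bytes appended so far
    version: int = 0  # bumped on every successful append


class SessionStore:
    """In-memory stand-in for a strongly consistent database (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def create(self, upload_id):
        with self._lock:
            self._sessions[upload_id] = UploadSession()

    def read(self, upload_id):
        # Return a copy so callers hold a snapshot, not the live record.
        with self._lock:
            s = self._sessions[upload_id]
            return UploadSession(s.offset, s.version)

    def compare_and_append(self, upload_id, expected_version, expected_offset, length):
        # Optimistic concurrency control: commit only if nobody else
        # advanced the session since the caller took its snapshot.
        with self._lock:
            s = self._sessions[upload_id]
            if s.version != expected_version or s.offset != expected_offset:
                return False
            s.offset += length
            s.version += 1
            return True


def handle_patch(store, upload_id, client_offset, body):
    """Hypothetical PATCH handler returning (status, headers)."""
    snapshot = store.read(upload_id)
    if client_offset != snapshot.offset:
        return 409, {"Upload-Offset": str(snapshot.offset)}
    if store.compare_and_append(upload_id, snapshot.version, client_offset, len(body)):
        return 204, {"Upload-Offset": str(client_offset + len(body))}
    # Lost the race to a concurrent append: report the current offset.
    return 409, {"Upload-Offset": str(store.read(upload_id).offset)}
```

With this design the server never has to track or cancel in-flight requests, which is why a 409 Conflict becomes a normal outcome that clients will see in practice.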

Even a proactive client can't avoid this race condition. Consider a scenario where a client's PATCH request times out locally, but is still in-flight to the server. To recover, the client sends a HEAD request to get the current offset. The server might respond to this HEAD request before it has processed the original, in-flight PATCH. By the time the client acts on that HEAD response and sends a new PATCH, the original one may have finally completed, advancing the server's offset and causing the new PATCH to be correctly rejected with a 409 Conflict.
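The sequence above can be replayed deterministically with a toy model. This is only an illustrative sketch: `Server` and `replay_race` are hypothetical names, the server is reduced to a single offset counter, and 204 is a placeholder success status.

```python
class Server:
    """Toy server that tracks only the upload offset (hypothetical)."""

    def __init__(self):
        self.offset = 0

    def head(self):
        return self.offset

    def patch(self, offset, data):
        if offset != self.offset:
            return 409, self.offset
        self.offset += len(data)
        return 204, self.offset


def replay_race():
    server = Server()
    chunk = b"x" * 100
    events = []
    # 1. The client sends PATCH at offset 0. It times out locally but is
    #    still in flight, so the server has not processed it yet.
    # 2. The client sends HEAD; the server answers before the PATCH lands.
    seen = server.head()
    events.append(("HEAD", seen))
    # 3. The delayed original PATCH is finally processed.
    events.append(("PATCH#1", *server.patch(0, chunk)))
    # 4. The client retries from the offset it saw in step 2, which is now stale.
    events.append(("PATCH#2", *server.patch(seen, chunk)))
    return events
```

In this replay the client did everything right (it asked for the offset before retrying) and still ends up with a 409 Conflict on the second PATCH.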

As confirmed by @guoye-zhang, major client implementations like Apple's NSURLSession today treat all 4xx responses as fatal errors and do not attempt to retry / recover.

If a client gives up on a recoverable 409 Conflict, the protocol fails to deliver on its promise of reliable uploads over unreliable connections.

Proposed Solution

The spec should be updated to unambiguously state that a client MUST (or at least SHOULD) treat a 409 Conflict response as a recoverable signal.

The recovery mechanism would be via Upload-Offset in the 409 Conflict response and/or the Offset Retrieval mechanism.
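As a rough sketch of what that client-side recovery could look like (not a proposal for exact wording or behavior): `upload_with_recovery`, the `transport` object with `patch` and `head` methods, and the `max_recoveries` cap are all hypothetical names introduced here. `FakeTransport` simulates a server that already holds some bytes the client does not know about.

```python
def upload_with_recovery(transport, data, chunk_size=4, start_offset=0, max_recoveries=5):
    """Upload data in chunks, treating 409 Conflict as a signal to resync."""
    offset = start_offset
    recoveries = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        status, headers = transport.patch(offset, chunk)
        if 200 <= status < 300:
            offset += len(chunk)
            continue
        if status == 409 and recoveries < max_recoveries:
            recoveries += 1
            # Prefer the offset carried in the 409 response; otherwise
            # fall back to offset retrieval (HEAD).
            reported = headers.get("Upload-Offset")
            offset = int(reported) if reported is not None else transport.head()
            continue
        raise RuntimeError(f"upload failed with status {status}")
    return offset


class FakeTransport:
    """Simulated server that has already received some bytes (hypothetical)."""

    def __init__(self, already_received=b"", include_offset_header=True):
        self.received = bytearray(already_received)
        self.include_offset_header = include_offset_header

    def head(self):
        return len(self.received)

    def patch(self, offset, chunk):
        if offset != len(self.received):
            headers = {}
            if self.include_offset_header:
                headers["Upload-Offset"] = str(len(self.received))
            return 409, headers
        self.received += chunk
        return 204, {"Upload-Offset": str(len(self.received))}
```

For example, with `FakeTransport(b"0123")` and `data = b"0123456789"`, the first PATCH at offset 0 is rejected with a 409, the client resumes at offset 4, and the upload completes. The cap on recoveries is a design choice to avoid looping forever against a misbehaving server; any other 4xx still fails immediately.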

Mandating this recovery logic (or at least strongly recommending it) would ensure that it is baked into foundational libraries, providing robust, out-of-the-box reliability for application developers who may not be aware of this specific failure case.

Keen for feedback from others as I’m not sure to what extent this has been discussed previously.
