Transfer-Encoding and Content-Encoding #109

Open
ashtum opened this issue Oct 3, 2024 · 1 comment

Comments

@ashtum
Collaborator

ashtum commented Oct 3, 2024

To implement automatic decoding in the parser, we first need to detect the encoding of the body. This task is complicated by the existence of two headers that determine the encoding: Content-Encoding and Transfer-Encoding. While both influence the decoding process, they serve different purposes. The Transfer-Encoding header, in particular, is designed for use by proxies, as it is a hop-by-hop header applied to a message between two nodes rather than to the resource itself. Consequently, each segment of a multi-node connection may use a different Transfer-Encoding value.

Here is what RFC 7230 says about Transfer-Encoding:

   Transfer-Encoding is primarily intended to accurately
   delimit a dynamically generated payload and to distinguish payload
   encodings that are only applied for transport efficiency or security
   from those that are characteristics of the selected resource.

   A recipient MUST be able to parse the chunked transfer coding
   (Section 4.1) because it plays a crucial role in framing messages
   when the payload body size is not known in advance.  A sender MUST
   NOT apply chunked more than once to a message body (i.e., chunking an
   already chunked message is not allowed).  If any transfer coding
   other than chunked is applied to a request payload body, the sender
   MUST apply chunked as the final transfer coding to ensure that the
   message is properly framed.  If any transfer coding other than
   chunked is applied to a response payload body, the sender MUST either
   apply chunked as the final transfer coding or terminate the message
   by closing the connection.

   For example,

     Transfer-Encoding: gzip, chunked

   indicates that the payload body has been compressed using the gzip
   coding and then chunked using the chunked coding while forming the
   message body.

   Unlike Content-Encoding (Section 3.1.2.1 of [RFC7231]),
   Transfer-Encoding is a property of the message, not of the
   representation, and any recipient along the request/response chain
   MAY decode the received transfer coding(s) or apply additional
   transfer coding(s) to the message body, assuming that corresponding
   changes are made to the Transfer-Encoding field-value.  Additional
   information about the encoding parameters can be provided by other
   header fields not defined by this specification.
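
To make the `Transfer-Encoding: gzip, chunked` example from the RFC concrete, here is a minimal Python sketch of both directions. The helper names `encode_chunked` and `decode_chunked` are hypothetical and not part of this library; the decoder skips chunk extensions and ignores trailers, so it is an illustration under those assumptions rather than a conforming parser.

```python
import gzip


def encode_chunked(data: bytes, chunk_size: int = 8) -> bytes:
    """Frame data with the chunked transfer coding (no extensions, no trailers)."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Undo chunked framing; chunk extensions are skipped, trailers are ignored."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size_field = body[pos:eol].split(b";", 1)[0]  # drop any chunk extension
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the CRLF that terminates the chunk data


payload = b"hello, transfer codings! " * 4

# Sender: apply gzip first, then chunked as the final transfer coding.
wire = encode_chunked(gzip.compress(payload))

# Recipient: remove the codings in reverse order (chunked, then gzip).
received = gzip.decompress(decode_chunked(wire))
```

The recipient has to remove the chunked framing before it can hand the bytes to a gzip decoder, which is why chunked must be the final coding in the list.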

However, from what I could find online, in practice only the chunked Transfer-Encoding is commonly implemented by servers and client tools.

Another complicating factor is that Content-Encoding may list multiple codings. The header lists them in the order in which they were applied, so they must be decoded in the reverse order, but our current design only supports a single decoder (filter):

Content-Encoding: deflate, gzip
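
A minimal Python sketch of what handling such a list involves, assuming the field value lists codings in the order the sender applied them (as RFC 7231 requires), so the recipient undoes them last-to-first. The `DECODERS` table and the function names are hypothetical, for illustration only.

```python
import gzip
import zlib

# Hypothetical table mapping content codings to decoder functions.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,   # treated as equivalent to gzip
    "deflate": zlib.decompress,  # HTTP "deflate" is the zlib container format
}


def parse_codings(field_value: str) -> list[str]:
    """Split a comma-separated Content-Encoding value into lowercase tokens."""
    return [t.strip().lower() for t in field_value.split(",") if t.strip()]


def decode_content(body: bytes, content_encoding: str) -> bytes:
    """Undo every listed coding, starting with the one applied last."""
    for coding in reversed(parse_codings(content_encoding)):
        decoder = DECODERS.get(coding)
        if decoder is None:
            raise ValueError(f"unsupported content coding: {coding}")
        body = decoder(body)
    return body


original = b"example representation data"

# Sender side for "Content-Encoding: deflate, gzip": deflate first, then gzip.
encoded = gzip.compress(zlib.compress(original))

decoded = decode_content(encoded, "deflate, gzip")
```

Supporting this in general means chaining decoders, which is exactly what a single-filter design cannot express.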

I couldn't find sufficient evidence to determine whether multiple encoding methods are commonly used in practice. The closest related discussion I found is: how to disable Nginx double gzip encoding.

Assuming that multiple encodings in Content-Encoding are rarely encountered, the following approach could be considered for implementation:

  • Make automatic decoding optional and configurable by the user. This feature can be helpful in cases where users may prefer to receive encoded data as-is, such as in a proxy application.
  • Disregard the possibility of any Transfer-Encoding other than chunked, though ensure it is parsed correctly.
  • To automatically select the appropriate decoder filter, check only the Content-Encoding header.
  • Provide an interface that allows users to selectively apply a decoder (or filter). This would be useful in niche scenarios, such as when interacting with a server that uses Transfer-Encoding for compression.
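
The selection logic proposed in the list above could look roughly like the following Python sketch. Everything here is hypothetical (the `select_decoder` name, its parameters, the lowercase-keyed header dictionary, and the `CONTENT_DECODERS` registry); it only illustrates the decision flow and is not this library's interface.

```python
import gzip
import zlib
from typing import Callable, Optional

Decoder = Callable[[bytes], bytes]

# Hypothetical registry of decoder filters keyed by a single content coding.
CONTENT_DECODERS: dict[str, Decoder] = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


def select_decoder(
    headers: dict[str, str],
    auto_decode: bool = True,
    user_decoder: Optional[Decoder] = None,
) -> Optional[Decoder]:
    """Return the filter to apply to the de-chunked body, or None to pass it as-is.

    Assumes header names are already lowercased. Transfer-Encoding is
    deliberately not consulted; chunked framing is handled elsewhere.
    """
    if user_decoder is not None:
        return user_decoder  # an explicitly chosen filter always wins
    if not auto_decode:
        return None          # e.g. a proxy that forwards the encoded bytes
    value = headers.get("content-encoding", "").strip().lower()
    # Only a single coding is supported; a list such as "deflate, gzip"
    # is not in the registry, so the body is left untouched.
    return CONTENT_DECODERS.get(value)


body = gzip.compress(b"payload")
auto = select_decoder({"content-encoding": "gzip"})
manual = select_decoder(
    {"transfer-encoding": "gzip, chunked"}, user_decoder=gzip.decompress
)
```

With this shape, a server that compresses through Transfer-Encoding is handled by passing `user_decoder` explicitly, while the default path looks only at Content-Encoding.
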
@vinniefalco
Member

Currently, the decision to encode or decode is a manual process delegated to the user. For now I think this is fine, as it lets us develop the rest of the code, which is more complicated.
