🔖 Release 2.3.901 (#47)
2.3.901 (2023-11-26)
====================

- Small performance improvement while in HTTP/1.1
- Any string passed as the body now gets a default
  ``Content-Type: text/plain; charset=utf-8`` for safety, unless
  you specified a ``Content-Type`` header yourself. If your header has no
  ``charset`` parameter, ``charset=utf-8`` is appended.
  It is recommended that you pass ``bytes`` instead of a plain string. If
  a conflicting charset has been set that
  does not refer to utf-8, a warning will be raised.
- Added a callable argument named ``on_upload_body`` to ``urlopen`` and
  ``request`` that enables you to track
  body upload progress for a single request. It takes 4 positional
  arguments, namely:
  (total_sent: int, total_to_be_sent: int | None, is_completed: bool,
  any_error: bool)
  total_to_be_sent may be set to None if the total size cannot be known
  in advance (blind iterator/generator).
- Fixed a rare case where ``ProtocolError`` was raised instead of
expected ``IncompleteRead`` exception.
- Improved HTTP/3 overall performance.
- Changed the default max connection per host for (http, https) pools
managed by ``PoolManager``.
If the ``PoolManager`` is instantiated with ``num_pools=10``, each
(managed) subsequent pool will have ``maxsize=10``.
- Improved performance in a multithreading context while using many
multiplexed connections.
- Changed the default maximum number of saturated multiplexed connections
  to at least 64.
  A warning is now emitted if you reach the maximum capacity of stored
  saturated multiplexed connections.
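
The ``Content-Type`` rule for string bodies can be sketched as a standalone function. This is a minimal illustration that mirrors the header handling shown in the ``connection.py`` diff of this commit; the function name ``normalize_content_type`` is hypothetical and is not part of urllib3.future.

```python
from __future__ import annotations

import warnings

DEFAULT_CONTENT_TYPE = "text/plain; charset=utf-8"


def normalize_content_type(value: str | None) -> str:
    """Return the Content-Type to send alongside a ``str`` body.

    Hypothetical helper: a sketch of the changelog rule, not library code.
    """
    if value is None:
        # No Content-Type given by the caller: fall back to the default.
        return DEFAULT_CONTENT_TYPE
    lowered = value.lower()
    if "charset" not in lowered:
        # No charset declared: append the one the body is encoded with.
        return f"{value.strip('; ')}; charset=utf-8"
    if not any(alias in lowered for alias in ("utf-8", "utf_8", "utf8")):
        # A different charset was declared: keep the header, but warn.
        warnings.warn(
            f"A conflicting charset has been set in Content-Type: {value}",
            UserWarning,
            stacklevel=2,
        )
    return value


print(normalize_content_type(None))
print(normalize_content_type("application/json"))
print(normalize_content_type("text/html; charset=UTF-8"))
```

Note that a conflicting charset is left untouched and only triggers a warning, which is why passing ``bytes`` is the recommended way to avoid ambiguity.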
Ousret authored Nov 26, 2023
1 parent 1eb7ea8 commit 2cbc25b
Showing 16 changed files with 407 additions and 26 deletions.
20 changes: 20 additions & 0 deletions CHANGES.rst
@@ -1,3 +1,23 @@
2.3.901 (2023-11-26)
====================

- Small performance improvement while in HTTP/1.1
- Any string passed as the body now gets a default ``Content-Type: text/plain; charset=utf-8`` for safety, unless
  you specified a ``Content-Type`` header yourself. If your header has no ``charset`` parameter, ``charset=utf-8`` is appended.
  It is recommended that you pass ``bytes`` instead of a plain string. If a conflicting charset has been set that
  does not refer to utf-8, a warning will be raised.
- Added a callable argument named ``on_upload_body`` to ``urlopen`` and ``request`` that enables you to track
  body upload progress for a single request. It takes 4 positional arguments, namely:
  (total_sent: int, total_to_be_sent: int | None, is_completed: bool, any_error: bool)
  total_to_be_sent may be set to None if the total size cannot be known in advance (blind iterator/generator).
- Fixed a rare case where ``ProtocolError`` was raised instead of expected ``IncompleteRead`` exception.
- Improved HTTP/3 overall performance.
- Changed the default max connection per host for (http, https) pools managed by ``PoolManager``.
If the ``PoolManager`` is instantiated with ``num_pools=10``, each (managed) subsequent pool will have ``maxsize=10``.
- Improved performance in a multithreading context while using many multiplexed connections.
- Changed the default maximum number of saturated multiplexed connections to at least 64.
  A warning is now emitted if you reach the maximum capacity of stored saturated multiplexed connections.

2.3.900 (2023-11-18)
====================

26 changes: 26 additions & 0 deletions docs/advanced-usage.rst
@@ -784,3 +784,29 @@ You may give your certificate to urllib3.future this way::
r = https_pool.request("GET", "/")

.. note:: If your platform isn't served by this feature it will raise a warning and ignore the certificate.

Monitor upload progress
-----------------------

Since version 2.3.901, you can monitor upload progress.
To do so, pass a callable that accepts 4 positional arguments to the ``on_upload_body`` argument.

The arguments are as follows: ``total_sent: int, content_length: int | None, is_completed: bool, any_error: bool``.

- total_sent: Number of bytes already sent
- content_length: Expected total number of bytes to be sent
- is_completed: Flag that indicates the end of the (body) transmission
- any_error: Set to True if anything goes wrong during the upload

.. warning:: content_length may be ``None`` when the actual body length cannot be inferred, which can happen if the body is an iterator or a generator. In that case, you can still provide a valid ``Content-Length`` header manually.

See the following example::

    from __future__ import annotations

    from urllib3 import PoolManager

    def track(total_sent: int, content_length: int | None, is_completed: bool, any_error: bool) -> None:
        print(f"{total_sent} / {content_length} bytes", f"{is_completed=} {any_error=}")

    with PoolManager() as pm:
        resp = pm.urlopen("POST", "https://httpbin.org/post", body=b"foo"*1024*10, on_upload_body=track)
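
A callback can also keep state between calls. The following is a minimal sketch, assuming only the 4-argument signature described above; the class name ``UploadProgress`` and its ``history`` attribute are hypothetical and not part of urllib3.future. It runs without any network access because the callback is invoked by hand.

```python
from __future__ import annotations


class UploadProgress:
    """Hypothetical callable matching the ``on_upload_body`` signature."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def __call__(
        self,
        total_sent: int,
        content_length: int | None,
        is_completed: bool,
        any_error: bool,
    ) -> None:
        if any_error:
            status = "failed"
        elif is_completed:
            status = "done"
        else:
            status = "sending"
        if content_length:
            # Total size is known: report a percentage.
            progress = f"{total_sent / content_length:.0%}"
        else:
            # Total size is unknown (None) or zero: report raw bytes only.
            progress = f"{total_sent} bytes"
        self.history.append(f"{status}: {progress}")


tracker = UploadProgress()
tracker(512, 1024, False, False)
tracker(1024, 1024, True, False)
tracker(2048, None, False, False)
print(tracker.history)
```

An instance such as ``tracker`` could then be given as ``on_upload_body=tracker`` in place of the ``track`` function from the example above.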

1 change: 1 addition & 0 deletions pyproject.toml
@@ -109,6 +109,7 @@ filterwarnings = [
# https://github.com/pytest-dev/pytest/issues/10977
'''default:ast\.(Num|NameConstant|Str) is deprecated and will be removed in Python 3\.14; use ast\.Constant instead:DeprecationWarning:_pytest''',
'''default:Attribute s is deprecated and will be removed in Python 3\.14; use value instead:DeprecationWarning:_pytest''',
'''default:A conflicting charset has been set in Content-Type:UserWarning''',
]

[tool.isort]
2 changes: 1 addition & 1 deletion src/urllib3/_version.py
@@ -1,4 +1,4 @@
# This file is protected via CODEOWNERS
from __future__ import annotations

__version__ = "2.3.900"
__version__ = "2.3.901"
38 changes: 35 additions & 3 deletions src/urllib3/backend/hface.py
@@ -34,6 +34,7 @@
)
from ..exceptions import (
EarlyResponse,
IncompleteRead,
InvalidHeader,
ProtocolError,
ResponseNotReady,
@@ -471,9 +472,7 @@ def __exchange_until(
Can be used for the initial handshake for instance."""
assert self._protocol is not None
assert self.sock is not None
assert (maximal_data_in_read is not None and maximal_data_in_read >= 0) or (
maximal_data_in_read is None
)
assert maximal_data_in_read is None or maximal_data_in_read >= 0

data_out: bytes
data_in: bytes
@@ -526,6 +525,14 @@ def __exchange_until(
try:
self._protocol.bytes_received(data_in)
except self._protocol.exceptions() as e:
# h2 has a dedicated exception for IncompleteRead (InvalidBodyLengthError)
# we convert the exception to our "IncompleteRead" instead.
if hasattr(e, "expected_length") and hasattr(
e, "actual_length"
):
raise IncompleteRead(
partial=e.actual_length, expected=e.expected_length
) from e # Defensive:
raise ProtocolError(e) from e # Defensive:

if receive_first is True:
@@ -558,6 +565,31 @@ def __exchange_until(
"TLS over QUIC did not succeed (Error 298). Chain certificate verification failed."
)

# we shall convert the ProtocolError to IncompleteRead
# so that users aren't caught off guard.
try:
if (
event.message
and "without sending complete message body" in event.message
):
msg = event.message.replace(
"peer closed connection without sending complete message body ",
"",
).strip("()")

received, expected = tuple(msg.split(", "))

raise IncompleteRead(
partial=int(
"".join(c for c in received if c.isdigit()).strip()
),
expected=int(
"".join(c for c in expected if c.isdigit()).strip()
),
)
except (ValueError, IndexError):
pass

raise ProtocolError(event.message)
elif isinstance(event, StreamResetReceived):
raise ProtocolError(
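
The message parsing in the last ``hface.py`` hunk can be tried in isolation. This is a minimal sketch of the same string handling; the function name ``parse_incomplete_read`` is hypothetical, and the sample message format ``(received N bytes, expected M)`` is an assumption about what the underlying protocol library emits.

```python
from __future__ import annotations

PREFIX = "peer closed connection without sending complete message body "


def parse_incomplete_read(message: str) -> tuple[int, int] | None:
    """Return (received, expected) byte counts, or None if not parseable."""
    if "without sending complete message body" not in message:
        return None
    # Drop the fixed prefix, then the surrounding parentheses.
    detail = message.replace(PREFIX, "").strip("()")
    try:
        received, expected = detail.split(", ")
        # Keep only the digits of each part, e.g. "received 10 bytes" -> 10.
        return (
            int("".join(c for c in received if c.isdigit())),
            int("".join(c for c in expected if c.isdigit())),
        )
    except ValueError:
        # Unexpected layout: let the caller fall back to a generic error.
        return None


sample = PREFIX + "(received 10 bytes, expected 100)"
print(parse_incomplete_read(sample))
```

Returning ``None`` on any unexpected layout matches the intent of the hunk, where a parsing failure falls through to the generic ``ProtocolError``.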
51 changes: 50 additions & 1 deletion src/urllib3/connection.py
@@ -5,6 +5,7 @@
import re
import socket
import typing
import warnings
from socket import timeout as SocketTimeout

if typing.TYPE_CHECKING:
@@ -317,6 +318,8 @@ def request(
preload_content: bool = True,
decode_content: bool = True,
enforce_content_length: bool = True,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None]
| None = None,
) -> ResponsePromise:
# Update the inner socket's timeout value to send the request.
# This only triggers if the connection is re-used.
@@ -356,10 +359,12 @@ def request(
blocksize=self.blocksize,
force=self._svn != HttpVersion.h11,
)
is_sending_string = chunks_and_cl.is_string
chunks = chunks_and_cl.chunks
content_length = chunks_and_cl.content_length

overrule_content_length: bool = False
enforce_charset_transparency: bool = False

# users may send plain 'str' and assign a Content-Length that will
# disagree with the actual amount of data to send (encoded, aka. bytes)
@@ -370,6 +375,15 @@ def request(
):
overrule_content_length = True

# We shall make our intent clear as we are sending a string.
# Not being explicit repeats the mistake made in the early 2000s.
# No more guessing games based on "Our time makes X prevalent, no need to say it! It will never change!" ><'
if is_sending_string:
if "content-type" in header_keys:
enforce_charset_transparency = True
else:
self.putheader("Content-Type", "text/plain; charset=utf-8")

# When chunked is explicit set to 'True' we respect that.
if chunked:
if "transfer-encoding" not in header_keys:
@@ -392,6 +406,25 @@ def request(
for header, value in headers.items():
if overrule_content_length and header.lower() == "content-length":
value = str(content_length)
if enforce_charset_transparency and header.lower() == "content-type":
value_lower = value.lower()
if "charset" not in value_lower:
value = value.strip("; ")
value = f"{value}; charset=utf-8"
else:
if (
"utf-8" not in value_lower
and "utf_8" not in value_lower
and "utf8" not in value_lower
):
warnings.warn(
"A conflicting charset has been set in Content-Type while sending a 'string' as the body. "
"Beware that urllib3.future always encodes a string as UTF-8. "
f"Expected 'charset=utf-8', got: {value} "
"Either encode your string to bytes or open your file in bytes mode.",
UserWarning,
stacklevel=2,
)
self.putheader(header, value)

try:
@@ -406,6 +439,8 @@ def request(
rp.set_parameter("response_options", response_options)
return rp

total_sent = 0

try:
# If we're given a body we start sending that in chunks.
if chunks is not None:
@@ -417,10 +452,24 @@ def request(
if isinstance(chunk, str):
chunk = chunk.encode("utf-8")
self.send(chunk)
total_sent += len(chunk)
if on_upload_body is not None:
on_upload_body(total_sent, content_length, False, False)
rp = self.send(b"", eot=True)
if on_upload_body is not None:
on_upload_body(total_sent, content_length, True, False)
except EarlyResponse as e:
rp = e.promise
if on_upload_body is not None:
on_upload_body(total_sent, content_length, False, True)
except BrokenPipeError as e:
if on_upload_body is not None:
on_upload_body(
total_sent,
content_length,
total_sent == content_length,
total_sent != content_length,
)
rp = e.promise # type: ignore[attr-defined]
assert rp is not None
rp.set_parameter("response_options", response_options)
@@ -526,7 +575,7 @@ def __init__(
ca_cert_data: None | str | bytes = None,
ssl_minimum_version: int | None = None,
ssl_maximum_version: int | None = None,
ssl_version: int | str | None = None, # Deprecated
ssl_version: int | str | None = None,
cert_file: str | None = None,
key_file: str | None = None,
key_password: str | None = None,
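
The order in which ``request`` invokes the callback can be reproduced outside the library. This is a minimal sketch of the send loop under stated assumptions: ``send_body`` and its ``send`` parameter are hypothetical stand-ins for the connection internals, and ``OSError`` stands in for the ``EarlyResponse`` and ``BrokenPipeError`` handling shown in the hunk.

```python
from typing import Callable, Iterable, Optional, Union

ProgressCallback = Callable[[int, Optional[int], bool, bool], None]


def send_body(
    chunks: Iterable[Union[bytes, str]],
    content_length: Optional[int],
    send: Callable[[bytes], None],
    on_upload_body: Optional[ProgressCallback] = None,
) -> int:
    """Send every chunk and report progress after each one."""
    total_sent = 0
    try:
        for chunk in chunks:
            if isinstance(chunk, str):
                # Strings are always encoded as UTF-8 before being sent.
                chunk = chunk.encode("utf-8")
            send(chunk)
            total_sent += len(chunk)
            if on_upload_body is not None:
                on_upload_body(total_sent, content_length, False, False)
    except OSError:
        # Assumption: any I/O failure is reported with any_error=True.
        if on_upload_body is not None:
            on_upload_body(total_sent, content_length, False, True)
        raise
    # Final call once the whole body has been handed over.
    if on_upload_body is not None:
        on_upload_body(total_sent, content_length, True, False)
    return total_sent


events = []
sent = []
send_body(["ab", b"cde"], 5, sent.append, lambda *args: events.append(args))
print(events)
```

The callback therefore fires once per chunk with ``is_completed=False`` and one last time with ``is_completed=True``, so a consumer should not assume that ``total_sent == content_length`` alone signals completion.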
51 changes: 43 additions & 8 deletions src/urllib3/connectionpool.py
@@ -198,7 +198,9 @@ def __init__(

self._maxsize = maxsize
self.pool: queue.LifoQueue[typing.Any] | None = self.QueueCls(maxsize)
self.saturated_pool: queue.LifoQueue[typing.Any] = self.QueueCls(maxsize)
self.saturated_pool: queue.LifoQueue[typing.Any] = self.QueueCls(
64 if maxsize < 64 else maxsize
)
self.block = block

self.proxy = _proxy
@@ -316,6 +318,12 @@ def _put_conn(self, conn: HTTPConnection | None) -> None:
try:
self.saturated_pool.put(conn, block=False)
except queue.Full:
warnings.warn(
"Unable to keep aside a multiplexed connection. You will lose access to the responses. "
"You need to either increase the pool maxsize or collect responses to avoid this.",
UserWarning,
stacklevel=2,
)
self.num_connections -= 1
conn.close()
return
@@ -425,14 +433,14 @@ def get_response(
If none available, return None.
"""
connections = []
irrelevant_connections = []
response: HTTPResponse | None = None

while True:
try:
conn = self.saturated_pool.get(self.block)
connections.append(conn)
except queue.Empty:
break
try:
conn = self.saturated_pool.get(False)
connections.append(conn)
except queue.Empty:
pass

if not connections:
while True:
@@ -441,7 +449,14 @@ def get_response(
except ValueError:
break

connections.append(conn)
if conn.is_idle is False:
connections.append(conn)
break
else:
irrelevant_connections.append(conn)

for conn in irrelevant_connections:
self._put_conn(conn)

if not connections:
return None
@@ -612,6 +627,7 @@ def _make_request(
decode_content: bool = ...,
enforce_content_length: bool = ...,
on_post_connection: typing.Callable[[ConnectionInfo], None] | None = ...,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None] = ...,
*,
multiplexed: Literal[True],
) -> ResponsePromise:
@@ -633,6 +649,7 @@ def _make_request(
decode_content: bool = ...,
enforce_content_length: bool = ...,
on_post_connection: typing.Callable[[ConnectionInfo], None] | None = ...,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None] = ...,
*,
multiplexed: Literal[False] = ...,
) -> HTTPResponse:
@@ -653,6 +670,8 @@ def _make_request(
decode_content: bool = True,
enforce_content_length: bool = True,
on_post_connection: typing.Callable[[ConnectionInfo], None] | None = None,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None]
| None = None,
multiplexed: Literal[False] | Literal[True] = False,
) -> HTTPResponse | ResponsePromise:
"""
@@ -770,6 +789,7 @@ def _make_request(
preload_content=preload_content,
decode_content=decode_content,
enforce_content_length=enforce_content_length,
on_upload_body=on_upload_body,
)
# We are swallowing BrokenPipeError (errno.EPIPE) since the server is
# legitimately able to close the connection after sending a valid response.
@@ -884,6 +904,7 @@ def urlopen(
preload_content: bool = ...,
decode_content: bool = ...,
on_post_connection: typing.Callable[[ConnectionInfo], None] | None = ...,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None] = ...,
*,
multiplexed: Literal[False] = ...,
**response_kw: typing.Any,
@@ -908,6 +929,7 @@ def urlopen(
preload_content: bool = ...,
decode_content: bool = ...,
on_post_connection: typing.Callable[[ConnectionInfo], None] | None = ...,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None] = ...,
*,
multiplexed: Literal[True],
**response_kw: typing.Any,
@@ -931,6 +953,8 @@ def urlopen(
preload_content: bool = True,
decode_content: bool = True,
on_post_connection: typing.Callable[[ConnectionInfo], None] | None = None,
on_upload_body: typing.Callable[[int, int | None, bool, bool], None]
| None = None,
multiplexed: bool = False,
**response_kw: typing.Any,
) -> HTTPResponse | ResponsePromise:
@@ -1029,6 +1053,16 @@ def urlopen(
redirect. Typically this won't need to be set because urllib3 will
auto-populate the value when needed.
:param on_post_connection:
Callable to be invoked that will inform you of the connection specifications
for the request to be sent. See ``urllib3.ConnectionInfo`` class for more.
:param on_upload_body:
Callable that will be invoked upon body upload in order to be able to track
the progress. The values are expressed in bytes. It is possible that the total isn't
available, thus set to None. In order, arguments are:
(total_sent, total_to_be_sent, completed, any_error)
:param multiplexed:
Dispatch the request in a non-blocking way, this means that the
response will be retrieved in the future with the get_response()
@@ -1130,6 +1164,7 @@ def urlopen(
decode_content=decode_content,
enforce_content_length=True,
on_post_connection=on_post_connection,
on_upload_body=on_upload_body,
multiplexed=multiplexed,
)
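
The saturated-pool sizing and the new overflow warning from the ``connectionpool.py`` hunks can be sketched with the standard library alone. The helper names ``make_saturated_pool`` and ``keep_aside`` are hypothetical; only the ``64 if maxsize < 64 else maxsize`` rule and the ``queue.Full`` handling are taken from the diff.

```python
import queue
import warnings
from typing import Any


def make_saturated_pool(maxsize: int) -> "queue.LifoQueue[Any]":
    """Build the side queue with a capacity of at least 64 entries."""
    return queue.LifoQueue(max(64, maxsize))


def keep_aside(pool: "queue.LifoQueue[Any]", conn: Any) -> bool:
    """Try to store a saturated connection; warn and return False if full."""
    try:
        pool.put(conn, block=False)
    except queue.Full:
        warnings.warn(
            "Unable to keep aside a multiplexed connection.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


pool = make_saturated_pool(maxsize=10)
print(pool.maxsize)
```

In the actual hunk, a connection that cannot be kept aside is closed and its pending responses are lost, so the warning is a signal to raise ``maxsize`` or to collect responses sooner.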
