Commit 5d82a37

expose execute() in HttpClient and rename batch_requests() into batch_execute()
1 parent 80f95a8 commit 5d82a37

4 files changed: +103 -40 lines changed

docs/advanced/additional-requests.rst

Lines changed: 69 additions & 26 deletions
@@ -23,6 +23,7 @@ properly extract data for some websites.
2323
with today's websites which relies on a lot of page interactions to display
2424
its contents.
2525

26+
.. _`httprequest-example`:
2627

2728
HttpRequest
2829
===========
@@ -263,21 +264,67 @@ additional requests asynchronously using ``asyncio.gather()``, ``asyncio.wait()`
263264
etc. This means that ``asyncio`` could be used anywhere inside the Page Object,
264265
including the ``to_item()`` method.
265266
266-
In the previous section, we've explored how :class:`~.HttpRequest` are defined.
267-
Fortunately, the :meth:`~.HttpClient.request`, :meth:`~.HttpClient.get`, and
268-
:meth:`~.HttpClient.post` methods of :class:`~.HttpClient` already defines the
269-
:class:`~.HttpRequest` and executes it as well. The only time you'll need to create
270-
:class:`~.HttpRequest` manually is via the :meth:`~.HttpClient.batch_requests`
271-
method which is described in this section: :ref:`http-batch-request-example`.
272-
267+
In the previous section, we've explored how :class:`~.HttpRequest` is defined.
273268
Let's see a few quick examples to see how to execute additional requests using
274269
the :class:`~.HttpClient`.
275270
271+
Executing a HttpRequest instance
272+
--------------------------------
273+
274+
.. code-block:: python
275+
276+
import attrs
277+
import web_poet
278+
279+
280+
@attrs.define
281+
class ProductPage(web_poet.ItemWebPage):
282+
http_client: web_poet.HttpClient
283+
284+
async def to_item(self):
285+
item = {
286+
"url": self.url,
287+
"name": self.css("#main h3.name ::text").get(),
288+
"product_id": self.css("#product ::attr(product-id)").get(),
289+
}
290+
291+
# Simulates clicking on a button that says "View All Images"
292+
request = web_poet.HttpRequest(f"https://api.example.com/v2/images?id={item['product_id']}")
293+
response: web_poet.HttpResponse = await self.http_client.execute(request)
294+
295+
item["images"] = response.css(".product-images img::attr(src)").getall()
296+
return item
297+
298+
As the example suggests, we're performing an additional request that allows us
299+
to extract more images in a product page that might not otherwise be possible.
300+
This is because in order to do so, an additional button needs to be clicked
301+
which fetches the complete set of product images via AJAX.
302+
303+
There are a few things to take note of this example:
304+
305+
* Recall from the :ref:`httprequest-example` tutorial section that the
306+
default method is ``GET``.
307+
* We're now using the ``async/await`` syntax inside the ``to_item()`` method.
308+
* The response from the additional request is of type :class:`~.HttpResponse`.
309+
310+
.. tip::
311+
312+
See the :ref:`http-batch-request-example` tutorial section to see how to
313+
execute a group of :class:`~.HttpRequest` in batch.
314+
315+
Fortunately, there are already some quick shortcuts on how to perform single
316+
additional requests using the :meth:`~.HttpClient.request`, :meth:`~.HttpClient.get`,
317+
and :meth:`~.HttpClient.post` methods of :class:`~.HttpClient`. These already
318+
define the :class:`~.HttpRequest` and executes it as well.
319+
276320
.. _`httpclient-get-example`:
277321
278322
A simple ``GET`` request
279323
------------------------
280324
325+
Let's use the example from the previous section and use the :meth:`~.HttpClient.get`
326+
method on it.
327+
281328
.. code-block:: python
282329
283330
import attrs
@@ -306,13 +353,8 @@ There are a few things to take note in this example:
306353
307354
* A ``GET`` request can be done via :class:`~.HttpClient`'s
308355
:meth:`~.HttpClient.get` method.
309-
* We're now using the ``async/await`` syntax inside the ``to_item()`` method.
310-
* The response from the additional request is of type :class:`~.HttpResponse`.
311-
312-
As the example suggests, we're performing an additional request that allows us
313-
to extract more images in a product page that might not otherwise be possible.
314-
This is because in order to do so, an additional button needs to be clicked
315-
which fetches the complete set of product images via AJAX.
356+
* There was no need to instantiate a :class:`~.HttpRequest` since :meth:`~.HttpClient.get`
357+
already handles it before executing the request.
316358
317359
.. _`request-post-example`:
318360
@@ -378,16 +420,17 @@ Batch requests
378420
--------------
379421
380422
We can also choose to process requests by **batch** instead of sequentially or
381-
one by one. The :meth:`~.HttpClient.batch_requests` method can be used for this
382-
which accepts an arbitrary number of :class:`~.HttpRequest` instances.
423+
one by one (e.g. using :meth:`~.HttpClient.execute`). The :meth:`~.HttpClient.batch_execute`
424+
method can be used for this which accepts an arbitrary number of :class:`~.HttpRequest`
425+
instances.
383426
384427
Let's modify the example in the previous section to see how it can be done.
385428
386429
The difference for this code example from the previous section is that we're
387430
increasing the pagination from only the **2nd page** into the **10th page**.
388431
Instead of calling a single :meth:`~.HttpClient.post` method, we're creating a
389432
list of :class:`~.HttpRequest` to be executed in batch using the
390-
:meth:`~.HttpClient.batch_requests` method.
433+
:meth:`~.HttpClient.batch_execute` method.
391434
392435
.. code-block:: python
393436
@@ -415,7 +458,7 @@ list of :class:`~.HttpRequest` to be executed in batch using the
415458
self.create_request(item["product_id"], page_num=page_num)
416459
for page_num in range(2, self.default_pagination_limit)
417460
]
418-
responses: List[web_poet.HttpResponse] = await self.http_client.batch_requests(*requests)
461+
responses: List[web_poet.HttpResponse] = await self.http_client.batch_execute(*requests)
419462
related_product_ids = [
420463
id_
421464
for response in responses
@@ -452,12 +495,12 @@ The key takeaways for this example are:
452495
It only contains the HTTP Request information for now and isn't executed yet.
453496
This is useful for creating factory methods to help create requests without any
454497
download execution at all.
455-
* :class:`~.HttpClient` has a :meth:`~.HttpClient.batch_requests` method that
498+
* :class:`~.HttpClient` has a :meth:`~.HttpClient.batch_execute` method that
456499
can process a list of :class:`~.HttpRequest` instances asynchronously together.
457500
458501
.. tip::
459502
460-
The :meth:`~.HttpClient.batch_requests` method can accept different varieties
503+
The :meth:`~.HttpClient.batch_execute` method can accept different varieties
461504
of :class:`~.HttpRequest` that might not be related with one another. For
462505
example, it could be a mixture of ``GET`` and ``POST`` requests or even
463506
representing requests for various parts of the page altogether.
@@ -466,7 +509,7 @@ The key takeaways for this example are:
466509
of async execution which could be faster in certain cases `(assuming you're
467510
allowed to perform HTTP requests in parallel)`.
468511
469-
Nonetheless, you can still use the :meth:`~.HttpClient.batch_requests` method
512+
Nonetheless, you can still use the :meth:`~.HttpClient.batch_execute` method
470513
to execute a single :class:`~.HttpRequest` instance.
471514
472515
@@ -566,7 +609,7 @@ For this example, let's improve the code snippet from the previous subsection na
566609
]
567610
568611
try:
569-
responses: List[web_poet.HttpResponse] = await self.http_client.batch_requests(*requests)
612+
responses: List[web_poet.HttpResponse] = await self.http_client.batch_execute(*requests)
570613
except web_poet.exceptions.HttpRequestError:
571614
logger.warning(
572615
f"Unable to request for more related products for product ID: {item['product_id']}"
@@ -605,17 +648,17 @@ For this example, let's improve the code snippet from the previous subsection na
605648
def parse_related_product_ids(response_page) -> List[str]:
606649
return response_page.css("#main .related-products ::attr(product-id)").getall()
607650
608-
Handling exceptions using :meth:`~.HttpClient.batch_requests` remains largely the same.
651+
Handling exceptions using :meth:`~.HttpClient.batch_execute` remains largely the same.
609652
However, the main difference is that you might be wasting perfectly good responses just
610653
because a single request from the batch ruined it.
611654
612655
An alternative approach would be salvaging good responses altogether. For example, you've
613656
sent out 10 :class:`~.HttpRequest` and only 1 of them had an exception during processing.
614657
You can still get the data from 9 of the :class:`~.HttpResponse` by passing the parameter
615-
``return_exceptions=True`` to :meth:`~.HttpClient.batch_requests`.
658+
``return_exceptions=True`` to :meth:`~.HttpClient.batch_execute`.
616659
617660
This means that any exceptions raised during request execution are returned alongside any
618-
of the successful responses. The return type of :meth:`~.HttpClient.batch_requests` could
661+
of the successful responses. The return type of :meth:`~.HttpClient.batch_execute` could
619662
be a mixture of :class:`~.HttpResponse` and :class:`web_poet.exceptions.http.HttpRequestError`.
620663
621664
Here's an example:
@@ -625,7 +668,7 @@ Here's an example:
625668
# Revised code snippet from the to_item() method
626669
627670
responses: List[Union[web_poet.HttpResponse, web_poet.exceptions.HttpRequestError]] = (
628-
await self.http_client.batch_requests(*requests, return_exceptions=True)
671+
await self.http_client.batch_execute(*requests, return_exceptions=True)
629672
)
630673
631674
related_product_ids = []
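
Note on the ``return_exceptions=True`` hunk above: the documented behaviour is that failures come back in the same list as successful responses, so the caller has to separate them. Below is a minimal, self-contained sketch of that separation step. It does not import web_poet; ``HttpRequestError``, ``fetch`` and ``batch_fetch`` are hypothetical stand-ins written only for illustration, and only ``asyncio.gather()`` is a real library call.

```python
import asyncio
from typing import List, Tuple, Union


class HttpRequestError(Exception):
    """Hypothetical stand-in for web_poet.exceptions.HttpRequestError."""


async def fetch(url: str) -> str:
    # Stand-in downloader (assumption): fails for URLs ending in "/bad".
    if url.endswith("/bad"):
        raise HttpRequestError(f"failed: {url}")
    return f"body of {url}"


async def batch_fetch(
    *urls: str, return_exceptions: bool = False
) -> List[Union[str, Exception]]:
    # Same shape as the batch call in the diff: gather all coroutines and
    # optionally return exceptions instead of raising the first one.
    coroutines = [fetch(url) for url in urls]
    return await asyncio.gather(*coroutines, return_exceptions=return_exceptions)


async def main() -> Tuple[List[str], List[Exception]]:
    results = await batch_fetch(
        "https://example.com/1",
        "https://example.com/bad",
        "https://example.com/3",
        return_exceptions=True,
    )
    # Keep the good results, collect the failures separately.
    good = [r for r in results if not isinstance(r, Exception)]
    bad = [r for r in results if isinstance(r, Exception)]
    return good, bad


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With ``return_exceptions=False`` the same call would raise on the failing URL and the two good results would be lost, which is the trade-off the docs text describes.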

docs/advanced/meta.rst

Lines changed: 2 additions & 2 deletions
@@ -111,12 +111,12 @@ Let's try an example wherein :class:`~.Meta` is able to control how
111111
self.create_next_page_request(page_num)
112112
for page_num in range(2, max_pages + 1)
113113
]
114-
responses = await http_client.batch_requests(*requests)
114+
responses = await http_client.batch_execute(*requests)
115115
return [
116116
url
117117
for response in responses
118118
for product_urls in self.parse_product_urls(response)
119-
for url in product_urls:
119+
for url in product_urls
120120
]
121121
122122
@staticmethod

tests/test_requests.py

Lines changed: 18 additions & 7 deletions
@@ -98,15 +98,26 @@ async def test_http_client_keyword_enforcing(async_mock):
9898

9999

100100
@pytest.mark.asyncio
101-
async def test_http_client_batch_requests(async_mock):
101+
async def test_http_client_execute(async_mock):
102+
client = HttpClient(async_mock)
103+
104+
request = HttpRequest("url-1")
105+
response = await client.execute(request)
106+
107+
assert isinstance(response, HttpResponse)
108+
assert response.url == "url-1"
109+
110+
111+
@pytest.mark.asyncio
112+
async def test_http_client_batch_execute(async_mock):
102113
client = HttpClient(async_mock)
103114

104115
requests = [
105116
HttpRequest("url-1"),
106117
HttpRequest("url-get", method="GET"),
107118
HttpRequest("url-post", method="POST"),
108119
]
109-
responses = await client.batch_requests(*requests)
120+
responses = await client.batch_execute(*requests)
110121

111122
assert all([isinstance(response, HttpResponse) for response in responses])
112123

@@ -120,20 +131,20 @@ async def stub_request_downloader(*args, **kwargs):
120131
async def err():
121132
raise ValueError("test exception")
122133
return await err()
123-
client.request_downloader = stub_request_downloader
134+
client._request_downloader = stub_request_downloader
124135

125136
return client
126137

127138

128139
@pytest.mark.asyncio
129-
async def test_http_client_batch_requests_with_exception(client_that_errs):
140+
async def test_http_client_batch_execute_with_exception(client_that_errs):
130141

131142
requests = [
132143
HttpRequest("url-1"),
133144
HttpRequest("url-get", method="GET"),
134145
HttpRequest("url-post", method="POST"),
135146
]
136-
responses = await client_that_errs.batch_requests(*requests, return_exceptions=True)
147+
responses = await client_that_errs.batch_execute(*requests, return_exceptions=True)
137148

138149
assert len(responses) == 3
139150
assert isinstance(responses[0], Exception)
@@ -142,9 +153,9 @@ async def test_http_client_batch_requests_with_exception(client_that_errs):
142153

143154

144155
@pytest.mark.asyncio
145-
async def test_http_client_batch_requests_with_exception_raised(client_that_errs):
156+
async def test_http_client_batch_execute_with_exception_raised(client_that_errs):
146157
requests = [
147158
HttpRequest("url-1"),
148159
]
149160
with pytest.raises(ValueError):
150-
await client_that_errs.batch_requests(*requests)
161+
await client_that_errs.batch_execute(*requests)

web_poet/requests.py

Lines changed: 14 additions & 5 deletions
@@ -80,7 +80,7 @@ class HttpClient:
8080
"""
8181

8282
def __init__(self, request_downloader: Callable = None):
83-
self.request_downloader = request_downloader or _perform_request
83+
self._request_downloader = request_downloader or _perform_request
8484

8585
async def request(
8686
self,
@@ -104,7 +104,7 @@ async def request(
104104
headers = headers or {}
105105
body = body or b""
106106
req = HttpRequest(url=url, method=method, headers=headers, body=body)
107-
return await self.request_downloader(req)
107+
return await self.execute(req)
108108

109109
async def get(self, url: str, *, headers: Optional[_Headers] = None) -> HttpResponse:
110110
"""Similar to :meth:`~.HttpClient.request` but peforming a ``GET``
@@ -124,10 +124,19 @@ async def post(
124124
"""
125125
return await self.request(url=url, method="POST", headers=headers, body=body)
126126

127-
async def batch_requests(
127+
async def execute(self, request: HttpRequest) -> HttpResponse:
128+
"""Accepts a single instance of :class:`~.HttpRequest` and executes it
129+
using the request implementation configured in the :class:`~.HttpClient`
130+
instance.
131+
132+
This returns a single :class:`~.HttpResponse`.
133+
"""
134+
return await self._request_downloader(request)
135+
136+
async def batch_execute(
128137
self, *requests: HttpRequest, return_exceptions: bool = False
129138
) -> List[Union[HttpResponse, Exception]]:
130-
"""Similar to :meth:`~.HttpClient.request` but accepts a collection of
139+
"""Similar to :meth:`~.HttpClient.execute` but accepts a collection of
131140
:class:`~.HttpRequest` instances that would be batch executed.
132141
133142
If any of the :class:`~.HttpRequest` raises an exception upon execution,
@@ -139,7 +148,7 @@ async def batch_requests(
139148
``True`` to the ``return_exceptions`` parameter.
140149
"""
141150

142-
coroutines = [self.request_downloader(r) for r in requests]
151+
coroutines = [self._request_downloader(r) for r in requests]
143152
responses = await asyncio.gather(
144153
*coroutines, return_exceptions=return_exceptions
145154
)
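
For readers who want the shape of this refactor without the surrounding file, here is a rough, self-contained sketch of the pattern the hunks above appear to implement: a private downloader attribute, a public ``execute()`` for a single request, ``request()`` delegating to it, and ``batch_execute()`` gathering several requests. ``Request``, ``Response``, ``Client`` and ``_default_downloader`` are hypothetical stand-ins for illustration, not the web_poet classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Union


@dataclass
class Request:
    """Hypothetical stand-in for HttpRequest."""
    url: str
    method: str = "GET"


@dataclass
class Response:
    """Hypothetical stand-in for HttpResponse."""
    url: str


async def _default_downloader(request: Request) -> Response:
    # Assumption: a trivial downloader that echoes the URL back.
    return Response(url=request.url)


Downloader = Callable[[Request], Awaitable[Response]]


class Client:
    def __init__(self, request_downloader: Optional[Downloader] = None):
        # Kept private, as in the diff; callers go through execute().
        self._request_downloader = request_downloader or _default_downloader

    async def execute(self, request: Request) -> Response:
        # Run one pre-built request through the configured downloader.
        return await self._request_downloader(request)

    async def request(self, url: str, *, method: str = "GET") -> Response:
        # Shortcut: build the request, then delegate to execute().
        return await self.execute(Request(url=url, method=method))

    async def batch_execute(
        self, *requests: Request, return_exceptions: bool = False
    ) -> List[Union[Response, Exception]]:
        # Mirrors the diff: calls the downloader directly for each request.
        coroutines = [self._request_downloader(r) for r in requests]
        return await asyncio.gather(*coroutines, return_exceptions=return_exceptions)
```

One thing this sketch makes visible: because ``batch_execute()`` calls the downloader directly instead of ``execute()``, a subclass that overrides ``execute()`` would not change batch behaviour. Whether that is intended in the real class is not stated in the commit.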
