Add retry mechanism to middleware #12
Just opened scrapy/scrapy#4993 to make this easier.
Hi @elacuesta, thanks for taking care of the request.
What do you think about adding something like the existing approach
and implementing this CrawleraRetryException separately?
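For illustration, one way to read that suggestion is to define the exception and have a subclass of Scrapy's stock RetryMiddleware treat it as retryable. A minimal sketch, assuming the EXCEPTIONS_TO_RETRY extension point; the CrawleraRetryException and CrawleraRetryMiddleware names are hypothetical, not part of the project:

```python
from scrapy.downloadermiddlewares.retry import RetryMiddleware


class CrawleraRetryException(Exception):
    """Hypothetical: raised when an Uncork response indicates a retryable failure."""


class CrawleraRetryMiddleware(RetryMiddleware):
    # Extend the built-in exception tuple so CrawleraRetryException
    # is retried the same way as e.g. a connection timeout.
    EXCEPTIONS_TO_RETRY = RetryMiddleware.EXCEPTIONS_TO_RETRY + (
        CrawleraRetryException,
    )
```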
Hey @PyExplorer! My idea for subclassing the built-in middleware is actually to take advantage of the private …
Hey @elacuesta, I see that scrapy/scrapy#4902 is merged 🎉 It would be good to leverage that and implement the retry functionality in the middleware, since there are growing use cases that warrant it.
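For context, scrapy/scrapy#4902 added `scrapy.downloadermiddlewares.retry.get_retry_request`, which a middleware can call to build a retry request that honours the standard retry settings. A minimal sketch of how the Fetch middleware might leverage it; the `process_response` body and the `_uncork_failed` helper are assumptions for illustration, not the actual implementation:

```python
from scrapy.downloadermiddlewares.retry import get_retry_request


class CrawleraFetchMiddleware:
    def process_response(self, request, response, spider):
        # Hypothetical check: suppose the decoded Uncork JSON payload
        # carries a retryable failure marker.
        if self._uncork_failed(response):  # placeholder helper, not real
            new_request = get_retry_request(
                request,
                spider=spider,
                reason="crawlera_status: fail",
            )
            if new_request is not None:
                return new_request  # re-schedule through the downloader
        return response
```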
Indeed @akshayphilar, my plan is to work on this next week 👍
@PyExplorer why do you need to retry the original URL? I would think that if a request to the Uncork API fails, we should retry the request to the Uncork API, not the request to the target website.
I see this as a bug in the Uncork API: it should not return 200 if something fails. If a response failed and Uncork did not produce good results, it should return 503 Service Unavailable, which is what some other Zyte products do, e.g. Smart Proxy Manager. That way we could handle Fetch API 5xx codes with the default built-in Scrapy retry middleware, and there would be no need for a separate PR by @elacuesta. Retrying 5xx codes should be the default behaviour of the middleware; raising errors is not a good default. Most users will want a retried response in case of an Uncork failure, not an error in the logs.
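To illustrate the point: if Uncork returned 503 on failure, no new middleware code would be needed, because Scrapy's stock RetryMiddleware already retries server errors based on settings. A sketch for reference; the values shown approximate recent Scrapy defaults:

```python
# settings.py — Scrapy's built-in RetryMiddleware handles these out of the box
RETRY_ENABLED = True
RETRY_TIMES = 2  # retries on top of the first attempt
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```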
Hi guys,
I recently faced a case where several retries of requests to fetch.crawlera.com helped the spider work well. As I understood from the discussion here https://zytegroup.slack.com/archives/C014HA686ES/p1612975265044000, Uncork does 3 retries, but not for all failures.
I've implemented a temporary fix by retrying requests right in the spider (a minimal sketch follows below). We could put this in a custom retry middleware, sure, but then we would need to add it to every spider/project.
To make things simpler: is it possible to add this right into the CrawleraFetchMiddleware, with meta parameters for the retry reasons/retry times along with the existing "skip" parameter?
The failed-response reasons I've faced:
"crawlera_status": "fail"
"crawlera_status": "ban"
"crawlera_error": "timeout"
Thanks.
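A minimal sketch of the in-spider workaround described above, assuming the spider sees the raw Uncork JSON payload. The field names come from the reasons listed above; the `uncork_retries` meta key, the `MAX_UNCORK_RETRIES` limit, and the spider itself are assumptions for illustration:

```python
import json

from scrapy import Spider

MAX_UNCORK_RETRIES = 3  # assumed limit, not taken from the middleware


class ExampleSpider(Spider):
    name = "example"

    def parse(self, response):
        data = json.loads(response.text)
        # Failure reasons observed in this issue: fail/ban statuses, timeouts
        failed = (
            data.get("crawlera_status") in ("fail", "ban")
            or data.get("crawlera_error") == "timeout"
        )
        if failed:
            retries = response.meta.get("uncork_retries", 0)
            if retries < MAX_UNCORK_RETRIES:
                yield response.request.replace(
                    dont_filter=True,  # bypass the duplicate filter on retry
                    meta={**response.meta, "uncork_retries": retries + 1},
                )
            return
        # ... normal parsing of a successful response goes here ...
```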