Retry #18
base: master
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #18      +/-   ##
===========================================
- Coverage   100.00%   96.89%    -3.11%
===========================================
  Files            3        4       +1
  Lines          206      290      +84
===========================================
+ Hits           206      281      +75
- Misses           0        9       +9
```
Continue to review full report at Codecov.
Hi @elacuesta, thank you for starting the implementation.
```diff
@@ -76,6 +87,21 @@ Crawlera middleware won't be able to handle them.
 Default values to be sent to the Crawlera Fetch API. For instance, set to `{"device": "mobile"}`
 to render all requests with a mobile profile.

+* `CRAWLERA_FETCH_SHOULD_RETRY` (type `Optional[Union[Callable, str]]`, default `None`)
```
I would use True as default because in most cases you want to retry. Every API can fail, uncork can fail, the spider should retry.

Requirement of 2.5 might be limiting for some users; we don't support this stack in Scrapy Cloud at Zyte at the moment, so this would have to wait for a release of the stack and would force all uncork users to migrate to 2.5. Is there some way to make it compatible with all Scrapy versions, not just 2.5?
Hey @pawelmhm:
> I would use True as default because in most cases you want to retry. Every API can fail, uncork can fail, the spider should retry.
`CRAWLERA_FETCH_SHOULD_RETRY` receives a callable (or the name of a callable within the spider) to be used to determine if a request should be retried. Perhaps it could be named differently, I'm open to suggestions. `CRAWLERA_FETCH_ON_ERROR` is the setting to determine what to do with errors. I made `OnError.Warn` the default, just to keep backward-compatibility, but perhaps `OnError.Retry` could be a better default.
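For illustration, a minimal sketch of how a spider-level predicate could be wired up (the predicate name and signature here are assumptions, not a confirmed API):

```python
from scrapy import Spider


class MySpider(Spider):
    name = "my_spider"
    custom_settings = {
        # either the callable itself or its name within the spider
        "CRAWLERA_FETCH_SHOULD_RETRY": "should_retry",
    }

    def should_retry(self, response) -> bool:
        # hypothetical predicate: retry on server-side Fetch API errors
        return response.status in (500, 503)
```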
> Requirement of 2.5 might be limiting for some users; we don't support this stack in Scrapy Cloud at Zyte at the moment, so this would have to wait for a release of the stack and would force all uncork users to migrate to 2.5. Is there some way to make it compatible with all Scrapy versions, not just 2.5?
AFAIK, you should be able to use 2.5 with a previous stack by updating the requirements file. The 2.5 requirement is because of scrapy/scrapy#4902. I wanted to avoid code duplication, but I guess I can just use the upstream function if available and fall back to copying the implementation.
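Something along these lines, as a sketch (the fallback module path is hypothetical):

```python
try:
    # get_retry_request was added upstream in Scrapy 2.5 (scrapy/scrapy#4902)
    from scrapy.downloadermiddlewares.retry import get_retry_request
except ImportError:
    # on older Scrapy versions, fall back to a vendored copy of the
    # implementation (hypothetical module path)
    from crawlera_fetch.retry import get_retry_request
```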
Thanks for the explanation. Is there any scenario where you don't need retries? In my experience it is very rare not to want to retry internal server errors, timeouts, or bans.
> AFAIK, you should be able to use 2.5 with a previous stack by updating the requirements file. The 2.5 requirement is because of scrapy/scrapy#4902.
I think in some projects people are using old Scrapy versions, and having to update to the most recent version will be extra work and effort for them. If they are stuck on an old version like 1.6 or 1.7, updating to 2.5 might not be straightforward.
But my main point, after thinking about this, is: why do we actually need custom retry logic? Why can't we handle this in the retry middleware by default, like all other HTTP error codes? There are 2 use cases mentioned by Taras in the issue on GH, but I'm not convinced by them, and after talking with a developer of the Fetch API I hear they plan to change the behavior to return 500 and 503 HTTP status codes instead of a 200 HTTP status code with an error code in the response body.
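For reference, if the API did return plain 500/503 statuses, Scrapy's stock RetryMiddleware would already cover them with its default settings:

```python
# settings.py: Scrapy's built-in retry defaults; 500 and 503 are already
# retried out of the box, so no custom retry logic would be needed
RETRY_ENABLED = True
RETRY_TIMES = 2
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```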
@akshayphilar do you have any insights on this? Mostly about "change behavior to return 500 and 503 HTTP status codes instead of 200 HTTP status code with error code in response body" from #18 (comment). Bottom line is: should we provide a client-side retrying mechanism, or should we rely on the server to return codes that will be retried by default by Scrapy?
Closes #12
Tasks:

- Deprecate `CRAWLERA_FETCH_RAISE_ON_ERROR` in favour of a more general `CRAWLERA_FETCH_ON_ERROR` setting (with possible values `warn(ing)`, `raise`, `retry`)
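A rough sketch of how these values could map to the `OnError` enum mentioned in the discussion above (`OnError.Warn` and `OnError.Retry` appear there; the `Raise` member and the string values are assumptions):

```python
from enum import Enum


class OnError(Enum):
    Warn = "warn"
    Raise = "raise"  # assumed member name
    Retry = "retry"


# settings.py: keeping the backward-compatible default discussed above
CRAWLERA_FETCH_ON_ERROR = OnError.Warn
```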