[WIP] [Discussion] Use response url for original request #21
base: master
Codecov Report
@@            Coverage Diff            @@
##            master       #21   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            3         3
  Lines          206       207    +1
=========================================
+ Hits           206       207    +1
In which case(s) would this URL be different from the original one? Wouldn't we want to keep the actual URL of the original request?
Redirects? (I really have no idea)
I think this is a valid point. Currently I imagine response.url has the final URL, so you already have a way to find that URL. I imagine this change is to fix the perceived inconsistency that request.url may not match response.url; or is there more to it?
Yes, it is required for cases where the request has been redirected. In that case the Fetch API returns the resolved URL after all redirects. But Scrapy is using
I was under the impression that redirects would not be followed directly, leaving them to be handled by the user agent instead (in this case, to be processed by the redirect middleware). Even if that's not the case here, this still looks a little bit counterintuitive to me, since vanilla Scrapy gives you the actual last request in the
The Crawlera Fetch API has auto-redirects. Later in log: So, So I'm still thinking about this fix, because it also requires some changes on the Smart Proxy side.
Maybe it would be better to have some middleware log a message about a redirect happening within Crawlera Fetch by detecting that inconsistency, but otherwise let things be, i.e. do not change the request URL.
I agree with @Gallaecio. It makes sense to log the final URL to which the page is redirected rather than replace request.url, which holds the original URL. Users who need to retrieve this final URL should be able to do so through the meta, which contains the original response, right?
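For illustration, the logging approach suggested above could look roughly like the sketch below. It is not code from this PR: the class name `RedirectLoggingMiddleware` is hypothetical, and it assumes that a plain string comparison of `request.url` and `response.url` is enough to detect a redirect handled inside Crawlera Fetch (URL normalization differences would need extra care). It only relies on the `process_response(request, response, spider)` hook that Scrapy downloader middlewares expose, and uses simple stand-in objects so it runs without Scrapy installed.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


class RedirectLoggingMiddleware:
    """Hypothetical downloader middleware: report upstream redirects
    without ever rewriting request.url."""

    def process_response(self, request, response, spider):
        # Assumption: a mismatch between the two URLs means the upstream
        # service followed a redirect on our behalf.
        if response.url != request.url:
            logger.info(
                "Request to %s was redirected upstream; final URL is %s",
                request.url,
                response.url,
            )
        # Return the response untouched so request.url keeps the original URL.
        return response


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-ins for Scrapy's Request and Response (only .url is used here).
    request = SimpleNamespace(url="http://example.com/old")
    response = SimpleNamespace(url="http://example.com/new")
    RedirectLoggingMiddleware().process_response(request, response, spider=None)
```

With something like this in place, request.url stays as the user wrote it, and the final URL remains available on the response and in the log.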
OK, thank you for the feedback, guys.
Make request.url consist with final url