NetworkX errors not retried #24

Open
lopuhin opened this issue Apr 13, 2016 · 5 comments
@lopuhin
Contributor

lopuhin commented Apr 13, 2016

I think these errors are not retried:

2016-04-12 21:30:49 [scrapy_splash.middleware] WARNING: Bad request to Splash: {'error': 400, 'description': 'Error happened while executing Lua script', 'info': {'message': 'Lua error: [string "function get_arg(arg, default)..."]:52: network5', 'error': 'network5', 'type': 'LUA_ERROR', 'source': '[string "function get_arg(arg, default)..."]', 'line_number': 52}, 'type': 'ScriptError'}
2016-04-12 21:29:31 [scrapy] DEBUG: Crawled (400) <POST http://XXX:8050/execute> (referer: None)
2016-04-12 21:29:31 [scrapy] DEBUG: Ignoring response <400 XXX/>: HTTP status code is not handled or not allowed
@lopuhin lopuhin added the bug label Apr 13, 2016
@kmike
Contributor

kmike commented Apr 13, 2016

It makes sense to add this feature to scrapy-splash (handle network.. error codes in addition to http.. codes when http_status_from_error_code is True). But I'm not sure what we should set the response status code to in these cases, since there was no response in the first place.

@kmike
Contributor

kmike commented Apr 13, 2016

network... status codes: http://doc.qt.io/qt-5/qnetworkreply.html#NetworkError-enum

@lopuhin
Contributor Author

lopuhin commented Apr 13, 2016

I think at least for network5 this could mean that our splash script timed out?

@kmike
Contributor

kmike commented Apr 13, 2016

It means resource_timeout was applied for the first request, and it timed out; this is a bit different from regular Splash timeouts. But yeah, it makes sense to handle network5 errors as 504 HTTP errors.
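
For illustration, the mapping discussed here could look roughly like the sketch below. It is not the scrapy-splash implementation: `status_from_splash_error` and `NETWORK_ERROR_STATUS` are hypothetical names, and the payload shape is assumed from the log in the first comment. The idea is that `http<N>` errors keep their code, `network5` becomes 504 (a status Scrapy's retry middleware retries by default), and anything else keeps the status Splash already returned.

```python
import re

# Hypothetical table: Qt network error number -> HTTP status to report.
# Only network5 (the resource timeout discussed above) is mapped here.
NETWORK_ERROR_STATUS = {5: 504}

_ERROR_RE = re.compile(r"(http|network)(\d+)")


def status_from_splash_error(data, default=400):
    """Pick an HTTP status for a Splash error payload (hypothetical helper)."""
    info = data.get("info") or {}
    match = _ERROR_RE.fullmatch(str(info.get("error", "")))
    if match is None:
        return default  # unknown error format: keep the original status
    kind, number = match.group(1), int(match.group(2))
    if kind == "http":
        return number  # e.g. 'http404' -> 404
    return NETWORK_ERROR_STATUS.get(number, default)


# Payload shaped like the one in the log above (message shortened).
payload = {
    "error": 400,
    "type": "ScriptError",
    "info": {"error": "network5", "type": "LUA_ERROR", "line_number": 52},
}
print(status_from_splash_error(payload))
```

Unmapped network errors fall back to the default on purpose, since which of them are worth retrying is still an open question in this thread.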

@kmike
Contributor

kmike commented Apr 13, 2016

It also could make sense to apply a larger timeout for the first request (see Example 6 here: http://splash.readthedocs.org/en/stable/scripting-ref.html#splash-on-request) - what do you think?
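
A rough sketch of that idea, assuming the script is sent to the /execute endpoint as `lua_source`: the Lua part uses `splash:on_request` and `request:set_timeout` from the Splash scripting reference, while `build_splash_args`, the `first_request_timeout` argument name, and the timeout values are hypothetical and only for illustration.

```python
# Lua script kept as a Python string; it would be sent to Splash as `lua_source`.
LUA_SOURCE = """
function main(splash)
  -- default timeout for every network request (illustrative value)
  splash.resource_timeout = 10.0

  local first = true
  splash:on_request(function(request)
    if first then
      first = false
      -- give the first (main page) request more time than the rest
      request:set_timeout(splash.args.first_request_timeout)
    end
  end)

  assert(splash:go(splash.args.url))
  return splash:html()
end
"""


def build_splash_args(first_request_timeout=60.0):
    """Build the arguments dict for /execute (hypothetical helper)."""
    return {
        "lua_source": LUA_SOURCE,
        # custom argument, read in the script as splash.args.first_request_timeout
        "first_request_timeout": first_request_timeout,
    }


args = build_splash_args()
print(sorted(args))
```

The overall Splash timeout for the render would still need to be larger than the first-request timeout, otherwise the script is cut off before the request can finish.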
