Dynamic Scraper maintain session across requests #18
Comments
Scraping is extracting information from the page, and scraperjs does that very well and very easily; that's why there's no navigation system.
+1 to the issue/suggestion. Also, as a first-time user, I didn't know that the DynamicScraper doesn't support sessions until I tried and failed. Maybe it would be useful to people if we stated that explicitly in the README.
Is this now covered by the scraper factory functionality? Maybe this issue could be closed if so...
This is not covered by scraperjs in any way.
It seems like the dynamic scraper works by using the 'request' library to load content and then loads it in phantom with the setContent command.
https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#webpage-setContent
It seems that to do navigation-based scraping that depends on sessions, we would have to manually parse out the appropriate data from the response object and pass it along in subsequent requests.
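To make the manual approach concrete, here is a minimal sketch of carrying session cookies from one response into the next request. It assumes the session lives in cookies and that the response headers look like Node's `http.IncomingMessage#headers`, where `set-cookie` is an array of strings. `collectCookies` and `cookieHeader` are hypothetical helpers, not part of scraperjs, and the sample header values are made up.

```javascript
'use strict';

// Hypothetical helper (not a scraperjs API): copy name=value pairs from a
// response's Set-Cookie headers into a Map that acts as a tiny cookie jar.
function collectCookies(jar, headers) {
  const setCookie = headers['set-cookie'] || [];
  for (const line of setCookie) {
    // Keep only the leading "name=value"; attributes such as Path or
    // Expires are ignored in this sketch.
    const pair = line.split(';')[0];
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip malformed entries with no name
    jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return jar;
}

// Hypothetical helper: build the Cookie header value for the next request.
function cookieHeader(jar) {
  return Array.from(jar, ([name, value]) => `${name}=${value}`).join('; ');
}

// Usage with a made-up login response.
const jar = new Map();
collectCookies(jar, {
  'set-cookie': ['sid=abc123; Path=/; HttpOnly', 'theme=dark'],
});
const nextRequestHeaders = { Cookie: cookieHeader(jar) };
console.log(nextRequestHeaders.Cookie); // prints "sid=abc123; theme=dark"
```

This ignores cookie scoping (domain, path, expiry), so it only fits a single-site scrape. The 'request' library documents a `jar` option for cookie persistence, but whether scraperjs passes such an option through is not confirmed here.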
It would be nice to have a persistent session mode where the dynamic scraper would work at the level of a phantom WebPage instance (like a browser tab) and load/navigate across pages using JS actions (like clicks). This is important for scraping AJAX-based pages. Tools like CasperJS work well for this, and the use cases for the Dynamic scraper seem a little limited without it.