Dynamic Scraper maintain session across requests #18

Open
tedjt opened this issue Oct 17, 2014 · 4 comments


tedjt commented Oct 17, 2014

It seems like the dynamic scraper works by using the 'request' library to load content and then loads it in phantom with the setContent command.

https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#webpage-setContent

It seems that to do navigation-based scraping that depends on sessions, we would have to manually parse the appropriate data (for example, session cookies) out of the response object and pass it along in subsequent requests.
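To illustrate what that manual step might look like, here is a minimal sketch in plain Node.js. The helpers `collectCookies` and `cookieHeader` are hypothetical names, not part of scraperjs or request, and the sketch assumes the session lives in cookies delivered through `Set-Cookie` headers. It ignores cookie attributes such as expiry, domain, and path.

```javascript
// Hypothetical helpers (not part of scraperjs or request): carry cookies
// from one response into the next request by hand.

// Merge the Set-Cookie headers of a response into a plain name -> value map.
function collectCookies(setCookieHeaders, jar = {}) {
  const headers = [].concat(setCookieHeaders || []);
  for (const header of headers) {
    // Keep only the leading "name=value" pair; attributes such as
    // Path or HttpOnly are ignored in this sketch.
    const pair = header.split(';')[0];
    const idx = pair.indexOf('=');
    if (idx < 1) continue; // skip malformed entries
    jar[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return jar;
}

// Serialize the map into a Cookie request header value.
function cookieHeader(jar) {
  return Object.entries(jar)
    .map(([name, value]) => `${name}=${value}`)
    .join('; ');
}

// Example with a response-like object shaped like Node's res.headers,
// where 'set-cookie' is an array of header strings.
const fakeResponse = {
  headers: { 'set-cookie': ['sid=abc123; Path=/; HttpOnly', 'lang=en'] }
};
const jar = collectCookies(fakeResponse.headers['set-cookie']);
const nextHeaders = { Cookie: cookieHeader(jar) };
```

The `nextHeaders` object would then be passed as the headers of the following request, and `collectCookies` called again on each new response with the same `jar` so the session state accumulates.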

It would be nice to have a persistent session mode where the dynamic scraper works at the level of a phantom WebPage instance (like a browser tab) and loads and navigates across pages using JS actions (like clicks). This is important for scraping AJAX-based pages. Tools like CasperJS work well for this, and the use cases for the dynamic scraper seem a little limited without it.

Owner

ruipgil commented Oct 18, 2014

Scraping is extracting information from the page, and scraperjs does that very well and very easily; that's why there's no navigation system.
The purpose of the dynamic scraper is to get content that is loaded dynamically, as in an Angular app, or even to access JS variables.
However, there is support for things like sessions using request.

Contributor

vdraceil commented Apr 9, 2015

+1 to the issue/suggestion.
It would be really great if we could have a persistent mode like CasperJS does.

Also, as a first-time user, I didn't know that the DynamicScraper doesn't support sessions until I tried and failed. Maybe it would be useful to people if we stated that explicitly in the README.

Contributor

chmac commented Sep 25, 2015

Is this now covered by the scraper factory functionality? If so, maybe this issue could be closed.

Owner

ruipgil commented Sep 25, 2015

This is not covered by scraperjs in any way.
