Dynamic Scraper maintain session across requests #18

tedjt · 2014-10-17T18:18:05Z

It seems like the dynamic scraper works by using the 'request' library to load content and then loads it in phantom with the setContent command.

https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#webpage-setContent

It seems that to do navigation-based scraping that depends on sessions, we would have to manually parse the appropriate data (cookies, for example) out of the response object and pass it along in subsequent requests.
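
To illustrate the manual approach, here is a minimal sketch assuming the session is carried by cookies. `updateJar` and `cookieHeader` are hypothetical helpers written for this example, not part of scraperjs or request, and the parsing ignores cookie attributes such as `Path` and `Expires`.

```javascript
// Hypothetical helper (not part of scraperjs): merge the cookies found in a
// response's Set-Cookie headers into a plain name -> value jar.
function updateJar(jar, setCookieHeaders) {
  var headers = setCookieHeaders || [];
  headers.forEach(function (header) {
    // Only the leading "name=value" pair is kept; attributes are dropped.
    var pair = header.split(';')[0];
    var eq = pair.indexOf('=');
    if (eq === -1) return; // skip malformed entries
    jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  });
  return jar;
}

// Hypothetical helper: build the Cookie header value for the next request.
function cookieHeader(jar) {
  return Object.keys(jar).map(function (name) {
    return name + '=' + jar[name];
  }).join('; ');
}
```

After each response, the jar would be updated from its `set-cookie` headers and the result of `cookieHeader(jar)` sent as the `Cookie` header of the next request. This has to be repeated for every navigation step, which is the overhead a persistent mode would remove.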

It would be nice to have a persistent session mode where the dynamic scraper works at the level of a phantom WebPage instance (like a browser tab) and loads and navigates across pages using JS actions (like clicks). This is important for scraping AJAX-based pages. Tools like CasperJS work well for this, and the use cases for the dynamic scraper seem a little limited without it.

ruipgil · 2014-10-18T13:20:00Z

Scraping is extracting information from a page, and scraperjs does that well and easily; that's why there is no navigation system.
The purpose of the dynamic scraper is to get content that is loaded dynamically, like an Angular app, or even to access JS variables.
However, there is support for things like sessions using request.

vdraceil · 2015-04-09T16:03:59Z

+1 to the issue/suggestion.
It would be really great to have a persistent mode like CasperJS does.

Also, as a first-time user, I didn't know that the DynamicScraper doesn't support sessions until I tried and failed. Maybe it would be useful to state that explicitly in the README.

chmac · 2015-09-25T17:04:40Z

Is this now covered by the scraper factory functionality? Maybe this issue could be closed if so...

ruipgil · 2015-09-25T17:48:25Z

This is not covered by scraperjs in any way.
