This repository has been archived by the owner on May 19, 2020. It is now read-only.

Example of how to integrate Scrapy with Chrome Debugging Protocol [very alpha stage]


redapple/scrapy-chromedebugproto

scrapy-chromedebugproto

Example of how to integrate Scrapy with Chrome Debugging Protocol

WARNING: highly toxic code!! Not production-ready, not at all!! You've been warned.

Getting started

Get a recent Chrome build, ideally one that supports headless mode

Run it with, for example:

$ google-chrome-unstable --disable-gpu --headless --remote-debugging-port=9223
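Once Chrome is running with `--remote-debugging-port`, it serves a small HTTP API on that port; `/json/list` returns the open targets, each with a `webSocketDebuggerUrl`. The snippet below is a minimal sketch, not code from this repository, of how one might check that the browser is reachable and find the WebSocket URL to connect to. The function names are made up for illustration, and the port is assumed to match the command above.

```python
import json
from urllib.request import urlopen


def list_targets(host="localhost", port=9223):
    """Fetch the debuggable targets from Chrome's HTTP endpoint.

    Assumes Chrome was started with --remote-debugging-port=<port>.
    """
    with urlopen(f"http://{host}:{port}/json/list", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def pick_page_ws_url(targets):
    """Return the WebSocket debugger URL of the first 'page' target, or None."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


# Usage (requires a running Chrome, so it is left as a comment):
#   print(pick_page_ws_url(list_targets()))
```

If `list_targets()` raises a connection error, Chrome is most likely not listening on the expected port.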

Install Python dependencies

  • scrapy
  • treq
  • twisted
  • autobahn

Add the downloader middleware

DOWNLOADER_MIDDLEWARES = {
    'middleware.HeadlesschromeDownloaderMiddleware': 543,
}
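For orientation, a middleware of this kind drives Chrome by exchanging JSON messages over the debugging WebSocket. The following is a rough sketch of what such commands look like, and is not taken from this repository's `middleware` module. `Page.navigate` and `Runtime.evaluate` are methods of the Chrome DevTools Protocol; the helper name `cdp_command` and the choice of JavaScript expression are assumptions made for illustration.

```python
import itertools
import json

# Each command carries a unique integer id so that the reply,
# which echoes the same id, can be matched to its request.
_ids = itertools.count(1)


def cdp_command(method, **params):
    """Serialize one protocol command as a JSON string (hypothetical helper)."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


# Ask the page to load a URL.
navigate = cdp_command("Page.navigate", url="https://example.com")

# Ask for the rendered DOM as a string once the page has loaded.
get_html = cdp_command(
    "Runtime.evaluate",
    expression="document.documentElement.outerHTML",
    returnByValue=True,
)
```

Each string would be sent as a text frame on the WebSocket connection; how this project sequences the messages and waits for page load is not shown here.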

Todo

  • Handle non-HTTP 200 responses
  • switch debug logs on/off
  • configurable debugger URL
  • a proper state machine
  • make a Python package out of it
  • check how load and request concurrency are handled
  • add tests
  • ...
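As one possible direction for the "configurable debugger URL" item, the address could be read from the project settings with a fallback to the port used in the example above. This is only a sketch: the setting name `CHROME_DEBUGGER_URL` and the helper are hypothetical and do not exist in this repository. The helper relies only on a `.get(name, default)` method, which both a plain `dict` and Scrapy's settings object provide.

```python
# Fallback matching the --remote-debugging-port used in "Getting started".
DEFAULT_DEBUGGER_URL = "http://localhost:9223"


def debugger_url(settings):
    """Return the debugger base URL from settings, without a trailing slash.

    CHROME_DEBUGGER_URL is a hypothetical setting name used for illustration.
    """
    return settings.get("CHROME_DEBUGGER_URL", DEFAULT_DEBUGGER_URL).rstrip("/")
```

In a middleware, the value could be resolved once in `from_crawler` via `crawler.settings` and stored on the instance.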

Inspiration
