
matiskay/manolo_scraper

 
 

All spiders go here

Spiders are based on Scrapy.

Configuration

Create a config.yml file with the following settings:

    CRAWLERA_USER: abc
    CRAWLERA_PASS: abc
    drivername: postgres
    username: postgres
    host: localhost
    port: 5432
    password: pass
    database: manolo
    api_key: scrapinghub's api key
    sh_project: scrapinghub's project
    scraping_past_number_of_days: 14
    
    # spiders that are banned when working from scrapinghub.com
    banned_spiders:
      - inpe

The database credentials are required so that the spiders can upload the scraped data to the production database.
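As a rough illustration of how the database keys in config.yml fit together, the sketch below assembles them into a connection URL string. It is not taken from this repository: the function name build_database_url and the DB_KEYS tuple are hypothetical, and the way the project actually reads its configuration may differ. The sample values are copied from the config.yml shown above.

```python
from urllib.parse import quote_plus

# Keys from config.yml that describe the database connection.
DB_KEYS = ("drivername", "username", "password", "host", "port", "database")


def build_database_url(config):
    """Build a connection URL string from the database keys of the config.

    Hypothetical helper for illustration only; not part of manolo_scraper.
    """
    missing = [key for key in DB_KEYS if key not in config]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return "{drivername}://{username}:{password}@{host}:{port}/{database}".format(
        drivername=config["drivername"],
        # Quote credentials so characters such as '@' or ':' stay valid in a URL.
        username=quote_plus(str(config["username"])),
        password=quote_plus(str(config["password"])),
        host=config["host"],
        port=config["port"],
        database=config["database"],
    )


# Values copied from the sample config.yml.
sample_config = {
    "drivername": "postgres",
    "username": "postgres",
    "host": "localhost",
    "port": 5432,
    "password": "pass",
    "database": "manolo",
}

print(build_database_url(sample_config))
```

In practice the dictionary would come from parsing config.yml with a YAML library rather than being written inline.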

List of Entities

Run a spider as follows, passing both dates in ISO format (YYYY-MM-DD):

    scrapy crawl SPIDER_NAME -a date_start=DATE_ISO_FORMAT -a date_end=DATE_ISO_FORMAT
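The following sketch shows one way the two date arguments could be filled in for a rolling window. It assumes that scraping_past_number_of_days means "scrape from that many days ago up to today", which the README does not state explicitly, and the helper crawl_command is hypothetical. The spider name inpe is taken from the banned_spiders list in the sample configuration.

```python
from datetime import date, timedelta


def crawl_command(spider_name, days_back, today=None):
    """Return the scrapy command (as an argument list) for a date window.

    Hypothetical helper: the window ends on `today` and starts
    `days_back` days earlier. Dates are rendered in ISO format.
    """
    today = today or date.today()
    date_start = today - timedelta(days=days_back)
    return [
        "scrapy", "crawl", spider_name,
        "-a", "date_start=" + date_start.isoformat(),
        "-a", "date_end=" + today.isoformat(),
    ]


# Print the command for the last 14 days as a single shell line.
print(" ".join(crawl_command("inpe", 14)))
```

The argument list can be passed directly to subprocess.run if the crawl should be launched from a Python script rather than from the shell.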

About

Scraper for online visitor logs. Uses Scrapy.
