Hila Calderon edited this page Oct 24, 2020 · 3 revisions

Web Scraping Repositories

  1. https://github.com/hicala/scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Overview

    Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

    Check the Scrapy homepage at https://scrapy.org for more information, including a list of features.

  2. https://github.com/hicala/autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Overview

    This project automates web scraping. Given a URL or the HTML content of a web page, plus a list of sample values to scrape from it (text, a URL, or the value of any HTML tag on that page), it learns the scraping rules and returns similar elements. The learned object can then be applied to new URLs to retrieve either similar content or the exact same element on those pages.
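
The learn-from-samples idea described for autoscraper can be illustrated with a minimal standard-library sketch. This is not AutoScraper's actual algorithm or API: the names `PathCollector`, `learn_rules`, and `apply_rules`, and both sample pages, are hypothetical, and the "rule" is assumed to be nothing more than the tag path (with CSS class) leading to each sample value.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Record every text node together with the tag path that leads to it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags as (tag, label), outermost first
        self.found = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class")
        label = f"{tag}.{css_class}" if css_class else tag
        self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/".join(label for _, label in self.stack)
            self.found.append((path, text))


def text_paths(html):
    """Return (path, text) pairs for all non-empty text nodes in the page."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def learn_rules(html, wanted):
    """Learn the set of tag paths under which the sample values appear."""
    wanted = set(wanted)
    return {path for path, text in text_paths(html) if text in wanted}


def apply_rules(html, rules):
    """Return every text node whose tag path matches a learned rule."""
    return [text for path, text in text_paths(html) if path in rules]


# Hypothetical pages, made up for this illustration.
TRAINING_PAGE = """
<html><body>
  <h1>Repositories</h1>
  <ul>
    <li class="repo">scrapy</li>
    <li class="repo">autoscraper</li>
  </ul>
  <p>Last updated in 2020</p>
</body></html>
"""
NEW_PAGE = """
<html><body>
  <h1>More repositories</h1>
  <ul>
    <li class="repo">scraping-workshop</li>
  </ul>
</body></html>
"""

rules = learn_rules(TRAINING_PAGE, ["scrapy"])
print(apply_rules(TRAINING_PAGE, rules))  # ['scrapy', 'autoscraper']
print(apply_rules(NEW_PAGE, rules))       # ['scraping-workshop']
```

One sample value is enough here because both list items share the same tag path; a real page with less regular markup would likely need more samples or a looser matching rule.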

  3. https://github.com/hicala/scraping-workshop

    [Spanish] Scraping workshop: documentation and scripts

    Overview

    Workshop on automated data extraction from web pages

    Web scraping is a technique that uses a range of technologies to extract data or information from a web page. It is used to collect unstructured data and convert it into structured data that can later be processed in databases or spreadsheets. The workshop is a hands-on introduction to scraping, intended to let attendees work with information useful for their own projects.
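
The step from page markup to spreadsheet-ready data can be sketched with the Python standard library alone. The example below is an illustration under assumptions, not material from the workshop itself: `TableParser`, `html_table_to_csv`, and the sample page are hypothetical names and data, and the input is assumed to be a simple, well-formed HTML table.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in the page as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Convert the table rows found in `html` into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical page fragment, made up for this illustration.
SAMPLE_PAGE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Notebook</td><td>2.50</td></tr>
  <tr><td>Pencil, HB</td><td>0.80</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_PAGE))
```

The resulting text can be saved to a `.csv` file and opened in a spreadsheet or loaded into a database; the `csv` module quotes the cell that contains a comma so the columns stay aligned.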