demokratie-live/scapacra

scapacra

Introduction

Scapacra (scraper, parser and crawler) is a framework for extracting data from different data sources. The idea behind scapacra is based on the ETL (extract, transform, load) process, and the framework defines a modular design pattern that provides a basic ETL workflow.

The framework is structured into three basic modules.

  1. Parser: The parser extracts the data from a defined document.
  2. Browser: The browser navigates through a structure and retrieves the desired fragments for the parser.
  3. Scraper: A scraper executes the browsers and parsers and provides their results through a centralized interface.
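
The division of labour between the three modules can be illustrated with a minimal sketch. It is not scapacra's actual API: the class names, method names (`parse`, `fragments`, `register`, `run`) and the in-memory example data are all assumptions made for illustration, and Python is used only to keep the sketch short.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Tuple


class Parser(ABC):
    """Extracts data from one document fragment (hypothetical interface)."""

    @abstractmethod
    def parse(self, fragment: str) -> List[Dict[str, str]]:
        ...


class Browser(ABC):
    """Walks a structure and yields fragments for a parser (hypothetical interface)."""

    @abstractmethod
    def fragments(self) -> Iterator[str]:
        ...


class KeyValueParser(Parser):
    """Example parser: turns 'key: value' lines into one record per fragment."""

    def parse(self, fragment: str) -> List[Dict[str, str]]:
        record: Dict[str, str] = {}
        for line in fragment.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                record[key.strip()] = value.strip()
        # Fragments without any 'key: value' line produce no record.
        return [record] if record else []


class InMemoryBrowser(Browser):
    """Example browser: 'navigates' a list of pages already held in memory."""

    def __init__(self, pages: Iterable[str]) -> None:
        self._pages = list(pages)

    def fragments(self) -> Iterator[str]:
        yield from self._pages


class Scraper:
    """Runs registered browser/parser pairs and collects their results."""

    def __init__(self) -> None:
        self._jobs: List[Tuple[Browser, Parser]] = []

    def register(self, browser: Browser, parser: Parser) -> None:
        self._jobs.append((browser, parser))

    def run(self) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for browser, parser in self._jobs:
            for fragment in browser.fragments():
                results.extend(parser.parse(fragment))
        return results


# Usage: one browser feeding one parser through the scraper.
pages = ["title: A\nyear: 2018", "no data here", "title: B"]
scraper = Scraper()
scraper.register(InMemoryBrowser(pages), KeyValueParser())
records = scraper.run()
```

In this sketch the browser knows nothing about the document format and the parser knows nothing about navigation, so either one could be swapped out without touching the other; the scraper is the only place where they meet.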

Parser

Browser

Scraper
