TODO: description
Use the file
- Run `yarn install` in the scraper repository root
- Edit `run/urls.yml` to specify the URLs to scrape for each provider. A provider whose name starts with a `.` has all its URLs ignored
- Run `yarn scrape` to start scraping!
- Results are written to `./run/{provider}-{date}-batch{number}.json`
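
As a rough illustration of the two rules above (dot-prefixed providers are skipped, and results follow the `./run/{provider}-{date}-batch{number}.json` pattern), here is a minimal sketch. The `UrlConfig` shape, the helper names `activeProviders` and `resultPath`, and the `YYYY-MM-DD` date format are all assumptions made for this example; the actual scraper may structure `urls.yml` and format dates differently.

```typescript
// Hypothetical shape: provider name -> list of URLs.
// The real urls.yml schema may differ.
type UrlConfig = Record<string, string[]>;

// Drop providers whose name starts with ".", mirroring the "ignored" rule.
function activeProviders(config: UrlConfig): UrlConfig {
  const active: UrlConfig = {};
  for (const [provider, urls] of Object.entries(config)) {
    if (!provider.startsWith(".")) {
      active[provider] = urls;
    }
  }
  return active;
}

// Build a result path following ./run/{provider}-{date}-batch{number}.json.
// Assumption: the date is rendered as YYYY-MM-DD in UTC.
function resultPath(provider: string, date: Date, batch: number): string {
  const day = date.toISOString().slice(0, 10);
  return `./run/${provider}-${day}-batch${batch}.json`;
}

// Example input: ".shopB" is disabled by its leading dot.
const config: UrlConfig = {
  shopA: ["https://example.com/a"],
  ".shopB": ["https://example.com/b"],
};
```

With this input, `activeProviders(config)` keeps only `shopA`, which makes the dot prefix a quick way to disable a provider without deleting its URLs.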
Alternatively, you can run it using the launch option in VSCode (and it will attach the debugger!)
To run the stack headless, set up a `.env` file like the following:

    # .env
    HEADLESS=true

To add a new scraper:

- Run `yarn generate {name}`
- Code the scraper in the generated file at `src/providers/{name}/scraper.ts`
- Register your scraper by adding `export * as {name} from './{name}'` to `src/providers/index.ts`
- Run your scraper! :)
The Scraper uses the following environment variables:
`HEADLESS`: Whether to launch Chromium in headless mode or headful (with GUI). `false` by default
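
For illustration, a flag like `HEADLESS` could be interpreted as in the sketch below. The helper name `parseHeadless` is hypothetical, and treating only the string `true` (case-insensitive) as enabled is an assumption; the scraper's own parsing may accept other values.

```typescript
// Interpret a raw HEADLESS value.
// Assumption: only "true" (case-insensitive, surrounding whitespace ignored)
// enables headless mode; anything else, including an unset variable,
// falls back to the documented default of false.
function parseHeadless(raw: string | undefined): boolean {
  return (raw ?? "").trim().toLowerCase() === "true";
}
```

In a Node process this might be called as `parseHeadless(process.env.HEADLESS)` once the `.env` file has been loaded.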