This plugin provides a way to scrape GitHub data from the REST API and ingest it through extractors in nodestream pipelines.
- Download and install Neo4j: https://neo4j.com/docs/desktop-manual/current/installation/download-installation/
- Create and start a database (version 5.7.0): https://neo4j.com/docs/desktop-manual/current/operations/create-dbms/
- Install APOC: https://neo4j.com/docs/apoc/5/installation/
- Create GitHub access codes: https://docs.github.com/en/[email protected]/apps/creating-github-apps/authenticating-with-a-github-app/generating-a-user-access-token-for-a-github-app
  NOTE: These values will be used in your `.env` file.
- Install python3: https://www.python.org/downloads/
- Install poetry: https://python-poetry.org/docs/#installation
- Install nodestream: https://nodestream-proj.github.io/nodestream/0.5/docs/tutorial/
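The note above says the GitHub credentials end up in a `.env` file. As a minimal sketch, assuming that file uses plain `KEY=VALUE` lines and stores the token under `GITHUB_ACCESS_TOKEN` (the variable name referenced in the `nodestream.yaml` configuration), the following standard-library Python reports which required values are missing. The helper names `parse_env_text` and `missing_keys` are hypothetical and are not part of nodestream or this plugin.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_text(open(".env").read()), ["GITHUB_ACCESS_TOKEN"])` would return an empty list once the token has been added, assuming the `.env` file sits in the current directory.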
- Generate a new nodestream project
- Add `nodestream-github` to the dependencies in your nodestream project's pyproject.toml file.
- Install necessary dependencies: `poetry install`
- In `nodestream.yaml`, add the following:

      plugins:
        - name: github
          config:
            github_hostname: github.example.com
            auth_token: !env GITHUB_ACCESS_TOKEN
            user_agent: skip-jbristow-test
            per_page: 100
            collecting:
              all_public: True
            rate_limit_per_minute: 225
          targets:
            - my-db:
          pipelines:
            - name: github_repos
            - name: github_teams
      targets:
        database: neo4j
        uri: bolt://localhost:7687
        username: neo4j
        password: neo4j123
- Set the `GITHUB_ACCESS_TOKEN` environment variable in your terminal session.
- Verify nodestream has loaded the pipelines: `poetry run nodestream show`
- Use nodestream to run the pipelines: `poetry run nodestream run <pipeline-name> --target my-db`
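The environment-variable and run steps above can be combined in a small wrapper. The sketch below is one possible approach and is not part of nodestream: it checks that `GITHUB_ACCESS_TOKEN` is set, then issues the same `poetry run nodestream run <pipeline-name> --target my-db` command for each pipeline named in the configuration (`github_repos` and `github_teams`). The function names `build_run_command` and `run_all` are hypothetical.

```python
import os
import subprocess

PIPELINES = ["github_repos", "github_teams"]  # pipeline names from nodestream.yaml
TARGET = "my-db"  # target name from nodestream.yaml


def build_run_command(pipeline: str, target: str = TARGET) -> list[str]:
    """Build the argument list for one pipeline run."""
    return ["poetry", "run", "nodestream", "run", pipeline, "--target", target]


def run_all(pipelines: list[str] = PIPELINES) -> None:
    """Run each pipeline in order, stopping at the first failure."""
    if not os.environ.get("GITHUB_ACCESS_TOKEN"):
        raise SystemExit("GITHUB_ACCESS_TOKEN is not set")
    for pipeline in pipelines:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_run_command(pipeline), check=True)
```

Calling `run_all()` from the project root would run both pipelines in sequence. Passing the command as a list rather than a shell string avoids quoting problems if a pipeline name ever contains unusual characters.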
- Install make (e.g. `brew install make`)
- Run `make run`
- Jon Bristow
- Zach Probst