This app was created for the Airbyte + MotherDuck Hackathon. It uses Open Government Data (OGD) of the City of Zurich, Switzerland: Open Data Zurich.
It displays things to do, where to sleep, eat, and drink, and what to see in and around Zurich, Switzerland. In total there are 11 different categories to explore.
The app ingests open API data from Zurich Tourism using Airbyte, stores it on MotherDuck, transforms it there with dbt, and serves it in Streamlit.
https://zurichlocal.streamlit.app/
- Data is loaded from the City of Zurich Open Data API (11 endpoints, one per category) into the "raw" schema on MotherDuck. The Airbyte connection was built in the Airbyte UI and exported to YAML. MotherDuck is accessed through the new Airbyte MotherDuck connector, and the ingestion is run with PyAirbyte.
- The data is transformed using dbt and DuckDB. Several custom dbt macros process the raw data, and the final marts schema combines the processed sources into a star schema.
- Data from the marts schema is loaded into a Streamlit dashboard using DuckDB and Python and displayed on a folium map.
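Under stated assumptions, the ingestion step above can be sketched with PyAirbyte roughly as follows. The manifest wiring, the `stream_table_name` helper, and the file name `connector.yaml` are illustrative assumptions, not the project's actual code:

```python
"""Sketch of the PyAirbyte -> MotherDuck ingestion (assumptions in comments)."""
import os


def stream_table_name(category: str) -> str:
    """Hypothetical helper mapping a category label to a raw table name."""
    return category.strip().lower().replace(" & ", "_and_").replace(" ", "_")


def run_ingestion(manifest_path: str = "connector.yaml") -> None:
    # Imports are local so the pure helper above works without PyAirbyte installed.
    import airbyte as ab
    import yaml

    # Assumption: the connector exported from the Airbyte UI is loaded as a
    # declarative manifest; the exact wiring may differ in the real project.
    with open(manifest_path) as f:
        manifest = yaml.safe_load(f)
    source = ab.get_source(
        "source-declarative-manifest",
        config={"__injected_declarative_manifest": manifest},
    )
    source.check()
    source.select_all_streams()

    # PyAirbyte's MotherDuckCache writes records into the "raw" schema;
    # parameter names here are assumptions about the cache configuration.
    cache = ab.caches.MotherDuckCache(
        database="zuerich_db",
        schema_name="raw",
        api_key=os.environ["MD_ACCESS_TOKEN"],  # from the .env file
    )
    source.read(cache=cache)
```

Running `run_ingestion()` requires Docker (for the connector), a MotherDuck token, and network access, so it is kept out of module import.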
Streamlit is definitely not the best choice for this type of app (map, list, and detail views), but it is a quick way to display the data, which was the focus of this challenge.
The stack and architecture were chosen to reflect a classic data engineering workflow, even though the amount of data used in the app is quite small.
The shared MotherDuck database:
ATTACH 'md:_share/zuerich_db/8bc853bd-f2a3-4856-897b-a65e423ccb81'
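For exploring the shared database from Python, the ATTACH statement above can be wrapped in a short duckdb script. The helper name is illustrative, and a MotherDuck token must be configured in the environment:

```python
# Share URL taken from the README; requires the `motherduck_token` env var.
SHARE_URL = "md:_share/zuerich_db/8bc853bd-f2a3-4856-897b-a65e423ccb81"


def attach_share(share_url: str = SHARE_URL):
    """Attach the shared MotherDuck database and return the connection.

    The duckdb import is local so this module loads without duckdb installed.
    """
    import duckdb

    con = duckdb.connect("md:")  # "md:" opens a MotherDuck session
    con.execute(f"ATTACH '{share_url}'")
    return con
```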
- Install Python 3.11, e.g. using pyenv
- Install Poetry
- Install Docker (needed for the Airbyte connector)
In the project directory, run
$ poetry install --no-root
- Create a MotherDuck account and copy your access token from the MotherDuck settings.
- Create a database called "zuerich_db" on MotherDuck.
- Create a .env file in the project root and set MD_ACCESS_TOKEN=<your token>
- Run the Airbyte import into your MotherDuck instance:
$ python src/ingest/ingest_data.py
- Run the dbt transformations on MotherDuck
$ python src/transform/dbt_runner.py
- Run the Streamlit dashboard
$ python -m streamlit run src/dashboard/app.py
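The dashboard step above can be sketched as a small Streamlit script. The marts table and column names (`marts.dim_place`, `name`, `latitude`, `longitude`) are assumptions about the star schema, and streamlit-folium's `st_folium` is one way to embed the map; the real app may differ:

```python
"""Sketch of the Streamlit + folium dashboard; marts column names are assumed."""


def rows_to_markers(rows):
    """Turn (name, lat, lon) result rows into folium-ready marker dicts.

    The (name, lat, lon) column layout is an assumption about the marts schema.
    """
    return [
        {"name": name, "location": [lat, lon]}
        for name, lat, lon in rows
        if lat is not None and lon is not None
    ]


def render_map():
    # Heavy imports stay local so the pure helper above is importable anywhere.
    import duckdb
    import folium
    import streamlit as st
    from streamlit_folium import st_folium

    con = duckdb.connect("md:zuerich_db")  # needs motherduck_token in the env
    rows = con.execute(
        "SELECT name, latitude, longitude FROM marts.dim_place"  # assumed table
    ).fetchall()

    m = folium.Map(location=[47.3769, 8.5417], zoom_start=13)  # Zurich center
    for marker in rows_to_markers(rows):
        folium.Marker(marker["location"], popup=marker["name"]).add_to(m)
    st.title("Zurich Local")
    st_folium(m)
```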
- Testing - test the data for correctness and completeness, test the pipelines, test the dashboard.
- Orchestration - e.g. use Dagster to schedule and run the data ingestion and transformation pipelines.
- CI/CD - use GitHub Actions to build container images of the data pipeline steps.