The Compost Research & Education Foundation (CREF) researches the disintegration of compostable foodware and packaging to find correlations between different composting methodologies and the rate of disintegration. Through the Compostable Field Testing Program, facilities submit their composting results and CREF analyzes the data to find best composting practices. Facilities submit data in varying formats, so the DSI will help CREF create a database well-suited to the kinds of statistical analysis performed on composting data. CREF will then standardize their data collection process so participating facilities adhere to best practices in both running experiments and in data collection. In the future, CREF will make the data and analysis from their partner facilities available on a public dashboard.
The DSI will extend a data pipeline that converts data from new experiments into a consistent format, and will create visualizations showing disintegration rates for different materials and composting methodologies. We will also create a process for importing new trial data that CREF's partner facilities will use in future trials, and start building the infrastructure for a public-facing dashboard of data from composting trials.
The data pipeline for this project standardizes data from multiple facilities for display on a dashboard that shows decomposition rates of different compostable plastics as well as the operating conditions of the associated facilities.
Note: The pipeline was set up to handle multiple disparate files with varied input formats. Future data will come in a standardized format. The pipeline is left as one script for ease of iteration and refactoring later when the new data format is known.
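As a rough illustration of what standardizing means here, the sketch below maps rows from two facilities that use different column names onto one shared schema. The facility names, column names, and values are all hypothetical. They are not taken from scripts/pipeline-template.py, which handles the real input formats.

```python
# Hypothetical column mappings: source column name -> standard column name.
COLUMN_MAPS = {
    "facility_a": {"Item": "item_id", "% Residuals (Mass)": "pct_residuals_mass"},
    "facility_b": {"item code": "item_id", "residual_mass_pct": "pct_residuals_mass"},
}


def standardize_row(facility, row):
    """Rename a raw row's columns to the shared schema and tag its source."""
    mapping = COLUMN_MAPS[facility]
    out = {"facility": facility}
    for raw_name, value in row.items():
        if raw_name in mapping:  # columns without a mapping are dropped
            out[mapping[raw_name]] = value
    return out


def standardize(raw_by_facility):
    """Flatten {facility: [rows]} into one list of rows in the shared schema."""
    return [
        standardize_row(facility, row)
        for facility, rows in raw_by_facility.items()
        for row in rows
    ]
```

Keeping the per-facility differences in one mapping table means a new standardized input format would only need one more entry, not new parsing logic.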
The pipeline runs in Docker. If you use VS Code, the repo is set up to run in a dev container, so build the container the way you normally would. Otherwise, build the Docker image from the Dockerfile in the root of the repository.
Download the following files from the Results Data for DSI - Raw uploads folder in the DSI Google Drive and save them to data/:
- CFTP Anonymized Data Compilation Overview - For Sharing
- Donated Data 2023 - Compiled Facility Conditions for DSI
- Donated Data 2023 - Compiled Field Results for DSI
- CASP004-01 - Results Pre-Processed for Analysis from PDF Tables
- Compiled Field Results - CFTP Gathered Data
- CFTP Test Item Inventory with Dimensions - All Trials.xlsx
- old_items.json
- Item IDS for CASP004 CASP003.xlsx
- CFTP_DisintegrationDataInput_Template_sept92024.csv
These files are all read directly in scripts/pipeline-template.py
To run the pipeline:
python scripts/pipeline-template.py
Cleaned data files will be output in data/. To update the files displayed on the dashboard, follow the instructions in Updating the Dashboard Data.
This is a Next.js project.
To run the dashboard locally, do not use the dev container!
Install packages:
npm install
The dashboard expects a .env.local file in dashboard/ that holds a base64-encoded Google service account JSON (with permissions to access Cloud Storage buckets). The service account can be found in the UChicago Organization, DSI Folder, compostable project on GCP. The file should define:
DATA_SOURCE=google
GOOGLE_APPLICATION_CREDENTIALS_BASE64=<base64-encoded-service-account.json>
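One way to produce the base64 value is sketched below, using only the Python standard library. The helper names and the key file path are placeholders chosen for illustration and are not part of this repo.

```python
import base64
import json
from pathlib import Path


def encode_service_account(key_path):
    """Return the base64 text of a service account JSON key file."""
    raw = Path(key_path).read_bytes()
    json.loads(raw)  # fail early if the file is not valid JSON
    return base64.b64encode(raw).decode("ascii")


def env_file_contents(key_path):
    """Build the two lines expected in dashboard/.env.local."""
    encoded = encode_service_account(key_path)
    return (
        "DATA_SOURCE=google\n"
        f"GOOGLE_APPLICATION_CREDENTIALS_BASE64={encoded}\n"
    )
```

For example, Path("dashboard/.env.local").write_text(env_file_contents("service-account.json")) would write the file, where service-account.json stands in for wherever you saved the downloaded key.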
To run the development server, go into the /dashboard directory and install the necessary packages using npm install.
Once the packages are installed, run the development server with the following command:
npm run dev
Open http://localhost:3000 with your browser to see the result.
The dashboard is deployed via Vercel and is hosted on CFTP's site in an iframe.
Any update to the main branch of this repo will update the production deployment of the dashboard.
If you rerun the pipeline, you need to update data files in Google Cloud Storage.
The dashboard pulls data from Google Cloud Storage via an API. Upload the following files to the root of the cftp_data storage bucket in the compostable project in the DSI account:
all_trials_processed.csv
operating_conditions_avg.csv
operating_conditions_full.csv
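The upload can be done through the Cloud Console, or scripted. The sketch below is one possible approach, not something the repo ships. It assumes the google-cloud-storage package is installed and that your environment has credentials with write access to the bucket. The project argument is left as None because the exact GCP project ID is not stated here.

```python
from pathlib import Path

OUTPUT_FILES = [
    "all_trials_processed.csv",
    "operating_conditions_avg.csv",
    "operating_conditions_full.csv",
]


def missing_outputs(data_dir="data"):
    """Return the expected pipeline outputs that are not present in data_dir."""
    return [name for name in OUTPUT_FILES if not (Path(data_dir) / name).is_file()]


def upload_outputs(data_dir="data", bucket_name="cftp_data", project=None):
    """Upload the three CSVs to the root of the bucket."""
    missing = missing_outputs(data_dir)
    if missing:
        raise FileNotFoundError(f"Run the pipeline first; missing: {missing}")

    # Imported here so the check above works without the client library.
    from google.cloud import storage

    client = storage.Client(project=project)
    bucket = client.bucket(bucket_name)
    for name in OUTPUT_FILES:
        # Blob name equals the file name, so the file lands at the bucket root.
        bucket.blob(name).upload_from_filename(str(Path(data_dir) / name))
```

Checking for all three files before uploading any of them avoids leaving the bucket with a mix of old and new data if the pipeline run was incomplete.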
There are two dashboards. The dashboard located in page.js is the default one that is displayed on the CFTP site. There is also a proof-of-concept operating conditions dashboard available at /operating-conditions.
The dashboard loads its data via an API call in lib/data.js. Data is managed in the same file. Menu options are fetched in page.js when the dashboard first loads.
The dashboard consists of a Plotly dash and various filters.
The main dashboard lives in components/Dashboard.js and the controls are in components/DashboardControls.js.
The operating conditions dash is in a single component: components/OperatingConditionsDashboard.js
The data for this project is sensitive, so it is accessed and aggregated via an API. There are endpoints for the trial data (app/api/data/), the options for populating the filter menus (app/api/options), and the operating conditions dash (app/api/operating-conditions).
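For a quick smoke test of these endpoints against a local dev server, something like the sketch below may help. It assumes the route folders map to /api/data, /api/options, and /api/operating-conditions (the usual Next.js App Router convention) and that each responds to a plain GET with JSON. Any query parameters the real handlers expect are not covered, and the helper names are illustrative.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed URL paths, derived from the route folder names under app/api/.
ENDPOINTS = {
    "trials": "/api/data",
    "options": "/api/options",
    "operating_conditions": "/api/operating-conditions",
}


def endpoint_url(name, base_url="http://localhost:3000"):
    """Build the full URL for one of the dashboard's API endpoints."""
    return urljoin(base_url, ENDPOINTS[name])


def fetch_json(name, base_url="http://localhost:3000", timeout=10):
    """GET an endpoint and parse the body as JSON (assumes a JSON response)."""
    with urlopen(endpoint_url(name, base_url), timeout=timeout) as response:
        return json.load(response)
```

With npm run dev running, fetch_json("options") should return whatever the options endpoint serves to the filter menus.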