Project to analyse data on CO2 levels, using data files from Aranet4 devices or manually recorded data.
Link to interactive infographic - Map and Scatter Chart
Link to interactive infographic - Scatter Chart
Link to interactive infographic - Violin Plot
Link to interactive infographic - Line Chart
Link to interactive infographic - Map and Line Chart
Link to interactive infographic - Small Multiples
Link to interactive infographic - % Rebreathed Map and Scatter Chart
Dataviz using data from CO2 monitors to analyse CO2 levels. The first file format supported is that output by the Aranet4 device, in CSV file format. Multiple CSV files can be collected and aggregated together for analysis across monitors.
An Excel template is provided to ease the manual collection of data, where that is preferred.
Calendar files can also be provided in .ics (iCalendar) format, and a Category and Location can be extracted from those and used in the analysis.
This project presents a series of interactive dashboard pages to help analyse and understand the data collected on CO2 levels. The page navigation control is at the centre-bottom, e.g. < 1 of 4 >. Click the centre of that control for a menu of the pages.
On every page, slicer controls allow the selection of a range of dates. There is a Play control immediately below to let you "play" through the selected dates in an animation. There are also slicers to filter by the Monitor (device ID), Day of the week and Hour of the day. In combination, many different analyses are possible. Each visual element e.g. a point on a line chart can be selected, then it will act as a "cross-filter" to filter every other visual on that page.
On most pages, a detailed table at the bottom lists the individual CO2 readings within the scope of the current slicer settings. The table can be resorted by clicking any column header, and the Focus mode button (which appears at the top-right corner of the table on hover) will expand the table to fill the browser window.
The page: Map and Scatter Chart presents a map showing aggregated CO2 levels alongside a scatter chart showing a dot for every CO2 level measurement. The map only shows data associated with a Location (via the Calendar data), and where Bing Maps can resolve that Location (text) into map coordinates. The size of the map bubbles represent the number of CO2 readings at that Location. The color of the map bubbles are determined by the CO2 Yellow Range slicer (see below for details), for the median CO2 level recorded at that Location. See below for the description of the Scatter chart.
The page: Scatter Chart presents a scatter chart showing a dot for every CO2 level measurement. When there are over 10,000 readings, an automatic sampling algorithm kicks in. It aims to mostly hide the points that overlap with others. The X-Axis is the time of day, expressed in decimal fractions of hours. The actual hour and minute shown is available in the tooltip for any point. The Y Axis is the CO2 level. A CO2 Yellow Range slicer at the right controls which levels of CO2 are coloured Red-Yellow-Green.
The page: Violin Plot presents a Violin Plot visual that elegantly aggregates data to show the shape of the data readings, combined with a box plot to show the median, mean and Inter-Quartile Range showing a dot for every CO2 level measurement. When there are over 30,000 readings, an automatic sampling algorithm kicks in using the Epanechnikov Kernel. The X-Axis is variable, driven by the Legend Field slicer selection. The Y Axis is the CO2 level.
The page: Line Chart presents a Line chart visual that aggregates data by Time of day. Time here is expressed as a decimal value - the whole numbers are the hour of day (24-hour format) and the decimals are fractions of a day. This format gives the best results with the standard Power BI line chart visual. A tool-tip show the time in the conventional hh:mm format. The Legend is variable, driven by the Legend Field slicer selection. The Y Axis is the CO2 level.
The page: Map and Line Chart presents a map showing aggregated CO2 levels alongside a line chart that aggregates data by Time of day. See the descriptions above for Map and Scatter Chart for the map and Line Chart.
The page: Small Multiples presents a grid of Line charts that aggregates data by Time of day on the X-Axis. The split of data among the grid cells is variable, driven by the Legend Field slicer selection. The Y Axis is the CO2 level, standardised across all the grid cells. If more than 6 cells fill the grid, you can scroll down to view them all.
The page: % Map and Scatter Chart has a similar layout and function to the page described above: Map and Scatter Chart. But instead of CO2 levels it presents the data as a % Rebreathed - what percentage of the air you are breathing has been passed through someones lungs.
The basis for that calculation is the paper Risk of indoor airborne infection transmission estimated from carbon dioxide concentration - S. N. Rudnick, D. K. Milton.
David Elfstron (P.Eng, Energy management, measurement & verification) presented that paper's key calculation in an accessible way on this Twitter thread, which includes a Google Sheet implementation of the formula. I have tried to follow that method, including adding controls to adjust the Outdoor Air CO2 ppm and Volume Fraction of CO2 added by breathing from the suggested default values.
FWIW I am a big fan of this metric, as I think it is more concrete (visceral, even) for the general public to grasp, compared to "parts per million" of a gas that is always present to some degree.
There is a hidden page: Key Influencers using the Key Influencers visual, which is a very powerful no-code, AI-powered analysis engine for understanding the relationships within the data. After every interactive filter you choose, the engine re-runs it's analysis to explore which factors are the Key Influencers for a variable - in this project the CO2 level. That visual is not supported by this publishing method, but you can install the free Power BI Desktop tool, download the .PBIX file from this project and open it to explore that page.
The data visualisation engine is Power BI, featuring the Custom Visuals:
Sample datasets have been shared:
- Roaming CAIS was provided by an anonymous user
Any Aranet4 user can connect their monitor device to a smartphone using the Aranet app and export their measurements.
An Excel template file is available to allow those who wish to gather their data manually to provide it in an acceptable format.
If you would like to add your data to the public collection, please submit it by creating an Issue for this project using the template: Public Data file submission.
If you would prefer to process your data offline, just clone this project to your local drive, load your files into the C:\Dev\co2-levels\co2-levels-data-files\aranet4 folder, then open the PBIX file using Power BI Desktop and hit the Refresh button.
The start of each file name (before the first underscore character "_") will be used as the "Monitor ID". Data for multiple Monitors must be presented in separate files.
The Ararnet4 file format is currently the only one supported for this project, but it should be relatively easy to integrate other formats. If you would like to propose a new format, please create an Issue for this project (using the blank issue template) and attach as many sample files as possible.
This project has been designed to use the Power BI date/time conversion feature, and the PBIX file is set to expect dates in Australian regional format: dd/mm/yyyy. You can change this using Power BI Desktop by going to File / Options / CURRENT FILE / Regional Settings and choosing your Locale.
Calendar data can also be integrated in .ics (iCalendar) format. This can be produced from most Calendar tools, e.g. using Outlook (for Windows), use the File / Save Calendar menu option. Outlook has limited filters when you produce an iCalendar (.ics) file, so the easiest method may be to create a separate Calendar (within Outlook) for your CO2 entries.
An Excel template file is available to allow those who wish to gather their data manually to provide it in an acceptable format.
The start of each file name (before the first underscore character "_") will be used as the "Monitor ID". Data for multiple Monitors must be presented in separate files.
Calendar entries will be filtered to those where the subject starts with "CO2:". The remaining subject text will be passed through as the "Category" field, which can be used as a filter or shown on various visuals.
Calendar entries with a Location will be passed through as the "Location" field. This can be used as a filter, shown on various visuals, and used on map visuals to plot the locations on a global map. This uses Bing's Map feature to "geocode" the coordinates for each Location on the fly.
Each Calendar entry has a start and end date and time. The Power BI logic works through the CO2 data records and looks up the relevant entry (by Date - Time) in the Calendar for that file folder and Monitor ID. If overlapping Calendar entries are provided, one is chosen at random. The assigned Calendar entry determines the "Category" and "Location" fields relevant to that CO2 data record.
Recurring Calendar entries are not supported.
As well as including a broader range of file formats, I have imagined several directions this project could go in, including:
- Standardised set of Categories - validate the Category data provided against a standardised reference list (eg an Excel table).
- Cater for recurring Calendar entries - these appear as a single entry with a string that indicates the recurrence pattern. Find and implement a method to explode those into the detailed entries.
- Automated pipeline for data updates.
Contributions, issues, feature requests and sponsorship are all welcome!
Give a ⭐️ if you like this project!