Commit

Update Harmonize_Pensacola.Rmd
cristinamullin committed May 2, 2024
1 parent 81448a9 commit 601cd7a
Showing 1 changed file with 150 additions and 86 deletions.
236 changes: 150 additions & 86 deletions demos/Harmonize_Pensacola.Rmd
@@ -1,119 +1,172 @@
---
title: "harmonize-wq in R"
format: html
editor: visual
author: "Justin Bousquin, Cristina Mullin, Marc Weber"
date: '2022-08-31'
output: rmarkdown::html_vignette
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
fig_caption: yes
fig_height: 8
fig_width: 8
vignette: >
%\VignetteEncoding{UTF-8}
%\VignetteIndexEntry{harmonize-wq in R}
%\usepackage[utf8]{inputenc}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
editor_options:
chunk_output_type: console
markdown:
wrap: 72
---

```{r setup, include = FALSE}
# Set chunk options
library(knitr)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
echo = TRUE,
warning = FALSE,
message = FALSE
)
```

<br>

## Overview

Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats using the harmonize_wq package. US EPA’s Water Quality Portal (WQP) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using Python or R. Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:

* Identify differences in data units (including speciation and basis)
* Identify differences in sampling or analytic methods
* Resolve data errors using transparent assumptions
* Reduce data to the columns that are most commonly needed
* Transform data from long to wide format

Domain experts must decide what data meets their quality standards for data comparability and any thresholds for acceptance or rejection.

<br>

<br>
Standardize, clean, and wrangle Water Quality Portal data into more
analytic-ready formats using the harmonize_wq package. US EPA's Water
Quality Portal (WQP) aggregates water quality, biological, and physical
data provided by many organizations and has become an essential resource
with tools to query and retrieve data using Python or R. Given the
variety of data and variety of data originators, using the data in
analysis often requires data cleaning to ensure it meets the required
quality standards and data wrangling to get it in a more analytic-ready
format. Recognizing the definition of analysis-ready varies depending on
the analysis, the harmonize_wq package is intended to be a flexible
water quality specific framework to help:

- Identify differences in data units (including speciation and basis)
- Identify differences in sampling or analytic methods
- Resolve data errors using transparent assumptions
- Reduce data to the columns that are most commonly needed
- Transform data from long to wide format

Domain experts must decide what data meets their quality standards for
data comparability and any thresholds for acceptance or rejection.

## Installation & Setup

#### Install the harmonize-wq package (Command Line)
#### Option 1: Install the harmonize-wq Package Using the Command Line

To install and set up the harmonize-wq package using the command line:

1. If needed, re-install [miniforge](https://github.com/conda-forge/miniforge). Once miniforge is installed, go to your Start menu and open the Miniforge Prompt.
2. At the Miniforge Prompt:
- conda create --name wq_harmonize
- activate wq_harmonize
- conda install geopandas pip dataretrieval pint
- may need to update conda
- conda update -n base -c conda-forge conda
- pip install harmonize-wq
- pip install git+https://github.com/USEPA/harmonize-wq.git (dev version)
1. If needed, re-install
[miniforge](https://github.com/conda-forge/miniforge). Once
miniforge is installed, go to your Start menu and open the Miniforge
Prompt.
2. At the Miniforge Prompt, run:
- conda create --name wq_harmonize
- activate wq_harmonize
- conda install geopandas pip dataretrieval pint
- may need to update conda
- conda update -n base -c conda-forge conda
- pip install harmonize-wq
- pip install git+<https://github.com/USEPA/harmonize-wq.git> (dev
version)

<br>
#### Option 2: Install the harmonize-wq Package Using R

#### Install the harmonize-wq package (R)
**Alternatively**, you may be able to set up your environment and import
the required Python packages using R.

**Alternatively**, you may be able to set up your environment and import the required Python packages using the block of R code below:
First, run the chunk below to install the reticulate package to use Python in R.

```{r, results = 'hide', eval=FALSE}
# If needed, install the reticulate package to use Python in R
```{r, results = 'hide'}
install.packages("reticulate")
library(reticulate)
```

# The reticulate package will automatically look for an installation of Conda
# However, you may specify the location if needed using options(reticulate.conda_binary = 'dir')
options(reticulate.conda_binary = '~/AppData/Local/miniforge3/Scripts/conda.exe')
Conda is required to use EPA's harmonize-wq package.

# Create a new Python environment called "wq-reticulate"
# Note that the environment name may need to include the full path (e.g. "~/AppData/Local/miniforge3/envs/wq_harmonize")
conda_create("wq-reticulate")
There are multiple installers available for Conda
(see: <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>).

# Install the following packages to the newly created environment
conda_install("wq-reticulate", "geopandas")
conda_install("wq-reticulate", "pint")
conda_install("wq-reticulate", "dataretrieval")
One example installer is
[miniforge](https://github.com/conda-forge/miniforge). We use miniforge3 in this
example.

# Install the harmonize-wq package
# This only works with py_install() (pip), which defaults to virtualenvs
# Note that the environment name may need to include the full path (e.g. "~/AppData/Local/miniforge3/envs/wq_harmonize")
py_install("harmonize-wq", pip = TRUE, envname = "wq-reticulate")
Once miniforge3 (or another installer of your choice) is installed, the
reticulate package will automatically look for the installation of Conda (conda.exe)
on your computer.

# To install the dev version of harmonize-wq from GitHub
# Note that the environment name may need to include the full path (e.g. "~/AppData/Local/miniforge3/envs/wq_harmonize")
py_install("git+https://github.com/USEPA/harmonize-wq.git@new_release_0-3-8", pip = TRUE, envname = "wq-reticulate")
```{r, results = 'hide'}
# options(reticulate.conda_binary = 'dir')
```

# Specify the Python environment to be used
use_condaenv("wq_harmonize")
However, you may still need to specify the location. If needed, update the code chunk below to specify the location of conda.exe on your computer.

# Test that your Python environment is correctly set up
# Both imports should return "Module(package_name)"
import("harmonize_wq")
import("dataretrieval")
```{r, results = 'hide'}
# Update the path in this chunk to specify the location of conda.exe on your computer
# Note that the full path may be needed (e.g. "C:/Users/USERNAME/AppData/Local/miniforge3/Scripts/conda.exe")
options(reticulate.conda_binary = "C:/Users/CMULLI01/AppData/Local/miniforge3/Scripts/conda.exe")
```

<br>
Next, update the code chunk below to create a new Python environment in the envs
folder on your computer called "wq_harmonize".

```{r, results = 'hide'}
# Note that the environment name may need to include the full path (e.g. "C:/Users/USERNAME/AppData/Local/miniforge3/envs/wq_harmonize")
reticulate::conda_create("C:/Users/CMULLI01/AppData/Local/miniforge3/envs/wq_harmonize")
```

#### Import required libraries
Install the following Python packages to the newly created
Python environment called "wq_harmonize".

The full list of dependencies that should be installed to use the harmonize-wq package can be found in [`requirements.txt`](https://github.com/USEPA/harmonize-wq/blob/new_release_0-3-8/requirements.txt). **Note that `reticulate::repl_python()` must be called to execute these commands using the reticulate package in R.**
```{r, results = 'hide'}
reticulate::conda_install("wq_harmonize", "geopandas") # Python package
reticulate::conda_install("wq_harmonize", "pint") # Python package
reticulate::conda_install("wq_harmonize", "dataretrieval") # Python package
```

Install EPA's harmonize-wq package.

```{r, results = 'hide'}
# Install the most recent release of the harmonize-wq package
# This only works with py_install() (pip = TRUE), which defaults to use virtualenvs
reticulate::py_install("harmonize-wq", pip = TRUE, envname = "wq_harmonize")
# Uncomment below to install the development version of harmonize-wq from GitHub instead (optional)
# py_install("git+https://github.com/USEPA/harmonize-wq.git@new_release_0-3-8", pip = TRUE, envname = "wq_harmonize")
```

Specify the Python environment to be used, "wq_harmonize", and test that your Python
environment is set up correctly.

```{r}
# Specify the environment to be used (before Python is initialized)
reticulate::use_condaenv("wq_harmonize")
# Test that the setup is correct
# Both imports should return "Module(package_name)"
reticulate::import("harmonize_wq")
reticulate::import("dataretrieval")
# Use reticulate to execute Python commands
reticulate::repl_python()
```

```{python}
#### Import additional required libraries

The full list of dependencies that should be installed to use the
harmonize-wq package can be found in
[`requirements.txt`](https://github.com/USEPA/harmonize-wq/blob/new_release_0-3-8/requirements.txt).

```{python, results = 'hide'}
# Use these reticulate imports to test the modules are installed
import harmonize_wq
import dataretrieval
import os
import pandas
import geopandas
import dataretrieval.wqp as wqp
import pint
import mapclassify
from harmonize_wq import harmonize
from harmonize_wq import convert
from harmonize_wq import wrangle
@@ -122,24 +175,30 @@ from harmonize_wq import location
from harmonize_wq import visualize
```
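
Before running the imports above, it can help to see which modules are missing without triggering an import error. The following is a minimal sketch using only the Python standard library. The helper name `find_missing` is hypothetical, and the module list is simply copied from the imports shown in this vignette.

```python
import importlib.util

# Top-level module names taken from the import chunk in this vignette
REQUIRED_MODULES = [
    "harmonize_wq", "dataretrieval", "pandas",
    "geopandas", "pint", "mapclassify",
]

def find_missing(modules):
    """Return the names in `modules` that cannot be located in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Any name reported as missing can then be installed into the "wq_harmonize" environment using the installation steps described earlier.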

<br>
## harmonize-wq Usage: FL Bays Example

<br>
The following example illustrates a typical harmonization process using
the harmonize-wq package on WQP data retrieved from Perdido and
Pensacola Bays, FL.

## Usage
**Note that `reticulate::repl_python()` must be called first to execute
these commands using the reticulate package in R.**

The following example illustrates a typical harmonization process using the harmonize-wq package on WQP data retrieved from Perdido and Pensacola Bays, FL.
```{r, results = 'hide'}
# Use reticulate to execute python commands
reticulate::repl_python()
```

First, determine an area of interest (AOI), build a query, and retrieve water temperature and Secchi disk depth data from WQP for the AOI using the dataretrieval package:
First, determine an area of interest (AOI), build a query, and retrieve
water temperature and Secchi disk depth data from the Water Quality Portal (WQP)
for the AOI using the dataretrieval package:

```{python, message=FALSE, warning=FALSE, error=FALSE}
```{python, error = F}
# File for area of interest (Pensacola and Perdido Bays, FL)
aoi_url = r'https://raw.githubusercontent.com/USEPA/harmonize-wq/main/harmonize_wq/tests/data/PPBays_NCCA.geojson'
# Build query and get WQP data with dataretrieval
query = {'characteristicName': ['Temperature, water',
'Depth, Secchi disk depth',
]}
query = {'characteristicName': ['Temperature, water', 'Depth, Secchi disk depth',]}
# Use harmonize-wq to wrangle
query['bBox'] = wrangle.get_bounding_box(aoi_url)
@@ -152,10 +211,14 @@ res_narrow, md_narrow = wqp.get_results(**query)
res_narrow
```
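
To make the shape of the query more concrete, the sketch below builds a `bBox` entry by hand from a few points. It assumes the WQP convention of a comma-separated "west,south,east,north" string. The helper `bbox_from_points` and the coordinates are hypothetical and only illustrate the idea; this is not how `wrangle.get_bounding_box` is implemented.

```python
def bbox_from_points(points):
    """Return a 'west,south,east,north' string for an iterable of (lon, lat) pairs."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return ",".join(str(v) for v in (min(lons), min(lats), max(lons), max(lats)))

# Illustrative coordinates only, not the actual AOI vertices
example_points = [(-87.5, 30.3), (-87.0, 30.6), (-86.8, 30.4)]

# Same query structure as in the chunk above, with a hand-built bounding box
example_query = {
    "characteristicName": ["Temperature, water", "Depth, Secchi disk depth"],
}
example_query["bBox"] = bbox_from_points(example_points)
print(example_query)
```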

Next, harmonize and clean all results:
Next, harmonize and clean all results using the harmonize.harmonize_all,
clean.datetime, and clean.harmonize_depth functions.

```{python, message=FALSE, warning=FALSE, error=FALSE}
df_harmonized = harmonize.harmonize_all(res_narrow, errors='raise')
Enter a ? followed by the function name, for example ?harmonize.harmonize_all,
into the console for more details.

```{python, error = F}
df_harmonized = harmonize.harmonize_all(res_narrow, errors = 'raise')
df_harmonized
# Clean up the datetime and sample depth columns
@@ -164,9 +227,14 @@ df_cleaned = clean.harmonize_depth(df_cleaned)
df_cleaned
```
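
As a rough illustration of what harmonizing units means, the sketch below converts made-up results reported in mixed units to one common unit per characteristic. Everything in it is an assumption for illustration: the unit labels, the `CONVERTERS` table, and the `to_common_unit` helper are hypothetical. The package itself handles many more characteristics and units, and raising on an unrecognized unit is only loosely modeled on the `errors = 'raise'` argument used above.

```python
# Converters to one common unit per characteristic (illustrative subset only)
CONVERTERS = {
    "Temperature, water": {            # common unit: degrees Celsius
        "deg C": lambda v: v,
        "deg F": lambda v: (v - 32.0) * 5.0 / 9.0,
    },
    "Depth, Secchi disk depth": {      # common unit: meters
        "m": lambda v: v,
        "cm": lambda v: v / 100.0,
        "ft": lambda v: v * 0.3048,
    },
}

def to_common_unit(characteristic, value, unit):
    """Convert one result to the common unit; raise if the unit is not recognized."""
    try:
        convert = CONVERTERS[characteristic][unit]
    except KeyError:
        raise ValueError(f"No conversion for {characteristic!r} in unit {unit!r}")
    return convert(value)

# Made-up results in mixed units
results = [
    ("Temperature, water", 77.0, "deg F"),
    ("Temperature, water", 25.0, "deg C"),
    ("Depth, Secchi disk depth", 150.0, "cm"),
]
harmonized = [to_common_unit(*r) for r in results]
print(harmonized)
```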

There are many columns in the data frame that are characteristic-specific; that is, they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data, these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
There are many columns in the data frame that are
characteristic-specific; that is, they have different values for the same sample
depending on the characteristic. To ensure one result for each sample
after the transformation of the data, these columns must either be
split, generating a new column for each characteristic with values, or
removed from the table if not needed.
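
The idea of splitting a characteristic-specific column and moving from long to wide format can be sketched in plain Python. The rows, the `long_to_wide` helper, and the `QA_` column naming below are hypothetical and chosen only for illustration; the package operates on pandas data frames and may name columns differently.

```python
from collections import defaultdict

# Made-up long-format rows: one row per sample and characteristic
long_rows = [
    {"sample": "S1", "characteristic": "Temperature", "value": 25.0, "QA_flag": None},
    {"sample": "S1", "characteristic": "Secchi", "value": 1.5, "QA_flag": "unit assumed"},
    {"sample": "S2", "characteristic": "Temperature", "value": 24.1, "QA_flag": None},
]

def long_to_wide(rows):
    """Return one record per sample, with a value and a QA column per characteristic."""
    wide = defaultdict(dict)
    for row in rows:
        name = row["characteristic"]
        wide[row["sample"]][name] = row["value"]
        # Split the shared QA_flag column into a characteristic-specific column
        wide[row["sample"]]["QA_" + name] = row["QA_flag"]
    return dict(wide)

wide_rows = long_to_wide(long_rows)
for sample, columns in wide_rows.items():
    print(sample, columns)
```

Because each QA flag ends up in its own characteristic-specific column, each sample collapses to a single record without losing information.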

```{python, message=FALSE, warning=FALSE, error=FALSE}
```{python, error = F}
# Split the QA_flag column into multiple characteristic specific QA columns
df_full = wrangle.split_col(df_cleaned)
@@ -183,15 +251,11 @@ df_wide.head()

Finally, the cleaned and wrangled data may be visualized as a map:

```{python, message=FALSE, warning=FALSE, error=FALSE}
```{python, error = F}
# Get harmonized stations clipped to the AOI
stations_gdf, stations, site_md = location.get_harmonized_stations(query, aoi=aoi_url)
# Map average temperature results at each station
gdf_temperature = visualize.map_measure(df_wide, stations_gdf, 'Temperature')
gdf_temperature.plot(column='mean', cmap='OrRd', legend=True)
gdf_temperature.plot(column = 'mean', cmap = 'OrRd', legend = True)
```
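
The map above plots a `mean` column for each station. As a hedged sketch of that kind of per-station summary, the plain-Python example below averages made-up temperature values by station. The `station_means` helper, the station identifiers, and the added `count` field are assumptions for illustration, not a description of what `visualize.map_measure` does internally.

```python
from collections import defaultdict
from statistics import mean

def station_means(rows):
    """Summarize (station, value) pairs as a count and mean per station, skipping missing values."""
    by_station = defaultdict(list)
    for station, value in rows:
        if value is not None:
            by_station[station].append(value)
    return {s: {"count": len(v), "mean": mean(v)} for s, v in by_station.items()}

# Made-up temperature results (degrees Celsius) at two hypothetical stations
temperature_rows = [("A", 20.0), ("A", 22.0), ("B", 25.0), ("B", None)]
summary = station_means(temperature_rows)
print(summary)
```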

<br>

<br>
