Branch behind main #59

Merged: 2 commits, Apr 11, 2024
13 changes: 10 additions & 3 deletions README.md
@@ -4,7 +4,10 @@
# harmonize-wq
Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats

US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval). Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonixe_wq package is intended to be a flexible water quality specific framework to help:
US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval).
Given the variety of data and data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it into a more analytic-ready format.
Recognizing that the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible, water-quality-specific framework to help:

- Identify differences in data units (including speciation and basis)
- Identify differences in sampling or analytic methods
- Resolve data errors using transparent assumptions
@@ -73,7 +76,8 @@ df_cleaned
```

### Transform results from long to wide format
There are many columns in the dataframe that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
There are many columns in the dataframe that are characteristic specific, that is, they have different values for the same sample depending on the characteristic.
To ensure one result for each sample after the data are transformed, these columns must either be split, generating a new column with values for each characteristic, or moved out of the table if not being used.
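A minimal, stdlib-only sketch of the "split" option, assuming records are plain dicts. The helper `split_col` and the column names `ActivityIdentifier`, `CharacteristicName`, `Result` and `QA_flag` are illustrative assumptions, not the `harmonize_wq.wrangle` API; only the `QA_<characteristic>` naming is taken from the `QA_Temperature` column listed in the results table.

```python
# Hypothetical sketch: not the harmonize_wq.wrangle implementation.
def split_col(rows, col="QA_flag", char_col="CharacteristicName"):
    """Replace a characteristic-specific column with one column per characteristic.

    The new column is named "<prefix>_<characteristic>", where <prefix> is the
    part of `col` before the first underscore (e.g. "QA_flag" -> "QA_Temperature").
    """
    prefix = col.split("_")[0]
    split_rows = []
    for row in rows:
        # Copy every other column unchanged (the input rows are not modified).
        new_row = {key: value for key, value in row.items() if key != col}
        # Move this row's value into its characteristic's own column.
        new_row[f"{prefix}_{row[char_col]}"] = row.get(col)
        split_rows.append(new_row)
    return split_rows


# Made-up long-format records: QA_flag differs by characteristic within sample S1.
rows = [
    {"ActivityIdentifier": "S1", "CharacteristicName": "Temperature",
     "Result": 21.5, "QA_flag": None},
    {"ActivityIdentifier": "S1", "CharacteristicName": "Phosphorus",
     "Result": 0.03, "QA_flag": "example QA note"},
]

for row in split_col(rows):
    print(row)
```

After the split, `QA_Temperature` and `QA_Phosphorus` can no longer hold conflicting values for the same sample, so the rows can later be collapsed to one per sample; a column that is not needed can simply be dropped instead.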

```python
from harmonize_wq import wrangle
@@ -108,8 +112,11 @@ QA_Temperature | QA | NA | harmonization processing quality issues

## Issue Tracker
harmonize_wq is under development. Please report any bugs and enhancement ideas using the issue tracker:

https://github.com/USEPA/harmonize-wq/issues


## Disclaimer
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use.
EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.
The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
21 changes: 11 additions & 10 deletions contributing.rst
@@ -3,13 +3,16 @@
Contributing to harmonize_wq
============================

We’re so glad you’re thinking about contributing to an EPA open source project! If you’re unsure about anything, just ask — or submit your issue or pull request anyway. The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.
We’re so glad you’re thinking about contributing to an EPA open source project!
If you’re unsure about anything, just ask — or submit your issue or pull request anyway.
The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.

We encourage you to read this project’s CONTRIBUTING policy (you are here), its
`LICENSE <https://github.com/USEPA/harmonize-wq/blob/81b172afc3b72bec0a9f5624bade59eb2527510f/LICENSE>`_,
and its `README <https://github.com/USEPA/harmonize-wq/blob/main/README.md>`_.

All contributions to this project will be released under the MIT dedication. By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.
All contributions to this project will be released under the MIT dedication.
By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.

harmonize_wq uses:

@@ -34,20 +37,18 @@ To contribute fixes, code, tests, or documentation, fork harmonize_wq in GitHub_
and submit the changes using a pull request against the **main** branch.

- If you are submitting new code, add tests (see below) and documentation.
- Write "Closes #<bug number>" in the PR description or a comment, as described in the
`GitHub docs`_.
- Write "Closes #<bug number>" in the PR description or a comment, as described in the `GitHub docs`_.
- Check tests and resolve any issues.

In any case, feel free to use the `issue tracker`_ to discuss ideas for new features or improvements.

Notice that we will not merge a PR if tests are failing. In certain cases tests pass in your
machine but not in GitHub actions. There might be multiple reasons for this but these are some of
the most common:
Notice that we will not merge a PR if tests are failing.
In certain cases tests pass on your machine but not in GitHub Actions.
There might be multiple reasons for this, but these are some of the most common:

- Your new code does not work for other operating systems or Python versions.
- The documentation is not being built properly or the examples in the docs are
not working.
- The documentation is not being built properly or the examples in the docs are not working.


.. _`issue tracker`: https://github.com/USEPA/harmonize-wq/issues
.. _`GitHub docs`: https://help.github.com/articles/closing-issues-via-commit-messages/
5 changes: 3 additions & 2 deletions docs/source/example workflow.rst
@@ -55,7 +55,8 @@ Clean results

Transform results from long to wide format
******************************************
There are many columns in the :class:`pandas.DataFrame` that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
There are many columns in the :class:`pandas.DataFrame` that are characteristic specific, that is, they have different values for the same sample depending on the characteristic.
To ensure one result for each sample after the data are transformed, these columns must either be split, generating a new column with values for each characteristic, or moved out of the table if not being used.
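A stdlib-only sketch of the long-to-wide step itself, assuming the characteristic-specific columns have already been split or dropped. The helper `long_to_wide` and the column names are hypothetical, not the `harmonize_wq.wrangle` API. It raises an error when a remaining column still varies within one sample, which is exactly the ambiguity the paragraph above describes.

```python
# Hypothetical sketch: not the harmonize_wq.wrangle implementation.
def long_to_wide(rows, id_col="ActivityIdentifier",
                 char_col="CharacteristicName", value_col="Result"):
    """Collapse long rows (one per sample and characteristic) to one row per sample."""
    wide = {}
    for row in rows:
        sample = wide.setdefault(row[id_col], {id_col: row[id_col]})
        # Each characteristic becomes its own result column.
        sample[row[char_col]] = row[value_col]
        for key, value in row.items():
            if key in (id_col, char_col, value_col):
                continue
            # A column that differs between rows of the same sample is
            # characteristic specific and would make the wide row ambiguous.
            if key in sample and sample[key] != value:
                raise ValueError(f"{key!r} varies within sample {row[id_col]!r}; "
                                 "split or drop it before transforming")
            sample[key] = value
    return list(wide.values())


# Made-up records; MonitoringLocationIdentifier is the same for every row of a sample.
rows = [
    {"ActivityIdentifier": "S1", "CharacteristicName": "Temperature",
     "Result": 21.5, "MonitoringLocationIdentifier": "SITE-A"},
    {"ActivityIdentifier": "S1", "CharacteristicName": "Phosphorus",
     "Result": 0.03, "MonitoringLocationIdentifier": "SITE-A"},
    {"ActivityIdentifier": "S2", "CharacteristicName": "Temperature",
     "Result": 19.0, "MonitoringLocationIdentifier": "SITE-B"},
]

for row in long_to_wide(rows):
    print(row)
```

With pandas, the same reshaping is usually done with `DataFrame.pivot` or `DataFrame.pivot_table`; the dict-based version only keeps the sketch dependency-free.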

.. code-block:: python3

@@ -105,4 +106,4 @@ The number of columns in the resulting table is greatly reduced:
|QA_Temperature | QA |NA |Harmonization quality issues |
+----------------------------+-------------+----------------------------------------+-------------------------------+

For more complete tutorial information, see: `demos <https://github.com/USEPA/harmonize-wq/tree/main/demos>`_
9 changes: 7 additions & 2 deletions docs/source/index.rst
@@ -16,7 +16,9 @@ Standardize, clean, and wrangle Water Quality Portal data into more analytic-rea
Overview
========

US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_. Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_.
Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it into a more analytic-ready format.
Recognizing that the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible, water-quality-specific framework to help:

* Identify differences in data units (including speciation and basis)
* Identify differences in sampling or analytic methods
@@ -70,4 +72,7 @@ Indices and tables

Disclaimer
==========
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use.
EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information.
Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA.
The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
5 changes: 2 additions & 3 deletions harmonize_wq/basis.py
@@ -2,8 +2,7 @@
"""Functions to process characteristic basis or return basis dictionary."""
from warnings import warn
from numpy import nan
from harmonize_wq import harmonize

from harmonize_wq.clean import add_qa_flag

def unit_basis_dict(out_col):
"""Characteristic specific basis dictionary to define basis from units.
@@ -169,7 +168,7 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')
if old_basis != base:
qa_mask = mask & (df[basis_col] == old_basis)
warn(f'Mismatched {flag}', UserWarning)
df = harmonize.add_qa_flag(df, qa_mask, flag)
df = add_qa_flag(df, qa_mask, flag)
# Add/update basis from unit
df = set_basis(df, mask, base, basis_col)
df[unit_col] = [new_unit if x == old_unit else x
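The `basis_from_unit` hunk moves basis information out of unit strings and flags rows whose existing basis conflicts with it, now calling `add_qa_flag` imported from `harmonize_wq.clean`. The stdlib-only sketch below mirrors that flow on lists of dicts. It assumes a `basis_dict` shaped as `{basis: {old_unit: new_unit}}`; both helper bodies, the flag wording and the example units are simplified assumptions, not the package's pandas implementation.

```python
from warnings import warn


# Hypothetical stand-in for harmonize_wq.clean.add_qa_flag (simplified assumption).
def add_qa_flag(rows, mask, flag):
    """Return copies of rows with `flag` appended to QA_flag where mask is True."""
    flagged = []
    for row, hit in zip(rows, mask):
        row = dict(row)
        if hit:
            existing = row.get("QA_flag")
            # Keep earlier flags; join multiple flags with "; ".
            row["QA_flag"] = flag if existing is None else f"{existing}; {flag}"
        flagged.append(row)
    return flagged


# Hypothetical, simplified version of basis_from_unit for lists of dicts.
def basis_from_unit(rows, basis_dict, unit_col="Units", basis_col="Speciation"):
    rows = [dict(row) for row in rows]  # do not modify the caller's records
    for base, unit_map in basis_dict.items():
        for old_unit, new_unit in unit_map.items():
            mask = [row.get(unit_col) == old_unit for row in rows]
            # Rows whose existing basis disagrees with the basis implied by the unit.
            qa_mask = [hit and row.get(basis_col) not in (None, base)
                       for row, hit in zip(rows, mask)]
            if any(qa_mask):
                flag = f"{basis_col}: basis from {unit_col} ({base}) replaced a different basis"
                warn(f"Mismatched {flag}", UserWarning)
                rows = add_qa_flag(rows, qa_mask, flag)
            # Add/update basis from the unit, then strip it from the unit string.
            for row, hit in zip(rows, mask):
                if hit:
                    row[basis_col] = base
                    row[unit_col] = new_unit
    return rows


basis_dict = {"as P": {"mg/l as P": "mg/l"}, "as PO4": {"mg/l as PO4": "mg/l"}}
records = [
    {"Units": "mg/l as P", "Speciation": None},
    {"Units": "mg/l as PO4", "Speciation": "as P"},  # conflicting basis
    {"Units": "mg/l", "Speciation": None},
]
for row in basis_from_unit(records, basis_dict):
    print(row)
```

Importing `add_qa_flag` directly from `harmonize_wq.clean`, as the diff does, means `basis.py` only pulls in the helper it uses instead of the whole `harmonize` module.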