Py open sci review #61

Merged
merged 35 commits into from
Jul 2, 2024
35 commits
4850fa0
Update README.md
jbousquin Apr 1, 2024
c9df449
Update README.md
jbousquin Apr 1, 2024
4d79ff5
Update README.md
jbousquin Apr 1, 2024
8e9efd8
Update contributing.rst
jbousquin Apr 1, 2024
f86bc05
Update contributing.rst
jbousquin Apr 1, 2024
967fb0f
Update example workflow.rst
jbousquin Apr 1, 2024
63d9ae5
Update index.rst
jbousquin Apr 1, 2024
8ea1eaa
wet_dry_drop() has become outside the normal workflow, what was being…
jbousquin Apr 1, 2024
275b6ae
from harmonize -> clean: df_checks(). add_qa_flag()
jbousquin Apr 1, 2024
c075dbb
convert_unit_series() moved harmonize -> convert
jbousquin Apr 1, 2024
2dd1bb5
Import specific functions instead of module
jbousquin Apr 1, 2024
5577ed7
Fix docs examples
jbousquin Apr 10, 2024
2309eff
Module needs to be imported for example
jbousquin Apr 10, 2024
65e59a1
'Filter/sieve residue' & 'Yield' now included in this domain. This is…
jbousquin Apr 10, 2024
47b244a
Add development environment setup to contributing.rst:
jbousquin Apr 11, 2024
1269896
Add dependency dependencies that are used directly
jbousquin Apr 11, 2024
75cb733
remove try/except
jbousquin Apr 11, 2024
01cae6e
specify exception expected in re_case.
jbousquin Apr 11, 2024
d7028ab
Revert to merge from main
jbousquin Apr 11, 2024
851be91
Py open sci review (#56) (#59)
jbousquin Apr 11, 2024
ffdabd2
Merge pull request #60 from USEPA/main
jbousquin Apr 11, 2024
7314276
Un-revert added line
jbousquin Apr 11, 2024
39d72b2
Merge branch 'pyOpenSci-review' of https://github.com/USEPA/harmonize…
jbousquin Apr 11, 2024
d8125c4
suggestion: replace x in list(set(pandas_series)) pattern w/ .unique …
jbousquin Apr 11, 2024
3544328
suggestion: replace one dict functions w/ module-level dicts (domains…
jbousquin Apr 11, 2024
20922e1
suggestion: replace one dict functions w/ module-level dicts (xy_datu…
jbousquin Apr 11, 2024
7962e1c
Updated implementations of the dicts vs functions.
jbousquin Apr 11, 2024
b60ecf0
assign w/ np.where() for masked replace
jbousquin Apr 11, 2024
3729baa
replace x in list(set(pandas_series)) pattern w/ .unique method
jbousquin Apr 11, 2024
7751b1b
replace one dict functions w/ module-level dicts (unit_basis_dict(out…
jbousquin Apr 11, 2024
6f933f5
Implemented function -> dict changes
jbousquin Apr 11, 2024
67f6c12
Reverted this line - somehow it was impacting the dataretrieval get f…
jbousquin Apr 11, 2024
e398bb9
Update basis.py
jbousquin Apr 12, 2024
3c63d92
np.where will coerce x & y to the array dtype. A problem here as it c…
jbousquin Apr 26, 2024
55cb3cd
Update basis.py
jbousquin Apr 30, 2024
11 changes: 11 additions & 0 deletions contributing.rst
@@ -38,6 +38,7 @@ and submit the changes using a pull request against the **main** branch.

- If you are submitting new code, add tests (see below) and documentation.
- Write "Closes #<bug number>" in the PR description or a comment, as described in the `GitHub docs`_.
- Classes, methods, functions, etc. should have docstrings.
- Check tests and resolve any issues.

In any case, feel free to use the `issue tracker`_ to discuss ideas for new features or improvements.
@@ -49,6 +50,16 @@ There might be multiple reasons for this but these are some of the most common:
- Your new code does not work for other operating systems or Python versions.
- The documentation is not being built properly or the examples in the docs are not working.

Development environment setup
-----------------------------

- pip install the latest development version of the package from `GitHub <https://github.com/USEPA/harmonize-wq>`_
- Install the requirements for the development environment by pip installing the additional requirements-dev.txt file.

- Docs are built using Sphinx.
- Tests are run using pytest.

There are workflows using GitHub actions for both docs and tests to help avoid 'it worked on my machine' type development issues.

.. _`issue tracker`: https://github.com/USEPA/harmonize-wq/issues
.. _`GitHub docs`: https://help.github.com/articles/closing-issues-via-commit-messages/
6 changes: 1 addition & 5 deletions harmonize_wq/__init__.py
@@ -1,9 +1,5 @@
from harmonize_wq import harmonize

try:
from importlib.metadata import version, PackageNotFoundError
except ImportError:
from importlib_metadata import version, PackageNotFoundError
from importlib.metadata import version, PackageNotFoundError

try:
__version__ = version('harmonize_wq')
Expand Down
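The hunk above drops the `importlib_metadata` backport fallback and imports `version` and `PackageNotFoundError` straight from the standard library (available since Python 3.8). The diff view cuts off what happens after `__version__ = version('harmonize_wq')`, so the following is only a minimal sketch of the general pattern; the `get_version` helper and its `"unknown"` fallback are hypothetical and not taken from the package.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name, default="unknown"):
    """Hypothetical helper: installed version of dist_name, or a default."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution has this name,
        # e.g. when running from a source checkout.
        return default


print(get_version("harmonize_wq"))
print(get_version("a-distribution-that-is-not-installed-xyz"))
```

Catching only `PackageNotFoundError` keeps unrelated import problems visible instead of hiding them behind a broad `except`.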
211 changes: 101 additions & 110 deletions harmonize_wq/basis.py
@@ -1,97 +1,89 @@
# -*- coding: utf-8 -*-
"""Functions to process characteristic basis or return basis dictionary."""

import numpy
from warnings import warn
from numpy import nan
from harmonize_wq.clean import add_qa_flag

def unit_basis_dict(out_col):
"""Characteristic specific basis dictionary to define basis from units.

The out_col is often derived from :attr:`WQCharData.char_val`. The desired
basis can be used as a key to subset result.

Parameters
----------
out_col : str
Column name where results are written.

Returns
-------
dict
Dictionary with logic for determining basis from units string and
standard :mod:`pint` units to replace those with.
The structure is {Basis: {standard units: [unit strings with basis]}}.

Examples
--------
Get dictionary for Phosphorus and subset for 'as P':

>>> from harmonize_wq import basis
>>> basis.unit_basis_dict('Phosphorus')['as P']
{'mg/l': ['mg/l as P', 'mg/l P'], 'mg/kg': ['mg/kg as P', 'mg/kg P']}
"""
dictionary = {'Phosphorus': {'as P': {'mg/l': ['mg/l as P', 'mg/l P'],
'mg/kg': ['mg/kg as P', 'mg/kg P']},
'as PO4': {'mg/l': ['mg/l as PO4',
'mg/l PO4'],
'mg/kg': ['mg/kg as PO4',
'mg/kg PO4']}},
'Nitrogen': {'as N': {'mg/l': ['mg/l as N', 'mg/l N']}},
'Carbon': {},
}
return dictionary[out_col]


def basis_conversion():
"""Get dictionary of conversion factors to convert basis/speciation.

For example, this is used to convert 'as PO4' to 'as P'.

Returns
-------
dict
Dictionary with structure {basis: conversion factor}

See Also
--------
:func:`convert.moles_to_mass`
"""Characteristic specific basis dictionary to define basis from units.

The out_col is often derived from :attr:`WQCharData.char_val`. The desired
basis can be used as a key to subset result.

Parameters
----------
out_col : str
Column name where results are written.

Returns
-------
dict
Dictionary with logic for determining basis from units string and
standard :mod:`pint` units to replace those with.
The structure is {Basis: {standard units: [unit strings with basis]}}.

Examples
--------
Get dictionary for Phosphorus and subset for 'as P':

>>> from harmonize_wq import basis
>>> basis.unit_basis_dict['Phosphorus']['as P']
{'mg/l': ['mg/l as P', 'mg/l P'], 'mg/kg': ['mg/kg as P', 'mg/kg P']}
"""
unit_basis_dict = {
"Phosphorus": {
"as P": {"mg/l": ["mg/l as P", "mg/l P"], "mg/kg": ["mg/kg as P", "mg/kg P"]},
"as PO4": {
"mg/l": ["mg/l as PO4", "mg/l PO4"],
"mg/kg": ["mg/kg as PO4", "mg/kg PO4"],
},
},
"Nitrogen": {"as N": {"mg/l": ["mg/l as N", "mg/l N"]}},
"Carbon": {},
}

"""basis.basis_conversion: Get dictionary of conversion factors to convert basis/speciation.

For example, this is used to convert 'as PO4' to 'as P'.

Returns
-------
dict
Dictionary with structure {basis: conversion factor}

See Also
--------
:func:`convert.moles_to_mass`

`Best Practices for Submitting Nutrient Data to the Water Quality eXchange
<www.epa.gov/sites/default/files/2017-06/documents/wqx_nutrient_best_practices_guide.pdf>`_
"""
basis_conversion = {
"NH3": 0.822,
"NH4": 0.776,
"NO2": 0.304,
"NO3": 0.225,
"PO4": 0.326,
}
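To make the `{basis: conversion factor}` structure concrete, here is a small sketch that assumes each factor is applied by multiplying a mass concentration reported "as <species>" to express it on the elemental basis (for example PO4 to P). The `to_elemental_basis` helper is hypothetical; the package's own conversion code is not shown in this diff.

```python
# Factors copied from the basis_conversion dict in this diff.
basis_conversion = {
    "NH3": 0.822,
    "NH4": 0.776,
    "NO2": 0.304,
    "NO3": 0.225,
    "PO4": 0.326,
}


def to_elemental_basis(value, species):
    """Hypothetical helper: scale a value reported 'as <species>'.

    Assumes the factor is a simple multiplier, e.g. mg/l as PO4 -> mg/l as P.
    """
    return value * basis_conversion[species]


# 1.0 mg/l as PO4 expressed as P, under the multiplier assumption.
print(to_elemental_basis(1.0, "PO4"))
```

With a module-level dict the factor lookup is plain indexing; an unknown species raises `KeyError` rather than returning a silent default.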

"""basis.stp_dict: Get standard temperature and pressure to define basis from units.

Notes
-----
This needs to be updated to include pressure or needs to be renamed.

`Best Practices for Submitting Nutrient Data to the Water Quality eXchange
<www.epa.gov/sites/default/files/2017-06/documents/wqx_nutrient_best_practices_guide.pdf>`_
"""
return {'NH3': 0.822,
'NH4': 0.776,
'NO2': 0.304,
'NO3': 0.225,
'PO4': 0.326}
Returns
-------
dict
Dictionary with {'standard temp' : {'units': [values to replace]}}.
"""
stp_dict = {"@25C": {"mg/mL": ["mg/mL @25C"]}}


def stp_dict():
"""Get standard temperature and pressure to define basis from units.

Notes
-----
This needs to be updated to include pressure or needs to be renamed.

Returns
-------
dict
Dictionary with {'standard temp' : {'units': [values to replace]}}.

Examples
--------
Get dictionary for taking temperature basis out of units:

>>> from harmonize_wq import basis
>>> basis.stp_dict()
{'@25C': {'mg/mL': ['mg/mL @25C']}}
"""
return {'@25C': {'mg/mL': ['mg/mL @25C']}}
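The change from one-dict functions to module-level dicts mostly affects call sites: `basis.unit_basis_dict('Phosphorus')` becomes `basis.unit_basis_dict['Phosphorus']`. The sketch below copies the dict from this diff to show the new access style. The final lines illustrate a general caveat of module-level dicts (shared mutable state) and are an aside rather than anything this PR does; the `'as C'` entry is hypothetical.

```python
import copy

# Module-level constant, copied from the new unit_basis_dict in this diff.
unit_basis_dict = {
    "Phosphorus": {
        "as P": {"mg/l": ["mg/l as P", "mg/l P"], "mg/kg": ["mg/kg as P", "mg/kg P"]},
        "as PO4": {
            "mg/l": ["mg/l as PO4", "mg/l PO4"],
            "mg/kg": ["mg/kg as PO4", "mg/kg PO4"],
        },
    },
    "Nitrogen": {"as N": {"mg/l": ["mg/l as N", "mg/l N"]}},
    "Carbon": {},
}

# Old call style: unit_basis_dict('Phosphorus'); new style is plain indexing.
phosphorus_as_p = unit_basis_dict["Phosphorus"]["as P"]
print(phosphorus_as_p)

# The dict is shared by every importer, so copy it before modifying.
local_copy = copy.deepcopy(unit_basis_dict)
local_copy["Carbon"]["as C"] = {"mg/l": ["mg/l as C"]}  # hypothetical entry
```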


def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation'):
def basis_from_unit(df_in, basis_dict, unit_col="Units", basis_col="Speciation"):
"""Move basis from units to basis column in :class:`pandas.DataFrame`.

Move basis information from units in unit_col column to basis in basis_col
column based on basis_dict. If basis_col does not exist in df_in it will be
created. The unit_col column is updated in place. To maintain data
@@ -119,7 +111,7 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')
Examples
--------
Build pandas DataFrame for example:

>>> from pandas import DataFrame
>>> df = DataFrame({'CharacteristicName': ['Phosphorus', 'Phosphorus',],
... 'ResultMeasure/MeasureUnitCode': ['mg/l as P', 'mg/kg as P'],
@@ -131,16 +123,16 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')
1 Phosphorus mg/kg as P mg/kg as P

>>> from harmonize_wq import basis
>>> basis_dict = basis.unit_basis_dict('Phosphorus')
>>> basis_dict = basis.unit_basis_dict['Phosphorus']
>>> unit_col = 'Units'
>>> basis.basis_from_unit(df, basis_dict, unit_col)
CharacteristicName ResultMeasure/MeasureUnitCode Units Speciation
0 Phosphorus mg/l as P mg/l as P
1 Phosphorus mg/kg as P mg/kg as P
If an existing basis_col value is different, a warning is issued when it is

If an existing basis_col value is different, a warning is issued when it is
updated and a QA_flag is assigned:

>>> from numpy import nan
>>> df['Speciation'] = [nan, 'as PO4']
>>> df_speciation_change = basis.basis_from_unit(df, basis_dict, unit_col)
@@ -161,7 +153,7 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')
# Add flags anywhere the values are updated
flag1 = f'{basis_col}: updated from '
# List of unique basis values
basis_list = list(set(df.loc[mask, basis_col].dropna()))
basis_list = df.loc[mask, basis_col].dropna().unique()
# Loop over existing values in basis field
for old_basis in basis_list:
flag = f'{flag1}{old_basis} to {base} (units)'
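For reviewers comparing the two lines in this hunk, here is a small standalone sketch of `list(set(series))` versus `Series.unique()`, using made-up speciation values. Both give the distinct non-null values; `unique()` skips the round trip through a Python set and keeps values in order of first appearance, whereas set ordering is arbitrary.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for df.loc[mask, basis_col].
speciation = pd.Series(["as PO4", np.nan, "as P", "as PO4"])

# Old pattern: build a set, then a list (order is not guaranteed).
via_set = list(set(speciation.dropna()))

# New pattern: array-like of distinct values, in order of first appearance.
via_unique = speciation.dropna().unique()

for old_basis in via_unique:
    print(old_basis)
```

Since the surrounding code only loops over the result, the array-like returned by `unique()` is a drop-in replacement for the list.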
@@ -178,7 +170,7 @@ def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation')

def basis_from_method_spec(df_in):
"""Copy speciation from MethodSpecificationName to new 'Speciation' column.

Parameters
----------
df_in : pandas.DataFrame
@@ -192,7 +184,7 @@ def basis_from_method_spec(df_in):
Examples
--------
Build pandas DataFrame for example:

>>> from pandas import DataFrame
>>> from numpy import nan
>>> df = DataFrame({'CharacteristicName': ['Phosphorus', 'Phosphorus',],
@@ -204,7 +196,7 @@ def basis_from_method_spec(df_in):
0 Phosphorus as P NWIS
1 Phosphorus NaN NWIS

>>> from harmonize_wq import basis
>>> from harmonize_wq import basis
>>> basis.basis_from_method_spec(df)
CharacteristicName MethodSpecificationName ProviderName Speciation
0 Phosphorus as P NWIS as P
@@ -221,23 +213,23 @@ def basis_from_method_spec(df_in):
mask = df[old_col] == base
df = set_basis(df, mask, base)
# Remove basis from MethodSpecificationName
#TODO: why update old field?
#df[old_col] = [nan if x == base else x for x in df[old_col]]
# TODO: why update old field?
# df[old_col] = [nan if x == base else x for x in df[old_col]]
# Test we didn't miss any methodSpec
#assert set(df[old_col].dropna()) == set(), (set(df[old_col].dropna()))
# assert set(df[old_col].dropna()) == set(), (set(df[old_col].dropna()))

return df


def update_result_basis(df_in, basis_col, unit_col):
"""Move basis from unit_col column to basis_col column.

This is usually used in place of basis_from_unit when the basis_col is not
'ResultMeasure/MeasureUnitCode' (i.e., not speciation).

Notes
-----
Rather than creating many new empty columns this function currently overwrites the original
Rather than creating many new empty columns this function currently overwrites the original
basis_col values. The original values are noted in the QA_flag.

Parameters
@@ -258,7 +250,7 @@ def update_result_basis(df_in, basis_col, unit_col):
Examples
--------
Build pandas DataFrame for example:

>>> from pandas import DataFrame
>>> from numpy import nan
>>> df = DataFrame({'CharacteristicName': ['Salinity', 'Salinity',],
@@ -269,8 +261,8 @@ def update_result_basis(df_in, basis_col, unit_col):
CharacteristicName ResultTemperatureBasisText Units
0 Salinity 25 deg C mg/mL @25C
1 Salinity NaN mg/mL @25C
>>> from harmonize_wq import basis

>>> from harmonize_wq import basis
>>> df_temp_basis = basis.update_result_basis(df,
... 'ResultTemperatureBasisText',
... 'Units')
@@ -290,7 +282,7 @@ def update_result_basis(df_in, basis_col, unit_col):

# Basis from unit
if basis_col == 'ResultTemperatureBasisText':
df_out = basis_from_unit(df_in.copy(), stp_dict(), unit_col, basis_col)
df_out = basis_from_unit(df_in.copy(), stp_dict, unit_col, basis_col)
# NOTE: in the test case 25 deg C -> @25C
elif basis_col == 'ResultParticleSizeBasisText':
# NOTE: These are normally 'less than x mm', no errors so far to fix
@@ -328,7 +320,7 @@ def set_basis(df_in, mask, basis, basis_col='Speciation'):
Examples
--------
Build pandas DataFrame for example:

>>> from pandas import DataFrame
>>> df = DataFrame({'CharacteristicName': ['Phosphorus',
... 'Phosphorus',
@@ -339,12 +331,12 @@ def set_basis(df_in, mask, basis, basis_col='Speciation'):
CharacteristicName MethodSpecificationName
0 Phosphorus as P
1 Phosphorus as PO4
2 Salinity
2 Salinity

Build mask for example:

>>> mask = df['CharacteristicName']=='Phosphorus'

>>> from harmonize_wq import basis
>>> basis.set_basis(df, mask, basis='as P')
CharacteristicName MethodSpecificationName Speciation
@@ -353,9 +345,8 @@ def set_basis(df_in, mask, basis, basis_col='Speciation'):
2 Salinity NaN
"""
df_out = df_in.copy()
# Add Basis column if it doesn't exist
if basis_col not in df_out.columns:
df_out[basis_col] = nan
# Populate Basis column where expected value with basis
df_out[basis_col] = numpy.nan
# Otherwise don't mess with existing values that are not part of mask
df_out.loc[mask, basis_col] = basis
return df_out
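One of the later commit messages notes that `np.where` coerces `x` and `y` to a common array dtype; the message is truncated, so the exact failure it refers to is not shown here. The sketch below is an illustration under that assumption, on made-up data: mixing a string and `NaN` in `np.where` yields a string array in which the missing value becomes the text `'nan'`, whereas a masked `.loc` assignment like the one `set_basis` uses keeps a real missing value. The object-dtype column is a choice made for this example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"CharacteristicName": ["Phosphorus", "Salinity"]})
mask = df["CharacteristicName"] == "Phosphorus"

# np.where promotes both branches to one dtype: NaN becomes the string 'nan'.
coerced = np.where(mask.to_numpy(), "as P", np.nan)
print(coerced.dtype, coerced)

# Masked assignment: start from an all-missing object column (example choice),
# then fill only the rows selected by the mask.
df["Speciation"] = pd.Series(np.nan, index=df.index, dtype=object)
df.loc[mask, "Speciation"] = "as P"
print(df)
```

The `.loc` form also leaves values outside the mask untouched, which matches the comment in `set_basis` about not modifying existing values.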
2 changes: 1 addition & 1 deletion harmonize_wq/clean.py
@@ -221,7 +221,7 @@ def methods_check(df_in, char_val, methods=None):

"""
if methods is None:
methods = accepted_methods()
methods = accepted_methods
method_col = 'ResultAnalyticalMethod/MethodIdentifier'
df2 = df_in.copy()
# TODO: check df for method_col