Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First crack at changing the url for one function #152

Merged
merged 20 commits into from
Sep 17, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 26 additions & 9 deletions dataretrieval/wqp.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
from .utils import BaseMetadata, query


def get_results(ssl_check=True, **kwargs):
def get_results(ssl_check=True, legacy=False, **kwargs):
"""Query the WQP for results.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the documentation here and in the other what_* functions could be clearer. For example, what are the "results" that one would expect to obtain? Or what does it mean to search for sites/projects/organizations/activities/etc. "within a region with specific data?" For what purpose would someone use these functions? A simple description of these WQP endpoints would be helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the definitions of the functions are a bit murky. We can use dataRetrieval as a good example: https://doi-usgs.github.io/dataRetrieval/reference/wqpSpecials.html, though also quite light on specifics.


Any WQP API parameter can be passed as a keyword argument to this function.
Expand All @@ -30,6 +30,12 @@ def get_results(ssl_check=True, **kwargs):
ssl_check: bool
If True, check the SSL certificate. Default is True. If False, SSL
certificate is not checked.
legacy: Boolean
Defaults to False and returns the new WQX3 Result profile.
If set to True, returns data using the legacy WQX version 2 profiles.
dataProfile: string
Describes the type of columns to return with the result dataset.
Includes 'fullPhysChem', 'biological', 'narrow', and 'basicPhysChem'.
siteid: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these parameters be added to the documentation of the other WQP functions? Seems like it would be annoying to look at the get_results documentation to know how to use the what_* functions. Plus, the claim in the other functions that they "accept the same parameters" as get_results is not true; for example, none of them need the dataProfile argument. Copying + pasting the relevant parameters to the other functions' doc strings is easy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye. I'm not sure the convention with kwargs documentation, but I can look around (see bottom of this Sphinx doc page: https://pythonhosted.org/an_example_pypi_project/sphinx.html#full-code-exampl). @thodson-usgs, what do you think? I appreciate linking to the webservices guide, which means we don't need to be checking our documentation against the WQP's, but for the uninitiated, wading through the webservices guide (which covers building urls but isn't exactly specific to using a python function) might be a little confusing. Again, I think I'm going to add this as an issue and let this PR only mess with the wqx3 machinery stuff.

Concatenate an agency code, a hyphen ("-"), and a site-identification
number.
Expand Down Expand Up @@ -63,10 +69,7 @@ def get_results(ssl_check=True, **kwargs):
for available characteristic names)
mimeType: string
String specifying the output format which is 'csv' by default but can
be 'geojson'
zip: string
Parameter to stream compressed data, if 'yes', or uncompressed data
if 'no'. Default is 'no'.
be 'geojson'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GeoJSON is not yet supported, so this documentation should reflect that mimeType will automatically be changed to csv.

On a related note, is there a reason why the other possible WQP file outputs (tsv, xlsx) aren't supported here? I tested out passing mimeType="tsv" and "xlsx" to the query function and it returned the expected output. Seems like it would be a simple addition to expand the functionality.

Copy link
Collaborator Author

@ehinman ehinman Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Legacy supports tsv and xlsx, but it looks like beta does not yet support anything other than csv. I might add this as an issue to follow up on, but leave as-is for this PR. #162


Returns
-------
Expand All @@ -91,7 +94,19 @@ def get_results(ssl_check=True, **kwargs):
_warn_v3_profiles_outage()

kwargs = _alter_kwargs(kwargs)
response = query(wqp_url('Result'), kwargs, delimiter=';', ssl_check=ssl_check)

if legacy is True:
warnings.warn('Legacy profiles return stale USGS data as of '
'March 2024. Please set legacy=True to use the WQX3 '
'data profiles and obtain the latest USGS data.')
url = wqp_url('Result')

else:
url = wqx3_url('Result')
if 'dataProfile' not in kwargs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check whether dataProfile is among a list of valid profiles and throw an error if not. Examples in nwis.

kwargs['dataProfile'] = 'basicPhysChem'

response = query(url, kwargs, delimiter=';', ssl_check=ssl_check)

df = pd.read_csv(StringIO(response.text), delimiter=',')
return df, WQP_Metadata(response)
Expand Down Expand Up @@ -472,6 +487,11 @@ def wqp_url(service):
base_url = 'https://www.waterqualitydata.us/data/'
return f'{base_url}{service}/Search?'

def wqx3_url(service):
ehinman marked this conversation as resolved.
Show resolved Hide resolved
"""Construct the WQP URL for a given WQX 3.0 service."""
base_url = 'https://www.waterqualitydata.us/wqx3/'
return f'{base_url}{service}/search?'


class WQP_Metadata(BaseMetadata):
"""Metadata class for WQP service, derived from BaseMetadata.
Expand Down Expand Up @@ -531,9 +551,6 @@ def _alter_kwargs(kwargs):
user so they are aware of which are being hard-set.

"""
if kwargs.get('zip', 'no') == 'yes':
warnings.warn('Compressed data not yet supported, zip set to no.')
kwargs['zip'] = 'no'

if kwargs.get('mimeType', 'csv') == 'geojson':
warnings.warn('GeoJSON not yet supported, mimeType set to csv.')
Expand Down
Loading