This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

Bad flow observations #89

Open
jsadler2 opened this issue Apr 9, 2021 · 10 comments

Comments

@jsadler2
Collaborator

jsadler2 commented Apr 9, 2021

I found some really wonky flow observations in obs_flow_full.csv. There are 108 observations that have ~-14150 as the value:
image

@jsadler2
Collaborator Author

jsadler2 commented Apr 9, 2021

Any ideas why this is happening?

@jsadler2
Collaborator Author

jsadler2 commented Apr 9, 2021

It doesn't seem like these are the values in NWIS 🤔 :
image

@jzwart
Member

jzwart commented Apr 11, 2021

Strange. Looks like the negative discharges might be offset?

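# d is assumed to hold the flow observations (columns include date and discharge_cms)
bad <- dplyr::filter(d, discharge_cms < -1000)
# plot the flagged values through time
plot(bad$discharge_cms ~ as.Date(bad$date), type = 'o')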

image
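
A quick way to size up the problem is to count the flagged rows per site and look at their range. This is a minimal base R sketch, not the pipeline's code: the `site_id` column name and the toy values are assumptions for illustration and are not taken from obs_flow_full.csv.

```r
# Toy stand-in for the flow observations (hypothetical values and column names).
d <- data.frame(
  site_id = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-01", "2020-01-02")),
  discharge_cms = c(12.1, -14150.3, -14149.8, 3.4, 3.6)
)

# Same threshold as the filter above: anything below -1000 cms is implausible.
bad <- d[!is.na(d$discharge_cms) & d$discharge_cms < -1000, ]

# Number of flagged rows per site, and the range of the flagged values.
n_bad_by_site <- table(bad$site_id)
bad_range <- range(bad$discharge_cms)

print(n_bad_by_site)
print(bad_range)
```

If the flagged values all sit in a narrow band at one site, that points to a systematic cause (such as an error code) rather than scattered bad readings.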

@limnoliver
Member

Hey Jeff - trying to track down more info here. Maybe this is an out-of-date file? I can't find where it's being created in the pipeline. The flow files are now generated from the national flow pull and then subset to the DRB here. This is a good reminder (for myself) to periodically clean up the Google Drive associated with the project.

@jsadler2
Collaborator Author

@limnoliver - obs_flow_full.csv is built here, which depends on what you linked to above. I just built obs_flow_drb.rds and I see the same bad data.

@limnoliver
Member

Thanks Jeff! No wonder I couldn't find it in 2_observations. Will investigate!

@limnoliver
Member

limnoliver commented Apr 12, 2021

Okay, issue partially figured out. My first clue was that the site ID was listed twice, which means there were two unique values on that day, and the data were being aggregated in some way (happening here).

Some site-parameter code combos return multiple columns when you retrieve from NWIS. This site, for example, when you pull using dataRetrieval, looks like this:

test <- dataRetrieval::readNWISdv(siteNumbers = '01465500', parameterCd = '00060')

image

...which likely means discharge is being measured at two locations at the site. Usually in the national temperature pipeline pulls, I pick the "best" column by choosing the one with the most data when I have to (e.g., when there is more than one observation for a site-day). My guess is that we didn't handle this in the national flow pipeline, so both columns were being passed through and then averaged. In theory, I think this is okay, except that one of those columns had some -999999.0 values, which I assume is an error code.
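
A back-of-the-envelope check is consistent with this explanation. NWIS parameter 00060 is discharge in cubic feet per second, so if a -999999 value were converted to cubic meters per second and averaged with one valid reading, the result would land near the ~-14150 seen in the file. The sketch below assumes a simple two-value mean and a hypothetical valid flow of 600 cfs; neither is confirmed from the pipeline code.

```r
# Exact conversion: one cubic foot in cubic meters (about 0.0283168).
cfs_to_cms <- 0.3048^3

sentinel_cfs <- -999999  # suspected error code in one NWIS column
valid_cfs <- 600         # hypothetical valid reading from the other column

# Convert both values to cms, then average them (the suspected pipeline behavior).
averaged_cms <- mean(c(sentinel_cfs, valid_cfs) * cfs_to_cms)
print(averaged_cms)  # about -14150
```

Because the sentinel dominates the mean, any realistic valid flow gives a result within a few cms of -14158, which would explain why all of the bad values cluster so tightly.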

The weird part is that these -999999.0 values exist in the national pull data (from 2_observations/in/daily_flow.rds) but I can't recreate them from the above NWIS pull. Maybe they were fixed sometime between the national flow pull (~10 months ago) and now?

@limnoliver
Member

And just confirming, this appears to be what's happening in the flow pipeline - note here the column selection part is commented out, and then col_name is being dropped when data from uv and dv are bound together.
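
A minimal sketch of the column-selection idea described earlier in the thread: treat -999999 as missing, then keep the discharge column with the most non-missing values. The column names below are made up to mimic the wide shape of a multi-column daily-value pull, and treating -999999 as a missing-data code is this thread's working assumption, not documented behavior.

```r
# Toy wide-format daily values with two discharge columns (hypothetical names).
dv <- data.frame(
  Date = as.Date("2020-06-01") + 0:3,
  flow_loc1 = c(500, 510, 505, 498),
  flow_loc2 = c(-999999, -999999, 480, NA)
)

flow_cols <- c("flow_loc1", "flow_loc2")

# Mask the suspected error code so it cannot leak into any later averaging.
for (col in flow_cols) {
  dv[[col]][which(dv[[col]] == -999999)] <- NA
}

# Count usable observations per column and keep the best-populated one.
n_obs <- sapply(dv[flow_cols], function(x) sum(!is.na(x)))
best_col <- names(n_obs)[which.max(n_obs)]

flow <- data.frame(Date = dv$Date, discharge_cfs = dv[[best_col]])
print(best_col)
```

Masking before counting matters here: otherwise a column full of error codes would look as well populated as a column of real measurements.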

@jsadler2
Collaborator Author

> The weird part is that these -999999.0 values exist in the national pull data (from 2_observations/in/daily_flow.rds) but I can't recreate them from the above NWIS pull. Maybe they were fixed sometime between the national flow pull (~10 months ago) and now?

That is weird. It's kind of comforting that those values aren't in NWIS now, but also not, since now it's a phantom problem.

@aappling-usgs
Member

For my postdoc on metabolism estimation, we re-pulled input data from NWIS about a year after the initial pull and saw groups of sites where whole sections of data changed - one change I remember seemed to have to do with correcting a timezone issue, and I think there were also cases where data that had initially been available but weird were taken off NWIS entirely. So I'm not surprised that there might be similar cases in the discharge data for our current projects.
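
One way to make this kind of drift visible is to join an archived pull to a fresh one and list the site-days whose values changed or disappeared. This is a minimal base R sketch with toy data; the data frames, site ID, and column names are all hypothetical.

```r
# Toy archived and fresh pulls (hypothetical site and values).
old_pull <- data.frame(
  site_id = "site_1",
  date = as.Date("2019-03-01") + 0:2,
  flow = c(-999999, 450, 460)
)
new_pull <- data.frame(
  site_id = "site_1",
  date = as.Date("2019-03-01") + 1:2,
  flow = c(450, 470)
)

# Full outer join on site-day so rows missing from either pull are kept.
both <- merge(old_pull, new_pull, by = c("site_id", "date"),
              all = TRUE, suffixes = c("_old", "_new"))

# Changed: present in both pulls with different values.
changed <- both[!is.na(both$flow_old) & !is.na(both$flow_new) &
                  both$flow_old != both$flow_new, ]
# Dropped: present in the old pull but absent from the new one.
dropped <- both[!is.na(both$flow_old) & is.na(both$flow_new), ]

print(changed)
print(dropped)
```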
