This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

Investigate reaches near reservoirs with multiple sites #95

Open
limnoliver opened this issue Apr 21, 2021 · 3 comments
@limnoliver

limnoliver commented Apr 21, 2021

I noticed that seg_id_nat 1638 had a high RMSE for the RGCN model (RMSE = 6.13) after I updated with the most recent data. This is right below the Neversink. Looks like there are multiple monitoring sites on this reach, and older sites from Ecosheds were further downstream than the new USGS site, which is capturing colder dynamics from the reservoir.

image

We may want to reconsider what data we're keeping, particularly at these reservoir sites where different places along the reach can have really different temperature signals. I don't think this is the reason this site is doing so poorly, though. I think it's doing poorly because the model is clearly not picking up the reservoir influence, and I think the EcoSheds data was added after this model was trained, so the model didn't get a chance to see any of the NYCDEC data:

library(dplyr)
library(ggplot2)

# Model predictions joined to observations
compare <- feather::read_feather('3_predictions/out/compare_predictions_obs.feather')
# Keep reach 1638, rows with RGCN predictions, and the 2006-2007 window
compare <- compare %>%
  filter(seg_id_nat %in% '1638') %>%
  filter(!is.na(rgcn2_full_temp_c)) %>%
  filter(date <= as.Date('2008-01-01') & date >= as.Date('2005-12-31'))
# Line = RGCN predictions; points = observed mean temperatures
ggplot(compare, aes(x = date, y = rgcn2_full_temp_c)) +
  geom_line() +
  geom_point(aes(y = mean_temp_c))

image

@limnoliver changed the title from "Investigate reaches with multiple sites" to "Investigate reaches near reservoirs with multiple sites" Apr 21, 2021

@aappling-usgs

I see your point that the red points still differ a lot from the model predictions, but the blue points still can't be helping the RMSE, right?

The predictions go down to near zero in winter, and the red points seem to be concentrated in the summer - is part of the impressive difference in your first plot due to the fact that the blue points are more year-round?
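One way to check the seasonality question is to compute RMSE separately for each data source, once over all dates and once restricted to the months both sources cover. The sketch below uses base R and a made-up toy table; the column names (`source`, `pred_temp_c`, `obs_temp_c`) and the values are assumptions for illustration, not columns from the project's files.

```r
# Toy data: column names and values are hypothetical, for illustration only.
obs <- data.frame(
  date        = as.Date(c('2006-01-15', '2006-07-15', '2006-07-16', '2006-08-01')),
  source      = c('site_a', 'site_a', 'site_b', 'site_b'),
  pred_temp_c = c(1, 18, 18, 19),
  obs_temp_c  = c(4, 10, 17, 20),
  stringsAsFactors = FALSE
)

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

# RMSE per data source over all dates
rmse_by_source <- sapply(split(obs, obs$source),
                         function(d) rmse(d$pred_temp_c, d$obs_temp_c))

# RMSE per data source restricted to summer (June-August), so that a
# year-round source is not compared against a summer-only one
summer <- obs[as.integer(format(obs$date, '%m')) %in% 6:8, ]
rmse_summer <- sapply(split(summer, summer$source),
                      function(d) rmse(d$pred_temp_c, d$obs_temp_c))
```

If the gap between sources shrinks a lot in the season-matched comparison, seasonal coverage explains part of the difference seen in the plot.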

@jzwart

jzwart commented Apr 22, 2021

Hmm, that's tough. Maybe we could add a separate distance criterion for which observation sites to keep if the segment is directly below a reservoir, like only keeping sites that are within 1000 m of the top of the segment. But then again, we're trying to predict the entire segment's mean temperature, so we can't really throw away sites downstream either.
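A minimal sketch of that distance criterion in base R, assuming a site table that already has a distance-from-segment-top column and a below-reservoir flag. The table, its column names (`dist_from_seg_top_m`, `below_reservoir`), and the values are hypothetical; only the 1000 m threshold comes from the suggestion above.

```r
# Hypothetical site table; these columns are assumed, not from the project.
sites <- data.frame(
  site_id             = c('A', 'B', 'C', 'D'),
  seg_id_nat          = c('1638', '1638', '2000', '2000'),
  dist_from_seg_top_m = c(200, 4500, 300, 5200),
  below_reservoir     = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

max_dist_m <- 1000  # threshold floated in the comment above

# Segments not below a reservoir keep every site; segments directly below
# a reservoir keep only sites near the top of the segment.
keep <- !sites$below_reservoir | sites$dist_from_seg_top_m <= max_dist_m
sites_kept <- sites[keep, ]
```

Here site B is dropped because it sits 4500 m down a below-reservoir segment, while sites on the other segment are untouched regardless of distance.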

I wonder if this is a scenario where satellite temperatures could be useful since they have more spatial coverage - it might help represent the segment's mean temperature rather than training / testing on sites from either end of the segment. Or maybe we could somehow tell the model about where in the segment the data are coming from or add observation error?

@aappling-usgs

I think PRMS seeks to predict temperatures at the downstream point of each reach, so our observed temperatures should actually prefer the downstream end when there are choices (or just accept the noise and average them all anyway).
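The two options can be sketched side by side in base R. The observation table below is made up, and `dist_from_seg_top_m` is an assumed column used as a stand-in for "how far downstream a site is"; the real pipeline may identify site position differently.

```r
# Hypothetical observations: two sites on one reach, one shared date.
obs <- data.frame(
  seg_id_nat          = c('1638', '1638', '1638'),
  date                = as.Date(c('2006-07-15', '2006-07-15', '2006-07-16')),
  dist_from_seg_top_m = c(200, 4500, 200),
  temp_c              = c(10, 17, 11),
  stringsAsFactors = FALSE
)

# Option 1: for each reach-date, keep the downstream-most observation.
ord <- obs[order(obs$seg_id_nat, obs$date, -obs$dist_from_seg_top_m), ]
downstream_only <- ord[!duplicated(ord[c('seg_id_nat', 'date')]), ]

# Option 2: accept the noise and average all sites for each reach-date.
averaged <- aggregate(temp_c ~ seg_id_nat + date, data = obs, FUN = mean)
```

On the shared date, option 1 keeps the downstream value while option 2 returns the mean of the two sites, which shows how much the choice matters when sites on one reach disagree.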

We might want to keep a separate copy of nearest-to-reservoir observations for validation of reservoir model predictions.
