This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

getting unusually bad performance for flow in DRB #99

Closed
jsadler2 opened this issue Apr 20, 2021 · 29 comments

@jsadler2
Collaborator

I have been training the model on the full DRB and have been getting unusually bad performance.

@jsadler2
Collaborator Author

I was getting NSEs of basically 0. When I looked into the training data, I found some bad flow observations (USGS-R/delaware-model-prep#89). When I filtered those out, performance improved to an NSE of 0.38 for training (1985-2006) and 0.42 for validation (2006-2010).

@jsadler2
Collaborator Author

When I look back at the results for the full DRB that I was getting in Fall 2020, the NSEs were around 0.7, so that is what I was expecting.

@jsadler2
Collaborator Author

jsadler2 commented Apr 20, 2021

If I train on the Christina River Basin, I get NSEs around 0.7. So it's comforting, and interesting, that there'd be that difference.

@jzwart
Member

jzwart commented Apr 20, 2021

You're using the same drivers as before?

@jsadler2
Collaborator Author

I think that's probably the first thing to look at. I am not using the same drivers. One of the problems is that in the Fall I wasn't saving the names of the drivers or their mean/std in my prepared data file. I've been doing that for several months now, but I hadn't started at that time.

I know they are different, though, because there are 19 variables in the Fall x_trn array, and for a while now I've only been using 8: x_vars: ["seg_rain", "seg_tave_air", "seginc_swrad", "seg_length", "seginc_potet", "seg_slope", "seg_humid", "seg_elev"]
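For future runs, this is roughly what I mean by saving that metadata with the prepared data (a minimal sketch; the file name, keys, and shapes here are hypothetical stand-ins, not the actual prep script):

```python
import numpy as np

# hypothetical prepared array: shape (n_samples, seq_len, n_vars)
x_vars = ["seg_rain", "seg_tave_air", "seginc_swrad", "seg_length",
          "seginc_potet", "seg_slope", "seg_humid", "seg_elev"]
x_trn = np.random.rand(100, 365, len(x_vars))  # stand-in for real prepared data

# per-variable scaling stats used to normalize the inputs
x_mean = x_trn.mean(axis=(0, 1))
x_std = x_trn.std(axis=(0, 1))

# save the variable names and scaling stats alongside the arrays so a later
# run can confirm exactly which drivers (and what scaling) were used
np.savez_compressed("prepped_data.npz",
                    x_trn=(x_trn - x_mean) / x_std,
                    x_vars=np.array(x_vars),
                    x_mean=x_mean,
                    x_std=x_std)
```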

@jsadler2
Collaborator Author

One thing I tried is plugging in the data from Fall and keeping everything else the same. What I found was this:

new data/new code
[screenshots: training loss and prediction results]

old data/new code
[screenshots: training loss and prediction results]

old data/old code
[screenshots: training loss and prediction results]

So it seems like it boils down to differences in the data, not the code.

@jzwart
Member

jzwart commented Apr 20, 2021

Ah, yeah, the other drivers might have had some more flow influence through the intermediate variables?

@jsadler2
Collaborator Author

Yeah, maybe. But I feel like I would've noticed by now. I guess I should just try with all those drivers and see if that makes the difference.

@jsadler2
Collaborator Author

Good call @jzwart, that was it. Now I'm getting train/val NSEs of 0.88/0.84 😮.

@wdwatkins

wdwatkins commented Apr 20, 2021

Are hidden layer sizes, etc., all the same with the different numbers of inputs?

The initial drop in loss with the new data over the first few epochs seems very unhealthy, like there isn't much signal to fit to. There must be something important in those variables?

@jsadler2
Collaborator Author

Yeah, that's the next question :) Why did that make such a big difference? Which of those variables is causing such a big increase in performance? Is the model just taking advantage of those attributes to somehow make individualized models for each reach?

@jsadler2
Collaborator Author

The pretraining loss can get super low because there is a direct relationship between "seg_outflow" (what we are predicting) and "seg_width" in PRMS:
[screenshot of the PRMS relationship between seg_width and seg_outflow]

@jsadler2
Collaborator Author

So once the model finds that, it basically means the error is zero for pretraining. For finetuning, I think the model is using the segment-specific attributes like slope to tailor parts of the parameter space to each segment. That's my guess.
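To illustrate the pretraining side (a toy example with a made-up power-law width/discharge relation, not the actual PRMS formula): if width is a deterministic function of outflow, a model that sees seg_width can essentially invert it and drive the pretraining error toward zero without learning anything from the met drivers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
outflow = rng.lognormal(mean=2.0, sigma=1.0, size=1000)  # synthetic "seg_outflow"
width = 3.0 * outflow ** 0.5                             # made-up hydraulic-geometry relation

# regress log(outflow) on log(width): a near-perfect fit, no met data needed
model = LinearRegression().fit(np.log(width).reshape(-1, 1), np.log(outflow))
r2 = model.score(np.log(width).reshape(-1, 1), np.log(outflow))
print(r2)  # ~1.0, so the pretraining loss can collapse to ~0
```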

@wdwatkins

How hard is it to repeatedly train while holding out the new variables one at a time?

@jsadler2
Collaborator Author

For the record:
8 vars used in the NSE=~0.4 model:

x_vars: ["seg_rain", "seg_tave_air", "seginc_swrad", "seg_length", "seginc_potet", "seg_slope", "seg_humid", "seg_elev"]

16 vars used in the NSE = ~0.8 model:

x_vars: ['seg_ccov', 'seg_elev', 'seg_length', 'seg_rain', 'seg_slope', 'seg_tave_air', 'seg_tave_gw', 'seg_tave_ss', 'seg_tave_upstream', 'seg_upstream_inflow', 'seg_width', 'seginc_gwflow', 'seginc_potet', 'seginc_sroff', 'seginc_ssflow', 'seginc_swrad']

added:

['seg_ccov', 'seg_tave_gw', 'seg_tave_ss', 'seg_tave_upstream', 'seg_upstream_inflow', 'seg_width', 'seginc_gwflow', 'seginc_sroff', 'seginc_ssflow']

@jsadler2
Collaborator Author

jsadler2 commented Apr 20, 2021

How hard is it to repeatedly train while holding out the new variables one at a time?

Not too hard
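Something like this is what I have in mind (a sketch only; `prep_data` and `train_and_evaluate` are hypothetical stand-ins for the actual river-dl prep and training entry points):

```python
base_vars = ["seg_rain", "seg_tave_air", "seginc_swrad", "seg_length",
             "seginc_potet", "seg_slope", "seg_humid", "seg_elev"]
added_vars = ["seg_ccov", "seg_tave_gw", "seg_tave_ss", "seg_tave_upstream",
              "seg_upstream_inflow", "seg_width", "seginc_gwflow",
              "seginc_sroff", "seginc_ssflow"]

results = {}
for held_out in added_vars:
    # train with every variable except the one held out
    x_vars = base_vars + [v for v in added_vars if v != held_out]
    prepped = prep_data(x_vars=x_vars)               # hypothetical prep step
    results[held_out] = train_and_evaluate(prepped)  # hypothetical train/eval; assume it returns val NSE

for var, nse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"held out {var}: val NSE = {nse:.2f}")
```

The variable whose removal drops the NSE the most would be the one doing the heavy lifting.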

@jsadler2
Collaborator Author

jsadler2 commented Apr 20, 2021

For finetuning, I think it's that the model is using the segment-specific attributes like slope to tailor parts of the parameter space to each segment. That's my guess.

It's not "slope". Ha, my guess is wrong. That one is in the 8-var version.

@jsadler2
Collaborator Author

This would probably be a good use case for a partial dependence plot, or for looking into LIME.
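A quick-and-dirty stand-in for a full partial dependence plot or LIME would be permutation importance on the trained model. A rough sketch, assuming a Keras-style `model` plus `x_val`, `y_val`, and `x_vars` already in scope (those names are assumptions, not the actual river-dl objects):

```python
import numpy as np

def nse(obs, sim):
    # Nash-Sutcliffe efficiency, ignoring NaN observations
    return 1 - np.nansum((obs - sim) ** 2) / np.nansum((obs - np.nanmean(obs)) ** 2)

baseline = nse(y_val, model.predict(x_val).squeeze())
rng = np.random.default_rng(0)

for i, var in enumerate(x_vars):
    x_perm = x_val.copy()
    # shuffle one driver across all samples/timesteps, breaking its link to flow
    x_perm[..., i] = rng.permutation(x_perm[..., i].ravel()).reshape(x_perm[..., i].shape)
    drop = baseline - nse(y_val, model.predict(x_perm).squeeze())
    print(f"{var}: NSE drop = {drop:.3f}")
```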

@jzwart
Member

jzwart commented Apr 21, 2021

This might be another case where we want to make the case for using pretraining data during the test phase. But it is a little concerning that the improvement over the met data alone is so great. I agree that it would be a good case for partial dependence and/or LIME.

@aappling-usgs
Member

make the case for using pre-training data during the test phase

Can you elaborate on that, Jake?

@aappling-usgs
Member

Seems like seg_width just has to go...

@jzwart
Member

jzwart commented Apr 21, 2021

Can you elaborate on that, Jake?

Can we make an argument that it is appropriate to use seg_width, seg_upstream_inflow, etc. as drivers for training and testing? They are 'free' data, but PRMS is calibrated on some flow data from 1982 to ~2018.

@jordansread
Member

Butting in on this because I may have a mistaken understanding of how PRMS calibration works.

Is it calibrated to some flow data because we have calibrated it to the flow data, or is it assumed that the national version (of which this is a cut-out) has used flow data? If the latter, that is where I am confused, because I thought PRMS was calibrated to intermediate targets but validated with actual flow data.

@aappling-usgs
Member

aappling-usgs commented Apr 21, 2021

Jordan, Jake put notes about PRMS calibration here, including reference to using observed streamflow from 1417 headwater gages for calibration. I'm not sure what qualified as headwater - Jake, do you know?

But also, calibration of a PB model is intentionally much less flexible than training an ML model. We deliberated about this in our train-test-split discussions, especially because we haven't been able to find good ways to exclude all PRMS calibration years from the ML test sets, and I think the difference between PB calibration and ML training is substantial enough that we can probably justify using PRMS outputs even when the calibration period included data in our test period. It's not ideal, but not nearly as bad as building one ML model on another ML model that was trained on the test period.

@aappling-usgs
Member

aappling-usgs commented Apr 21, 2021

Thanks for elaborating, Jake. I think we could make an argument that it's feasible in the real world to pass in seg_width, and thus our model development and testing could include seg_width as an input. For intermediate PRMS variables other than seg_width, I agree that including them as inputs in all phases (pretraining, training, and testing) is a reasonable option and may legitimately improve prediction accuracy.

However, I think it's a bad methods idea to pass in seg_width specifically, because then during pretraining the model probably just learns to predict seg_outflow from seg_width without learning to predict the differences between modeled outflow (seg_outflow) and true outflow (observed).

@jzwart
Member

jzwart commented Apr 21, 2021

I'm not sure what qualified as headwater - Jake, do you know?

I don't know, but they cover a pretty big area in the Eastern US - the 'headwater' basins calibrated with flow data are those in red below:
[map of the Eastern US showing the calibrated headwater basins in red]

jsadler2 changed the title from "getting unusually bad performance in DRB" to "getting unusually bad performance for flow in DRB" on Apr 21, 2021
@jordansread
Member

Ahh, I see. So it is actually a calibration target, but it follows those intermediate ones. I appreciate the clarification here, as it will keep me from spreading misinformation about what data it has and hasn't seen.

@aappling-usgs
Member

However, I think it's a bad methods idea to pass in seg_width specifically, because then during pretraining the model probably just learns to predict seg_outflow from seg_width without learning to predict the differences between modeled outflow (seg_outflow) and true outflow (observed).

Hmm, but what happens if we use dropout?
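For example, dropout applied to the input features so the network can't count on seg_width always being present. A minimal Keras sketch (not how the model is currently set up; layer sizes are placeholders):

```python
import tensorflow as tf

n_vars = 16   # number of input drivers
seq_len = 365 # timesteps per sequence

inputs = tf.keras.Input(shape=(seq_len, n_vars))
# noise_shape=(None, 1, n_vars) drops whole driver time series at random during
# training, so the model can't rely on any single input (e.g. seg_width) alone
x = tf.keras.layers.Dropout(rate=0.2, noise_shape=(None, 1, n_vars))(inputs)
x = tf.keras.layers.LSTM(20, return_sequences=True)(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
```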

@jsadler2
Collaborator Author

I don't think we ever came to a solid understanding of why this was happening, but I'm closing this for now.
