getting unusually bad performance for flow in DRB #99
I have been training the model on the full DRB and have been getting unusually bad performance.
I was getting NSEs of basically 0. When I looked into the training data, I found that there were some bad flow observations (USGS-R/delaware-model-prep#89). When I filtered those out, the performance was better: NSE of 0.38 for train (1985-2006) and 0.42 for validation (2006-2010).
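For anyone following along, a minimal sketch of the kind of filtering I mean. The file name, column names, and segment IDs here are all placeholders, not the actual schema from delaware-model-prep:

```python
import pandas as pd

# Hypothetical sketch: drop flagged/implausible flow observations before training.
# "discharge_cms", "seg_id_nat", and the IDs below are placeholders, not the
# real values from USGS-R/delaware-model-prep#89.
obs = pd.read_csv("obs_flow.csv", parse_dates=["date"])

bad_segs = {2048, 2049}  # placeholder segment IDs flagged as bad
clean = obs[(obs["discharge_cms"] >= 0) & (~obs["seg_id_nat"].isin(bad_segs))]
clean.to_csv("obs_flow_clean.csv", index=False)
```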
When I look back at results for the full DRB that I was running in Fall 2020, I was getting NSEs of around 0.7, so that's what I was expecting.
If I train on the Christina River Basin, I get NSEs around 0.7. So that's comforting, and it's interesting that there'd be that difference.
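For reference, the NSE metric we're quoting throughout this thread is the standard Nash-Sutcliffe efficiency (1 is perfect; 0 means the model is no better than predicting the mean of the observations):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    mask = ~np.isnan(obs)  # score only where observations exist
    obs, sim = obs[mask], sim[mask]
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```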
You're using the same drivers as before?
That's probably the first thing to look at, I think. I am not using the same drivers. One of the problems is that in the Fall, I wasn't saving the names of the drivers or their mean/std in my prepared data file. I have been doing that for several months now, but I hadn't started doing it at that time. I know the drivers are different, though, because the Fall run used 19 variables.
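To be concrete about what I mean by saving that metadata, here's a hedged sketch; the key names, driver names, and array shapes are illustrative assumptions, not the actual prepped-file schema:

```python
import numpy as np

# Hypothetical sketch: store driver names and scaling stats alongside the
# prepared arrays so a run is reproducible later.
x_vars = ["seg_tave_air", "seg_rain", "seginc_swrad"]  # example driver names
x_trn = np.random.rand(100, 365, len(x_vars))          # (sequences, days, drivers)

x_mean = x_trn.mean(axis=(0, 1))
x_std = x_trn.std(axis=(0, 1))
np.savez_compressed(
    "prepped.npz",
    x_trn=(x_trn - x_mean) / x_std,
    x_vars=np.array(x_vars),  # record exactly which drivers these columns are
    x_mean=x_mean,
    x_std=x_std,
)
```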
ah, yeah the other drivers might have had some more flow influence with the intermediate variables?
yeah, maybe. but I feel like I would've noticed by now. I guess I should just try with all those drivers and see if that makes the difference.
good call @jzwart. that was it. Now I'm getting train/val NSEs of 0.88/0.84 😮.
Are hidden layer sizes, etc., all the same with the different numbers of inputs? The initial drop in loss with the new data over the first few epochs seems very unhealthy, like there isn't much signal to fit to. Must be something important in those variables?
Yeah. That's the next question :) Why did that make that big of a difference? Which of those variables is causing that big an increase in performance? Is the model just taking advantage of those attributes to somehow make individualized models for each reach?
So once the model finds that, it basically means the error is zero for pretraining. For finetuning, I think the model is using the segment-specific attributes, like slope, to tailor parts of the parameter space to each segment. That's my guess.
How hard is it to repeatedly train while holding out the new variables one at a time?
For the record:
16 vars used in NSE = ~0.8 model:
added:
Not too hard.
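Something like this hold-one-out loop, roughly. `train_and_evaluate` is a stand-in for the actual training pipeline (not a real function in this repo), and the variable names are placeholders for the lists above:

```python
# Hedged sketch of the hold-one-out experiment: retrain with each added
# variable removed in turn and compare validation NSE.
base_vars = ["seg_tave_air", "seg_rain"]     # placeholder base driver names
added_vars = ["added_var_1", "added_var_2"]  # placeholders for the added drivers

results = {}
for held_out in added_vars:
    keep = base_vars + [v for v in added_vars if v != held_out]
    # train_and_evaluate is hypothetical; assume it returns validation NSE
    results[held_out] = train_and_evaluate(x_vars=keep)
    print(f"without {held_out}: val NSE = {results[held_out]:.2f}")

# Whichever variable's removal drops NSE the most is the one doing the work.
```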
It's not "slope". Ha. My guess was wrong. That's in the 8-var version.
This would probably be a good use case for a partial dependence plot, or for looking into LIME.
This might be another case where we want to argue for using pre-training data during the test phase. But it is a little concerning that the improvement is so great over the met data alone. I agree that partial dependence and/or LIME would be good here.
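A manual one-feature partial dependence is pretty cheap to compute for a trained sequence model. In this sketch, `model.predict` and the (batch, time, features) shapes are assumptions about our setup, not actual river-dl APIs; it assumes standardized inputs:

```python
import numpy as np

def partial_dependence(model, x, feat_idx, grid=np.linspace(-2, 2, 21)):
    """Mean prediction as one (standardized) driver is clamped across a grid."""
    means = []
    for val in grid:
        x_mod = x.copy()
        x_mod[:, :, feat_idx] = val  # clamp one driver at every timestep
        means.append(model.predict(x_mod).mean())
    return grid, np.array(means)

# A flat curve suggests the model ignores that driver; a steep one suggests
# it leans on it heavily.
```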
Can you elaborate on that, Jake?
Seems like
Can we make an argument that it is appropriate to use
Butting in on this because I may have a mistaken understanding of how PRMS calibration works. Is it calibrated to flow data because we calibrated it to flow data ourselves, or is it assumed that the national version (of which this is a cut-out) used flow data? If the latter, that's where I'm confused, because I thought PRMS was calibrated to intermediate targets but validated against actual flow data.
Jordan, Jake put notes about PRMS calibration here, including a reference to using observed streamflow from 1417 headwater gages for calibration. I'm not sure what qualified as a headwater - Jake, do you know? But also, calibration of a PB model is intentionally much less flexible than training an ML model. We deliberated about this in our train-test-split discussions, especially because we haven't been able to find good ways to exclude all PRMS calibration years from the ML test sets. I think the difference between PB calibration and ML training is substantial enough that we can probably justify using PRMS outputs even when the calibration period included data in our test period. It's not ideal, but it's not nearly as bad as building one ML model on another ML model that was trained on the test period.
Thanks for elaborating, Jake. I think we could make an argument that it's feasible in the real world to pass in. However, I think it's a bad methods idea to pass in.
ahh, I see. So it is actually a calibration target, but it follows those intermediate ones. I appreciate the clarification here, as it will keep me from spreading misinformation about what data it has and hasn't seen.
Hmm - but what happens if we use dropout?
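One way to read that suggestion: drop whole input features during training so the model can't lean on any single dominant driver. A minimal Keras sketch, assuming a (timesteps, drivers) input; the layer sizes and rates are illustrative, not our actual model config:

```python
import tensorflow as tf

# SpatialDropout1D zeroes entire input channels (whole drivers) at random
# during training, rather than individual timestep values.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(365, 16)),          # (timesteps, drivers); illustrative
    tf.keras.layers.SpatialDropout1D(0.2),    # drop ~20% of drivers per batch
    tf.keras.layers.LSTM(20, return_sequences=True),
    tf.keras.layers.Dense(1),
])
```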
I don't think we ever came to a solid understanding of why this was happening, but I am closing for now.