getting unusually bad performance for flow in DRB #99
I have been training the model on the full DRB and have been getting unusually bad performance.
I was getting NSEs of basically 0. When I looked into the training data, I found that there were some bad flow observations (USGS-R/delaware-model-prep#89). When I filtered those out, the performance was better: NSE of 0.38 for train (1985-2006) and 0.42 for validation (2006-2010).
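For anyone following along, a minimal sketch of the kind of filtering I mean. The file name, column names, and segment IDs here are all placeholders, not the actual schema from delaware-model-prep:

```python
import pandas as pd

# Hypothetical sketch: drop flagged/implausible flow observations before training.
# "discharge_cms", "seg_id_nat", and the IDs below are placeholders, not the
# real values from USGS-R/delaware-model-prep#89.
obs = pd.read_csv("obs_flow.csv", parse_dates=["date"])

bad_segs = {2048, 2049}  # placeholder segment IDs flagged as bad
clean = obs[(obs["discharge_cms"] >= 0) & (~obs["seg_id_nat"].isin(bad_segs))]
clean.to_csv("obs_flow_clean.csv", index=False)
```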
When I look back at results for the full DRB that I was running in Fall 2020, I was getting NSEs of around 0.7, so that's what I was expecting.
If I train on the Christina River Basin, I get NSEs around 0.7. So that's comforting, and it's interesting that there'd be that difference.
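For reference, the NSE metric we're quoting throughout this thread is the standard Nash-Sutcliffe efficiency (1 is perfect; 0 means the model is no better than predicting the mean of the observations):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    mask = ~np.isnan(obs)  # score only where observations exist
    obs, sim = obs[mask], sim[mask]
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```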
You're using the same drivers as before?
That's probably the first thing to look at, I think. I am not using the same drivers. One of the problems is that in the Fall, I wasn't saving the names of the drivers or their mean/std in my prepared data file. I have been doing that for several months now, but I hadn't started doing it at that time. I know the drivers are different, though, because the Fall run used 19 variables.
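To be concrete about what I mean by saving that metadata, here's a hedged sketch; the key names, driver names, and array shapes are illustrative assumptions, not the actual prepped-file schema:

```python
import numpy as np

# Hypothetical sketch: store driver names and scaling stats alongside the
# prepared arrays so a run is reproducible later.
x_vars = ["seg_tave_air", "seg_rain", "seginc_swrad"]  # example driver names
x_trn = np.random.rand(100, 365, len(x_vars))          # (sequences, days, drivers)

x_mean = x_trn.mean(axis=(0, 1))
x_std = x_trn.std(axis=(0, 1))
np.savez_compressed(
    "prepped.npz",
    x_trn=(x_trn - x_mean) / x_std,
    x_vars=np.array(x_vars),  # record exactly which drivers these columns are
    x_mean=x_mean,
    x_std=x_std,
)
```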
ah, yeah the other drivers might have had some more flow influence with the intermediate variables?
yeah, maybe. but I feel like I would've noticed by now. I guess I should just try with all those drivers and see if that makes the difference.
good call @jzwart. that was it. Now I'm getting train/val NSEs of 0.88/0.84 😮.
Are hidden layer sizes, etc., all the same with the different numbers of inputs? The initial drop in loss with the new data over the first few epochs seems very unhealthy, like there isn't much signal to fit to. Must be something important in those variables?
Yeah. That's the next question :) Why did that make that big of a difference? Which of those variables is causing that big an increase in performance? Is the model just taking advantage of those attributes to somehow make individualized models for each reach?
So once the model finds that, it basically means the error is zero for pretraining. For finetuning, I think the model is using the segment-specific attributes, like slope, to tailor parts of the parameter space to each segment. That's my guess.
How hard is it to repeatedly train while holding out the new variables one at a time?
For the record:
16 vars used in NSE = ~0.8 model:
added:
Not too hard.
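Something like this hold-one-out loop, roughly. `train_and_evaluate` is a stand-in for the actual training pipeline (not a real function in this repo), and the variable names are placeholders for the lists above:

```python
# Hedged sketch of the hold-one-out experiment: retrain with each added
# variable removed in turn and compare validation NSE.
base_vars = ["seg_tave_air", "seg_rain"]     # placeholder base driver names
added_vars = ["added_var_1", "added_var_2"]  # placeholders for the added drivers

results = {}
for held_out in added_vars:
    keep = base_vars + [v for v in added_vars if v != held_out]
    # train_and_evaluate is hypothetical; assume it returns validation NSE
    results[held_out] = train_and_evaluate(x_vars=keep)
    print(f"without {held_out}: val NSE = {results[held_out]:.2f}")

# Whichever variable's removal drops NSE the most is the one doing the work.
```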
It's not "slope". Ha. My guess was wrong. That's in the 8-var version.
This would probably be a good use case for a partial dependence plot, or for looking into LIME.
This might be another case where we want to argue for using pre-training data during the test phase. But it is a little concerning that the improvement is so great over the met data alone. I agree that partial dependence and/or LIME would be good here.
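A manual one-feature partial dependence is pretty cheap to compute for a trained sequence model. In this sketch, `model.predict` and the (batch, time, features) shapes are assumptions about our setup, not actual river-dl APIs; it assumes standardized inputs:

```python
import numpy as np

def partial_dependence(model, x, feat_idx, grid=np.linspace(-2, 2, 21)):
    """Mean prediction as one (standardized) driver is clamped across a grid."""
    means = []
    for val in grid:
        x_mod = x.copy()
        x_mod[:, :, feat_idx] = val  # clamp one driver at every timestep
        means.append(model.predict(x_mod).mean())
    return grid, np.array(means)

# A flat curve suggests the model ignores that driver; a steep one suggests
# it leans on it heavily.
```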
Can you elaborate on that, Jake?
Seems like
Can we make an argument that it is appropriate to use
Butting in on this because I may have a mistaken understanding of how PRMS calibration works. Is it calibrated to flow data because we calibrated it to flow data ourselves, or is it assumed that the national version (of which this is a cut-out) used flow data? If the latter, that's where I'm confused, because I thought PRMS was calibrated to intermediate targets but validated against actual flow data.
Jordan, Jake put notes about PRMS calibration here, including a reference to using observed streamflow from 1417 headwater gages for calibration. I'm not sure what qualified as a headwater - Jake, do you know? But also, calibration of a PB model is intentionally much less flexible than training an ML model. We deliberated about this in our train-test-split discussions, especially because we haven't been able to find good ways to exclude all PRMS calibration years from the ML test sets. I think the difference between PB calibration and ML training is substantial enough that we can probably justify using PRMS outputs even when the calibration period included data in our test period. It's not ideal, but it's not nearly as bad as building one ML model on another ML model that was trained on the test period.
Thanks for elaborating, Jake. I think we could make an argument that it's feasible in the real world to pass in. However, I think it's a bad methods idea to pass in.
ahh, I see. So it is actually a calibration target, but it follows those intermediate ones. I appreciate the clarification here, as it will keep me from spreading misinformation about what data it has and hasn't seen.
Hmm - but what happens if we use dropout?
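One way to read that suggestion: drop whole input features during training so the model can't lean on any single dominant driver. A minimal Keras sketch, assuming a (timesteps, drivers) input; the layer sizes and rates are illustrative, not our actual model config:

```python
import tensorflow as tf

# SpatialDropout1D zeroes entire input channels (whole drivers) at random
# during training, rather than individual timestep values.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(365, 16)),          # (timesteps, drivers); illustrative
    tf.keras.layers.SpatialDropout1D(0.2),    # drop ~20% of drivers per batch
    tf.keras.layers.LSTM(20, return_sequences=True),
    tf.keras.layers.Dense(1),
])
```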
I don't think we ever came to a solid understanding of why this was happening, but I am closing for now.