Higher tmax / tmin for new driver data #41
Comments
So I think this is telling us it doesn't have to do with differences in basin weights, right? Since we'd expect uniformity in pattern across temp and precip? It also suggests there's not just a few bad segments where something got scrambled. |
I think that's right, Sam, re: weights. The clustering of differences (high or low) is a bit odd, and it almost seems like it's matching data from somewhere else |
Could there be some confusion between |
We're looking into the indexing in the code, which could be similar to a |
Ok new figure showing 2015-2016 average June-August tmax and tmin throughout the basin, based on both the previously pulled driver data (left) and the newly pulled driver data (right). Based on this, Jake and I are thinking it is an indexing issue, likely due to a work-around I put in place since we were missing one csv file with ids. I'm going to email Rich McDonald to see if we can get that file and try re-running the code. |
Ok - quick update. Rich McDonald pointed me to his new repo, with cleaned-up code and a revised conversion script. I pulled data for 2015 using the new repo and code, and we're still seeing disagreement between the original driver data and the newest data (which I'm referring to as gridmetetl, after the name of the repo, to distinguish it from the data I pulled earlier this spring [and plotted up above] using the code in the onhm-fetcher-parser repo). Unfortunately, the newly pulled 2015 gridmetetl driver data (here denoted as 'new') also does not match the original 2015 driver data (denoted as 'old'): Here's a comparison of 2015 gridmetetl tmin data and original tmin data for segment 2047: The gridmetetl driver data show a nearly identical spatial pattern to the onhm-fetcher-parser driver data (see above). It seems like there is still an indexing issue. Next steps: Reach out to Rich to report on how implementing the new code went (well), and how the new data looks (same issues as before) |
From Rich:
That order is not the same as the .cbh HRU order which is based on model index. Will try reordering to model index and compare again. |
Ok - so it indeed comes down to a difference in how the .cbh files are indexed. There does not appear to be any indexing error in how the current set of scripts downloads netCDF files or in how those files are converted to .cbh. The issue we were seeing when mapping data to the segment level was that the 'new' driver data used a different set of indices than the old driver data, so our script to map values to segments based on the hru values, when supplied with the new driver data, was computing area-weighted averages for the wrong hrus (thinking they were indexed based on model_idx [like the old driver data], when they were actually indexed based on hru-ids). We didn't catch this before because the .cbh files actually don't have any header information, so there is no way to know how they are indexed without digging into their generation. We assumed that the old and new data were indexed the same way, when they actually weren't. Note: Accounting for the indexing issues solves the majority of our problem, but there still appear to be some discrepancies between the old and new driver data (at least for the date shown: 2015-01-01). I'll dig more into that tomorrow. I can also regenerate the segment-level plots tomorrow |
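The reordering described in this comment can be sketched roughly as follows. This is a minimal illustration, assuming a lookup table from hru id to model_idx is available; the names `reindex_to_model_idx` and `hru_id_to_model_idx` are hypothetical and do not come from the gridmetetl or ncf2cbh code.

```python
def reindex_to_model_idx(values_in_file_order, hru_ids, hru_id_to_model_idx):
    """Reorder one day's HRU values from hru-id column order to model_idx order.

    values_in_file_order: one value per HRU, in the file's column order
    hru_ids: the hru id of each column, in the same order as the values
    hru_id_to_model_idx: hypothetical lookup {hru_id: model_idx}, 1-based
    """
    if len(values_in_file_order) != len(hru_ids):
        raise ValueError("values and hru_ids must have the same length")
    reordered = [None] * len(hru_ids)
    for value, hru_id in zip(values_in_file_order, hru_ids):
        # model_idx is assumed 1-based, so shift to a 0-based list position
        reordered[hru_id_to_model_idx[hru_id] - 1] = value
    return reordered


# Toy example: three HRUs whose file order differs from the model order.
hru_ids = [101, 102, 103]
lookup = {101: 3, 102: 1, 103: 2}
tmax_file_order = [20.5, 18.0, 22.1]
tmax_model_order = reindex_to_model_idx(tmax_file_order, hru_ids, lookup)
```

Because the files carry no header, a lookup like this has to come from wherever the files were generated; it cannot be recovered from the .cbh files themselves.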
🕵️♀️ Nice work! |
Ok - so I modified the ncf2cbh script to take the data imported from netCDF, re-index it to model_idx, and then export it as .cbh files. At the hru-level, the re-indexed data matches the gridmetetl data I first downloaded (i.e., reindexing worked correctly), and now the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data are both using model_idx to index the hrus. As I mentioned in my last post, there are clearly still some discrepancies between the old and new driver data -- the plots below dig into this more: Differences notable here, on June 1st, 2015: I then used the re-indexed .cbh files as input to Jake's script to derive segment-level driver data. Here are a series of plots comparing the 'new' driver data to the 'old' driver data. Still some weird things happening @jzwart any thoughts?: But these time-series plots for segments look much better than before (though there are clearly discrepancies): Here are the segment level plots - Differences between 2015-2016 year-round averages for each segment. For the majority of segments the data is different between the two datasets, but the magnitude of the differences is much smaller: 2015-2016 June-August average for each segment. Looks much better than before, in that the new driver data now shows the N-S gradient. However, note that the ranges of temperatures differ somewhat: Then I re-did the weather data plots for segment 2047. These look weird (esp. tmin), so I'm going to pull some weather data for segments for which driver data differs substantially between the 'old' and 'new' data sets (based on the segment plots, above), and plot that up as well. |
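To show why the index order matters when deriving segment-level values, here is a small sketch of an area-weighted average over the HRUs that contribute to one segment. It is not Jake's script; the function names and the `(model_idx, area)` pair layout are assumptions made for illustration.

```python
def area_weighted_mean(values, areas):
    """Average the values, weighting each one by its contributing area."""
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total contributing area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area


def segment_value(values_in_model_order, contributing_hrus):
    """Derive one segment's value from HRU values stored in model_idx order.

    contributing_hrus: hypothetical list of (model_idx, area) pairs, 1-based
    """
    values = [values_in_model_order[idx - 1] for idx, _ in contributing_hrus]
    areas = [area for _, area in contributing_hrus]
    return area_weighted_mean(values, areas)


contributing = [(1, 1.0), (3, 3.0)]

# Values that really are in model_idx order give the intended result.
correct = segment_value([10.0, 20.0, 30.0], contributing)

# The same values in a different column order silently give a different,
# plausible-looking number, which is the kind of error described above.
scrambled = segment_value([30.0, 20.0, 10.0], contributing)
```

Neither call raises an error, so a mismatch in ordering only shows up when the results are compared against a trusted reference.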
Nice job on this, @hcorson-dosch ! It would be good to know how different the HRU driver data is too, in addition to the segment driver data, just to rule out any issues with the function that grabs driver data for each segment. A quick comparison by reading in the old and new .cbh files and then plotting column 8 (for example) from each .cbh file against each other would be a quick test. Maximum temperature looks pretty good, but it's strange that tmin seems adjusted one-two days before the other data. Is that the case for every segment? Glad that the mean differences are much better now, that's encouraging 👍 |
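A quick version of the column comparison suggested here might look like the sketch below. It assumes each data row is whitespace-delimited and that any header lines have already been skipped; the actual .cbh layout should be checked before relying on it, and `field_index` (a 0-based position in the split row) is an illustrative parameter, so which position corresponds to "column 8" depends on that layout.

```python
import io


def read_field(handle, field_index):
    """Return one field from every non-empty row as a list of floats."""
    series = []
    for line in handle:
        fields = line.split()
        if fields:
            series.append(float(fields[field_index]))
    return series


def summarize_difference(old, new):
    """Return (mean difference, largest absolute difference) of new minus old."""
    diffs = [n - o for o, n in zip(old, new)]
    if not diffs:
        raise ValueError("nothing to compare")
    return sum(diffs) / len(diffs), max(abs(d) for d in diffs)


# Toy stand-ins for two files: six date/time fields, then two HRU values.
old_file = io.StringIO("2015 1 1 0 0 0 1.0 2.0\n2015 1 2 0 0 0 3.0 4.0\n")
new_file = io.StringIO("2015 1 1 0 0 0 1.5 2.0\n2015 1 2 0 0 0 3.5 5.0\n")

old_series = read_field(old_file, 6)
new_series = read_field(new_file, 6)
mean_diff, max_abs_diff = summarize_difference(old_series, new_series)
```

With real files, the two series could be plotted against each other instead of summarized, as suggested in the comment.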
@hcorson-dosch I agree with Jake. The direct link between the Gridmet data as pulled and the Delaware model is the HRU. Also, there have been some changes to the raw Gridmet data that could potentially result in differences between what is currently pulled and the original data: http://www.climatologylab.org/gridmet.html Look at the Updates tab |
Thanks for the link re: gridmet updates @rmcd-mscb! I'll make some hru-level plots today |
Meant to mention I pulled the raw data in netCDF format from the gridmet site using the Aggregated THREDDS Catalog (OPENDAP) |
WOW @hcorson-dosch! This is a ton of work, and thanks for pulling together all these comparisons. The shifted +1 day in some locations and times is really strange to me, while other time periods it actually matches really well. I guess I'm naive about whether this is an issue with gridded met data in general or something specific to gridmet. I know there are some biases in NLDAS data when compared to weather data, for example, but those seemed to be a consistent offset rather than the messiness that is happening here. I think it's encouraging that the original driver data matches the weather data fairly well. It would be good to know how that was created and the source of the driver data. |
Oooof. On one hand, bummer that there are multiple issues, but I think you're narrowing in! I'm not sure what to do/think about the Gridmet data -- and it would be difficult to (comprehensively) know which segments or what time periods to shift by a day. Maybe as one last check of the raw data, download directly from Gridmet instead of THREDDS? Hayley - one way to view error might be to look at cumulative error (either from weather data or from old driver data) to get a sense for how bad it is across reaches. So for your three reach examples, sum absolute error every day (of reindexed and offset data), and then plot cumulative error through time? This might help identify the time periods that are offset, and should ID February as a true outlier. @jzwart how are you feeling about this now? Where do we go from here if the Gridmet data is "bad"? Edited - oops, sorry Jake, I had not refreshed before I commented! Thanks for your input. I will ask Steve about the origins of the driver data. |
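The cumulative-error idea can be sketched in a few lines. This assumes two daily series that are already aligned by date and have no missing values; the function name is illustrative.

```python
from itertools import accumulate


def cumulative_abs_error(reference, candidate):
    """Running sum of |candidate - reference|, one entry per day."""
    return list(accumulate(abs(c - r) for r, c in zip(reference, candidate)))


# Toy example: the series agree on days 1 and 3 and disagree on days 2 and 4.
weather = [1.0, 2.0, 3.0, 4.0]
driver = [1.0, 3.0, 3.0, 2.0]
running_error = cumulative_abs_error(weather, driver)
```

Plotted through time, flat stretches of the running total mark periods of good agreement, and steep stretches mark periods where the two series diverge.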
This pub from 2018 about NHM says it was configured with Daymet. |
Only because it seems oddly reminiscent of a similar offset issue we had in the past that was based on how the metadata was used on OPeNDAP, sharing this: https://github.com/USGS-CIDA/stream_metab_usa/issues/68 |
Thanks all. @jread-usgs that's really interesting that you and Dave saw a similar issue with OPENDAP and reading data in from .netcdfs. Something similar could be happening here. @limnoliver I did download some data directly from gridmet, but couldn't get it to read in correctly using two different netcdf packages in Python that I've been using. Sometimes all the temp values were NaN, and in other cases the dates were 1 day off, and then in others the dates were way off. So that might all be tied back to the issue Jordan pointed to. Perhaps I should try again, but read it in using a different package, or try reading it in in R? That's also a good idea to look at cumulative error. I'll plan to dig back into this tomorrow. |
I think it's important to compare the HRU data rather than the segment data, because Gridmet and Daymet are specifically mapped to the HRU. Also, you should be able to read that raw gridmet data directly into xarray. I'll also double check the gridmetetl code to make sure there are no indexing errors. |
Hi @rmcd-mscb thanks for double-checking the gridmetetl code. All of the plots in my latest comment plot the HRU data, not the segment-level data except for nine of the plots near the end that compare weather data to driver data derived at the segment level (as well as raw gridmet data at the grid cell [lat, long] level). I was using xarray to read in the netCDF data. It worked great for the netCDF files generated by your script and for the netCDF files I pulled from THREDDS, but for some reason when xarray read in raw netCDFs directly from gridmet I was running into errors. For the individual day netCDFs, the dates were incorrect: And for the annual netCDFs I tried, xarray could not read in the temperature variable correctly, giving an error that it had an unsigned attribute but was not of integer type: I also tried the Dataset module from the netCDF package, but was getting null temperature values from that as well. Since I could read in the netCDF data from THREDDS just fine with xarray, I used that data and didn't really dig into those issues, but I'll now try to troubleshoot why the raw gridmet data wasn't being read correctly by xarray. |
Hi Hayley, Anyway, look at this notebook here: https://nbviewer.jupyter.org/github/nhm-usgs/gridmetetl/blob/delaware/Examples/Delaware_xarray.ipynb you should be able to extract the data. If you know the lat/lon of your weather station you can grab it from xarray with something like this: ds.sel(lat=-35.28, lon=149.13, method='nearest'). If you know what hru the weather station lies in, you could plot weather station data against the climate data for that HRU rather than the segment. It won't be perfect; remember GM is 4km resolution. HOWEVER - I think I found the offset error. If you look at the attributes you will see the note below, which I've never noticed before - Days are in MST! Also note it says approximately note5 : Finally, if the original forcing data is Daymet, maybe Parker could pull you a new bandit version with updated daymet data. 2019 Daymet data just became available. |
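As an illustration of the day-boundary arithmetic only, the sketch below shows which calendar date a timestamp lands on when days are counted in MST. The fixed UTC offsets (no daylight saving) and the function name are assumptions for the example, and nothing in this thread establishes that this is the cause of the offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones for illustration; real local time may observe DST.
MST = timezone(timedelta(hours=-7), "MST")
EST = timezone(timedelta(hours=-5), "EST")


def mst_day_label(timestamp):
    """Return the calendar date of an aware timestamp when days run in MST."""
    return timestamp.astimezone(MST).date()


# 01:00 EST on 2 January is 23:00 MST on 1 January.
early_morning = datetime(2016, 1, 2, 1, 0, tzinfo=EST)
early_label = mst_day_label(early_morning)

# 15:00 EST on 2 January is 13:00 MST on the same calendar day.
afternoon = datetime(2016, 1, 2, 15, 0, tzinfo=EST)
afternoon_label = mst_day_label(afternoon)
```

In this toy case an early-morning Eastern observation is labeled with the previous date, while an afternoon observation keeps its date.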
@rmcd-mscb Thanks for all your work digging into this! I'll take a look at your notebook tomorrow, and will try addressing the offset you pointed out to see if that fixes our issues. I'll also add the weather data to the hru-level plots that compare the hru-level driver data to the raw gridmet data I had extracted from the grid at the location of the weather data. |
Ok - I was able to read in the raw gridmet data - It had read in correctly before, but appeared to be null until I specified a particular date and/or grid cell for which to view data. Not sure why. I've updated the HRU-level plots to include the raw gridmet data alongside the THREDDS gridmet data and the 'old' (original 1980-2016 driver data) and 'new' (re-indexed driver data pulled with the gridmetetl script) driver data, as well as the 'new' driver data shifted by +1 day. I've included plots for three locations (Wilmington, DE; Trenton, NJ; Allentown, PA) for three periods (June 2015; January 15 - March 15, 2016; October 2016). Note that the driver-data values for each HRU are generated by weighting the gridded data for all cells that overlap each HRU, so may not match the raw gridmet data exactly.

Tmax

Overall, there is very good agreement between the raw gridmet data, the reindexed driver data, and weather data for all of the periods and locations, except for February 2016, when the raw gridmet data and reindexed driver data do not track the weather data at all.

June 1-30, 2015

Tmin

With these plots that are temporally zoomed in (relative to my past plots), it appears that the raw gridmet data and the reindexed driver data are pretty consistently offset from the weather data for all of the periods and locations. It looks like the offset is ~1 day. As for Tmax, the raw gridmet data and reindexed driver data do not track the weather data at all in February 2016.

June 1-30, 2015

Precipitation

Overall, the raw gridmet data and the reindexed driver data are pretty consistently offset from the weather data for all of the periods and locations. It looks like the offset is ~1 day. As for Tmax and Tmin, the raw gridmet data and reindexed driver data do not track the weather data at all in February 2016.

June 1-30, 2015

Overall, it appears that the actual raw gridmet data match the gridmet data pulled from THREDDS and the reindexed driver data pulled using Rich's gridmetetl repo (👍 ) but that suggests that the actual raw gridmet data are incorrect in February 2016 (👎 👎 👎 ). Also - the tmin and prcp data appear to be consistently offset from weather data by approximately 1 day (👎 ❓ ). Interestingly, the tmax data do not appear to be offset (😕 ❔). I'm still pretty mystified by this offset. According to the documentation,
And as @rmcd-mscb noticed, it also states in the metadata that:
Here's how I am thinking about it, which may be incorrect. If so, please let me know!
Does that thinking seem correct? Or am I missing something obvious? If you are curious, here are descriptions of the metadata for the netCDF files on the Northwest Knowledge network: tmax, tmin, prcp. According to this documentation, they all have the same datetime format. It seems odd that the tmax data is not offset, yet the tmin and prcp data do seem to be offset. Maybe there is some error in how datetime is being encoded for the tmin and prcp data? Thoughts about the offset or what could be happening in February 2016? |
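One way to put a number on an apparent offset like this is to compare the two daily series at several whole-day lags and see which lag gives the smallest mean absolute error. The sketch below assumes both series are aligned by date and gap-free; the function names and the sign convention (a positive lag means the candidate's values appear later than the reference's) are choices made for this example.

```python
def mean_abs_error_at_lag(reference, candidate, lag):
    """Mean |candidate[t + lag] - reference[t]| over the overlapping days."""
    pairs = [
        (reference[t], candidate[t + lag])
        for t in range(len(reference))
        if 0 <= t + lag < len(candidate)
    ]
    if not pairs:
        raise ValueError("series do not overlap at this lag")
    return sum(abs(c - r) for r, c in pairs) / len(pairs)


def best_lag(reference, candidate, max_lag=2):
    """Return the lag in [-max_lag, max_lag] with the smallest mean error."""
    return min(
        range(-max_lag, max_lag + 1),
        key=lambda lag: mean_abs_error_at_lag(reference, candidate, lag),
    )


# Toy example: the gridded series repeats the station series one day later.
station = [5.0, 7.0, 3.0, 8.0, 6.0, 2.0]
gridded = [9.0, 5.0, 7.0, 3.0, 8.0, 6.0]
lag = best_lag(station, gridded)
```

Running this per location and per period would show whether the best lag is consistently one day or varies, which is the pattern in question here.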
Thanks @hcorson-dosch! I agree with your assessment. The Feb 2016 issue is definitely strange and the offset that is dependent on data product / location / time of year is very confusing. |
Hi @hcorson-dosch and @jzwart I've pointed this thread to Steve M, Parker N and Jacob LF, and it's on our list to look at early next week. They have more experience with this kind of data and correlations to weather station data, in the context of PRMS. In anticipation of showing this to the gridMET folks, can I suggest redoing the plots above by relabeling the data as: Raw Gridmet data -> gridMET point interpolation. Also note that the validation of gridMET data was made for stations in the Western US: https://webpages.uidaho.edu/jabatzoglou/PDF/IJOC_Abatzoglou_2012.pdf I think it would be good to hear what the NHM group thinks first, but then I'm happy to help and be involved in a conversation with the gridMET folks in any way that would be helpful to you. |
Hi @rmcd-mscb thanks for looping in Steve, Parker, and Jacob. After talking everything over with our stream temperature modeling team yesterday, we decided to reach out to Gridmet, so I sent John Abatzoglou and Katherine Hegewisch an email yesterday. I included some cleaned up figures that only showed the weather data and the raw Gridmet (gridMET point interpolation) data. The figures were selected to show:
Here are the three figures I sent, along with the text I included to explain what we were seeing:
Both John Abatzoglou and Katherine Hegewisch responded today, with some helpful information. John did some digging into the Wunderground precipitation data to which I had compared the Gridmet data. By inspecting the hourly Wunderground data, he noticed that Wunderground does not accurately report daily precipitation totals, which was unknown to both him and to our team:
He also had some thoughts about the tmin data offset:
And noted that he and his team would look into the February 2016 data:
Katherine then chimed in, and provided some very helpful information about a previous user's experience reading the Gridmet NetCDF files using
I've downloaded new weather data (from NOAA), which I will use for updated comparisons. I am also going to read in the Gridmet data using I'll add new plots here as soon as I have them put together. |
Just a quick note to keep things up-to-date here. John Abatzoglou fixed the February 2016 data, so that issue has been resolved, which is great 👍. John's hunch about the apparent offset in prcp data being due to the faulty Wunderground data was correct. The NOAA weather data I downloaded aligns with the 2015 gridMET prcp data 👍. However, we are still seeing some temporal offsets in the updated 2016 data 👎. I've sent some updated plots to John and Katherine with some additional discussion and questions for them. Hopefully we can get to the bottom of the offset issue shortly. |
Awesome work, @hcorson-dosch -- seems like we're getting close! |
Another quick update: since I last reported here John Abatzoglou fixed the temporal offsets in the updated 2016 tmax and prcp data. Following that fix, I downloaded the raw gridMET netCDF files for each year between 1980 and 2019. I also downloaded NOAA weather data. Comparisons of the gridMET and NOAA data show that the gridMET tmax and tmin data in February 2017 (and possibly in part of February 2019) deviate from recorded weather data -- similar to the deviations we'd observed in February 2016. I reached out to John Abatzoglou about these discrepancies. He is in the midst of transitioning their systems, but will fix those variables when the systems are stable again - likely later this month or in August. Once that data is updated we'll be ready to pull and process the 1980-2019 gridMET data and use it as a driver for a new SNTemp run. |
Copying issue from delaware-water-temp repo.
The new driver data (2017 - 2019) that was appended (# onto old drivers (1980-2016) seems to be shifted slightly higher for both tmin and tmax.
Less obvious is an issue with the precipitation: annual precipitation is elevated in 2018, but it is not shifted up the way tmin / tmax are.