
Configure day-specific Kd feeder data for GLM #3

Open · jordansread opened this issue Mar 24, 2017 · 34 comments
@jordansread (Member)

No description provided.

@gjahansen2 (Collaborator)

Seasonal pattern plus annual variability. How do we balance being true to the observed data with tracking noise?

@jordansread (Member Author)

My suggestion is that we create a characteristic seasonal curve for the lake and do a simple parameterization of that curve that stretches or amplifies it to fit the year. Maybe we could end up parameterizing it with just the annual median/mean, or perhaps we would need a second parameter for timing.

@lawinslow (Member)

Ok, quick crack at this. I'm not seeing any crazy seasonal variability. It looks like Secchi generally declines throughout the year, but I don't see the big clear-water phase "bump" you sometimes see. We also see a bit of an increase towards the end of the year.

Looked at seasonal variability for pre- and post-1997 (the year that splits the data into equal halves).
[image]

[image]

My vote is to use the empirical seasonal variability. Basically, use the median of each bi-weekly bin for the overall pattern, then use a multiplier to bump it down early in the season and bump it up later in the season (tie it together with a winter linear interp; it doesn't matter much in the winter). This will amplify the annual change a bit, which lines up with the trends we see (steeper in the clearer periods, muted in the less clear periods). A sketch of this follows the figure below.

[image]
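For concreteness, a minimal sketch of that empirical approach (the data frame `obs` and its columns `year`, `doy`, `secchi_m` are assumptions, echoing the data used later in this thread):

```r
library(dplyr)

# Assumed input: data frame `obs` with columns year, doy, secchi_m
obs <- obs %>% mutate(biweek = floor(doy / 14))

# Overall seasonal pattern: median Secchi within each bi-weekly bin
pattern <- obs %>%
  group_by(biweek) %>%
  summarise(pattern_secchi = median(secchi_m, na.rm = TRUE))

# Year-specific multiplier: that year's median relative to the long-term
# median, bumping the pattern down in less-clear years and up in clearer ones
overall <- median(obs$secchi_m, na.rm = TRUE)
mult <- obs %>%
  group_by(year) %>%
  summarise(mult = median(secchi_m, na.rm = TRUE) / overall)

# A given year's curve is then pattern_secchi * mult for that year, with the
# winter gap filled by linear interpolation (it matters little under ice)
```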

The trend is somewhat harder. Do we want to capture the big bump we see in 1997? If so, we can't just apply a monotonic trend. We could use a more empirical approach where we try to reconstruct the overall long-term variability and use the seasonal variability reconstruction as above.

[image]

Thoughts?

@gjahansen2 (Collaborator)

I don't want to impose a linear trend over time (year to year). The peak in 1997 is real, I think (many reports from that time reference the unusually high water clarity). There also seems to be a peak in 2013.

As for seasonal patterns, I agree that the general trend appears to be the same from year to year (declines later in the year). I am not sure how much credence we should put in the seasonal patterns early and late in the year, since they are so infrequently sampled. So I don't want to tie it too strongly to the bi-weekly data in those parts of the year where we only have one or two sampling dates.

So Jordan's suggestion of a single seasonal curve, parameterized by the annual median or mean seems reasonable to me. But then how do we link Dec 31 of one year to Jan 1 of the next? Or does it not matter in ice-covered periods, and we can just have clarity jump up or down to the year-specific pattern after the ice goes off?

@lawinslow (Member)

> I am not sure how much credence we should put in the seasonal patterns early and late in the year since they are so infrequently sampled.

Yeah, sure, the far edges are uncertain, but the lake seems to have reasonable coverage much of the open-water season.

> So Jordan's suggestion of a single seasonal curve, parameterized by the annual median or mean seems reasonable to me.

Question here though is what is the parameterizable "single seasonal curve"?

> But then how do we link Dec 31 of one year to Jan 1 of the next? Or does it not matter in ice-covered periods, and we can just have clarity jump up or down to the year-specific pattern after the ice goes off?

I'm not worried about a Dec 31 to Jan 1 jump; it won't affect much that time of year. We can pick something reasonable for when there is no observation coverage. I think a high value during that time makes sense, a Secchi of 5 m or so.

@jordansread (Member Author)

The curve I was referring to is a somewhat complex shape (maybe some loess or something) but boiled down to two parameters: 1) the mean, and 2) the amplitude. A third would be stretching horizontally, but that seems harder since you need to know how long the season would be a priori. For mean and amplitude, it is akin to stretching a shape vertically or translating it vertically (amplitude and mean, respectively).
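In code, that two-parameter idea might look like this sketch (the loess base shape and the input data frame `obs` are assumptions):

```r
# Base shape: loess fit of Secchi vs DOY pooled across years, normalized to
# mean 0 and unit amplitude so two parameters can re-dress it per year
base_fit <- loess(secchi_m ~ doy, data = obs)  # `obs` is hypothetical
doy_grid <- 1:365
base <- predict(base_fit, newdata = data.frame(doy = doy_grid))
base <- (base - mean(base, na.rm = TRUE)) / diff(range(base, na.rm = TRUE))

# mean_secchi translates the shape vertically; amplitude stretches it vertically
year_curve <- function(mean_secchi, amplitude) mean_secchi + amplitude * base
```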

@lawinslow (Member)

Ok, you want to take first crack at that? Seems like you have a clear vision.

@gjahansen2 (Collaborator)

What about some kind of moving average?

@jordansread (Member Author)

A moving average of that year's data, or the whole thing flattened to DOY?

@jordansread (Member Author)

Was thinking something simple like this that we can adjust w/ two params:
[image]

@lawinslow (Member)

Hmmm, do we not want to fit to that portion of the year? There are very few obs then.

@jordansread (Member Author)

or something like this that flattens out to an early spring mean and flattens to a fall mean:
[image]

The difference being what I can quickly code vs. what I can draw in ppt...

@lawinslow (Member)

Yeah, got an equation for that? The challenging part there can be finding something well-behaved with the right number of parameters (hence my advocacy for a more empirical approach).

@jordansread (Member Author)

Empirical sounds good here, but often we may not have enough Secchi data and may need to pick a curve off the shelf (e.g., dystrophic, eutrophic) and apply it.

@jordansread (Member Author)

The first plot is just a Weibull distribution.
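For reference, a Weibull density rescaled over day of year gives exactly that kind of single-peaked, asymmetric shape; the shape/scale values here are illustrative, not fitted:

```r
# Illustrative only: a Weibull density as a seasonal shape over DOY
doy <- 1:365
shape_k <- 2     # hypothetical shape parameter
scale_l <- 180   # hypothetical scale parameter (peak before midsummer)
plot(doy, dweibull(doy, shape = shape_k, scale = scale_l), type = "l",
     xlab = "Day of year", ylab = "Relative value")
```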

@gjahansen2 (Collaborator)

gjahansen2 commented Mar 30, 2017 via email

@lawinslow (Member)

For future reference, here is the best simple model I could come up with for seasonal variability of Secchi in the LTER lakes.
[image]

ratio is defined as median(cw_phase_secchi)/median(summer_secchi), though the windows aren't defined by season; I just eyeballed the time window of the clear-water phase in those lakes (cw_phase: weeks 22-23, summer: weeks 27-42). Adding DOC to the linear model improves it (R^2 ~ 0.8, lower AIC), so there's that.
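Roughly, as a sketch (the input data frame `lter` and its columns are assumptions; the full predictor set of the fitted model above isn't reproduced here):

```r
library(dplyr)

# Assumed input: `lter` with columns lake, week, secchi_m, doc
ratios <- lter %>%
  group_by(lake) %>%
  summarise(
    ratio = median(secchi_m[week %in% 22:23], na.rm = TRUE) /  # clear-water phase
            median(secchi_m[week %in% 27:42], na.rm = TRUE),   # summer window
    doc   = first(doc)
  )

# Adding DOC as a predictor improved the fit (R^2 ~ 0.8, lower AIC)
summary(lm(ratio ~ doc, data = ratios))
```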

@jordansread (Member Author)

interesting...cool

@gjahansen2 (Collaborator)

gjahansen2 commented Mar 31, 2017

I am reading some old (early 1990s) PCA reports about Mille Lacs. They describe Secchi as follows: "Under ice readings range from 4.5-5.5m with very low levels of algae and suspended solids. Following ice out, Secchi transparency declines quickly to 3 m or less. Transparency continues to decline, coincident with peak algae concentrations, until late August with a minimum reading of 1.3 m".

Given the paucity of under-ice data, I think the shape should be something like a flat line (with high Secchi) until ice-out, then a fairly steady decline from right around ice-out through the end of August, then flat until ice-on. The actual values of the starting and ending points could be parameterized based on observed open-water medians for the year (where available). When we look at annual means/medians, it is probably important to remove samples from winter, since they only exist for a few years and will skew those years to look clearer than if only the open-water season was sampled. This figure has data only from April-Oct; a sketch of the shape follows it.

[image: mille_lacs_secchi_timeseries_april_oct]
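A piecewise-linear version of that shape could look like this sketch (ice dates and endpoint values are placeholders; 5 m and 1.3 m echo the PCA report quoted above):

```r
# Flat under ice, steady decline from ice-out through end of August (~DOY 243),
# then flat until ice-on; endpoints would come from year-specific medians
seasonal_secchi <- function(doy, ice_off = 110, start_secchi = 5, end_secchi = 1.3) {
  decline_end <- 243  # ~Aug 31
  ifelse(doy < ice_off, start_secchi,
         ifelse(doy < decline_end,
                start_secchi + (end_secchi - start_secchi) *
                  (doy - ice_off) / (decline_end - ice_off),
                end_secchi))
}
plot(1:365, seasonal_secchi(1:365), type = "l", xlab = "DOY", ylab = "Secchi (m)")
```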

@gjahansen2 (Collaborator)

OK, I played around with some simple logistic functions, parameterized with year-specific highs and lows.
[image: mille_lacs_secchi_seasonal_april_oct_logistic]

This might be close to what we want, but I don't like how (potentially) bad measurements (or few measurements) in a year can drag the values up or down. I think a better way would be to construct a mixed-effects non-linear model of this same form, with a random year effect that lets the max and min thresholds vary by year. BUT those year-specific estimates will be influenced by all measurements, so if a year is not well sampled (or not sampled at all), its estimates will be pulled closer to the overall mean.
What do you think?

@gjahansen2 (Collaborator)

The logistic function is just `seasonal = function(x, max, min) { min + (max - min) / (1 + exp(0.05 * (x - 200))) }`, where I chose the rate of change as 0.05 and the midpoint of the transition period as DOY 200. We could fit those to data as well if we wanted to.
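E.g., to see the shape over the open-water window (the max/min values here are just placeholders):

```r
seasonal <- function(x, max, min) { min + (max - min) / (1 + exp(0.05 * (x - 200))) }
doy <- 100:320
plot(doy, seasonal(doy, max = 4, min = 1.5), type = "l",
     xlab = "DOY", ylab = "Secchi (m)")  # high early, declining through midseason
```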

@lawinslow (Member)

Yeah, I think that would be a decent approach, though my only concern would be the big jump in clarity at the new year transition. It would be a nice fusion of model and empiricism.

@jordansread (Member Author)

Maybe we apply this to a certain DOY range (100-320?) and then interpolate between the years so that the transition changes aren't abrupt?

@gjahansen2 (Collaborator)

^^ This is what I was thinking too: apply the modeled curve throughout the open-water season and then interpolate between ice-on and ice-off to get from the ending value of year x to the starting value of year x+1.
I fit the nonlinear mixed-effects model and estimated year-specific maxima and minima. Looks like some of the years are kind of funky. The black line is the generic fit.

[image: secchi_function_mixed_model]
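The between-year stitch could be as simple as this sketch (assuming a hypothetical daily data frame `daily`, sorted by date, with Secchi filled only for the open-water DOY window and NA over the ice-covered gap):

```r
library(zoo)

# Linearly interpolate each NA run, i.e., from the last open-water value of
# year x to the first open-water value of year x + 1
daily$Secchi <- na.approx(daily$Secchi, na.rm = FALSE)
```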

@gjahansen2 (Collaborator)

gjahansen2 commented Apr 4, 2017

`customLogitModel = function(x, max, min) { min + (max - min) / (1 + exp(0.046 * (x - 192))) }`

where 0.046 and 192 were estimated from a simple nonlinear model, and max and min are estimated as year-specific random effects.

@jordansread (Member Author)

yes, this seems doable.

@lawinslow (Member)

Can someone create a daily Kd series based on that? I could start running the model then.

@gjahansen2 (Collaborator)

gjahansen2 commented Apr 4, 2017

Are we just doing kd=secchi/something? What is the 'something' again?

Here is what optical habitat looks like (3-year moving average) with the new seasonal pattern included. (Note: generic parameters were used for much of the 1980s, but word on the street is that the Mille Lacs Band of Ojibwe might have data from that time.)
[image: mille_lacs_oha_timeseries_seasonal]

@jordansread (Member Author)

I think it is Secchi = 1.7/Kd, i.e., Kd = 1.7/Secchi.
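In other words (the 1.7 coefficient is the one this thread settles on; conversion coefficients in the literature vary):

```r
# Kd = 1.7 / Secchi, so e.g. a 2.5 m Secchi reading gives Kd = 0.68 m^-1
secchi_to_kd <- function(secchi_m) 1.7 / secchi_m
secchi_to_kd(2.5)
```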

@gjahansen2 (Collaborator)

Ok, so we want daily values for all years, with linear interpolation between the end of the season in year x and the beginning of the season in year x+1. I can create this.

Question:
Do we want the end and beginning of years to be based on some arbitrary day numbers, or on ice-on and ice-off? I just used DOY 110-310 in the figure shown above, but it seems reasonable to use the modeled ice-on/ice-off dates as those cutoffs. That is not something I can incorporate at this point, though. It is also a bit of a chicken-and-egg situation: I need modeled outputs to calculate daily Kd, and you need daily Kd to calculate modeled outputs...

@lawinslow (Member)

I'd just say arbitrary. You can be conservative if you want. I really don't think the model will be terribly sensitive to these numbers. You're right that we don't have ice on/off so it would be hard to pick them now.

@jordansread (Member Author)

I think we should start out with the system you have above, with an easy-to-change set of parameters that we can use to tell us how sensitive the model is to that time range (if we choose to look into it).

@gjahansen2 (Collaborator)

gjahansen2 commented Apr 4, 2017

Hmm... okay, here is a stab at it. I modeled Secchi as a function of DOY from DOY 100-320, then linearly interpolated across the remaining gaps.
[attachment: daily_kd_interpolated.txt]

@lawinslow data are there ^^^

[image: secchi_seasonal_interpolated]

[image: kd_time_series_interpolated]

@gjahansen2 (Collaborator)

```r
library(ggplot2)
library(dplyr)
library(lme4)
library(reshape2)
library(readr)
library(ggrepel)
library(RColorBrewer)
library(zoo)  # na.approx, used for the ice-period interpolation below

# data from Heidi
mydata = read_csv("ML_secchi.csv")
mydata$date = as.Date(mydata$date, format="%m/%d/%Y")
mydata$doy = as.numeric(format(mydata$date, format="%j"))
mydata$year = as.numeric(format(mydata$date, "%Y"))  # year is grouped on below

year.medians = summarise(group_by(mydata, year), median.secchi=median(secchi_m))

# exclude data outside window used for TOHA
toha.secchi = subset(mydata, doy>100 & doy<320)
year.medians.toha = summarise(group_by(toha.secchi, year), median.secchi=median(secchi_m))
year.medians.toha$date = as.Date(paste(year.medians.toha$year, "-07-01", sep=""))

# create function for seasonal pattern
toha.secchi$week = as.numeric(format(toha.secchi$date, "%U"))
toha.secchi$year.f = factor(toha.secchi$year)

# k=0.046, midseason=192 as estimated from nls
customLogitModel = function(x, max, min){ min + ((max-min)/(1+exp(.046*(x-192)))) }

# nlmer needs a model function that carries a gradient attribute;
# deriv() builds one from the body of customLogitModel
customLogitModelGradient <- deriv(
  body(customLogitModel)[[2]],
  namevec = c("max", "min"),
  function.arg = customLogitModel
)

# starting guesses
# secchi.nls = nls(secchi_m~customLogitModel(doy, max, min, k, mid.season), data=toha.secchi, start = c(max=4, min=1.5, mid.season=200, k=1))

# fit the model: year-specific random effects on max and min
model <- nlmer(
  secchi_m ~ customLogitModelGradient(x=doy, max, min) ~
    # random effects after the second ~
    (max | year.f) + (min | year.f),
  data = toha.secchi,
  start = c(max=4, min=2.3)
)

summary(model)
year.estimates = coef(model)$year.f
general.estimates = fixef(model)
write.csv(year.estimates, "year_secchi_max_min_estimates.csv", row.names=T)
write.csv(data.frame(general.estimates), "generic_secchi_max_min_estimates.csv", row.names=F)

# create time series - if year is missing or estimate is bad, use generic
yearly.secchi = data.frame(year.estimates)
yearly.secchi$year = row.names(yearly.secchi)

secchi.params = data.frame(year=seq(1974, 2016))
secchi.params = merge(secchi.params, yearly.secchi, by="year", all=T)
# bad estimate if min is bigger than max; flag rows before blanking either
# column so the second assignment isn't tripped up by the fresh NAs
bad = which(secchi.params$max < secchi.params$min)
secchi.params$max[bad] = NA
secchi.params$min[bad] = NA
secchi.params$max[is.na(secchi.params$max)] = general.estimates[1]
secchi.params$min[is.na(secchi.params$min)] = general.estimates[2]

# create data frame of secchi depths
day.list = data.frame(date=seq(as.Date("1974-03-31"), as.Date("2016-12-31"), by="1 day"))
day.list$doy = as.numeric(format(day.list$date, "%j"))
day.list$year = as.numeric(format(day.list$date, "%Y"))
# dummy first row; dropped after the loop
daily.kd = data.frame("doy"=0, "year"=0, "date"=as.Date("1970-01-01"), "Secchi"=0)

for(i in 1:nrow(secchi.params))
{
  this.year = subset(day.list, year==secchi.params$year[i])
  open.water = data.frame(doy=seq(100,320),
                          Secchi=customLogitModel(x=seq(100,320),
                                                  max=secchi.params$max[i],
                                                  min=secchi.params$min[i]),
                          year=as.numeric(secchi.params$year[i]))
  this.year = merge(this.year, open.water, by=c("doy", "year"), all=T)
  daily.kd = rbind(daily.kd, this.year)
}
daily.kd = daily.kd[2:nrow(daily.kd), ]

# interpolate across the ice-covered gaps, then convert Secchi -> Kd
daily.kd$Secchi = na.approx(daily.kd$Secchi, na.rm=F)
daily.kd$kd = 1.7/daily.kd$Secchi
write_tsv(daily.kd, "daily_kd_interpolated.txt")

ggplot() + geom_path(data=daily.kd, aes(date, kd)) + theme_classic()
ggsave("kd_time_series_interpolated.png", height=5, width=8, units="in")

ggplot() + geom_path(data=daily.kd, aes(doy, Secchi, group=year, colour=factor(year))) + theme_classic()
ggsave("secchi_seasonal_interpolated.png", height=5, width=8, units="in")
```

gjahansen2 self-assigned this Apr 6, 2017