PAS_Rivers.qmd

---
title: "Analysis of ice Cycle Variables for Northern Rivers in Canada: Peace, Athabasca, and Slave"
output:
  quarto::html_document:
    output_dir: docs
editor: visual
---

## Analysis of ice Cycle Variables for Northern Rivers in Canada: Peace, Athabasca, and Slave 

by Jacqui Levy

#### Scientists can now analyze a river’s ice cycle using both historical and current data from the Water Survey of Canada database. Newly developed functions are available and applicable at scale on any river in Canada with seasonal ice coverage, and are demonstrated here. These functions grant scientists and watershed managers the tools to conduct reproducible research at scale, and enable data-driven water use decisions.

#### The following is an analysis of river ice regimes within the Peace-Athabasca Delta (PAD), using these newly developed functions.

Note: These functions are undergoing unit-testing and being developed into an open-source R package for public use. 

### What is an ice cycle and why is it important?

River ice is a major component of the hydrological cycle for northern rivers like the Peace, Slave, and Athabasca Rivers. These rivers experience thick ice coverage every winter, and a predictable seasonal spring-break up season called freshet. This ice-break up is linked to key ecological processes, such as overland flooding. This flooding replenishes nearby wetlands with water and sustains rich habitat for a biodiverse ecosystem.

Climate change is causing broad spatial and temporal shifts in river ice regimes across Canada, causing harmful downstream impacts to ecosystems and wildlife. Understanding how river ice cycles are shifting is key to building adaptive watershed management plans in a changing environment.  

### What statistical analysis was used and why? 

I analyzed trends in the Peace-Athabasca watershed for the following variables over the past 60 years:

1) The timing of the onset of freshet 

2) The timing of freeze-up and break-up dates

3) The length of continuous ice coverage each year 

To understand changes in these variables, a Mann-Kendall statistical test was used. A Mann-Kendall test is a non-parametric test that detects a monotonic trend (upwards or downwards) in the data. 

Analyzing hydrological data (and other time-series data) is challenging as the daily or monthly observations are not independent of one another. They experience serial autocorrelation: a river's flow today will be pretty similar to its flow tomorrow, and these observations do not meet the Mann-Kendall test's assumption of independence. 

However, yearly variables like annual ice coverage can be assumed to be independent of one another. For example, the day a river froze this year does not depend on which day it froze last year. Yearly variables such as these are frequently used in a Mann-Kendall trend analysis.  

### Packages and Functions:

A set of functions were developed to calculate these ice variables: https://github.com/Jacqui-123/EFlows-Project/blob/main/Eflows_FUNCTIONS.R

Open-source hydrological data from the Water Survey of Canada: 
https://www.canada.ca/en/environment-climate-change/services/water-overview/quantity/monitoring/survey.html 

Open-sourced R packages: tidyverse, tidyhydat, zoo, lubridate, ggplot, MannKendall, dataRetrieval 

Source code for this project with complete tidying workflow: https://github.com/Jacqui-123/EFlows-Project/blob/main/6_PAS_Rivers.Rmd 

### Data Processing Methods:

The following is a technical roadmap of how to use these new functions to calculate ice cycle variables for multiple stations and years. This method is scalable and works for multiple stations. 


```{r, echo=FALSE, include = FALSE, warning= FALSE}
library(IHA)
library(tidyverse)
library(tidyhydat)
library(zoo)
library(lubridate)
library(ggplot2)
library(dataRetrieval)
library(timetk)
library(trend)
library(forecast)
library(Kendall)
library(knitr)
library(rmarkdown)
library(quarto)

#install.packages("IHA", repos="http://R-Forge.R-project.org")
```


```{r, include = FALSE, echo= FALSE}
library(tidyverse)
library(tidyhydat)
library(zoo)
library(lubridate)
library(caTools)
library(dataRetrieval)

# Various functions for calculating: IHA, Percent Change, and ice Variables 
# Created by Jacqui Levy


#######PART 1 - MISC FUNCTIONS#####

# MISSING YEARS FUNCTION

  calc_missing_yrs <- function(df, Date) {
  #Find out if there are missing years in the data set. Will return "false" if there are no missing years, or a df of missing years. 
  #Date should be in yyyy-mm-dd, col title is "Date"
  #cal year not wy 
  years <- format(df$Date, "%Y")
  unique_years <- unique(years)
  all_years <- seq(min(unique_years), max(unique_years), by = 1)
  missing_years <- setdiff(all_years, unique_years)
  
  View(missing_years) #should be zero
  any(missing_years) #says if there is anything in the list
  print(missing_years)
  }
  
  
#RLE FUNCTION
  

  calc_rle <- function(df) {
    #function to remove years that have > 14 consecutive NA values (ie 14 days in a row with no data) 
    #and return the original df, without the offending years
    na_rows <- with(rle(is.na({{df}}$Value)), rep(values & lengths > 14, lengths))
    yearstoremove <- unique({{df}}$waterYear[na_rows])
    output <- {{df}}[!{{df}}$waterYear %in% yearstoremove, ]
    return(output)
  }
  
  
#DAY OF THE WATER YEAR FUNCTION
  #Date should be in yyyy-mm-dd

  calc_day_of_wyear <- function(data){
    #function sequences by number of days in each water year
    
    df <- data.frame(waterYear = character(), sequence = character())
    
    for (i in unique({{data}}$waterYear)){
      df_subset <- {{data}}[{{data}}$waterYear == i,]
      days <- seq(1:nrow(df_subset))
      temp_df <- data.frame(waterYear = i, day_of_year = days)
      df <- rbind(df, temp_df)
    }
    df2 <- cbind(df, {{data}}) 
    df3 <- df2[, !duplicated(colnames(df2))]
    
    return(df3)
  }


#######PART 2 - IHA VARIABLES#####
  

#IHA FUNCTION
#Date must be in dd-mm-yyyy format - use mutate(Date = format(Date,"%d-%m-%Y"))

  
  calc_IHA <- function(data){
    
    flow_data <-  {{data}} %>%
      select(Date, Value)
    
    flow_data <- zoo(flow_data$Value, order.by = as.Date(as.character(flow_data$Date), format = "%d-%m-%Y"))
    ## Run IHA analyses
    group1_output <- group1(flow_data, year = "water", FUN = median)
    group2_output <- group2(flow_data, year = "water", mimic.tnc = TRUE)
    group3_output <- group3(flow_data, year = "water", mimic.tnc = FALSE)
    group4_output <- group4(flow_data, year = "water")
    group5_output <- group5(flow_data, year = "water")
    
    ## Convert outputs
    group1_output <- as.data.frame(group1_output)
    group2_output <- group2_output[,-1]
    group3_output <- as.data.frame(group3_output)
    group4_output <- as.data.frame(group4_output)
    group5_output <- as.data.frame(group5_output)
    
    #to deal with one less row for group4 variables 
    if (nrow(group4_output) < nrow(group1_output)) {
      group4_output <- group4_output %>%
        add_row()
    }
    ## Create output dataframe 
    IHA_output <- bind_cols(list(group1_output, group2_output, group3_output, group4_output, group5_output))
    
    #make the years the column names instead of the rownames
    IHA_output <- tibble::rownames_to_column(IHA_output, "Year") 
    
  }

  
#######PART 3 - ice VARIABLES#####
  
#Used to calculate the freeze-up dates, ice break-up dates, and continuous ice coverage
#df must have col names: "day_of_year", "Value", "Symbol", "Date", "waterYear"
#Date must be in yyyy-mm-dd format for all ice variables functions

#GROUP 1 FUNCTION - ice COVER 
  
  Group_1_ice_cover <- function(data) {
    #function returns a df for the length of ice coverage, per water year
    #works with one station, multiple years
    #for multiple stations, split-apply-combine station
    lst <- list()
    
    for (i in unique({{data}}$waterYear)) {
      
      length_B_date  <- max(rle({{data}}$Symbol[{{data}}$waterYear == i] == "B")[[1]]) 
      #append each value to a list
      length_B_date <- length_B_date - 1
      lst[[i]] <- length_B_date
    }
    
    ice_coverage_wy <- data.frame(waterYear = names(lst), ice_coverage_wy = unlist(lst))
    rownames(ice_coverage_wy) <-NULL
   
    return(ice_coverage_wy)
    
  }
  

#GROUP 2 FUNCTION - FREEZE AND THAW DATES
  
  Group_2_freeze_thaw <- function(data) {
    #This function calculates the freeze and thaw dates and their flow values, using the longest consecutive run of "B" Symbols in the df
    
    start_date_lst <- list()
    end_date_lst <- list()
    start_flow_lst <- list()
    end_flow_lst <- list()
    start_doy_lst <- list()
    end_doy_lst <- list()
    
    for (i in unique({{data}}$waterYear)){
      
      df_subset <- {{data}}[{{data}}$waterYear == i,]
      
      #calc rle for B symbol
      rle_m = rle(df_subset$Symbol == "B")
      
      #find index for max run of B symbols
      max_run_index <- which.max(rle_m$lengths)
      
      #find end start and index of max run of B symbols
      end <- cumsum(rle_m$lengths)[max_run_index]
      start <- end - rle_m$lengths[max_run_index] +1
      
      #find date at end, start index of the max run 
      date_end = df_subset$Date[end]
      date_start = df_subset$Date[start]
      
      #find flow at end, start index
      flow_end = df_subset$Value[end]
      flow_start = df_subset$Value[start]
      
      #find doy at end, start index
      doy_end = df_subset$day_of_year[end]
      doy_start = df_subset$day_of_year[start]
      
      #append dates to the list
      start_date_lst[[i]] <- date_start
      end_date_lst[[i]] <- date_end
      
      #append flows to the list
      start_flow_lst[[i]] <- flow_start
      end_flow_lst[[i]] <- flow_end
      
      #append doy to the list
      start_doy_lst[[i]] <- doy_start
      end_doy_lst[[i]] <- doy_end
      
    }
    
    Freeze_Date <- as.Date(unlist(start_date_lst))
    Thaw_Date <- as.Date(unlist(end_date_lst))
    Flow_Freeze <- unlist(start_flow_lst)
    Flow_Thaw <- unlist(end_flow_lst)
    Freeze_DOY <-unlist(start_doy_lst)
    Thaw_DOY <- unlist(end_doy_lst)
    
    df <- cbind.data.frame(Freeze_Date,Freeze_DOY,Flow_Freeze,Thaw_Date,Thaw_DOY, Flow_Thaw)
    
    ice_coverage_dates_flow <- rownames_to_column(df, "waterYear")
    return(ice_coverage_dates_flow)
    
  }
  
  
#GROUP 3 FUNCTION - ONSET OF FRESHET
  
  
  Group_3_freshet <- function(data) {
    index <- 0
    f_index <- 16
    date_lst <- list()
    flow_lst <- list()
    stn_nu <- list()
    doy_lst <- list()
    
    for (i in unique({{data}}$waterYear)) { #first loop
      #subset data by year, resetting at each year
      index = 0  
      f_index = 16 
      df_subset <- {{data}}[{{data}}$waterYear == i,]
      df_subset <- df_subset %>%
        #data tidying:delete dates before Feb 12, so rolling mean calc starts on March 1
        mutate(Date = as.Date(Date)) %>%
        #filter(month(Date) %in% c(3,4,5,6))
        mutate(new_col = format(Date,"%m-%d")) %>%
        filter(month(Date) >= 2 & month(Date) < 7) %>% 
        filter(!(new_col %in% c("02-01", "02-02", "02-03", "02-04", "02-05", "02-06", "02-07", "02-08", "02-09", "02-10", "02-11")))
      
      #calc rolling 16 day mean
      rollmn <- rollmean(df_subset$Value, k = 16, width = 16)
      #rollmn <- as.data.frame(rollmn)
      
      for (j in rollmn) { #second loop
        #increment index
        index = index + 1
        f_index = f_index + 1
        
        #find rolling mean value at the index and multiply by 1.5
        rollmnvalue <- rollmn[index] #roll mean = 0.81
        rollmnvalue1.5 <- rollmnvalue*1.5 #1.215
        
        #get the flow value at that index
        flowvalue <- df_subset$Value[f_index] #flow = 0.654
        
        if (flowvalue > rollmnvalue1.5 & f_index < 123 ) { #third loop
          #append date, flow value to a list using the index numbers
          dt <- df_subset$Date[f_index]
          fl <- df_subset$Value[f_index]
          st <- df_subset$STATION_NUMBER[f_index]
          doy <- df_subset$day_of_year[f_index]
          # print(dt) 
          #  print(fl)
          #  print("end")
          date_lst[[i]] <- dt 
          flow_lst[[i]] <- fl 
          stn_nu[[i]] <- st
          doy_lst[[i]] <- doy
          Freshet_Date <- as.Date(unlist(date_lst))
          Freshet_Flow <- unlist(flow_lst)
          Station_Number <- unlist(stn_nu)
          Freshet_Dayofyear <- unlist(doy_lst)
          df <- cbind.data.frame(Station_Number, Freshet_Date, Freshet_Dayofyear, Freshet_Flow)
          break
        }
        
      }
      
    }
      Freshet_dates_flow <- rownames_to_column(df, "waterYear")
      return(Freshet_dates_flow)
  }  
  
  
#######PART 4 - PERCENT CHANGE#####
  
  
  # Calculate percent change

  calc_percent_change <- function(data_pre, data_post, stn, year_col){
    
    #calculates percent change from IHA pre and post stns, once tidying is complete. 
    #stn = i if looping through multiple stns
    
    years_post <- {{data_post}}[[year_col]] 
    
    IHA_medians_pre <- {{data_pre}} %>%
      filter(STATION_NUMBER == {{stn}}) %>% 
      select(-c(STATION_NUMBER))
    
    IHA_pst <- {{data_post}} %>%
      filter(STATION_NUMBER == {{stn}})%>% 
      select(-c(STATION_NUMBER, {{year_col}}))
    
    IHA_pre_expand_rows <- IHA_medians_pre[rep(1, nrow(IHA_pst)),]
    
    output <- ((IHA_pst - IHA_pre_expand_rows) /IHA_pre_expand_rows ) * 100
    
    percent_change <- merge(years_post, output, by.x = 0, by.y = 0) %>%
      rename("Year" = "x")
  }
  
#######PART 5 - MANN KENDALL#####
  
  #Perform mann-kendall test for each "STATION_NUMBER" in a dataframe
  #parameter = col_variable to calculate MK test (must be annual variable)
  #start = start year
  
  
  calc_MK <- function(data, parameter, start) { 
    plst <- list()
    stn_list <- list()
    for (i in unique({{data}}$STATION_NUMBER)) {
      #subset the data by stn number
      df_subset <- {{data}}[{{data}}$STATION_NUMBER == i,]
      #get the stn number for each iteration and append to a list
      #stn_num <- i
      stn_list[[i]] <- i
      
      col_var <- df_subset %>% pull({{parameter}}) 
      #df subset to a ts object and run MK analysis 
      TS <- ts(col_var, frequency = 1, start = c({{start}}, 1))
      MK <- MannKendall(TS)
      #append pvalue to a list 
      pval <- as.numeric(MK$sl)
      plst[[i]] <- pval
      #unlist, rename etc
      df <- as.data.frame(unlist(plst))
      names(df) <- "P_Value"
      final_MK <- rownames_to_column(df, "STATION_NUMBER") %>% 
        mutate(Interpretation = case_when(P_Value <= .05 ~ "Signficant", .default =  "Not Significant")) 
    }
    return(final_MK)
  }  
  
  
#######PART 6 - MISC FUNCTIONS NO LONGER USING BUT STILL USEFUL#####
  
 # Find missing years
  

  #Find out if there are missing years in the data set. Will return "false" if there are no missing years, or a df of missing years. Date should be in yyyy-mm-dd
  
  yrs_missing <- function(df, Date) {
    years <- format(df$Date, "%Y")
    unique_years <- unique(years)
    all_years <- seq(min(unique_years), max(unique_years), by = 1)
    missing_years <- setdiff(all_years, unique_years)
    
    View(missing_years) #should be zero
    any(missing_years) #says if there is anything in the list
    print(missing_years)
  }

  
```

#### 1) First, retrieve water flow data from the Water Survey of Canada, and add a column for "water year" and a column for "day of the water year".

A "water year" or hydrological year, is measured from October 1-September 30th, and measures the cycle of a watershed from its lowest baseflow point. 

A "day of the year" or Julian day, is a continuous count of days from the beginning of the water year. These methods are used in hydrology to standardize the reporting of water statistics and forecasting.

```{r}
#extract stations from the Water Survey of Canada database
stn_all <- tidyhydat::hy_daily_flows(station_number = c('07BE001','07DA001', '07HA001','07NB001', '07KC001', '07AD002','07DA001', '07AA002')) 

stn_all <- stn_all %>%
group_by(STATION_NUMBER) %>%
#make a complete set of days for each year 
tidyr::complete(Date = seq.Date(as.Date("1920/10/1"), as.Date("2022/09/30"), by="day"))

#add water year
stn_all <- addWaterYear(stn_all) %>%
mutate(waterYear = as.character(waterYear))

#add day of the year from "Eflows_FUNCTIONS.R" function 
stn_all_split <- split(stn_all, stn_all$STATION_NUMBER )
stn_all_split_doy <- lapply(stn_all_split, calc_day_of_wyear)
stn_all <- bind_rows(stn_all_split_doy, .id = "STATION_NUMBER")

#rm(stn_all_split, stn_all_split_doy)
#View(stn_all_split)

```
 
 
#### 2) Tidy data: detect and resolve anomalies and deal with missing data. 

Use the functions built for this project to delete years with more than 14 consecutive days of missing data, and the tidyverse package to estimate missing observations using the last existing observation.
 
```{r, results = 'hide'}

stn_cln <- stn_all %>%
  mutate(Date = as.Date(Date)) %>%
  filter(waterYear > 1970 & waterYear < 2021) 

#delete years that have >14 days missing data using calc_rle function from functions I built for this project
lst_stns_cln <- split(stn_cln, stn_cln$STATION_NUMBER )
lst_stns_cln_rle <- lapply(lst_stns_cln, calc_rle)

#unlist the station numbers
stns_ready <- bind_rows(lst_stns_cln_rle, .id = "STATION_NUMBER") 

#reformat date and fill remaining missing values with preceding values
stns_daymonthyear <- stns_ready %>%
  mutate(Date = format(Date,"%d-%m-%Y")) %>%
  group_by(STATION_NUMBER) %>%
  fill(Value, .direction = 'downup')

```


```{r, echo=FALSE, include = FALSE}
# IHA calcs - not included in ice cycle rpubs document

#make a list of dfs- split by station number
lst_stns <- split(stns_daymonthyear, stns_daymonthyear$STATION_NUMBER)

#apply calc_IHA function to all dfs in the list
lst_stns_IHA <- lapply(lst_stns, calc_IHA)

#Combine all IHA outputs into one df (unnlist the dfs)
#make a "watershed" column for graphing later
IHA <- bind_rows(lst_stns_IHA, .id = "STATION_NUMBER")
View(IHA)

#write.csv(IHA, "IHAlongterm.csv")
```


```{r, echo=FALSE, include = FALSE}
#data prep for ice variables
#change the date format, and fill any values that are missing with preceding values 
stns_yearmonthday <- stns_ready %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(STATION_NUMBER) %>%
  fill(Value, .direction = 'downup') 

```
 

#### 3) Calculate the ice cycle variables using the ice functions

These functions calculate the timing of the onset of freshet, the freeze and thaw dates, and the days of total continuous ice coverage. Use the split-apply-combine method to loop over a list of data frames and calculate all variables for multiple stations.

See source code for how to combine the output dataframes, especially if there are an uneven number of years for each station.
```{r} 
#first make a list of dfs- split by station number
lst_stns <- split(stns_yearmonthday, stns_yearmonthday$STATION_NUMBER )

#apply ice variable function to all dfs in the list, by station 
stn_g1 <- lapply(lst_stns, Group_1_ice_cover)
stn_g2 <- lapply(lst_stns, Group_2_freeze_thaw)
stn_g3 <- lapply(lst_stns, Group_3_freshet)

#Unnlist the outputs
output1 <- bind_rows(stn_g1, .id = "STATION_NUMBER") 
output2 <- bind_rows(stn_g2, .id = "STATION_NUMBER")
output3 <- bind_rows(stn_g3, .id = "STATION_NUMBER")

test <- stn_all %>%
  filter(STATION_NUMBER == "07AA002")

```


```{r, echo=FALSE, include = FALSE}

#Deal with each output df having different lengths and not being able to combine them

#find max and min years for each result df
min_yro1 <- min(output1$waterYear)
max_yro1 <- max(output1$waterYear)

min_yro2 <- min(output2$waterYear)
max_yro2 <- max(output2$waterYear)

min_yro3 <- min(output3$waterYear)
max_yro3 <- max(output3$waterYear)

#expand data set to include all years for all output dfs
output1_ex <- output1 %>%
mutate(waterYear = as.integer(waterYear)) %>%
tidyr::complete(STATION_NUMBER, waterYear = min_yro1:max_yro1) #fills NA for values for missing years 

output2_ex <- output2 %>%
mutate(waterYear = as.integer(waterYear)) %>%
tidyr::complete(STATION_NUMBER, waterYear = min_yro2:max_yro2) #fills NA for values for missing years 

output3_ex <- output3 %>%
mutate(waterYear = as.integer(waterYear)) %>%
tidyr::complete(STATION_NUMBER, waterYear = min_yro3:max_yro3) #fills NA for values for missing years 

df_ice <- cbind(output1_ex, output2_ex, output3_ex)

df_ice_final <- df_ice[!duplicated(as.list(df_ice))] #removed duplicated columns

View(df_ice_final)

#write.csv(df_ice_final, "ice.csv")

```
 
 
#### 4) Mann-Kendall Trend Analysis: Use the MK function created for this project 

Tests were performed at an alpha = .05, a conservative level to ensure a low probability of a type I error.


#### __Results table for select stations:__

   
```{r, echo=FALSE, include = FALSE}
#Mann-Kendall Trend Analysis - IHA Variables 

#Expand IHA_post df to have all years for all stations so they can be used in the MK function

for (i in unique(IHA$STATION_NUMBER)) {
  min_yr <- min(IHA$Year)
  max_yr <- max(IHA$Year)
  IHA_post_expand <- IHA %>%
    mutate(Year = as.integer(Year)) %>%
    tidyr::complete(STATION_NUMBER, Year = min_yr:max_yr)
  return(IHA_post_expand)
}

IHA_post_expand <- IHA_post_expand %>%
rename("One_Day_Max" = "1 Day Max",
           "One_Day_Min" = '1 Day Min',
           "Three_Day_Max" = "3 Day Max",
           "Three_Day_Min" = "3 Day Min",
           "Seven_Day_Min" ="7 Day Min",
          "Seven_Day_Max" = "7 Day Max",
          "Thirty_Day_Max" = "30 Day Max",
          "Thirty_Day_Min" = "30 Day Min",
          "Ninety_Day_Max" = "90 Day Max",
          "Ninety_Day_Min" = "90 Day Min",
          "High_pulse_number" = "High pulse number",
          "Low_pulse_number" = "Low pulse number",
          "High_pulse_length" = "High pulse length",
          "Low_pulse_length" = "Low pulse length")

```


```{r, echo=FALSE, include = FALSE}
#Use calc_Mk to do a MK for all Stations for selected variables for IHA

Reversals <- calc_MK(IHA_post_expand, parameter = Reversals, start = 1970) %>%
  rename("Result_Reversals" = 'P_Value')

One_Day_Min <- calc_MK(IHA_post_expand, parameter = One_Day_Min , start = 1970) %>% rename("Result_One_Day_Min" = 'P_Value')

One_Day_Max <-  calc_MK(IHA_post_expand, parameter = One_Day_Max , start = 1970) %>% rename("Result_1_Day_Max" = 'P_Value')

Seven_Day_Min <- calc_MK(IHA_post_expand, parameter = Seven_Day_Min , start = 1970) %>% rename("Result_7_Day_Min" = 'P_Value')

Seven_Day_Max <-  calc_MK(IHA_post_expand, parameter = Seven_Day_Max , start = 1970) %>% rename("Result_7_Day_Max" = 'P_Value')

Thirty_Day_Min <- calc_MK(IHA_post_expand, parameter = Thirty_Day_Min , start = 1970) %>% rename("Result_30_Day_Min" = 'P_Value')

Thirty_Day_Max <-  calc_MK(IHA_post_expand, parameter = Thirty_Day_Max , start = 1970) %>% rename("Result_30_Day_Max" = 'P_Value')

High_pulse_number <-  calc_MK(IHA_post_expand, parameter = High_pulse_number , start = 1970) %>% rename("Result_High_pulse_number" = 'P_Value')

Low_pulse_number <-  calc_MK(IHA_post_expand, parameter = Low_pulse_number , start = 1970) %>% rename("Result_Low_pulse_number" = 'P_Value')

High_pulse_length <-  calc_MK(IHA_post_expand, parameter = High_pulse_length , start = 1970) %>% rename("Result_High_pulse_length" = 'P_Value')

Low_pulse_length <-  calc_MK(IHA_post_expand, parameter = Low_pulse_length , start = 1970) %>% rename("Result_Low_pulse_length" = 'P_Value')

result_MK_IHA <- cbind(Reversals, One_Day_Min, One_Day_Max, Seven_Day_Min, Seven_Day_Max, Thirty_Day_Min, Thirty_Day_Max, High_pulse_number, Low_pulse_number, High_pulse_length, Low_pulse_length )

View(result_MK_IHA)

#write.csv(result_MK_IHA, "IHAresult.csv")

```

 
```{r, echo = FALSE, include = FALSE}
DY <- calc_MK(df_ice_final, parameter = Freshet_Dayofyear, start = 1970) %>%
  rename("Freshet_Onset" = 'P_Value' )

icecover <- calc_MK(df_ice_final, parameter = ice_coverage_wy, start = 1970)%>%
  rename("ice_cover" = 'P_Value' )

frdoy <- calc_MK(df_ice_final, parameter = Freeze_DOY, start = 1970) %>%
  rename("Freeze_doy" = 'P_Value' )

thawdoy <- calc_MK(df_ice_final, parameter = Thaw_DOY, start = 1970)%>%
  rename("Thaw_doy" = 'P_Value' )

result <- cbind(DY, icecover, frdoy, thawdoy)
MK_result_final <- result[!duplicated(as.list(result))] #removed duplicated columns
MK_result_final <- MK_result_final %>% mutate_if(is.numeric, ~round(., 3)) #round 
View(MK_result_final)

```


```{r, echo = FALSE}
#show a few results
kable(MK_result_final[3:6,], caption = " ") 

```

```{r, echo = FALSE, include = FALSE, message=FALSE}
#new for this rpubs document

#HA001
stn_07HA001 <-  df_ice_final %>%
  filter(STATION_NUMBER == "07HA001") %>% 
  na.omit()

TS_07HA001  <- ts(stn_07HA001$Freshet_Dayofyear, frequency = 1, start = c(1970,1))

MannKendall(TS_07HA001)
sens.slope(TS_07HA001)
pettitt.test(TS_07HA001)

#AA002
stn_07AA002 <- df_ice_final %>%
    filter(STATION_NUMBER == "07AA002") %>%
  na.omit()

TS_07AA002  <- ts(stn_07AA002$ice_coverage_wy, frequency = 1, start = c(1970,1))
MannKendall(TS_07AA002)
sens.slope(TS_07AA002)
pettitt.test(TS_07AA002)

#AD002
stn_07AD002 <- df_ice_final %>%
  filter(STATION_NUMBER == "07AD002") %>% 
  na.omit()

TS_07AD002  <- ts(stn_07AD002$Freshet_Dayofyear, frequency = 1, start = c(1970,1))

MannKendall(TS_07AD002)
sens.slope(TS_07AD002)
pettitt.test(TS_07AD002)

#KC001
stn_07KC001 <- df_ice_final %>%
  filter(STATION_NUMBER == "07KC001") %>% 
  na.omit()

TS_07KC001  <- ts(stn_07KC001$ice_coverage_wy, frequency = 1, start = c(1970,1))

MannKendall(TS_07KC001)
sens.slope(TS_07KC001)
pettitt.test(TS_07KC001)


```
 
  
#### __Interpretation__ 

Most stations had non-significant values for these ice variables, and the Mann-Kendall test did not detect a monotonic trend in ice coverage for most variables. 

This does not mean that there have been no changes in ice regimes, but only that this test did not detect an effect at the selected level. More precise analysis of across the full watershed are needed, and will be forthcoming.

It is notable from the results table that the test detected a trend for two stations (07HA001 & 07KC001) along the Peace River, across several of the ice variables. 

The ice cycle variables for these two stations are graphed below using the lm smooth method to fit a linear trend to the data.
```{r, fig-side, fig.show="hold", out.width="50%", echo = FALSE, message= FALSE}


df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= Freeze_DOY, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
  geom_line(linewidth = .75) + 
    geom_smooth(method = lm, se = F) +
  #annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
 #annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") +    #07BK007
  # annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
  #geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(0, 150) +
theme_classic() +
ggtitle("Timing of Winter ice Freeze") +
  xlab(" ") +
  ylab("Day of the Water Year") +
  theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal)  +
  theme(legend.position = "None")


df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= Thaw_DOY, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
  geom_line(linewidth = .75) + 
    geom_smooth(method = lm, se = F) +
  #annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
 #annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") +    #07BK007
  # annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
  #geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(115, 250) +
theme_classic() +
ggtitle("Timing of Spring Thaw") +
  xlab(" ") +
  ylab("Day of the Water Year") +
  theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal)  +
  theme(legend.position = "None")


```
  
```{r, figures-side, fig.show="hold", out.width="50%", echo = FALSE, message= FALSE}

df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= Freshet_Dayofyear, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
  geom_line(linewidth = .75) + 
  geom_smooth(method = lm, se = F) +
   #annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
 #annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") +    #07BK007
  # annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002 
  #geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(100, 275) +
theme_classic() +
ggtitle("Timing of Freshet") +
  xlab(" ") +
  ylab("Day of the Water Year") +
  theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal)  + 
  theme(legend.position = "None")


df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= ice_coverage_wy, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
  geom_line(linewidth = .75) + 
    geom_smooth(method = lm, se = F) +
  #annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
 #annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") +    #07BK007
  # annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
  #geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(0, 250) +
theme_classic() +
ggtitle("Days of Continuous ice Coverage ") +
  ylab("Day of the Water Year") +
    xlab(" ") +
  theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal)  +
  theme(legend.position = "None")
```


From the graphs above, it appears that the Peace River is freezing later in the year, and thawing earlier, with less total days of continual ice coverage. This indicates a shift to a warmer and drier regime. Other studies have found similar results, but a more thorough investigation would be needed to draw any conclusions from the data here.

It would also be useful to conduct a Mann-Kendall test on the freeze, thaw, and freshet flows measurements to see if these regime shifts might be caused by lower flows, or if there are other mechanistic pathways.

Conversely, the timing of freshet appears to be later in the year. This could be due to lower or more variable flows, making it difficult for the function's algorithm to detect the onset of freshet at the precision is was built for. Perhaps a different algorithm is needed for different watersheds, or perhaps we are seeing unprecedented changes in watershed ice regimes. 

#### __Conclusion__

This document describes the workflow and methodology for using ice variable functions to assess trends in ice regimes for  rivers across Canada. 

The Canadian River ice Database Project previously calculated some of these ice cycle variables, however, they were only calculated for approximately 300 water stations, and only through 2015. These newly developed functions allow scientists to calculate present-day data, in addition to historical data, for any of the 1000+ water stations from the Water Survey of Canada.  

The next step in this project is to create an accessible, open-source R package, and apply this analysis broadly for approximately 900+ reference stations within Canada. My hope is that scientists and watershed managers will be able to use this tool in conjunction with climatic data and models, to better assess and predict broad-scale watershed changes. 


#### __References__ 


Beltaos, S., & Bonsal, B. (2021). Climate change impacts on Peace River ice thickness and implications to ice-jam flooding of Peace-Athabasca Delta, Canada. Cold Regions Science and Technology, 186, 103279. https://doi.org/10.1016/j.coldregions.2021.103279

Beltaos, S. (2018). Frequency of ice-jam flooding of Peace-Athabasca Delta. Canadian Journal of Civil Engineering, 45(1), 71–75. https://doi.org/10.1139/cjce-2017-0434

de Rham, L., Dibike, Y., Beltaos, S., Peters, D., Bonsal, B., & Prowse, T. (2020). A Canadian River ice Database from the National Hydrometric Program Archives. Earth System Science Data, 12(3), 1835–1860. https://doi.org/10.5194/essd-12-1835-2020