-
Notifications
You must be signed in to change notification settings - Fork 0
/
PAS_Rivers.qmd
860 lines (597 loc) · 32.3 KB
/
PAS_Rivers.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
---
title: "Analysis of ice Cycle Variables for Northern Rivers in Canada: Peace, Athabasca, and Slave"
output:
quarto::html_document:
output_dir: docs
editor: visual
---
## Analysis of ice Cycle Variables for Northern Rivers in Canada: Peace, Athabasca, and Slave
by Jacqui Levy
#### Scientists can now analyze a river’s ice cycle using both historical and current data from the Water Survey of Canada database. Newly developed functions are available and applicable at scale on any river in Canada with seasonal ice coverage, and are demonstrated here. These functions grant scientists and watershed managers the tools to conduct reproducible research at scale, and enable data-driven water use decisions.
#### The following is an analysis of river ice regimes within the Peace-Athabasca Delta (PAD), using these newly developed functions.
Note: These functions are undergoing unit-testing and being developed into an open-source R package for public use.
### What is an ice cycle and why is it important?
River ice is a major component of the hydrological cycle for northern rivers like the Peace, Slave, and Athabasca Rivers. These rivers experience thick ice coverage every winter, and a predictable seasonal spring-break up season called freshet. This ice-break up is linked to key ecological processes, such as overland flooding. This flooding replenishes nearby wetlands with water and sustains rich habitat for a biodiverse ecosystem.
Climate change is causing broad spatial and temporal shifts in river ice regimes across Canada, causing harmful downstream impacts to ecosystems and wildlife. Understanding how river ice cycles are shifting is key to building adaptive watershed management plans in a changing environment.
### What statistical analysis was used and why?
I analyzed trends in the Peace-Athabasca watershed for the following variables over the past 60 years:
1) The timing of the onset of freshet
2) The timing of freeze-up and break-up dates
3) The length of continuous ice coverage each year
To understand changes in these variables, a Mann-Kendall statistical test was used. A Mann-Kendall test is a non-parametric test that detects a monotonic trend (upwards or downwards) in the data.
Analyzing hydrological data (and other time-series data) is challenging as the daily or monthly observations are not independent of one another. They experience serial autocorrelation: a river's flow today will be pretty similar to its flow tomorrow, and these observations do not meet the Mann-Kendall test's assumption of independence.
However, yearly variables like annual ice coverage can be assumed to be independent of one another. For example, the day a river froze this year does not depend on which day it froze last year. Yearly variables such as these are frequently used in a Mann-Kendall trend analysis.
### Packages and Functions:
A set of functions were developed to calculate these ice variables: https://github.com/Jacqui-123/EFlows-Project/blob/main/Eflows_FUNCTIONS.R
Open-source hydrological data from the Water Survey of Canada:
https://www.canada.ca/en/environment-climate-change/services/water-overview/quantity/monitoring/survey.html
Open-sourced R packages: tidyverse, tidyhydat, zoo, lubridate, ggplot, MannKendall, dataRetrieval
Source code for this project with complete tidying workflow: https://github.com/Jacqui-123/EFlows-Project/blob/main/6_PAS_Rivers.Rmd
### Data Processing Methods:
The following is a technical roadmap of how to use these new functions to calculate ice cycle variables for multiple stations and years. This method is scalable and works for multiple stations.
```{r, echo=FALSE, include = FALSE, warning= FALSE}
library(IHA)
library(tidyverse)
library(tidyhydat)
library(zoo)
library(lubridate)
library(ggplot2)
library(dataRetrieval)
library(timetk)
library(trend)
library(forecast)
library(Kendall)
library(knitr)
library(rmarkdown)
library(quarto)
#install.packages("IHA", repos="http://R-Forge.R-project.org")
```
```{r, include = FALSE, echo= FALSE}
library(tidyverse)
library(tidyhydat)
library(zoo)
library(lubridate)
library(caTools)
library(dataRetrieval)
# Various functions for calculating: IHA, Percent Change, and ice Variables
# Created by Jacqui Levy
#######PART 1 - MISC FUNCTIONS#####
# MISSING YEARS FUNCTION
calc_missing_yrs <- function(df, Date) {
#Find out if there are missing years in the data set. Will return "false" if there are no missing years, or a df of missing years.
#Date should be in yyyy-mm-dd, col title is "Date"
#cal year not wy
years <- format(df$Date, "%Y")
unique_years <- unique(years)
all_years <- seq(min(unique_years), max(unique_years), by = 1)
missing_years <- setdiff(all_years, unique_years)
View(missing_years) #should be zero
any(missing_years) #says if there is anything in the list
print(missing_years)
}
#RLE FUNCTION
calc_rle <- function(df) {
#function to remove years that have > 14 consecutive NA values (ie 14 days in a row with no data)
#and return the original df, without the offending years
na_rows <- with(rle(is.na({{df}}$Value)), rep(values & lengths > 14, lengths))
yearstoremove <- unique({{df}}$waterYear[na_rows])
output <- {{df}}[!{{df}}$waterYear %in% yearstoremove, ]
return(output)
}
#DAY OF THE WATER YEAR FUNCTION
#Date should be in yyyy-mm-dd
calc_day_of_wyear <- function(data){
#function sequences by number of days in each water year
df <- data.frame(waterYear = character(), sequence = character())
for (i in unique({{data}}$waterYear)){
df_subset <- {{data}}[{{data}}$waterYear == i,]
days <- seq(1:nrow(df_subset))
temp_df <- data.frame(waterYear = i, day_of_year = days)
df <- rbind(df, temp_df)
}
df2 <- cbind(df, {{data}})
df3 <- df2[, !duplicated(colnames(df2))]
return(df3)
}
#######PART 2 - IHA VARIABLES#####
#IHA FUNCTION
#Date must be in dd-mm-yyyy format - use mutate(Date = format(Date,"%d-%m-%Y"))
calc_IHA <- function(data){
flow_data <- {{data}} %>%
select(Date, Value)
flow_data <- zoo(flow_data$Value, order.by = as.Date(as.character(flow_data$Date), format = "%d-%m-%Y"))
## Run IHA analyses
group1_output <- group1(flow_data, year = "water", FUN = median)
group2_output <- group2(flow_data, year = "water", mimic.tnc = TRUE)
group3_output <- group3(flow_data, year = "water", mimic.tnc = FALSE)
group4_output <- group4(flow_data, year = "water")
group5_output <- group5(flow_data, year = "water")
## Convert outputs
group1_output <- as.data.frame(group1_output)
group2_output <- group2_output[,-1]
group3_output <- as.data.frame(group3_output)
group4_output <- as.data.frame(group4_output)
group5_output <- as.data.frame(group5_output)
#to deal with one less row for group4 variables
if (nrow(group4_output) < nrow(group1_output)) {
group4_output <- group4_output %>%
add_row()
}
## Create output dataframe
IHA_output <- bind_cols(list(group1_output, group2_output, group3_output, group4_output, group5_output))
#make the years the column names instead of the rownames
IHA_output <- tibble::rownames_to_column(IHA_output, "Year")
}
#######PART 3 - ice VARIABLES#####
#Used to calculate the freeze-up dates, ice break-up dates, and continuous ice coverage
#df must have col names: "day_of_year", "Value", "Symbol", "Date", "waterYear"
#Date must be in yyyy-mm-dd format for all ice variables functions
#GROUP 1 FUNCTION - ice COVER
Group_1_ice_cover <- function(data) {
#function returns a df for the length of ice coverage, per water year
#works with one station, multiple years
#for multiple stations, split-apply-combine station
lst <- list()
for (i in unique({{data}}$waterYear)) {
length_B_date <- max(rle({{data}}$Symbol[{{data}}$waterYear == i] == "B")[[1]])
#append each value to a list
length_B_date <- length_B_date - 1
lst[[i]] <- length_B_date
}
ice_coverage_wy <- data.frame(waterYear = names(lst), ice_coverage_wy = unlist(lst))
rownames(ice_coverage_wy) <-NULL
return(ice_coverage_wy)
}
#GROUP 2 FUNCTION - FREEZE AND THAW DATES
Group_2_freeze_thaw <- function(data) {
#This function calculates the freeze and thaw dates and their flow values, using the longest consecutive run of "B" Symbols in the df
start_date_lst <- list()
end_date_lst <- list()
start_flow_lst <- list()
end_flow_lst <- list()
start_doy_lst <- list()
end_doy_lst <- list()
for (i in unique({{data}}$waterYear)){
df_subset <- {{data}}[{{data}}$waterYear == i,]
#calc rle for B symbol
rle_m = rle(df_subset$Symbol == "B")
#find index for max run of B symbols
max_run_index <- which.max(rle_m$lengths)
#find end start and index of max run of B symbols
end <- cumsum(rle_m$lengths)[max_run_index]
start <- end - rle_m$lengths[max_run_index] +1
#find date at end, start index of the max run
date_end = df_subset$Date[end]
date_start = df_subset$Date[start]
#find flow at end, start index
flow_end = df_subset$Value[end]
flow_start = df_subset$Value[start]
#find doy at end, start index
doy_end = df_subset$day_of_year[end]
doy_start = df_subset$day_of_year[start]
#append dates to the list
start_date_lst[[i]] <- date_start
end_date_lst[[i]] <- date_end
#append flows to the list
start_flow_lst[[i]] <- flow_start
end_flow_lst[[i]] <- flow_end
#append doy to the list
start_doy_lst[[i]] <- doy_start
end_doy_lst[[i]] <- doy_end
}
Freeze_Date <- as.Date(unlist(start_date_lst))
Thaw_Date <- as.Date(unlist(end_date_lst))
Flow_Freeze <- unlist(start_flow_lst)
Flow_Thaw <- unlist(end_flow_lst)
Freeze_DOY <-unlist(start_doy_lst)
Thaw_DOY <- unlist(end_doy_lst)
df <- cbind.data.frame(Freeze_Date,Freeze_DOY,Flow_Freeze,Thaw_Date,Thaw_DOY, Flow_Thaw)
ice_coverage_dates_flow <- rownames_to_column(df, "waterYear")
return(ice_coverage_dates_flow)
}
#GROUP 3 FUNCTION - ONSET OF FRESHET
Group_3_freshet <- function(data) {
index <- 0
f_index <- 16
date_lst <- list()
flow_lst <- list()
stn_nu <- list()
doy_lst <- list()
for (i in unique({{data}}$waterYear)) { #first loop
#subset data by year, resetting at each year
index = 0
f_index = 16
df_subset <- {{data}}[{{data}}$waterYear == i,]
df_subset <- df_subset %>%
#data tidying:delete dates before Feb 12, so rolling mean calc starts on March 1
mutate(Date = as.Date(Date)) %>%
#filter(month(Date) %in% c(3,4,5,6))
mutate(new_col = format(Date,"%m-%d")) %>%
filter(month(Date) >= 2 & month(Date) < 7) %>%
filter(!(new_col %in% c("02-01", "02-02", "02-03", "02-04", "02-05", "02-06", "02-07", "02-08", "02-09", "02-10", "02-11")))
#calc rolling 16 day mean
rollmn <- rollmean(df_subset$Value, k = 16, width = 16)
#rollmn <- as.data.frame(rollmn)
for (j in rollmn) { #second loop
#increment index
index = index + 1
f_index = f_index + 1
#find rolling mean value at the index and multiply by 1.5
rollmnvalue <- rollmn[index] #roll mean = 0.81
rollmnvalue1.5 <- rollmnvalue*1.5 #1.215
#get the flow value at that index
flowvalue <- df_subset$Value[f_index] #flow = 0.654
if (flowvalue > rollmnvalue1.5 & f_index < 123 ) { #third loop
#append date, flow value to a list using the index numbers
dt <- df_subset$Date[f_index]
fl <- df_subset$Value[f_index]
st <- df_subset$STATION_NUMBER[f_index]
doy <- df_subset$day_of_year[f_index]
# print(dt)
# print(fl)
# print("end")
date_lst[[i]] <- dt
flow_lst[[i]] <- fl
stn_nu[[i]] <- st
doy_lst[[i]] <- doy
Freshet_Date <- as.Date(unlist(date_lst))
Freshet_Flow <- unlist(flow_lst)
Station_Number <- unlist(stn_nu)
Freshet_Dayofyear <- unlist(doy_lst)
df <- cbind.data.frame(Station_Number, Freshet_Date, Freshet_Dayofyear, Freshet_Flow)
break
}
}
}
Freshet_dates_flow <- rownames_to_column(df, "waterYear")
return(Freshet_dates_flow)
}
#######PART 4 - PERCENT CHANGE#####
# Calculate percent change
calc_percent_change <- function(data_pre, data_post, stn, year_col){
#calculates percent change from IHA pre and post stns, once tidying is complete.
#stn = i if looping through multiple stns
years_post <- {{data_post}}[[year_col]]
IHA_medians_pre <- {{data_pre}} %>%
filter(STATION_NUMBER == {{stn}}) %>%
select(-c(STATION_NUMBER))
IHA_pst <- {{data_post}} %>%
filter(STATION_NUMBER == {{stn}})%>%
select(-c(STATION_NUMBER, {{year_col}}))
IHA_pre_expand_rows <- IHA_medians_pre[rep(1, nrow(IHA_pst)),]
output <- ((IHA_pst - IHA_pre_expand_rows) /IHA_pre_expand_rows ) * 100
percent_change <- merge(years_post, output, by.x = 0, by.y = 0) %>%
rename("Year" = "x")
}
#######PART 5 - MANN KENDALL#####
#Perform mann-kendall test for each "STATION_NUMBER" in a dataframe
#parameter = col_variable to calculate MK test (must be annual variable)
#start = start year
calc_MK <- function(data, parameter, start) {
plst <- list()
stn_list <- list()
for (i in unique({{data}}$STATION_NUMBER)) {
#subset the data by stn number
df_subset <- {{data}}[{{data}}$STATION_NUMBER == i,]
#get the stn number for each iteration and append to a list
#stn_num <- i
stn_list[[i]] <- i
col_var <- df_subset %>% pull({{parameter}})
#df subset to a ts object and run MK analysis
TS <- ts(col_var, frequency = 1, start = c({{start}}, 1))
MK <- MannKendall(TS)
#append pvalue to a list
pval <- as.numeric(MK$sl)
plst[[i]] <- pval
#unlist, rename etc
df <- as.data.frame(unlist(plst))
names(df) <- "P_Value"
final_MK <- rownames_to_column(df, "STATION_NUMBER") %>%
mutate(Interpretation = case_when(P_Value <= .05 ~ "Signficant", .default = "Not Significant"))
}
return(final_MK)
}
#######PART 6 - MISC FUNCTIONS NO LONGER USING BUT STILL USEFUL#####
# Find missing years
#Find out if there are missing years in the data set. Will return "false" if there are no missing years, or a df of missing years. Date should be in yyyy-mm-dd
yrs_missing <- function(df, Date) {
years <- format(df$Date, "%Y")
unique_years <- unique(years)
all_years <- seq(min(unique_years), max(unique_years), by = 1)
missing_years <- setdiff(all_years, unique_years)
View(missing_years) #should be zero
any(missing_years) #says if there is anything in the list
print(missing_years)
}
```
#### 1) First, retrieve water flow data from the Water Survey of Canada, and add a column for "water year" and a column for "day of the water year".
A "water year" or hydrological year, is measured from October 1-September 30th, and measures the cycle of a watershed from its lowest baseflow point.
A "day of the year" or Julian day, is a continuous count of days from the beginning of the water year. These methods are used in hydrology to standardize the reporting of water statistics and forecasting.
```{r}
#extract stations from the Water Survey of Canada database
stn_all <- tidyhydat::hy_daily_flows(station_number = c('07BE001','07DA001', '07HA001','07NB001', '07KC001', '07AD002','07DA001', '07AA002'))
stn_all <- stn_all %>%
group_by(STATION_NUMBER) %>%
#make a complete set of days for each year
tidyr::complete(Date = seq.Date(as.Date("1920/10/1"), as.Date("2022/09/30"), by="day"))
#add water year
stn_all <- addWaterYear(stn_all) %>%
mutate(waterYear = as.character(waterYear))
#add day of the year from "Eflows_FUNCTIONS.R" function
stn_all_split <- split(stn_all, stn_all$STATION_NUMBER )
stn_all_split_doy <- lapply(stn_all_split, calc_day_of_wyear)
stn_all <- bind_rows(stn_all_split_doy, .id = "STATION_NUMBER")
#rm(stn_all_split, stn_all_split_doy)
#View(stn_all_split)
```
#### 2) Tidy data: detect and resolve anomalies and deal with missing data.
Use the functions built for this project to delete years with more than 14 consecutive days of missing data, and the tidyverse package to estimate missing observations using the last existing observation.
```{r, results = 'hide'}
stn_cln <- stn_all %>%
mutate(Date = as.Date(Date)) %>%
filter(waterYear > 1970 & waterYear < 2021)
#delete years that have >14 days missing data using calc_rle function from functions I built for this project
lst_stns_cln <- split(stn_cln, stn_cln$STATION_NUMBER )
lst_stns_cln_rle <- lapply(lst_stns_cln, calc_rle)
#unlist the station numbers
stns_ready <- bind_rows(lst_stns_cln_rle, .id = "STATION_NUMBER")
#reformat date and fill remaining missing values with preceding values
stns_daymonthyear <- stns_ready %>%
mutate(Date = format(Date,"%d-%m-%Y")) %>%
group_by(STATION_NUMBER) %>%
fill(Value, .direction = 'downup')
```
```{r, echo=FALSE, include = FALSE}
# IHA calcs - not included in ice cycle rpubs document
#make a list of dfs- split by station number
lst_stns <- split(stns_daymonthyear, stns_daymonthyear$STATION_NUMBER)
#apply calc_IHA function to all dfs in the list
lst_stns_IHA <- lapply(lst_stns, calc_IHA)
#Combine all IHA outputs into one df (unnlist the dfs)
#make a "watershed" column for graphing later
IHA <- bind_rows(lst_stns_IHA, .id = "STATION_NUMBER")
View(IHA)
#write.csv(IHA, "IHAlongterm.csv")
```
```{r, echo=FALSE, include = FALSE}
#data prep for ice variables
#change the date format, and fill any values that are missing with preceding values
stns_yearmonthday <- stns_ready %>%
mutate(Date = as.Date(Date)) %>%
group_by(STATION_NUMBER) %>%
fill(Value, .direction = 'downup')
```
#### 3) Calculate the ice cycle variables using the ice functions
These functions calculate the timing of the onset of freshet, the freeze and thaw dates, and the days of total continuous ice coverage. Use the split-apply-combine method to loop over a list of data frames and calculate all variables for multiple stations.
See source code for how to combine the output dataframes, especially if there are an uneven number of years for each station.
```{r}
#first make a list of dfs- split by station number
lst_stns <- split(stns_yearmonthday, stns_yearmonthday$STATION_NUMBER )
#apply ice variable function to all dfs in the list, by station
stn_g1 <- lapply(lst_stns, Group_1_ice_cover)
stn_g2 <- lapply(lst_stns, Group_2_freeze_thaw)
stn_g3 <- lapply(lst_stns, Group_3_freshet)
#Unnlist the outputs
output1 <- bind_rows(stn_g1, .id = "STATION_NUMBER")
output2 <- bind_rows(stn_g2, .id = "STATION_NUMBER")
output3 <- bind_rows(stn_g3, .id = "STATION_NUMBER")
test <- stn_all %>%
filter(STATION_NUMBER == "07AA002")
```
```{r, echo=FALSE, include = FALSE}
#Deal with each output df having different lengths and not being able to combine them
#find max and min years for each result df
min_yro1 <- min(output1$waterYear)
max_yro1 <- max(output1$waterYear)
min_yro2 <- min(output2$waterYear)
max_yro2 <- max(output2$waterYear)
min_yro3 <- min(output3$waterYear)
max_yro3 <- max(output3$waterYear)
#expand data set to include all years for all output dfs
output1_ex <- output1 %>%
mutate(waterYear = as.integer(waterYear)) %>%
tidyr::complete(STATION_NUMBER, waterYear = min_yro1:max_yro1) #fills NA for values for missing years
output2_ex <- output2 %>%
mutate(waterYear = as.integer(waterYear)) %>%
tidyr::complete(STATION_NUMBER, waterYear = min_yro2:max_yro2) #fills NA for values for missing years
output3_ex <- output3 %>%
mutate(waterYear = as.integer(waterYear)) %>%
tidyr::complete(STATION_NUMBER, waterYear = min_yro3:max_yro3) #fills NA for values for missing years
df_ice <- cbind(output1_ex, output2_ex, output3_ex)
df_ice_final <- df_ice[!duplicated(as.list(df_ice))] #removed duplicated columns
View(df_ice_final)
#write.csv(df_ice_final, "ice.csv")
```
#### 4) Mann-Kendall Trend Analysis: Use the MK function created for this project
Tests were performed at an alpha = .05, a conservative level to ensure a low probability of a type I error.
#### __Results table for select stations:__
```{r, echo=FALSE, include = FALSE}
#Mann-Kendall Trend Analysis - IHA Variables
#Expand IHA_post df to have all years for all stations so they can be used in the MK function
for (i in unique(IHA$STATION_NUMBER)) {
min_yr <- min(IHA$Year)
max_yr <- max(IHA$Year)
IHA_post_expand <- IHA %>%
mutate(Year = as.integer(Year)) %>%
tidyr::complete(STATION_NUMBER, Year = min_yr:max_yr)
return(IHA_post_expand)
}
IHA_post_expand <- IHA_post_expand %>%
rename("One_Day_Max" = "1 Day Max",
"One_Day_Min" = '1 Day Min',
"Three_Day_Max" = "3 Day Max",
"Three_Day_Min" = "3 Day Min",
"Seven_Day_Min" ="7 Day Min",
"Seven_Day_Max" = "7 Day Max",
"Thirty_Day_Max" = "30 Day Max",
"Thirty_Day_Min" = "30 Day Min",
"Ninety_Day_Max" = "90 Day Max",
"Ninety_Day_Min" = "90 Day Min",
"High_pulse_number" = "High pulse number",
"Low_pulse_number" = "Low pulse number",
"High_pulse_length" = "High pulse length",
"Low_pulse_length" = "Low pulse length")
```
```{r, echo=FALSE, include = FALSE}
#Use calc_Mk to do a MK for all Stations for selected variables for IHA
Reversals <- calc_MK(IHA_post_expand, parameter = Reversals, start = 1970) %>%
rename("Result_Reversals" = 'P_Value')
One_Day_Min <- calc_MK(IHA_post_expand, parameter = One_Day_Min , start = 1970) %>% rename("Result_One_Day_Min" = 'P_Value')
One_Day_Max <- calc_MK(IHA_post_expand, parameter = One_Day_Max , start = 1970) %>% rename("Result_1_Day_Max" = 'P_Value')
Seven_Day_Min <- calc_MK(IHA_post_expand, parameter = Seven_Day_Min , start = 1970) %>% rename("Result_7_Day_Min" = 'P_Value')
Seven_Day_Max <- calc_MK(IHA_post_expand, parameter = Seven_Day_Max , start = 1970) %>% rename("Result_7_Day_Max" = 'P_Value')
Thirty_Day_Min <- calc_MK(IHA_post_expand, parameter = Thirty_Day_Min , start = 1970) %>% rename("Result_30_Day_Min" = 'P_Value')
Thirty_Day_Max <- calc_MK(IHA_post_expand, parameter = Thirty_Day_Max , start = 1970) %>% rename("Result_30_Day_Max" = 'P_Value')
High_pulse_number <- calc_MK(IHA_post_expand, parameter = High_pulse_number , start = 1970) %>% rename("Result_High_pulse_number" = 'P_Value')
Low_pulse_number <- calc_MK(IHA_post_expand, parameter = Low_pulse_number , start = 1970) %>% rename("Result_Low_pulse_number" = 'P_Value')
High_pulse_length <- calc_MK(IHA_post_expand, parameter = High_pulse_length , start = 1970) %>% rename("Result_High_pulse_length" = 'P_Value')
Low_pulse_length <- calc_MK(IHA_post_expand, parameter = Low_pulse_length , start = 1970) %>% rename("Result_Low_pulse_length" = 'P_Value')
result_MK_IHA <- cbind(Reversals, One_Day_Min, One_Day_Max, Seven_Day_Min, Seven_Day_Max, Thirty_Day_Min, Thirty_Day_Max, High_pulse_number, Low_pulse_number, High_pulse_length, Low_pulse_length )
View(result_MK_IHA)
#write.csv(result_MK_IHA, "IHAresult.csv")
```
```{r, echo = FALSE, include = FALSE}
DY <- calc_MK(df_ice_final, parameter = Freshet_Dayofyear, start = 1970) %>%
rename("Freshet_Onset" = 'P_Value' )
icecover <- calc_MK(df_ice_final, parameter = ice_coverage_wy, start = 1970)%>%
rename("ice_cover" = 'P_Value' )
frdoy <- calc_MK(df_ice_final, parameter = Freeze_DOY, start = 1970) %>%
rename("Freeze_doy" = 'P_Value' )
thawdoy <- calc_MK(df_ice_final, parameter = Thaw_DOY, start = 1970)%>%
rename("Thaw_doy" = 'P_Value' )
result <- cbind(DY, icecover, frdoy, thawdoy)
MK_result_final <- result[!duplicated(as.list(result))] #removed duplicated columns
MK_result_final <- MK_result_final %>% mutate_if(is.numeric, ~round(., 3)) #round
View(MK_result_final)
```
```{r, echo = FALSE}
#show a few results
kable(MK_result_final[3:6,], caption = " ")
```
```{r, echo = FALSE, include = FALSE, message=FALSE}
#new for this rpubs document
#HA001
stn_07HA001 <- df_ice_final %>%
filter(STATION_NUMBER == "07HA001") %>%
na.omit()
TS_07HA001 <- ts(stn_07HA001$Freshet_Dayofyear, frequency = 1, start = c(1970,1))
MannKendall(TS_07HA001)
sens.slope(TS_07HA001)
pettitt.test(TS_07HA001)
#AA002
stn_07AA002 <- df_ice_final %>%
filter(STATION_NUMBER == "07AA002") %>%
na.omit()
TS_07AA002 <- ts(stn_07AA002$ice_coverage_wy, frequency = 1, start = c(1970,1))
MannKendall(TS_07AA002)
sens.slope(TS_07AA002)
pettitt.test(TS_07AA002)
#AD002
stn_07AD002 <- df_ice_final %>%
filter(STATION_NUMBER == "07AD002") %>%
na.omit()
TS_07AD002 <- ts(stn_07AD002$Freshet_Dayofyear, frequency = 1, start = c(1970,1))
MannKendall(TS_07AD002)
sens.slope(TS_07AD002)
pettitt.test(TS_07AD002)
#KC001
stn_07KC001 <- df_ice_final %>%
filter(STATION_NUMBER == "07KC001") %>%
na.omit()
TS_07KC001 <- ts(stn_07KC001$ice_coverage_wy, frequency = 1, start = c(1970,1))
MannKendall(TS_07KC001)
sens.slope(TS_07KC001)
pettitt.test(TS_07KC001)
```
#### __Interpretation__
Most stations had non-significant values for these ice variables, and the Mann-Kendall test did not detect a monotonic trend in ice coverage for most variables.
This does not mean that there have been no changes in ice regimes, but only that this test did not detect an effect at the selected level. More precise analysis of across the full watershed are needed, and will be forthcoming.
It is notable from the results table that the test detected a trend for two stations (07HA001 & 07KC001) along the Peace River, across several of the ice variables.
The ice cycle variables for these two stations are graphed below using the lm smooth method to fit a linear trend to the data.
```{r, fig-side, fig.show="hold", out.width="50%", echo = FALSE, message= FALSE}
df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= Freeze_DOY, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
geom_line(linewidth = .75) +
geom_smooth(method = lm, se = F) +
#annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
#annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") + #07BK007
# annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
#geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(0, 150) +
theme_classic() +
ggtitle("Timing of Winter ice Freeze") +
xlab(" ") +
ylab("Day of the Water Year") +
theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal) +
theme(legend.position = "None")
df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= Thaw_DOY, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
geom_line(linewidth = .75) +
geom_smooth(method = lm, se = F) +
#annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
#annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") + #07BK007
# annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
#geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(115, 250) +
theme_classic() +
ggtitle("Timing of Spring Thaw") +
xlab(" ") +
ylab("Day of the Water Year") +
theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal) +
theme(legend.position = "None")
```
```{r, figures-side, fig.show="hold", out.width="50%", echo = FALSE, message= FALSE}
df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= Freshet_Dayofyear, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
geom_line(linewidth = .75) +
geom_smooth(method = lm, se = F) +
#annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
#annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") + #07BK007
# annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
#geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(100, 275) +
theme_classic() +
ggtitle("Timing of Freshet") +
xlab(" ") +
ylab("Day of the Water Year") +
theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal) +
theme(legend.position = "None")
df_ice_final %>%
filter(Station_Number == "07HA001" | Station_Number == "07KC001") %>%
ggplot(aes(y= ice_coverage_wy, x = factor(waterYear), group = STATION_NUMBER, colour = STATION_NUMBER)) +
geom_line(linewidth = .75) +
geom_smooth(method = lm, se = F) +
#annotate(geom = "text", x = 43, y = 25, label = "Athabasca River, \n Athabasca", colour = "#E69F00") + #07BE001
#annotate(geom = "text", x = 44, y = 68, label = "Driftwood River, \n AB", colour = "#56B4E9") + #07BK007
# annotate(geom = "text", x = 43, y = 130, label = "Peace River, \n Taylor", colour = "#999999") + #07FD002
#geom_hline(yintercept = 0, linetype = "dotted") +
scale_x_discrete(breaks = seq(1970, 2020, 10)) +
ylim(0, 250) +
theme_classic() +
ggtitle("Days of Continuous ice Coverage ") +
ylab("Day of the Water Year") +
xlab(" ") +
theme(plot.title = element_text(hjust = 0.5) ) +
#scale_colour_manual(values = pal) +
theme(legend.position = "None")
```
From the graphs above, it appears that the Peace River is freezing later in the year, and thawing earlier, with less total days of continual ice coverage. This indicates a shift to a warmer and drier regime. Other studies have found similar results, but a more thorough investigation would be needed to draw any conclusions from the data here.
It would also be useful to conduct a Mann-Kendall test on the freeze, thaw, and freshet flows measurements to see if these regime shifts might be caused by lower flows, or if there are other mechanistic pathways.
Conversely, the timing of freshet appears to be later in the year. This could be due to lower or more variable flows, making it difficult for the function's algorithm to detect the onset of freshet at the precision is was built for. Perhaps a different algorithm is needed for different watersheds, or perhaps we are seeing unprecedented changes in watershed ice regimes.
#### __Conclusion__
This document describes the workflow and methodology for using ice variable functions to assess trends in ice regimes for rivers across Canada.
The Canadian River ice Database Project previously calculated some of these ice cycle variables, however, they were only calculated for approximately 300 water stations, and only through 2015. These newly developed functions allow scientists to calculate present-day data, in addition to historical data, for any of the 1000+ water stations from the Water Survey of Canada.
The next step in this project is to create an accessible, open-source R package, and apply this analysis broadly for approximately 900+ reference stations within Canada. My hope is that scientists and watershed managers will be able to use this tool in conjunction with climatic data and models, to better assess and predict broad-scale watershed changes.
#### __References__
Beltaos, S., & Bonsal, B. (2021). Climate change impacts on Peace River ice thickness and implications to ice-jam flooding of Peace-Athabasca Delta, Canada. Cold Regions Science and Technology, 186, 103279. https://doi.org/10.1016/j.coldregions.2021.103279
Beltaos, S. (2018). Frequency of ice-jam flooding of Peace-Athabasca Delta. Canadian Journal of Civil Engineering, 45(1), 71–75. https://doi.org/10.1139/cjce-2017-0434
de Rham, L., Dibike, Y., Beltaos, S., Peters, D., Bonsal, B., & Prowse, T. (2020). A Canadian River ice Database from the National Hydrometric Program Archives. Earth System Science Data, 12(3), 1835–1860. https://doi.org/10.5194/essd-12-1835-2020