```{=html}
<style type="text/css">
details:hover { cursor: pointer }
</style>
```
---
author: "Benjamin Meyer, Kenai Watershed Forum"
date: "`r Sys.Date()`"
output:
  html_document:
    code_folding: hide
  bookdown::pdf_book: default
---
# (APPENDIX) Appendix {.unnumbered}
# Appendix: Data Review and Uplift
## Introduction
Prior to analysis and interpretation of water quality data, we will ensure that all data that meet the QA/QC standards outlined in the current project [Quality Assurance Project Plan (QAPP)](https://paperpile.com/app/p/7703451b-460d-00b4-82a0-1086ea2554c3) are accessible in the appropriate repository.
Water quality data from this project are ultimately destined for the EPA Water Quality Exchange (EPA WQX), formerly EPA STORET. The process of transferring these data to this higher-level EPA repository is referred to as data "uplift."
Section B10 of the 2020 QAPP describes data management details and responsible parties for each step of the data pipeline from observation to repository. The 2021 data preparation and review process is published here as an appendix as an example of the process applied annually to each year's data.
### 2021 Water Quality Data
In this appendix we will collate laboratory data from several sources into a single spreadsheet document with a consistent format. The format consists of a spreadsheet template provided by the Alaska Department of Environmental Conservation (ADEC), referred to here as the AQWMS Template:
```{r, echo = F}
xfun::embed_file('other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx', text = "Download the AQWMS Water Quality Data Spreadsheet Template")
```
AWQMS is an acronym for "Ambient Water Quality Management System," the service used by ADEC to manage and prepare data for potential uplift to the EPA WQX.
#### 2021 Water Quality Data AQWMS Formatting
The code scripts below assemble water quality data from the three analytical laboratories that partnered with Kenai Watershed Forum for this project in 2021:
- SGS Laboratories (Anchorage, AK)
- Soldotna Wastewater Treatment Plant (Soldotna, AK)
- Taurianen Engineering and Testing (Soldotna, AK)
<br>
------------------------------------------------------------------------
```{r, include=FALSE, eval = FALSE}
# this code hides the print output and content of ALL code chunks
# when rendering PDF version, set this chunk option to eval = TRUE
knitr::opts_chunk$set(
include = FALSE
)
```
##### Metals/Nutrients Lab Results (SGS Labs)
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SGS/spring_2021_sgs_batch_info.csv', text = "Download Original Spring 2021 Metals/Nutrients Lab Results from SGS")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SGS/summer_2021_sgs_batch_info.csv', text = "Download Original Summer 2021 Metals/Nutrients Lab Results from SGS")
```
*Note: the chain of custody documents for SGS Laboratories are integrated into the downloadable files above.*
<details>
<summary>
*Show/Hide Code used to Prepare 2021 Metals/Nutrients Results*
</summary>
```{r 2021-aqwms-formatting-sgs}
# clear environment
rm(list=ls())
# load packages
library(tidyverse)
library(readxl)
library(openxlsx)
library(data.table)
library(stringr)
library(magrittr)
library(janitor)
library(hms)
library(lubridate)
library(anytime)
xfun::pkg_load2(c("htmltools", "mime"))
# Assign 2021 Field Sample Dates
# Spring 2021 sampling date
spring21_sample_date <- "5/11/2021"
# Summer 2021 Sampling Date
summer21_sample_date <-"7/27/2021"
```
```{r}
################################################################################################################
######################################### Read in and Clean SGS/ALS Data ######################################
################################################################################################################
############################ Part A: SGS Data Read In #############################
## Reformat SGS data downloaded from their server client (SGS Engage, full EDD files) to match AQWMS template
# read in
spring_batch_sgs21 <- read.csv("other/input/2021_wqx_data/spring_2021_wqx_data/SGS/spring_2021_sgs_batch_info.csv")
summer_batch_sgs21 <- read.csv("other/input/2021_wqx_data/summer_2021_wqx_data/SGS/summer_2021_sgs_batch_info.csv")
# clean up and retain only useful columns
sgs21 <- bind_rows(spring_batch_sgs21,summer_batch_sgs21) %>%
clean_names() %>%
remove_empty() %>%
# remove unneeded columns
select(-project_id)%>%
rename(sample = sample_id,
lab_sample = lab_sample_id,
detection_limit = dl) %>%
transform(lab_sample = as.character(lab_sample),
sample_rpd = as.character(sample_rpd)) %>%
# add lab name
mutate(lab_name = "SGS North America, Anchorage, Alaska",
matrix = "Water") %>%
# split a.) lab sample run & b.) collect time and date in prep for future join with ALS data
##### NOTE: SGS data has date and time, ALS has date only.
transform(collect_date_time = mdy_hm(collect_date),
rec_date_time = mdy_hm(rec_date),
run_date_time = mdy_hm(run_date_time),
extracted_date_time = mdy_hm(extracted_date)) %>%
mutate(collect_time = as_hms(collect_date_time),
collect_date = date(collect_date_time),
rec_date = date(rec_date_time),
rec_time = as_hms(rec_date_time),
run_time = as_hms(run_date_time),
run_date = date(run_date_time),
extracted_time = as_hms(extracted_date_time),
extracted_date = date(extracted_date_time)) %>%
select(-collect_date_time,-rec_date_time,-run_date_time)
rm(spring_batch_sgs21,summer_batch_sgs21)
###################### Part B: ALS Data Read In #############################
## SGS subcontracted analyses of Ca, Fe, and Mg to ALS laboratories (Kelso, WA). These results are not included in the spreadsheet download from SGS Engage and were entered manually into separate spring and summer "ALS" named spreadsheets
#### read in spring 2021 results from ALS
spring_als21 <- read.csv("other/input/2021_wqx_data/spring_2021_wqx_data/SGS/spring_2021_als_batch_info.csv") %>%
clean_names()
summer_als21 <- read.csv("other/input/2021_wqx_data/summer_2021_wqx_data/SGS/summer_2021_als_batch_info.csv") %>%
clean_names()
# bind spring and summer
als21 <- bind_rows(spring_als21,summer_als21) %>%
remove_empty() %>%
# proceed left to right of existing ALS dataframe to make its naming structure match the sgs21 dataframe. Add, remove, modify column names as needed
select(-client,
-project,
-service_request) %>%
rename(lab_sample = lab_code) %>%
rename(
collect_date = date_collected,
collect_time = time_collected,
rec_date = date_received,
rec_time = time_received,
# sample_type ::: not sure where to match with sgs data yet or where to put in aqwms, but is important for qa/qc
extracted_date = date_extracted,
extracted_time = time_extracted,
extraction_code = extraction_method,
run_date = date_analyzed,
run_time = time_analyzed,
analytical_method = method,
#units = units,
analyte = component,
resultflag = result_notes,
amount_spiked = spike_concentration,
percent_recovered = percent_recovery,
allowable_limit = acceptance_limits,
sample_rpd = rpd,
# change report/detection limit terminology See SGS document, "SGS DL, LOD, LOQ Interpretation"
loq = reporting_limit) %>%
mutate(lab_name = "ALS Environmental, Kelso, Washington"
#,
#run_time = ""
) %>%
# prep column classes to bind with sgs dataframe
transform(analytical_method = as.character(analytical_method),
run_date = mdy(run_date),
run_time = as_hms(as.POSIXct(run_time, format = "%H:%M")),
#run_time = as_hms(run_time),
collect_date = mdy(collect_date),
rec_date = mdy(rec_date),
rec_time = as_hms(as.POSIXct(rec_time, format = "%H:%M")),
extracted_date = mdy(extracted_date),
extracted_time = as_hms(as.POSIXct(extracted_time, format = "%H:%M")),
result = as.double(result),
collect_time = as_hms(as.POSIXct(collect_time, format = "%H:%M")))
# join SGS data with ALS data
dat <- bind_rows(sgs21,als21)
rm(als21,sgs21,spring_als21,summer_als21)
# export table of sample types
## assign sample type acronyms just like with sgs21 samples. see excel file for full definitions
# --> make sure doesn't conflict with other sample_type designations in rest of document. use same acronyms
### export table of sample types, then manually translate their abbreviations
sample_types <- dat %>%
select(sample_type,lab_name) %>%
distinct()
# remove old version and write new one
unlink("other/input/AQWMS/sample_type_abbreviations.xlsx")
write.xlsx(sample_types, "other/input/AQWMS/sample_type_abbreviations.xlsx")
# manually created a translation of all the acronyms in an accompanying file, consolidating inconsistent sample type abbreviations into one consistent schema shared between the SGS and ALS labs
############### Part C: Address spelling/format issues and inconsistent sample/site names ######################
# Upon visual inspection of site names, we can see that the location names in the AQWMS template differ slightly from the place names in the SGS report (spelling and name inconsistencies).
# 3/28/2022 - A note on "Duplicate" designations. During QA/QC data review in March 2022, the following was clarified through trial and error: we must carefully distinguish "Field Duplicates" from "Lab Duplicates" when preparing these data. The sample names indicate whether a result is a "Field Duplicate," i.e., one of two field collections made at the same location/day/time. However, the ALS lab also created "Lab Duplicates," which are not from the same sites as the field duplicates, and designates these as "DUP1" in the "sample_type" column.
# See the AQWMS "Activity Type" column to assign these distinctions
# Decision - we will designate the field duplicates simply as a Field Duplicate
# move info about duplicate sample and/or sample blank status into separate column, "sample_condition"
dat %<>%
mutate(sample_condition = case_when(
grepl("Method Blank",sample) ~ "Method Blank",
grepl("Trip Blank",sample) ~ "Trip Blank",
grepl("DUP",sample) ~ "Field Duplicate",
grepl("Dup",sample) ~ "Field Duplicate")) %>%
# remove "DUP" designation from sample column
mutate(sample = str_replace(sample, "DUP|Dup", ""))
# remove from "sample" names the text containing the suffixes Diss/Dis (Dissolved metals sample) since we only want location info in this column. (Solution for this step was found at https://stackoverflow.com/questions/29271549/replace-all-occurrences-of-a-string-in-a-data-frame)
dat %<>%
mutate(sample = (str_replace(sample, "Diss|Dis|DUP|Dup",""))) %>%
# remove "Diss" suffix and "EP" prefix from "analytical_method" column
mutate(analytical_method = str_replace(analytical_method, "Diss", "")) %>%
# note trailing space after "EP200.8 "
mutate(analytical_method = str_replace(analytical_method,"EP200.8 ","200.8")) %>%
# address the one stubborn site name still containing "Diss"
mutate(sample = case_when(
sample == "RM0-No Name Creek Diss" ~ "RM0-No Name Creek",
TRUE ~ sample))
# z <- dat %>%
# select(sample,sample_type,sample_condition)
# We need to remove white spaces, apostrophes, and dashes, because join functions such as "left_join" are often uncooperative with these string characters. We will use joins on site names in the next steps.
dat %<>%
# remove excess white spaces
mutate(sample = str_trim(sample,"both")) %>%
mutate(sample = str_squish(sample)) %>%
# make remaining white spaces underscores
mutate(sample = gsub("\\s+","_",sample)) %>%
# remove apostrophes
mutate(sample = gsub("\\'","",sample)) %>%
# replace dashes with underscores
mutate(sample = gsub("\\-","_",sample)) %>%
# collapse any run of multiple underscores into a single underscore
mutate(sample = gsub("_+","_",sample))
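# Illustration only (hypothetical raw name, not executed): taken together,
# the cleaning steps above transform a raw lab entry such as
#   " RM 0 - No Name Creek's "
# into the join-friendly form
#   "RM_0_No_Name_Creeks"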
# apply note regarding trip blanks (for BTEX organics)
# assigned in sequence as encountered on chain of custody
dat %<>%
mutate(note = case_when(
grepl("Trip_Blank_1", sample) ~ "KWF Crew, RM1.5_Kenai_City_Dock",
grepl("Trip_Blank_2", sample) ~ "USFWS Crew, RM6.5_Cunningham_Park",
grepl("Trip_Blank_3", sample) ~ "DEC Crew, RM40_Bings_Landing",
grepl("Trip_Blank_4", sample) ~ "DEC Crew, RM43_Upstream_of_Dow_Island"))
# separate result qualifiers (U, J, B) into a new column
#sgs21 %<>%
# account for fact that als data already has qualifier column and has characters in results column
# mutate(qualifier = case_when(
# result == "ND" ~ qualifier,
# result != "ND" ~ str_extract(result,"[aA-zZ]+"))) %>%
# mutate(result = str_remove(result,"[aA-zZ]+"))
############## Part D: Prepare SGS/ALS Location/Site Names ##########################
# NOTE: The SGS and ALS 2021 results contain a variety of misspellings and typos. For 2022, we should provide the labs with a csv file of site names that they can use
# In preparation for a join to AQWMS table, we will manually generate a match table csv file that we can use
## generate list of unique site names from 2021 SGS data
sgs21_sitenames <- data.table(unique(dat$sample)) %>%
arrange(V1)
# generate list of unique site names from 2021 AQWMS template. these are the names we want in the final product
aqwms21_sitenames <- read_excel("other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx", sheet = "Monitoring Locations") %>%
select("Monitoring Location Name", "Monitoring Location ID") %>%
distinct()
# write 2021 sgs site names to an excel file
site_match_table_path <- "other/input/AQWMS/sgs_site_names_matching_table.xlsx"
write.xlsx(sgs21_sitenames, site_match_table_path)
# create an excel file with two sheets: a.) SGS site names, and b.) AQWMS site names
wb <- loadWorkbook(site_match_table_path)
addWorksheet(wb,"Sheet2")
writeData(wb,"Sheet2",aqwms21_sitenames)
saveWorkbook(wb,site_match_table_path,overwrite = TRUE)
# Using these two tables, we will manually create a new file titled "sgs_site_names_matching_table_manual_edit.xlsx" and manually match up the two disparate naming systems.
# Performed manually by B Meyer March 18, 2022.
# append "Monitoring Location Name" and "Monitoring Location ID" info from WQX to spring 2021 SGS data
## read in site names join table
sitenames21_match <- read_excel("other/input/AQWMS/sgs_site_names_matching_table_manual_edit.xlsx") %>%
select(`Monitoring Location Name`,`Monitoring Location ID`,sgs_sitenames) %>%
rename(sample = sgs_sitenames) %>%
filter(!is.na(`Monitoring Location ID`))
# append monitoring location names
dat %<>%
left_join(sitenames21_match, by = "sample") %>%
clean_names()
# remove dfs
rm(sgs21_sitenames,aqwms21_sitenames,sitenames21_match)
######################## Part E: "Result Analytical Method Context" name rectification ######################
# In the AQWMS template, the EPA names that will go in the column "Result Analytical Method ID" do not exactly match the names provided by the laboratory (SGS). After communicating with SGS and DEC on 2/8/2022, we are able to cross-walk between the two naming systems. These matches are documented in the excel file "analysis_code_matching_table.xlsx."
# assign "Result Analytical Method ID" and "Result Analytical Method Context" to dataset using matching table
# read in matching table
analysis_code_matching_table <- read_excel("other/input/AQWMS/analysis_code_matching_table.xlsx") %>%
select(-Comments,-`EPA Name`) %>%
clean_names() %>%
rename(analytical_method = sgs_analysis_code) %>%
# remove "EP" prefix from method "EP200.8"
mutate(analytical_method = str_replace(analytical_method,"EP200.8","200.8"))
# read in AQWMS Analytical Methods list
aqwms_analytical_methods <- read_excel("other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx", sheet = "Analytical Methods") %>%
select("ID","Context Code") %>%
clean_names() %>%
rename(epa_analysis_id = id) %>%
distinct()
# join two tables above
epa_analysis_codes <- inner_join(aqwms_analytical_methods,analysis_code_matching_table, by = "epa_analysis_id") %>%
filter(!context_code %in% c("USEPA Rev 5.4",
"APHA (1997)",
"APHA (1999)"))
# join EPA analysis IDs and context codes to overall dataset
dat %<>%
mutate(analytical_method = str_replace(analytical_method,"EP200.8","200.8")) %>%
left_join(epa_analysis_codes, by = "analytical_method")
# remove unneeded dfs
rm(analysis_code_matching_table,aqwms_analytical_methods,epa_analysis_codes)
```
```{r}
########################## Miscellaneous Steps #########################################
########### Address Non-Detect values #########################
# Non-detect values should be left blank. A non-detect does not necessarily mean there was zero observation of an analyte in a sample; the analyte may simply be present at a level lower than the method detection limit (lower than what the lab equipment can detect). Instead of 0, we leave the result blank. The Practical Quantitation Limit (or Limit of Quantitation) is reported alongside the result. When DEC evaluates a waterbody, they use ½ the PQL as a stand-in for a non-detect. See the explanatory document at other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf for more details.
# convert non-detect results recorded as "0" to NA
dat %<>%
mutate(result = na_if(result,0))
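# Illustration only (commented out, not part of this pipeline): when DEC
# evaluates a waterbody, a non-detect is represented by half the PQL/LOQ.
# A minimal sketch of that substitution, assuming the "resultflag" and
# "loq" columns prepared earlier; "result_eval" is a hypothetical name:
#   dat %>% mutate(result_eval = if_else(resultflag %in% c("U","ND"),
#                                        loq/2, result))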
###### Segregate laboratory QA/QC data from field data ########
# These data will be evaluated at a later step. See excel file "other/input/AQWMS/sample_type_abbreviations_manual_edit.xlsx" for sample_type naming schema
sgs21_als21_qaqc_dat <- dat %>%
# retain only results not from field sampling program (project samples and trip blanks)
# also filter out hydrocarbon surrogate results ("surr"). Surrogate standards are compounds spiked into all samples, blanks, Laboratory Control Samples, and matrix spikes to monitor the efficacy of sample extraction, chromatographic, and calibration systems. They do not represent environmental observations.
filter(!sample_type %in% c("PS","SMPL","TB") |
grepl("(surr)",analyte))
write.csv(sgs21_als21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/sgs21_als21_qaqc_dat.csv")
# retain only non-field sample results for AQWMS export
dat %<>%
filter(sample_type %in% c("PS","SMPL","TB")) %>%
filter(!grepl("(surr)",analyte))
rm(sgs21_als21_qaqc_dat)
```
</details>
<br>
------------------------------------------------------------------------
##### Fecal Coliform Lab Results (Soldotna Wastewater Treatment Plant (SWWTP)/Taurianen Engineering)
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF Fecal 05-11-21.xls', text = "Download Original Spring 2021 Fecal Coliform Lab Results from SWWTP")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/Taurianen/FecalColiform_Results_Summer2021.pdf', text = "Download Original Summer 2021 Fecal Coliform Lab Results from Taurianen")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/SWWTP_Spring_2021_TCC_FC.jpg', text = "Download Spring 2021 Fecal Coliform Chain of Custody")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/Taurianen/FC_CoC_Taurianen_Summer2021.pdf', text = "Download Summer 2021 Fecal Coliform Chain of Custody")
```
<details>
<summary>
*Show/Hide Code used to Prepare 2021 Fecal Coliform Results*
</summary>
```{r}
############################################################################################################
##################################### Read in and Clean SWWTP / Taurianen FC Data #########################
############################################################################################################
########################### Part A: SWWTP FC Data Read In ##################################################
swwtp_spring21 <- read_excel("other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF Fecal 05-11-21.xls", skip = 11) %>%
clean_names() %>%
## fix site naming and terminology
# move info about duplicate sample and/or sample blank status into separate column
# sample type abbreviations
mutate(sample_type = case_when(
grepl("BLANK",sample_location_rm) ~ "MB", # method blank
grepl("POSITIVE",sample_location_rm) ~ "LCS")) %>% # laboratory control sample
# assign all other samples as "PS" (project sample)
mutate(sample_type = replace_na(sample_type,"PS")) %>%
# field dup designation
mutate(sample_condition = case_when(
grepl("DUP",sample_location_rm) ~ "Field Duplicate")) %>%
# remove "BLANK", and "POSITIVE designation from sample_location column
mutate(sample_location_rm = (str_replace(sample_location_rm, "BLANK|POSITIVE", "")))
# remove "DUP" from site name column and trim white spaces in site name column
swwtp_spring21 %<>%
mutate(sample_location_rm = str_remove(sample_location_rm,"DUP")) %>%
mutate(sample_location_rm = str_trim(sample_location_rm,"right"))
# address different site naming systems
# use manually generated matching table
# read in matching table and match
swwtp_spring21_site_matching <- read_excel("other/input/AQWMS/swwtp_site_names_matching_table_manual_edit.xlsx")
# join
swwtp_spring21 %<>%
full_join(swwtp_spring21_site_matching) %>%
select(-sample_location_rm)
rm(swwtp_spring21_site_matching)
## fix lab analysis times and dates
swwtp_spring21 %<>%
# lab processing time/date
mutate(analysis_time_in = as_hms(time_in),
analysis_date_in = mdy(spring21_sample_date),
analysis_time_out = as_hms(time_out),
# see file "other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF Fecal 05-11-21.xls" for the analysis-out date
analysis_date_out = ymd("2021-05-12")) %>%
select(-time_in,-time_out) %>%
transform(time_sampled = as_hms(time_sampled)) %>%
# field sample date and time
mutate(time_sampled = as_hms(time_sampled),
sample_date = mdy(spring21_sample_date))
## assign time/date received at lab. info from chain of custody
swwtp_spring21_rec_time <- "13:31:00"
swwtp_spring21_rec_date <- "2021-05-11"
swwtp_spring21 %<>%
mutate(rec_date = ymd(swwtp_spring21_rec_date),
rec_time = as_hms(swwtp_spring21_rec_time))
## rename existing column names and create new ones to match sgs21 data format at end of prior code chunk
swwtp_spring21 %<>%
rename(lab_sample = dish_number,
result = colony_count_100m_l,
collect_time = time_sampled,
run_time = analysis_time_in,
run_date = analysis_date_in,
# = analysis_time_out,
# = analysis_date_out,
collect_date = sample_date) %>%
mutate(note = paste0("Lab analysis volume = ",ml," mL"),
matrix = "Water (Surface, Eff., Ground)",
analytical_method = "9222 D ~ Membrane filtration test for fecal coliforms",
analyte = "Fecal Coliform",
units = "cfu/100ml",
# reporting limit ("loq") value from 2019 QAPP, pg 17
loq = 1.0,
lab_name = "Soldotna Wastewater Treatment Plant, Soldotna, Alaska",
epa_analysis_id = "9222D",
context_code = "APHA",
analyst = "AW") %>%
clean_names() %>%
select(-ml,-colony_count) %>%
# transform to prep for bind with sgs21
transform(lab_sample = as.character(lab_sample),
result = as.double(result)) %>%
# apply correction to the one "TNTC" result (Too Numerous To Count), since we can't have characters and integers in same column
mutate(note = case_when(
lab_sample == "30" ~ paste("Lab analysis volume = 0.5 mL, result = TNTC"),
TRUE ~ note))
########### Address Non-Detect values #########################
# Non-detect values should be left blank. A non-detect does not necessarily mean there was zero observation of an analyte in a sample; the analyte may simply be present at a level lower than the method detection limit (lower than what the lab equipment can detect). Instead of 0, we leave the result blank. The Practical Quantitation Limit (or Limit of Quantitation) is reported alongside the result. When DEC evaluates a waterbody, they use ½ the PQL as a stand-in for a non-detect. See the explanatory document at other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf for more details.
# Assign resultflag column. Use "=" if result >= 1 and "U" if result < 1. See pg 17 of the 2020 QAPP at "other/documents/QAPP/QAPP-2020-KenaiRiverWatershed_ZnCu.pdf"
swwtp_spring21 %<>%
mutate(resultflag = case_when(
result < 1 ~ "U",
TRUE ~ "="))
# convert non-detect results recorded as "0" to NA
swwtp_spring21 %<>%
mutate(result = na_if(result,0))
# segregate lab results from field results, and write lab qa/qc results to external csv
swwtp_spring21_qaqc_dat <- swwtp_spring21 %>%
filter(sample_type %in% c("MB","LCS"))
write.csv(swwtp_spring21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/swwtp_spring21_fc_qaqc_dat.csv", row.names = F)
swwtp_spring21 %<>%
filter(!sample_type %in% c("MB","LCS"))
## join SGS 2021 data to Spring 2021 Fecal Coliform data from SWWTP
dat <- bind_rows(dat,swwtp_spring21) %>%
select(-location)
rm(swwtp_spring21)
########################### Part B: Taurianen FC Data Read In (Summer 2021) ##############################################
taur_summer21_rec_date <- "2021-07-27"
taur_summer21_rec_time <- "13:37:00"
## read in taurianen summer 2021 results
taur_summer21 <- read_excel("other/input/2021_wqx_data/summer_2021_wqx_data/Taurianen/Fecal_Coliform_Results_Spreadsheet.xlsx", skip = 3) %>%
clean_names() %>%
select(-qc1,-data_entry,-qc2) %>%
## move info about duplicate sample and/or sample blank status into separate column
mutate(sample_condition = case_when(
grepl("DUP",sample_location) ~ "Field Duplicate")) %>%
# remove "DUP" designation from sample_location column
mutate(sample_location = (str_replace(sample_location, "_DUP", ""))) %>%
# trim white spaces in site name column
mutate(sample_location = str_trim(sample_location,"right")) %>%
## add known info about times/dates, correct formats and column names
mutate(collect_date = mdy(summer21_sample_date),
run_date = mdy(summer21_sample_date),
run_time = as_hms(time_relinquished),
analysis_date_out = mdy("7/28/2021"),
analysis_time_out = as_hms(time_tested),
# time/date received at lab from chain of custody
rec_date = ymd(taur_summer21_rec_date),
rec_time = as_hms(taur_summer21_rec_time),
# drop old columns
.keep = "unused") %>%
select(-date_of_testing,-neg_pos) %>%
transform(time_sampled = as_hms(time_sampled)) %>%
## add lab name
mutate(lab_name = "Taurianen Engineering and Testing, Soldotna, Alaska") %>%
# rename columns
rename(sample = sample_location,
collect_time = time_sampled)
# NOTE: for Taurianen QA/QC practices, see the email from Taurianen at "other/documents/references/Taurianen QA Technique (Email march 2022).docx" (in this project repo)
## fix site naming and terminology
# generate spreadsheet of unique site names from taurianen dataset
taur_summer21_sites <- data.frame(unique(taur_summer21$sample)) %>%
rename(sample = unique.taur_summer21.sample.)
# export site names list to spreadsheet
write.xlsx(taur_summer21_sites, "other/input/AQWMS/taurianen_site_names_matching_table.xlsx")
# manually edit a new spreadsheet such that Taurianen site names are paired with AWQMS site names
# read in manually edited site names sheet
taur_summer21_sites <- read_excel("other/input/AQWMS/taurianen_site_names_matching_table_manual_edit.xlsx")
# join AWQMS site names to taurianen data
taur_summer21 <- left_join(taur_summer21,taur_summer21_sites,by = "sample")
## add and/or rename other columns to match SWWTP dataframe structure
taur_summer21 %<>%
clean_names() %>%
select(-direct_count) %>%
rename(result = number_of_colonies) %>%
mutate(note = "",
matrix = "Water",
analytical_method = "9222 D ~ Membrane filtration test for fecal coliforms",
analyte = "Fecal Coliform",
units = "cfu/100ml",
# loq = reporting limit
loq = 1,
epa_analysis_id = "9222D",
context_code = "APHA") %>%
transform(result = as.double(result))
# assign "sample_type"
taur_summer21 %<>% mutate(sample_type = case_when(
sample_condition == "Lab Blank" ~ "MB", # method blank
sample_condition == "Positive Control" ~ "LCS", # laboratory control sample
TRUE ~ "PS"
))
########### Address Non-Detect values #########################
# Non-detect values should be left blank. A non-detect does not necessarily mean there was zero observation of an analyte in a sample; the analyte may simply be present at a level lower than the method detection limit (lower than what the lab equipment can detect). Instead of 0, we leave the result blank. The Practical Quantitation Limit (or Limit of Quantitation) is reported alongside the result. When DEC evaluates a waterbody, they use ½ the PQL as a stand-in for a non-detect. See the explanatory document at other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf for more details.
# Assign resultflag column. Use "U" if result < 1 (the LOQ) and "=" otherwise. See pg 17 of 2020 QAPP at "other/documents/QAPP/QAPP-2020-KenaiRiverWatershed_ZnCu.pdf"
taur_summer21 %<>%
mutate(resultflag = case_when(
result < 1 ~ "U",
TRUE ~ "="))
# modify non-detect values from "0" to "NA" if resultflag = U or ND
taur_summer21 %<>%
mutate(result1 = na_if(result,0)) %>%
select(-result) %>%
rename(result = result1)
# segregate lab results from field results, and write lab qa/qc results to external csv
taur_summer21_qaqc_dat <- taur_summer21 %>%
filter(sample_type %in% c("MB","LCS"))
write.csv(taur_summer21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/taur_summer21_qaqc_dat.csv", row.names = F)
# join 2021 Taurianen Fecal Coliform data into overall dataframe so far
dat <- bind_rows(dat,taur_summer21)
rm(taur_summer21,taur_summer21_sites,taur_summer21_qaqc_dat,swwtp_spring21_qaqc_dat)
```
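The non-detect convention described in the chunk above can be isolated as a small base-R sketch. The values are illustrative only (the `loq` of 1 cfu/100ml mirrors the fecal coliform reporting limit used in this dataset), and the sketch is not part of the processing pipeline:

```r
# Sketch of the non-detect handling described above: results below the
# reporting limit are flagged "U" and stored as NA, and DEC's convention of
# substituting 1/2 the PQL supplies a number when one is needed for evaluation.
loq <- 1                                     # reporting limit (cfu/100ml)
result <- c(0, 12, 4)                        # raw lab results; 0 = non-detect
resultflag <- ifelse(result < loq, "U", "=")
result[resultflag == "U"] <- NA              # non-detects left blank
eval_result <- ifelse(is.na(result), loq / 2, result)
# eval_result is c(0.5, 12, 4)
```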
</details>
<br>
------------------------------------------------------------------------
##### Total Suspended Solids Lab Results (Soldotna Wastewater Treatment Plant (SWWTP))
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF TSS MONITORING 05-11-21.xlsx', text = "Download Original Spring 2021 Total Suspended Solids Results from SWWTP.xlsx")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SWWTP/KRWF TSS MONITORING 07-28-21.xlsx', text = "Download Original Summer 2021 Total Suspended Solids Results from SWWTP.xlsx")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/SWWTP_Spring_2021_TCC_FC.jpg', text = "Download Spring 2021 Total Suspended Solids Chain of Custody")
```
```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SWWTP/TSS_CoC_SWWTP_Summer2021.pdf', text = "Download Summer 2021 Total Suspended Solids Chain of Custody")
```
<details>
<summary>
*Show/Hide Code used to Prepare 2021 Total Suspended Solids Results*
</summary>
```{r}
# SWWTP Spring 2021 TSS data
## Reformat TSS data to match AQWMS template
# read in
swwtp_tss_spring21 <- read_excel('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF TSS MONITORING 05-11-21.xlsx', skip = 1, sheet = "Updated_Formatting") %>%
clean_names() %>%
transform(date_of_analysis = anydate(date_of_analysis)) %>%
# add info from lab COC
mutate(rec_date = ymd_hms("2021-05-11 14:00:00"))
swwtp_tss_summer21 <- read_excel('other/input/2021_wqx_data/summer_2021_wqx_data/SWWTP/KRWF TSS MONITORING 07-28-21.xlsx', skip = 1, sheet = "Updated_Formatting") %>%
clean_names() %>%
transform(sample_time = anytime(sample_time)) %>%
# add info from lab COC
mutate(rec_date = ymd_hms("2021-07-27 14:00:00"))
# combine spring & summer
swwtp_tss21 <- bind_rows(swwtp_tss_spring21,swwtp_tss_summer21) %>%
remove_empty()
rm(swwtp_tss_spring21,swwtp_tss_summer21)
# prepare and format to match larger dataset
# miscellaneous
swwtp_tss21 %<>%
select(-qc1,-data_entry,-x8) %>%
rename(analysis_time = time) %>%
transform(sample_time = as_hms(sample_time),
analysis_time = as_hms(analysis_time)) %>%
# move info about duplicate sample and/or sample blank status into separate column
mutate(sample_condition = case_when(
grepl("DUP",sample_location) ~ "Field Duplicate")) %>%
# remove "DUP" designation from locations column
mutate(sample_location = str_replace(sample_location, "_DUP", "")) %>%
# replace "O" with zeros in location column
mutate(sample_location = str_replace(sample_location, "RM_O", "RM_0")) %>%
# add units of suspended solids
mutate(units = "mg/l") %>%
rename(result = s_s_mg_l) %>%
transform(result = as.numeric(result)) %>%
# add info about EPA analysis type from AWQMS template
mutate(epa_analysis_id = "2540-D",
analytical_method = "SM21-2540-+D",
context_code = "APHA",
note = "") %>%
  # remove intermediate bench values (dried/tare/paper weights and sample volume)
  select(-dried_wt,-paper_wt,-tare_wt_kg, -ml) %>%
# modify date/time formats
mutate(collect_date = as.character(paste(field_sample_date,sample_time)),
run_date_time = as.character(paste(date_of_analysis,analysis_time)), .keep = "unused") %>%
mutate(collect_time = as_hms(as.POSIXct(collect_date))) %>%
mutate(collect_date = date(as.POSIXct(collect_date)),
run_time = as_hms(ymd_hms(run_date_time)),
run_date = date(ymd_hms(run_date_time)),.keep = "unused") %>%
# rename
rename(analyst = signature) %>%
# miscellaneous
mutate(lab_sample = "",
matrix = "Water",
analyte = "Total suspended solids",
# loq = reporting limit
loq = 1.0,
# lod = sensitivity, or method detection limit
lod = 0.31,
lab_name = "Soldotna Wastewater Treatment Plant, Soldotna, Alaska")
# assign "sample_type"
swwtp_tss21 %<>%
mutate(sample_type = case_when(
sample_condition == "Lab Blank" ~ "MB", # method blank
sample_condition == "Positive Control" ~ "LCS", # laboratory control sample
TRUE ~ "PS"
))
# get site names consistent with AWQMS format
swwtp_tss_sitenames <- data.frame(unique(swwtp_tss21$sample_location))
# delete existing csv if present
unlink("other/input/AQWMS/swwtp_tss_sitenames.csv")
# export csv of swwtp_tss site names
write.csv(swwtp_tss_sitenames,"other/input/AQWMS/swwtp_tss_sitenames.csv",row.names = F)
# use this list to create manually edited file, matched to AWQMS template names
# read in manually edited file
swwtp_tss_sitenames <- read_excel("other/input/AQWMS/swwtp_tss_site_names_matching_table_manual_edit.xlsx")
# join correct site names to overall 2021 TSS dataset
swwtp_tss21 <- left_join(swwtp_tss21,swwtp_tss_sitenames) %>%
clean_names() %>%
rename(sample = sample_location)
########### Address Non-Detect values #########################
# Non-detect values should be left blank. A non-detect does not necessarily
# mean there was a zero observation of an analyte in a sample; the analyte may
# simply be present at a level below the method detection limit (lower than
# what the lab equipment can detect). Instead of 0, we'll leave it blank. The
# Practical Quantitation Limit (or Limit of Quantitation) is presented
# alongside the result. When DEC evaluates a waterbody, they use 1/2 the PQL
# as a stand-in for a non-detect. See the explanatory document at
# other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf for more details.
# Assign resultflag column. Use "U" below the LOD (0.31), "J" between the LOD and LOQ (1.0), and "=" otherwise. See pg 17 of 2020 QAPP at "other/documents/QAPP/QAPP-2020-KenaiRiverWatershed_ZnCu.pdf"
swwtp_tss21 %<>%
mutate(resultflag = case_when(
result < 1 & result > 0.31 ~ "J",
result < 0.31 ~ "U",
TRUE ~ "="))
# modify non-detect values from "0" to "NA" if resultflag = U or ND
swwtp_tss21 %<>%
mutate(result1 = na_if(result,0)) %>%
select(-result) %>%
rename(result = result1)
# in future scripts need to ensure that non-detect "0" results (resultflag = "U") are shown as "NA" rather than zero.
######## 3/21/2022 --> no QA/QC results from SWWTP TSS on record yet. Contacted by email to see what's available; looking for lab duplicate and/or external QC check standard (pg 34 of QAPP)
# no response as of 3/29/22. need to call to ensure practice/records are in place by spring 2022
# segregate lab results from field results, and write lab qa/qc results to external csv
#swwtp_tss21_qaqc_dat <- swwtp_tss21 %>%
# filter(sample_type %in% c("MB","LCS"))
#write.csv(swwtp_tss21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/swwtp_tss21_qaqc_dat.csv", row.names = F)
#rm(swwtp_tss21_qaqc_dat)
# join TSS data with overall dataset
dat <- bind_rows(dat,swwtp_tss21)
rm(swwtp_spring21,swwtp_tss_sitenames,swwtp_tss21)
```
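The three-band flag assignment in the chunk above can be reduced to a standalone base-R sketch using the same LOD and LOQ values (the result values themselves are illustrative only):

```r
# Standalone sketch of the TSS resultflag banding used above: below the LOD is
# a non-detect ("U"), between the LOD and LOQ is an estimated value ("J"),
# and at or above the LOQ is a quantified result ("=").
lod <- 0.31   # method detection limit (mg/l)
loq <- 1.0    # reporting limit / limit of quantitation (mg/l)
result <- c(0.10, 0.50, 3.20)
resultflag <- ifelse(result < lod, "U",          # non-detect
              ifelse(result < loq, "J", "="))    # estimated vs. quantified
# resultflag is c("U", "J", "=")
```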
</details>
------------------------------------------------------------------------
<details>
<summary>
*Additional Miscellaneous Code Used to Prepare Data For AQWMS Template*
</summary>
```{r}
############### Miscellaneous Steps for Overall Field Results Dataframe #################
############# a.) filter out lab blanks and positive lab controls ##################
#dat %<>%
# filter(!sample_condition %in% c("Lab Blank","Positive Control")) # should already be gone now 3/4/22
############ b.) match latitude and longitude coordinates to sites ##################
## read in coordinates
site_coords <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "site_coordinates") %>%
remove_empty()
## join coords to overall df
dat <- left_join(dat,site_coords)
############ c.) assign "result sample fraction" (e.g. filtered, dissolved, etc.) ############
## read in manually organized table that pairs "result sample fraction" with "analytical method"
result_sample_fraction <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "result_sample_fraction") %>%
filter(!is.na(analytical_method)) %>%
select(-description)
## join to table
dat <- left_join(dat,result_sample_fraction)
################ d.) assign "result detection condition"##############
## read in manually assigned join table
result_detection_condition <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "result_detection_condition") %>%
clean_names() %>%
filter(!is.na(resultflag))
## join to table
## z <- dat %>% left_join(dat,result_detection_condition, by = "resultflag")
## the attempt above returned nonsense, likely because the pipe already supplies dat as the first argument, so the call self-joins dat and misroutes result_detection_condition. `dat %>% left_join(result_detection_condition, by = "resultflag")` should behave; for now, define programmatically:
dat %<>%
mutate(result_detection_condition = case_when(
resultflag == "U" | resultflag == "ND" ~ "Not Detected",
resultflag == "J" ~ "Present Below Quantification Limit"))
############### e.) assign chemical preservative type ################
## read in table
chemical_preservative <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "chemical_preservative") %>%
filter(!is.na(preservative)) %>%