TADAOutliers #47

cristinamullin · 2022-04-15T12:42:12Z

Consider adding outlier information to the TADA stats function.

Append one or two additional columns to the dataset that flag outliers at the individual station/characteristic level and/or at the all-stations/characteristic level.

Add a new input to the stats function so that outliers can be flagged within a single station (input ID) or across all stations:
Scale = AllStations
Scale = IndividualStations
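A minimal Python sketch of the proposed Scale input. It assumes records are plain dicts with hypothetical keys `station`, `char`, `unit`, and `value` (placeholders, not TADA's actual column names) and uses the 1.5 * IQR rule discussed later in this thread. AllStations groups by characteristic and unit; IndividualStations also groups by station. The added column name `Outlier_<scale>` is likewise hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def _fences(values, k=1.5):
    # Tukey fences: Q1 - k*IQR and Q3 + k*IQR.
    # method="inclusive" interpolates like R's default quantile() (type 7).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(rows, scale="AllStations", k=1.5):
    """Return copies of rows with a hypothetical Outlier_<scale> column added."""
    if scale == "AllStations":
        fields = ("char", "unit")
    elif scale == "IndividualStations":
        fields = ("station", "char", "unit")
    else:
        raise ValueError("scale must be 'AllStations' or 'IndividualStations'")

    def key(row):
        return tuple(row[f] for f in fields)

    # Collect values per group (characteristic/unit, optionally per station).
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["value"])

    # Quartiles need at least two values; smaller groups are never flagged.
    fences = {g: _fences(v, k) for g, v in groups.items() if len(v) >= 2}

    flagged = []
    for row in rows:
        bounds = fences.get(key(row))
        is_outlier = bounds is not None and not (bounds[0] <= row["value"] <= bounds[1])
        flagged.append({**row, f"Outlier_{scale}": is_outlier})
    return flagged


rows = [
    {"station": s, "char": "pH", "unit": "std", "value": v}
    for s, v in [("A", 7.0), ("A", 7.1), ("A", 7.2), ("B", 7.3), ("B", 7.4), ("B", 14.0)]
]
all_flags = [r["Outlier_AllStations"] for r in flag_outliers(rows, "AllStations")]
station_flags = [r["Outlier_IndividualStations"] for r in flag_outliers(rows, "IndividualStations")]
print(all_flags)
print(station_flags)
```

With this toy data, 14.0 is flagged when all stations are pooled but not when station B is evaluated on its own, which shows why the choice of scale matters.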

cristinamullin · 2023-04-14T21:12:02Z

We need to be cautious about removal of outliers in environmental datasets.

This would only provide an option to review and remove data that differ from approximately 99% of the data available for a given parameter and unit combination. It is only meant to help catch invalid data; many outliers are still valid results.
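A quick check of the "approximately 99%" figure, assuming the data are roughly normal: the 1.5 * IQR fences listed below then sit about 2.7 standard deviations from the mean and keep about 99.3% of values. This sketch uses only the standard library's `statistics.NormalDist`; the function name `tukey_coverage` is illustrative. For skewed data the share flagged can be much larger, which is the log-normal concern raised in this thread.

```python
from statistics import NormalDist


def tukey_coverage(k=1.5):
    """Fraction of a normal distribution lying inside the Tukey fences."""
    z = NormalDist()  # standard normal; the fraction is the same for any mean/sd
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return z.cdf(upper) - z.cdf(lower)


print(f"{tukey_coverage():.4f}")  # 0.9930
```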

The tool would provide an option to flag data that fall above the upper threshold or below the lower threshold:
Upper Outlier = 75th percentile + 1.5 * (75th percentile - 25th percentile)
Lower Outlier = 25th percentile - 1.5 * (75th percentile - 25th percentile)
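The thresholds above also depend on how the percentiles are computed. A minimal Python sketch using `statistics.quantiles`: `method="inclusive"` interpolates the same way as R's default `quantile()` (type 7), while `method="exclusive"` (Python's default) gives wider or equal fences, most noticeably on small samples. Which percentile definition TADA would use is not settled in this issue, and the function name `tukey_fences` is illustrative.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5, method="inclusive"):
    """Lower/upper outlier thresholds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(tukey_fences(data))                      # (-3.0, 13.0)
print(tukey_fences(data, method="exclusive"))  # (-5.0, 15.0)
```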

Jim Hagy (see TADA Working Group notes: https://usepa.sharepoint.com/:w:/r/sites/AutomatedDataAnalysisWorkingGroup/_layouts/15/Doc.aspx?sourcedoc=%7BC74D9A1C-DCEE-46B1-AC07-E05AD63E2714%7D&file=IssuePaper_RetrievalQAQC_Jan2021.docx&action=default&mobileredirect=true): It would be useful to be able to select whether this flagging process is applied to the original data or to the log of the data. For data that are strongly log-normally distributed, many valid observations will be more than 1.5*IQR above the 75th percentile. If the same percentiles were computed on the logs instead, far fewer of those observations would be flagged.
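A minimal sketch of the log option suggested above, assuming log10 and strictly positive values (zeros and non-detects would need separate handling, which may tie into the censored-data discussion below). The fences are computed on the logs and back-transformed, so flags are compared in original units. The function names and the `log_scale` parameter are hypothetical.

```python
import math
from statistics import quantiles


def outlier_fences(values, k=1.5, log_scale=False):
    """Tukey fences (Q1 - k*IQR, Q3 + k*IQR), optionally computed on log10 values."""
    if log_scale:
        if min(values) <= 0:
            raise ValueError("log scale requires strictly positive values")
        values = [math.log10(v) for v in values]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    if log_scale:
        # Back-transform so the fences are in the original units.
        lower, upper = 10 ** lower, 10 ** upper
    return lower, upper


def flag_outliers(values, k=1.5, log_scale=False):
    lower, upper = outlier_fences(values, k, log_scale)
    return [not (lower <= v <= upper) for v in values]


data = [1, 2, 4, 8, 16, 32, 64, 128]  # right-skewed; its logs are evenly spaced
raw_flags = flag_outliers(data)                  # flags only the last value (128)
log_flags = flag_outliers(data, log_scale=True)  # flags nothing
print(raw_flags)
print(log_flags)
```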

This is one place where the distribution charts become helpful. We could apply the outlier test to the original data or to the log of the data, depending on the data distribution. See examples in the CDC app: https://ergapps.shinyapps.io/atsdrepc/
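One hypothetical way to choose the scale automatically, sketched in Python: compare the sample skewness of the raw values with that of their logs, and use whichever is closer to symmetric. This heuristic and the names `skewness` and `choose_scale` are assumptions for illustration, not a method proposed in this issue.

```python
import math
from statistics import fmean, pstdev


def skewness(values):
    """Population (moment) skewness; 0 for perfectly symmetric data."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0
    return fmean([((v - mean) / sd) ** 3 for v in values])


def choose_scale(values):
    """Hypothetical heuristic: 'log' if the logs look more symmetric than the raw values."""
    if min(values) <= 0:
        return "original"  # log10 is undefined for zero or negative values
    raw_skew = abs(skewness(values))
    log_skew = abs(skewness([math.log10(v) for v in values]))
    return "log" if log_skew < raw_skew else "original"


print(choose_scale([1, 2, 4, 8, 16, 32, 64, 128]))  # log
print(choose_scale([1, 2, 3, 4, 5]))                # original
```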

cristinamullin · 2023-04-14T21:58:49Z

This topic could potentially be related to the censored data method used for each characteristic (but feel free to move this to a new issue):

Example:

Cristina: Is 1/x useful?
Lesley Merrick (OR): They use it when the detection limit (or ½ the detection limit) is above the water quality standard, particularly when using a geomean. This is our white paper on using censored data in the IR: https://www.oregon.gov/deq/FilterDocs/iriCensoredData.pdf

cristinamullin · 2024-05-24T22:19:18Z

This issue is related to the TADA Shiny issue and pending development of an outlier tab: USEPA/TADAShiny#137

hillarymarler · 2024-10-28T18:22:57Z

A few existing packages related to outliers:

envoutliers: Methods for Identification of Outliers in Environmental Data - https://cran.r-project.org/web/packages/envoutliers/index.html
EnvStats: Package for Environmental Statistics, Including US EPA Guidance - https://cran.r-project.org/web/packages/EnvStats/index.html (some outlier functions)
outliers: A collection of some tests commonly used for identifying outliers - https://cran.r-project.org/web/packages/outliers/index.html

@cristinamullin are there any notes from previous working group discussions that might be helpful for me to review on this topic?

@wokenny13 the EnvStats package might be useful to check out for some of the mod 3 functions.

cristinamullin added the Tables&Figures label May 4, 2022

cristinamullin assigned katiehealy Feb 24, 2023

cristinamullin mentioned this issue Mar 9, 2023: Stats/Histogram/Boxplot/Outlier Tab USEPA/TADAShiny#27 (Closed)

cristinamullin unassigned katiehealy Mar 29, 2023

cristinamullin closed this as completed Mar 29, 2023

cristinamullin reopened this Apr 14, 2023

cristinamullin added the Good First Issue label Nov 13, 2023

cristinamullin mentioned this issue Feb 12, 2024: Outliers tab USEPA/TADAShiny#137 (Open)

cristinamullin added Top Priority Module 1 MVP labels Feb 12, 2024

cristinamullin assigned cristinamullin and unassigned cristinamullin Mar 19, 2024

cristinamullin added the ERG Discussion label Mar 21, 2024

cristinamullin removed the ERG Discussion label May 1, 2024

cristinamullin self-assigned this May 1, 2024

cristinamullin assigned wokenny13 and unassigned cristinamullin May 23, 2024

cristinamullin unassigned wokenny13 Oct 24, 2024
