
Use of weighted results in surprising behaviour in the estimate #344

Open
joethorley opened this issue Jan 22, 2024 · 6 comments

Comments

@joethorley
Collaborator

```r
library(ssdtools)
#> Please replace the following in your scripts:
#> - `ssdtools::boron_data` with `ssddata::ccme_boron`
#> - `ssdtools::ccme_data` with `ssddata::ccme_data`
library(ssddata)

data <- ssddata::ccme_boron

data$Weight <- 1
data$Weight[rank(data$Conc) > 6] <- 1/10

fitall <- ssd_fit_dists(data, dists="lnorm")
ssd_hc(fitall)
#> # A tibble: 1 × 10
#>   dist    percent   est    se   lcl   ucl    wt method     nboot pboot
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>      <int> <dbl>
#> 1 average       5  1.68    NA    NA    NA     1 parametric     0    NA

fit1 <- ssd_fit_dists(subset(data, Weight == 1), dists="lnorm")
ssd_hc(fit1)
#> # A tibble: 1 × 10
#>   dist    percent   est    se   lcl   ucl    wt method     nboot pboot
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>      <int> <dbl>
#> 1 average       5  1.04    NA    NA    NA     1 parametric     0    NA

fit1w <- ssd_fit_dists(subset(data, Weight == 1), dists="lnorm", weight = "Weight")
ssd_hc(fit1w)
#> # A tibble: 1 × 10
#>   dist    percent   est    se   lcl   ucl    wt method     nboot pboot
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>      <int> <dbl>
#> 1 average       5  1.04    NA    NA    NA     1 parametric     0    NA

fitallw10 <- ssd_fit_dists(data, dists="lnorm", weight = "Weight")
ssd_hc(fitallw10)
#> # A tibble: 1 × 10
#>   dist    percent   est    se   lcl   ucl    wt method     nboot pboot
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>      <int> <dbl>
#> 1 average       5 0.547    NA    NA    NA     1 parametric     0    NA
```

Created on 2024-01-22 with reprex v2.1.0
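The reprex depends on ssdtools and the ccme_boron data. The same pattern can be sketched in base R, assuming (as the thread later confirms) that weights enter by multiplying each observation's log-density; under that assumption the weighted lognormal fit has a closed form. The nine concentrations below are hypothetical, chosen only so that three of them play the role of the down-weighted upper points, and `fit_lnorm_w()` and `hc5()` are illustrative helpers, not ssdtools functions.

```r
# Hypothetical toxicity concentrations (NOT ssddata::ccme_boron):
# nine values whose natural logs are 1, 2, ..., 9.
conc <- exp(1:9)
# Same weighting rule as the reprex: the points ranked above 6 get 1/10
weight <- ifelse(rank(conc) > 6, 1/10, 1)

# Closed-form maximiser of sum(w * dlnorm(x, meanlog, sdlog, log = TRUE)):
# the weighted mean and weighted (divide-by-sum(w)) SD of log(x).
fit_lnorm_w <- function(x, w = rep(1, length(x))) {
  y <- log(x)
  meanlog <- sum(w * y) / sum(w)
  sdlog <- sqrt(sum(w * (y - meanlog)^2) / sum(w))
  c(meanlog = meanlog, sdlog = sdlog)
}

# HC5: the 5th percentile of the fitted lognormal
hc5 <- function(par) qlnorm(0.05, par[["meanlog"]], par[["sdlog"]])

keep <- weight == 1
hc_all   <- hc5(fit_lnorm_w(conc))                       # all data, unweighted
hc_sub   <- hc5(fit_lnorm_w(conc[keep]))                 # six lowest only
hc_sub_w <- hc5(fit_lnorm_w(conc[keep], weight[keep]))   # six lowest, weights all 1
hc_all_w <- hc5(fit_lnorm_w(conc, weight))               # all data, unequal weights

round(c(all = hc_all, subset = hc_sub, subset_w = hc_sub_w, all_w = hc_all_w), 2)
```

With these toy values the ordering matches the reprex: the weighted fit to all nine points gives a lower HC5 than either unweighted fit, and weights that are all 1 reproduce the unweighted subset fit.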

@joethorley
Collaborator Author

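The last value, from fitting all the data with unequal weights, appears to be incorrect: its estimate (0.547) is lower than both the unweighted fit to all the data (1.68) and the fit to only the six lowest concentrations (1.04).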

@joethorley joethorley changed the title Use of weighted appears to be resulting in an underestimate of the estimate Use of weighted results in unexpected behaviour Feb 5, 2024
@joethorley joethorley changed the title Use of weighted results in unexpected behaviour Use of weighted results in unexpected behaviour in the estimate Feb 5, 2024
@joethorley
Collaborator Author

However, the calculations are correct.

As David Fox confirmed:

I can further confirm that the introduction of weights in the log-likelihood function has been done correctly. I have validated this in several ways: (i) writing my own code in R; (ii) performing the calculations outside of R with different mathematical/statistical software; and (iii) comparing with fitdistrplus.

I believe the matter of SSD weighting is still an unresolved issue because what ssdtools (and fitdistrplus) are doing is not what ecotoxicologists want or expect.
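The first validation route, independent code in R, can be sketched as follows. This is not David Fox's code; it assumes only that each observation's lognormal log-density is multiplied by its weight, maximises that weighted log-likelihood numerically with `optim()`, and compares the result with the closed-form weighted estimates. The data are hypothetical.

```r
# Hypothetical data (not ssddata::ccme_boron): logs are 1, 2, ..., 9,
# with the three largest concentrations down-weighted to 1/10.
x <- exp(1:9)
w <- ifelse(rank(x) > 6, 1/10, 1)

# Weighted negative log-likelihood: each observation's lognormal
# log-density is multiplied by its weight. sdlog is optimised on the
# log scale so it stays positive.
weighted_nll <- function(par) {
  -sum(w * dlnorm(x, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

start <- c(mean(log(x)), log(sd(log(x))))
fit <- optim(start, weighted_nll, method = "BFGS",
             control = list(reltol = 1e-12))
numeric_est <- c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))

# Closed-form maximiser of the same weighted log-likelihood
meanlog_cf <- sum(w * log(x)) / sum(w)
closed_form <- c(meanlog = meanlog_cf,
                 sdlog = sqrt(sum(w * (log(x) - meanlog_cf)^2) / sum(w)))

rbind(numeric_est, closed_form)
```

Multiplying every weight by the same constant leaves both estimates unchanged, which is consistent with `fit1w` matching `fit1` in the reprex.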

@joethorley joethorley changed the title Use of weighted results in unexpected behaviour in the estimate Use of weighted results in surprising behaviour in the estimate Feb 5, 2024
@joethorley
Collaborator Author

We will turn off this functionality until a better solution is found.

@atillmanns
Collaborator

atillmanns commented Feb 14, 2024

I disagree with what David Fox has written above. Weighting the data in one tail of a distribution will likely lengthen the tails, because those data points become the focus of the fit. This could be useful if the species in the tails are of great importance and there are few data points to represent the class of taxa that are found to be sensitive. In that case it would be a useful tool to weight the distribution more heavily on these species, so that an estimate can be made that is protective of that group of species. The function is not erroneous; it just does not behave like weighting in a regression. I therefore think this functionality should not be removed.
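The effect on the tails can be illustrated by sweeping the weight on the upper points from 0 (dropping them) to 1 (an unweighted fit to everything). This is a sketch on hypothetical data; `fit_tail_weight()` is an illustrative helper that assumes the weighted lognormal log-likelihood described in this thread, not an ssdtools function.

```r
# Hypothetical concentrations (not ssddata::ccme_boron): logs are 1, 2, ..., 9
conc <- exp(1:9)
upper <- rank(conc) > 6  # the three least sensitive species

# Weighted lognormal MLE when the upper three points get weight `tail_w`
fit_tail_weight <- function(tail_w) {
  w <- ifelse(upper, tail_w, 1)
  y <- log(conc)
  meanlog <- sum(w * y) / sum(w)
  sdlog <- sqrt(sum(w * (y - meanlog)^2) / sum(w))
  c(tail_w = tail_w, meanlog = meanlog, sdlog = sdlog,
    hc5 = qlnorm(0.05, meanlog, sdlog))
}

# One row per tail weight: 0 drops the upper points, 1 is the unweighted fit
sweep <- t(sapply(c(0, 0.1, 0.5, 1), fit_tail_weight))
round(sweep, 3)
```

In this toy example the HC5 is not monotone in the tail weight: intermediate weights give a lower HC5 than either endpoint. The down-weighted points still sit far from the weighted mean, and because their deviations enter `sdlog` squared, even a small weight widens the fitted distribution and lengthens its lower tail.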

@beckyfisher
Contributor

One potential resolution for this is to leave the weighting function based on log-likelihood available, but include a section in the technical details vignette explaining that the current implementation is correct, but that this may result in unexpected behavior. Text from the Phase III report should make a good starting point for this vignette update.
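A vignette section along these lines could start from the weighted log-likelihood itself. Assuming, as confirmed earlier in this thread, that each observation's log-density is multiplied by its weight $w_i$, the lognormal estimates and the HC5 take the following closed form, where $z_{0.05}$ is the 5th percentile of the standard normal distribution:

```latex
\ell_w(\mu, \sigma) = \sum_{i=1}^{n} w_i \log f(x_i \mid \mu, \sigma)

\hat{\mu} = \frac{\sum_{i=1}^{n} w_i \log x_i}{\sum_{i=1}^{n} w_i},
\qquad
\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} w_i \left(\log x_i - \hat{\mu}\right)^2}{\sum_{i=1}^{n} w_i}

\widehat{\mathrm{HC}}_5 = \exp\!\left(\hat{\mu} + z_{0.05}\,\hat{\sigma}\right),
\qquad z_{0.05} \approx -1.645
```

Only the ratios of the weights matter, and because $\hat{\sigma}^2$ uses squared deviations, a small weight on a point far from $\hat{\mu}$ can still widen the fitted distribution.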

@atillmanns
Collaborator

I agree. I think we just need to explain what weighting the tails means when fitting distributions using MLE and how this can be useful for ecotoxicologists.


3 participants