Merge pull request #26 from cmusso86/update_release
Completing the vignette and attempting a call-out
cmusso86 authored Jun 30, 2024
2 parents 94d02e4 + 139857b commit dcfbc98
Showing 7 changed files with 88 additions and 14 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,12 @@ Authors@R:
role = c("aut","ths", 'cph'),
email = "[email protected]",
comment = c(ORCID = "0000-0003-2009-4844")))
Description: Enables the diagnostics and enhancement of regression model calibration.It offers both global and local visualization tools for calibration diagnostics and provides one recalibration method: Torres R, Nott DJ, Sisson SA, Rodrigues T, Reis JG, Rodrigues GS (2024) <doi:10.48550/arXiv.2403.05756>. The method leverages on Probabilistic Integral Transform (PIT) values to both evaluate and perform the calibration of statistical models.
Description: Enables the diagnostics and enhancement of regression model calibration. It offers both global and local visualization tools for calibration diagnostics and provides one recalibration method: Torres R, Nott DJ, Sisson SA, Rodrigues T, Reis JG, Rodrigues GS (2024) <doi:10.48550/arXiv.2403.05756>. The method leverages Probability Integral Transform (PIT) values to both evaluate and perform the calibration of statistical models. For a more detailed description of the package, please refer to the bachelor's thesis linked below.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
URL: https://github.com/cmusso86/recalibratiNN, https://cmusso86.github.io/recalibratiNN/
URL: https://bdm.unb.br/handle/10483/38504, https://github.com/cmusso86/recalibratiNN, https://cmusso86.github.io/recalibratiNN/
BugReports: https://github.com/cmusso86/recalibratiNN/issues
Imports:
stats(>= 3.0.0),
Expand Down
3 changes: 2 additions & 1 deletion R/recalibrate.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
#' @description
#' This function offers recalibration techniques for regression models that assume Gaussian distributions by using the
#' Mean Squared Error (MSE) as the loss function. Based on the work by Torres R. et al. (2024), it supports
#' both local and global recalibration approaches to provide samples from a recalibrated predictive distribution.
#' both local and global recalibration approaches to provide samples from a recalibrated predictive distribution. A detailed algorithm can also be found in Musso C. (2023).
#'
#' @param yhat_new Numeric vector with predicted response values for the new (or test) set.
#' @param space_cal Numeric matrix or data frame representing the covariates/features of the calibration/validation set,
Expand Down Expand Up @@ -43,6 +43,7 @@
#'
#' @references
#' \insertRef{torres2024}{recalibratiNN}
#' \insertRef{musso2023}{recalibratiNN}
#'
#' @examples
#'
Expand Down
2 changes: 2 additions & 0 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ output: github_document
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
message = FALSE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "80%",
Expand Down
11 changes: 9 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,11 +46,15 @@ download.
``` r
if(!require(pacman)) install.packages("pacman")
pacman::p_load_current_gh("cmusso86/recalibratiNN")
#> crayon (1.5.2 -> 1.5.3) [CRAN]
#> cli (3.6.2 -> 3.6.3) [CRAN]
#>
#> The downloaded binary packages are in
#> /var/folders/rp/h9_9qkdd7c57z9_hytk4306h0000gn/T//Rtmpx2IcOw/downloaded_packages
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> checking for file ‘/private/var/folders/rp/h9_9qkdd7c57z9_hytk4306h0000gn/T/RtmpCPgUPL/remotes159257e41e1ea/cmusso86-recalibratiNN-c947b5d/DESCRIPTION’ ... ✔ checking for file ‘/private/var/folders/rp/h9_9qkdd7c57z9_hytk4306h0000gn/T/RtmpCPgUPL/remotes159257e41e1ea/cmusso86-recalibratiNN-c947b5d/DESCRIPTION’
#> checking for file ‘/private/var/folders/rp/h9_9qkdd7c57z9_hytk4306h0000gn/T/Rtmpx2IcOw/remotes17f90582977a6/cmusso86-recalibratiNN-94d02e4/DESCRIPTION’ ... ✔ checking for file ‘/private/var/folders/rp/h9_9qkdd7c57z9_hytk4306h0000gn/T/Rtmpx2IcOw/remotes17f90582977a6/cmusso86-recalibratiNN-94d02e4/DESCRIPTION’
#> ─ preparing ‘recalibratiNN’:
#> checking DESCRIPTION meta-information ... ✔ checking DESCRIPTION meta-information
#> checking DESCRIPTION meta-information ... ✔ checking DESCRIPTION meta-information
#> ─ installing the package to process help pages
#> Loading required namespace: recalibratiNN
#> ─ saving partial Rd database
Expand All @@ -60,6 +64,9 @@ pacman::p_load_current_gh("cmusso86/recalibratiNN")
#> WARNING: Added dependency on R >= 3.5.0 because serialized objects in
#> serialize/load version 3 cannot be read in older versions of R.
#> File(s) containing such objects:
#> ‘recalibratiNN/inst/extdata/mse_cal.rds’
#> ‘recalibratiNN/inst/extdata/y_hat_cal.rds’
#> ‘recalibratiNN/inst/extdata/y_hat_test.rds’
#> ‘recalibratiNN/vignettes/mse_cal.rds’
#> ‘recalibratiNN/vignettes/y_hat_cal.rds’
#> ‘recalibratiNN/vignettes/y_hat_test.rds’
Expand Down
11 changes: 11 additions & 0 deletions inst/REFERENCES.bib
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,14 @@ @article{torres2024
acm_classes={G.3; I.5.1; I.6.4},
doi={10.48550/arXiv.2403.05756}
}


@misc{musso2023,
  author = {Carolina Musso},
  title = {Recalibration of Gaussian Neural Network Regression Models: The RecalibratiNN Package},
  year = {2023},
  month = {Dec},
  howpublished = {Undergraduate Thesis (Bachelor in Statistics), University of Brasília},
  note = {Available at: \url{https://bdm.unb.br/handle/10483/38504}}
}
3 changes: 2 additions & 1 deletion man/recalibrate.Rd


68 changes: 60 additions & 8 deletions vignettes/simple_mlp.Rmd
Original file line number Diff line number Diff line change
@@ -1,22 +1,59 @@
---
title: "ANN ajusted to bidimensional data"
subtitle: "A visual example of how to recalibrate a neural network"
title: "Recalibrating the predictions of an ANN."
subtitle: "A visual example of recalibration using bidimensional data."
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{ANN adjusted to bidimensional data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
header-includes:
- \usepackage{amsmath}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
message = FALSE,
comment = "#>"
)
```

## PROBLEM

The calibration of a model can be evaluated by comparing observed values with their respective estimated conditional (or predictive) distributions. This evaluation can be conducted globally, examining overall calibration, or locally, investigating calibration in specific regions of the covariate space. To illustrate how the package can improve a model's calibration, let's consider an artificial example.

```{r setup}
library(recalibratiNN)
```

In the following example, we will recalibrate the predictions of an artificial neural network (ANN) fitted to non-linear, heteroscedastic data. First, we simulate the data as follows:


We define the sample size:
\begin{equation}
n = 10000
\end{equation}

The vectors \(x_1\) and \(x_2\) are generated from uniform distributions:
\begin{equation}
x_1 \sim \text{Uniform}(-3, 3)
\end{equation}
\begin{equation}
x_2 \sim \text{Uniform}(-5, 5)
\end{equation}

We define the function \(\mu\) as:
\begin{equation}
\mu(x) = \left| x_1^3 - 50 \sin(x_2) + 30 \right|
\end{equation}

The response variable \(y\) is generated from a normal distribution with mean \(\mu\) and standard deviation \(20 \left| \frac{x_2}{x_1 + 10} \right|\):
\begin{equation}
y \sim \mathcal{N}\left(\mu, 20 \left| \frac{x_2}{x_1 + 10} \right|\right)
\end{equation}
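The simulation defined above can be sketched in R as follows (the seed and object names are assumptions for illustration; the vignette generates the data in a hidden chunk that may differ):

```r
# Simulate non-linear, heteroscedastic data as described above
set.seed(42)  # assumed seed, for reproducibility

n  <- 10000
x1 <- runif(n, -3, 3)
x2 <- runif(n, -5, 5)

# Conditional mean and heteroscedastic standard deviation
mu    <- abs(x1^3 - 50 * sin(x2) + 30)
sigma <- 20 * abs(x2 / (x1 + 10))

y <- rnorm(n, mean = mu, sd = sigma)
```

Because the standard deviation depends on `x1` and `x2`, a model that assumes a single fixed variance (such as one estimated via a global MSE) will necessarily be miscalibrated in parts of the covariate space.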

```{r echo = F}
library(glue)
library(RANN)
Expand Down Expand Up @@ -56,6 +93,7 @@ y_test <- y[(split2*n+1):n]
```

Now, this toy model was trained using the Keras framework with a TensorFlow backend. The architecture consists of 3 hidden layers with ReLU activation functions and dropout for regularization, as follows:

```{r, eval=F}
model_nn <- keras_model_sequential()
Expand Down Expand Up @@ -118,22 +156,27 @@ y_hat_cal <- readRDS(file_path2)|> as.numeric()
file_path3 <- system.file("extdata", "y_hat_test.rds", package = "recalibratiNN")
y_hat_test <- readRDS(file_path3)|> as.numeric()
```

## MISCALIBRATION DIAGNOSTICS

Now, we can evaluate the calibration of the model with the appropriate functions of the recalibratiNN package. First, we will calculate the Probability Integral Transform (PIT) values using the `PIT_global` function and visualize them using the `gg_PIT_global` function.

As the resulting graph shows, the model is globally miscalibrated: its PIT values deviate markedly from a uniform distribution.

```{r}
## Global calibrations
pit <- PIT_global(ycal = y_cal,
yhat = y_hat_cal,
mse = MSE_cal)
gg_PIT_global(pit)
```

For comparison, we will also calculate the local PIT values using the local functions. This is important because the model may be well calibrated globally but not locally. In other words, it may exhibit varying or even opposing patterns of miscalibration throughout the covariate space, which can be compensated for when analyzed globally.

Here, we can see that the model is miscalibrated in different ways across regions of the covariate space.

```{r}
pit_local <- PIT_local(xcal = x_cal,
ycal = y_cal,
Expand All @@ -143,9 +186,16 @@ pit_local <- PIT_local(xcal = x_cal,
gg_PIT_local(pit_local,
facet = TRUE)
```

Since this example uses bidimensional data, we can visualize the calibration of the model over a surface representing the covariate space. In this graph, we use a 95% confidence interval centered on the mean predicted by the model, with a fixed variance estimated by the Mean Squared Error (MSE). When the true value falls within the interval, the point is colored greenish; when it falls outside, it is colored dark blue.

::: {.callout-note}
Note that this visualization is not part of the recalibratiNN package since it can only be applied to bidimensional data, which is not typically the case when adjusting neural networks. This example was used specifically to demonstrate (mis)calibration visually and to make the concept more tangible.
:::

The following graph illustrates the original coverage of the model, which is around 90%. Thus, globally, we observe that the model underestimates the true uncertainty of the data (90% < 95%). However, despite the global coverage being approximately 90%, there are specific regions where the model consistently makes more incorrect predictions (falling well below the 95% mark), while predicting accurately (100%) within other regions. Although this last part may initially seem favorable (more accuracy is typically desirable), it indicates that the model does not adequately capture the uncertainty of the predictions (it overestimates it in those regions). This highlights the importance of interpreting predictions probabilistically, considering a distribution rather than just a point prediction.

```{r}
coverage_model <- tibble(
x1cal = x_test[,1],
Expand All @@ -160,19 +210,21 @@ mutate(lwr = qnorm(0.05, y_hat, sqrt(MSE_cal)),
)
coverage_model |>
arrange(CI) |>
ggplot() +
geom_point(aes(x1cal,
x2cal,
color = CI),
alpha = 0.8)+
alpha = 0.9,
size = 3)+
labs(x="x1" , y="x2",
title = glue("Original coverage: {coverage_model$coverage[1]} %"))+
scale_color_manual("Confidence Interval",
values = c("in" = "aquamarine3",
"out" = "steelblue4"))+
theme_classic()
```

## RECALIBRATION
```{r}
recalibrated <-
recalibrate(
Expand Down
