|
3 | 3 | Improved Ensemble parameter-efficiency with Packed-Ensembles
|
4 | 4 | ============================================================
|
5 | 5 |
|
6 |
| -*This tutorial is adapted from a notebook part of a lecture given at the `Helmholtz AI Conference <https://haicon24.de/>`_ by Sebastian Starke, Peter Steinbach, Gianni Franchi, and Olivier Laurent.* |
| 6 | +*This tutorial is adapted from a notebook that was part of a lecture given at the* |conference|_ *by Sebastian Starke, Peter Steinbach, Gianni Franchi, and Olivier Laurent.* |
| 7 | +
|
| 8 | +.. _conference: https://haicon24.de/ |
| 9 | +
|
| 10 | +.. |conference| replace:: *Helmholtz AI Conference* |
7 | 11 |
|
8 | 12 | In this notebook, we will work on the MNIST dataset, which was introduced by Corinna Cortes and Christopher J.C. Burges, and later modified by Yann LeCun in the foundational paper:
|
9 | 13 |
|
|
12 | 16 | The MNIST dataset consists of 70,000 grayscale images of handwritten digits from 0 to 9, each 28x28 pixels in size. The task is to classify the images into their respective digits. The dataset can be downloaded automatically using the `torchvision` library.
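
As a quick illustration (the root path and the transform below are placeholders, not necessarily the settings used later in this tutorial), the download boils down to:

.. code-block:: python

    from torchvision import datasets, transforms

    # Download the train and test splits of MNIST to ./data if they are not already present
    transform = transforms.ToTensor()
    train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
    test_set = datasets.MNIST(root="./data", train=False, download=True, transform=transform)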
|
13 | 17 |
|
14 | 18 | In this notebook, we will train a model and an ensemble on this task and evaluate their performance. The evaluation will rely on the following metrics (a small computation sketch follows the list):
|
| 19 | +
|
15 | 20 | - Accuracy: the proportion of correctly classified images,
|
16 | 21 | - Brier score: a measure of the quality of the predicted probabilities,
|
17 | 22 | - Calibration error: a measure of the calibration of the predicted probabilities,
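
As a rough sketch of what these three metrics compute (a simplified illustration on random placeholder tensors, not TorchUncertainty's exact implementation):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    # Placeholder predictions: class probabilities (n_samples, n_classes) and ground-truth labels
    probs = torch.rand(1000, 10).softmax(dim=-1)
    targets = torch.randint(0, 10, (1000,))

    confidences, predictions = probs.max(dim=-1)
    correct = predictions.eq(targets).float()

    # Accuracy: proportion of correctly classified images
    accuracy = correct.mean()

    # Brier score: squared error between the predicted probabilities and the one-hot ground truth
    brier = (probs - F.one_hot(targets, num_classes=10)).pow(2).sum(dim=-1).mean()

    # (Expected) calibration error: average gap between confidence and accuracy over confidence bins
    n_bins = 15
    edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > low) & (confidences <= high)
        if in_bin.any():
            ece += in_bin.float().mean() * (correct[in_bin].mean() - confidences[in_bin].mean()).abs()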
|
@@ -174,13 +179,16 @@ def optim_recipe(model, lr_mult: float = 1.0):
|
174 | 179 | # This table provides a lot of information:
|
175 | 180 | #
|
176 | 181 | # **OOD Detection: Binary Classification MNIST vs. FashionMNIST**
|
| 182 | +# |
177 | 183 | # - AUPR/AUROC/FPR95: These metrics measure the quality of the OOD detection (see the sketch below). The higher the better for AUPR and AUROC; the lower the better for FPR95.
|
178 | 184 | #
|
179 | 185 | # **Calibration: Reliability of the Predictions**
|
| 186 | +# |
180 | 187 | # - ECE: Expected Calibration Error. The lower the better.
|
181 | 188 | # - aECE: Adaptive Expected Calibration Error. The lower the better. (A more precise, adaptive-binning variant of the ECE.)
|
182 | 189 | #
|
183 | 190 | # **Classification Performance**
|
| 191 | +# |
184 | 192 | # - Accuracy: The ratio of correctly classified images. The higher the better.
|
185 | 193 | # - Brier: The quality of the predicted probabilities (Mean Squared Error of the predictions vs. ground-truth). The lower the better.
|
186 | 194 | # - Negative Log-Likelihood: The value of the loss on the test set. The lower the better.
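
# As a rough sketch of how the OOD metrics above can be obtained (a simplified illustration on
# random placeholder scores, not the exact routine used by TorchUncertainty), one can use the
# maximum softmax probability as a confidence score and treat OOD samples as the positive class:

import torch
from torchmetrics.functional.classification import (
    binary_auroc,
    binary_average_precision,
    binary_roc,
)

# Placeholder softmax outputs on MNIST (in-distribution) and FashionMNIST (OOD)
id_probs = torch.rand(1000, 10).softmax(dim=-1)
ood_probs = torch.rand(1000, 10).softmax(dim=-1)

# Higher score = more likely to be out-of-distribution
scores = torch.cat([1 - id_probs.max(dim=-1).values, 1 - ood_probs.max(dim=-1).values])
is_ood = torch.cat([torch.zeros(1000), torch.ones(1000)]).long()

aupr = binary_average_precision(scores, is_ood)
auroc = binary_auroc(scores, is_ood)
fpr, tpr, _ = binary_roc(scores, is_ood)
fpr95 = fpr[tpr >= 0.95][0]  # false positive rate at 95% true positive rate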
|
@@ -236,7 +244,7 @@ def optim_recipe(model, lr_mult: float = 1.0):
|
236 | 244 | # We need to multiply the learning rate by 2 to account for the fact that we have 2 models
|
237 | 245 | # in the ensemble and that we average the loss over all the predictions.
|
238 | 246 | #
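
# As a small usage sketch (the linear model below is only a stand-in for the Packed-Ensemble
# trained in this tutorial), the recipe for the 2-member ensemble is built with a doubled
# learning rate using the ``optim_recipe`` function defined above:

import torch

placeholder_ensemble = torch.nn.Linear(28 * 28, 10)  # stand-in for the actual ensemble model
ensemble_recipe = optim_recipe(placeholder_ensemble, lr_mult=2.0)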
|
239 |
| -# #### Downloading the pre-trained models |
| 247 | +# **Downloading the pre-trained models** |
240 | 248 | #
|
241 | 249 | # We have put the pre-trained models on Hugging Face; you can download them with the utility function
|
242 | 250 | # "hf_hub_download" imported just below. These models are trained for 75 epochs and are therefore not
|
@@ -393,9 +401,11 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
|
393 | 401 | # In contrast to calibration, the values of the confidence scores themselves are not important; only their order matters. *Ideally, the best model will order all the correct predictions first, and all the incorrect predictions last.* In that case, there exists a threshold such that all predictions above it are correct and all predictions below it are incorrect.
|
394 | 402 | #
|
395 | 403 | # In TorchUncertainty, we look at 3 different metrics for selective classification:
|
| 404 | +# |
396 | 405 | # - **AURC**: The area under the Risk (% of errors) vs. Coverage (% of classified samples) curve. This curve expresses how the risk of the model evolves as we increase the coverage (the proportion of predictions that are above the selection threshold). This metric will be minimized by a model able to perfectly separate the correct and incorrect predictions.
|
397 | 406 | #
|
398 | 407 | # The following metrics are computed at a fixed risk or coverage level and have practical interest. The idea is that you can set the selection threshold to achieve a certain level of risk or coverage, as required by the technical constraints of your application (see the sketch below):
|
| 408 | +# |
399 | 409 | # - **Coverage at 5% Risk**: The proportion of predictions that are above the selection threshold when it is set so that the risk equals 5%. Set the risk level according to your application constraints. The higher the better.
|
400 | 410 | # - **Risk at 80% Coverage**: The proportion of errors when the coverage is set to 80%. Set the coverage level according to your application constraints. The lower the better.
|
401 | 411 | #
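
# As a rough sketch of how these quantities can be derived from per-sample confidences and
# correctness (a simplified illustration on random placeholder tensors, not TorchUncertainty's
# exact implementation):

import torch

confidences = torch.rand(1000)              # confidence score of each prediction
correct = (torch.rand(1000) > 0.2).float()  # 1 if the prediction is correct, 0 otherwise

# Sort the predictions from most to least confident and sweep the selection threshold
order = confidences.argsort(descending=True)
errors = 1 - correct[order]
kept = torch.arange(1, len(errors) + 1)
coverage = kept / len(errors)           # proportion of samples kept at each threshold
risk = errors.cumsum(dim=0) / kept      # error rate among the kept samples

aurc = torch.trapezoid(risk, coverage)  # area under the risk-coverage curve
cov_at_5_risk = coverage[risk <= 0.05].max() if (risk <= 0.05).any() else torch.zeros(())
risk_at_80_cov = risk[coverage >= 0.80][0]  # risk when keeping the 80% most confident predictions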
|
|