Conformal prediction with conditional guarantees #455

Open · wants to merge 171 commits into base: master
4a367bd
ADD: first implementation of the CCP method
May 28, 2024
fffd511
UPD: increase test_results_with_constant_sample_weights assert_allclo…
Damien-Bouet May 28, 2024
56c67cf
MOVE PhiFunction into utils folder
Damien-Bouet May 31, 2024
765c70a
ADD Polynomial and Gaussian PhiFunctions
Damien-Bouet May 31, 2024
c8c004d
UPD docstrings and return self on fit and calibrate
Damien-Bouet May 31, 2024
1767c1e
FIX: tests
Damien-Bouet May 31, 2024
8a46555
ADD: Paper simulations reproduction
Damien-Bouet May 31, 2024
d75a48b
FIX: tests and coverage
Damien-Bouet Jun 3, 2024
e29d0d7
FIX: paper simulation
Damien-Bouet Jun 3, 2024
fb4dbcb
Remove Literal for python 3.7 compatibility
Damien-Bouet Jun 3, 2024
16c7069
UPD: sample_weight_test tol values
Damien-Bouet Jun 3, 2024
b32626e
RMV: seaborn from paper simulation imports
Damien-Bouet Jun 3, 2024
4f4447e
FIX: linting
Damien-Bouet Jun 3, 2024
015bdb7
FIX: useless seaborn grid style
Damien-Bouet Jun 3, 2024
22a901d
FIX: Gaussian exp formula
Damien-Bouet Jun 4, 2024
db23dc0
MOVE: check_parameters in outside of init
Damien-Bouet Jun 4, 2024
7f2cf56
FIX: Gaussian exp formula test
Damien-Bouet Jun 4, 2024
24d8e19
MOVE: PhiFunctions import from regression to regression.utils
Damien-Bouet Jun 4, 2024
023207f
UPD: Add fit/calib _attributes and Base classes inheritence
Damien-Bouet Jun 5, 2024
79efe4e
UPD: Improve parameters checks
Damien-Bouet Jun 5, 2024
c278273
MOVE: check_estimator into utils
Damien-Bouet Jun 5, 2024
595b037
UPD: CCP docstrings
Damien-Bouet Jun 5, 2024
b6735c2
UPD: Return predictions only if alpha is None
Damien-Bouet Jun 5, 2024
e48a8e6
UPD: Improve and functions
Damien-Bouet Jun 5, 2024
5158462
UPD: Convert PhiFunction into a Abstract class
Damien-Bouet Jun 5, 2024
2fcc380
ADD: CustomPhiFunction
Damien-Bouet Jun 5, 2024
88e9273
UPD: Tests
Damien-Bouet Jun 5, 2024
a376dd5
reduce Gibbs paper simulation runtime
Damien-Bouet Jun 5, 2024
c2481f3
RENAME: move CCP on the template fit/predict (with fit_estimator, fit…
Damien-Bouet Jun 5, 2024
3885491
FIX: forgot to stage a line from 'UPD: Convert PhiFunction into a Abs…
Damien-Bouet Jun 5, 2024
f728d77
FIX: array error for np.std
Damien-Bouet Jun 5, 2024
aeb7979
FIX: some forgotten fit_calibrator renaming
Damien-Bouet Jun 6, 2024
f848813
ADD: _is_fitted function (almost the copy of the private sklearn.util…
Damien-Bouet Jun 6, 2024
dc19045
typing
Damien-Bouet Jun 6, 2024
4fdd852
typing again...
Damien-Bouet Jun 6, 2024
b0e22ec
UPD: CustomPhiFunction can now take PhiFunction instances in function…
Damien-Bouet Jun 6, 2024
293f085
Merge branch 'master' into 449-cp-with-conditional-guarantees
Jun 6, 2024
809a39f
RENAME: check_estimator_regression
Damien-Bouet Jun 6, 2024
4cf1a9f
RMV: Exemple from MapieCCPRegressor
Damien-Bouet Jun 6, 2024
c588ec8
UPD: make fit method mandatory and use check_is_fitted from sklearn
Damien-Bouet Jun 10, 2024
8048220
MOVE: Externalise some utils functions
Damien-Bouet Jun 10, 2024
1adf14d
MOVE: PhiFunctions into phi_function folder
Damien-Bouet Jun 10, 2024
ec4ce10
FIX: Coverage
Damien-Bouet Jun 10, 2024
b8be35e
ADD: PhiFunction multiplication
Damien-Bouet Jun 10, 2024
99ea3fe
FIX: coverage
Damien-Bouet Jun 10, 2024
db360f9
MOVE and RENAME: PhiFunctions in calibrators/ccp
Damien-Bouet Jun 11, 2024
0a7ec52
ADD: Abstract 'Calibrator' class
Damien-Bouet Jun 11, 2024
e9e0f43
UPD: externalise MapieCCPRegressor into abstract SplitMapie and move …
Damien-Bouet Jun 12, 2024
f7e8296
RENAME tests
Damien-Bouet Jun 12, 2024
e45bf48
ADD: Draft of Classification, to assess the generaliation capacities …
Damien-Bouet Jun 12, 2024
f08f8ae
UPD: improve abstract calibrator class signature
Damien-Bouet Jun 13, 2024
d437750
FIX: Coverage
Damien-Bouet Jun 13, 2024
33573a3
UPD: docstring
Damien-Bouet Jun 14, 2024
9168201
UPD: docstrings and rename
Damien-Bouet Jun 14, 2024
328130c
ADD: demo notebook CCP Regression
Damien-Bouet Jun 14, 2024
61e52b1
MERGE master
Damien-Bouet Jun 17, 2024
fbd8bd0
UPD: linting, tests and coverage
Damien-Bouet Jun 17, 2024
cfe299f
UPD: move reg_param into init
Damien-Bouet Jun 17, 2024
6db9867
ADD: author
Damien-Bouet Jun 17, 2024
68f704d
FIX: has no attribute
Damien-Bouet Jun 17, 2024
e8914a0
REMOVE: CCP Docstring example
Damien-Bouet Jun 17, 2024
afacd1a
REMOVE: example results
Damien-Bouet Jun 17, 2024
2538ca0
FIX: ccp_regression_demo
Damien-Bouet Jun 17, 2024
73c3a29
ADD: tutorial_ccp_CandC.ipynb
Damien-Bouet Jun 17, 2024
cb80f3b
ADD: ccp_tutorial notebook in readthedocs
Damien-Bouet Jun 17, 2024
f1e3be4
UPD: ccp tuto
Damien-Bouet Jun 17, 2024
950f02d
FIX: remove seaborn import
Damien-Bouet Jun 18, 2024
a7a3297
FIX: isort imports
Damien-Bouet Jun 18, 2024
cc41209
UPD: ccp tutorial
Damien-Bouet Jun 18, 2024
3cfdba1
UPD: remove multipliers from CCPCalibrator init and remove assert
Damien-Bouet Jun 18, 2024
ef8e378
DEL: ccp notebook moved in the doc, not usefull anymore
Damien-Bouet Jun 18, 2024
2ddc543
FIX: typo in docstrings
Damien-Bouet Jun 18, 2024
2e5918f
MOVE: check_calibrator in calibrators.utils
Damien-Bouet Jun 18, 2024
64be82e
UPD: docstrings and minor fix
Damien-Bouet Jun 18, 2024
cad8e28
ADD: notebook tutorial_ccp_CandC in regression notebooks doc
Damien-Bouet Jun 18, 2024
dca25d3
ADD: test to check equivalence of new and old implementation of stand…
Damien-Bouet Jun 18, 2024
b5ed289
UPD: typos
Damien-Bouet Jun 21, 2024
9bb8863
ADD: perfect width in ccp tutorial plots
Damien-Bouet Jun 21, 2024
9115a87
UPD: Change regularization from L2 to L1
Damien-Bouet Jun 21, 2024
1eed6de
FIX: ccp tuto plot
Damien-Bouet Jun 21, 2024
8247e30
FIX: ccp tuto
Damien-Bouet Jun 21, 2024
e04ab4a
UPD: only sample gaussian points where multipliers values are not zer…
Damien-Bouet Jul 15, 2024
909ec93
UPD: gaussian default value set to 20
Damien-Bouet Jul 16, 2024
60bda1a
FIX: calib_kwargs bug fix
Damien-Bouet Jul 16, 2024
d14fa1b
MOVE: ccp null feature warning call
Damien-Bouet Jul 16, 2024
89e31b9
UPD: calib kwargs docstring
Damien-Bouet Jul 16, 2024
43dd443
UPD: multiply default sigma value by dnum of dimensions
Damien-Bouet Jul 16, 2024
0ab0d77
UPD: docstrings and some renaming
Damien-Bouet Jul 20, 2024
f3d272a
FIX: calib_kwargs bug and linting
Damien-Bouet Jul 22, 2024
907aec6
RMV warning in optional arg in custim ccp
Damien-Bouet Jul 22, 2024
3de4ef3
FIX: multiplier impact on normalize and sigma
Damien-Bouet Jul 23, 2024
e704700
UPD: ccp tutorial
Damien-Bouet Jul 23, 2024
2900605
ADD: ccp in api doc
Damien-Bouet Jul 23, 2024
a54aad8
UPD: add ccp tutorial conclusion and remove typo
Damien-Bouet Jul 23, 2024
8f12ac4
FIX typo in ccp_tuto
Damien-Bouet Jul 23, 2024
44c78fa
ADD: CCP theoretical description doc
Damien-Bouet Jul 24, 2024
fea21c4
FIX: ccp theory doc
Damien-Bouet Jul 25, 2024
28ff85b
RENAME compile_functions_warnings_errors utils function and update er…
Damien-Bouet Jul 25, 2024
4c560fd
RENAME ccp calibrator test functions
Damien-Bouet Jul 25, 2024
b31609c
Try to fix the doc
Damien-Bouet Jul 25, 2024
2babfa2
UPD: ccp_CandC notebook
Damien-Bouet Jul 25, 2024
71eaa1c
MOVE CCP doc into a new section
Damien-Bouet Jul 25, 2024
c67bb57
ADD tuto papier reference link
Damien-Bouet Jul 25, 2024
391f529
ADD: calibrator doc
Damien-Bouet Jul 25, 2024
ddc9b52
UPD: minor corrections in the doc
Damien-Bouet Jul 26, 2024
f3e0d32
MOVE: BaseCalibrator import
Damien-Bouet Jul 26, 2024
00bfd27
UPD docstrings and add ccp reference
Damien-Bouet Jul 26, 2024
c1a00dc
RMV not reproductible warning
Damien-Bouet Jul 26, 2024
fefd068
Merge branch 'master' into 449-cp-with-conditional-guarantees
Damien-Bouet Jul 26, 2024
20037f7
REFACTO: Adapte the PR to the new Classifier refacto
Damien-Bouet Jul 29, 2024
8ac6ae9
FIX: tests
Damien-Bouet Jul 29, 2024
add96e6
UNDO changes in sets.utils
Damien-Bouet Jul 29, 2024
4cb0aba
Linting
Damien-Bouet Jul 29, 2024
1afcf27
UPD: change naive by standard in doc
Damien-Bouet Jul 29, 2024
09881aa
ADD: classification main class and tuto
Damien-Bouet Jul 30, 2024
b31f9da
ADD: SplitCPClassifier tests
Damien-Bouet Jul 30, 2024
6d94658
FIX: coverage
Damien-Bouet Jul 30, 2024
24ab0fc
FIX: coverage
Damien-Bouet Jul 30, 2024
360cb39
FIX test
Damien-Bouet Jul 30, 2024
269546d
FIX docstring example
Damien-Bouet Jul 30, 2024
9c0b09f
UPD: change optimizer to SLSQP
Damien-Bouet Jul 31, 2024
20e94b2
UPD: change optimize to SLSQP
Damien-Bouet Jul 31, 2024
66fe957
UPD: update CandC notebook after changing optimizer
Damien-Bouet Jul 31, 2024
abe9bf9
FIX: tests
Damien-Bouet Jul 31, 2024
3c0f1c7
FIX tests
Damien-Bouet Jul 31, 2024
e82b1c7
UPD: fix coverage
Damien-Bouet Jul 31, 2024
42e1c60
Merge branch '449-cp-with-conditional-guarantees' into 499-adaptation…
Damien-Bouet Jul 31, 2024
40bf93a
FIX: typo
Damien-Bouet Jul 31, 2024
d53be61
FIX typo
Damien-Bouet Jul 31, 2024
ced2086
UPD: theoretical description
Damien-Bouet Aug 5, 2024
0562e79
ADD: reference in README
Damien-Bouet Aug 5, 2024
a6cf257
UPD: HISTORY with new CCP content
Damien-Bouet Aug 5, 2024
8ca56d9
UPD: theoretical description
Damien-Bouet Aug 5, 2024
2fc54fd
ADD: reference in README
Damien-Bouet Aug 5, 2024
26f07da
UPD: HISTORY with new CCP content
Damien-Bouet Aug 5, 2024
c33ed18
UPD: theoretical doc update and typo
Damien-Bouet Aug 7, 2024
db375a3
UPD: remove sample_weights and corrected alpha in the calibration ste…
Damien-Bouet Aug 7, 2024
53837bd
UPD: Typos in the doc
Damien-Bouet Aug 7, 2024
57e15f8
linting
Damien-Bouet Aug 7, 2024
9fa15fe
UPD: test values
Damien-Bouet Aug 7, 2024
926d3b6
Merge branch '449-cp-with-conditional-guarantees' into 506-ccp-reopti…
Damien-Bouet Aug 8, 2024
0ecabba
UPD: re optimize at inference time and add unsafe mode for old method
Damien-Bouet Aug 8, 2024
ab14963
Merge branch 'master' into 449-cp-with-conditional-guarantees
Damien-Bouet Aug 8, 2024
744d56f
typo
Damien-Bouet Aug 8, 2024
b4f06e8
Merge branch '449-cp-with-conditional-guarantees' into 499-adaptation…
Damien-Bouet Aug 8, 2024
171cd6d
Merge branch '449-cp-with-conditional-guarantees' into 506-ccp-reopti…
Damien-Bouet Aug 8, 2024
1de9cf4
tests
Damien-Bouet Aug 8, 2024
1feb27b
RMV: example from SplitCPClassifier doc
Damien-Bouet Aug 8, 2024
2b43867
FIX: tests
Damien-Bouet Aug 8, 2024
b054bfa
REMOVE warning about stochastic behavior
Damien-Bouet Aug 9, 2024
096b4b4
UPD: add :class: tag in docstrings
Damien-Bouet Aug 9, 2024
1ded613
UPD: optimize starting from n points value
Damien-Bouet Aug 9, 2024
600047b
UPD: remove commented section
Damien-Bouet Aug 9, 2024
14e05b9
UPD: add :class: tag in docstrings
Damien-Bouet Aug 9, 2024
10a419e
UPD: doc
Damien-Bouet Aug 9, 2024
0e5930b
Merge branch '449-cp-with-conditional-guarantees' into 499-adaptation…
Damien-Bouet Aug 9, 2024
90f865f
UPD: shorten giibs result reproduction duration
Damien-Bouet Aug 9, 2024
038f863
UPD: activate unsafe_approximation in the doc example
Damien-Bouet Aug 9, 2024
f9af89a
Apply suggestions from code review
thibaultcordier Oct 22, 2024
49080cc
Merge pull request #500 from scikit-learn-contrib/499-adaptation-ccp-…
thibaultcordier Oct 22, 2024
2b7207d
Merge branch 'master' into 449-cp-with-conditional-guarantees
thibaultcordier Oct 22, 2024
2788dbd
Merge branch '449-cp-with-conditional-guarantees' into 506-ccp-reopti…
Oct 22, 2024
6a35307
Add approximation boolean for classification
Oct 22, 2024
48b8730
UPD: typo
thibaultcordier Oct 22, 2024
a3664f5
Merge pull request #507 from scikit-learn-contrib/506-ccp-reoptimize-…
thibaultcordier Oct 22, 2024
6421813
Move HISTORY.rst new features block
Oct 23, 2024
6c4a3b2
Move calibrators into future + Add details in theorical explanations
Oct 23, 2024
0a8bc05
Change path access of the classes
Oct 23, 2024
6827b6d
FIX: change import path
Oct 23, 2024
ccc4057
FIX: change import path
Oct 23, 2024
4976082
Merge branch 'master' into 449-cp-with-conditional-guarantees
jawadhussein462 Nov 7, 2024
1 change: 1 addition & 0 deletions AUTHORS.rst
@@ -43,5 +43,6 @@ Contributors
* Ambros Marzetta <ambrosm>
* Carl McBride Ellis <Carl-McBride-Ellis>
* Baptiste Calot <[email protected]>
* Damien Bouet <[email protected]>
* Leonardo Garma <[email protected]>
To be continued ...
4 changes: 4 additions & 0 deletions HISTORY.rst
@@ -5,6 +5,10 @@ History
0.9.x (2024-xx-xx)
------------------

* Add `SplitCPRegressor`, based on the new `SplitCP` abstract class, to support the new CCP method
* Add `GaussianCCP`, `PolynomialCCP` and `CustomCCP`, based on `CCPCalibrator`, to implement the Conditional CP method
* Add the `StandardCalibrator`, to reproduce standard CP and make sure that `SplitCPRegressor` is implemented correctly
* Add the CCP documentation, tutorials and demo notebooks
* Fix issue 525 in the contribution guidelines: syntax errors in hyperlinks and other formatting issues
* Bump wheel version to avoid known security vulnerabilities

2 changes: 2 additions & 0 deletions README.rst
@@ -229,6 +229,8 @@ and with the financial support from Région Ile de France and Confiance.ai.

[12] Angelopoulos, Anastasios N., Stephen, Bates, Emmanuel J. Candès, et al. "Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control." (2022).

[13] Isaac Gibbs, John J. Cherian, and Emmanuel J. Candès, "Conformal Prediction With Conditional Guarantees" (2023).


📝 License
==========
29 changes: 28 additions & 1 deletion doc/api.rst
@@ -1,3 +1,6 @@

.. _api:

#########
MAPIE API
#########
@@ -109,9 +112,33 @@ Resampling
subsample.BlockBootstrap
subsample.Subsample

New Split CP class
===================

.. autosummary::
    :toctree: generated/
    :template: class.rst

    future.split.base.SplitCP
    future.split.SplitCPRegressor
    future.split.SplitCPClassifier

Calibrators
===========

.. autosummary::
    :toctree: generated/
    :template: class.rst

    future.calibrators.base.BaseCalibrator
    future.calibrators.StandardCalibrator
    future.calibrators.ccp.CCPCalibrator
    future.calibrators.ccp.CustomCCP
    future.calibrators.ccp.PolynomialCCP
    future.calibrators.ccp.GaussianCCP

Mondrian
==========
========

.. autosummary::
    :toctree: generated/
10 changes: 10 additions & 0 deletions doc/index.rst
@@ -32,6 +32,16 @@
examples_classification/index
notebooks_classification

.. toctree::
    :maxdepth: 2
    :hidden:
    :caption: CONDITIONAL CP

    theoretical_description_ccp
    theoretical_description_calibrators
    examples_regression/4-tutorials/plot_ccp_tutorial
    examples_classification/4-tutorials/plot_ccp_class_tutorial

.. toctree::
    :maxdepth: 2
    :hidden:
3 changes: 3 additions & 0 deletions doc/notebooks_regression.rst
@@ -16,3 +16,6 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


4. Leverage the CCP method to have adaptive prediction intervals on the Communities and Crime Dataset : `ccp_CandC_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/tutorial_ccp_CandC.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

80 changes: 80 additions & 0 deletions doc/theoretical_description_calibrators.rst
@@ -0,0 +1,80 @@
.. title:: Calibrators : contents

.. _theoretical_description_calibrators:

###############
Calibrators
###############

In Mapie, the conformalisation step is done directly inside
:class:`~mapie.regression.MapieRegressor` or :class:`~mapie.classification.MapieClassifier`,
depending on the ``method`` argument.
However, when implementing the new CCP method, we decided to externalize the conformalisation
step into a new object named ``calibrator``, to allow more freedom and customisation.

The new classes (:class:`~mapie.future.split.SplitCPRegressor` and :class:`~mapie.future.split.SplitCPClassifier`) have 3 steps:

1. ``fit_predictor``, which fits the sklearn estimator
2. ``fit_calibrator``, which does the conformalisation (calling ``calibrator.fit``)
3. ``predict``, which computes the predictions and calls ``calibrator.predict`` to create the prediction intervals

Thus, the calibrators, based on :class:`~mapie.future.calibrators.base.BaseCalibrator`,
must implement two methods: ``fit`` and ``predict``.

Mapie currently implements calibrators for the CCP method (and the standard method),
but any conformal prediction method can be implemented by the user as
a subclass of :class:`~mapie.future.calibrators.base.BaseCalibrator`.

Example of standard split CP:
------------------------------

For instance, the :class:`~mapie.future.calibrators.StandardCalibrator` implements
the :ref:`standard split method<theoretical_description_regression_standard>`:

* ``.fit`` computes :math:`\hat{q}_{n, \alpha}^+`, the :math:`(1-\alpha)` quantile of the conformity score distribution
* ``.predict`` computes the prediction intervals as :math:`\hat{\mu}(X_{n+1}) \pm \hat{q}_{n, \alpha}^+`
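
To make the ``fit``/``predict`` contract concrete, here is a minimal, self-contained sketch of such a standard split calibrator in plain NumPy (the class name and signatures are illustrative, not the actual :class:`~mapie.future.calibrators.StandardCalibrator` API):

```python
import numpy as np


class StandardSplitSketch:
    """Illustrative split-conformal calibrator: ``fit`` learns the quantile
    q_hat from the calibration residuals, ``predict`` builds the intervals
    mu(X) +/- q_hat (not the actual MAPIE implementation)."""

    def __init__(self, alpha):
        self.alpha = alpha

    def fit(self, y_calib, y_pred_calib):
        # Conformity scores: absolute residuals on the calibration set.
        scores = np.abs(np.asarray(y_calib) - np.asarray(y_pred_calib))
        n = len(scores)
        # Finite-sample corrected (1 - alpha) quantile: ceil((n+1)(1-alpha))/n.
        level = min(np.ceil((n + 1) * (1 - self.alpha)) / n, 1.0)
        self.q_hat_ = np.quantile(scores, level, method="higher")
        return self

    def predict(self, y_pred_test):
        y_pred_test = np.asarray(y_pred_test)
        # One [lower, upper] pair per test prediction.
        return np.stack(
            [y_pred_test - self.q_hat_, y_pred_test + self.q_hat_], axis=1
        )
```

On calibration data, ``fit`` stores the corrected quantile ``q_hat_``; ``predict`` then returns one ``[lower, upper]`` pair per test prediction.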


The CCP calibrators:
---------------------
For the CCP method (see :ref:`theoretical description<theoretical_description_ccp>`),
:class:`~mapie.future.calibrators.ccp.CCPCalibrator` implements:

* ``.fit`` solves the optimization problem (see :ref:`step 2<theoretical_description_ccp_control_steps>`) to find the optimal :math:`\hat{g}`
* ``.predict`` computes the prediction intervals using :math:`\hat{g}` (see :ref:`step 3<theoretical_description_ccp_control_steps>`)

We then only need a way to define our :math:`\Phi` function (see :ref:`step 1<theoretical_description_ccp_control_steps>`).

Multiple subclasses are implemented to facilitate the definition of the :math:`\Phi` function,
but others can be implemented by the user as subclasses of :class:`~mapie.future.calibrators.ccp.CCPCalibrator`.

1. :class:`~mapie.future.calibrators.ccp.CustomCCP`

This class lets you define the :math:`\Phi` function by hand, as a
concatenation of functions which create features from ``X`` (or potentially ``y_pred`` or any exogenous variable ``z``).

It can also be used to concatenate other :class:`~mapie.future.calibrators.ccp.CCPCalibrator` instances.

2. :class:`~mapie.future.calibrators.ccp.PolynomialCCP`

It creates polynomial features of ``X`` (or potentially ``y_pred`` or any exogenous variable ``z``).
The same result could be obtained by hand with `CustomCCP`; it is just a shortcut to simplify the creation of :math:`\Phi`.

3. :class:`~mapie.future.calibrators.ccp.GaussianCCP`

It creates Gaussian kernel features, as done in the method's paper :ref:`[1]<theoretical_description_calibrators_references>`.
It samples random points :math:`X_j` from :math:`\{ X_i \}_i`, then computes the Gaussian kernel
between each sampled point and :math:`X_{n+1}`, with a given standard deviation :math:`\sigma`
(which can be optimized using cross-validation), following the formula:

.. math::
    \forall j \in \{ \text{sampled indices} \}, \quad \Phi(X_{n+1})_j = \exp \left( -\frac{(X_{n+1} - X_j)^2}{2\sigma^2} \right)
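
As a rough illustration, this feature map can be sketched in plain NumPy (hypothetical helper name, not the ``GaussianCCP`` internals):

```python
import numpy as np


def gaussian_features(X, centers, sigma):
    """One Gaussian kernel feature per sampled center X_j:
    Phi(x)_j = exp(-||x - X_j||^2 / (2 * sigma**2)).
    Illustrative helper, not the GaussianCCP implementation."""
    X = np.atleast_2d(np.asarray(X, dtype=float))              # (n_samples, n_features)
    centers = np.atleast_2d(np.asarray(centers, dtype=float))  # (n_centers, n_features)
    # Squared distances between every sample and every center.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * sigma**2))                   # (n_samples, n_centers)
```

Each column of the result plays the role of one :math:`\Phi(X_{n+1})_j` above; in the actual method the centers are sampled from the calibration points.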


.. _theoretical_description_calibrators_references:

References
==========

[1] Isaac Gibbs, John J. Cherian, and Emmanuel J. Candès,
"Conformal Prediction With Conditional Guarantees", `arXiv <https://arxiv.org/abs/2305.12616>`_, 2023.
201 changes: 201 additions & 0 deletions doc/theoretical_description_ccp.rst
@@ -0,0 +1,201 @@
.. title:: Theoretical Description : contents

.. _theoretical_description_ccp:

########################
Theoretical Description
########################

The Conditional Conformal Prediction (CCP) method :ref:`[1]<theoretical_description_ccp_references>` is a model-agnostic conformal prediction method which
can create adaptive prediction intervals.

In MAPIE, this method has several advantages:

- It is model-agnostic (it depends only on the predictions, not on the model, unlike CQR)
- It can create very adaptive intervals (with a varying width which truly reflects the model uncertainty)
- while providing coverage guarantees on all sub-groups of interest (avoiding biases)
- with the possibility to inject prior knowledge about the data or the model

However, we will also see its disadvantages:

- The adaptivity depends on the calibrator we use: it can be difficult to choose the correct calibrator,
  with the best parameters.
- The calibration, and even more the inference, are much slower than for the other methods.
  We can reduce the inference time using ``unsafe_approximation=True``,
  but we then lose the strong theoretical guarantees and risk a small miscoverage
  (even if, most of the time, the coverage is achieved).

To conclude, it can create more adaptive intervals than the other methods,
but it can be difficult to find the best settings (calibrator type and parameters)
and the computational cost can be high.

How does it work?
===================

Method's intuition
--------------------

We recall that the `standard split method` estimates the absolute residuals by a constant :math:`\hat{q}_{n, \alpha}^+`
(the :math:`(1-\alpha)` quantile of :math:`\{|Y_i-\hat{\mu}(X_i)|\}_{1 \leq i \leq n}`). Then, the prediction interval is:

.. math:: \hat{C}_{n, \alpha}^{\text{split}}(X_{n+1}) = \hat{\mu}(X_{n+1}) \pm \hat{q}_{n, \alpha}^+

The idea of the `CCP` method is to learn not a constant, but a function :math:`\hat{q}(X)`,
to have a different interval width depending on the value of :math:`X`. Then, we would have:

.. math:: \hat{C}_{n, \alpha}^{\text{CCP}}(X_{n+1}) = \hat{\mu}(X_{n+1}) \pm \hat{q}(X_{n+1})

To be able to find the best function while keeping some coverage guarantees,
we must select this function inside some defined class of functions :math:`\mathcal{F}`.

This method is motivated by the following equivalence:

.. math::
    \begin{array}{c}
    \mathbb{P}(Y_{n+1} \in \hat{C}(X_{n+1}) \; | \; X_{n+1}=x) = 1 - \alpha, \quad \text{for all } x \\
    \Longleftrightarrow \\
    \mathbb{E} \left[ f(X_{n+1}) \left( \mathbb{I} \left\{ Y_{n+1} \in \hat{C}(X_{n+1}) \right\} - (1 - \alpha) \right) \right] = 0, \quad \text{for all measurable } f
    \end{array}

This is the equation corresponding to perfect conditional coverage, which is theoretically impossible to obtain.
Relaxing this objective by replacing "all measurable f" with "all f belonging to some class :math:`\mathcal{F}`"
is a way to get close to perfect conditional coverage.


.. _theoretical_description_ccp_control_steps:

The method follows 3 steps:
----------------------------

1. Choose a class of functions. The simplest approach is to choose a class of finite dimension :math:`d \in \mathbb{N}`,
   defined, for a user-chosen feature map :math:`\Phi` taking values in :math:`\mathbb{R}^d`, as:

.. math::
    \mathcal{F} = \left\{ \Phi (\cdot)^T \beta : \beta \in \mathbb{R}^d \right\}

2. Find the best function of this class by solving the following optimization problem:

.. note:: It is actually a quantile regression between the transformation :math:`\Phi (X)` and the conformity scores :math:`S`.

Considering an upper bound :math:`M` of the conformity scores,
such that :math:`S_{n+1} < M`:

.. math::
    \hat{g}_M^{n+1} := \text{arg}\min_{g \in \mathcal{F}} \; \frac{1}{n+1} \sum_{i=1}^n{l_{\alpha} (g(X_i), S_i)} \; + \frac{1}{n+1}l_{\alpha} (g(X_{n+1}), M)

.. warning::
    In the :ref:`API<api>`, we use by default :math:`M=\max(\{S_i\}_{i\leq n})`,
    the maximum conformity score of the calibration set,
    but you can specify it yourself if a bound is known, considering your data,
    model and conformity score.

    Moreover, it means that a small optimization is still performed
    for each test point :math:`X_{n+1}`. If you want to avoid that, you can
    use ``unsafe_approximation=True``, which only considers:

    .. math::
        \hat{g} := \text{arg}\min_{g \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^n{l_{\alpha} (g(X_i), S_i)}

    However, it may result in a small miscoverage.
    It is recommended to empirically check the resulting coverage on the test set.

3. We use this optimized function :math:`\hat{g}_M^{n+1}` to compute the prediction intervals:

.. math::
    \hat{C}_M^{n+1}(X_{n+1}) = \{ y : S(X_{n+1}, \: y) \leq \hat{g}_M^{n+1}(X_{n+1}) \}

.. note:: The formulas are generic and work with all conformity scores. In the case of the absolute residuals, we get:

    .. math::
        \hat{C}(X_{n+1}) = \hat{\mu}(X_{n+1}) \pm \hat{g}_M^{n+1}(X_{n+1})
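
Under simplifying assumptions (absolute-residual conformity scores, a hand-chosen intercept-plus-:math:`|x|` feature map, and SLSQP as the numerical optimizer), the three steps can be sketched end-to-end as follows; this is illustrative code, not the MAPIE implementation:

```python
import numpy as np
from scipy.optimize import minimize


def pinball_loss(pred, s, tau):
    # Quantile (pinball) loss l_alpha at level tau = 1 - alpha.
    diff = s - pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))


def phi(X):
    # Step 1: a hand-chosen finite-dimensional class (intercept + |x|).
    X = np.asarray(X, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(X), np.abs(X)])


def fit_g(X_calib, scores, x_test, alpha, M):
    # Step 2: quantile regression of the scores on Phi(X), with the
    # test point appended using the imputed score M.
    phi_aug = np.vstack([phi(X_calib), phi(x_test)])
    s_aug = np.append(scores, M)

    def objective(beta):
        return pinball_loss(phi_aug @ beta, s_aug, 1 - alpha)

    return minimize(objective, np.zeros(phi_aug.shape[1]), method="SLSQP").x


# Toy data where the noise (hence the conformity score) grows with |x|.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 500)
y = rng.normal(0.0, 0.1 + np.abs(x))
scores = np.abs(y - 0.0)                  # absolute residuals of a zero predictor
beta = fit_g(x, scores, x_test=0.9, alpha=0.1, M=scores.max())
half_width = (phi(0.9) @ beta)[0]         # Step 3: g(X_{n+1}) = Phi(X_{n+1})^T beta
```

The fitted ``half_width`` at :math:`x = 0.9` should be larger than at points near zero, reflecting the heteroscedastic noise of this toy example.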

.. _theoretical_description_ccp_control_coverage:

Coverage guarantees:
-----------------------

.. warning::
    The following guarantees assume that the approximation described above is not used, and that
    the chosen bound :math:`M` is indeed such that :math:`S_i < M` for every test index :math:`i`.

Following these steps, we have the coverage guarantee, :math:`\forall f \in \mathcal{F}`:

.. math::
    \mathbb{P}_f(Y_{n+1} \in \hat{C}_M^{n+1}(X_{n+1})) \geq 1 - \alpha

.. math::
    \text{and} \quad \left | \mathbb{E} \left[ f(X_{n+1}) \left(\mathbb{I} \left\{ Y_{n+1} \in \hat{C}_M^{n+1}(X_{n+1}) \right\} - (1 - \alpha) \right) \right] \right |
    \leq \frac{d}{n+1} \mathbb{E} \left[ \max_{1 \leq i \leq n+1} \left|f(X_i)\right| \right]

.. note::
    If we want to have a homogeneous coverage on some given groups in :math:`\mathcal{G}`, we can use
    :math:`\mathcal{F} = \{ x \mapsto \sum _{G \in \mathcal{G}} \; \beta_G \mathbb{I} \{ x \in G \} : \beta_G \in \mathbb{R} \}`.
    Then we have, :math:`\forall G \in \mathcal{G}`:

    .. math::
        1 - \alpha
        \leq \mathbb{P} \left( Y_{n+1} \in \hat{C}_M^{n+1}(X_{n+1}) \; | \; X_{n+1} \in G \right)
        \leq 1 - \alpha + \frac{|\mathcal{G}|}{(n+1) \mathbb{P}(X_{n+1} \in G)}
        = 1 - \alpha + \frac{\text{number of groups in } \mathcal{G}}{\text{number of samples of } \{X_i\} \text{ in } G}
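
Such a group-indicator feature map can be sketched as follows (hypothetical helper, plain NumPy):

```python
import numpy as np


def group_phi(X, groups):
    """Phi made of indicator features I{x in G}, one column per group G.
    `groups` is a list of boolean membership functions (illustrative helper)."""
    X = np.asarray(X)
    return np.stack([np.asarray(g(X), dtype=float) for g in groups], axis=-1)


# Example: a partition of the real line into two groups.
groups = [lambda x: x < 0, lambda x: x >= 0]
phi_matrix = group_phi(np.array([-1.5, 0.3, 2.0]), groups)
# phi_matrix is [[1, 0], [0, 1], [0, 1]]: one indicator column per group.
```

Since the groups here partition the space, the indicator columns sum to :math:`1` for every sample, which also provides the intercept term needed for marginal coverage.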

How to use it in practice?
============================

Creating a class of functions adapted to our needs
--------------------------------------------------

The following will provide some tips on how to use the method (for more practical examples, see
:doc:`examples_regression/4-tutorials/plot_ccp_tutorial` or
`How to leverage the CCP method on real data
<https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/tutorial_ccp_CandC.ipynb>`_
).

1. If you want a generally adaptive interval and you don't have prior
   knowledge about your data, you can use Gaussian kernels, implemented in Mapie
   in :class:`~mapie.future.calibrators.ccp.GaussianCCP`. See the API doc for more information.

2. If you want to avoid bias on sub-groups and ensure a homogeneous coverage on those,
   you can add indicator functions corresponding to these groups.

3. You can inject prior knowledge in the method using :class:`~mapie.future.calibrators.ccp.CustomCCP`,
   if you have information about the conformity score distribution
   (domains with different behavior, expected model uncertainty depending on a given feature, etc.).

4. Empirically check the obtained coverage on a test set, to make sure that the expected coverage is achieved.


Avoid miscoverage
--------------------

- | To guarantee marginal coverage, you need to have an intercept term in the :math:`\Phi` function (meaning, a feature equal to :math:`1` for all :math:`X_i`).
  | It corresponds, in the :ref:`API<api>`, to ``bias=True``.

- | Some miscoverage can come from the optimization process, which is
    solved with numerical methods and may fail to find the global minimum.
    If the target coverage is not achieved, you can try adding regularization
    to help the optimization process. You can also try reducing the number of dimensions :math:`d`
    or using a smoother :math:`\Phi` function, such as Gaussian kernels
    (indeed, using only indicator functions makes the optimization difficult).

.. warning::
    Adding some regularization will theoretically induce a miscoverage,
    as the optimum shifts slightly to reduce the regularization term.

    In practice, it may increase the coverage (as it helps the optimization converge),
    but it can also decrease it. Always empirically check the resulting coverage
    and avoid too big regularization terms (values below :math:`10^{-4}` are usually recommended).


- | Finally, if you have coverage issues because the optimization is difficult,
    you can artificially enforce a higher coverage by reducing the value of :math:`\alpha`.
    Evaluating the best adjusted :math:`\alpha` using cross-validation will ensure
    the same coverage on the test set (subject to variability due to the finite number of samples).


.. _theoretical_description_ccp_references:

References
==========

[1] Isaac Gibbs, John J. Cherian, and Emmanuel J. Candès,
"Conformal Prediction With Conditional Guarantees", `arXiv <https://arxiv.org/abs/2305.12616>`_, 2023.
3 changes: 3 additions & 0 deletions doc/theoretical_description_classification.rst
@@ -31,6 +31,9 @@ for at least :math:`90 \%` of the new test data points.
Note that the guarantee is possible only on the marginal coverage, and not on the conditional coverage
:math:`P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) | X_{n+1} = x_{n+1} \}` which depends on the location of the new test point in the distribution.


.. _theoretical_description_classification_lac:

1. LAC
------
