Local Temperature Scaling (LTS)

This is the official repository for

Local Temperature Scaling for Probability Calibration
Zhipeng Ding, Xu Han, Peirong Liu, and Marc Niethammer
ICCV 2021 (arXiv eprint)

If you use LTS or some part of the code, please cite:

@inproceedings{ding2021local,
  title={Local temperature scaling for probability calibration},
  author={Ding, Zhipeng and Han, Xu and Liu, Peirong and Niethammer, Marc},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6889--6899},
  year={2021}
}

Key Features

Different from previous probability calibration methods, LTS is a spatially localized probability calibration approach for semantic segmentation.

Spatially Localized Feature

In the following figure, Left: predicted probabilities (confidence) from a U-Net. Middle: average accuracy of each bin for a 10-bin, equal-width reliability diagram, indicating that different probability ranges need to be optimized at different locations. Right: the temperature value map obtained via optimization, revealing different optimal local temperature scaling values at different locations.

In the following two figures, the top row shows the global reliability diagrams of the different methods for the entire image. The three rows underneath show local reliability diagrams of the different methods for different local patches. Note that temperature scaling (TS) and image-based temperature scaling (IBTS) calibrate probabilities well across the entire image; visually, they are only slightly worse than LTS. On local patches, however, LTS still calibrates probabilities successfully while TS and IBTS cannot. In general, LTS improves local probability calibration.
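The core difference between global and local temperature scaling can be sketched in a few lines of NumPy. This is a hedged illustration of the concept only (toy shapes and temperature values, not the repository's code): TS divides every logit by a single scalar, while LTS divides by a per-pixel temperature map.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_ts(logits, T):
    """Temperature scaling (TS): one scalar T for the whole image."""
    return softmax(logits / T, axis=0)

def local_ts(logits, T_map):
    """Local temperature scaling (LTS): a per-pixel temperature map
    T_map of shape (H, W), broadcast over the class axis."""
    return softmax(logits / T_map[None, :, :], axis=0)

# Toy logit map: 3 classes over a 4x4 image.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 4)) * 5.0

# Temperatures > 1 soften overconfident regions; values < 1 sharpen
# underconfident ones. Here the top two rows are sharpened, the rest softened.
T_map = np.full((4, 4), 2.0)
T_map[:2, :] = 0.8

p_global = global_ts(logits, 2.0)
p_local = local_ts(logits, T_map)
```

With a single scalar, every location is softened by the same amount; the temperature map lets each pixel get its own correction, which is exactly what the local reliability diagrams above call for.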


Theoretical Justification

Using the KKT conditions, we can prove the following:

When the to-be-calibrated segmentation network is overconfident, minimizing the NLL w.r.t. TS, IBTS, and LTS yields solutions that are also solutions of maximizing the entropy of the calibrated probabilities w.r.t. TS, IBTS, and LTS under the overconfidence condition.

A similar theorem in the Appendix validates the effectiveness of TS, IBTS, and LTS under the condition of underconfidence.
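The connection between NLL minimization and entropy maximization can be checked numerically on a toy overconfident classifier. This is a hedged, synthetic illustration (made-up logits and labels, not an experiment from the paper): we grid-search the scalar temperature that minimizes NLL and observe that it is greater than 1 and that it increases the entropy of the calibrated probabilities.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of temperature-scaled probabilities."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def mean_entropy(logits, T):
    p = softmax(logits / T)
    return -np.mean((p * np.log(p + 1e-12)).sum(axis=1))

rng = np.random.default_rng(1)
n, k = 2000, 4
# Overconfident toy model: very sharp logits (confidence ~0.999),
# but roughly 30% of the labels disagree with the prediction.
pred = rng.integers(0, k, size=n)
logits = np.full((n, k), -4.0)
logits[np.arange(n), pred] = 4.0
labels = np.where(rng.random(n) < 0.7, pred, rng.integers(0, k, size=n))

# Grid-search the NLL-optimal scalar temperature.
Ts = np.linspace(0.5, 10.0, 200)
T_star = Ts[np.argmin([nll(logits, labels, T) for T in Ts])]
```

Because the model is overconfident, the NLL-optimal temperature comes out well above 1, and the entropy of the scaled probabilities at `T_star` exceeds the entropy at `T = 1`, consistent with the theorem above.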

Implementation

The overall architecture for probability calibration via (local) temperature scaling is shown in the following figure. The output logit map of a pre-trained semantic segmentation network (Seg) is locally scaled to produce the calibrated probabilities. OP denotes either optimization or prediction via a deep convolutional network to obtain the (local) temperature values.

Specifically, in this paper we use a simple tree-like convolutional network (see figure below), as in (Lee et al.). However, other neural network architectures could also work, as illustrated by (Bai et al.). The following figures give a high-level illustration of the tree-like CNN: the left subfigure is for LTS and the right one for IBTS. Detailed descriptions can be found in the Appendix.
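The exact tree-like architecture is given in the Appendix; the following is only a rough, hypothetical PyTorch sketch of the general pattern (layer sizes and the class name `TemperatureNet` are illustrative, not the repository's code): a small CNN maps the logit map to a strictly positive per-pixel temperature map, which then divides the logits.

```python
import torch
import torch.nn as nn

class TemperatureNet(nn.Module):
    """Hypothetical sketch of an LTS calibration head: predicts a
    per-pixel temperature map from the segmentation logits.
    Layer sizes are illustrative, not the paper's exact architecture."""
    def __init__(self, num_classes, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1),
            nn.Softplus(),                 # temperatures must be positive
        )

    def forward(self, logits):
        T = self.net(logits) + 1e-6        # (B, 1, H, W), strictly positive
        return logits / T                   # locally scaled logits

model = TemperatureNet(num_classes=11)     # CamVid is commonly used with 11 classes
logits = torch.randn(2, 11, 32, 32)
scaled = model(logits)
```

The `Softplus` output guarantees a positive temperature everywhere, and broadcasting the `(B, 1, H, W)` map over the class dimension scales every class logit at a pixel by the same local temperature, as temperature scaling requires.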

Walk-through Example

As an example, we use the Tiramisu model for semantic segmentation on the CamVid dataset. Note that other deep segmentation networks and datasets can also be used.

Deep Semantic Segmentation Network

Tiramisu is a fully convolutional DenseNet. The implementation and training details can be found in this GitHub repository. You may need to modify the code to adapt it to your settings.
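Since the calibration stage consumes the segmentation network's logits, one convenient workflow is to cache them to disk once and reuse them while training the calibration model. This is a hedged sketch of that export step (the function name `save_logits` and the stand-in forward pass are hypothetical, not part of the repository):

```python
import os
import tempfile
import numpy as np

def save_logits(model_forward, images, out_path):
    """model_forward: callable mapping an image to a logit array (C, H, W).
    Stacks the logits for a list of images and saves them with np.save."""
    logits = np.stack([model_forward(img) for img in images])
    np.save(out_path, logits)
    return logits

# Stand-in for a real segmentation network's forward pass.
fake_forward = lambda img: np.random.default_rng(0).normal(size=(11, 8, 8))

imgs = [np.zeros((3, 8, 8))] * 4
out_path = os.path.join(tempfile.gettempdir(), "camvid_logits.npy")
saved = save_logits(fake_forward, imgs, out_path)
```

Caching logits this way keeps the (frozen) segmentation network out of the calibration training loop entirely, which speeds up experimentation with different calibration models.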

Train Calibration Models

After obtaining logits from the segmentation model and properly setting up the dataloader, the next step is to train the calibration model. To train LTS, simply run

python Tiramisu_calibration.py --gpu 0 --model-name LTS --epochs 200 --batch-size 4 --lr 1e-4 --seed 2021 --save-per-epoch 1 
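At a high level, the training objective is the NLL of the locally scaled logits with the segmentation network frozen. The following is only an assumed minimal sketch of such a loop (a toy one-layer temperature head, not the actual `Tiramisu_calibration.py` script):

```python
import torch
import torch.nn as nn

# Toy temperature head: predicts a positive per-pixel temperature map.
# The real model is the tree-like CNN described above; this stand-in
# only illustrates the objective.
temp_net = nn.Sequential(nn.Conv2d(11, 1, 3, padding=1), nn.Softplus())
opt = torch.optim.Adam(temp_net.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(logits, labels):
    """logits: (B, C, H, W) from the frozen segmentation net;
    labels: (B, H, W) ground-truth class indices."""
    T = temp_net(logits) + 1e-6            # per-pixel temperature map
    loss = criterion(logits / T, labels)   # NLL of the calibrated logits
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

logits = torch.randn(2, 11, 16, 16)
labels = torch.randint(0, 11, (2, 16, 16))
loss0 = train_step(logits, labels)
```

Note that only the temperature head's parameters are updated; the segmentation logits enter the loop as fixed inputs, which matches the post-hoc calibration setting.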

The table below is a collection of probability calibration models that can be used as baselines. You can pull these repositories and modify the code accordingly.

| Method | Implementation |
| --- | --- |
| Temperature Scaling | TS |
| Vector Scaling | VS |
| Isotonic Regression | IsoReg |
| Ensemble Temperature Scaling | ETS |
| Dirichlet Calibration | DirODIR |
| Focal Loss Calibration | FL |
| Maximum Mean Calibration Error | MMCE |

Evaluation

To evaluate the four calibration metrics defined in the paper (ECE, MCE, SCE, and ACE), simply run

python probability_measure_CamVid.py --gpu 0 --model_name LTS 
python probability_measure_Local_CamVid.py --gpu 0 --model_name LTS
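For reference, the most common of these metrics, ECE, averages the per-bin gap between confidence and accuracy over equal-width confidence bins (the 10-bin setting mentioned earlier). This is a generic, hedged sketch of the standard definition, not the repository's evaluation code:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Standard ECE with equal-width confidence bins: the weighted average
    absolute gap between mean confidence and accuracy within each bin.
    probs: (N, C) predicted probabilities; labels: (N,) class indices."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

# Perfectly calibrated toy case: confidence 0.75 everywhere, and exactly
# 75% of the predictions are correct, so the gap (and ECE) is zero.
probs = np.tile([0.75, 0.25], (100, 1))
labels = np.array([0] * 75 + [1] * 25)
ece = expected_calibration_error(probs, labels)
```

MCE replaces the weighted average with the maximum bin gap, while SCE and ACE refine the binning (per-class and adaptive-width, respectively); see the paper for their exact definitions.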

For the multi-atlas segmentation experiment that validates the probability calibration, please refer to VoteNet-Family for details.