
LLM Uncertainty Head

License: MIT · Python 3.11 · Hugging Face · EMNLP 2025

Installation | Basic usage

Pre-trained UQ heads are supervised auxiliary modules for LLMs that substantially improve their ability to capture uncertainty. Their Transformer-based architecture and informative features derived from LLM attention maps yield strong performance, as well as cross-lingual and cross-domain generalization.

Installation

pip install git+https://github.com/IINemo/lm-polygraph.git@dev
pip install git+https://github.com/IINemo/llm-uncertainty-head.git

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from luh import AutoUncertaintyHead

from lm_polygraph import CausalLMWithUncertainty
from luh.calculator_infer_luh import CalculatorInferLuh
from luh.luh_estimator_dummy import LuhEstimatorDummy


model_name = "mistralai/Mistral-7B-Instruct-v0.2"
uhead_name = "llm-uncertainty-head/uhead_Mistral-7B-Instruct-v0.2"

# Load the base LLM, its tokenizer, and the matching pre-trained uncertainty head
llm = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(
    model_name)
tokenizer.pad_token = tokenizer.eos_token
uhead = AutoUncertaintyHead.from_pretrained(
    uhead_name, base_model=llm)

# Configure generation and the statistic calculator that runs the uncertainty head
generation_config = GenerationConfig.from_pretrained(model_name)
args_generate = {"generation_config": generation_config,
                 "max_new_tokens": 50}
calc_infer_llm = CalculatorInferLuh(uhead,
                                    tokenize=True,
                                    args_generate=args_generate,
                                    device="cuda",
                                    generations_cache_dir="",
                                    predict_token_uncertainties=True)

# Wrap the LLM so that generate() also returns uncertainty scores
estimator = LuhEstimatorDummy()
llm_adapter = CausalLMWithUncertainty(llm, tokenizer=tokenizer, stat_calculators=[calc_infer_llm], estimator=estimator)

# prepare text ...
messages = [
    [
        {
            "role": "user", 
            "content": "In which year did the programming language Mercury first appear? Answer with a year only."
        }
    ]
]
# The correct answer is 1995
chat_messages = [tokenizer.apply_chat_template(m, tokenize=False, add_bos_token=False) for m in messages]
inputs = tokenizer(chat_messages, return_tensors="pt", padding=True, truncation=True, add_special_tokens=False).to("cuda")

# Generate an answer and obtain the uncertainty head's score for it
output = llm_adapter.generate(inputs["input_ids"])
print(output["uncertainty_score"])
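The returned dictionary exposes the head's prediction under `uncertainty_score`. Below is a minimal sketch of how the score might be consumed, assuming higher values indicate a less reliable answer; the threshold and the per-sample shape of the score are illustrative assumptions, not part of the library's documented API:

import numpy as np

# Illustrative post-processing of the uncertainty score (assumptions:
# higher = less reliable, one score per input sequence, ad-hoc threshold).
threshold = 0.5  # hypothetical cut-off; calibrate on held-out data in practice
scores = np.atleast_1d(np.asarray(output["uncertainty_score"], dtype=float))
for i, score in enumerate(scores):
    verdict = "possible hallucination" if score > threshold else "likely reliable"
    print(f"sample {i}: uncertainty={score:.3f} -> {verdict}")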

Training

To train a UHead on your own data, run the following from the top-level package directory:

CUDA_VISIBLE_DEVICES=0 python -m luh.cli.train.run_train_uhead \
    --config-dir=./configs \
    --config-name=run_train_uhead.yaml \
    dataset.path="<path to your dataset, e.g. hf:llm-uncertainty-head/train_akimbio_mistral>" \
    model.pretrained_model_name_or_path="<your model name, e.g. mistralai/Mistral-7B-Instruct-v0.2>"
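After training finishes, the resulting checkpoint can presumably be loaded in the same way as the released heads. A minimal sketch, where the local output path is hypothetical and depends on your training config:

from transformers import AutoModelForCausalLM
from luh import AutoUncertaintyHead

# Hypothetical local checkpoint directory produced by the training run above
trained_uhead_path = "./outputs/uhead_Mistral-7B-Instruct-v0.2"

llm = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", device_map="cuda")
uhead = AutoUncertaintyHead.from_pretrained(
    trained_uhead_path, base_model=llm)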

Cite

@inproceedings{shelmanov2025head,
  title        = {A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs},
  author       = {Shelmanov, Artem and Fadeeva, Ekaterina and Tsvigun, Akim and Tsvigun, Ivan and Xie, Zhuohan and Kiselev, Igor and Daheim, Nico and Zhang, Caiqi and Vazhentsev, Artem and Sachan, Mrinmaya and others},
  booktitle    = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year         = {2025},
  address      = {Abu Dhabi, United Arab Emirates},
  publisher    = {Association for Computational Linguistics},
  pages        = {35700--35719},
  url          = {https://aclanthology.org/2025.emnlp-main.1809/}
}
