
RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems

This repository contains the official dataset, evaluation scripts, and benchmark details for our AAAI-accepted paper:

RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems


🌟 Overview

RecToM is a benchmark designed to rigorously evaluate the Theory of Mind (ToM) capabilities of Large Language Models (LLMs) within recommendation dialogues.
To succeed, LLMs must infer users’ Beliefs, Desires, and Intentions during multi-turn interactions, a skill essential for building context-aware and effective recommender systems.

πŸ” Key Features

🧭 Multi-choice Strategy

A single utterance may express multiple distinct intentions. RecToM captures this natural conversational complexity.

🔎 Multi-granular Intentions

Intentions are hierarchical: an utterance may contain both a high-level purpose and fine-grained contextual sub-intentions.

📚 Multi-dimensional Beliefs

Beliefs about items (e.g., movies) involve multiple interconnected aspects:
who introduced the item, whether the seeker has watched it, and the seeker's level of preference or acceptance.

🎯 Multi-concurrent Desires

Users frequently pursue multiple goals simultaneously, such as exploring new items while comparing alternatives.


📊 Dataset Structure & Statistics

RecToM contains 20,524 expertly annotated dialogue–query pairs across 10 ToM reasoning categories.

✨ Question Type Statistics

# Options | Answer Type |
| Question Type | Quantity | # Options | Answer Type |
|---|---|---|---|
| Desire (Seek) | 1,448 | 2 | single |
| Coarse Intention (Rec / Seek) | 2,205 / 2,205 | 5 / 4 | multiple |
| Fine Intention (Rec / Seek) | 2,205 / 2,205 | 10 / 16 | multiple |
| Belief (Rec) | 1,762 | 7 | single |
| Prediction (Rec / Seek) | 2,098 / 2,149 | 5 / 4 | multiple |
| Judgement (Rec / Seek) | 2,098 / 2,149 | 2 / 2 | single |

Table: Statistics of question types and option distributions in RecToM.
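The answer type determines how a response can be scored: single-answer questions admit exact match, while multiple-answer questions require comparing sets of selected options. The sketch below illustrates one plausible way to grade each type; it is an assumption for illustration, not the official RecToM metric, and the option-letter format is hypothetical.

```python
# Hypothetical scoring sketch -- the exact-match / set-F1 choices and the
# option-letter format are assumptions, not the official RecToM protocol.
from typing import Set


def score_single(predicted: str, gold: str) -> float:
    """Single-answer questions (e.g., Desire, Belief, Judgement): exact match."""
    return 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0


def score_multiple(predicted: Set[str], gold: Set[str]) -> float:
    """Multiple-answer questions (e.g., Intention, Prediction): F1 over option sets."""
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Example: a multiple-answer question whose gold options are {"B", "F", "K"}.
print(score_multiple({"B", "F"}, {"B", "F", "K"}))  # 0.8
```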

🔧 Evaluation

You can run the evaluation using the provided script:

```bash
bash evaluate/12_run.sh
```
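The script wraps the full pipeline. For orientation, a minimal sketch of what such an evaluation loop generally looks like is shown below; the dataset filename, JSON field names, and the `query_model` stub are placeholders and assumptions, not part of the released code.

```python
# Minimal evaluation-loop sketch. File path, field names, and query_model()
# are hypothetical -- see evaluate/12_run.sh for the actual pipeline.
import json


def query_model(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its raw answer string."""
    raise NotImplementedError


def evaluate(path: str = "data/rectom_questions.jsonl") -> float:
    correct, total = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)  # assumed fields: dialogue, question, options, answer
            prompt = (
                f"{item['dialogue']}\n\n{item['question']}\n"
                + "\n".join(item["options"])
                + "\nAnswer with the option letter(s)."
            )
            prediction = query_model(prompt)
            correct += prediction.strip() == item["answer"].strip()
            total += 1
    return correct / total if total else 0.0
```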

📚 Citation

If you use RecToM in your research, please cite our paper:

```bibtex
@inproceedings{li2026rectom,
  title     = {RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems},
  author    = {Li, Mengfan and Shi, Xuanhua and Deng, Yang},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-26)},
  year      = {2026},
  publisher = {AAAI Press},
  note      = {To appear}
}
```
