This software project accompanies the research paper TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles. The paper has been accepted to Findings of ACL 2024.
TOAD is a synthetic task-oriented dialog (TOD) dataset that simulates realistic app context interactions and provides multiple system response styles, which vary in verbosity and in whether they mirror user expressions.
Preparation:
- Install dependencies from requirements.txt.
- We use an OpenAI-compatible API to make requests to LLMs. Set the environment variables OPENAI_API_KEY, BASE_URL (optional), and ENGINE (e.g. "gpt-3.5-turbo") to configure the backend LLM. You can use a dotenv file.
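As a quick sanity check before running the pipeline, the variables listed above can be validated with a small helper. This is a minimal sketch, not the project's own loading code: the function name load_llm_config and the returned dictionary keys are hypothetical, and it assumes OPENAI_API_KEY and ENGINE are required while BASE_URL may be left unset.

```python
import os
from typing import Mapping, Optional


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the backend settings named in this README (hypothetical helper)."""
    env = os.environ if env is None else env

    # Assumption: the API key and the engine name must both be present.
    missing = [name for name in ("OPENAI_API_KEY", "ENGINE") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {
        "api_key": env["OPENAI_API_KEY"],
        "engine": env["ENGINE"],
        # BASE_URL is optional; None is assumed to mean "use the client's default endpoint".
        "base_url": env.get("BASE_URL") or None,
    }
```

Calling load_llm_config() with no arguments reads from the current process environment, so it also picks up values exported from a dotenv file by whatever loader you use.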
Synthesis: The data synthesis pipeline is divided into 3 steps. The generated files will be stored in data/.
Step 1: Context generation
- Run python -m context_generation.occupation_generator to synthesize occupations.json (you can skip this step and re-use the existing file).
- Run python -m context_generation.persona_generator to synthesize personas.jsonl using occupations.
- Run python -m context_generation.context_generator to synthesize contexts.jsonl using personas.
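Because each generator consumes the output of the previous one, the three commands above have to run in order. The following sketch chains them from Python; the helper names step1_commands and run_step1 are hypothetical, and it assumes the modules need no extra arguments, as in the commands above.

```python
import subprocess
import sys

# Module names are taken from the commands above, listed in dependency order.
STEP1_MODULES = [
    "context_generation.occupation_generator",
    "context_generation.persona_generator",
    "context_generation.context_generator",
]


def step1_commands(skip_occupations: bool = False) -> list:
    """Build one `python -m <module>` command per generator."""
    # The occupation generator can be skipped when re-using an existing occupations.json.
    modules = STEP1_MODULES[1:] if skip_occupations else STEP1_MODULES
    return [[sys.executable, "-m", module] for module in modules]


def run_step1(skip_occupations: bool = False, runner=subprocess.run) -> None:
    """Run the generators sequentially, stopping at the first failure."""
    for command in step1_commands(skip_occupations):
        # check=True raises CalledProcessError if a generator exits with a non-zero status.
        runner(command, check=True)
```

Run run_step1(skip_occupations=True) from the repository root to regenerate personas and contexts while keeping the existing occupations file.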
Step 2: Dialog generation
- Run code in dialog_generation to synthesize dialogs based on contexts. Example command:
python -m dialog_generation.main \
--phenomena='compound' \
--output_dir='data/dialogs' \
--number_of_data=1000 \
--full_options_mode \
--thread_num=15
- --phenomena specifies the phenomena to be used in dialog generation. It can be one of compound, compositional, none.
- --output_dir specifies the path to save the generated dialogs.
- --number_of_data specifies the number of dialogs to generate.
- --full_options_mode generates all 6 response style options.
- --thread_num specifies the number of threads to run in parallel.
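When generating dialogs for several phenomena, it can help to assemble the command from Python and validate the option values first. This is a sketch under stated assumptions: dialog_command is a hypothetical helper, and it only uses the flags and the three phenomena values documented above.

```python
import sys

# Allowed values for --phenomena, as listed above.
PHENOMENA = ("compound", "compositional", "none")


def dialog_command(phenomena: str, output_dir: str, number_of_data: int,
                   thread_num: int, full_options_mode: bool = True) -> list:
    """Build the argument list for `python -m dialog_generation.main`."""
    if phenomena not in PHENOMENA:
        raise ValueError(f"phenomena must be one of {PHENOMENA}, got {phenomena!r}")

    command = [
        sys.executable, "-m", "dialog_generation.main",
        f"--phenomena={phenomena}",
        f"--output_dir={output_dir}",
        f"--number_of_data={number_of_data}",
        f"--thread_num={thread_num}",
    ]
    # --full_options_mode is a switch without a value, so it is appended only when enabled.
    if full_options_mode:
        command.append("--full_options_mode")
    return command
```

Passing the result of dialog_command("compound", "data/dialogs", 1000, 15) to subprocess.run with check=True is equivalent to the example command. No quoting is needed because the arguments are passed as a list rather than through a shell.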
To customize dialog generation by modifying schema.json, please refer to the documentation in the dialog_generation directory.
Step 3: Quality control
- Run python -m quality_control.main to filter out inconsistent dialogs using the LLM.
@inproceedings{liu2024toad,
title = "{TOAD}: Task-Oriented Automatic Dialogs with Diverse Response Styles",
author = "Liu, Yinhong and
Fang, Yimai and
Vandyke, David and
Collier, Nigel",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
year = "2024",
url = "https://arxiv.org/abs/2402.10137"
}