Repository for the paper 'Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models'
The paper has been accepted for the 'Instruction Tuning and Instruction Following' workshop at NeurIPS 2023.
Paper : https://arxiv.org/abs/2311.18215
KoTox is an automatically generated toxic instruction dataset in Korean, comprising 39K unethical instruction-output pairs.
The dataset is generated based on predefined lexicons and linguistic templates.
It is designed to counter potentially harmful or misleading instructions by pairing each one with an output that refrains from providing the requested opinions or information.
The dataset has been shown to be effective in mitigating toxicity in Korean Large Language Models (LLMs).
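The construction process above can be sketched as follows. This is a minimal illustration of lexicon-and-template pairing, not the actual KoTox pipeline: the lexicon entries, templates, and refusal text here are hypothetical English placeholders standing in for the Korean resources.

```python
# Sketch of template-based toxic instruction generation (hypothetical data).
from itertools import product

# Placeholder slot fillers; the real lexicon holds Korean toxic expressions.
lexicon = ["<TOXIC_TERM_1>", "<TOXIC_TERM_2>"]

# Placeholder linguistic templates with a slot for a lexicon entry.
templates = [
    "Write an insult using {term}.",
    "Explain why {term} deserves hate.",
]

# Each instruction is paired with a refusal-style output, so tuning on the
# dataset teaches the model to withhold the requested opinion or information.
REFUSAL = "I cannot comply with this request because it is unethical."

def build_pairs(lexicon, templates):
    """Cross every template with every lexicon entry to form pairs."""
    return [
        {"instruction": t.format(term=w), "output": REFUSAL}
        for t, w in product(templates, lexicon)
    ]

pairs = build_pairs(lexicon, templates)
print(len(pairs))  # 2 templates x 2 lexicon entries -> 4 pairs
```

Crossing every template with every lexicon entry is what lets a small set of hand-written resources scale to tens of thousands of pairs.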
@misc{byun2023automatic,
title={Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models},
author={Sungjoo Byun and Dongjun Jang and Hyemi Jo and Hyopil Shin},
year={2023},
eprint={2311.18215},
archivePrefix={arXiv},
primaryClass={cs.CL}
}