Skip to content

s-nlp/pseudoparadetox

Repository files navigation

LLMs to Replace Crowdsourcing For Parallel Data Creation? The Case of Text Detoxification

ImageTranscreation

EMNLP Web Page

This is the official implementation of the paper LLMs to Replace Crowdsourcing For Parallel Data Creation? The Case of Text Detoxification by Daniil Moskovskiy, Sergey Pletenev and Alexander Panchenko.

Abstract

The lack of high-quality training data remains a significant challenge in NLP. Manual annotation methods, such as crowdsourcing, are costly, require intricate task design skills, and, if used incorrectly, may result in poor data quality. From the other hand, LLMs have demonstrated proficiency in many NLP tasks, including zero-shot and few-shot data annotation. However, they often struggle with text detoxification due to alignment constraints and fail to generate the required detoxified text. This work explores the potential of modern open source LLMs to annotate parallel data for text detoxification. Using the recent technique of activation patching, we generate a pseudo-parallel detoxification dataset based on ParaDetox. The detoxification model trained on our generated data shows comparable performance to the original dataset in automatic detoxification evaluation metrics and superior quality in manual evaluation and side-by-side comparisons.

Results

table1 Results of the automatic detoxification evaluation after training BART on the original ParaDetox data (highlighted in gray) and generated with LLMs PseudoParaDetox data in 0-shot and 10-shot settings. A.P. stands for Activation Patched models, ❌ stands for models used as is. Best results for each setting (0-shot/10-shot) are bold, and the best overall results are bold italic.

table2 Results of the manual detoxification evaluation after training BART on the original ParaDetox data (highlighted in gray) and generated with LLMs PseudoParaDetox data in 0-shot and 10-shot settings. A.P. stands for Activation Patched models, ❌ stands for models used as is. Best results for each setting (0-shot/10-shot) are bold, and the best overall results are bold italic.

sbs Side-by-side evaluation BART trained on ParaDetox versus PseudoParaDetox (generated by activation patched LLMs) on a held-out test set. Win of PseudoParaDetox (), Tie () and ParaDetox ().

Quick start

For synthetic data generation you need abbliterated model. You can make model with code in /noteboks/RefusalsRemover.ipynb or use already patched model like Meta-Llama-3-8B-Instruct-abliterated-v3 or Meta-Llama-3.1-8B-Instruct-abliterated

Next, you need to generate synthetic non-toxic data with patched model

for lang in en
do
    python detoxer.py --language $lang --model_path "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated" \
    --nshot 10 --batch_size 8 \
    --output_path ./datasets/Meta-Llama-3.1-8B-Instruct_abliterated_10shot_$lang.csv
done

In order to check the quality of synthetic data we train two Seq2Seq models with real and synthetic data.

To train BART-large with real data:

python src/bart_baseline.py --config_path configs/bart_real_train.json --input_columns en_toxic_comment --output_columns en_neutral_sentence

To train BART-large with synth data:

python src/bart_baseline.py --config_path configs/bart_synth_train.json --input_columns en_toxic_comment --output_columns generated_neutral_sentence

For validate both BART models:

python ./src/utils/evaluate.py \
    --source_list path/to/test/ \
    --references_list path/to/test/  \
    --input_path /results/generated/Meta-Llama-3-8B-Instruct-abliterated-v3.5_10shot_t08_p09.txt \
    --output_dir /results/generated.results

Test data

Test part of ParaDetox dataset is available upon request.

Citation

If you find this work useful, please cite this paper:

@inproceedings{moskovskiy-etal-2024-llms,
    title = "{LLM}s to Replace Crowdsourcing For Parallel Data Creation? The Case of Text Detoxification",
    author = "Moskovskiy, Daniil  and
      Pletenev, Sergey  and
      Panchenko, Alexander",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.839",
    pages = "14361--14373",
    abstract = "The lack of high-quality training data remains a significant challenge in NLP. Manual annotation methods, such as crowdsourcing, are costly, require intricate task design skills, and, if used incorrectly, may result in poor data quality. From the other hand, LLMs have demonstrated proficiency in many NLP tasks, including zero-shot and few-shot data annotation. However, they often struggle with text detoxification due to alignment constraints and fail to generate the required detoxified text. This work explores the potential of modern open source LLMs to annotate parallel data for text detoxification. Using the recent technique of activation patching, we generate a pseudo-parallel detoxification dataset based on ParaDetox. The detoxification model trained on our generated data shows comparable performance to the original dataset in automatic detoxification evaluation metrics and superior quality in manual evaluation and side-by-side comparisons.",
}