This repository contains models and an evaluation methodology for the task of detoxification of Russian texts. The original paper, "Methods for Detoxification of Texts for the Russian Language", was presented at the Dialogue-2021 conference.
In this repository, we release the two best models, detoxGPT and condBERT (see Methodology for more details). You can try a detoxification inference example in this notebook.
You can also test our models via the web demo, or pour out your anger on our Telegram bot.
In our research, we tested several approaches:
- Duplicate: simple duplication of the input;
- Delete: removal of rude and toxic words listed in a pre-defined vocabulary;
- Retrieve: retrieval of the most similar sentence from the non-toxic part of the RuToxic dataset, based on cosine similarity between word embeddings;
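The Delete baseline can be sketched as follows. The toxic-word set and tokenization below are simplified placeholders, not the repository's actual vocabulary or code:

```python
import re

# Toy stand-in for the repository's list of Russian rude words.
TOXIC_VOCAB = {"дурак", "идиот"}

def delete_baseline(text: str) -> str:
    """Remove every token found in the pre-defined toxic vocabulary."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    kept = [t for t in tokens if t.lower() not in TOXIC_VOCAB]
    return " ".join(kept)

print(delete_baseline("Ты идиот, это план"))
```

As the table below shows, this baseline preserves content well (high WO and BLEU) but rarely makes a sentence fully non-toxic (low STA), since toxicity is not always confined to individual words.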
detoxGPT is based on ruGPT models. This method requires a parallel dataset for training. We tested the ruGPT-small, ruGPT-medium, and ruGPT-large models in several setups:
- zero-shot: the model is taken as is (with no fine-tuning). The input is a toxic sentence that we would like to detoxify, prepended with the prefix "Перефразируй" (rus. "paraphrase") and followed by the suffix ">>>" to indicate the paraphrasing task;
- few-shot: the model is taken as is. Unlike the previous scenario, we give a prefix consisting of a parallel dataset of toxic and neutral sentences.
- fine-tuned: the model is fine-tuned for the paraphrasing task on a parallel dataset.
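The zero-shot and few-shot prompts described above can be assembled roughly as follows. The exact delimiters and spacing are assumptions for illustration, and the model call itself is omitted:

```python
def zero_shot_prompt(toxic: str) -> str:
    # Prefix "Перефразируй" (paraphrase) marks the task; ">>>" marks
    # where the model should start generating the detoxified version.
    return f"Перефразируй {toxic} >>>"

def few_shot_prompt(pairs, toxic: str) -> str:
    # Prepend completed (toxic, neutral) examples from a parallel
    # dataset before the new input, in the same prompt format.
    examples = "\n".join(f"Перефразируй {t} >>> {n}" for t, n in pairs)
    return f"{examples}\nПерефразируй {toxic} >>>"

print(zero_shot_prompt("пример токсичного текста"))
```

In the fine-tuned setup, the same "toxic >>> neutral" pairs are used as training sequences instead of being packed into the prompt.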
condBERT is based on a BERT model. This method does not require a parallel dataset for training. One of the tasks on which the original BERT was pretrained, predicting a word that was replaced with a [MASK] token, suits the delete-retrieve-generate style transfer method. We tested RuBERT and Geotrend pre-trained models in several setups:
- zero-shot where BERT is taken as is (with no extra fine-tuning);
- fine-tuned, where BERT is fine-tuned on a dataset of toxic and safe sentences to acquire a style-dependent distribution, as described above.
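The masking step of this approach can be sketched as follows. Only the toxic-word replacement is shown, with a toy vocabulary; in the actual pipeline, a fill-mask model such as RuBERT would then propose neutral substitutes for each [MASK] position:

```python
# Toy stand-in for the pre-defined toxic vocabulary.
TOXIC_VOCAB = {"идиот", "дурак"}

def mask_toxic(text: str, mask_token: str = "[MASK]") -> str:
    """Replace tokens from the toxic vocabulary with the BERT mask token,
    so a masked language model can fill in non-toxic alternatives."""
    return " ".join(
        mask_token if tok.lower().strip(".,!?") in TOXIC_VOCAB else tok
        for tok in text.split()
    )

print(mask_toxic("Ты идиот и дурак"))
```

This reliance on word-level masking explains condBERT's profile in the results: content is preserved well, but fluency (PPL) suffers when the substituted word fits the slot poorly.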
The evaluation consists of three types of metrics:
- style transfer accuracy (STA): accuracy based on a toxic/non-toxic classifier (we expect the resulting text to be non-toxic);
- content preservation:
- word overlap (WO);
- BLEU: accuracy based on n-grams (1-4);
- cosine similarity (CS): between vectors of texts’ embeddings.
- language quality: perplexity (PPL) based on language model.
Finally, an aggregated metric (GM): the geometric mean of STA, CS, and PPL.
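One plausible form of the aggregation is sketched below, assuming PPL is inverted so that lower perplexity (better fluency) raises the score. The exact clipping and normalization in ru_metric.py may differ; for instance, the Duplicate baseline's STA of 0.00 still yields a non-zero GM in the table, so the released script evidently does not multiply the raw values directly:

```python
def geometric_mean_metric(sta: float, cs: float, ppl: float) -> float:
    """Aggregate style accuracy (STA), content similarity (CS), and
    fluency (inverted PPL) into a single geometric-mean score."""
    assert ppl > 0, "perplexity must be positive"
    return (max(sta, 0.0) * max(cs, 0.0) * (1.0 / ppl)) ** (1.0 / 3.0)

print(geometric_mean_metric(0.61, 0.77, 36.92))
```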
You can run the ru_metric.py script for evaluation. The fine-tuned weights for the toxicity classifier can be found here.
Method | STA↑ | CS↑ | WO↑ | BLEU↑ | PPL↓ | GM↑ |
---|---|---|---|---|---|---|
Baselines | ||||||
Duplicate | 0.00 | 1.00 | 1.00 | 1.00 | 146.00 | 0.05 ± 0.0012 |
Delete | 0.27 | 0.96 | 0.85 | 0.81 | 263.55 | 0.10 ± 0.0007 |
Retrieve | 0.91 | 0.85 | 0.07 | 0.09 | 65.74 | 0.22 ± 0.0010 |
detoxGPT-small | ||||||
zero-shot | 0.93 | 0.20 | 0.00 | 0.00 | 159.11 | 0.10 ± 0.0005 |
few-shot | 0.17 | 0.70 | 0.05 | 0.06 | 83.38 | 0.11 ± 0.0009 |
fine-tuned | 0.51 | 0.70 | 0.05 | 0.05 | 39.48 | 0.20 ± 0.0011 |
detoxGPT-medium | ||||||
fine-tuned | 0.49 | 0.77 | 0.18 | 0.21 | 86.75 | 0.16 ± 0.0009 |
detoxGPT-large | ||||||
fine-tuned | 0.61 | 0.77 | 0.22 | 0.21 | 36.92 | 0.23 ± 0.0010 |
condBERT | ||||||
DeepPavlov zero-shot | 0.53 | 0.80 | 0.42 | 0.61 | 668.58 | 0.08 ± 0.0006 |
DeepPavlov fine-tuned | 0.52 | 0.86 | 0.51 | 0.53 | 246.68 | 0.12 ± 0.0007 |
Geotrend zero-shot | 0.62 | 0.85 | 0.54 | 0.64 | 237.46 | 0.13 ± 0.0009 |
Geotrend fine-tuned | 0.66 | 0.86 | 0.54 | 0.64 | 209.95 | 0.14 ± 0.0009 |
The data folder contains all used training datasets, the test data, and a naive example of a style transfer result:
- data/train: the RuToxic dataset, a list of Russian rude words, and 200 parallel sentence pairs that were used for ruGPT fine-tuning;
- data/test: 10,000 samples that were used for the evaluation of the approaches;
- data/results: an example of the style transfer output format, illustrated with naive duplication.
If you find this repository helpful, feel free to cite our publication:
@article{DBLP:journals/corr/abs-2105-09052,
author = {Daryna Dementieva and
Daniil Moskovskiy and
Varvara Logacheva and
David Dale and
Olga Kozlova and
Nikita Semenov and
Alexander Panchenko},
title = {Methods for Detoxification of Texts for the Russian Language},
journal = {CoRR},
volume = {abs/2105.09052},
year = {2021},
url = {https://arxiv.org/abs/2105.09052},
archivePrefix = {arXiv},
eprint = {2105.09052},
timestamp = {Mon, 31 May 2021 16:16:57 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2105-09052.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
For any questions please contact Daryna Dementieva via email.