Add Toxicity Evaluation #241
Merged: ashahba merged 13 commits into opea-project:main from daniel-de-leon-user293:daniel/toxicity-eval on Mar 12, 2025.
Changes from 9 of 13 commits.

Commits:
- 090e2b9 add toxicity_eval (daniel-de-leon-user293)
- ca6b21e remove poetry.lock (daniel-de-leon-user293)
- d1adfb1 add unit tests (WIP) (daniel-de-leon-user293)
- bc6fe6c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- c278d51 Merge branch 'opea-project:main' into daniel/toxicity-eval (daniel-de-leon-user293)
- fad54ab add unittests and rm poetry (daniel-de-leon-user293)
- 6f74ead [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 4ccef79 fix deprecated HF API (daniel-de-leon-user293)
- 54eec94 fix args typo for gaudi config (daniel-de-leon-user293)
- 24245c5 clean up README (daniel-de-leon-user293)
- a3165d0 Merge branch 'main' into daniel/toxicity-eval (daniel-de-leon-user293)
- 5ab67bc aurpc probabilites fix (daniel-de-leon-user293)
- 2fb0673 Merge branch 'main' into daniel/toxicity-eval (ashahba)

# Toxicity Detection Accuracy

Toxicity detection plays a critical role in guarding the inputs and outputs of large language models (LLMs) to ensure safe, respectful, and responsible content. Given the widespread use of LLMs in applications like customer service, education, and social media, there's a significant risk that they could inadvertently produce or amplify harmful language if toxicity is not detected effectively.

To evaluate a target toxicity detection LLM, we use seven datasets: BeaverTails, Jigsaw Unintended Bias, OpenAI Moderation, SurgeAI Toxicity, ToxicChat, ToxiGen, and XSTest. We also employ the most commonly used metrics in toxicity classification to provide a comprehensive assessment. Currently, the benchmark script supports benchmarking only one dataset at a time; future work includes enabling benchmarking on multiple datasets simultaneously. The Gaudi 2 accelerator is used in the benchmark to meet the high compute demand of the AI workload while balancing power efficiency.

- Supported Datasets
  - [BeaverTails](https://huggingface.co/datasets/PKU-Alignment/BeaverTails)
  - [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification)
  - [OpenAI Moderation](https://github.com/openai/moderation-api-release/tree/main)
  - [SurgeAI Toxicity](https://github.com/surge-ai/toxicity)
  - [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat)
  - [ToxiGen](https://huggingface.co/datasets/toxigen/toxigen-data)
  - [XSTest](https://huggingface.co/datasets/walledai/XSTest)
  - More datasets to come...

- Supported Metrics
  - accuracy
  - auprc (area under the precision-recall curve)
  - auroc (area under the receiver operating characteristic curve)
  - f1
  - fpr (false positive rate)
  - precision
  - recall

## Get Started on Gaudi 2 Accelerator
### Requirements
If you are using an `hpu` device, clone the `optimum-habana` and `GenAIEval` repositories.
```bash
git clone https://github.com/huggingface/optimum-habana.git
git clone https://github.com/opea-project/GenAIEval
```

### Setup
If you are running behind a corporate proxy, run the Gaudi Docker container with the additional proxy environment variables and volume mount shown below.
```bash
DOCKER_RUN_ENVS="--env ftp_proxy=${ftp_proxy} --env FTP_PROXY=${FTP_PROXY} --env http_proxy=${http_proxy} --env HTTP_PROXY=${HTTP_PROXY} --env https_proxy=${https_proxy} --env HTTPS_PROXY=${HTTPS_PROXY} --env no_proxy=${no_proxy} --env NO_PROXY=${NO_PROXY} --env socks_proxy=${socks_proxy} --env SOCKS_PROXY=${SOCKS_PROXY} --env TF_ENABLE_MKL_NATIVE_FORMAT=1"

docker run --disable-content-trust ${DOCKER_RUN_ENVS} \
-d --rm -it --name toxicity-detection-benchmark \
-v ${PWD}:/workdir \
--runtime=habana \
-e HABANA_VISIBLE_DEVICES=all \
-e OMPI_MCA_btl_vader_single_copy_mechanism=none \
--cap-add=sys_nice \
--net=host \
--ipc=host \
vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest
```
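
Optionally, you can confirm the container came up before continuing; this quick check is not part of the original steps, just a sanity check using standard Docker tooling:
```bash
# The benchmark container should appear with a recent "Up ..." status
docker ps --filter "name=toxicity-detection-benchmark"
```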

### Evaluation
#### Execute interactive container
```bash
docker exec -it toxicity-detection-benchmark bash
```
#### Navigate to `workdir` and install required packages
```bash
cd /workdir
cd optimum-habana && pip install . && cd ../GenAIEval
pip install -r requirements.txt
pip install -e .
```
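
To double-check that the installs succeeded (an optional step, not in the original instructions), you can ask pip what it installed; the exact distribution name of the GenAIEval package is not restated here, so the second command just greps for likely matches:
```bash
# optimum-habana should be reported with a version number
pip show optimum-habana
# look for the locally installed GenAIEval package in the environment
pip list | grep -i eval
```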

In the case of the [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [OpenAI Moderation](https://github.com/openai/moderation-api-release), and [Surge AI Toxicity](https://github.com/surge-ai/toxicity) datasets, make sure the datasets are downloaded and stored in the current working directory.
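
One possible way to fetch these datasets is sketched below; the exact directory layout the benchmark expects is not restated here, and the Jigsaw step assumes you have a Kaggle account, have accepted the competition rules, and have a configured Kaggle API token (`~/.kaggle/kaggle.json`):
```bash
# OpenAI Moderation and Surge AI Toxicity are hosted on GitHub
git clone https://github.com/openai/moderation-api-release.git
git clone https://github.com/surge-ai/toxicity.git

# Jigsaw Unintended Bias is a Kaggle competition dataset (requires the kaggle CLI)
kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
unzip jigsaw-unintended-bias-in-toxicity-classification.zip -d jigsaw-unintended-bias
```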

#### Test the model and confirm the results are saved correctly
Navigate to the toxicity evaluation directory:
```bash
cd evals/evaluation/toxicity_eval
```

Replace `MODEL_PATH` and `DATASET` with the appropriate path for the model and the name of the dataset.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=tc
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET}
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```
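
For reference (not part of the original instructions), the `${MODEL_PATH##*/}` expansion strips everything up to the last `/`, so the results directory is named after the model rather than the full Hub path:
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
echo "${MODEL_PATH##*/}"   # prints: toxic-prompt-roberta
# so the metrics above land in results/toxic-prompt-roberta_tc_accuracy/metrics.json
```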

If you are using an `hpu` device, you can instantiate the Gaudi configuration by passing the `GAUDI_CONFIG_NAME` variable with the appropriate configuration name. The default value for the device name (`device`) is `hpu`.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=tc
GAUDI_CONFIG_NAME=Habana/roberta-base
DEVICE_NAME=hpu
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} -g_config ${GAUDI_CONFIG_NAME} --device ${DEVICE_NAME}
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```

For the Jigsaw Unintended Bias, OpenAI Moderation, and Surge AI Toxicity datasets, pass the path to the stored dataset in place of `DATASET_PATH`.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=jigsaw
DATASET_PATH=/path/to/dataset
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} -p ${DATASET_PATH}
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```

## Get Started on CPU

### Requirements
* Linux system or WSL2 on Windows (validated on Ubuntu* 20.04/22.04 LTS)
* Python 3.9 or 3.10
* Poetry

### Installation
Follow the GenAIEval installation steps provided in the repository's main [README](https://github.com/daniel-de-leon-user293/GenAIEval/tree/daniel/toxicity-eval?tab=readme-ov-file#installation).
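
As a rough sketch (assuming a fresh Python 3.9/3.10 environment), the same pip-based commands used in the Gaudi section above also install GenAIEval on a CPU-only machine:
```bash
git clone https://github.com/opea-project/GenAIEval
cd GenAIEval
pip install -r requirements.txt
pip install -e .
```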

### Evaluation
Navigate to the toxicity evaluation directory:
```bash
cd evals/evaluation/toxicity_eval
```

In the case of [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [OpenAI Moderation](https://github.com/openai/moderation-api-release), and [Surge AI Toxicity](https://github.com/surge-ai/toxicity), make sure the datasets are downloaded and stored in the current working directory.

Replace `MODEL_PATH` and `DATASET` with the appropriate path for the model and the name of the dataset. To run the script on a CPU device, set the `DEVICE_NAME` variable to `cpu`.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=tc
DEVICE_NAME=cpu
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} --device ${DEVICE_NAME}
```
You can find the evaluation results in the results folder:
```bash
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```

For the Jigsaw Unintended Bias, OpenAI Moderation, and Surge AI Toxicity datasets, pass the path to the stored dataset in place of `DATASET_PATH`.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=jigsaw
DATASET_PATH=/path/to/dataset
DEVICE_NAME=cpu
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} -p ${DATASET_PATH} --device ${DEVICE_NAME}
```
You can find the evaluation results in the results folder:
```bash
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```
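
If the raw JSON is hard to read, it can be pretty-printed with the Python standard library (an optional step, not part of the original instructions):
```bash
# Pretty-print the metrics file produced by the run above
python -m json.tool results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```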

The PR also adds a new two-line file containing only the license header:

    # Copyright (C) 2025 Intel Corporation
    # SPDX-License-Identifier: Apache-2.0