NLP-ADBench

Overview

NLP-ADBench is a benchmark for anomaly detection (AD) in natural language processing. It provides a comprehensive evaluation of anomaly detection algorithms across a wide range of NLP datasets, using a two-step approach: raw text is first transformed into numerical embeddings, and an anomaly detection algorithm is then applied to those embeddings.

NLPAD Datasets

The datasets required for this project can be downloaded from the following Hugging Face links:

  1. NLPAD Datasets: These are the datasets introduced in the NLP-ADBench paper. You can download them from:

  2. Pre-Extracted Embeddings: We evaluate 8 two-step NLP-AD algorithms that rely on embeddings generated by models such as BERT (bert-base-uncased) and OpenAI's text-embedding-3-large. These algorithms are designed for structured numerical data and cannot process raw text directly, so each text must first be transformed into a numerical embedding. We have already extracted these embeddings for your convenience (see the sketch after this list for how the two steps fit together). If you want to use them directly, you can download them from:
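
For orientation, here is a minimal Python sketch of the two-step approach: embed texts with bert-base-uncased, then score the embeddings with an off-the-shelf detector. The detector choice (scikit-learn's IsolationForest) and all variable names are illustrative assumptions, not the benchmark's own code:

    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.ensemble import IsolationForest

    # Step 1: turn raw text into fixed-size numerical embeddings.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            out = model(**batch)
        return out.last_hidden_state[:, 0, :].numpy()  # one [CLS] vector per text

    texts = ["a perfectly ordinary sentence", "another ordinary sentence", "zxqv ### !!!"]
    X = embed(texts)

    # Step 2: apply a classical anomaly detector to the embedding matrix.
    detector = IsolationForest(random_state=0).fit(X)
    scores = -detector.score_samples(X)  # higher score = more anomalous
    print(scores)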

Step-by-Step Instructions for Running the Benchmark

Environment Setup Instructions

Follow these steps to set up the development environment using the provided Conda environment file:

  1. Install Anaconda or Miniconda: Download and install Anaconda or Miniconda from the official website.

  2. Create the Environment: Using the terminal, navigate to the directory containing the environment.yml file and run:

    conda env create -f environment.yml
  3. Activate the Environment: Activate the newly created environment using:

    conda activate nlpad

Import data

Download the Pre-Extracted Embeddings from the Hugging Face link above and place all of the downloaded embedding files in the ./feature folder inside the ./benchmark directory of this project.
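
As a quick sanity check that the files landed in the right place, a sketch like the following lists what was downloaded. It assumes the embeddings are stored as NumPy .npy files; the actual filenames and format come from the Hugging Face download:

    import numpy as np
    from pathlib import Path

    # Assumed layout: ./benchmark/feature/*.npy (illustrative, not guaranteed).
    feature_dir = Path("benchmark/feature")
    for path in sorted(feature_dir.glob("*.npy")):
        X = np.load(path)
        print(f"{path.name}: {X.shape[0]} samples x {X.shape[1]} dims")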

Run the code

Run the following commands from the ./benchmark directory of the project:

BERT

To run the benchmark on data embedded with BERT's bert-base-uncased model, use:

    python [your_script_name].py bert

OpenAI

To run the benchmark on data embedded with OpenAI's text-embedding-3-large model, use:

    python [your_script_name].py gpt
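
For reference, the bert/gpt argument simply selects which set of pre-extracted embeddings a script works with. A minimal, hypothetical sketch of that dispatch (the mapping and function below are placeholders, not the repository's actual code):

    import sys

    EMBEDDINGS = {
        "bert": "bert-base-uncased",      # BERT features
        "gpt": "text-embedding-3-large",  # OpenAI features
    }

    def main():
        if len(sys.argv) != 2 or sys.argv[1] not in EMBEDDINGS:
            sys.exit("usage: python your_script_name.py {bert|gpt}")
        print(f"Benchmarking with {EMBEDDINGS[sys.argv[1]]} embeddings")

    if __name__ == "__main__":
        main()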
