Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-explanation" (Wang et al. 2024)

DFKI-NLP/LLMCheckup

LLMCheckup

Dialogical Interpretability Tool for LLMs

💥Running with conda / virtualenv

Note: Please use Python 3.8+ and torch 2.0+

Create the environment and install dependencies

Conda

conda create -n llmcheckup python=3.9
conda activate llmcheckup

venv

python -m venv venv
source venv/bin/activate

⚙️Install the requirements

python -m pip install --upgrade pip
pip install -r requirements.txt
python -m nltk.downloader "averaged_perceptron_tagger" "wordnet" "omw-1.4"
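
Once the requirements are installed, a quick check against the version note above (Python 3.8+ and torch 2.0+) can catch environment mismatches early. The following is a minimal sketch and not part of LLMCheckup; the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple of ints."""
    core = text.split("+")[0]  # drop local build tags such as '+cu118'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Return True if the parsed version is at least the minimum tuple."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Report whether Python and torch satisfy the stated minimums."""
    python_ok = sys.version_info >= (3, 8)
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # torch is not installed in this environment
    torch_ok = torch_version is not None and meets_minimum(torch_version, (2, 0))
    return {"python_ok": python_ok, "torch_version": torch_version, "torch_ok": torch_ok}


if __name__ == "__main__":
    print(check_environment())
```

Reading the installed version through `importlib.metadata` avoids importing torch itself, so the check stays fast and also works when torch is missing.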

🚀Launch system

python flask_app.py

💟Supported explainability methods

  • Feature Attribution
    • Attention, Integrated Gradients, etc.
    • Implemented by 🐛inseq package
  • Semantic Similarity
  • Free-text rationalization
    • Zero-shot CoT
    • Plan-and-Solve
    • Optimization by PROmpting (OPRO)
    • Any additional custom prompt supplied by the user
    • Note: The options above can be selected freely in the interface under "Prompt modification"
  • Data Augmentation
    • Implemented by NLPAug package or few-shot prompting
  • Counterfactual Generation

🤗Models:

In our study, we identified three LLMs for our purposes.

🐳Deployment:

We support different methods for deployment:

✏️Support:

Method Unix-based Windows
Original
GPTQ
Bitsandbytes*
petals**

*: 🪟 On Windows: if you encounter errors while installing bitsandbytes, try:

python -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

**: petals is currently not supported on Windows because it relies on many Unix-specific features (see the related petals issue). petals remains usable when LLMCheckup runs in Docker or WSL 2.

🔍Use case:

Fact checking

Dataset: COVID-Fact

Link: https://github.com/asaakyan/covidfact

Structure

{
    Claim: ...,
    Evidence: ...,
    Label: ...,
}
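
A record with the three fields above could be represented in Python as follows. This is a minimal sketch under stated assumptions: the class `FactCheckInstance` and its `from_dict` helper are hypothetical, the values are placeholders rather than dataset entries, and the on-disk format of COVID-Fact may differ.

```python
from dataclasses import dataclass

# Field names as shown in the structure above.
REQUIRED_KEYS = ("Claim", "Evidence", "Label")


@dataclass
class FactCheckInstance:
    claim: str
    evidence: str
    label: str

    @classmethod
    def from_dict(cls, record):
        """Build an instance from a dict, failing loudly on missing fields."""
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        return cls(record["Claim"], record["Evidence"], record["Label"])


# Placeholder values for illustration only, not taken from COVID-Fact.
example = FactCheckInstance.from_dict(
    {"Claim": "<claim text>", "Evidence": "<evidence text>", "Label": "<label>"}
)
print(example)
```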

Commonsense Question Answering

Dataset: ECQA

Link: https://github.com/dair-iitd/ECQA-Dataset

Structure

{
    Question: ...,
    Multiple choices: ...,
    Correct answer: ...,
    Positive explanation: ...,
    Negative explanation: ...,
    Free-flow explanation: ...,
}
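
To show how such a record might be turned into a multiple-choice prompt, here is a hedged sketch. It assumes that "Multiple choices" holds a list of strings and that "Correct answer" holds the text of one of them; the functions `format_question` and `is_correct` are hypothetical, and the sample values are placeholders rather than ECQA entries.

```python
import string


def format_question(record):
    """Render a question and its choices as a lettered multiple-choice prompt."""
    lines = [record["Question"]]
    for letter, choice in zip(string.ascii_uppercase, record["Multiple choices"]):
        lines.append(f"({letter}) {choice}")
    return "\n".join(lines)


def is_correct(record, predicted):
    """Compare a predicted answer with the gold answer, ignoring case and padding."""
    return predicted.strip().lower() == record["Correct answer"].strip().lower()


# Placeholder record for illustration only, not taken from ECQA.
sample = {
    "Question": "<question text>",
    "Multiple choices": ["<choice 1>", "<choice 2>", "<choice 3>"],
    "Correct answer": "<choice 2>",
}

if __name__ == "__main__":
    print(format_question(sample))
```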

📝Multimodal input

  • Text
  • Image
    • Image upload
    • Optical Character Recognition
  • Audio
    • A lightweight fairseq s2t model from Meta
    • If reading recorded files raises the error "soundfile.LibsndfileError: Error opening path_to_wav: Format not recognised.", try installing ffmpeg.
      • 🐧On Linux: sudo apt install ffmpeg or pip3 install ffmpeg
      • 🪟On Windows: download ffmpeg and add its location to the system PATH environment variable
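
Whether ffmpeg is visible to Python can be checked before recording audio. The snippet below is a small sketch using only the standard library; the helper name `find_executable` is hypothetical and not part of LLMCheckup.

```python
import shutil


def find_executable(name="ffmpeg"):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ffmpeg not found on PATH; reading recorded audio may fail.")
    else:
        print(f"ffmpeg found at {path}")
```

If this reports that ffmpeg is missing after installation on Windows, the PATH change has probably not reached the current shell; opening a new terminal usually resolves it.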
