A modern CLI that adds natural language explanations to labeled datasets using AI models (Google Gemini and OpenAI GPT).
This tool is primarily designed for the UStanceBR corpus, a collection of stance detection datasets composed of tweets annotated with "for" or "against" labels across multiple political targets. It performs two main tasks:
- Explanation Generation: Generates explanations for existing human-labeled data
- Classification + Explanation: Classifies unlabeled text and provides explanations for the classifications
Key features:
- Dual Processing: Explains human labels AND performs LLM classification with explanations
- Batch Processing: Processes data in batches of 100 for efficiency
- Checkpoint System: Automatically saves progress and can resume from interruptions
- Multi-Model Support: Works with GPT-5, Gemini 2.0 Flash, and Gemini 2.5 Pro
- Excel Compatibility: Reads and writes Excel files with structured data
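
As a rough illustration of the batch processing feature, the sketch below splits a list of rows into batches of at most 100 items. The function name `toBatches` is hypothetical and is not taken from the project's `batch-processor.ts`; only the batch size of 100 comes from this README.

```typescript
// Split a list into consecutive batches of at most `size` items.
// The default of 100 mirrors the batch size stated in this README.
function toBatches<T>(items: T[], size: number = 100): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Example: 250 rows become three batches of 100, 100, and 50 rows.
const rows = Array.from({ length: 250 }, (_, i) => `tweet ${i}`);
const batchSizes = toBatches(rows).map((batch) => batch.length);
console.log(batchSizes.join(", ")); // prints "100, 100, 50"
```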
- Install Bun (JavaScript runtime): `curl -fsSL https://bun.sh/install | bash`. For more options, visit the Bun installation guide.
- Clone the Repository: `git clone https://github.com/yourusername/label-explainer.git`, then `cd label-explainer`
- Install Dependencies: `bun install`
- Set Up API Keys: Create a `.env` file in the root directory containing `GOOGLE_GENERATIVE_AI_API_KEY=your_google_api_key_here` (for Google Gemini models) and `OPENAI_API_KEY=your_openai_api_key_here` (for OpenAI GPT models).
- Get Google API key: Google AI Studio
- Get OpenAI API key: OpenAI Platform
- Create Data Directory: `mkdir train_test`
- Prepare Your Excel Files: Place your Excel files in the `train_test` directory with the following naming convention:
  - Training files: `{target}_train.xlsx`
  - Test files: `{target}_test.xlsx`
  - Example: `bolsonaro_train.xlsx`, `bolsonaro_test.xlsx`
- Excel File Format: Your Excel files should have the following structure:
- Column A: Tweet text
- Column C: Label (for/against) - for explanation tasks
The tool will add:
- Column G: Human label explanations
- Column H: LLM-generated labels
- Column I: LLM label explanations
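
The column layout above can be sketched as a small mapping from column letters to fields. This is only an illustration: `COLUMNS`, `AnnotatedRow`, `columnIndex`, and `readRow` are hypothetical names, and a row is modeled as a plain array of cell strings rather than through any particular Excel library.

```typescript
// Column letters as described in this README.
const COLUMNS = {
  text: "A",
  humanLabel: "C",
  humanExplanation: "G",
  llmLabel: "H",
  llmExplanation: "I",
} as const;

// Hypothetical typed view of one spreadsheet row.
interface AnnotatedRow {
  text: string;
  humanLabel: string;
  humanExplanation: string;
  llmLabel: string;
  llmExplanation: string;
}

// Convert an Excel column letter ("A", "C", "AA") to a zero-based index.
function columnIndex(letter: string): number {
  const upper = letter.toUpperCase();
  if (upper.length === 0) throw new RangeError("Column letter is empty");
  let index = 0;
  for (let i = 0; i < upper.length; i++) {
    const code = upper.charCodeAt(i);
    if (code < 65 || code > 90) {
      throw new RangeError(`Invalid column letter: ${letter}`);
    }
    index = index * 26 + (code - 64);
  }
  return index - 1;
}

// Read one row (an array of cell values) into the typed view.
// Missing cells are returned as empty strings.
function readRow(cells: ReadonlyArray<string | undefined>): AnnotatedRow {
  const cell = (letter: string): string => cells[columnIndex(letter)] ?? "";
  return {
    text: cell(COLUMNS.text),
    humanLabel: cell(COLUMNS.humanLabel),
    humanExplanation: cell(COLUMNS.humanExplanation),
    llmLabel: cell(COLUMNS.llmLabel),
    llmExplanation: cell(COLUMNS.llmExplanation),
  };
}

// Example: an input row with only the text (A) and the label (C) filled in.
const example = readRow(["Some tweet text", "", "for"]);
console.log(example.humanLabel); // prints "for"
```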
Process all targets with the default model (Gemini 2.0 Flash): `bun run process`
- Use a specific model: `bun run process -m gemini-2.5-pro`
- Process specific targets with a specific model: `bun run process -m gpt-5 -t bolsonaro -t lula`
- Clear previous checkpoints and start fresh: `bun run process --clear-checkpoints`
- Show help: `bun run process --help`

Supported models:
- `gemini-2.0-flash` (default) - Fast and efficient
- `gemini-2.5-pro` - More accurate but slower
- `gpt-5` - OpenAI's latest model (used with low thinking)
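
To make the flag semantics concrete, here is a minimal hand-written sketch of how the documented flags could be interpreted. It is not the tool's actual argument parser: `CliOptions` and `parseCliArgs` are hypothetical names, and treating an empty target list as "all default targets" is an assumption.

```typescript
// Hypothetical shape of the parsed command-line options.
interface CliOptions {
  model: string;
  targets: string[]; // assumed: empty means "all default targets"
  clearCheckpoints: boolean;
  help: boolean;
}

// Interpret the flags documented in this README: -m, -t (repeatable),
// --clear-checkpoints, and --help.
function parseCliArgs(argv: string[]): CliOptions {
  const options: CliOptions = {
    model: "gemini-2.0-flash", // documented default model
    targets: [],
    clearCheckpoints: false,
    help: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "-m" || arg === "-t") {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("-")) {
        throw new Error(`Missing value for ${arg}`);
      }
      if (arg === "-m") {
        options.model = value;
      } else {
        options.targets.push(value);
      }
      i++; // skip the value that was just consumed
    } else if (arg === "--clear-checkpoints") {
      options.clearCheckpoints = true;
    } else if (arg === "--help") {
      options.help = true;
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  return options;
}

// Example: equivalent to `bun run process -m gpt-5 -t bolsonaro -t lula`.
const parsed = parseCliArgs(["-m", "gpt-5", "-t", "bolsonaro", "-t", "lula"]);
console.log(parsed.model, parsed.targets.join(",")); // prints "gpt-5 bolsonaro,lula"
```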
Default targets for UStanceBR corpus:
`bolsonaro`, `cloroquina`, `coronavac`, `globo`, `igreja`, `lula`
The tool performs the following steps for each dataset:
- Load Data: Reads Excel files from the `train_test` directory
- Explain Human Labels: Generates explanations for existing labels
- Classify with LLM: Uses AI to classify texts independently
- Generate LLM Explanations: Provides explanations for AI classifications
- Save Results: Outputs processed Excel file with all annotations
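
The middle three steps can be sketched as one loop over the loaded rows. This is an assumption-laden outline, not the code in `process.ts`: the `StanceModel` interface, `annotateRows`, and the `fakeModel` stand-in are all hypothetical, and loading and saving Excel files is left out.

```typescript
type Stance = "for" | "against";

interface InputRow {
  text: string;
  humanLabel: Stance | null; // null when the row has no human label
}

interface OutputRow extends InputRow {
  humanExplanation: string | null;
  llmLabel: Stance;
  llmExplanation: string;
}

// Hypothetical abstraction over the underlying AI model.
interface StanceModel {
  explain(text: string, label: Stance): Promise<string>;
  classify(text: string): Promise<{ label: Stance; explanation: string }>;
}

// Explain the human label (when present), then classify independently
// and keep the model's own explanation.
async function annotateRows(rows: InputRow[], model: StanceModel): Promise<OutputRow[]> {
  const output: OutputRow[] = [];
  for (const row of rows) {
    const humanExplanation =
      row.humanLabel !== null ? await model.explain(row.text, row.humanLabel) : null;
    const { label, explanation } = await model.classify(row.text);
    output.push({ ...row, humanExplanation, llmLabel: label, llmExplanation: explanation });
  }
  return output;
}

// A fake model so the sketch runs without any API key.
const fakeModel: StanceModel = {
  async explain(text, label) {
    return `Labeled "${label}" because of the wording in: ${text}`;
  },
  async classify(text) {
    const label: Stance = text.includes("not") ? "against" : "for";
    return { label, explanation: `Keyword-based guess for: ${text}` };
  },
};

annotateRows([{ text: "I do not support this", humanLabel: "against" }], fakeModel)
  .then((result) => console.log(result.length));
```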
label-explainer/
├── src/
│ ├── prompts/ # AI prompt templates
│ │ ├── explanation.ts # Prompt for explaining existing labels
│ │ └── classification.ts # Prompt for classifying and explaining
│ ├── services/ # Core services
│ │ ├── batch-processor.ts # Batch processing logic
│ │ ├── checkpoint.ts # Progress saving/resuming
│ │ └── excel.ts # Excel file operations
│ ├── utils/ # Utility functions
│ │ ├── common.ts # Common utilities
│ │ └── models.ts # AI model configurations
│ ├── process.ts # Main processing script
│ └── compare.ts # Comparison tool
├── train_test/ # Input data directory
├── dataset/
│ └── checkpoints/ # Progress checkpoints
└── README.md
The tool automatically saves progress after each batch:
- Checkpoints are stored in `dataset/checkpoints/`
- If processing is interrupted, simply run the command again to resume
- Use `--clear-checkpoints` to start fresh
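
The resume behavior described above can be illustrated with a small sketch. It assumes a checkpoint that records how many batches are done plus their results, and it uses an in-memory store in place of files under `dataset/checkpoints/`. `Checkpoint`, `CheckpointStore`, `MemoryStore`, and `processWithCheckpoints` are hypothetical names, not the API of `checkpoint.ts`.

```typescript
// Hypothetical checkpoint contents: batches finished so far and their results.
interface Checkpoint {
  completedBatches: number;
  results: string[];
}

// Hypothetical storage abstraction; the real tool writes to disk.
interface CheckpointStore {
  load(key: string): string | undefined;
  save(key: string, json: string): void;
  clear(key: string): void;
}

class MemoryStore implements CheckpointStore {
  private readonly data = new Map<string, string>();
  load(key: string): string | undefined {
    return this.data.get(key);
  }
  save(key: string, json: string): void {
    this.data.set(key, json);
  }
  clear(key: string): void {
    this.data.delete(key);
  }
}

// Process batches in order, saving after each one. On a rerun, batches
// already recorded in the checkpoint are skipped.
function processWithCheckpoints(
  key: string,
  batches: string[][],
  store: CheckpointStore,
  handleBatch: (batch: string[]) => string[],
): string[] {
  const saved = store.load(key);
  const checkpoint: Checkpoint =
    saved !== undefined ? (JSON.parse(saved) as Checkpoint) : { completedBatches: 0, results: [] };
  let done = checkpoint.completedBatches;
  for (const batch of batches.slice(done)) {
    checkpoint.results.push(...handleBatch(batch));
    done += 1;
    checkpoint.completedBatches = done;
    store.save(key, JSON.stringify(checkpoint));
  }
  return checkpoint.results;
}

// Example: calling store.clear("bolsonaro_train") first would correspond
// to starting fresh with --clear-checkpoints.
const store = new MemoryStore();
const upper = (batch: string[]): string[] => batch.map((t) => t.toUpperCase());
console.log(processWithCheckpoints("bolsonaro_train", [["a"], ["b"]], store, upper).join(","));
// prints "A,B"
```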
The tool generates Excel files with the following columns:
| Column | Content |
|---|---|
| A | Original text |
| C | Human label (if provided) |
| G | Human label explanation |
| H | LLM-generated label |
| I | LLM label explanation |
Output files are named: `processed-{model}-{target}-{train/test}.xlsx`
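
As a quick illustration of the naming pattern, the helpers below build the input and output file names for a target. The function names are hypothetical; the patterns themselves are the ones documented in this README.

```typescript
type Split = "train" | "test";

// Input naming convention: {target}_train.xlsx / {target}_test.xlsx
function inputFileName(target: string, split: Split): string {
  return `${target}_${split}.xlsx`;
}

// Output naming convention: processed-{model}-{target}-{train/test}.xlsx
function outputFileName(model: string, target: string, split: Split): string {
  return `processed-${model}-${target}-${split}.xlsx`;
}

console.log(inputFileName("bolsonaro", "train"));
// prints "bolsonaro_train.xlsx"
console.log(outputFileName("gemini-2.0-flash", "bolsonaro", "train"));
// prints "processed-gemini-2.0-flash-bolsonaro-train.xlsx"
```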
To add a new model:
- Update `src/utils/models.ts` with your model configuration
- Add the model type to the `ModelType` type definition
- Update the model selection logic
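
One way these three steps might fit together is sketched below. The real contents of `src/utils/models.ts` are not shown in this README, so `ModelConfig`, `MODEL_CONFIGS`, `isModelType`, and `resolveModel` are assumptions; only the `ModelType` name, the model identifiers, and the environment variable names come from the document.

```typescript
// Step 2: the union of supported model identifiers.
type ModelType = "gemini-2.0-flash" | "gemini-2.5-pro" | "gpt-5";

// Hypothetical per-model configuration.
interface ModelConfig {
  provider: "google" | "openai";
  apiKeyEnv: string; // name of the environment variable holding the API key
}

// Step 1: one configuration entry per model. Because the record is keyed
// by ModelType, adding a new union member without a matching entry here
// becomes a compile-time error.
const MODEL_CONFIGS: Record<ModelType, ModelConfig> = {
  "gemini-2.0-flash": { provider: "google", apiKeyEnv: "GOOGLE_GENERATIVE_AI_API_KEY" },
  "gemini-2.5-pro": { provider: "google", apiKeyEnv: "GOOGLE_GENERATIVE_AI_API_KEY" },
  "gpt-5": { provider: "openai", apiKeyEnv: "OPENAI_API_KEY" },
};

// Type guard used to validate a model name given on the command line.
function isModelType(value: string): value is ModelType {
  return Object.prototype.hasOwnProperty.call(MODEL_CONFIGS, value);
}

// Step 3: selection logic that maps a name to its configuration.
function resolveModel(name: string): ModelConfig {
  if (!isModelType(name)) {
    throw new Error(`Unsupported model: ${name}`);
  }
  return MODEL_CONFIGS[name];
}

console.log(resolveModel("gpt-5").provider); // prints "openai"
```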
Prompts are stored in `src/prompts/`:
- `explanation.ts` - For explaining existing labels
- `classification.ts` - For classification tasks
MIT — do what you want, just give credit ✨
Built for processing the UStanceBR corpus and designed to be extensible for other NLP tasks.