Legal Appeal Outcome Prediction System

An AI-powered system for predicting appeal case outcomes and generating compelling legal briefs

📋 Table of Contents

Overview
Key Features
Architecture
Technology Stack
Prerequisites
Installation
Configuration
Usage
API Documentation
Project Structure
Model Training
Development
Deployment
Limitations & Disclaimers
Contributing
License

🎯 Overview

The Legal Appeal Outcome Prediction System predicts the likelihood of success for appealed legal cases from the defendant/appellant's perspective. It embeds legal text with LegalBERT, classifies the embeddings with an MLP Classifier, and provides:

Binary outcome predictions (win/lose) with confidence scores
Outcome likelihood analysis for specific appeal results (reversed, granted, affirmed, denied, dismissed, remanded)
Similar precedent case discovery using cosine similarity
AI-generated legal briefs based on case facts and winning precedents
Brief-based prediction simulation to measure improvement in case strength
Fact extraction and editing capabilities for iterative refinement

Important: This system predicts outcomes for appealed cases, not trial cases. Predictions are based on historical appeal case data and should not be considered legal advice.

✨ Key Features

1. Appeal Outcome Prediction

Predicts appeal success/failure using LegalBERT embeddings
Win = Successful appeal (reversed, granted)
Lose = Unsuccessful appeal (affirmed, denied, dismissed, remanded)
Provides probability scores and confidence metrics
Shows likelihood percentages for specific appeal outcomes
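
The win/lose grouping above can be expressed as a small lookup. This is a minimal sketch, not the project's actual code: the function name `to_binary_label` and the two set constants are hypothetical, and only the outcome groupings come from the list above.

```python
# Hypothetical helper: collapse an original appeal outcome into the binary label.
# The groupings mirror the Win/Lose definitions listed above.
WIN_OUTCOMES = {"reversed", "granted"}
LOSE_OUTCOMES = {"affirmed", "denied", "dismissed", "remanded"}


def to_binary_label(outcome: str) -> str:
    """Return "win" or "lose" for an appeal outcome such as "REVERSED"."""
    normalized = outcome.strip().lower()
    if normalized in WIN_OUTCOMES:
        return "win"
    if normalized in LOSE_OUTCOMES:
        return "lose"
    # Fail loudly on labels outside the six known outcomes.
    raise ValueError(f"Unrecognized appeal outcome: {outcome!r}")
```

For example, `to_binary_label("REVERSED")` returns `"win"` and `to_binary_label("remanded")` returns `"lose"`.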

2. Case Fact Extraction & Management

Automatically extracts key factual elements from legal text using GPT-4o-mini
Editable fact list for user refinement
Re-prediction based on edited facts
Facts-driven similarity search for precedents

3. Similar Precedent Discovery

Find similar appeal cases using cosine similarity on LegalBERT embeddings
Searchable by full text or extracted facts
Configurable number of precedents (1-10)
Displays original outcome labels (REVERSED, GRANTED, AFFIRMED, etc.)
Shows case snippets, similarity scores, and metadata
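
To illustrate the idea behind the precedent search, here is a minimal, dependency-free sketch of a cosine-similarity top-k lookup. It is an assumption about the general technique, not the code in routers/similar.py: the function names are hypothetical, plain Python lists stand in for the 768-dimensional LegalBERT embeddings, and the clamp to 1-10 mirrors the configurable range above.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (corpus index, similarity) pairs, most similar first."""
    top_k = max(1, min(top_k, 10))  # keep within the 1-10 range
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With real embeddings, a vectorized implementation (for example with numpy) would be far faster; the pure-Python version is only meant to show the ranking logic.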

4. AI-Generated Legal Briefs

Generates compelling appellate briefs based on case facts
Uses only winning precedents (defendant/appellant prevailed)
Professional legal formatting and structure
Improvement feature: Regenerate briefs with user instructions
Download as PDF or as a Word-compatible document (RTF format)
Properly formatted without markdown characters

5. Brief-Based Prediction Simulation

Simulate prediction outcomes using generated legal briefs
Compare original vs brief-based predictions
Calculate improvement in defendant's chances
Visual change analysis with color-coded indicators
Explain why the brief improves or reduces the defendant's chances
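
The comparison step can be sketched as follows. This is a hypothetical helper, assuming both predictions are expressed as the defendant's win probability in the range 0 to 1 (like the `probability` field of the prediction response); the function name and the returned keys are illustrative only.

```python
def compare_predictions(original_prob: float, brief_prob: float) -> dict:
    """Summarize how the brief-based prediction differs from the original."""
    change = brief_prob - original_prob
    if change > 0:
        direction = "improved"
    elif change < 0:
        direction = "decreased"
    else:
        direction = "unchanged"
    return {
        "original": original_prob,
        "brief_based": brief_prob,
        # Difference in percentage points, rounded for display.
        "change_points": round(change * 100, 1),
        "direction": direction,
    }
```

A frontend could map `direction` to the color-coded indicator, for example green for "improved" and red for "decreased".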

6. Intelligent Legal Judgment Language

Converts technical "win/lose" predictions to proper legal terminology
Automatically infers case type (criminal vs civil) from nature of suit
Displays: "Judgment in Favor of Defendant" or "Judgment in Favor of Plaintiff/Government"
Supports any nature of suit (contract, tort, civil rights, employment, etc.)
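
As a rough illustration of this conversion, the sketch below maps a binary prediction and an optional nature of suit to a judgment string. The inference rule (treating any nature of suit containing "criminal" as a criminal case, and everything else as civil) and both function names are assumptions made for this example; the actual logic in utils/legal_judgment.py may differ.

```python
from typing import Optional


def infer_case_type(nature_of_suit: Optional[str]) -> str:
    """Assumed rule: 'criminal' in the nature of suit means a criminal case."""
    if nature_of_suit and "criminal" in nature_of_suit.lower():
        return "criminal"
    return "civil"


def to_legal_judgment(prediction: str, nature_of_suit: Optional[str] = None) -> str:
    """Translate a win/lose prediction into legal judgment language."""
    if prediction == "win":
        return "Judgment in Favor of Defendant"
    if prediction == "lose":
        # In criminal matters the opposing party is the government.
        if infer_case_type(nature_of_suit) == "criminal":
            return "Judgment in Favor of Government"
        return "Judgment in Favor of Plaintiff"
    raise ValueError(f"Unexpected prediction: {prediction!r}")
```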

7. RAG-Powered Explanations

Natural language explanations using GPT-4o-mini
Retrieval-Augmented Generation (RAG) for documentation-based answers
Defendant/appellant-focused explanations
Incorporates outcome likelihoods with legal reasoning
Fallback to template-based explanations if GPT unavailable

🏗️ Architecture

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Frontend (Next.js)                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │  Prediction  │  │   Results    │  │   Brief      │     │
│  │    Form      │→ │    Page      │→ │  Generator   │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└───────────────────────┬─────────────────────────────────────┘
                        │ HTTP/REST API
┌───────────────────────▼─────────────────────────────────────┐
│                   Backend (FastAPI)                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Predict    │  │   Similar    │  │    Brief     │     │
│  │   Router     │  │    Router    │  │   Router     │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                 │                  │              │
│  ┌──────▼─────────────────▼──────────────────▼──────┐      │
│  │              Utility Modules                      │      │
│  │  • Fact Extraction  • Legal Judgment              │      │
│  │  • Explanation Gen  • Model Loader                │      │
│  │  • Embedding Gen    • Feature Importance          │      │
│  └──────┬───────────────────────────────────────────┘      │
│         │                                                    │
│  ┌──────▼───────────────────────────────────────────┐       │
│  │         ML Pipeline                               │       │
│  │  LegalBERT → Embeddings → MLP Classifier          │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘

Data Flow

User Input → Legal text or case facts
Text Processing → Fact extraction (GPT-4o-mini) + Embedding generation (LegalBERT)
Prediction → MLP Classifier → Win/Lose + Confidence
Enhancement → Legal judgment conversion + Outcome likelihoods
Explanation → GPT-4o-mini generates natural language explanation
Similar Cases → Cosine similarity search on embeddings
Brief Generation → GPT-4o-mini creates legal brief from facts + precedents

🛠️ Technology Stack

Backend

Framework: FastAPI 0.104+
ML/NLP:
- LegalBERT (nlpaueb/legal-bert-base-uncased) via sentence-transformers
- scikit-learn (MLP Classifier, preprocessing)
- PyTorch 2.1.0
- Transformers 4.35.0
AI/LLM: OpenAI GPT-4o-mini (for explanations, fact extraction, brief generation)
Data Processing: pandas, numpy
API: Pydantic (validation), uvicorn (ASGI server)
Environment: python-dotenv

Frontend

Framework: Next.js 14.0 (React 18.2)
Language: TypeScript 5.3
Styling: Tailwind CSS 3.4
UI Components:
- ShadCN UI
- Radix UI (Dialog, Label, Slot)
Markdown: react-markdown
PDF Generation: jsPDF 3.0.4
Icons: lucide-react

Development Tools

Notebooks: Jupyter, ipykernel
Visualization: matplotlib, seaborn
Model Interpretation: SHAP

📦 Prerequisites

Python: 3.8 or higher
Node.js: 18.0 or higher
npm or yarn
OpenAI API Key (optional, for GPT features)
Jupyter (for model training)

🚀 Installation

1. Clone the Repository

git clone <repository-url>
cd capstone1

2. Backend Setup

Option A: Using Conda (Recommended)

# Navigate to backend directory
cd backend

# Activate your conda environment (tf_clean is the environment name used here; create it first or substitute your own)
conda activate tf_clean

# Install dependencies
pip install -r requirements.txt

Option B: Using Python venv

# Navigate to backend directory
cd backend

# Create virtual environment (recommended)
python -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Model Training

Before running predictions, you need to train the model:

# Start Jupyter
jupyter notebook

# Open and run: notebooks/train_model.ipynb
# This will generate:
# - models/model.pkl (trained classifier)
# - models/label_encoder.pkl (label encoder)
# - models/embeddings.npy (precomputed embeddings)
# - models/clean_dataset.csv (cleaned dataset)

4. Environment Configuration

Create a .env file in the backend/ directory:

# backend/.env
OPENAI_API_KEY=sk-your-openai-api-key-here

Note: The system works without OPENAI_API_KEY, but GPT-powered features (explanations, fact extraction, brief generation) will use fallback methods.
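
The fallback behavior described in the note can be pictured with a small check like the one below. This is a sketch under stated assumptions rather than the backend's actual code: `gpt_features_enabled` and `choose_explanation_mode` are hypothetical names, and the `env` parameter exists only so the logic can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def gpt_features_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when a non-empty OPENAI_API_KEY is present."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def choose_explanation_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the GPT path when a key is configured, else the template fallback."""
    return "gpt" if gpt_features_enabled(env) else "template"
```

Calling `choose_explanation_mode({})` returns `"template"`, which corresponds to running the system without a key.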

5. Frontend Setup

# Navigate to frontend directory
cd ../frontend

# Install dependencies
npm install

# Or with yarn
yarn install

⚙️ Configuration

Backend Configuration

Environment Variables (backend/.env):

OPENAI_API_KEY=sk-...  # Optional; needed only for GPT features

API Configuration (backend/main.py):

Default port: 8000
CORS origins: localhost:3000, localhost:3001
Adjust in main.py if needed

Frontend Configuration

API URL (frontend/lib/api.ts):

const API_BASE_URL = process.env.NEXT_PUBLIC_API_URL || "http://localhost:8000";

Set NEXT_PUBLIC_API_URL in .env.local for production:

NEXT_PUBLIC_API_URL=https://your-api-domain.com

💻 Usage

Starting the Backend

Using Conda:

cd backend
conda activate tf_clean
python main.py

Using venv:

cd backend
source venv/bin/activate  # On macOS/Linux
# OR
venv\Scripts\activate      # On Windows
python main.py

The API will be available at http://localhost:8000

API Docs: http://localhost:8000/docs (Swagger UI)
Health Check: http://localhost:8000/health

Starting the Frontend

cd frontend
npm run dev

The application will be available at http://localhost:3000

Using the Application

Navigate to Prediction Page: /predict
Enter Legal Text: Paste your legal opinion text or case narrative
Optional Metadata: Add court, jurisdiction, nature of suit, year
Submit: Click "Predict Outcome"
View Results:
- Prediction (legal judgment)
- Probability and confidence
- Outcome likelihoods
- Extracted facts (editable)
- Explanation
- Similar precedents
Generate Brief: Click "Generate Legal Brief" in facts section
Simulate Prediction: Click "Simulate Prediction" to see brief's impact
Download: Export brief as PDF or Word document

📚 API Documentation

Base URL

http://localhost:8000

Endpoints

1. POST `/api/predict/`

Predict appeal case outcome from legal text or facts.

Request Body:

{
  "text": "Legal opinion text...", // Optional if facts provided
  "facts": ["Fact 1", "Fact 2"], // Optional if text provided
  "court": "scotus", // Optional
  "jurisdiction": "federal", // Optional
  "nature_of_suit": "criminal", // Optional
  "year": 2024 // Optional
}

Response:

{
  "prediction": "win",
  "legal_judgment": "Judgment in Favor of Defendant",
  "probability": 0.85,
  "confidence": 0.87,
  "extracted_facts": [
    "Defendant/appellant is accused of...",
    "Trial court ruled against..."
  ],
  "outcome_likelihoods": {
    "reversed": 45.2,
    "granted": 12.8,
    "affirmed": 25.1,
    "denied": 10.5,
    "dismissed": 4.2,
    "remanded": 2.2
  },
  "top_features": [...],
  "explanation": "Based on the case facts..."
}
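
A minimal Python client for this endpoint might look like the sketch below, using only the standard library. The helper names `build_predict_payload` and `predict` are hypothetical, and the client-side check that at least one of `text` or `facts` is present is an assumption inferred from the "Optional if ... provided" comments above; the server's own validation may behave differently.

```python
import json
import urllib.request
from typing import List, Optional

API_BASE_URL = "http://localhost:8000"  # default backend address


def build_predict_payload(
    text: Optional[str] = None,
    facts: Optional[List[str]] = None,
    **metadata,
) -> dict:
    """Assemble the request body, omitting fields that were not supplied."""
    if not text and not facts:
        raise ValueError("Provide at least one of 'text' or 'facts'.")
    payload: dict = {}
    if text:
        payload["text"] = text
    if facts:
        payload["facts"] = list(facts)
    # Optional metadata fields documented in the request body above.
    for key in ("court", "jurisdiction", "nature_of_suit", "year"):
        if metadata.get(key) is not None:
            payload[key] = metadata[key]
    return payload


def predict(payload: dict, base_url: str = API_BASE_URL) -> dict:
    """POST the payload to /api/predict/ and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/predict/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `predict(build_predict_payload(text="...", nature_of_suit="criminal"))` would return a dictionary shaped like the response above.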

2. POST `/api/similar/`

Find similar appeal cases using cosine similarity.

Request Body:

{
  "text": "Legal text...", // Optional if facts provided
  "facts": ["Fact 1", "Fact 2"], // Optional if text provided
  "top_k": 5 // Optional, default: 5, max: 10
}

Response:

{
  "similar_cases": [
    {
      "case_name": "Smith v. Jones",
      "snippet": "Case text snippet...",
      "similarity": 0.92,
      "outcome": "win",
      "original_outcome": "REVERSED",
      "full_text": "Full case text...",
      "court": "scotus",
      "date_filed": "2020-01-15",
      "docket_id": "12345"
    }
  ]
}

3. POST `/api/brief/`

Generate legal brief based on case facts and precedents.

Request Body:

{
  "facts": ["Fact 1", "Fact 2"],
  "similar_cases": [...],           // Optional
  "nature_of_suit": "criminal",      // Optional
  "legal_judgment": "...",           // Optional
  "improvement_instructions": "...", // Optional (for regeneration)
  "existing_brief": "..."           // Optional (for regeneration)
}

Response:

{
  "brief": "Legal brief text in markdown format...",
  "case_citations": ["Case Name (REVERSED)", "Another Case (GRANTED)"]
}

4. POST `/api/rag/`

Answer questions about the system using RAG.

Request Body:

{
  "question": "How does the prediction model work?"
}

Response:

{
  "answer": "The prediction model uses LegalBERT embeddings...",
  "retrieved_docs": [...]
}

5. GET `/health`

Health check endpoint.

Response:

{
  "status": "healthy"
}

📁 Project Structure

capstone1/
├── backend/
│   ├── data/                          # Raw data files
│   │   ├── opinions_checkpoint.csv
│   │   └── courtlistener_dockets_partial.csv
│   ├── models/                        # Trained models (generated)
│   │   ├── model.pkl                  # MLP Classifier
│   │   ├── label_encoder.pkl          # Label encoder
│   │   ├── embeddings.npy             # Precomputed embeddings
│   │   └── clean_dataset.csv          # Cleaned dataset
│   ├── notebooks/
│   │   └── train_model.ipynb          # Model training notebook
│   ├── routers/                       # API route handlers
│   │   ├── predict.py                 # Prediction endpoint
│   │   ├── similar.py                 # Similar cases endpoint
│   │   ├── rag.py                     # RAG endpoint
│   │   ├── brief.py                   # Legal brief generation
│   │   └── schemas.py                 # Pydantic models
│   ├── utils/                         # Utility modules
│   │   ├── embedding.py               # LegalBERT embedding utilities
│   │   ├── model_loader.py            # Model loading & prediction
│   │   ├── fact_extraction.py         # GPT-based fact extraction
│   │   ├── legal_judgment.py          # Legal judgment language conversion
│   │   ├── explanation.py             # GPT-based explanation generation
│   │   ├── feature_importance.py       # Feature importance extraction
│   │   └── rag_index.py               # RAG document indexing
│   ├── rag_docs/                      # RAG documentation
│   │   ├── explanation_guide.md
│   │   ├── modeling_report.md
│   │   ├── data_dictionary.md
│   │   ├── system_limitations.md
│   │   └── CHANGELOG.md
│   ├── main.py                        # FastAPI application entry point
│   ├── requirements.txt               # Python dependencies
│   └── .env                           # Environment variables (create this)
│
└── frontend/
    ├── app/                           # Next.js app directory
    │   ├── page.tsx                   # Landing page
    │   ├── layout.tsx                 # Root layout
    │   ├── predict/
    │   │   ├── page.tsx               # Prediction form page
    │   │   └── result/
    │   │       └── page.tsx           # Results display page
    │   └── components/                # React components
    │       ├── Form.tsx               # Prediction input form
    │       ├── ResultCard.tsx         # Prediction result display
    │       ├── PrecedentCard.tsx      # Similar case card
    │       └── ui/                    # ShadCN UI components
    │           ├── button.tsx
    │           ├── card.tsx
    │           ├── dialog.tsx
    │           └── ...
    ├── lib/                           # Utility libraries
    │   ├── api.ts                     # API client functions
    │   └── utils.ts                   # Helper functions
    ├── styles/
    │   └── globals.css                # Global styles
    ├── package.json                   # Node.js dependencies
    ├── tailwind.config.js             # Tailwind configuration
    ├── tsconfig.json                  # TypeScript configuration
    └── next.config.js                 # Next.js configuration

🧪 Model Training

The model training process is documented in backend/notebooks/train_model.ipynb.

Training Pipeline

Data Loading: Merges docket and opinion CSVs
Text Cleaning:
- Removes outcome-revealing words (AFFIRMED, REVERSED, etc.)
- Tail-scrubbing (removes last 2000 chars of procedural boilerplate)
- Pattern removal
Label Creation: Binary win/lose labels from appeal outcomes
Embedding Generation: LegalBERT embeddings (768 dimensions)
Model Training: Trains and compares:
- Logistic Regression
- Random Forest
- Gradient Boosting
- SVC (RBF)
- MLP Classifier (typically best)
Model Selection: Automatically selects best model
Artifact Saving: Saves model, encoder, embeddings, dataset
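
The text-cleaning step can be sketched roughly as follows. This is an illustration of the idea, not the notebook's code: the function name, the exact word list (taken here from the six outcome labels used elsewhere in this README), and the choice to leave texts shorter than the tail length untruncated are all assumptions.

```python
import re

# Assumed word list: the six appeal outcomes named in this README.
OUTCOME_WORDS = ("affirmed", "reversed", "granted", "denied", "dismissed", "remanded")
_OUTCOME_PATTERN = re.compile(
    r"\b(?:" + "|".join(OUTCOME_WORDS) + r")\b", re.IGNORECASE
)


def clean_opinion_text(text: str, tail_chars: int = 2000) -> str:
    """Remove outcome-revealing words and the trailing procedural boilerplate."""
    # Tail-scrubbing: drop the last tail_chars characters when the text is
    # long enough (assumption: shorter texts are kept whole).
    body = text[:-tail_chars] if len(text) > tail_chars else text
    # Replace outcome words so the label cannot leak into the features.
    body = _OUTCOME_PATTERN.sub(" ", body)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", body).strip()
```

Removing these words before embedding matters because an opinion that literally states "AFFIRMED" would otherwise let the classifier read the answer directly from the text.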

Running Training

cd backend
jupyter notebook notebooks/train_model.ipynb
# Run all cells

🔧 Development

Backend Development

cd backend
source venv/bin/activate  # Activate virtual environment
python main.py            # Run development server

Code Structure:

Routers: API endpoint handlers (routers/)
Utils: Reusable utility functions (utils/)
Schemas: Pydantic models for validation (routers/schemas.py)

Frontend Development

cd frontend
npm run dev              # Development server
npm run build            # Production build
npm run start            # Production server
npm run lint             # Lint code

Code Structure:

Pages: Next.js pages (app/)
Components: Reusable React components (app/components/)
API Client: API communication (lib/api.ts)

Code Style

Backend: Follow PEP 8 (Python style guide)
Frontend: ESLint + TypeScript strict mode
Type Hints: Use type hints in Python
Documentation: Docstrings for all functions

🚢 Deployment

Backend Deployment

Production Server:

uvicorn main:app --host 0.0.0.0 --port 8000

Environment Variables: Set OPENAI_API_KEY in production environment
Model Files: Ensure models/ directory is accessible

Frontend Deployment

Build:
```
npm run build
```
Environment: Set NEXT_PUBLIC_API_URL to production API URL
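Deploy: Deploy the build output to a hosting service (Vercel, Netlify, etc.). Note that `next build` writes to .next/ by default; an out/ directory is produced only when static export (output: 'export') is enabled in next.config.js.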

Docker (Optional)

Create a Dockerfile for containerized deployment (build from the repository root so that the backend/ paths resolve):

# Backend Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install -r requirements.txt
COPY backend/ .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

⚠️ Limitations & Disclaimers

Important Disclaimers

Not Legal Advice: This system provides statistical predictions, not legal advice
Appeal Cases Only: Predictions are for appealed cases, not trial cases
Educational Purpose: For research and educational purposes only
No Guarantees: Predictions are estimates based on historical data

Technical Limitations

Data Quality: Performance depends on training data quality
Binary Classification: Oversimplifies complex legal outcomes
Text Cleaning: May remove important context
Static Model: Doesn't adapt to new legal concepts automatically
GPT Dependencies: Requires OpenAI API key for full functionality

See backend/rag_docs/system_limitations.md for detailed limitations.

🤝 Contributing

This is a capstone project. For questions, issues, or contributions:

Review the codebase structure
Follow existing code style
Add tests for new features
Update documentation
Submit pull requests with clear descriptions

📄 License

This project is for educational purposes.

👥 Authors

Project Maintainer: [Your Name]
Institution: [Your Institution]
Year: 2024

🙏 Acknowledgments

LegalBERT: Pre-trained legal language model
CourtListener: Legal case data
OpenAI: GPT-4o-mini for explanations and brief generation
FastAPI: Modern Python web framework
Next.js: React framework
ShadCN UI: UI component library

📞 Support

For issues, questions, or feature requests, please open an issue in the repository.

📝 Additional Documentation

SETUP.md: Detailed setup and troubleshooting guide
CONTRIBUTING.md: Contribution guidelines
Backend Documentation: See backend/rag_docs/ for system documentation

🔒 Security Notes

Never commit .env files - They contain sensitive API keys
Review .gitignore - Ensure sensitive files are excluded
Use environment variables - Store secrets in environment, not code

📦 Large Files

This repository may contain large model files (.pkl, .npy) and data files (.csv).

Options:

Use Git LFS (recommended): Configured via .gitattributes
Exclude from Git: Uncomment relevant lines in .gitignore
Host separately: Store models/data in cloud storage

To use Git LFS:

git lfs install
git lfs track "*.pkl"
git lfs track "*.npy"
git lfs track "*.csv"

Built with ❤️ for legal research and education
