- Overview
- Key Features
- Architecture
- Technology Stack
- Prerequisites
- Installation
- Configuration
- Usage
- API Documentation
- Project Structure
- Model Training
- Development
- Deployment
- Limitations & Disclaimers
- Contributing
- License
The Legal Appeal Outcome Prediction System is a full-stack application that predicts the likelihood of success for appealed legal cases from the defendant/appellant's perspective. It uses LegalBERT embeddings with an MLP classifier to analyze legal text and provides:
- Binary outcome predictions (win/lose) with confidence scores
- Outcome likelihood analysis for specific appeal results (reversed, granted, affirmed, denied, dismissed, remanded)
- Similar precedent case discovery using cosine similarity
- AI-generated legal briefs based on case facts and winning precedents
- Brief-based prediction simulation to measure improvement in case strength
- Fact extraction and editing capabilities for iterative refinement
Important: This system predicts outcomes for appealed cases, not trial cases. Predictions are based on historical appeal case data and should not be considered legal advice.
- Predicts appeal success/failure using LegalBERT embeddings
- Win = Successful appeal (reversed, granted)
- Lose = Unsuccessful appeal (affirmed, denied, dismissed, remanded)
- Provides probability scores and confidence metrics
- Shows likelihood percentages for specific appeal outcomes
- Automatically extracts key factual elements from legal text using GPT-4o-mini
- Editable fact list for user refinement
- Re-prediction based on edited facts
- Facts-driven similarity search for precedents
- Find similar appeal cases using cosine similarity on LegalBERT embeddings
- Searchable by full text or extracted facts
- Configurable number of precedents (1-10)
- Displays original outcome labels (REVERSED, GRANTED, AFFIRMED, etc.)
- Shows case snippets, similarity scores, and metadata
- Generates compelling appellate briefs based on case facts
- Uses only winning precedents (defendant/appellant prevailed)
- Professional legal formatting and structure
- Improvement feature: Regenerate briefs with user instructions
- Download as PDF or Word document (RTF format)
- Properly formatted without markdown characters
- Simulate prediction outcomes using generated legal briefs
- Compare original vs brief-based predictions
- Calculate improvement in defendant's chances
- Visual change analysis with color-coded indicators
- Explains why the brief improves or reduces the defendant's chances
- Converts technical "win/lose" predictions to proper legal terminology
- Automatically infers case type (criminal vs civil) from nature of suit
- Displays: "Judgment in Favor of Defendant" or "Judgment in Favor of Plaintiff/Government"
- Supports any nature of suit (contract, tort, civil rights, employment, etc.)
- Natural language explanations using GPT-4o-mini
- Retrieval-Augmented Generation (RAG) for documentation-based answers
- Defendant/appellant-focused explanations
- Incorporates outcome likelihoods with legal reasoning
- Fallback to template-based explanations if GPT is unavailable
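The legal-judgment conversion described above could look roughly like the following sketch. The function names and the criminal-keyword list are hypothetical, and resolving "Plaintiff/Government" by naming the Government in criminal appeals and the Plaintiff otherwise is an assumption; the project's actual logic lives in `utils/legal_judgment.py`.

```python
from typing import Optional

# Hypothetical keywords used to spot criminal matters; illustrative only.
CRIMINAL_KEYWORDS = ("criminal", "habeas", "sentencing", "prisoner")


def infer_case_type(nature_of_suit: Optional[str]) -> str:
    """Return "criminal" if the nature of suit mentions a criminal keyword, else "civil"."""
    text = (nature_of_suit or "").lower()
    return "criminal" if any(k in text for k in CRIMINAL_KEYWORDS) else "civil"


def to_legal_judgment(prediction: str, nature_of_suit: Optional[str] = None) -> str:
    """Map the model's "win"/"lose" label to judgment language (appellant's view)."""
    if prediction == "win":
        return "Judgment in Favor of Defendant"
    if prediction == "lose":
        # Assumed rule: the opposing party is the Government in criminal appeals.
        opponent = "Government" if infer_case_type(nature_of_suit) == "criminal" else "Plaintiff"
        return f"Judgment in Favor of {opponent}"
    raise ValueError(f"unexpected prediction label: {prediction!r}")
```

For example, `to_legal_judgment("lose", "criminal")` yields "Judgment in Favor of Government", while an employment case yields "Judgment in Favor of Plaintiff".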
```text
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                      │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │  Prediction  │   │   Results    │   │    Brief     │     │
│  │     Form     │──▶│     Page     │──▶│  Generator   │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
└───────────────────────┬─────────────────────────────────────┘
                        │ HTTP/REST API
┌───────────────────────▼─────────────────────────────────────┐
│                      Backend (FastAPI)                      │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   Predict    │   │   Similar    │   │    Brief     │     │
│  │    Router    │   │    Router    │   │    Router    │     │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘     │
│         │                  │                  │             │
│  ┌──────▼──────────────────▼──────────────────▼───────┐     │
│  │                  Utility Modules                   │     │
│  │  • Fact Extraction       • Legal Judgment          │     │
│  │  • Explanation Gen       • Model Loader            │     │
│  │  • Embedding Gen         • Feature Importance      │     │
│  └──────┬─────────────────────────────────────────────┘     │
│         │                                                   │
│  ┌──────▼─────────────────────────────────────────────┐     │
│  │                    ML Pipeline                     │     │
│  │      LegalBERT → Embeddings → MLP Classifier       │     │
│  └────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘
```
- User Input → Legal text or case facts
- Text Processing → Fact extraction (GPT-4o-mini) + Embedding generation (LegalBERT)
- Prediction → MLP Classifier → Win/Lose + Confidence
- Enhancement → Legal judgment conversion + Outcome likelihoods
- Explanation → GPT-4o-mini generates natural language explanation
- Similar Cases → Cosine similarity search on embeddings
- Brief Generation → GPT-4o-mini creates legal brief from facts + precedents
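The similar-cases step in this flow ranks stored case embeddings by cosine similarity to the query embedding. A minimal standard-library sketch, assuming embeddings are plain lists of floats (the real system uses 768-dimensional LegalBERT vectors from `models/embeddings.npy`); `top_k_similar` is a hypothetical name, and its 1-10 bound mirrors the configurable precedent count:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float], corpus: Sequence[Sequence[float]],
                  k: int = 5) -> List[Tuple[int, float]]:
    """Return (row index, similarity) pairs for the k rows most similar to the query."""
    if not 1 <= k <= 10:
        raise ValueError("k must be between 1 and 10")
    scored = [(i, cosine_similarity(query, row)) for i, row in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep corpus order
    return scored[:k]
```

With NumPy, the same ranking is usually computed by L2-normalizing the embedding matrix once and taking a single matrix-vector product with the normalized query.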
- Framework: FastAPI 0.104+
- ML/NLP:
- LegalBERT (`nlpaueb/legal-bert-base-uncased`) via `sentence-transformers`
- scikit-learn (MLP Classifier, preprocessing)
- PyTorch 2.1.0
- Transformers 4.35.0
- AI/LLM: OpenAI GPT-4o-mini (for explanations, fact extraction, brief generation)
- Data Processing: pandas, numpy
- API: Pydantic (validation), uvicorn (ASGI server)
- Environment: python-dotenv
- Framework: Next.js 14.0 (React 18.2)
- Language: TypeScript 5.3
- Styling: Tailwind CSS 3.4
- UI Components:
- ShadCN UI
- Radix UI (Dialog, Label, Slot)
- Markdown: react-markdown
- PDF Generation: jsPDF 3.0.4
- Icons: lucide-react
- Notebooks: Jupyter, ipykernel
- Visualization: matplotlib, seaborn
- Model Interpretation: SHAP
- Python: 3.8 or higher
- Node.js: 18.0 or higher
- npm or yarn
- OpenAI API Key (optional, for GPT features)
- Jupyter (for model training)
```bash
git clone <repository-url>
cd capstone1
```

Option A: Using Conda (Recommended)
```bash
# Navigate to backend directory
cd backend
# Activate your conda environment (named tf_clean here; substitute your own)
conda activate tf_clean
# Install dependencies
pip install -r requirements.txt
```

Option B: Using Python venv
```bash
# Navigate to backend directory
cd backend
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```

Before running predictions, you need to train the model:
```bash
# Start Jupyter
jupyter notebook
# Open and run: notebooks/train_model.ipynb
# This will generate:
# - models/model.pkl (trained classifier)
# - models/label_encoder.pkl (label encoder)
# - models/embeddings.npy (precomputed embeddings)
# - models/clean_dataset.csv (cleaned dataset)
```

Create a `.env` file in the `backend/` directory:
```bash
# backend/.env
OPENAI_API_KEY=sk-your-openai-api-key-here
```

Note: The system works without `OPENAI_API_KEY`, but GPT-powered features (explanations, fact extraction, brief generation) will use fallback methods.
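The fallback behaviour can be pictured as follows. This is a hedged sketch, not the code in `utils/explanation.py`: `explain`, `template_explanation`, and the injected `gpt_explain` callable are hypothetical, and reading the key with `os.environ` assumes `.env` has already been loaded (for example by python-dotenv's `load_dotenv()`).

```python
import os
from typing import Callable, Optional

# Hypothetical signature for whatever function actually calls GPT-4o-mini.
GptExplainer = Callable[[str, float], str]


def template_explanation(prediction: str, probability: float) -> str:
    """Deterministic text used when GPT is unavailable."""
    outcome = "succeed" if prediction == "win" else "fail"
    return (f"The model estimates that the appeal is likely to {outcome} "
            f"(probability {probability:.0%}).")


def explain(prediction: str, probability: float,
            gpt_explain: Optional[GptExplainer] = None,
            api_key: Optional[str] = None) -> str:
    """Use GPT when a key and a client are available, otherwise use the template."""
    key = api_key if api_key is not None else os.environ.get("OPENAI_API_KEY", "")
    if key and gpt_explain is not None:
        try:
            return gpt_explain(prediction, probability)
        except Exception:
            # Network or quota errors: degrade gracefully instead of failing the request.
            pass
    return template_explanation(prediction, probability)
```

Injecting the GPT call as a parameter keeps the fallback path testable without network access.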
```bash
# Navigate to frontend directory
cd ../frontend
# Install dependencies
npm install
# Or with yarn
yarn install
```

Environment Variables (`backend/.env`):
```bash
# Optional; required only for GPT-powered features
OPENAI_API_KEY=sk-...
```

API Configuration (`backend/main.py`):
- Default port: `8000`
- CORS origins: `localhost:3000`, `localhost:3001`
- Adjust in `main.py` if needed
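API URL (`frontend/lib/api.ts`):

```typescript
const API_BASE_URL = process.env.NEXT_PUBLIC_API_URL || "http://localhost:8000";
```

Set `NEXT_PUBLIC_API_URL` in `.env.local` for production. Because `NEXT_PUBLIC_` variables are inlined at build time, set it before running `npm run build`:

```bash
NEXT_PUBLIC_API_URL=https://your-api-domain.com
```

Using Conda: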
```bash
cd backend
conda activate tf_clean
python main.py
```

Using venv:

```bash
cd backend
source venv/bin/activate   # On macOS/Linux
# OR
venv\Scripts\activate      # On Windows
python main.py
```

The API will be available at http://localhost:8000.
- API Docs: http://localhost:8000/docs (Swagger UI)
- Health Check: http://localhost:8000/health
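To confirm the backend is up from a script rather than a browser, a small standard-library client can poll the health endpoint. A sketch assuming the `{"status": "healthy"}` payload documented under API Documentation; `check_health` and `parse_health` are hypothetical helpers:

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """True if a health response body reports {"status": "healthy"}."""
    try:
        payload = json.loads(body)
    except ValueError:  # includes json.JSONDecodeError
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """GET {base_url}/health; False on connection errors or unexpected payloads."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:  # URLError, HTTPError and timeouts all subclass OSError
        return False
```

After `python main.py` starts the server, `check_health()` should return `True`.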
```bash
cd frontend
npm run dev
```

The application will be available at http://localhost:3000.
- Navigate to Prediction Page: `/predict`
- Enter Legal Text: Paste your legal opinion text or case narrative
- Optional Metadata: Add court, jurisdiction, nature of suit, year
- Submit: Click "Predict Outcome"
- View Results:
  - Prediction (legal judgment)
  - Probability and confidence
  - Outcome likelihoods
  - Extracted facts (editable)
  - Explanation
  - Similar precedents
- Generate Brief: Click "Generate Legal Brief" in facts section
- Simulate Prediction: Click "Simulate Prediction" to see brief's impact
- Download: Export brief as PDF or Word document
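The simulation's "improvement" figure is a difference between two win probabilities. A minimal sketch, assuming both predictions are available as win probabilities in [0, 1]; the dataclass, the percentage-point units, the 0.5-point "unchanged" band, and the colour names are illustrative choices, not the frontend's actual logic:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationComparison:
    original_pct: float  # defendant's win chance before the brief, in percent
    brief_pct: float     # win chance with the brief as input, in percent
    change_pts: float    # brief_pct - original_pct, in percentage points
    indicator: str       # "green" = improved, "red" = worse, "gray" = roughly unchanged


def compare_predictions(original_win_prob: float, brief_win_prob: float,
                        tolerance_pts: float = 0.5) -> SimulationComparison:
    """Compare original vs brief-based win probabilities (hypothetical helper)."""
    for p in (original_win_prob, brief_win_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    original_pct = round(original_win_prob * 100, 1)
    brief_pct = round(brief_win_prob * 100, 1)
    change = round(brief_pct - original_pct, 1)
    if change > tolerance_pts:
        indicator = "green"
    elif change < -tolerance_pts:
        indicator = "red"
    else:
        indicator = "gray"
    return SimulationComparison(original_pct, brief_pct, change, indicator)
```

For instance, `compare_predictions(0.42, 0.58)` reports a change of +16.0 points with a green indicator. Reporting percentage points rather than relative change keeps the number directly comparable to the probabilities shown on the results page.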
Base URL: http://localhost:8000
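Several request bodies below share two rules: at least one of `text` or `facts` must be supplied, and `top_k` (for similar-case search) defaults to 5 with a maximum of 10. The real schemas are Pydantic models in `routers/schemas.py`; the following standard-library sketch only approximates them, and the class name `SimilarRequest` and the lower bound of 1 (taken from the 1-10 range under Key Features) are assumptions:

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimilarRequest:
    """Stdlib approximation of the similar-cases request body (hypothetical class)."""
    text: Optional[str] = None
    facts: Optional[List[str]] = None
    top_k: int = 5  # documented default

    def __post_init__(self) -> None:
        has_text = bool(self.text and self.text.strip())
        has_facts = bool(self.facts) and any(f.strip() for f in self.facts)
        if not (has_text or has_facts):
            raise ValueError("provide 'text' or at least one non-empty fact")
        if not 1 <= self.top_k <= 10:
            raise ValueError("top_k must be between 1 and 10")

    def to_json(self) -> str:
        """Serialize, omitting fields the caller did not set."""
        payload = {"text": self.text, "facts": self.facts, "top_k": self.top_k}
        return json.dumps({k: v for k, v in payload.items() if v is not None})
```

Constructing `SimilarRequest(top_k=3)` with neither text nor facts raises `ValueError` before any network call is made.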
Predict appeal case outcome from legal text or facts.
Request Body:

```jsonc
{
  "text": "Legal opinion text...",   // Optional if facts provided
  "facts": ["Fact 1", "Fact 2"],     // Optional if text provided
  "court": "scotus",                 // Optional
  "jurisdiction": "federal",         // Optional
  "nature_of_suit": "criminal",      // Optional
  "year": 2024                       // Optional
}
```

Response:
```json
{
  "prediction": "win",
  "legal_judgment": "Judgment in Favor of Defendant",
  "probability": 0.85,
  "confidence": 0.87,
  "extracted_facts": [
    "Defendant/appellant is accused of...",
    "Trial court ruled against..."
  ],
  "outcome_likelihoods": {
    "reversed": 45.2,
    "granted": 12.8,
    "affirmed": 25.1,
    "denied": 10.5,
    "dismissed": 4.2,
    "remanded": 2.2
  },
  "top_features": [...],
  "explanation": "Based on the case facts..."
}
```

Find similar appeal cases using cosine similarity.
Request Body:

```jsonc
{
  "text": "Legal text...",           // Optional if facts provided
  "facts": ["Fact 1", "Fact 2"],     // Optional if text provided
  "top_k": 5                         // Optional, default: 5, max: 10
}
```

Response:
```json
{
  "similar_cases": [
    {
      "case_name": "Smith v. Jones",
      "snippet": "Case text snippet...",
      "similarity": 0.92,
      "outcome": "win",
      "original_outcome": "REVERSED",
      "full_text": "Full case text...",
      "court": "scotus",
      "date_filed": "2020-01-15",
      "docket_id": "12345"
    }
  ]
}
```

Generate legal brief based on case facts and precedents.
Request Body:

```jsonc
{
  "facts": ["Fact 1", "Fact 2"],
  "similar_cases": [...],                // Optional
  "nature_of_suit": "criminal",          // Optional
  "legal_judgment": "...",               // Optional
  "improvement_instructions": "...",     // Optional (for regeneration)
  "existing_brief": "..."                // Optional (for regeneration)
}
```

Response:
```json
{
  "brief": "Legal brief text in markdown format...",
  "case_citations": ["Case Name (REVERSED)", "Another Case (GRANTED)"]
}
```

Answer questions about the system using RAG.
Request Body:

```json
{
  "question": "How does the prediction model work?"
}
```

Response:
```json
{
  "answer": "The prediction model uses LegalBERT embeddings...",
  "retrieved_docs": [...]
}
```

Health check endpoint.
Response:

```json
{
  "status": "healthy"
}
```

```text
capstone1/
├── backend/
│   ├── data/                         # Raw data files
│   │   ├── opinions_checkpoint.csv
│   │   └── courtlistener_dockets_partial.csv
│   ├── models/                       # Trained models (generated)
│   │   ├── model.pkl                 # MLP Classifier
│   │   ├── label_encoder.pkl         # Label encoder
│   │   ├── embeddings.npy            # Precomputed embeddings
│   │   └── clean_dataset.csv         # Cleaned dataset
│   ├── notebooks/
│   │   └── train_model.ipynb         # Model training notebook
│   ├── routers/                      # API route handlers
│   │   ├── predict.py                # Prediction endpoint
│   │   ├── similar.py                # Similar cases endpoint
│   │   ├── rag.py                    # RAG endpoint
│   │   ├── brief.py                  # Legal brief generation
│   │   └── schemas.py                # Pydantic models
│   ├── utils/                        # Utility modules
│   │   ├── embedding.py              # LegalBERT embedding utilities
│   │   ├── model_loader.py           # Model loading & prediction
│   │   ├── fact_extraction.py        # GPT-based fact extraction
│   │   ├── legal_judgment.py         # Legal judgment language conversion
│   │   ├── explanation.py            # GPT-based explanation generation
│   │   ├── feature_importance.py     # Feature importance extraction
│   │   └── rag_index.py              # RAG document indexing
│   ├── rag_docs/                     # RAG documentation
│   │   ├── explanation_guide.md
│   │   ├── modeling_report.md
│   │   ├── data_dictionary.md
│   │   ├── system_limitations.md
│   │   └── CHANGELOG.md
│   ├── main.py                       # FastAPI application entry point
│   ├── requirements.txt              # Python dependencies
│   └── .env                          # Environment variables (create this)
│
└── frontend/
    ├── app/                          # Next.js app directory
    │   ├── page.tsx                  # Landing page
    │   ├── layout.tsx                # Root layout
    │   ├── predict/
    │   │   ├── page.tsx              # Prediction form page
    │   │   └── result/
    │   │       └── page.tsx          # Results display page
    │   └── components/               # React components
    │       ├── Form.tsx              # Prediction input form
    │       ├── ResultCard.tsx        # Prediction result display
    │       ├── PrecedentCard.tsx     # Similar case card
    │       └── ui/                   # ShadCN UI components
    │           ├── button.tsx
    │           ├── card.tsx
    │           ├── dialog.tsx
    │           └── ...
    ├── lib/                          # Utility libraries
    │   ├── api.ts                    # API client functions
    │   └── utils.ts                  # Helper functions
    ├── styles/
    │   └── globals.css               # Global styles
    ├── package.json                  # Node.js dependencies
    ├── tailwind.config.js            # Tailwind configuration
    ├── tsconfig.json                 # TypeScript configuration
    └── next.config.js                # Next.js configuration
```
The model training process is documented in backend/notebooks/train_model.ipynb.
- Data Loading: Merges docket and opinion CSVs
- Text Cleaning:
- Removes outcome-revealing words (AFFIRMED, REVERSED, etc.)
- Tail-scrubbing (removes last 2000 chars of procedural boilerplate)
- Pattern removal
- Label Creation: Binary win/lose labels from appeal outcomes
- Embedding Generation: LegalBERT embeddings (768 dimensions)
- Model Training: Trains and compares:
- Logistic Regression
- Random Forest
- Gradient Boosting
- SVC (RBF)
- MLP Classifier (typically best)
- Model Selection: Automatically selects best model
- Artifact Saving: Saves model, encoder, embeddings, dataset
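The text-cleaning and label-creation steps above can be sketched in a few lines of standard-library Python. This is an approximation under stated assumptions, not the notebook's code: the outcome-word regex covers only the six labels named in this README, texts no longer than the 2000-character tail are left intact, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed leakage list: the six appeal outcomes used as labels in this README.
OUTCOME_WORDS = re.compile(
    r"\b(?:affirmed|reversed|granted|denied|dismissed|remanded)\b", re.IGNORECASE)

WIN_OUTCOMES = {"reversed", "granted"}
LOSE_OUTCOMES = {"affirmed", "denied", "dismissed", "remanded"}


def clean_text(text: str, tail_chars: int = 2000) -> str:
    """Drop outcome words, scrub the procedural tail, and normalize whitespace."""
    text = OUTCOME_WORDS.sub("", text)
    if tail_chars > 0 and len(text) > tail_chars:  # assumption: short texts kept whole
        text = text[:-tail_chars]
    return re.sub(r"\s+", " ", text).strip()


def outcome_to_label(outcome: str) -> Optional[str]:
    """Map an original appeal outcome to the binary target; None if unrecognized."""
    key = outcome.strip().lower()
    if key in WIN_OUTCOMES:
        return "win"
    if key in LOSE_OUTCOMES:
        return "lose"
    return None
```

If the labels are then encoded with scikit-learn's `LabelEncoder`, classes are sorted alphabetically, so `lose` becomes 0 and `win` becomes 1.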
```bash
cd backend
jupyter notebook notebooks/train_model.ipynb
# Run all cells
```

```bash
cd backend
source venv/bin/activate   # Activate virtual environment
python main.py             # Run development server
```

Code Structure:
- Routers: API endpoint handlers (`routers/`)
- Utils: Reusable utility functions (`utils/`)
- Schemas: Pydantic models for validation (`routers/schemas.py`)
```bash
cd frontend
npm run dev     # Development server
npm run build   # Production build
npm run start   # Production server
npm run lint    # Lint code
```

Code Structure:
- Pages: Next.js pages (`app/`)
- Components: Reusable React components (`app/components/`)
- API Client: API communication (`lib/api.ts`)
- Backend: Follow PEP 8 (Python style guide)
- Frontend: ESLint + TypeScript strict mode
- Type Hints: Use type hints in Python
- Documentation: Docstrings for all functions
- Production Server: `uvicorn main:app --host 0.0.0.0 --port 8000`
- Environment Variables: Set `OPENAI_API_KEY` in the production environment
- Model Files: Ensure the `models/` directory is accessible
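- Build: `npm run build`
- Environment: Set `NEXT_PUBLIC_API_URL` to the production API URL
- Deploy: Deploy to a hosting service (Vercel, Netlify, etc.). A static `out/` directory is produced only when static export is enabled (`output: 'export'` in `next.config.js`); otherwise deploy the built app to a host that runs Next.js.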
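Create a Dockerfile for containerized deployment:

```dockerfile
# Backend Dockerfile (build from the repository root)
# backend/models/ must already contain the trained artifacts.
# List backend/.env in .dockerignore so the API key is not baked into the image.
FROM python:3.9-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

- Not Legal Advice: This system provides statistical predictions, not legal advice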
- Appeal Cases Only: Predictions are for appealed cases, not trial cases
- Educational Purpose: For research and educational purposes only
- No Guarantees: Predictions are estimates based on historical data
- Data Quality: Performance depends on training data quality
- Binary Classification: Oversimplifies complex legal outcomes
- Text Cleaning: May remove important context
- Static Model: Doesn't adapt to new legal concepts automatically
- GPT Dependencies: Requires OpenAI API key for full functionality
See backend/rag_docs/system_limitations.md for detailed limitations.
This is a capstone project. For questions, issues, or contributions:
- Review the codebase structure
- Follow existing code style
- Add tests for new features
- Update documentation
- Submit pull requests with clear descriptions
This project is for educational purposes.
- Project Maintainer: [Your Name]
- Institution: [Your Institution]
- Year: 2024
- LegalBERT: Pre-trained legal language model
- CourtListener: Legal case data
- OpenAI: GPT-4o-mini for explanations and brief generation
- FastAPI: Modern Python web framework
- Next.js: React framework
- ShadCN UI: UI component library
For issues, questions, or feature requests, please open an issue in the repository.
- SETUP.md: Detailed setup and troubleshooting guide
- CONTRIBUTING.md: Contribution guidelines
- Backend Documentation: See `backend/rag_docs/` for system documentation
- Never commit `.env` files: they contain sensitive API keys
- Review `.gitignore`: ensure sensitive files are excluded
- Use environment variables: store secrets in the environment, not in code
This repository may contain large model files (.pkl, .npy) and data files (.csv).
Options:
- Use Git LFS (recommended): configured via `.gitattributes`
- Exclude from Git: uncomment the relevant lines in `.gitignore`
- Host separately: store models/data in cloud storage
To use Git LFS:
```bash
git lfs install
git lfs track "*.pkl"
git lfs track "*.npy"
git lfs track "*.csv"
git add .gitattributes
```

Built with ❤️ for legal research and education