
# Quantization x Interpretability

Research project investigating how model quantization affects interpretability tools, specifically Sparse Autoencoders (SAEs).

## Key Findings

1. **SAEs transfer across precisions:** BF16-trained SAEs achieve 99% sample correlation when applied to INT4 activations.
2. **Degradation has structure:** code generation performance degrades by 50% at INT4, while knowledge retrieval remains stable.
3. **Smaller SAEs transfer better:** SAEs with a 0.5x hidden dimension transfer 2.3x better than 8x SAEs.
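The transfer measurement behind finding 1 can be sketched as follows. This is a minimal illustration on synthetic data: the SAE weights, dimensions, and noise level are placeholders, not the project's actual setup. The idea is to encode activations from the BF16 and INT4 models with the same SAE, then take the mean per-sample Pearson correlation between the two feature matrices.

```python
import numpy as np

def sae_encode(acts, W_enc, b_enc):
    """Encode activations with an SAE encoder: ReLU(acts @ W_enc + b_enc)."""
    return np.maximum(acts @ W_enc + b_enc, 0.0)

def sample_correlation(feats_a, feats_b):
    """Mean Pearson correlation between matching rows (samples)."""
    a = feats_a - feats_a.mean(axis=1, keepdims=True)
    b = feats_b - feats_b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
n, d_model, d_hidden = 128, 64, 256
W_enc = rng.normal(size=(d_model, d_hidden))  # stand-in for trained SAE weights
b_enc = np.zeros(d_hidden)

acts_bf16 = rng.normal(size=(n, d_model))                     # "full-precision" activations
acts_int4 = acts_bf16 + 0.01 * rng.normal(size=(n, d_model))  # plus small quantization noise

corr = sample_correlation(sae_encode(acts_bf16, W_enc, b_enc),
                          sae_encode(acts_int4, W_enc, b_enc))
print(f"mean per-sample correlation: {corr:.3f}")
```

With quantization noise that is small relative to the activations, the correlation stays near 1, which is the sense in which a full-precision SAE "transfers" to quantized activations.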

## Project Structure

```
.
├── scripts/               # Python experiment code
├── data/                  # Experiment results (JSON)
├── figures/               # Generated visualizations
├── research_summary.html  # Interactive results summary
├── RESEARCH_FINDINGS.md   # Detailed findings
└── METHODOLOGY.md         # Experimental methodology
```

## Models Tested

- Qwen3-Coder-30B-A3B (MoE architecture)
- StarCoder2-15B (dense architecture)

## Precisions

- BF16 (baseline)
- FP16
- INT8
- INT4 (NF4 quantization)

## Metrics

- Procrustes alignment: 85-89% across architectures
- Sample correlation: 99%
- Top-10 feature agreement: 89%
- Feature correlation: 95%
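Procrustes alignment measures how well one representation space maps onto another under the best orthogonal transformation. Below is a minimal sketch on synthetic data using `scipy.linalg.orthogonal_procrustes` (SciPy is in the dependency list); the variance-explained score is an illustrative choice, not necessarily the exact metric used in the experiments.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_alignment(X, Y):
    """Fit an orthogonal R minimizing ||X @ R - Y||_F; return the
    fraction of Y's squared norm explained after alignment."""
    R, _ = orthogonal_procrustes(X, Y)
    residual = np.linalg.norm(X @ R - Y) ** 2
    return 1.0 - residual / np.linalg.norm(Y) ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 32))
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random orthogonal map
Y = X @ Q + 0.05 * rng.normal(size=(200, 32))   # rotated copy plus noise

alignment = procrustes_alignment(X, Y)
print(f"alignment: {alignment:.3f}")
```

When `Y` is an orthogonal transformation of `X` plus small noise, the score approaches 1; lower scores indicate the two spaces differ by more than a rotation.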

## Setup

```bash
pip install torch transformers bitsandbytes scipy numpy
```
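For the INT4 (NF4) condition, models can be loaded through `transformers` with a bitsandbytes quantization config along these lines. This is a sketch, not the project's exact loading code; the checkpoint name is illustrative, so substitute the model under study. Loading requires a CUDA GPU and downloads the model weights.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NormalFloat (NF4) quantization config for the INT4 condition.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Checkpoint name is illustrative; substitute the model under study.
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-15b",
    quantization_config=bnb_config,
    device_map="auto",
)
```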

## Usage

See `scripts/` for experiment code. Main entry points:

- `overnight_production_v2.py` - full experiment pipeline
- `semantic_transfer.py` - SAE transfer analysis
- `benchmark_eval.py` - benchmark evaluation

## View Results

Open `research_summary.html` in a browser to see the interactive results summary.

## Context

This research was conducted as part of the Anthropic Fellows Program application. The goal is to understand whether interpretability tools trained on full-precision models remain valid when those models are quantized for production deployment.

## Author

Jack Switzer - January 2026
