A comprehensive performance analysis dashboard for MLPerf Inference benchmark results.
- Compare MLPerf v5.0 and v5.1 submissions
- Interactive bar charts for performance comparison across systems
- Support for multiple models: DeepSeek-R1, Llama 3.1 8B, Llama 2 70B, and more
- Filter by organizations, accelerators, scenarios (Offline/Server)
- Per-GPU and per-8-GPU-node normalization options
- Performance benefit calculation vs. a global baseline (see the sketch after this list)
- Baseline system information displayed for each chart
- Handles systems with varying accelerator counts
- Lightweight CSV-based dataset summaries
- Token length distribution histograms with statistics
- Visual representation of input/output token patterns
- Median and max value annotations
- Performance degradation analysis between scenarios
- Side-by-side metric comparison
- Detailed per-system breakdown
- Track system performance evolution across MLPerf versions
- Automatic identification of multi-version systems
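The normalization and baseline-benefit features above reduce to simple arithmetic on the submission table. The snippet below is an illustrative sketch, not the dashboard's actual code; the column names (result, accelerator_count) are assumptions.

```python
# Illustrative sketch of per-GPU / per-8-GPU-node normalization and the
# benefit-vs-baseline calculation. Column names are assumptions, not the
# dashboard's real schema.
import pandas as pd

df = pd.DataFrame(
    {
        "system": ["System A", "System B", "System C"],
        "result": [12000.0, 45000.0, 9000.0],  # e.g. tokens/s in the Offline scenario
        "accelerator_count": [8, 32, 4],
    }
)

# Normalize so systems with different accelerator counts are comparable.
df["per_gpu"] = df["result"] / df["accelerator_count"]
df["per_8_gpu_node"] = df["per_gpu"] * 8

# Benefit relative to a global baseline (here: the slowest per-GPU system).
baseline = df["per_gpu"].min()
df["benefit_vs_baseline_pct"] = (df["per_gpu"] / baseline - 1.0) * 100

print(df[["system", "per_gpu", "per_8_gpu_node", "benefit_vs_baseline_pct"]])
```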
mlperf-dashboard/
├── app.py                      # Main application entry point
├── mlperf_datacenter.py        # MLPerf dashboard module
├── dashboard_styles.py         # CSS styling
├── requirements.txt            # Python dependencies
├── pyproject.toml              # Project metadata
├── Makefile                    # Development commands
├── mlperf-data/                # MLPerf data files
│   ├── mlperf-5.1.csv          # MLPerf v5.1 submission data
│   ├── mlperf-5.0.csv          # MLPerf v5.0 submission data
│   ├── summaries/              # Dataset summaries (version controlled)
│   │   ├── README.md
│   │   ├── deepseek-r1.csv
│   │   ├── llama3-1-8b-datacenter.csv
│   │   └── llama2-70b-99.csv
│   └── original/               # Original datasets (NOT version controlled)
│       ├── README.md
│       └── generate_dataset_summaries.py
└── tests/                      # Test suite
    ├── conftest.py
    ├── test_mlperf_datacenter.py
    └── README.md
- Clone the repository:
  git clone https://github.com/Harshith-umesh/mlperf-dashboard.git
  cd mlperf-dashboard
- Set up Python environment:
  python3 -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip3 install -r requirements.txt
- Run the dashboard:
  streamlit run app.py
- Access: Open http://localhost:8501 in your browser
For a complete development environment with linting, formatting, and code quality tools:
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Available development commands:
- make format - Auto-format code (Black, Ruff)
- make lint - Run linting checks
- make type-check - Run static type checking
- make test - Run tests with coverage
- make ci-local - Run all CI checks locally
- make clean - Clean temporary files
The dashboard includes MLPerf submission data:
- mlperf-data/mlperf-5.1.csv - v5.1 submissions
- mlperf-data/mlperf-5.0.csv - v5.0 submissions
These files are version controlled.
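To work with the submissions programmatically (for example, to track the cross-version evolution mentioned above), both files can be loaded and tagged with their version. A minimal pandas sketch, assuming only that the CSVs parse with default settings:

```python
# Load both MLPerf submission files and tag each row with its version so that
# systems appearing in both v5.0 and v5.1 can be compared.
import pandas as pd

frames = []
for version, path in [
    ("v5.0", "mlperf-data/mlperf-5.0.csv"),
    ("v5.1", "mlperf-data/mlperf-5.1.csv"),
]:
    frame = pd.read_csv(path)
    frame["mlperf_version"] = version  # added here; not a column in the raw CSV
    frames.append(frame)

submissions = pd.concat(frames, ignore_index=True)
print(submissions["mlperf_version"].value_counts())
```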
Lightweight CSV summaries (40-180 KB vs 1-20 MB originals):
- mlperf-data/summaries/deepseek-r1.csv
- mlperf-data/summaries/llama3-1-8b-datacenter.csv
- mlperf-data/summaries/llama2-70b-99.csv
Original datasets are stored in mlperf-data/original/ (NOT version controlled).
To download the original datasets, visit the MLCommons Inference Benchmark Data Download page.
Example:
cd mlperf-data/original/
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri

To generate summaries:
cd /path/to/mlperf-dashboard
python mlperf-data/original/generate_dataset_summaries.py

See mlperf-data/original/README.md for detailed instructions.
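Conceptually, a summary only keeps the per-sample token lengths. The sketch below shows the general idea; it is not the actual generate_dataset_summaries.py, and the original dataset's file name and field names (input_tokens, output_tokens) are assumptions made for illustration.

```python
# Simplified sketch: derive a lightweight summary CSV from a (much larger)
# original dataset. The input file name and its fields are assumptions.
import pandas as pd

original = pd.read_json("mlperf-data/original/original_dataset.json")

summary = pd.DataFrame(
    {
        "input_length": original["input_tokens"].apply(len),
        "output_length": original["output_tokens"].apply(len),
    }
)
summary.to_csv("mlperf-data/summaries/example-summary.csv", index=False)
```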
Run all tests:
pytest tests/

Run with coverage:
pytest tests/ --cov=. --cov-report=html

Quick test:
make test

Configuration environment variables:
- STREAMLIT_SERVER_HEADLESS=true - Headless mode for production
- STREAMLIT_SERVER_PORT=8501 - Server port
- STREAMLIT_SERVER_ADDRESS=0.0.0.0 - Listen address
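For orientation, a minimal data-sanity test in the spirit of the suite under tests/ could look like the following. This is an illustrative example only; the file name and assertions are assumptions, not the repository's actual tests.

```python
# tests/test_data_files.py -- illustrative example, not part of the repository.
import os

import pandas as pd
import pytest


@pytest.mark.parametrize(
    "path",
    ["mlperf-data/mlperf-5.0.csv", "mlperf-data/mlperf-5.1.csv"],
)
def test_submission_csv_loads(path):
    """The version-controlled submission files should exist and be non-empty."""
    assert os.path.exists(path)
    assert not pd.read_csv(path).empty
```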
- CSV files must include columns for model, scenario, organization, accelerator, and metrics
- Dataset summaries require input_length and output_length columns
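The token length histograms with median/max annotations can be reproduced from a summary file in a few lines. A sketch using matplotlib (the dashboard itself renders these interactively; the plotting library here is only for illustration):

```python
# Plot the input token length distribution from a bundled summary file and
# annotate the median and max, mirroring what the dashboard shows.
import matplotlib.pyplot as plt
import pandas as pd

summary = pd.read_csv("mlperf-data/summaries/llama2-70b-99.csv")

fig, ax = plt.subplots()
ax.hist(summary["input_length"], bins=50)
median = summary["input_length"].median()
maximum = summary["input_length"].max()
ax.axvline(median, linestyle="--", label=f"median = {median:.0f}")
ax.axvline(maximum, linestyle=":", label=f"max = {maximum:.0f}")
ax.set_xlabel("Input tokens")
ax.set_ylabel("Samples")
ax.legend()
plt.show()
```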
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Set up development environment: pip install -e ".[dev]"
- Install pre-commit hooks: pre-commit install
- Make changes and test: pytest tests/
- Run code quality checks: make ci-local
- Submit a pull request
- Performance: Samples/s, Tokens/s, Queries/s
- Normalization: Per-GPU, Per-8-GPU-Node
- Scenarios: Offline (batch), Server (online)
- Systems: Multi-vendor, multi-accelerator comparison
- Dataset Statistics: Token length distributions
Apache-2.0 License
Note: This dashboard displays MLPerf Inference benchmark results for analysis and comparison purposes.