Scarab-infra

scarab-infra is a set of tools that automate the execution of Scarab simulations. It uses Docker and Slurm to simulate applications according to the SimPoint methodology. scarab-infra also provides tools to analyze the generated simulation statistics and to obtain simpoints and execution traces from binary applications.

Quickstart (Most Common Flow)

Bootstrap the environment
```
./sci --init
```

This installs Docker when possible, configures socket permissions, installs Miniconda if needed, creates or updates the scarabinfra conda environment, validates activation, and ensures you have an SSH key. It can also optionally fetch SimPoint traces, set up Slurm and ghcr.io credentials, and verify AI CLI auth status (Codex/Gemini/Claude).

Prepare (or update) your descriptor
```
cp json/exp.json json/<descriptor>.json
# edit json/<descriptor>.json to match your workloads, configs, and paths
```
Each descriptor specifies root_dir, scarab_path, workloads, and build mode. Adjust these fields before building or running. When you modify Scarab source code that is relevant to this descriptor, make sure you are editing the Scarab repository located at scarab_path.

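
As a quick sanity check before building, a descriptor can be inspected for the fields named above. This is only a minimal sketch: the field names come from this README, but treating them as top-level keys is an assumption, and missing_fields and load_descriptor are hypothetical helpers, not part of sci.

```python
import json
from pathlib import Path

# Field names are taken from the README text; their top-level placement
# in the descriptor is an assumption made for this sketch.
REQUIRED_FIELDS = ("root_dir", "scarab_path", "workloads")


def missing_fields(descriptor: dict) -> list:
    """Return the required fields that are absent or empty in a descriptor."""
    return [key for key in REQUIRED_FIELDS if not descriptor.get(key)]


def load_descriptor(path: str) -> dict:
    """Read a descriptor JSON file such as json/<descriptor>.json."""
    return json.loads(Path(path).read_text())
```

For example, missing_fields(load_descriptor("json/exp.json")) returns an empty list when all three fields are filled in. scarab_build is not checked because the build mode defaults to opt.
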
Build Scarab for your descriptor
```
./sci --build-scarab <descriptor>
```
Provide the JSON filename (without extension) from json/. The build runs inside the correct Docker image and respects the scarab_build mode in the descriptor (defaults to opt).

Run simulations
```
./sci --sim <descriptor>
```

Launches the simulations defined in json/<descriptor>.json. Scarab runs in parallel across simpoints and reports status/logs under <root_dir>/simulations/<descriptor>/ (with root_dir taken from the descriptor).
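
The output location described above can be derived from the descriptor itself. The sketch below assumes root_dir is a top-level key of the descriptor; simulation_dir is a hypothetical helper used only for illustration.

```python
from pathlib import Path


def simulation_dir(descriptor: dict, descriptor_name: str) -> Path:
    """Return <root_dir>/simulations/<descriptor>/ for a loaded descriptor.

    descriptor_name is the JSON filename without its extension.
    """
    root_dir = Path(descriptor["root_dir"]).expanduser()
    return root_dir / "simulations" / descriptor_name
```
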

You only need additional steps if you want to inspect workloads, collect traces, or manage jobs manually. The sections below cover those workflows in more detail.

Additional Workflows

Monitor and clean up

Check status (queued/running jobs, logs, and errors):
```
./sci --status <descriptor>
```
Kill active simulations:
```
./sci --kill <descriptor>
```
Remove containers and temporary state:
```
./sci --clean <descriptor>
```

Visualize collected stats

./sci --visualize <descriptor>

Generates bar charts (value and speedup) for each counter listed in visualize.counters and saves them next to collected_stats.csv under <root_dir>/simulations/<descriptor>/.

Add a visualize block to the descriptor:

"visualize": {
  "baseline": "baseline",
  "counters": ["IPC"]
}

Each entry in visualize.counters can be either:

- a single counter name (e.g. "IPC"), which produces the bar and speedup plots, or
- a list of counters (e.g. ["BTB_OFF_PATH_MISS_count", "BTB_OFF_PATH_HIT_count"]), which emits a stacked plot (*_stacked.png) combining those counters across workloads/configs.

For additional control you may instead supply objects such as:

{
  "type": "stacked",
  "name": "btb_miss_hit",
  "title": "BTB Miss/Hit Breakdown",
  "y_label": "Events",
  "stats": ["BTB_OFF_PATH_MISS_count", "BTB_OFF_PATH_HIT_count"]
}

The name (optional) governs the output filename stem, while title and y_label adjust plot annotations.
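
The three accepted entry forms (string, list, object) can be thought of as shorthand for one uniform plot spec. The following is a rough sketch of such a normalization, not the actual sci implementation; the "bar" type label and the normalize_counter_entry helper are assumptions introduced here.

```python
def normalize_counter_entry(entry):
    """Turn one visualize.counters entry into a uniform plot spec (sketch)."""
    if isinstance(entry, str):
        # Single counter name: bar and speedup plots for that counter.
        # The "bar" label is hypothetical; the README does not name this type.
        return {"type": "bar", "name": None, "stats": [entry]}
    if isinstance(entry, list):
        # List of counters: one stacked plot combining them.
        return {"type": "stacked", "name": None, "stats": list(entry)}
    if isinstance(entry, dict):
        # Object form: explicit name, title, and y_label; name is optional.
        spec = {"type": "stacked", "name": None, "title": None, "y_label": None}
        spec.update(entry)
        return spec
    raise TypeError(f"unsupported visualize.counters entry: {entry!r}")
```
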

Set visualize.baseline to force the speedup plots to use a specific configuration as their reference (defaults to the first configuration present in the stats file).
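
To make the role of the baseline concrete, here is a minimal sketch of a speedup calculation. It assumes speedup is the ratio of a configuration's counter value to the baseline's value, which suits a counter like IPC; the exact formula sci uses is not stated in this README, and speedups is a hypothetical helper.

```python
from typing import Dict, Optional


def speedups(values: Dict[str, float], baseline: Optional[str] = None) -> Dict[str, float]:
    """Divide each configuration's value by the baseline configuration's value."""
    if not values:
        return {}
    # With no explicit baseline, fall back to the first configuration present,
    # mirroring the documented default.
    ref = baseline if baseline is not None else next(iter(values))
    ref_value = values[ref]
    return {config: value / ref_value for config, value in values.items()}
```

With IPC values of 2.0 for baseline and 2.5 for another configuration, the second configuration gets a speedup of 1.25.
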

Analyze performance drift

./sci --perf-analyze <descriptor>

Diffs collected stats against a baseline configuration, writes a deterministic drift report, and optionally invokes an analyzer CLI (for example Codex, Gemini, or Claude) for root-cause hypotheses.

Configure it with a perf_analyze block in the descriptor:

"perf_analyze": {
  "baseline": "baseline",
  "counters": ["IPC", "ICACHE_MISS", "BRANCH_MISPRED"],
  "stat_groups": ["bp", "fetch", "core"],
  "compare_all_stats": false,
  "drift_top_workloads": 5,
  "drift_top_simpoints": 5,
  "prompt_budget_tokens": 12000,
  "threshold_pct": 2.0,
  "analyzer_cli_cmd": "codex"
}
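
As an illustration of how threshold_pct and drift_top_workloads might interact, the sketch below flags workloads whose trigger counter moved by more than the threshold and keeps the largest ones. It is an assumption-laden sketch rather than the tool's real logic: ranking by absolute percent delta and the helper names percent_delta and drifting_workloads are choices made here.

```python
def percent_delta(baseline: float, value: float) -> float:
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def drifting_workloads(baseline, candidate, threshold_pct=2.0, top_n=5):
    """Return up to top_n (workload, delta) pairs exceeding threshold_pct.

    baseline and candidate map workload name -> trigger counter value
    (for example IPC, the first entry of perf_analyze.counters).
    """
    deltas = {
        workload: percent_delta(baseline[workload], candidate[workload])
        for workload in baseline
        # Skip workloads missing from the candidate or with a zero baseline.
        if workload in candidate and baseline[workload] != 0
    }
    flagged = [(w, d) for w, d in deltas.items() if abs(d) > threshold_pct]
    # Largest absolute drift first.
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged[:top_n]
```
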

Notes:

- perf_analyze.counters[0] is used as the trigger counter for drift detection.
- stat_groups optionally restricts the compared stats to selected Scarab groups: bp, core, fetch, inst, l2l1pref, memory, power, pref, stream.
- Set compare_all_stats: true to compare every stat present in collected_stats.csv; counters[0] remains the drift trigger.
- drift_top_workloads controls how many of the highest-drift workloads (ranked by the absolute delta of the trigger counter) are expanded in the report and prompt.
- drift_top_simpoints controls how many of the highest-impact simpoints per selected workload are expanded.
- prompt_budget_tokens limits the prompt size (approximate token budgeting) before the analyzer CLI is invoked.
- threshold_pct is a threshold on the absolute value of the percent delta.
- analyzer_cli_cmd supports {prompt_file}, {summary_file}, and {report_file} placeholders. If {prompt_file} is omitted, the prompt path is appended as the last argument.
- Example commands: codex (auto-converted to the non-interactive codex exec -), codex exec -, gemini -p "@{prompt_file}", claude (auto-converted to the non-interactive claude -p, with the prompt content passed over stdin).
- During ./sci --init, the Codex/Gemini/Claude checks are non-interactive: init only verifies whether each CLI is installed and logged in, then prints manual login instructions when needed.
- Account-login commands:
  - Codex: codex login, then codex login status
  - Gemini: gemini, then /auth, then gemini auth status
  - Claude: claude login (or claude, then /login), then claude auth status
- If the compared configurations use different Scarab binary hashes, --perf-analyze runs git diff in scarab_path and includes the changed files and commit summaries in the report and AI prompt.
- Outputs are written beside collected_stats.csv: perf_diff_summary.json, perf_drift_report.md, perf_drift_prompt.md, and optionally perf_ai_report.md.
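
The placeholder rule for analyzer_cli_cmd can be sketched as follows. This only illustrates the documented substitution and the append-as-last-argument fallback; it does not reproduce the auto-conversion of codex or claude to their non-interactive forms, and expand_analyzer_cmd is a hypothetical helper, not a function exposed by sci.

```python
import shlex


def expand_analyzer_cmd(template, prompt_file, summary_file, report_file):
    """Expand placeholders in an analyzer command template into an argv list."""
    paths = {
        "{prompt_file}": prompt_file,
        "{summary_file}": summary_file,
        "{report_file}": report_file,
    }
    argv = []
    for token in shlex.split(template):
        # Substitute every known placeholder inside this argument.
        for placeholder, path in paths.items():
            token = token.replace(placeholder, path)
        argv.append(token)
    if "{prompt_file}" not in template:
        # Documented fallback: the prompt path becomes the last argument.
        argv.append(prompt_file)
    return argv
```

For example, the template gemini -p "@{prompt_file}" with a prompt path of perf_drift_prompt.md expands to the arguments gemini, -p, and @perf_drift_prompt.md.
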

List workloads and simulation modes

./sci --list

Shows the workload group hierarchy and the Docker image each mode uses.

Debug Scarab inside the Docker container

Build Scarab in debug mode

Set scarab_build to dbg in json/<descriptor>.json, then rebuild with ./sci --build-scarab <descriptor>. Next, start an interactive session in the container:

./sci --interactive <descriptor>

Then go to the simulation directory under ~/simulations inside the container, which is mounted to the descriptor's root_dir. For example:

cd ~/simulations/<exp_name>/baseline/<workload>/<simpoint>

Create a debug directory and copy the original PARAMS.out file into it as a new PARAMS.in, then delete all lines after the line --- Cut out everything below to use this file as PARAMS.in ---

mkdir debug && cd debug
cp ../PARAMS.out ./PARAMS.in
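
The trimming step can also be scripted instead of done by hand in an editor. The sketch below removes every line after the marker; whether the marker line itself should stay is not stated here, so keeping it is the default and dropping it is an option. trim_params and make_params_in are hypothetical helpers.

```python
from pathlib import Path

MARKER = "--- Cut out everything below to use this file as PARAMS.in ---"


def trim_params(text: str, keep_marker: bool = True) -> str:
    """Return text with every line after the marker line removed."""
    kept = []
    for line in text.splitlines(keepends=True):
        if MARKER in line:
            if keep_marker:
                kept.append(line)
            break  # Everything below the marker is discarded.
        kept.append(line)
    return "".join(kept)


def make_params_in(params_out: Path, params_in: Path) -> None:
    """Write a trimmed copy of PARAMS.out to PARAMS.in."""
    params_in.write_text(trim_params(params_out.read_text()))
```

From inside the debug directory, make_params_in(Path("../PARAMS.out"), Path("PARAMS.in")) replaces the cp and the manual edit.
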

You can now run Scarab under gdb with the same parameters as the simulation you want to debug.

gdb /scarab/src/scarab

Cached images and containers are handled automatically by the commands above; use ./sci --clean <descriptor> when you want to force a reset.

Collect traces instead of running simulations

./sci --trace <descriptor>

Uses json/<descriptor>.json with descriptor_type: "trace" to launch the trace pipeline (see docs/README.trace.md for details).

Run a perf container

./sci --perf perf

Uses json/perf.json or another json/<descriptor>.json with descriptor_type: "perf" to open the interactive perf container described in the descriptor (see docs/README.perf.md).

Docker Images

./sci --build-scarab <descriptor> and ./sci --sim <descriptor> automatically pull or rebuild the Docker image they need, but the commands below are handy when you want to inspect or pre-stage images manually.

Download a pre-built image for the current commit

export GIT_HASH=$(git rev-parse --short HEAD)
docker pull ghcr.io/litz-lab/scarab-infra/allbench_traces:$GIT_HASH
docker tag ghcr.io/litz-lab/scarab-infra/allbench_traces:$GIT_HASH allbench_traces:$GIT_HASH
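
The naming scheme behind these commands is an image name tagged with the short commit hash. A small sketch of how the remote and local references could be composed follows; the registry prefix is copied from the commands above, while short_git_hash and image_refs are hypothetical helpers.

```python
import subprocess

REGISTRY = "ghcr.io/litz-lab/scarab-infra"


def short_git_hash(repo_dir: str = ".") -> str:
    """Return the output of `git rev-parse --short HEAD` for repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def image_refs(image: str, git_hash: str):
    """Return (remote, local) image references tagged with the commit hash."""
    remote = f"{REGISTRY}/{image}:{git_hash}"
    local = f"{image}:{git_hash}"
    return remote, local
```
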

Build or retag a workload image yourself

./sci --build-image <workload_group>

Manual alternative:

export GIT_HASH=$(git rev-parse --short HEAD)
docker build . -f ./workloads/<workload_group>/Dockerfile --no-cache -t <workload_group>:$GIT_HASH

Publications

@inproceedings{oh2024udp,
  author = {Oh, Surim and Xu, Mingsheng and Khan, Tanvir Ahmed and Kasikci, Baris and Litz, Heiner},
  title = {UDP: Utility-Driven Fetch Directed Instruction Prefetching},
  booktitle = {Proceedings of the 51st International Symposium on Computer Architecture (ISCA)},
  series = {ISCA 2024},
  year = {2024},
  month = jun,
}

Requirements

All of these checks are automated by ./sci --init; follow them manually only if you need to diagnose issues locally.

Install Docker (see the Docker documentation).
Configure the Docker socket for non-root use:
```
sudo chmod 666 /var/run/docker.sock
```
Install Miniconda (or Anaconda) so you have a writable Conda installation (see the Miniconda docs). ./sci --init installs Miniconda to ~/miniconda3 if none is available.
Create or update the scarabinfra Conda environment from quickstart_env.yaml:
```
conda env create --file quickstart_env.yaml
```
The helper keeps this environment in sync (including gdown and other pip dependencies).
Activate or validate the environment as needed:
```
conda activate scarabinfra
```
Add an SSH key for the machine running Docker to your GitHub account (see GitHub's guide on adding SSH keys).

Place SimPoint traces under $trace_home (defaults to ~/traces). A pre-packaged archive is available:

mkdir -p ~/traces && cd ~/traces
gdown 'https://drive.google.com/uc?id=1tfKL7wYK1mUqpCH8yPaPVvxk2UIAJrOX'
tar -xzvf simpoint_traces.tar.gz

Optional: Install Slurm if you plan to run simulations on a Slurm cluster.
Optional: Log in to ghcr.io so you can pull prebuilt images (requires a token with read:packages):
```
echo <YOUR_GITHUB_TOKEN> | docker login ghcr.io -u <YOUR_GITHUB_USERNAME> --password-stdin
```
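
When diagnosing a setup by hand, a few of the requirements above can be checked programmatically. This is a rough sketch rather than what ./sci --init actually runs: preflight is a hypothetical helper, and the default socket path and ~/.ssh location are the conventional ones.

```python
import os
import shutil
from pathlib import Path


def preflight(socket="/var/run/docker.sock", ssh_dir="~/.ssh"):
    """Return a dict of check name -> bool for some manual requirements."""
    ssh_path = Path(ssh_dir).expanduser()
    return {
        # Is the docker client on PATH?
        "docker_cli": shutil.which("docker") is not None,
        # Can the current user read and write the Docker socket?
        "docker_socket": os.access(socket, os.R_OK | os.W_OK),
        # Is conda on PATH?
        "conda_cli": shutil.which("conda") is not None,
        # Is there at least one public key in the SSH directory?
        "ssh_public_key": ssh_path.is_dir() and any(ssh_path.glob("*.pub")),
    }
```

Any False entry points at the requirement to revisit, for example a False docker_socket suggests the socket permissions step has not been applied.
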
