Artifact for Baleen (FAST 2024)

Baleen: ML Admission & Prefetching for Flash Caches

Paper (Preprint) | Code | Data | Video walkthrough | Reproduce on Chameleon

This repository is targeted at those seeking to reproduce the results found in the Baleen paper and contains a frozen copy of the code. If you are looking to use Baleen, please go to https://github.com/wonglkd/BCacheSim/ for the latest version.

Scope: this repository contains Python code to reproduce the simulator results in the Baleen paper. The testbed code modified a proprietary internal version of CacheLib and will not be released at this time, pending a rebase on the open-source version of CacheLib. Another key difference is that Meta's exact constants for the disk head time function will not be released, meaning that results will not be exactly the same; instead, we use constants (seek time and bandwidth) measured on the hard disks in our university testbed.
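As a rough illustration of how a disk head time function can be built from the two constants mentioned above (seek time and bandwidth), here is a minimal sketch. It assumes a simple linear model (time per seek plus transfer time); the `DiskModel` class, the `disk_head_time` function, and the numeric constants are hypothetical placeholders, not the values or code used in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskModel:
    """Hypothetical disk parameters; replace with values measured on your own disks."""
    seek_time_s: float            # average time per seek, in seconds
    bandwidth_bytes_per_s: float  # sustained transfer rate, in bytes per second


def disk_head_time(num_seeks: int, num_bytes: int, disk: DiskModel) -> float:
    """Estimate disk head time (seconds) under a linear seek-plus-transfer model."""
    if num_seeks < 0 or num_bytes < 0:
        raise ValueError("num_seeks and num_bytes must be non-negative")
    return num_seeks * disk.seek_time_s + num_bytes / disk.bandwidth_bytes_per_s


# Illustrative placeholder constants, NOT the ones from the paper or this repository.
example_disk = DiskModel(seek_time_s=0.010, bandwidth_bytes_per_s=100e6)

# One seek followed by a 1 MB transfer: 10 ms of seeking plus 10 ms of transfer.
print(disk_head_time(1, 1_000_000, example_disk))
```

Because the constants enter the result directly, any change to them shifts absolute numbers, which is why results reproduced here are not expected to match Meta's exactly.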

Nomenclature: Some terms were renamed in the paper for clarity after the code was written. The code and paper names below refer to the same concepts.

Service Time (in the code) was renamed to Disk Head Time (in the paper)
Chunks (in the code) are called segments (in the paper)

Walkthrough Video

We have verified that our instructions work on Chameleon and have recorded a video (YouTube: http://tiny.cc/BaleenArtifactYT) showing the environment setup on Chameleon, the instructions below being run, and all notebooks running to completion with no error cells.

Getting Started

Time estimate: 60 mins (20 mins interactive).

Installation (Chameleon Trovi)

Time estimate: 30 minutes (10 mins interactive).

The recommended way is to use Chameleon Trovi, an academic cloud. Note that you will require an allocation; if you are affiliated with FAST, you can request to be added to the associated project (CHI-231080). To do this (and for any other issues with Chameleon), please contact the helpdesk at help@chameleoncloud.org.

Launch artifact on Trovi
(Optional) Open notebook chameleon/1-getting-started.ipynb which will walk you through the Getting Started section of this README. You may run one cell at a time, or click Run -> Run All Cells to execute all commands. If processes get killed, you need a dedicated server.
(Recommended) The shared JupyterHub has limited RAM/disk. Run notebook chameleon/2-start-dedicated-server.ipynb, which provisions a beefier node (for 7 days) that you can create an SSH tunnel to.

Installation (local computer)

Alternatively, you may do a manual install. These commands are also available in getting-started.sh for your convenience.

Clone the repository (if not already done)

git clone --recurse-submodules https://github.com/wonglkd/Baleen-FAST24.git
cd Baleen-FAST24

Note: this repository uses submodules. As a reminder, when you pull, you'll likely want to use git pull --recurse-submodules.

Install Python dependencies with Conda/Mamba/Micromamba or pip. (We developed with Micromamba 1.4.1.)

conda env create -f BCacheSim/install/env_cachelib-py-3.11.yaml
conda activate cachelib-py-3.11
# PyPy is optional (for faster non-ML runs)
# conda env create -f BCacheSim/install/env_cachelib-pypy-3.8.yaml

Alternatively, use pip:

python3 -m pip install --user -r BCacheSim/install/requirements.txt

Download trace files (see here for more details on the traces)

cd data
bash get-tectonic.sh

Do a simple experiment

Time estimate: 30 minutes (10 mins interactive).

Manually run the simulator with the baseline RejectX. (4 mins)

./BCacheSim/run_py.sh py -B -m BCacheSim.cachesim.simulate_ap --config runs/example/rejectx/config.json

Manually train Baleen's ML models (25 secs) and run the simulator with Baleen (~30 mins).

./BCacheSim/run_py.sh py -B -m BCacheSim.episodic_analysis.train --exp example --policy PolicyUtilityServiceTimeSize2 --region Region1 --sample-ratio 0.1 --sample-start 0 --trace-group 201910 --supplied-ea physical --target-wrs 34 50 100 75 20 10 60 90 30 --target-csizes 366.475 --output-base-dir runs/example/baleen --eviction-age 5892.856 --rl-init-kwargs filter_=prefetch --train-target-wr 35.599 --train-models admit prefetch --train-split-secs-start 0 --train-split-secs-end 86400 --ap-acc-cutoff 15 --ap-feat-subset meta+block+chunk
./BCacheSim/run_py.sh py -B -m BCacheSim.cachesim.simulate_ap --config runs/example/baleen/prefetch_ml-on-partial-hit/config.json

Use notebooks/example/example.ipynb to view and plot results.
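The training command above is long, so when scripting variations of it, it can help to assemble it from a structure rather than edit one long line. The sketch below does this using only the flags and values shown above; the `train_args` dict and the `build_train_command` helper are hypothetical names introduced for illustration and are not part of the repository.

```python
import shlex
import subprocess  # only needed if you uncomment the run() call below

# Flags and values copied verbatim from the training command above.
train_args = {
    "--exp": ["example"],
    "--policy": ["PolicyUtilityServiceTimeSize2"],
    "--region": ["Region1"],
    "--sample-ratio": ["0.1"],
    "--sample-start": ["0"],
    "--trace-group": ["201910"],
    "--supplied-ea": ["physical"],
    "--target-wrs": ["34", "50", "100", "75", "20", "10", "60", "90", "30"],
    "--target-csizes": ["366.475"],
    "--output-base-dir": ["runs/example/baleen"],
    "--eviction-age": ["5892.856"],
    "--rl-init-kwargs": ["filter_=prefetch"],
    "--train-target-wr": ["35.599"],
    "--train-models": ["admit", "prefetch"],
    "--train-split-secs-start": ["0"],
    "--train-split-secs-end": ["86400"],
    "--ap-acc-cutoff": ["15"],
    "--ap-feat-subset": ["meta+block+chunk"],
}


def build_train_command(args):
    """Return the training command as an argument list (hypothetical helper)."""
    cmd = ["./BCacheSim/run_py.sh", "py", "-B", "-m", "BCacheSim.episodic_analysis.train"]
    for flag, values in args.items():
        cmd.append(flag)
        cmd.extend(values)
    return cmd


command = build_train_command(train_args)
print(shlex.join(command))  # shell-ready form of the command

# To launch it from the repository root instead of printing it:
# subprocess.run(command, check=True)
```

Keeping the arguments in one place makes it easier to change a single value, such as the output directory, without retyping the rest.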

Detailed Instructions

This section assumes you have completed the 'Getting Started' section and have installed the code and downloaded the traces.

As it requires too much computation time to rerun every single experiment, we suggest the following steps to maximize the use of reviewers' time in evaluating our paper. We supply our traces, code, and the intermediate results from our experimental runs.

Roadmap for evaluation:

Test out Baleen's ML training & simulator (in Getting Started).
- What: simulate RejectX baseline, train Baleen models, simulate Baleen
- Expected results: notebooks/example/example.ipynb
Plot graphs using our intermediate results.
- Example: notebooks/paper-figs/fig-01bc,17-202309.ipynb
Select additional simulations to run if desired.
- See notebooks/reproduce/, in particular commands.ipynb and reproduce_commands.sh

Directory structure

data: traces that are used as input
runs: where experiment results are stored
tmp: temporary directory for ML models, generated episode files
notebooks: Jupyter notebooks for experiments
notebooks/figs: Output directory for figures
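Before launching long runs, it can be useful to check that the layout above is in place. The following is a minimal sketch, assuming it is run from the repository root; the `missing_dirs` helper is a hypothetical name, and some of these directories (for example tmp or notebooks/figs) may only be created on first use, which is an assumption rather than something stated here.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DIRS = ["data", "runs", "tmp", "notebooks", "notebooks/figs"]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Report anything missing relative to the current working directory.
print("Missing directories:", missing_dirs("."))
```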

Additional notes

The final runs that generated the results in the paper used 624 machine-days. Each simulation of an ML policy takes at least 30 minutes, and this is multiplied by 7 traces and 10 samples each.
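To make the cost concrete, the figures above can be turned into a lower bound for one ML policy configuration. This is a back-of-the-envelope sketch that assumes one simulation per (trace, sample) pair and counts total machine time regardless of how runs are parallelized; the variable names are illustrative.

```python
MINUTES_PER_SIMULATION = 30  # lower bound stated above
NUM_TRACES = 7
SAMPLES_PER_TRACE = 10

# One simulation per (trace, sample) pair.
simulations = NUM_TRACES * SAMPLES_PER_TRACE
machine_hours = simulations * MINUTES_PER_SIMULATION / 60

print(f"{simulations} simulations, at least {machine_hours:.0f} machine-hours "
      "per ML policy configuration")
```

That is 70 simulations and at least 35 machine-hours for a single configuration, which is why rerunning every experiment is impractical for most reviewers.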

Future research

notebooks/reproduce/exps-cluster-sample.ipynb helps you run experiments more efficiently, at the cost of additional dependencies (brooce, redis).

Troubleshooting

If you face any issues, please try the following:

Make sure you have the latest version of the repository.

git pull --recurse-submodules

Make sure you have the latest copy of the data.

cd data
bash clean.sh
bash get-tectonic.sh

If you need to get an allocation on Chameleon or face any difficulties with the platform itself, please contact their helpdesk.

Any questions?

Please raise a GitHub issue. Support is best effort; you may also email me (contact details at https://wonglkd.fi-de.net).

Reference

Baleen: ML Admission & Prefetching for Flash Caches
Daniel Lin-Kit Wong, Hao Wu, Carson Molder, Sathya Gunasekar, Jimmy Lu, Snehal Khandkar, Abhinav Sharma, Daniel S. Berger, Nathan Beckmann, Gregory R. Ganger
USENIX FAST 2024
