Prototypical implementation of "Karasu" for collective and thus efficient cloud configuration profiling. The whole approach is implemented in Python. Please consider reaching out if you have questions or encounter problems. Artifact DOI: 10.5281/zenodo.6624921
Striving for understandable code that can be reused and further developed, we use pydantic and Python typing whenever feasible.
- PyTorch 1.13.1, a machine learning framework based on the Torch library
- BoTorch 0.6.0, a framework for Bayesian Optimization in PyTorch
- pandas 1.3.5, an open source data analysis and manipulation tool
- Ax 0.2.3, a platform for managing and optimizing experiments
- Hummingbird 0.4.3, a library for compiling traditional ML models into tensor computations
- scikit-learn 1.0.2, a Python module for machine learning built on top of SciPy
- NumPy 1.21.6, the primary array programming library for the Python language
- SciPy 1.7.3, an open-source software for mathematics, science, and engineering
These and all other required packages are specified in the `requirements.txt` and can thus be conveniently installed via `pip3 install --user -r requirements.txt` if a local installation is desired. However, we recommend the containerized approach, as described next.
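If you do opt for a local installation, a minimal setup could look like the following sketch (the virtual environment is optional and merely illustrative):

```sh
# Optionally create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install all required packages
pip3 install -r requirements.txt
```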
For the experiments described in the paper, we had access to a machine equipped with a GPU, which helped us to conduct all the various experiments in a shorter period of time. It had the following characteristics:
| Resource | Details |
|---|---|
| CPU | Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz |
| vCores | 8 |
| Memory | 45 GB RAM |
| GPU | 1 x NVIDIA Quadro RTX 5000 (16 GB memory) |
To facilitate easy deployment and execution of this prototype, we furthermore provide a `Dockerfile` for building a container image. The image can be built manually via `docker build -t karasu-container:dev .`. Note that for your convenience, this step is already handled internally when using our bash functions.
By default, a container started with this image will simply execute a `ping` command. Further below, we describe how it can be used for actual experiments, i.e. by overriding the default command with specific experiment or evaluation tasks.
We present Karasu, a collective and privacy-aware approach for efficient cloud configuration profiling. It trains lightweight performance models using only high-level information of shared workload profilings and combines them into an ensemble method for better exploiting inherent knowledge of the cloud configuration search space. This way, users are able to collaboratively improve their individual prediction capabilities, while obscuring sensitive information. Furthermore, Karasu enables the optimization of multiple objectives or constraints at the same time, like runtime, cost, and carbon footprint.
In the following, we provide instructions for reproducing our results.
We evaluate our approach on a publicly available dataset consisting of performance data from diverse workloads and their various executions in a cloud environment. Specifically, we use this dataset created in the context of previously proposed cloud configuration approaches. Among other things, it encompasses data obtained from 18 workloads run on 69 configurations (scale-out, VM type) in a multi-node setting (one run per configuration). Workloads were implemented in Hadoop and different Spark versions, realized with various algorithms, and tasked with processing diverse datasets.
For our evaluation, it is required to clone this repository and copy the folder `scout/dataset/osr_multiple_nodes` to `data/scout_multiple` in our repository.
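For illustration, this copy step could look roughly as follows; the repository URL is a placeholder for the dataset repository referred to above:

```sh
# Clone the dataset repository (placeholder URL) into a folder named "scout"
git clone <dataset-repository-url> scout

# Copy the multi-node dataset into this repository's data folder
mkdir -p data
cp -r scout/dataset/osr_multiple_nodes data/scout_multiple
```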
The initial processing of this dataset will take a few minutes, depending on the concrete machine used.
It is furthermore required to mount the folder `data` as well as the to-be-created folder `artifacts` to any container you start.
This is all handled internally by the minimalistic bash functions we provide,
so you can directly proceed with the next steps!
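For reference, a manual container invocation would look roughly like the following sketch; the container-internal mount paths and the experiment command are placeholders, see `docker_scripts.sh` for the actual invocation:

```sh
# Build the image (also handled automatically by the bash functions)
docker build -t karasu-container:dev .

# Mount the data and artifacts folders and override the default command;
# mount targets and the experiment command are illustrative placeholders
docker run --rm \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/artifacts:/app/artifacts" \
  karasu-container:dev \
  <experiment-command>
```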
To start with, we emulate a shared performance data repository, which requires generating appropriate data using our baselines. The generated data is used in the subsequent steps both for visualizing the capabilities of individual baselines and for offering Karasu a data source to draw from for its ensemble approach. For single-objective optimization (SOO):
`./docker_scripts.sh create_soo_data`
Likewise, for multi-objective optimization (MOO):
`./docker_scripts.sh create_moo_data`
In our experiments, each of the executed scripts ran for approx. 10 hours.
The generated data is saved to the `artifacts` directory and used within the next steps, where we investigate our research questions (RQs).
RQ1: What is the general potential of exploiting existing models to boost a target one? We evaluate a scenario where support models are available that originate from the same workload, yet were initialized differently and trained with other runtime targets. To run the experiments and generate the data for analysis, run:
`./docker_scripts.sh run_rq1_experiment`
In our experiments, the script ran for approx. 2 days.
RQ2: How well does the introduced approach work in a collaborative scenario, with potentially diverse workloads and limited available data? We evaluate a scenario where all the data in the repository originates from different workloads, with individual characteristics, resource needs, and constraints. To run the experiments and generate the data for analysis, run:
`./docker_scripts.sh run_rq2_experiment`
In our experiments, the script ran for approx. 2 days.
To evaluate the scenario of heterogeneous data that we discussed and reported in the paper, run:
`./docker_scripts.sh run_rq2_hetero_experiment`
RQ3: To evaluate Karasu in an MOO setting, we consider two objectives, namely cost and energy consumption, to be minimized under formulated runtime constraints. To run the experiments and generate the data for analysis, run:
`./docker_scripts.sh run_rq3_experiment`
In our experiments, the script ran for more than 1 day.
With the generated data in place, one can analyze the results and produce insightful plots (as in our paper). The plots can be created simply by running:
`./docker_scripts.sh analysis`
Note that running the analysis requires the completion of all aforementioned steps.
In our experiments, we had access to a rather modern machine equipped with a GPU, and the experiment execution still required considerable time (see the sections above). When executing in a Docker container, possibly on less capable hardware, the indicated execution times may be longer.
Note that it is possible to abort and resume the execution of specific experiments (Emulation, RQ1, RQ2, RQ3): on every container restart, we inspect the already written experiment data and skip the associated configurations to prevent data duplication.
No time for data generation on your own? Consider extracting the `artifacts.tar.gz` to directly reuse our generated experiment data.
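For example (a small sketch; we assume the archive unpacks into the `artifacts` folder expected by the analysis step):

```sh
# Extract the provided experiment data into the repository root
tar -xzf artifacts.tar.gz
```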
Please get in touch; we are happy to help!
@inproceedings{scheinert2023karasu,
title={Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics},
author={Dominik Scheinert and Philipp Wiesner and Thorsten Wittkopp and Lauritz Thamsen and Jonathan Will and Odej Kao},
booktitle={{IEEE} International Performance, Computing, and Communications Conference, {IPCCC} 2023, Anaheim, CA, USA, November 17-19, 2023},
year={2023}
}
You can also find a preprint here.