Magpie

A static parameter tuning system for storage performance optimization using Deep Reinforcement Learning (DRL).

Environment

Python 3.7
InfluxDB 2.1.1
Telegraf 1.19.2
FileBench 1.5-alpha3

Setup

Telegraf

Telegraf monitors metrics on both the server side and the client side of the DFS. Install Telegraf on each node and configure it according to the instructions in /telegraf.
InfluxDB

Install InfluxDB and use it to collect the metrics reported by Telegraf.
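
To check that metrics are arriving, you can query InfluxDB 2.x with Flux. The helper below is a minimal sketch, not part of Magpie: the function name `build_flux_query` and the bucket name `telegraf` are hypothetical. The `cpu` measurement and `usage_idle` field come from Telegraf's standard CPU input plugin, which may or may not be enabled in your configuration.

```python
def build_flux_query(bucket, measurement, field, window="1m"):
    """Build a Flux query that averages one field over a recent time window."""
    return (
        f'from(bucket: "{bucket}")\n'
        f'  |> range(start: -{window})\n'
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")\n'
        f'  |> filter(fn: (r) => r._field == "{field}")\n'
        f'  |> mean()'
    )


# Hypothetical bucket name; replace it with the bucket Telegraf writes to.
query = build_flux_query("telegraf", "cpu", "usage_idle", window="5m")
print(query)
```

The resulting string can be pasted into the InfluxDB UI or passed to any InfluxDB 2.x query client. An empty result usually means Telegraf is not yet writing to that bucket.
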
Actor agent

Lustre has no central configuration management, and we encountered latency issues when using ssh to apply new configurations. Therefore, a simple web service runs on each Lustre server to apply new configurations locally. You only need to start this service on each Lustre server.
```
cd actor_agent
pip install -r requirements.txt
python actor_agent/server.py
```
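
For orientation, here is a minimal sketch of what such a per-server web service could look like. It is not the code in actor_agent/server.py. It assumes that parameters are applied with `lctl set_param` and that the client POSTs a JSON object mapping parameter names to values. The JSON shape, the port, and the names `build_lctl_command`, `ApplyConfigHandler`, and `serve` are all hypothetical.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_lctl_command(name, value):
    """Return the argv list that sets one Lustre parameter with lctl."""
    return ["lctl", "set_param", f"{name}={value}"]


class ApplyConfigHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: POST a JSON object of {parameter: value}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            params = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            params = None
        if not isinstance(params, dict):
            self.send_error(400, "body must be a JSON object")
            return
        failed = []
        for name, value in params.items():
            # lctl runs on this server, so no ssh round trip is needed.
            try:
                code = subprocess.run(build_lctl_command(name, value)).returncode
            except OSError:
                code = 1  # lctl is missing or could not be executed
            if code != 0:
                failed.append(name)
        body = json.dumps({"failed": failed}).encode()
        self.send_response(500 if failed else 200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the service; the port number is an assumption."""
    HTTPServer(("0.0.0.0", port), ApplyConfigHandler).serve_forever()
```

Calling `serve()` on a Lustre server starts the listener. Because the sketch has no authentication, it should be reachable only from the tuning host.
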
FileBench

Install FileBench and distribute the workload files to the servers that use your DFS.
```
cd fb_workload && sh sync.sh
```
Magpie

Update magpie/config/pro.env according to your environment, then create a conda environment and install the requirements.
```
conda create -n magpie python=3.7
conda activate magpie
pip install -r requirements.txt
```

Run Magpie

export MAGPIE_ROOT=PATH_TO_MAGPIE_REPO_FOLDER
export PYDANTIC_ENV_FILE=${MAGPIE_ROOT}/magpie/config/pro.env
export WORKLOAD_NAME=videoserver.f
echo "running $WORKLOAD_NAME"
python magpie/tuner/train.py --num-iterations 30 --dfs lustre --enable-observation-normalizer --experiment-name video_server
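
If you run several workloads in sequence, the commands above can be wrapped in a small launcher. The following is a sketch only: the helper names `magpie_env` and `train_command` are hypothetical, and the flags are copied from the command above rather than from the argument parser in train.py.

```python
import os


def magpie_env(magpie_root, workload_name):
    """Return a copy of the environment with the three variables set above."""
    env = dict(os.environ)
    env["MAGPIE_ROOT"] = magpie_root
    env["PYDANTIC_ENV_FILE"] = f"{magpie_root}/magpie/config/pro.env"
    env["WORKLOAD_NAME"] = workload_name
    return env


def train_command(experiment_name, num_iterations=30, dfs="lustre"):
    """Return the argv list for the training command shown above."""
    return [
        "python", "magpie/tuner/train.py",
        "--num-iterations", str(num_iterations),
        "--dfs", dfs,
        "--enable-observation-normalizer",
        "--experiment-name", experiment_name,
    ]
```

A run then becomes `subprocess.run(train_command("video_server"), env=magpie_env(root, "videoserver.f"), cwd=root, check=True)`, where `root` is the path to your Magpie checkout.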

Glossary

| Name | Description |
| --- | --- |
| DFS | Distributed File System |
