git clone --recurse-submodules https://github.com/casys-kaist/LLMServingSim.git
cd LLMServingSim
Conda can be downloaded from the following link.
curl -O https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
bash Anaconda3-2024.06-1-Linux-x86_64.sh
conda env create -p ./env -f ./environment.yml
conda activate ./env
conda create -n env_name python=3.9
conda activate env_name
conda install conda-forge::libprotobuf=3.6.1
conda install conda-forge::cmake=3.15
conda install cctbx202208::boost-cpp=1.74.0
pip install -r requirements.txt
A common issue while building ASTRA-Sim: if an error about the protoc version occurs, see the protoc error solution later in this document.
cd astra-sim
./build/astra_analytical/build.sh
cd extern/graph_frontend/chakra
pip install .
cd ../../../../execution_engine/polymath
pip install .
cd ../..
Config & Dataset Path:
- Network config path:
astra-sim/inputs/network/analytical/{config_name}.json
- NPU config path:
execution_engine/codelets_src/codelets/examples/genesys/configs/{config_name}.json
- Dataset path:
astra-sim/dataset/{dataset_name}.tsv
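The path templates above can be turned into a small pre-flight check. The following is a minimal sketch, not part of LLMServingSim: the helper names (`network_config`, `npu_config`, `dataset_file`, `missing_inputs`) are hypothetical, and only the directory layout is taken from the list above.

```python
from pathlib import Path

# Directory layout copied from the list above; all helper names are hypothetical.
NETWORK_DIR = Path("astra-sim/inputs/network/analytical")
NPU_DIR = Path("execution_engine/codelets_src/codelets/examples/genesys/configs")
DATASET_DIR = Path("astra-sim/dataset")


def network_config(config_name: str) -> Path:
    """Path of a network config, relative to the repository root."""
    return NETWORK_DIR / f"{config_name}.json"


def npu_config(config_name: str) -> Path:
    """Path of an NPU config, relative to the repository root."""
    return NPU_DIR / f"{config_name}.json"


def dataset_file(dataset_name: str) -> Path:
    """Path of a dataset, relative to the repository root."""
    return DATASET_DIR / f"{dataset_name}.tsv"


def missing_inputs(repo_root: Path, *relative_paths: Path) -> list[Path]:
    """Return the given relative paths that are not files under repo_root."""
    return [p for p in relative_paths if not (repo_root / p).is_file()]
```

Calling `missing_inputs(Path("."), dataset_file("share-gpt-req100-rate10"))` from the repository root would return an empty list if that dataset is present.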
Test Run
python3 -u main.py --model_name 'gpt3-6.7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gpt-req100-rate10.tsv'
python3 -u main.py --model_name 'llama-7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gpt-req100-rate10.tsv'
Parameters | Supporting Options | Default Value | Notes |
---|---|---|---|
model_name | 'gpt2', 'gpt3-125m', 'gpt3-350m', 'gpt3-760m', 'gpt3-1.3bm', 'gpt3-2.7b', 'gpt3-6.7b', 'gpt3-13b', 'gpt3-30b', 'gpt3-175b', 'opt-125m', 'opt-350m', 'opt-1.3b', 'opt-2.7b', 'opt-6.7b', 'opt-13b', 'opt-30b', 'opt-66b', 'opt-175b', 'llama-7b', 'llama-30b', 'llama-70b' | 'gpt2' | |
npu_num | Integer | 16 | |
max_batch | Integer | 0 | 0: no limit |
batch_delay | Integer | 0 | |
scheduling | 'none', 'orca' | 'orca' | |
parallel | 'pipeline', 'tensor', 'hybrid' | 'hybrid' | |
npu_group | Integer | 1 | |
npu_mem | Integer | 40 | |
kv_manage | 'max', 'pow2', 'oracle', 'vllm' | 'vllm' | |
block_size | Integer | 8 | |
pim_type | 'none', 'local', 'pool' | 'none' | |
sub_batch | Flag | False | Sub-batch Scheduling On/Off |
dataset | Dataset Path | None | None: manually add requests in main.py |
network | JSON File Name | None | None: follows the naming convention "fully_connected_{network_dim}d_{number_of_NPUs}d.json" |
output | Output TSV Path | None | None: no TSV output, stdout only |
gen | Flag | False | Skip initiation phase On/Off |
fast_run | Flag | False | Skip all compilation and force the use of cached traces for fast simulation |
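As an illustration of how the table could be used, here is a minimal sketch that validates a few options and assembles a command line. It assumes every parameter is passed as `--<name>`, which the test run commands confirm only for `model_name`, `npu_num`, `npu_group`, `npu_mem`, and `dataset`. The `build_command` helper is hypothetical.

```python
import shlex

# Allowed values copied from the table above (only options with fixed choices).
CHOICES = {
    "scheduling": {"none", "orca"},
    "parallel": {"pipeline", "tensor", "hybrid"},
    "kv_manage": {"max", "pow2", "oracle", "vllm"},
    "pim_type": {"none", "local", "pool"},
}
# Options the table lists as on/off flags.
FLAGS = {"sub_batch", "gen", "fast_run"}


def build_command(**options) -> list[str]:
    """Build an argv list for main.py, ASSUMING each option is spelled --<name>."""
    argv = ["python3", "-u", "main.py"]
    for name, value in options.items():
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError(f"{name}={value!r} is not one of {sorted(CHOICES[name])}")
        if name in FLAGS:
            if value:  # flags carry no value; leave them out when False
                argv.append(f"--{name}")
        else:
            argv += [f"--{name}", str(value)]
    return argv


# Same settings as the first test run command.
cmd = build_command(model_name="gpt3-6.7b", npu_num=1, npu_group=1, npu_mem=24,
                    dataset="dataset/share-gpt-req100-rate10.tsv")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the repository root. Rejecting an unsupported choice before launch avoids starting a simulation that will fail during argument parsing.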
In all outputs, the unit of throughput is tokens/second, and the unit of simulation time is milliseconds.
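Because the two units differ by a factor of 1000, combining them takes one conversion step. The sketch below shows that arithmetic. Both function names are hypothetical and are not part of the simulator.

```python
def tokens_processed(throughput_tok_per_s: float, sim_time_ms: float) -> float:
    """Tokens handled at a given throughput over a simulated duration."""
    return throughput_tok_per_s * (sim_time_ms / 1000.0)


def throughput_tok_per_s(tokens: float, sim_time_ms: float) -> float:
    """Throughput in tokens/second for a token count over a duration in ms."""
    return tokens / (sim_time_ms / 1000.0)
```

For example, 200 tokens/second sustained over 500 ms of simulation time corresponds to 100 tokens.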
The standard output shows which requests are being processed in each iteration of the simulator and displays the measured throughput at regular intervals. Additionally, it provides a summary of throughput and simulation time at the end.
{output_filename}-throughput.tsv contains the prompt and generation throughput at each interval.
{output_filename}-simulation-time.tsv contains the simulation time of each component.
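A minimal sketch for loading either output file for post-processing follows. It assumes the files start with a header row. The column names are not documented here, so inspect `rows[0].keys()` before picking one. `read_tsv` and `mean_of_column` are hypothetical helpers.

```python
import csv
from pathlib import Path


def read_tsv(path: Path) -> list[dict[str, str]]:
    """Load a tab-separated file into row dictionaries keyed by the header row."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def mean_of_column(rows: list[dict[str, str]], column: str) -> float:
    """Average a numeric column; the column name must come from the file's header."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)
```

With an output path of `out`, `read_tsv(Path("out-throughput.tsv"))` would return one dictionary per reported interval.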
cd evaluation
./evaluation1.sh
./evaluation2.sh
...
./evaluation5.sh
./evaluation_all.sh
For detailed information about the evaluation, please refer to the README file in the evaluation folder.
If your error is similar to this, you can use the below solution.
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
17 | #error This file was generated by an older version of protoc which is
| ^~~~~
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
18 | #error incompatible with your Protocol Buffer headers. Please
| ^~~~~
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.
19 | #error regenerate this file with a newer version of protoc.
| ^~~~~
This method explicitly sets the conda environment for CMake to use.
- Activate the Conda Environment: First, activate the desired conda environment.
conda activate your_env_name
- Set the CMAKE_PREFIX_PATH Environment Variable: Add the path of the activated conda environment to the CMAKE_PREFIX_PATH environment variable.
export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
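Before rebuilding, it can help to confirm that the variable actually contains the environment's path. The following is a small sketch under the assumption of a Linux shell, where entries are separated by `:`. The `cmake_sees_conda` helper is hypothetical.

```python
import os


def cmake_sees_conda(environ=os.environ) -> bool:
    """True if CONDA_PREFIX is one of the ':'-separated entries of CMAKE_PREFIX_PATH."""
    conda_prefix = environ.get("CONDA_PREFIX", "")
    entries = environ.get("CMAKE_PREFIX_PATH", "").split(":")
    return bool(conda_prefix) and conda_prefix in entries


if __name__ == "__main__":
    print("CMAKE_PREFIX_PATH includes the conda environment:", cmake_sees_conda())
```

Run it in the same shell session that will invoke the ASTRA-Sim build script, since an exported variable only applies to that session.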
To apply this setting automatically every time the environment is activated, use activation scripts instead:
- Activate the Conda Environment: First, activate the conda environment you want to modify.
conda activate your_env_name
- Navigate to the Environment's Activation Script Directory: The activation scripts are located in the etc/conda/activate.d directory within your conda environment. If this directory does not exist, create it along with the deactivation directory.
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
- Create and Edit the Activation Script: Create a script named set_cmake_prefix.sh to set the CMAKE_PREFIX_PATH when the environment is activated.
nano $CONDA_PREFIX/etc/conda/activate.d/set_cmake_prefix.sh
Add the following content to this file:
#!/bin/bash
export OLD_CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
- Create and Edit the Deactivation Script: Create a script named unset_cmake_prefix.sh to reset the CMAKE_PREFIX_PATH when the environment is deactivated.
nano $CONDA_PREFIX/etc/conda/deactivate.d/unset_cmake_prefix.sh
Add the following content to this file:
#!/bin/bash
export CMAKE_PREFIX_PATH=$OLD_CMAKE_PREFIX_PATH
unset OLD_CMAKE_PREFIX_PATH
- Set Script Permissions: Ensure the scripts are executable.
chmod +x $CONDA_PREFIX/etc/conda/activate.d/set_cmake_prefix.sh
chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/unset_cmake_prefix.sh
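The script-creation steps can also be automated. This is a minimal sketch that writes both scripts with the contents given above and marks them executable. The `install_hooks` function is hypothetical and is not shipped with LLMServingSim.

```python
import stat
from pathlib import Path

# Script bodies copied from the steps above.
ACTIVATE = """#!/bin/bash
export OLD_CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
"""
DEACTIVATE = """#!/bin/bash
export CMAKE_PREFIX_PATH=$OLD_CMAKE_PREFIX_PATH
unset OLD_CMAKE_PREFIX_PATH
"""


def install_hooks(conda_prefix: Path) -> list[Path]:
    """Write the activation and deactivation scripts under conda_prefix."""
    written = []
    for subdir, name, body in (
        ("activate.d", "set_cmake_prefix.sh", ACTIVATE),
        ("deactivate.d", "unset_cmake_prefix.sh", DEACTIVATE),
    ):
        script = conda_prefix / "etc" / "conda" / subdir / name
        script.parent.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
        script.write_text(body)
        # Add execute bits for user, group, and others (roughly chmod +x).
        script.chmod(script.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        written.append(script)
    return written
```

With the target environment active, `install_hooks(Path(os.environ["CONDA_PREFIX"]))` (after `import os`) would install both scripts. Deactivate and reactivate the environment for them to take effect.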