docs: update docs
dgcnz committed Oct 31, 2024
1 parent 88a904f commit 218f8d8
Showing 3 changed files with 90 additions and 2 deletions.
3 changes: 1 addition & 2 deletions cpp/src/benchmark.cpp
@@ -46,8 +46,7 @@ void benchmark(std::string model_name, int n_warmup = 5, int n_iter = 5)
     float mean = std::accumulate(durations.begin(), durations.end(), 0.0) / durations.size();
     float sq_sum = std::inner_product(durations.begin(), durations.end(), durations.begin(), 0.0);
     float stdev = std::sqrt(sq_sum / durations.size() - mean * mean);
-    std::cout << "mean: " << mean << " ms" << std::endl;
-    std::cout << "std: " << stdev << " ms" << std::endl;
+    std::cout << "Average inference time: " << mean << " ± " << stdev << " ms" << std::endl;
 }

int main(int argc, char *argv[])
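The statistics in this hunk rely on the population-variance identity Var(x) = E[x²] − E[x]². Extracted into a standalone helper (a sketch for illustration, not code from the repository), the computation looks like this:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Population mean and standard deviation of a set of timings (ms),
// using the identity Var(x) = E[x^2] - E[x]^2, as in benchmark.cpp.
std::pair<float, float> mean_stdev(const std::vector<float> &durations)
{
    float mean = std::accumulate(durations.begin(), durations.end(), 0.0) / durations.size();
    float sq_sum = std::inner_product(durations.begin(), durations.end(), durations.begin(), 0.0);
    float stdev = std::sqrt(sq_sum / durations.size() - mean * mean);
    return {mean, stdev};
}
```

One caveat of this one-pass formula: E[x²] − E[x]² can lose precision through cancellation when the mean is large relative to the spread; for latencies in the tens of milliseconds over ~100 iterations it is accurate enough.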
1 change: 1 addition & 0 deletions docs/src/_toc.yml
@@ -17,6 +17,7 @@ parts:
 - caption: Optimization
   chapters:
   - file: part3/compilation
+  - file: part3/results
 - caption: Other
   chapters:
   - file: bibliography
88 changes: 88 additions & 0 deletions docs/src/part3/results.md
@@ -0,0 +1,88 @@
# Results

## Running the benchmarks

Before running the benchmarks, make sure you have compiled your desired model.

```bash
python -m scripts.export_tensorrt --config-name dinov2 amp_dtype=fp32 trt.enabled_precisions="[fp32, bf16, fp16]"
# ...
# OUTPUT DIR: outputs/2024-10-31/10-43-31
```

The outputs of this script will be found in the directory specified by `OUTPUT DIR`. The directory will contain the following files:

```
├── export_tensorrt.log   # log file
├── .hydra
│   ├── config.yaml       # config file
│   ├── hydra.yaml
│   └── overrides.yaml
├── model.ts              # compiled torchscript model
└── predictions.png       # sample predictions for the model
```

There are three possible runtimes to benchmark; examples of how to run each benchmark are shown below:

**Python Runtime, no TensorRT**
```bash
python -m scripts.benchmark_gpu compile_run_path=outputs/2024-10-31/10-43-31 n_iter=100 load_ts=False amp_dtype=fp16
```

**Python Runtime with TensorRT**
```bash
python -m scripts.benchmark_gpu compile_run_path=outputs/2024-10-31/10-43-31 n_iter=100 load_ts=True
```

**C++ Runtime with TensorRT**
```bash
./build/benchmark --model outputs/2024-10-31/10-43-31/model.ts --n_iter=100
```

## Results

**Python Runtime, no TensorRT**

| model's precision | amp_dtype              | latency (ms)   |
| ----------------- | ---------------------- | -------------- |
| fp32 | fp32+fp16 | 66.322 ± 0.927 |
| fp32 | fp32+bf16 | 66.497 ± 1.052 |
| fp32 | fp32 | 76.275 ± 0.587 |

**Python Runtime, with TensorRT**

| model's precision | trt.enabled_precisions | latency (ms)   |
| ----------------- | ---------------------- | -------------- |
| fp32+fp16 | fp32+bf16+fp16 | 15.369 ± 0.023 |
| fp32 | fp32+bf16+fp16 | 23.164 ± 0.031 |
| fp32 | fp32+bf16 | 25.148 ± 0.030 |
| fp32 | fp32 | 38.381 ± 0.022 |

**C++ Runtime, with TensorRT**

| model's precision | trt.enabled_precisions | latency (ms)   |
| ----------------- | ---------------------- | -------------- |
| fp32+fp16 | fp32+bf16+fp16 | 15.433 ± 0.029 |
| fp32 | fp32+bf16+fp16 | 23.263 ± 0.027 |
| fp32 | fp32+bf16 | 25.255 ± 0.014 |
| fp32 | fp32 | 38.465 ± 0.029 |




Note: For some reason in the latest version of torch_tensorrt, `bfloat16` precision is not working well and it's not achieving the previously measured performance of (13-14ms) and/or failing compilation.

We include the previous results for completeness:

| Runtime | model's precision | Enabled Precisions | Latency (ms) | Memory (MB) |
| ------- | ----------------- | ------------------ | ------------ | ----------- |
| cpp+trt | fp32 | fp32+fp16 | 13.984 | 500 |
| cpp+trt | fp32 | fp32+bf16+fp16 | 13.898 | 500 |
| cpp+trt | fp32 | fp32+bf16 | 17.261 | 500 |
| cpp+trt | bf16 | fp32+bf16 | 22.913 | 500 |
| cpp+trt | bf16 | bf16 | 22.938 | 500 |
| cpp+trt | fp32 | fp32 | 37.639 | 770 |



