Upgrade Triton 24.04 #1738

Open · wants to merge 5 commits into base: branch-24.06

Changes from all commits
2 changes: 1 addition & 1 deletion .devcontainer/docker-compose.yml
@@ -18,7 +18,7 @@ services:
triton:
container_name: morpheus-triton
runtime: nvidia
- image: nvcr.io/nvidia/tritonserver:23.06-py3
+ image: nvcr.io/nvidia/tritonserver:24.04-py3
command: tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false ${TRITON_MODEL_ARGS}
ports:
- 8000:8000
2 changes: 1 addition & 1 deletion ci/conda/recipes/morpheus/meta.yaml
@@ -108,7 +108,7 @@ outputs:
- scikit-learn =1.3.2.*
- sqlalchemy <2.0 # 2.0 is incompatible with pandas=1.3
- tqdm =4.*
- - tritonclient =2.34.*
+ - tritonclient =2.45.*
- typing_utils =0.1.*
- watchdog =3.0.*
- websockets
2 changes: 1 addition & 1 deletion dependencies.yaml
@@ -189,7 +189,7 @@ dependencies:
- rapidjson=1.1.0
- scikit-build=0.17.6
- sysroot_linux-64=2.17
- - tritonclient=2.34
+ - tritonclient=2.45
- ucx=1.15
- zlib=1.2.13
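
The conda recipe and `dependencies.yaml` both move `tritonclient` from 2.34 to 2.45 so the Python client matches the 24.04 server. As a quick compatibility check, here is a minimal sketch (not part of this PR, and assuming a Triton 24.04 container is already reachable on `localhost:8000` with `tritonclient[http]` 2.45 installed):

```python
# Hypothetical sanity check, not part of this PR.
# Assumes `tritonclient[http]` 2.45 is installed and a Triton 24.04
# container is already running with the default HTTP port on localhost:8000.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Liveness/readiness round-trips exercise the upgraded client against the
# upgraded server; protocol problems typically surface here.
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())

# The 24.04 release of Triton reports itself as server version 2.45.x,
# matching the client pin above.
print("server version:", client.get_server_metadata()["version"])
```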

2 changes: 1 addition & 1 deletion docs/source/basics/building_a_pipeline.md
@@ -213,7 +213,7 @@ This example shows an NLP Pipeline which uses several stages available in Morphe
#### Launching Triton
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

#### Launching Kafka
6 changes: 3 additions & 3 deletions docs/source/developer_guide/guides/2_real_world_phishing.md
@@ -236,7 +236,7 @@ From the root of the Morpheus project we will launch a Triton Docker container w
```shell
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
- nvcr.io/nvidia/tritonserver:23.06-py3 \
+ nvcr.io/nvidia/tritonserver:24.04-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
@@ -381,7 +381,7 @@ From this information, we note that the expected dimensions of the model inputs
### Defining our Pipeline
For this pipeline we will have several configuration parameters such as the paths to the input and output files, we will be using the (click)[https://click.palletsprojects.com/] library to expose and parse these parameters as command line arguments. We will also expose the choice of using the class or function based stage implementation via the `--use_stage_function` command-line flag.

> **Note**: For simplicity, we assume that the `MORPHEUS_ROOT` environment variable is set to the root of the Morpheus project repository.

To start, we will need to instantiate and set a few attributes of the `Config` class. This object is used for configuration options that are global to the pipeline as a whole. We will provide this object to each stage along with stage-specific configuration parameters.

@@ -402,7 +402,7 @@ The `feature_length` property needs to match the dimensions of the model inputs,

Ground truth classification labels are read from the `morpheus/data/labels_phishing.txt` file included in Morpheus.

Now that our config object is populated, we move on to the pipeline itself. We will be using the same input file from the previous example.

Next, we will add our custom recipient features stage to the pipeline. We imported both implementations of the stage, allowing us to add the appropriate one based on the `use_stage_function` value provided by the command-line.

6 changes: 3 additions & 3 deletions docs/source/getting_started.md
@@ -31,7 +31,7 @@ More advanced users, or those who are interested in using the latest pre-release
- [CUDA 12.1](https://developer.nvidia.com/cuda-12-1-0-download-archive)
- [Docker](https://docs.docker.com/get-docker/)
- [The NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)
- - [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `23.06` or higher
+ - [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `24.04` or higher

> **Note about Docker:**
>
@@ -146,7 +146,7 @@ Many of the validation tests and example workflows require a Triton server to fu
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
- nvcr.io/nvidia/tritonserver:23.06-py3 \
+ nvcr.io/nvidia/tritonserver:24.04-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
@@ -160,7 +160,7 @@ Note: The above command is useful for testing out Morpheus, however it does load
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
- nvcr.io/nvidia/tritonserver:23.06-py3 \
+ nvcr.io/nvidia/tritonserver:24.04-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
4 changes: 2 additions & 2 deletions examples/abp_nvsmi_detection/README.md
@@ -77,12 +77,12 @@ This example utilizes the Triton Inference Server to perform inference.

Pull the Docker image for Triton:
```bash
- docker pull nvcr.io/nvidia/tritonserver:23.06-py3
+ docker pull nvcr.io/nvidia/tritonserver:24.04-py3
```

From the Morpheus repo root directory, run the following to launch Triton and load the `abp-nvsmi-xgb` XGBoost model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
```

This will launch Triton and only load the `abp-nvsmi-xgb` model. This model has been configured with a max batch size of 32768, and to use dynamic batching for increased performance.
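
As an illustrative aside (not part of the README being diffed here), a short sketch of how the loaded model and the batching settings just described could be confirmed from Python, assuming the container above is running and `tritonclient[http]` is installed:

```python
# Hypothetical check, not part of this PR. Assumes the Triton container
# launched above is running and `tritonclient[http]` is installed.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# With --model-control-mode=explicit only the requested model should load.
print("abp-nvsmi-xgb ready:", client.is_model_ready("abp-nvsmi-xgb"))

# The HTTP client returns the model configuration as a plain dict, so the
# batching settings described above can be inspected directly.
config = client.get_model_config("abp-nvsmi-xgb")
print("max_batch_size:  ", config.get("max_batch_size"))
print("dynamic_batching:", config.get("dynamic_batching"))
```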
4 changes: 2 additions & 2 deletions examples/abp_pcap_detection/README.md
@@ -23,13 +23,13 @@ To run this example, an instance of Triton Inference Server and a sample dataset

### Triton Inference Server
```bash
- docker pull nvcr.io/nvidia/tritonserver:23.06-py3
+ docker pull nvcr.io/nvidia/tritonserver:24.04-py3
```

##### Deploy Triton Inference Server
From the root of the Morpheus repo, run the following to launch Triton and load the `abp-pcap-xgb` model:
```bash
- docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/examples/abp_pcap_detection/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
+ docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/examples/abp_pcap_detection/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models --exit-on-error=false
```

##### Verify Model Deployment
4 changes: 2 additions & 2 deletions examples/llm/rag/README.md
@@ -182,12 +182,12 @@ Before running the pipeline, we need to ensure that the following services are r

- Pull the Docker image for Triton:
```bash
- docker pull nvcr.io/nvidia/tritonserver:23.06-py3
+ docker pull nvcr.io/nvidia/tritonserver:24.04-py3
```

- From the Morpheus repo root directory, run the following to launch Triton and load the `all-MiniLM-L6-v2` model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2
```

This will launch Triton and only load the `all-MiniLM-L6-v2` model. Once Triton has loaded the model, the following
6 changes: 3 additions & 3 deletions examples/llm/vdb_upload/README.md
@@ -130,12 +130,12 @@ To retrieve models from LFS run the following:

- Pull the Docker image for Triton:
```bash
- docker pull nvcr.io/nvidia/tritonserver:23.06-py3
+ docker pull nvcr.io/nvidia/tritonserver:24.04-py3
```

- From the Morpheus repo root directory, run the following to launch Triton and load the `all-MiniLM-L6-v2` model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2
```

This will launch Triton and only load the `all-MiniLM-L6-v2` model. Once Triton has loaded the model, the following
@@ -419,7 +419,7 @@ using `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` as an exampl
- Reload the docker container, specifying that we also need to load paraphrase-multilingual-mpnet-base-v2
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
- -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver \
+ -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver \
--model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model \
all-MiniLM-L6-v2 --load-model sentence-transformers/paraphrase-multilingual-mpnet-base-v2
```
4 changes: 2 additions & 2 deletions examples/log_parsing/README.md
@@ -26,14 +26,14 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```bash
- docker pull nvcr.io/nvidia/tritonserver:23.06-py3
+ docker pull nvcr.io/nvidia/tritonserver:24.04-py3
```

##### Start Triton Inference Server Container
From the Morpheus repo root directory, run the following to launch Triton and load the `log-parsing-onnx` model:

```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model log-parsing-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model log-parsing-onnx
```

> **Contributor review comment:** Updating Triton version here breaks log parsing as described in #1727. The output config would have to be updated as follows:
>
> ```
> output [
>     {
>         name: "output"
>         data_type: TYPE_FP32
>         dims: [ 256, 23 ]
>     }
> ]
> ```
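
For context on the review comment above, here is a hypothetical way (not part of this PR) to inspect the output shape the 24.04 server actually reports for the model, assuming the container launched above is running and `tritonclient[http]` is installed:

```python
# Hypothetical inspection snippet, not part of this PR. Assumes the Triton
# container launched above is running and `tritonclient[http]` is installed.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Print the output metadata the server reports for the model; with the
# reviewer's updated config this would be expected to show something like
# name="output", datatype="FP32", shape=[-1, 256, 23].
metadata = client.get_model_metadata("log-parsing-onnx")
for output in metadata["outputs"]:
    print(output["name"], output["datatype"], output["shape"])
```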

##### Verify Model Deployment
4 changes: 2 additions & 2 deletions examples/nlp_si_detection/README.md
@@ -77,10 +77,10 @@ This example utilizes the Triton Inference Server to perform inference. The neur
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:

```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

- Where `23.06-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
+ Where `24.04-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).

This will launch Triton and only load the `sid-minibert-onnx` model. This model has been configured with a max batch size of 32, and to use dynamic batching for increased performance.
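
As a brief illustration (not part of the README being diffed), one way to confirm that explicit model control loaded only this model, assuming the server above is running and `tritonclient[http]` is installed:

```python
# Hypothetical check, not part of this PR. Lists what the server has loaded;
# with --model-control-mode=explicit this is expected to show only
# sid-minibert-onnx in the READY state.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

for model in client.get_model_repository_index():
    print(model.get("name"), model.get("version"), model.get("state"))
```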

4 changes: 2 additions & 2 deletions examples/ransomware_detection/README.md
@@ -27,7 +27,7 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```bash
- docker pull nvcr.io/nvidia/tritonserver:23.06-py3
+ docker pull nvcr.io/nvidia/tritonserver:24.04-py3
```
##### Setup Env Variable
```bash
@@ -39,7 +39,7 @@ From the Morpheus repo root directory, run the following to launch Triton and lo
```bash
# Run Triton in explicit mode
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
- -v $PWD/examples/ransomware_detection/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
+ -v $PWD/examples/ransomware_detection/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:24.04-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
4 changes: 2 additions & 2 deletions examples/root_cause_analysis/README.md
@@ -46,10 +46,10 @@ This example utilizes the Triton Inference Server to perform inference. The bina
From the Morpheus repo root directory, run the following to launch Triton and load the `root-cause-binary-onnx` model:

```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model root-cause-binary-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model root-cause-binary-onnx
```

- Where `23.06-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
+ Where `24.04-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).

This will launch Triton and only load the model required by our example pipeline. The model has been configured with a max batch size of 32, and to use dynamic batching for increased performance.

2 changes: 1 addition & 1 deletion examples/sid_visualization/docker-compose.yml
@@ -25,7 +25,7 @@ x-with-gpus: &with_gpus

services:
triton:
- image: nvcr.io/nvidia/tritonserver:23.06-py3
+ image: nvcr.io/nvidia/tritonserver:24.04-py3
<<: *with_gpus
command: "tritonserver --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx --model-repository=/models/triton-model-repo"
environment:
@@ -22,13 +22,15 @@
#include "morpheus/stages/inference_client_stage.hpp"
#include "morpheus/types.hpp"

+ #include <boost/fiber/fss.hpp>
#include <http_client.h>
#include <mrc/coroutines/task.hpp>

#include <cstdint>
// IWYU pragma: no_include "rxcpp/sources/rx-iterate.hpp"

#include <memory>
+ #include <mutex>
#include <string>
#include <vector>

@@ -106,7 +108,11 @@ class MORPHEUS_EXPORT ITritonClient
class MORPHEUS_EXPORT HttpTritonClient : public ITritonClient
{
private:
- std::unique_ptr<triton::client::InferenceServerHttpClient> m_client;
+ std::string m_server_url;
+ std::mutex m_client_mutex;
+ boost::fibers::fiber_specific_ptr<triton::client::InferenceServerHttpClient> m_fiber_local_client;

+ triton::client::InferenceServerHttpClient& get_client();

public:
HttpTritonClient(std::string server_url);