Skip to content
This repository was archived by the owner on Oct 15, 2025. It is now read-only.
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions quickstart/examples/no-sample-app/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
# No Sample App Preset

This preset is for when you want to provision the infrastructure around modelservice but not actually create a `modelservice` custom resource.
This can be particularly helpful because of how helm works - it wants to make sure that it owns and deploys all resources in its chart via labels.
The result of this, is that if you create a sample app through the helm charts, and you want to modify or delete your `modelservice` custom resource,
you have to uninstall the whole stack, tweak an example values override file before re-installing through quickstart.

This preset enables you to have more control over that. You can apply the "No Sample App" preset, use `helm template` to template manifests based on
a different preset, extract the resourcess that make up the diff ([see below](./README.md#making-up-the-difference)), and then apply them by hand.
The result should be you can make **some** modifications to your setup without having to use the quickstart `--uninstall` tack in the installer script,
and install everything from the ground up again.

It should be noted that you can only make **some** modifications because
[the baseConfigMap](https://github.com/llm-d/llm-d-deployer/tree/dc90005ef2fc87f1e6dbac282dfc038187efde12/charts/llm-d/templates/modelservice/presets)
presets for `modelservice` are immutable, any desired changes there will require uninstalling and reinstalling via the quickstart install script.

## Deploying Infrastructure

Assuming one is starting from the quickstart directory, a sample deploy might look like:

```bash
HF_TOKEN=$(HFTOKEN) ./llmd-installer.sh --namespace llm-d --values-file examples/no-sample-app/no-sample-app.yaml
```

## Validating the deployment

You should expect to see only see infra related resources that can be plugged into any setup, specifically:

- A gateway, which should spin a gateway deployment / pod.
- A modelservice controller.
- A redis master deployment. This is turned on by default in this example in case the desired modelservice should use the `baseConfigMapRef` of `base-gpu-with-nixl-and-redis-lookup-preset`. If you do not want to use this preset, you can optionally add the following to your `no-sample-app.yaml`:

```yaml
sampleApplication:
enabled: false
redis
enabled: false
```

To sumarrize, in a successful deployment of this preset, expect to see:

```log
NAME READY STATUS RESTARTS AGE
llm-d-inference-gateway-654ddbcb74-25vv4 1/1 Running 0 58m
llm-d-modelservice-7844f89454-lcs2r 1/1 Running 0 58m
llm-d-redis-master-8599696799-2wpv5 1/1 Running 0 58m
```

## Deploying the `ModelService`

To see all resource templates that get created based on a "Sample App" which we will intentionally create manually in this example, you can
[look here](https://github.com/llm-d/llm-d-deployer/tree/main/charts/llm-d/templates/sample-application).

These templates will map into the following manifests that we will create later based on this `no-sample-app` vs another sample app scenario:

1. A `modelservice` custom resource.
- This is the biggest component of a "Sample App" and will create many other children resources, such as prefill, decode, and `endpoint-picker` deployments, services, serviceAccounts, RBAC, an `inferencepool`, an `inferencemodel`, etc.
2. An `httpRoute` that will be accepted by the gateway, with a backendRef of the `inferencepool` that will get created by the `modelservice` CR.
3. (ISTIO ONLY) A `destinationRule`. We have support for Istio as a gateway control plane implementation but for now we are using kgateway as a default, so I don't believe this has ever been tested.
4. A `ClusterRoleBinding` for the `endpoint-picker` that allow for scraping of `endpoint-picker` (epp) metrics with authentication.
- This is included in "sample app" because per "sample app" there is an `endpoint-picker` that gets deployed, with a different `serviceAccount`, named based off the `modelName` in the Sample App.

## Making Up the Difference

Lets walkthrough how one might use this "No Sample App" preset to provisoin the base infrastructure, and then create a sample app by hand to re-create one of the other presets.

First lets pick a "Sample App" to re-create - in this case lets start with "no-features". We are going to use `helm template` with two values files, first
the base values file, and then passing the "no features" preset as an override values file, and then dumping that to a temporary yaml file.

```bash
# navigate to the chart directory
cd ../../../charts/llm-d/

# set your namespace in your no-sample-app infra was already deployed
: "${NAMESPACE:=llm-d}"

TEMP=$(mktemp)

helm template llm-d ./ -n ${NAMESPACE} --values ../../quickstart/examples/no-features/no-features.yaml > $TEMP
```

For those unfamiliar with `helm` or `helm template`, this above command will tell helm to spit out what manifests a `helm upgrade -i` or `helm install`
would plan to create based on the merge of the two values files. It will treat the second values file with higher priority in this merge, and overwrite anything
that appears in both with the value of the 2nd values file (in this case, our `no-features` preset we are trying to re-create).

As mentioned above, that `$TEMP` file should include all the manifests from the `no-features` example, but if you remember we already applied our `no-sample-app` preset, so many of these manifests will already exist in our cluster.

We want to grab just the `modelservice` CR, the `httpRoute`, and `clusterRoleBindigng` for EPP metrics. You can do this by hand, but based on the `no-features` preset were re-creating, you can automate it with the following (this yq):

```bash
yq -i eval-all '
select(
(.kind == "ModelService")
or (.kind == "HTTPRoute")
or (.kind == "ClusterRoleBinding" and .metadata.name == "llm-d-llm-d-modelservice-epp-metrics-scrape")
)
| del(.metadata.labels["helm.sh/chart"])
| del(.metadata.labels["app.kubernetes.io/managed-by"])
' $TEMP

kubectl apply -f $TEMP
```

> [!NOTE]
> The `ClusterRoleBinding` name is based on the `modelservice` controller name, so it can vary.
> It is suggested to copy the manifests by hand, but based on matching up the `CRB` name with the `no-feature` example, this example can be scripted.
> This example also assumes you are using the default `kgateway` control plane and APIs rather than Istio.
> This command depends on the `mikefarah` version of `yq` v4 or greater

In addition to grabbing those resources, the above script will filter out the helm labels that would otherwise get templated into the manifests,
allowing us to create and manage these resources by hand.

Now our pods should reflect that of the `no-feature` example with our new `epp` and single `decode` pods:

```log
NAME READY STATUS RESTARTS AGE
llm-d-inference-gateway-654ddbcb74-25vv4 1/1 Running 0 97m
llm-d-modelservice-7844f89454-lcs2r 1/1 Running 0 97m
llm-d-redis-master-8599696799-2wpv5 1/1 Running 0 97m
meta-llama-llama-3-2-3b-instruct-decode-7c679bb8b4-kbxc6 2/2 Running 0 42s
meta-llama-llama-3-2-3b-instruct-epp-8d4749d47-dnpf2 1/1 Running 0 42s
```
2 changes: 2 additions & 0 deletions quickstart/examples/no-sample-app/no-sample-app.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
sampleApplication:
enabled: false
Loading