The procedures below outline example scenarios that you may find useful for your LM-Eval setup.
If the LMEvalJob needs to access a model on Hugging Face that requires an access token, you can set HF_TOKEN as one of the environment variables for the lm-eval container.
- You have logged in to Red Hat OpenShift AI.
- Your OpenShift cluster administrator has installed OpenShift AI and enabled the TrustyAI service for the data science project where the models are deployed.
To start an evaluation job for a Hugging Face model, apply the following YAML file:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
    - name: pretrained
      value: huggingfacespace/model
  taskList:
    taskNames:
      - unfair_tos
  logSamples: true
  pod:
    container:
      env:
        - name: HF_TOKEN
          value: "My HuggingFace token"
You can also create a secret to store the token, and then reference the key from the secretKeyRef object using the following syntax:
env:
  - name: HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: my-secret
        key: hf-token
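For example, the my-secret secret referenced above could be created with oc create secret generic; the token value and project name are placeholders:
# Create the secret that holds the Hugging Face token; the key must match secretKeyRef.key
oc create secret generic my-secret \
  --from-literal=hf-token=<your-hf-token> \
  -n <project-name>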
You can also run evaluations using custom Unitxt cards. To do this, include the custom Unitxt card in JSON format within the LMEvalJob YAML.
- You have logged in to Red Hat OpenShift AI.
- Your OpenShift cluster administrator has installed OpenShift AI and enabled the TrustyAI service for the data science project where the models are deployed.
- Pass a custom Unitxt card in JSON format:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
    - name: pretrained
      value: google/flan-t5-base
  taskList:
    taskRecipes:
      - template: "templates.classification.multi_class.relation.default"
        card:
          custom: |
            {
              "__type__": "task_card",
              "loader": {
                "__type__": "load_hf",
                "path": "glue",
                "name": "wnli"
              },
              "preprocess_steps": [
                {
                  "__type__": "split_random_mix",
                  "mix": {
                    "train": "train[95%]",
                    "validation": "train[5%]",
                    "test": "validation"
                  }
                },
                {
                  "__type__": "rename",
                  "field": "sentence1",
                  "to_field": "text_a"
                },
                {
                  "__type__": "rename",
                  "field": "sentence2",
                  "to_field": "text_b"
                },
                {
                  "__type__": "map_instance_values",
                  "mappers": {
                    "label": {
                      "0": "entailment",
                      "1": "not entailment"
                    }
                  }
                },
                {
                  "__type__": "set",
                  "fields": {
                    "classes": [
                      "entailment",
                      "not entailment"
                    ]
                  }
                },
                {
                  "__type__": "set",
                  "fields": {
                    "type_of_relation": "entailment"
                  }
                },
                {
                  "__type__": "set",
                  "fields": {
                    "text_a_type": "premise"
                  }
                },
                {
                  "__type__": "set",
                  "fields": {
                    "text_b_type": "hypothesis"
                  }
                }
              ],
              "task": "tasks.classification.multi_class.relation",
              "templates": "templates.classification.multi_class.relation.all"
            }
  logSamples: true
- Inside the custom card, specify the Hugging Face dataset loader:
"loader": {
"__type__": "load_hf",
"path": "glue",
"name": "wnli"
},
To use a PVC as storage for the LMEvalJob results, you can use either managed PVCs or existing PVCs. Managed PVCs are managed by the TrustyAI operator. Existing PVCs are created by the end user before the LMEvalJob is created.
Note
If both managed and existing PVCs are referenced in outputs, the TrustyAI operator defaults to the managed PVC.
- You have logged in to Red Hat OpenShift AI.
- Your OpenShift cluster administrator has installed OpenShift AI and enabled the TrustyAI service for the data science project where the models are deployed.
To create a managed PVC, specify its size. The managed PVC is named <job-name>-pvc and is available after the job finishes. When the LMEvalJob is deleted, the managed PVC is also deleted.
- Enter the following code:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  # other fields omitted ...
  outputs:
    pvcManaged:
      size: 5Gi
- outputs is the section for specifying custom storage locations
- pvcManaged will create an operator-managed PVC
- size (compatible with standard PVC syntax) is the only supported value
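As a minimal sketch of how you might inspect results stored on the managed PVC after the job finishes, you can mount the <job-name>-pvc claim into a temporary pod. The pod name and image below are assumptions; the claim name follows the evaljob-sample job above:
apiVersion: v1
kind: Pod
metadata:
  name: lmeval-results-reader  # hypothetical pod name
spec:
  containers:
    - name: reader
      image: registry.access.redhat.com/ubi9/ubi-minimal  # any small image with a shell works
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: results
          mountPath: /results
  volumes:
    - name: results
      persistentVolumeClaim:
        claimName: evaljob-sample-pvc  # managed PVC named <job-name>-pvc
You can then open a terminal in the pod, for example with oc rsh lmeval-results-reader, and browse the /results directory.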
To use an existing PVC, pass its name as a reference. The PVC must exist when you create the LMEvalJob. Because the PVC is not managed by the TrustyAI operator, it remains available after the LMEvalJob is deleted.
- Create a PVC. An example is the following:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "my-pvc"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
- Reference the new PVC from the LMEvalJob:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  # other fields omitted ...
  outputs:
    pvcName: "my-pvc"
To run an evaluation job on an InferenceService that is already deployed and running in your namespace, define your LMEvalJob CR, and then apply the CR in the same namespace as your model.
- You have logged in to Red Hat OpenShift AI.
- Your OpenShift cluster administrator has installed OpenShift AI and enabled the TrustyAI service for the data science project where the models are deployed.
- You have a namespace that contains an InferenceService with a vLLM model. This example assumes that the vLLM model is already deployed in your cluster.
- Define your LMEvalJob CR:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob
spec:
  model: local-completions
  taskList:
    taskNames:
      - mmlu
  logSamples: true
  batchSize: 1
  modelArgs:
    - name: model
      value: granite
    - name: base_url
      value: $ROUTE_TO_MODEL/v1/completions
    - name: num_concurrent
      value: "1"
    - name: max_retries
      value: "3"
    - name: tokenized_requests
      value: "False"
    - name: tokenizer
      value: ibm-granite/granite-7b-instruct
  env:
    - name: OPENAI_TOKEN
      valueFrom:
        secretKeyRef:
          name: <secret-name>
          key: token
- Apply this CR into the same namespace as your model. You should see a pod spin up in your model namespace called evaljob. In the pod terminal, you can see the output via tail -f output/stderr.log.
- base_url should be set to the route/service URL of your model. Make sure to include the /v1/completions endpoint in the URL (see the example after this list).
- env.valueFrom.secretKeyRef.name should point to a secret that contains a token that can authenticate to your model. secretRef.name should be the secret's name in the namespace, while secretRef.key should point at the token's key within the secret.
- secretKeyRef.name can equal the output of:
  oc get secrets -o custom-columns=SECRET:.metadata.name --no-headers | grep user-one-token
- secretKeyRef.key is set to token
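As an illustrative sketch, assuming your model is exposed through an OpenShift route named granite-route (a hypothetical name), you could build the base_url value and create the token secret as follows; all names and token values are placeholders:
# Look up the route host and construct the base_url value (route name is an assumption)
ROUTE_TO_MODEL="https://$(oc get route granite-route -o jsonpath='{.spec.host}')"
echo "${ROUTE_TO_MODEL}/v1/completions"
# Create the secret referenced by env.valueFrom.secretKeyRef (placeholder names)
oc create secret generic <secret-name> --from-literal=token=<api-token>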