Updated model serving yamls
Maxusmusti committed Jan 22, 2024
1 parent 181e6e5 commit bc594fa
Showing 2 changed files with 8 additions and 6 deletions.
language/llama2-70b/api-endpoint-artifacts/model.yaml: 2 additions, 0 deletions

```diff
@@ -8,6 +8,8 @@ metadata:
   name: llama-2-70b-chat-isvc
 spec:
   predictor:
+    minReplicas: 1
+    maxReplicas: 1
     apiVersion: serving.kserve.io/v1alpha2
     serviceAccountName: sa
     timeout: 240
```
language/llama2-70b/api-endpoint-artifacts/serving-runtime.yaml: 6 additions, 6 deletions

```diff
@@ -62,11 +62,11 @@ spec:
         # value: float16
       # Dynamic batch size changes
       - name: MAX_BATCH_SIZE
-        value: "256"
+        value: "128"
       - name: MAX_CONCURRENT_REQUESTS
-        value: "256"
+        value: "200"
       - name: MAX_BATCH_WEIGHT
-        value: "540000"
+        value: "550000"
       - name: MAX_SEQUENCE_LENGTH
         value: "2048"
       - name: MAX_PREFILL_WEIGHT
```
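The batching changes lower the batch size, lower the concurrency cap and slightly raise the weight budget. The arithmetic linking these values can be sketched as below. `BatchingConfig` and `summarize_batching` are hypothetical helpers, not part of TGIS or Caikit, and the sketch assumes a simplified padded weight model (batch weight = batch size × sequence length); TGIS's real batch-weight estimate depends on the model and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchingConfig:
    """Values mirroring the env vars in serving-runtime.yaml."""
    max_batch_size: int
    max_concurrent_requests: int
    max_batch_weight: int
    max_sequence_length: int


def summarize_batching(cfg: BatchingConfig) -> dict:
    """Hypothetical helper. The weight formula is a simplifying assumption,
    not TGIS's actual batch-weight estimate."""
    # Assumed worst case: every sequence in a full batch is padded to the max length.
    worst_case_weight = cfg.max_batch_size * cfg.max_sequence_length
    return {
        # Concurrent requests beyond one full batch must wait in the queue.
        "max_queued": max(cfg.max_concurrent_requests - cfg.max_batch_size, 0),
        "worst_case_weight": worst_case_weight,
        "fits_weight_budget": worst_case_weight <= cfg.max_batch_weight,
    }


old = BatchingConfig(256, 256, 540_000, 2048)
new = BatchingConfig(128, 200, 550_000, 2048)

for label, cfg in (("old", old), ("new", new)):
    print(label, summarize_batching(cfg))
```

Under that assumption the old settings reach a worst-case weight of 524,288 with no queue headroom. The new settings reach 262,144 and let up to 72 requests wait behind a full 128-request batch.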
```diff
@@ -79,16 +79,16 @@ spec:
         value: hf_custom_tp
     resources: # configure as required
       requests:
-        cpu: 36
-        memory: 700Gi
+        cpu: 64
+        memory: 900Gi
         nvidia.com/gpu: 8
       limits:
         nvidia.com/gpu: 8
   - name: transformer-container
     image: quay.io/opendatahub/caikit-tgis-serving:fast
     env:
       - name: RUNTIME_GRPC_SERVER_THREAD_POOL_SIZE
-        value: "160"
+        value: "200"
     volumeMounts:
       - name: config-volume
         mountPath: /caikit/config/
```
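The resource change raises the CPU and memory requests while keeping 8 GPUs, so the useful comparison is the share per GPU. The sketch below parses only the simple Kubernetes quantity forms used here (plain numbers, the `m` milli suffix, decimal and binary suffixes). `parse_quantity` and `per_gpu_share` are hypothetical helpers, not the Kubernetes client API, and exponent forms such as `1e3` are not handled.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes: binary (powers of 1024),
# decimal SI (powers of 1000) and the milli suffix used for CPU.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "m": Fraction(1, 1000),
}
GIB = 1024**3


def parse_quantity(value) -> Fraction:
    """Parse a simple quantity such as 64, "500m" or "900Gi" exactly."""
    text = str(value).strip()
    # Try two-letter binary suffixes before single-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(text)


def per_gpu_share(cpu, memory, gpus: int) -> tuple[Fraction, Fraction]:
    """Return (CPU cores per GPU, GiB of memory per GPU)."""
    return parse_quantity(cpu) / gpus, parse_quantity(memory) / GIB / gpus


for label, cpu, mem in (("old", 36, "700Gi"), ("new", 64, "900Gi")):
    cores, gib = per_gpu_share(cpu, mem, gpus=8)
    print(f"{label}: {float(cores)} cores/GPU, {float(gib)} GiB/GPU")
```

For this commit that works out to 4.5 cores and 87.5 GiB per GPU before the change, and 8 cores and 112.5 GiB per GPU after it. `Fraction` keeps the arithmetic exact, so fractional shares such as 87.5 GiB are not rounded.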
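In the transformer container, `RUNTIME_GRPC_SERVER_THREAD_POOL_SIZE` moves from 160 to 200, which now matches `MAX_CONCURRENT_REQUESTS`. Assuming Caikit uses this value as the worker count of its gRPC server's thread pool (the variable name suggests this, but the commit does not confirm it), the setting caps how many calls the transformer handles at once. The stdlib sketch below uses `concurrent.futures.ThreadPoolExecutor` to show that a pool never runs more tasks in parallel than its `max_workers`. It does not use gRPC or Caikit, and `peak_concurrency` is a hypothetical helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(pool_size: int, n_tasks: int, work_s: float = 0.01) -> int:
    """Submit n_tasks short jobs to a pool and record the most that ran at once."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def task() -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(work_s)  # stand-in for handling one request
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(n_tasks):
            pool.submit(task)
    # Leaving the `with` block waits for every submitted task to finish.
    return peak


# 250 simultaneous "requests" against pools sized like the old and new settings.
print(peak_concurrency(160, 250), peak_concurrency(200, 250))
```

Setting the pool size equal to the request cap presumably keeps the transformer from throttling requests that TGIS would otherwise accept, but that intent is an inference, not something the commit states.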
