Priority
Undecided
OS type
Ubuntu
Hardware type
Xeon-GNR
Running nodes
Single Node
Description
In v1.4, OPEA will integrate KubeAI for efficient inference on Kubernetes.
@poussa provided the following plan:
- Model Profiles
  - 10 models for Gaudi and CPU
  - Benchmark and tune each model (1, 2, 4, 8 replicas)
- OPEA Configuration
  - Resource Profiles (k8s resources definition)
  - Cache Profiles (k8s storage definition)
  - Resource and cache profiles are referenced from the models
- Load Balancing
  - Validation and tuning on Gaudi
  - Benchmarking
- Auto Scaling
  - Validation and tuning on Gaudi
  - Benchmarking
- Observability
  - Validation and tuning on Gaudi
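
To make the "profiles are referenced from the models" item concrete, here is a minimal Python sketch of how such references could be checked and how the 1/2/4/8-replica benchmark runs could be enumerated. It is an illustration only: `ModelProfile`, `unresolved_references`, `benchmark_matrix`, and all profile and model names are hypothetical and do not mirror KubeAI's or OPEA's actual schema.

```python
from dataclasses import dataclass
from itertools import product

# All names below are hypothetical; they do not mirror KubeAI's actual schema.
REPLICA_COUNTS = (1, 2, 4, 8)  # replica counts listed in the plan


@dataclass(frozen=True)
class ModelProfile:
    name: str
    resource_profile: str  # key into the resource profile table
    cache_profile: str     # key into the cache profile table


def unresolved_references(models, resource_profiles, cache_profiles):
    """Return (model, kind, profile) tuples for references that are not defined."""
    missing = []
    for m in models:
        if m.resource_profile not in resource_profiles:
            missing.append((m.name, "resource", m.resource_profile))
        if m.cache_profile not in cache_profiles:
            missing.append((m.name, "cache", m.cache_profile))
    return missing


def benchmark_matrix(models, replica_counts=REPLICA_COUNTS):
    """Enumerate one (model name, replicas) benchmark run per combination."""
    return [(m.name, r) for m, r in product(models, replica_counts)]


# Placeholder data for illustration only.
resource_profiles = {"accelerator-small": {"accelerators": 1}, "cpu-large": {"cpu": 32}}
cache_profiles = {"shared-storage": {"size": "100Gi"}}
models = [
    ModelProfile("model-a", "accelerator-small", "shared-storage"),
    ModelProfile("model-b", "cpu-large", "local-disk"),  # cache profile not defined
]

print(unresolved_references(models, resource_profiles, cache_profiles))
print(len(benchmark_matrix(models)))
```

Keeping the profiles in separate tables and referencing them by name means one resource or storage definition can be shared by many models; a check like the one above would catch a model pointing at a profile that was never defined before anything is deployed.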