Audience: On-call engineers managing the hyperfleet-adapter service.
The hyperfleet-adapter consumes CloudEvents from a message broker (Google Pub/Sub or RabbitMQ), evaluates preconditions, applies Kubernetes resources or Maestro ManifestWorks, and reports status back to the HyperFleet API.
Ports:
- `8080`: Health endpoints (`/healthz`, `/readyz`)
- `9090`: Prometheus metrics (`/metrics`)
Startup sequence:
- Load adapter config and task config
- Initialize OpenTelemetry tracing
- Start health server and metrics server
- Create HyperFleet API client
- Create transport client (Maestro or Kubernetes)
- Build executor
- Create broker subscriber and subscribe to topic
- Mark readiness (`/readyz` returns 200)
Any failure in steps 1–7 causes the process to exit with code 1.
| Endpoint | Probe Type | Behavior |
|---|---|---|
| `/healthz` | Liveness | Always returns 200 OK |
| `/readyz` | Readiness | Returns 200 OK when config is loaded and broker is connected |
| Check | Meaning |
|---|---|
| `config` | Adapter and task configs loaded successfully |
| `broker` | Broker subscription established |
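When `/readyz` returns 503, the failing check can be pulled out with `jq`. A sketch, assuming the response body is shaped like the sample below; the real schema may differ, so verify against an actual response from your adapter version:

```shell
# Hypothetical /readyz response shape -- verify against your adapter version.
response='{"checks":{"config":"ok","broker":"error: subscription not established"}}'

# Print only the checks that are not "ok".
echo "$response" | jq -r '.checks | to_entries[] | select(.value != "ok") | "\(.key): \(.value)"'
```

Against a live pod, pipe the `curl` output from the readiness command below into the same filter.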
If /readyz returns 503, inspect the response body for which check is failing:
```shell
kubectl exec <pod> -- curl -s localhost:8080/readyz | jq .
```

Symptoms: Pod is in CrashLoopBackOff. Exit code 1 in container logs.
Common causes and resolution:
| Log Pattern | Cause | Resolution |
|---|---|---|
| "adapter config file path is required" | Missing `--config` flag or `HYPERFLEET_ADAPTER_CONFIG` env | Check deployment args and ConfigMap mount |
| "task config file path is required" | Missing `--task-config` flag or `HYPERFLEET_TASK_CONFIG` env | Check deployment args and ConfigMap mount |
| "failed to read adapter config file" | ConfigMap not mounted or wrong path | Verify ConfigMap exists and volumeMount path matches |
| "failed to parse adapter config YAML" | Invalid YAML syntax | Validate config YAML offline with `yq` |
| "unsupported apiVersion" | Config version mismatch | Update config to match expected apiVersion |
| "adapter version mismatch" | `spec.adapter.version` doesn't match binary version | Update config or redeploy matching binary version |
| "HyperFleet API base URL not configured" | Missing `spec.clients.hyperfleetApi.baseUrl` | Set base URL in adapter config or `HYPERFLEET_API_BASE_URL` env |
| "maestro config is required" | Maestro transport selected but no config | Add `spec.clients.maestro` section to adapter config |
| "maestro server address is required" | Missing Maestro gRPC address | Set `grpcServerAddress` in maestro config |
Steps:
- Check pod logs: `kubectl logs <pod> --previous`
- Verify ConfigMaps exist: `kubectl get configmap -l app.kubernetes.io/name=hyperfleet-adapter`
- Verify volume mounts: `kubectl describe pod <pod>` and look for mount paths
- Validate config offline: `kubectl get configmap <name> -o yaml | yq .data`
Symptoms: Pod starts but `/readyz` returns 503 with the `broker` check in error, or the pod crashes with broker-related errors.
| Log Pattern | Cause | Resolution |
|---|---|---|
| "Missing required broker configuration" | `subscriptionId` or `topic` not set | Check broker ConfigMap and env vars |
| "Failed to create subscriber" | Broker backend misconfiguration | Verify broker connection settings (Pub/Sub project, RabbitMQ URL) |
| "Failed to subscribe to topic" | Topic doesn't exist or permissions denied | Verify topic exists and service account has subscribe permission |
| "Subscription error" | Transient broker error | Usually recovers; monitor for frequency |
| "Fatal subscription error, shutting down" | Unrecoverable broker error | Check broker service health; adapter will restart via liveness probe |
Steps:
- Check broker ConfigMap: `kubectl get configmap <release>-broker -o yaml`
- For Google Pub/Sub: verify the subscription exists and IAM permissions are granted
- For RabbitMQ: verify exchange, queue, and connectivity from the pod network
- Check if the broker service is healthy independently
Symptoms: Pod crashes immediately on startup with config-related errors.
| Log Pattern | Cause | Resolution |
|---|---|---|
| "referenced file ... does not exist" | Task config references a file not in the ConfigMap | Ensure all `ref:` files are included in the ConfigMap |
| "CEL parse error" | Invalid CEL expression in preconditions | Fix CEL syntax in task config |
| "undefined template variable" | Template references a variable not in params | Check param names match template variables |
| "mutually exclusive" | Conflicting config options (e.g. `build` and `buildRef`) | Use only one of the mutually exclusive options |
| "manifest is required for maestro/kubernetes transport" | Resource missing manifest definition | Add `manifest` to the resource in task config |
Steps:
- Check pod logs for the specific validation error
- Pull the ConfigMap content and validate locally
- Use the adapter's dry-run mode to test config changes before deploying: `./adapter serve --config adapter-config.yaml --task-config task-config.yaml --dry-run-event event.json`
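The dry-run flag needs a sample event file. A minimal sketch of what `event.json` might contain: the CloudEvents envelope attributes (`specversion`, `type`, `source`, `id`) come from the CloudEvents spec, but the `type` value and the `data` fields here are illustrative, not the actual async API contract. Prefer capturing a real payload from the broker:

```json
{
  "specversion": "1.0",
  "type": "com.example.hyperfleet.cluster.created",
  "source": "/hyperfleet/api",
  "id": "00000000-0000-0000-0000-000000000000",
  "data": {
    "cluster_id": "example-cluster-01"
  }
}
```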
Symptoms: hyperfleet_adapter_events_processed_total{status="failed"} is increasing. Events are not being processed successfully.
All event failures are ACKed (not retried) to avoid infinite loops on non-transient errors.
| Log Pattern | Phase | Cause | Resolution |
|---|---|---|---|
| "Failed to parse event data" | Params | Malformed CloudEvent payload | Check upstream event producer |
| "failed to extract required parameter" | Params | Missing field in event data | Verify event schema matches task config params |
| "Precondition[...] evaluated: FAILED" | Preconditions | API precondition check returned failure | Check HyperFleet API state for the resource |
| "Precondition[...] evaluated: NOT_MET" | Preconditions | Condition not satisfied (expected) | Normal flow; event skipped |
| "failed to render manifest" | Resources | Template rendering error | Fix Go template syntax in task config |
| "failed to apply resource" | Resources | Transport client error | See Maestro or K8s failure sections |
| "PostAction[...] processed: FAILED" | Post Actions | Status report API call failed | Check HyperFleet API connectivity |
Steps:
- Check error metrics: `rate(hyperfleet_adapter_errors_total[5m])`
- Identify the failing phase from the `error_type` label
- Check pod logs for the specific event ID and error details
- For persistent failures, use dry-run to reproduce: `./adapter serve --config adapter-config.yaml --task-config task-config.yaml --dry-run-event failing-event.json`
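To narrow down which phase is failing, break the counters down by label in Prometheus. These queries assume the metric and label names used elsewhere in this runbook (`hyperfleet_adapter_errors_total` with `error_type`, `hyperfleet_adapter_events_processed_total` with `status`):

```promql
# Error rate by phase over the last 5 minutes
sum by (error_type) (rate(hyperfleet_adapter_errors_total[5m]))

# Fraction of events failing (compare against the 10% P2 threshold below)
sum(rate(hyperfleet_adapter_events_processed_total{status="failed"}[5m]))
/
sum(rate(hyperfleet_adapter_events_processed_total[5m]))
```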
Symptoms: Post-action status updates fail. hyperfleet_adapter_errors_total{error_type="post_actions"} is increasing.
| Log Pattern | Cause | Resolution |
|---|---|---|
| "HyperFleet API request returned retryable status" | API returning 5xx, 408, or 429 | Check HyperFleet API health |
| "API request failed: ... after N attempt(s)" | All retries exhausted | API may be down or overloaded |
| "context canceled" | Request timed out | Increase `spec.clients.hyperfleetApi.timeout` or check API latency |
The adapter retries on 5xx, 408 (Request Timeout), and 429 (Too Many Requests) with configurable backoff (exponential, linear, or constant).
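As an illustration of how the exponential strategy spaces out attempts, a small shell sketch; the base delay of 2s and the 5 attempts are made-up values for illustration, not the adapter's defaults:

```shell
# Print an exponential backoff schedule: delay = base * 2^(attempt - 1).
# base=2s and 5 attempts are illustrative; the adapter reads its own values from config.
base=2
for attempt in 1 2 3 4 5; do
  delay=$(( base * (1 << (attempt - 1)) ))
  echo "attempt ${attempt}: wait ${delay}s"
done
```

Linear backoff would grow the delay by a fixed step per attempt, and constant backoff keeps it fixed; the doubling above is what makes repeated 429/5xx responses back off quickly.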
Steps:
- Check HyperFleet API health: `kubectl get pods -l app=hyperfleet-api`
- Check API response times from the adapter pod: `kubectl exec <pod> -- curl -s -o /dev/null -w "%{http_code} %{time_total}s" http://hyperfleet-api:8000/healthz`
- Check for resource exhaustion on the API service
- Review `retryAttempts` and `timeout` in the adapter config
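Pulled together, the client settings referenced in this runbook live under `spec.clients.hyperfleetApi`. A hedged config sketch: `baseUrl`, `timeout`, and `retryAttempts` are the key names mentioned here, but the values and any backoff-related keys are illustrative and should be checked against the adapter's actual config schema:

```yaml
spec:
  clients:
    hyperfleetApi:
      baseUrl: http://hyperfleet-api:8000  # or set HYPERFLEET_API_BASE_URL
      timeout: 30s                         # illustrative value
      retryAttempts: 3                     # illustrative value
      # Backoff is configurable (exponential, linear, or constant);
      # the exact key name is not shown in this runbook.
```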
Symptoms: Resource apply phase fails. Events fail with resource-related errors.
| Log Pattern | Cause | Resolution |
|---|---|---|
| "consumer ... is not registered in Maestro" | Target cluster consumer not registered | Register the consumer in Maestro before sending events |
| "failed to create ManifestWork" | Maestro API error | Check Maestro server health and logs |
| "failed to parse CA certificate" | Invalid TLS CA certificate | Verify cert files mounted at configured paths |
| "failed to create Maestro work client" | gRPC connection failure | Check Maestro gRPC endpoint connectivity |
Steps:
- Verify Maestro server is running: `kubectl get pods -n maestro`
- Test gRPC connectivity from the adapter pod: `kubectl exec <pod> -- grpcurl -plaintext maestro-grpc.maestro.svc.cluster.local:8090 grpc.health.v1.Health/Check`
- For TLS issues, verify the cert secret is mounted and certs are valid: `kubectl exec <pod> -- openssl x509 -in /etc/maestro/certs/grpc/ca.crt -noout -dates`
- Verify consumer registration in Maestro for the target cluster
Symptoms: Resource apply phase fails for Kubernetes transport tasks.
| Log Pattern | Cause | Resolution |
|---|---|---|
| "failed to create kubernetes client" | RBAC or kubeconfig issues | Check ServiceAccount and RBAC permissions |
| `K8sOperationError{Operation:"create",...}` | Resource creation failed | Check RBAC, resource quotas, namespace existence |
| `K8sOperationError{Operation:"update",...}` | Conflict on update | Usually transient; retried automatically |
| "context canceled while waiting for resource deletion" | Recreate timed out waiting for deletion | Resource may have finalizers preventing deletion |
Retryable K8s errors (automatically retried): timeouts, server unavailable, internal errors, rate limiting, network errors (connection refused, connection reset). Non-retryable: forbidden, unauthorized, bad request, invalid, gone, method not supported, not acceptable.
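The split above can be expressed as a small lookup, handy when eyeballing error reasons in logs. A sketch only: the reason strings below approximate Kubernetes `StatusReason` values and may not match the adapter's internal classification exactly:

```shell
# Classify a Kubernetes error reason as retryable or not,
# mirroring the retryable/non-retryable lists in this runbook.
classify() {
  case "$1" in
    Timeout|ServerTimeout|ServiceUnavailable|InternalError|TooManyRequests)
      echo "retryable" ;;
    Forbidden|Unauthorized|BadRequest|Invalid|Gone|MethodNotSupported|NotAcceptable)
      echo "non-retryable" ;;
    *)
      echo "unknown" ;;
  esac
}

classify Forbidden         # prints "non-retryable"
classify TooManyRequests   # prints "retryable"
```

Note that network-level failures (connection refused, connection reset) are also retried but surface as transport errors rather than API status reasons.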
Steps:
- Check RBAC: `kubectl auth can-i --as=system:serviceaccount:<ns>:<sa> create <resource>`
- Check resource quotas: `kubectl describe resourcequota -n <namespace>`
- Check for finalizers blocking deletion: `kubectl get <resource> -o yaml | grep finalizers`
To restart the adapter:

```shell
kubectl rollout restart deployment/<release>-hyperfleet-adapter -n <namespace>
```

Events are ACKed on failure and not automatically retried. To reprocess:
- Identify the failed event from logs (look for `event_id`)
- Republish the event to the broker topic. The event payload must conform to the async API contract.
  - For Google Pub/Sub: `gcloud pubsub topics publish <topic> --message '<event-json>'`
  - For RabbitMQ: `rabbitmqadmin publish exchange=<exchange> routing_key=<key> payload='<event-json>'`
- Monitor `hyperfleet_adapter_events_processed_total` for the reprocessed event
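Finding the `event_id` in logs can be scripted. A sketch, assuming the adapter emits JSON-structured log lines with an `event_id` field; the sample line and field names are assumptions, so check a real log line first:

```shell
# Sample log line -- the actual adapter log format is an assumption here.
log='{"level":"error","msg":"failed to apply resource","event_id":"evt-123"}'

# Pull event IDs out of error-level lines.
echo "$log" | jq -r 'select(.level == "error") | .event_id'
```

Against a live pod, feed `kubectl logs <pod>` through the same filter to list every failed event ID.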
To roll back a bad deployment:

```shell
kubectl rollout undo deployment/<release>-hyperfleet-adapter -n <namespace>
```

Quick health checks:

```shell
# Check pod health
kubectl get pods -l app.kubernetes.io/name=hyperfleet-adapter

# Check readiness
kubectl exec <pod> -- curl -s localhost:8080/readyz | jq .

# Check metrics
kubectl exec <pod> -- curl -s localhost:9090/metrics | grep hyperfleet_adapter_up

# Verify events are being processed (confirms the adapter is doing work, not just alive).
# If the counter is not incrementing, confirm there are messages in the broker
# before concluding the adapter is stuck.
kubectl exec <pod> -- curl -s localhost:9090/metrics | grep hyperfleet_adapter_events_processed_total
```

| Severity | Condition | Action |
|---|---|---|
| P1 - Critical | Adapter down across all replicas; no events processing | Page on-call SRE. Check broker and API dependencies. |
| P2 - High | High failure rate (>10% events failing for 15+ min) | Notify team. Investigate error_type in metrics. |
| P3 - Medium | Single replica crash-looping; other replicas healthy | Investigate logs. This may be a config or resource issue. |
| P4 - Low | Intermittent errors, auto-recovering | Monitor. Create ticket if pattern persists. |
| Level | Team | When |
|---|---|---|
| L1 | On-call SRE | Initial triage, restart, basic recovery |
| L2 | HyperFleet Adapter team | Config issues, event processing logic, transport client failures |
| L3 | HyperFleet Platform team | Broker infrastructure, Maestro server, HyperFleet API issues |
- Email: openshift-hyperfleet@redhat.com
- Jira: Create a ticket in the HYPERFLEET project with component `adapter`
- GitHub: Open an issue at openshift-hyperfleet/hyperfleet-adapter