[OVH] CrashloopBackoff - OVHcloud API error (status code 404) #4948

Open
arnaudclerc opened this issue Dec 13, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Hello, we are seeing recurrent crashes of external-dns in our cluster when running it against the OVH provider.

Raw JSON log below:
[
  {
    "line": "{\"level\":\"info\",\"msg\":\"config: {APIServerURL: KubeConfig: RequestTimeout:30s DefaultTargets:[] GlooNamespaces:[gloo-system] SkipperRouteGroupVersion:zalando.org/v1 Sources:[ingress] Namespace: AnnotationFilter: LabelFilter: IngressClassNames:[nginx-public] FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false IgnoreIngressTLSSpec:false IgnoreIngressRulesSpec:false GatewayNamespace: GatewayLabelFilter: Compatibility: PublishInternal:false PublishHostIP:false AlwaysPublishNotReadyAddresses:false ConnectorSourceServer:localhost:8080 Provider:ovh ProviderCacheTime:15m0s GoogleProject: GoogleBatchChangeSize:1000 GoogleBatchChangeInterval:1s GoogleZoneVisibility: DomainFilter:[XXXXX.fr] ExcludeDomains:[] RegexDomainFilter: RegexDomainExclusion: ZoneNameFilter:[] ZoneIDFilter:[] TargetNetFilter:[] ExcludeTargetNets:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSProfiles:[] AWSAssumeRoleExternalID: AWSBatchChangeSize:1000 AWSBatchChangeSizeBytes:32000 AWSBatchChangeSizeValues:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AWSPreferCNAME:false AWSZoneCacheDuration:0s AWSSDServiceCleanup:false AWSZoneMatchParent:false AWSDynamoDBRegion: AWSDynamoDBTable:external-dns AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: AzureSubscriptionID: AzureUserAssignedIdentityClientID: AzureActiveDirectoryAuthorityHost: CloudflareProxied:false CloudflareDNSRecordsPerPage:100 CoreDNSPrefix:/skydns/ AkamaiServiceConsumerDomain: AkamaiClientToken: AkamaiClientSecret: AkamaiAccessToken: AkamaiEdgercPath: AkamaiEdgercSection: OCIConfigFile:/etc/kubernetes/oci.yaml OCICompartmentOCID: OCIAuthInstancePrincipal:false OCIZoneScope:GLOBAL OCIZoneCacheDuration:0s InMemoryZones:[] OVHEndpoint:ovh-eu OVHApiRateLimit:10 PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSSkipTLSVerify:false TLSCA: TLSClientCert: TLSClientCertKey: 
Policy:sync Registry:txt TXTOwnerID:xxx_xxxxx_xxx-1 TXTPrefix: TXTSuffix: TXTEncryptEnabled:false TXTEncryptAESKey: Interval:5m0s MinEventSyncInterval:5m0s Once:false DryRun:false UpdateEvents:true LogFormat:json MetricsAddress::7979 LogLevel:trace TXTCacheInterval:0s TXTWildcardReplacement: ExoscaleEndpoint: ExoscaleAPIKey: ExoscaleAPISecret: ExoscaleAPIEnvironment:api ExoscaleAPIZone:ch-gva-2 CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: ResolveServiceLoadBalancerHostname:false RFC2136Host: RFC2136Port:0 RFC2136Zone:[] RFC2136Insecure:false RFC2136GSSTSIG:false RFC2136CreatePTR:false RFC2136KerberosRealm: RFC2136KerberosUsername: RFC2136KerberosPassword: RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false RFC2136MinTTL:0s RFC2136BatchChangeSize:50 RFC2136UseTLS:false RFC2136SkipTLSVerify:false NS1Endpoint: NS1IgnoreSSL:false NS1MinTTLSeconds:0 TransIPAccountName: TransIPPrivateKeyFile: DigitalOceanAPIPageSize:50 ManagedDNSRecordTypes:[A AAAA CNAME] ExcludeDNSRecordTypes:[] GoDaddyAPIKey: GoDaddySecretKey: GoDaddyTTL:0 GoDaddyOTE:false OCPRouterName: IBMCloudProxied:false IBMCloudConfigFile:/etc/kubernetes/ibmcloud.json TencentCloudConfigFile:/etc/kubernetes/tencent-cloud.json TencentCloudZoneType: PiholeServer: PiholePassword: PiholeTLSInsecureSkipVerify:false PluralCluster: PluralProvider: WebhookProviderURL:http://localhost:8888 WebhookProviderReadTimeout:5s WebhookProviderWriteTimeout:10s WebhookServer:false TraefikDisableLegacy:false TraefikDisableNew:false}\",\"time\":\"2024-12-13T07:16:04Z\"}",
    "timestamp": "1734074164062484242",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"Instantiating new Kubernetes client\",\"time\":\"2024-12-13T07:16:04Z\"}",
    "timestamp": "1734074164062574530",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"Using inCluster-config based on serviceaccount-token\",\"time\":\"2024-12-13T07:16:04Z\"}",
    "timestamp": "1734074164062591071",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"Created Kubernetes client https://10.X.X.X:443\",\"time\":\"2024-12-13T07:16:04Z\"}",
    "timestamp": "1734074164064074600",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"Records cache provider: refreshing records list cache\",\"time\":\"2024-12-13T07:16:04Z\"}",
    "timestamp": "1734074164166059991",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"OVH: 1 zones found\",\"time\":\"2024-12-13T07:16:04Z\"}",
    "timestamp": "1734074164369265059",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"OVH: 2055 endpoints have been found\",\"time\":\"2024-12-13T07:19:33Z\"}",
    "timestamp": "1734074373208752702",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"All records are already up to date\",\"time\":\"2024-12-13T07:19:33Z\"}",
    "timestamp": "1734074373224216172",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"info\",\"msg\":\"OVH: 1 zones found\",\"time\":\"2024-12-13T07:36:05Z\"}",
    "timestamp": "1734075365454231679",
    "fields": {
      "app": "external-dns",
      "detected_level": "info"
    }
  },
  {
    "line": "{\"level\":\"fatal\",\"msg\":\"Failed to do run once: OVHcloud API error (status code 404): Client::NotFound: \\\"Record does not exist\\\" (X-OVH-Query-Id: EU.ext-4.675bdedc.3273899.<ticket-ID>)\",\"time\":\"2024-12-13T07:16:03Z\"}",
    "timestamp": "1734074163707443613",
    "fields": {
      "app": "external-dns",
      "detected_level": "fatal"
    }
  }
]
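
For anyone reading the export above: each entry wraps the original external-dns log line as a JSON string in "line" and carries a separate nanosecond "timestamp". Below is a minimal Python sketch, assuming that shape and using a trimmed three-entry sample (the fatal message is shortened), that decodes the inner lines and orders them by timestamp:

```python
import json
from datetime import datetime, timezone

# Trimmed sample in the same shape as the export above. The raw string keeps
# the backslash-escaped quotes so the outer JSON parses as exported.
EXPORT = r'''
[
  {"line": "{\"level\":\"info\",\"msg\":\"OVH: 1 zones found\",\"time\":\"2024-12-13T07:16:04Z\"}",
   "timestamp": "1734074164369265059"},
  {"line": "{\"level\":\"info\",\"msg\":\"OVH: 2055 endpoints have been found\",\"time\":\"2024-12-13T07:19:33Z\"}",
   "timestamp": "1734074373208752702"},
  {"line": "{\"level\":\"fatal\",\"msg\":\"Failed to do run once: OVHcloud API error (status code 404)\",\"time\":\"2024-12-13T07:16:03Z\"}",
   "timestamp": "1734074163707443613"}
]
'''


def decode(export_text):
    """Return the inner log records sorted by the export's nanosecond timestamp."""
    records = []
    for entry in json.loads(export_text):
        inner = json.loads(entry["line"])        # the wrapped log line is itself JSON
        inner["ts_ns"] = int(entry["timestamp"])  # nanoseconds since the Unix epoch
        records.append(inner)
    return sorted(records, key=lambda r: r["ts_ns"])


records = decode(EXPORT)
fatal = [r for r in records if r["level"] == "fatal"]

for r in records:
    when = datetime.fromtimestamp(r["ts_ns"] / 1e9, tz=timezone.utc)
    print(when.isoformat(timespec="seconds"), r["level"], r["msg"])
```

Ordered this way, the fatal entry (07:16:03) sorts just before the startup entries at 07:16:04, which suggests it may be the last line of the previous container rather than part of the run shown after it.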

What happened:

The container crashes every 10 (ish) times. The cluster where external-dns runs is on standby (no ingress creation events that could trigger a crash).

What you expected to happen:

We should be able to either:

  • ignore the error and continue
  • use the 'trace' logLevel to describe the failing API call (with body details)
  • fix the error if identified
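
The first two options could look roughly like the following. This is a sketch in Python rather than external-dns code: `ApiError`, `apply_changes`, `fake_api`, and the `/record/N` paths are all hypothetical, and it assumes the 404 is raised by a call on a record that no longer exists, which the log above does not confirm.

```python
import logging

log = logging.getLogger("sketch")


class ApiError(Exception):
    """Hypothetical stand-in for a provider API error carrying an HTTP status."""

    def __init__(self, status, message, method, path):
        super().__init__(f"API error (status code {status}): {message}")
        self.status = status
        self.method = method
        self.path = path


def apply_changes(changes, call_api):
    """Run every change; skip 404s instead of aborting the whole run.

    `changes` is a list of (method, path) tuples and `call_api` is any
    callable performing one request. Both are hypothetical.
    Returns the list of changes that were skipped.
    """
    skipped = []
    for method, path in changes:
        try:
            call_api(method, path)
        except ApiError as err:
            if err.status != 404:
                raise  # any other error is still fatal
            # Log which call failed so the error can be traced afterwards.
            log.warning("skipping %s %s: %s", err.method, err.path, err)
            skipped.append((method, path))
    return skipped


def fake_api(method, path):
    """Hypothetical API that reports record 2 as already gone."""
    if path.endswith("/2"):
        raise ApiError(404, "Record does not exist", method, path)


changes = [("DELETE", "/record/1"), ("DELETE", "/record/2"), ("DELETE", "/record/3")]
skipped = apply_changes(changes, fake_api)
print(skipped)
```

The point of the sketch is only the control flow: a "not found" on one record is logged with the method and path of the failing call, and the remaining changes still run.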

How to reproduce it (as minimally and precisely as possible) - using helm chart:

- name: external-dns
  version: 8.7.0
  repository: https://charts.bitnami.com/bitnami
Please find the Argo CD values file below:
sources:
- ingress
logLevel: trace
interval: 5m
logFormat: json
provider:
  name: ovh
env:
- name: OVH_CONSUMER_KEY
  value: "OVH_CONSUMER_KEY"
- name: OVH_APPLICATION_KEY
  value: "OVH_APPLICATION_KEY"
- name: OVH_APPLICATION_SECRET
  value: "OVH_APPLICATION_SECRET"
domainFilters:
- dns-name.fr
extraArgs:
- --ingress-class=nginx-public
- --ovh-api-rate-limit=10
- --provider-cache-time=15m
- --events
- --txt-cache-interval=15m
- --min-event-sync-interval=5m
txtOwnerId: "cluster-region-project"
policy: "sync"
resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
serviceMonitor:
  enabled: true
  interval: 10m
  labels:
    app: external-dns-public
initContainers:
- name: init-jitter
  securityContext:
    runAsUser: 65534
  image: bardiir/pv
  command:
  - /bin/sh
  - -c
  - yes | pv -SpeL1 -s $((RANDOM % 10)) > /dev/null

Environment:

  • External-DNS version: registry.k8s.io/external-dns/external-dns:v0.15.0 (container in a pod)
  • DNS provider: OVH
  • Others:
    • full configuration provided to the container through the chart:
    - --log-level=trace
    - --log-format=json
    - --interval=5m
    - --source=ingress
    - --policy=sync
    - --registry=txt
    - --txt-owner-id=cluster-region-project
    - --domain-filter=domain-name.fr
    - --provider=ovh
    - --ingress-class=nginx-public
    - --ovh-api-rate-limit=10
    - --provider-cache-time=15m
    - --events
    - --min-event-sync-interval=5m
  • container security policy:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  privileged: false
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532