
Notifications intermittently not being sent when notification queue is overloaded and/or notification services are slow to respond #146

chatchai-outreach opened this issue on Jan 11, 2023.

Describe the bug
The notifications engine fails to remove the on-deployed key from the notified.notifications.argoproj.io annotation before the Argo CD application state moves to Synced and Healthy, so the on-deployed notification for the new deployment is not sent.

This happens when notifications-engine picks an app from the queue only after its state has already moved to Synced and Healthy. That can happen when the queue is large and/or notification services are slow to respond.
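To make the failure mode concrete, here is a deliberately simplified Go model of the bookkeeping described above. It assumes, without checking against the notifications-engine source, that the engine deletes a trigger's key from the notified annotation whenever the trigger condition evaluates to false, and skips sending while the key is present. AppState, process and simulate are hypothetical names, and a plain map stands in for the annotation.

```go
package main

import "fmt"

// AppState is a hypothetical, simplified snapshot of an Argo CD Application.
type AppState struct {
	Phase  string // stands in for app.status.operationState.phase
	Health string // stands in for app.status.health.status
}

// onDeployed mirrors the `when` expression of the on-deployed trigger.
func onDeployed(s AppState) bool {
	return s.Phase == "Succeeded" && s.Health == "Healthy"
}

// process evaluates the trigger against one observed snapshot. The notified
// map is a stand-in for the notified.notifications.argoproj.io annotation.
// It returns true if a notification would be sent.
func process(s AppState, notified map[string]bool) bool {
	const key = "on-deployed" // hypothetical key format
	if !onDeployed(s) {
		delete(notified, key) // condition is false: re-arm the trigger
		return false
	}
	if notified[key] {
		return false // already recorded as sent: skip
	}
	notified[key] = true
	return true
}

// simulate feeds in only the snapshots the engine actually observes and
// counts how many notifications would be sent.
func simulate(observed []AppState) int {
	notified := map[string]bool{}
	sent := 0
	for _, s := range observed {
		if process(s, notified) {
			sent++
		}
	}
	return sent
}

func main() {
	deployed := AppState{Phase: "Succeeded", Health: "Healthy"}
	syncing := AppState{Phase: "Running", Health: "Progressing"}

	// Engine keeps up: it observes the Running state between two deploys.
	fmt.Println(simulate([]AppState{deployed, syncing, deployed})) // 2
	// Engine lags: by the time it looks, the app is already Healthy again.
	fmt.Println(simulate([]AppState{deployed, deployed})) // 1
}
```

In the second run the key from the first deployment is never cleared, so the second deployment is treated as already notified.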

Our system
number of applications: >160
notification services:

  • slack
  • opslevel (webhook)
  • internal deployment registry (webhook)

```yaml
trigger.on-deployed: |
  - description: Application is synced and healthy.
    send:
    - app-deployed
    when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
```

Steps to reproduce

  1. Add (1) a Slack notification service and (2) a notification service that delays its response (e.g. by 30s):

```yaml
service.slack: |
  token: $slack-token
  icon: ":argo:"
  signingSecret: $slack-signing-secret
service.webhook.test: |
  url: <slow-notification-service-url>
  headers:
  - name: X-Delay-Duration
    value: 30s
```

  2. Annotate all apps to subscribe to the on-sync-running, on-sync-succeeded, and on-deployed triggers:

```shell
kubectl annotate application --all -n argocd notifications.argoproj.io/subscribe.on-sync-running.test=""
kubectl annotate application --all -n argocd notifications.argoproj.io/subscribe.on-sync-succeeded.test=""
kubectl annotate application --all -n argocd notifications.argoproj.io/subscribe.on-deployed.test=""
kubectl annotate application --all -n argocd notifications.argoproj.io/subscribe.on-deployed.slack="#my-channel"
```

Note: this keeps notifications-engine blocked, waiting for responses from the slow notification service for the on-sync-running and on-sync-succeeded notifications, while Argo CD completes the sync and rollout of the test app. By the time the engine gets to the test app, its state has already transitioned to Synced and Healthy.
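Why the intermediate state is lost can be sketched as follows, assuming the controller follows the usual client-go pattern: informer events enqueue the application's key into a deduplicating work queue, and the worker reads the latest object from the informer cache when it finally processes that key. keyQueue and observe are hypothetical and only mimic that coalescing behavior; they are not the notifications-engine or client-go API.

```go
package main

import "fmt"

// keyQueue is a hypothetical FIFO that, like a client-go work queue,
// holds each pending key at most once.
type keyQueue struct {
	order   []string
	pending map[string]bool
}

func newKeyQueue() *keyQueue { return &keyQueue{pending: map[string]bool{}} }

// Add enqueues key unless it is already waiting, in which case the new
// event is coalesced into the existing entry.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Pop removes and returns the oldest pending key.
func (q *keyQueue) Pop() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// observe replays state updates for one app that arrive while the worker is
// busy, then drains the queue and returns the states the worker evaluates.
func observe(app string, updates []string) []string {
	store := map[string]string{} // stand-in for the informer cache
	q := newKeyQueue()
	for _, st := range updates {
		store[app] = st // each update overwrites the cached object
		q.Add(app)      // and re-adds the same key
	}
	var seen []string
	for key, ok := q.Pop(); ok; key, ok = q.Pop() {
		seen = append(seen, store[key]) // only the latest state is visible
	}
	return seen
}

func main() {
	fmt.Println(observe("test-app", []string{"OutOfSync", "Syncing", "Synced+Healthy"})) // [Synced+Healthy]
}
```

Because the worker only ever evaluates Synced+Healthy, the on-deployed condition never looks false in between, so a key left over from the previous deployment is never cleared.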

P.S. This issue is easier to trigger with:

  • more apps (with frequent state changes)
  • more notification services (with slow responses)
  • a shorter sync window (from OutOfSync to Synced & Healthy)

  3. Update the target revision of a test app.

Version
notifications-engine: v0.3.1
argocd:
```json
{
  "Version": "v2.4.16+7b5899b",
  "BuildDate": "2022-11-01T21:17:46Z",
  "GitCommit": "7b5899be33d16af7c57523d85ebacaa6f345cb95",
  "GitTreeState": "clean",
  "GoVersion": "go1.18.8",
  "Compiler": "gc",
  "Platform": "linux/amd64",
  "KustomizeVersion": "v4.4.1 2021-11-11T23:36:27Z",
  "HelmVersion": "v3.8.1+g5cb9af4",
  "KubectlVersion": "v0.23.1",
  "JsonnetVersion": "v0.18.0"
}
```
