
conallob/silence-manager


Silence Manager

A Kubernetes CronJob utility written in Golang that synchronizes Prometheus Alertmanager silences with ticket tracking systems (initially Jira).

Overview

Silence Manager ensures that Alertmanager silences and tracking tickets remain synchronized by:

  1. Extending silences when the associated ticket is still open and the silence is about to expire
  2. Deleting silences when the associated ticket is marked as resolved
  3. Reopening tickets and recreating silences when a ticket is closed but the alert refires

Architecture

The application uses abstract interfaces to support multiple Alertmanager and ticket system implementations:

  • AlertManager Interface: Abstracts alertmanager operations (currently supports Prometheus Alertmanager)
  • Ticket System Interface: Abstracts ticket operations (currently supports Atlassian Jira)

This design allows for easy extension to support additional systems in the future.

Features

  • Kubernetes-native service discovery - Automatically discovers Alertmanager across all namespaces
  • Optional metrics publishing - Publish metrics to Prometheus Pushgateway or OpenTelemetry Collector (disabled by default)
  • Automatic silence extension for open tickets
  • Automatic silence deletion for resolved tickets
  • Automatic ticket reopening and silence recreation for refired alerts
  • Configurable thresholds and durations
  • Runs as a Kubernetes CronJob
  • Comprehensive logging

Project Structure

silence-manager/
├── cmd/
│   └── silence-manager/    # Main application entry point
├── pkg/
│   ├── alertmanager/        # Alertmanager interface and Prometheus implementation
│   ├── ticket/              # Ticket interface and Jira implementation
│   ├── sync/                # Synchronization logic
│   ├── metrics/             # Metrics publishing (Pushgateway, OTel)
│   ├── k8s/                 # Kubernetes service discovery
│   └── config/              # Configuration management
├── deployments/             # Kubernetes manifests
├── Dockerfile               # Container image build
└── README.md

Prerequisites

  • Go 1.21 or higher
  • Docker (for building container images)
  • Kubernetes cluster
  • Prometheus Alertmanager instance
  • Jira account with API token

Configuration

The application is configured via environment variables:

Required Configuration

| Variable | Description | Example |
|---|---|---|
| JIRA_URL | Jira instance URL | https://yourcompany.atlassian.net |
| JIRA_USERNAME | Jira username (email) | you@yourcompany.com |
| JIRA_API_TOKEN | Jira API token | your-api-token |
| JIRA_PROJECT_KEY | Jira project key | OPS |

Optional Configuration

Alertmanager Configuration

| Variable | Description | Default |
|---|---|---|
| ALERTMANAGER_URL | Alertmanager URL; if unset, auto-discovery is enabled | (empty; auto-discovery) |
| ALERTMANAGER_AUTO_DISCOVER | Enable auto-discovery (enabled automatically when the URL is empty) | true when URL is empty |
| ALERTMANAGER_DISCOVERY_SERVICE_NAME | Service name pattern to match during discovery | alertmanager |
| ALERTMANAGER_DISCOVERY_SERVICE_LABEL | Label selector for service discovery | app=alertmanager |
| ALERTMANAGER_DISCOVERY_PORT | Port to use for discovered services | 9093 |
| ALERTMANAGER_DISCOVERY_NAMESPACES | Comma-separated list of preferred namespaces to search first | monitoring,default |
| ALERTMANAGER_AUTH_TYPE | Authentication type: none, basic, or bearer | none |
| ALERTMANAGER_USERNAME | Username for basic auth | - |
| ALERTMANAGER_PASSWORD | Password for basic auth | - |
| ALERTMANAGER_BEARER_TOKEN | Bearer token for bearer auth | - |

Auto-Discovery Behavior:

  • When ALERTMANAGER_URL is not set, the application will automatically search for Alertmanager services across all namespaces
  • Discovery searches the preferred namespaces first (monitoring and default, unless overridden), then all other namespaces
  • Services are matched by label selector (app=alertmanager) or by name pattern (alertmanager)
  • The first matching service found is used
  • All discovered services are logged for visibility
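The search order above can be sketched as a pure function. This is a minimal illustration, assuming a hypothetical in-memory `Service` type in place of the Kubernetes API (the real tool would list Services through a Kubernetes client, which is why it needs the ClusterRole described under Deployment). The generated URL assumes the default `cluster.local` cluster domain.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Service is a hypothetical, minimal stand-in for a Kubernetes Service.
type Service struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// matches reports whether svc carries labelKey=labelValue or its name contains namePattern.
func matches(svc Service, labelKey, labelValue, namePattern string) bool {
	if svc.Labels[labelKey] == labelValue {
		return true
	}
	return strings.Contains(svc.Name, namePattern)
}

// discover returns the URL of the first matching service, checking preferred
// namespaces first (in the order given) and then every other namespace.
func discover(services []Service, preferred []string, labelKey, labelValue, namePattern string, port int) (string, bool) {
	rank := make(map[string]int, len(preferred))
	for i, ns := range preferred {
		rank[strings.TrimSpace(ns)] = i
	}
	ordered := append([]Service(nil), services...)
	sort.SliceStable(ordered, func(i, j int) bool {
		ri, okI := rank[ordered[i].Namespace]
		rj, okJ := rank[ordered[j].Namespace]
		switch {
		case okI && okJ:
			return ri < rj // both preferred: keep the configured priority
		case okI != okJ:
			return okI // preferred namespaces sort before the rest
		default:
			return false // keep the original order among non-preferred namespaces
		}
	})
	for _, svc := range ordered {
		if matches(svc, labelKey, labelValue, namePattern) {
			// First match wins, mirroring "the first matching service found is used".
			return fmt.Sprintf("http://%s.%s.svc.cluster.local:%d", svc.Name, svc.Namespace, port), true
		}
	}
	return "", false
}

func main() {
	services := []Service{
		{Namespace: "kube-system", Name: "alertmanager-main"},
		{Namespace: "monitoring", Name: "prom-am", Labels: map[string]string{"app": "alertmanager"}},
	}
	url, ok := discover(services, strings.Split("monitoring,default", ","), "app", "alertmanager", "alertmanager", 9093)
	fmt.Println(url, ok)
}
```

In this example the `monitoring` service wins even though it is listed second, because preferred namespaces are searched before `kube-system`.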

Sync Configuration

| Variable | Description | Default |
|---|---|---|
| SYNC_ANNOTATION_PREFIX | Prefix for annotations linking silences and tickets | silence-manager |
| SYNC_EXPIRY_THRESHOLD_HOURS | Hours before expiry at which a silence is extended | 24 |
| SYNC_EXTENSION_DURATION_HOURS | Hours to extend a silence by | 168 (7 days) |
| SYNC_DEFAULT_SILENCE_DURATION_HOURS | Default duration for new silences, in hours | 168 (7 days) |
| SYNC_CHECK_ALERTS | Check for refired alerts | true |

Metrics Configuration (Optional)

Silence Manager can optionally publish metrics to either a Prometheus Pushgateway or an OpenTelemetry Collector. Metrics publishing is disabled by default.

| Variable | Description | Default |
|---|---|---|
| METRICS_ENABLED | Enable metrics publishing | false |
| METRICS_BACKEND | Metrics backend: pushgateway or otel | (required if enabled) |
| METRICS_URL | Metrics backend URL; if unset, auto-discovery is used | (empty; auto-discovery) |
| METRICS_PUSHGATEWAY_JOB_NAME | Job name for Pushgateway | silence_manager |
| METRICS_OTEL_INSECURE | Use an insecure connection for OTel | true |
| METRICS_DISCOVERY_SERVICE_NAME | Service name pattern for discovery | (backend-specific) |
| METRICS_DISCOVERY_SERVICE_LABEL | Label selector for discovery | (backend-specific) |
| METRICS_DISCOVERY_PORT | Port for discovered services | (backend-specific) |
| METRICS_DISCOVERY_NAMESPACES | Comma-separated list of preferred namespaces | monitoring,default |

Published Metrics:

| Metric | Type | Labels | Description |
|---|---|---|---|
| silence_manager_build_info | Gauge | version, commit, build_date | Build information for silence-manager |
| silence_manager_silence_last_checked | Gauge | silence_id, ticket | Unix timestamp of when a silence was last checked |
| silence_manager_silence_expiring_in | Gauge | silence_id, ticket | Seconds until a silence expires |

Auto-Discovery for Metrics Backends:

When METRICS_URL is not set and metrics are enabled, the application will automatically search for metrics backend services:

  • Pushgateway: Searches for services with label app=pushgateway or name containing pushgateway on port 9091
  • OTel Collector: Searches for services with label app=opentelemetry-collector or name containing otel-collector on port 4318 (OTLP HTTP)

Example Configuration:

# Enable metrics with Pushgateway auto-discovery
metrics-enabled: "true"
metrics-backend: "pushgateway"
# metrics-url not set - auto-discovery will be used

# Or with explicit URL
metrics-enabled: "true"
metrics-backend: "pushgateway"
metrics-url: "http://pushgateway.monitoring.svc.cluster.local:9091"

# Or with OpenTelemetry Collector
metrics-enabled: "true"
metrics-backend: "otel"
metrics-url: "otel-collector.monitoring.svc.cluster.local:4318"
metrics-otel-insecure: "true"

Building

Local Build

go build -o silence-manager ./cmd/silence-manager

Docker Build

docker build -t silence-manager:latest .

For Kubernetes

# Build and tag for your registry
docker build -t your-registry/silence-manager:latest .
docker push your-registry/silence-manager:latest

Installation

Using Pre-built Binaries

Download the latest release for your platform from the Releases page:

# Example for Linux amd64
wget https://github.com/conallob/silence-manager/releases/download/v0.1.0/silence-manager_Linux_x86_64.tar.gz
tar -xzf silence-manager_Linux_x86_64.tar.gz
./silence-manager

Using Container Images

Pre-built multi-arch container images are available from GitHub Container Registry:

# Pull the latest version
docker pull ghcr.io/conallob/silence-manager:latest

# Or a specific version
docker pull ghcr.io/conallob/silence-manager:v0.1.0

Supported architectures: amd64, arm64

Using Go Install

go install github.com/conallob/silence-manager/cmd/silence-manager@latest

Releasing

This project uses GoReleaser for automated releases. To create a new release:

  1. Create and push a new tag:

    git tag -a v0.1.0 -m "Release v0.1.0"
    git push origin v0.1.0
  2. GitHub Actions will automatically:

    • Build binaries for multiple platforms (Linux, macOS, Windows)
    • Build multi-arch container images (amd64, arm64)
    • Push container images to GitHub Container Registry
    • Create a GitHub release with artifacts and release notes
  3. The release will be available at:

    • GitHub Releases: https://github.com/conallob/silence-manager/releases
    • Container Registry: ghcr.io/conallob/silence-manager:VERSION

Deployment

1. Create Namespace

kubectl create namespace monitoring

2. Create Secret

There are two options for managing secrets:

Option A: kubectl (Simple)

Create a secret with your credentials:

kubectl create secret generic silence-manager-secrets \
  --from-literal=jira-url=https://yourcompany.atlassian.net \
  --from-literal=jira-username=you@yourcompany.com \
  --from-literal=jira-api-token=your-api-token \
  -n monitoring

If your Alertmanager requires authentication:

# For basic auth
kubectl create secret generic silence-manager-secrets \
  --from-literal=jira-url=https://yourcompany.atlassian.net \
  --from-literal=jira-username=you@yourcompany.com \
  --from-literal=jira-api-token=your-api-token \
  --from-literal=alertmanager-username=admin \
  --from-literal=alertmanager-password=your-password \
  -n monitoring

# For bearer token auth
kubectl create secret generic silence-manager-secrets \
  --from-literal=jira-url=https://yourcompany.atlassian.net \
  --from-literal=jira-username=you@yourcompany.com \
  --from-literal=jira-api-token=your-api-token \
  --from-literal=alertmanager-bearer-token=your-bearer-token \
  -n monitoring

Option B: External Secrets Operator (Recommended for Production)

For production environments, use External Secrets Operator to sync secrets from your secret management system (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, GCP Secret Manager, etc.):

  1. Install External Secrets Operator:

    helm repo add external-secrets https://charts.external-secrets.io
    helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
  2. Create a SecretStore pointing to your secrets backend (example for AWS Secrets Manager):

    apiVersion: external-secrets.io/v1beta1
    kind: SecretStore
    metadata:
      name: aws-secretsmanager
      namespace: monitoring
    spec:
      provider:
        aws:
          service: SecretsManager
          region: us-east-1
          auth:
            jwt:
              serviceAccountRef:
                name: external-secrets-sa
  3. Create an ExternalSecret resource (see deployments/externalsecret.yaml.example)

  4. The operator will automatically create and sync the silence-manager-secrets Kubernetes secret

3. Configure Settings

Edit deployments/configmap.yaml to set your desired configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: silence-manager-config
  namespace: monitoring
data:
  # Alertmanager Configuration
  alertmanager-auth-type: "none"  # Options: "none", "basic", "bearer"

  # Jira Configuration
  jira-project-key: "YOUR-PROJECT-KEY"

  # Sync Configuration
  sync-annotation-prefix: "silence-manager"
  sync-expiry-threshold-hours: "24"
  sync-extension-duration-hours: "168"
  sync-default-silence-duration-hours: "168"
  sync-check-alerts: "true"

  # Metrics Configuration (Optional - commented out by default)
  # metrics-enabled: "true"
  # metrics-backend: "pushgateway"
  # metrics-url: "http://pushgateway.monitoring.svc.cluster.local:9091"

Set alertmanager-auth-type to:

  • "none" - No authentication (default)
  • "basic" - Basic authentication (requires alertmanager-username and alertmanager-password in secret)
  • "bearer" - Bearer token authentication (requires alertmanager-bearer-token in secret)

4. Deploy with Kustomize

kubectl apply -k deployments/

This will deploy:

  • ServiceAccount for the CronJob
  • ClusterRole and ClusterRoleBinding for Kubernetes service discovery
  • ConfigMap with configuration settings
  • CronJob that runs every 15 minutes

Or apply manifests individually:

kubectl apply -f deployments/serviceaccount.yaml
kubectl apply -f deployments/clusterrole.yaml
kubectl apply -f deployments/clusterrolebinding.yaml
kubectl apply -f deployments/configmap.yaml
kubectl apply -f deployments/cronjob.yaml

Note: The default configuration uses auto-discovery to find Alertmanager services across all namespaces. If you prefer to specify an explicit URL, uncomment and set the ALERTMANAGER_URL environment variable in deployments/cronjob.yaml.

5. Update CronJob Image

Update the image in deployments/cronjob.yaml to point to your container registry:

containers:
- name: silence-manager
  image: your-registry/silence-manager:latest

Usage

Creating Linked Silences and Tickets

To link a silence with a ticket, include the ticket reference in the silence comment using the format:

# silence-manager: PROJECT-123
<additional comment>

The prefix (silence-manager by default) can be customized using the SYNC_ANNOTATION_PREFIX environment variable. The synchronizer will automatically extract the ticket reference and manage the silence accordingly.

Manual Trigger

To manually trigger a sync run for testing:

kubectl create job --from=cronjob/silence-manager manual-sync-1 -n monitoring

View Logs

# View CronJob logs
kubectl logs -l app=silence-manager -n monitoring --tail=100

# View specific job logs
kubectl logs job/silence-manager-<timestamp> -n monitoring

How It Works

Synchronization Logic

The synchronizer runs on a schedule (default: every 15 minutes) and performs the following:

  1. Retrieve all active silences from Alertmanager
  2. For each silence with a ticket reference:
    • Fetch the associated ticket from Jira
    • If ticket is resolved: Delete the silence
    • If ticket is open and silence expires soon: Extend the silence
    • If ticket is open and silence has expired: Extend the silence
  3. Check for refired alerts (if enabled):
    • Retrieve all active alerts from Alertmanager
    • If an alert has a ticket reference and the ticket is closed: Reopen the ticket and create a new silence

Ticket-Silence Coupling

The coupling between silences and tickets is maintained through annotations with a configurable prefix (default: silence-manager):

  • Silence comments contain ticket references: # silence-manager: PROJECT-123
  • Ticket descriptions contain silence references: silence-manager: <silence-id>

The prefix can be customized using the SYNC_ANNOTATION_PREFIX environment variable.
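Both annotation formats can be extracted with prefix-aware regular expressions. A minimal sketch, assuming Jira-style ticket keys (uppercase project key, hyphen, number) and that each reference sits at the start of its own line; the helper names are hypothetical and the real parser may be more or less lenient.

```go
package main

import (
	"fmt"
	"regexp"
)

// ticketRefPattern matches "# <prefix>: PROJECT-123" at the start of a line in a silence comment.
func ticketRefPattern(prefix string) *regexp.Regexp {
	return regexp.MustCompile(`(?m)^#[ \t]*` + regexp.QuoteMeta(prefix) + `:[ \t]*([A-Z][A-Z0-9_]*-\d+)\b`)
}

// silenceRefPattern matches "<prefix>: <silence-id>" at the start of a line in a ticket description.
func silenceRefPattern(prefix string) *regexp.Regexp {
	return regexp.MustCompile(`(?m)^[ \t]*` + regexp.QuoteMeta(prefix) + `:[ \t]*(\S+)`)
}

// extract returns the first captured reference, if any.
func extract(re *regexp.Regexp, text string) (string, bool) {
	m := re.FindStringSubmatch(text)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	comment := "# silence-manager: OPS-123\nDisk expansion in progress"
	if key, ok := extract(ticketRefPattern("silence-manager"), comment); ok {
		fmt.Println("ticket:", key)
	}
	description := "Alert DiskFilling fired.\nsilence-manager: 1b9c3c5e-8f2a-4d5e-9a0b-2c3d4e5f6a7b"
	if id, ok := extract(silenceRefPattern("silence-manager"), description); ok {
		fmt.Println("silence:", id)
	}
}
```

`regexp.QuoteMeta` keeps a custom SYNC_ANNOTATION_PREFIX containing characters such as `.` from being interpreted as regex syntax.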

Extending the Application

Adding a New Ticket System

  1. Implement the ticket.TicketSystem interface in pkg/ticket/
  2. Add configuration for the new system in pkg/config/
  3. Update cmd/silence-manager/main.go to instantiate the new implementation

Adding a New Alertmanager Implementation

  1. Implement the alertmanager.AlertManager interface in pkg/alertmanager/
  2. Add configuration for the new system in pkg/config/
  3. Update cmd/silence-manager/main.go to instantiate the new implementation

Troubleshooting

Silence Not Being Extended

  • Check that the ticket reference is in the correct format in the silence comment
  • Verify the ticket exists and is accessible with the provided credentials
  • Check the logs for any errors

Ticket Not Being Reopened

  • Ensure SYNC_CHECK_ALERTS is set to true
  • Verify alerts have the ticket label set
  • Check that the Jira workflow allows transitions from the ticket's current state
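To see which transitions Jira actually offers for a ticket, you can query the Jira REST API endpoint `GET /rest/api/2/issue/{issueKey}/transitions`, authenticating with your email and API token. The sketch below builds that request and picks a transition by name. The candidate names ("Reopen", "Reopen Issue") are hypothetical, because transition names depend on your project's workflow.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// transitionsResponse models the relevant part of the transitions endpoint's JSON.
type transitionsResponse struct {
	Transitions []struct {
		ID   string `json:"id"`
		Name string `json:"name"`
	} `json:"transitions"`
}

// findTransition returns the ID of the first available transition whose name
// matches one of the wanted names (case-insensitive), or lists what is available.
func findTransition(body []byte, wanted ...string) (string, error) {
	var tr transitionsResponse
	if err := json.Unmarshal(body, &tr); err != nil {
		return "", err
	}
	names := make([]string, 0, len(tr.Transitions))
	for _, t := range tr.Transitions {
		for _, w := range wanted {
			if strings.EqualFold(t.Name, w) {
				return t.ID, nil
			}
		}
		names = append(names, t.Name)
	}
	return "", fmt.Errorf("no transition named %v; available: %v", wanted, names)
}

// newTransitionsRequest builds the GET request using basic auth with email + API token.
func newTransitionsRequest(jiraURL, issueKey, user, token string) (*http.Request, error) {
	url := strings.TrimRight(jiraURL, "/") + "/rest/api/2/issue/" + issueKey + "/transitions"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, token)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	// Sample response body; a real one comes from executing newTransitionsRequest.
	sample := []byte(`{"transitions":[{"id":"21","name":"In Progress"},{"id":"31","name":"Reopen"}]}`)
	id, err := findTransition(sample, "Reopen", "Reopen Issue")
	fmt.Println(id, err)
}
```

If no reopen-style transition is listed for a closed ticket, the workflow does not allow that move for the API user, which matches the symptom described above.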

Authentication Errors

  • Verify Jira API token is valid
  • Ensure the username matches the API token owner
  • Check that the user has appropriate permissions in the Jira project

Development

Running Tests

go test ./...

Running Locally

# Set environment variables
export JIRA_URL="https://yourcompany.atlassian.net"
export JIRA_USERNAME="you@yourcompany.com"
export JIRA_API_TOKEN="your-api-token"
export JIRA_PROJECT_KEY="OPS"
export ALERTMANAGER_URL="http://localhost:9093"

# Run the application
go run ./cmd/silence-manager

License

[Add your license here]

Contributing

[Add contribution guidelines here]

About

A small utility to keep an Alertmanager silence and a tracking ticket in sync
