This stack includes:
- Loki
- Promtail
- Grafana
- Victoria Metrics Stack
- Alertmanager
- Kube-Bench Exporter
- Falco Exporter
- Trivy-Operator
Alerts from Alertmanager
- Loki alerts on errors in logs
- Default Kubernetes alerts:
- Kubernetes Node Not Ready
- Kubernetes Memory Pressure
- Kubernetes Disk Pressure
- Kubernetes Network Unavailable
- Kubernetes Out Of Capacity
- Kubernetes Container Oom Killer
- Kubernetes Job Failed
- Kubernetes Cronjob Suspended
- Kubernetes Persistentvolumeclaim Pending
- Kubernetes Volume Out Of Disk Space
- Kubernetes Volume Full In Four Days
- Kubernetes Persistentvolume Error
- Kubernetes Statefulset Down
- Kubernetes Hpa Scaling Ability
- Kubernetes Hpa Metric Availability
- Kubernetes Hpa Scale Capability
- Kubernetes Hpa Underutilized
- Kubernetes Pod Not Healthy
- Kubernetes Pod CrashLooping
- Kubernetes ReplicaSet Mismatch
- Kubernetes Deployment Replicas Mismatch
- Kubernetes Statefulset Replicas Mismatch
- Kubernetes Deployment Generation Mismatch
- Kubernetes Statefulset Generation Mismatch
- Kubernetes Statefulset Update Not RolledOut
- Kubernetes Daemonset Rollout Stuck
- Kubernetes Daemonset Misscheduled
- Kubernetes Cronjob Too Long
- Kubernetes Job Slow Completion
- Kubernetes Api Server Errors
- Kubernetes Api Client Errors
- Kubernetes Client Certificate Expires Next Week
- Kubernetes Client Certificate Expires Soon
- Kubernetes Api Server Latency
- Loki 5xx errors
- Severity level: error
- Ledger errors
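As a rough illustration of how the default Kubernetes alerts above are usually expressed, here is a hedged sketch of a "Kubernetes Node Not Ready" rule in VMRule/PrometheusRule style; the exact expressions, thresholds, and labels shipped with the stack may differ:

```yaml
# Illustrative only: match against the rules actually bundled in the chart.
groups:
  - name: kubernetes-nodes
    rules:
      - alert: KubernetesNodeNotReady
        # kube-state-metrics exposes node readiness via kube_node_status_condition
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: Node {{ $labels.node }} has been NotReady for 10 minutes
```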
Step-by-step installation
Prerequisites:
- Node resources: 2 CPU, 4 GB RAM
- Kubernetes version: v1.28.2
git clone https://github.com/chabanyknikita/security-monitoring-template.git
cd security-monitoring-template
helm repo add jetstack https://charts.jetstack.io
helm repo add stable https://charts.helm.sh/stable
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm repo update
grafana:
  enabled: false
alertmanager:
  enabled: false
vmalert:
  enabled: false
helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.8.0 \
--set installCRDs=true
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace
Go to charts/monitoring/values.yaml and, in the victoria-metrics-k8s-stack.alertmanager.config section, replace these values with your own:
chat_id: <Chat Id>     # chat_id must be an integer
bot_token: <Bot Token> # bot_token must be a string
- You can get these values from:
| KEYS | VALUES |
| --- | --- |
| TELEGRAM_ADMIN | Your chat ID; get it from @userinfobot |
| TELEGRAM_TOKEN | Your Telegram bot token; get it from @botfather |
Only one user (the configured chat ID) receives bot alerts.
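For orientation, the chat_id and bot_token typically end up in an Alertmanager telegram_configs receiver roughly like the sketch below; the receiver name and token value here are illustrative, so follow the layout already present in charts/monitoring/values.yaml:

```yaml
# Illustrative Alertmanager fragment; compare with the existing
# victoria-metrics-k8s-stack.alertmanager.config section.
receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "123456:ABC-your-bot-token"  # from @botfather (string)
        chat_id: 123456789                      # from @userinfobot (integer)
        api_url: https://api.telegram.org
        parse_mode: HTML
route:
  receiver: telegram
```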
If you need access to Grafana via a domain:
Go to charts/monitoring/values.yaml, section victoria-metrics-k8s-stack.grafana.ingress, and:
- Set grafana.ingress.enabled to true
- Set grafana.ingress.hosts to your domain
- Set grafana.ingress.tls.hosts to your domain
Example:
ingress:
  enabled: true
  annotations:
    certmanager.k8s.io/cluster-issuer: letsencrypt
    cert-manager.io/cluster-issuer: letsencrypt
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
  pathType: ImplementationSpecific
  hosts:
    - grafana.example.com
  tls:
    - secretName: grafana-ingress-tls
      hosts:
        - grafana.example.com
- Go to charts/monitoring/values.yaml, section loki-distributed.ruler.directories, and change the namespace in all rules to the namespace(s) you want to monitor
Example:
- alert: Error 5**
  expr: rate({namespace="stage", container!="horizon"} |~ "status=5.." | logfmt | label_format duration=duration,time=time,filename=filename,pid=pid,stream=stream,node_name=node_name,app=app,instance=instance[1m])>0
  for: 0m
  labels:
    severity: error
  annotations:
    summary: Error {{ $labels.status }} in {{ $labels.container }}
# Or you can follow more than one namespace:
- alert: Error 5**
  expr: rate({namespace=~"monitoring|stage|prod"} |~ "status=5.." | logfmt | label_format duration=duration,time=time,filename=filename,pid=pid,stream=stream,node_name=node_name,app=app,instance=instance[1m])>0
  for: 0m
  labels:
    severity: error
  annotations:
    summary: Error {{ $labels.status }} in {{ $labels.container }}
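For context, in the loki-distributed chart these ruler rules live under ruler.directories, keyed by tenant ID and file name. A hedged sketch of that layout (the file name is illustrative; "fake" is Loki's default tenant ID when multi-tenancy is disabled):

```yaml
loki-distributed:
  ruler:
    directories:
      fake:              # default tenant ID with auth_enabled: false
        rules.txt: |
          groups:
            - name: log-errors
              rules:
                - alert: Error 5**
                  # simplified expression; use the full one from this guide
                  expr: rate({namespace="stage"} |~ "status=5.." [1m]) > 0
                  for: 0m
                  labels:
                    severity: error
```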
helm upgrade ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx \
--set controller.metrics.enabled=true \
--set-string controller.podAnnotations."prometheus\.io/scrape"="true" \
--set-string controller.podAnnotations."prometheus\.io/port"="10254"
helm upgrade -i nfs-server stable/nfs-server-provisioner --set persistence.enabled=true,persistence.size=20Gi -n monitoring --create-namespace
kubectl apply -f charts/monitoring/charts/crd/templates/crd.yaml
helm upgrade -i trivy-operator aqua/trivy-operator --namespace trivy-system --create-namespace --version 0.20.6 --values charts/trivy-operator/trivy-values.yaml
helm upgrade -i falco --set falco.grpc.enabled=true --set falco.grpc_output.enabled=true --set driver.kind=ebpf falcosecurity/falco
helm upgrade -i falco-exporter falcosecurity/falco-exporter
helm install event-generator falcosecurity/event-generator --namespace event-generator --create-namespace --set config.loop=false --set config.actions=""
helm upgrade -i monitoring charts/monitoring --values charts/monitoring/values.yaml -n monitoring
kubectl get secret --namespace monitoring stack-grafana \
-ojsonpath="{.data.admin-password}" | base64 --decode ; echo
Credentials:
- login: admin
- password: from the previous step
If you enabled ingress, go to your domain and enter the credentials.
If you did not enable ingress, port-forward the Grafana service and open http://localhost:3000:
kubectl port-forward service/stack-grafana -n monitoring 3000:80