Fbnoroot2 (#704)
* [SECURITY] Eliminate need for root user for log collecting pods
* Enable multiline parsing by default
* Improve handling of messages from CrunchyData exporter containers
* FB as K8s Event collector - security tweaks
* OpenSearch pods: readOnlyRootFilesystem set to 'true'
* Tighten container security: FB - Event collection
* OpenSearch pods: set 'securityContext.privileged' to 'false'
* Tighten container security: ES Exporter
gsmith-sas authored Jan 8, 2025
1 parent 7fc71a5 commit a195233
Showing 27 changed files with 382 additions and 41 deletions.
Original file line number Diff line number Diff line change
@@ -16,9 +16,9 @@ registry/repository/image_name:version

| Subsystem| Component | Fully Qualified Container-Image Name (registry/repository/image_name:version)|
|----|----|----|
| Logging | BusyBox (OpenSearch) | __OS_SYSCTL_FULL_IMAGE__ |
| Logging | Fluent Bit | __FB_FULL_IMAGE__ |
| Logging | Elasticsearch Exporter | __ES_EXPORTER_FULL_IMAGE__ |
| Logging | initContainer (Fluent Bit, OpenSearch) | __OS_SYSCTL_FULL_IMAGE__ |
| Logging | OpenSearch | __OS_FULL_IMAGE__ |
| Logging | OpenSearch Dashboards| __OSD_FULL_IMAGE__ |
| Metrics | Alertmanager | __ALERTMANAGER_FULL_IMAGE__ |
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,22 @@
# SAS Viya Monitoring for Kubernetes

## Unreleased
* **Logging**
* [SECURITY] Fluent Bit log collecting pods no longer run as the `root` user. In addition, the database used to
maintain state information for the log collector has moved to a hostPath volume and been renamed. A new initContainer
has been added to migrate any existing state information and adjust file ownership/permissions.
NOTE: This initContainer runs as the `root` user but only runs briefly during the initial deployment process.
* [SECURITY] OpenSearch pods have been reconfigured to allow `readOnlyRootFilesystem` to be set to 'true'. A
new initContainer has been added to facilitate this.
* [SECURITY] Runtime security controls for the log monitoring stack (i.e., Fluent Bit, OpenSearch, OpenSearch
Dashboards and Elasticsearch Exporter) pods have been tightened. Changes include adding `seccompProfile`,
disallowing privileged containers and privilege escalation, and dropping all Linux capabilities. As noted
above, some initContainers require less restrictive security, but these only run briefly during the initial
deployment process.
* [SECURITY] On OpenShift, all Fluent Bit pods now use custom SCC objects to support the changes described above.
* [CHANGE] Improved handling of long log messages and those from some Crunchy Data pods


## Version 1.2.32 (09DEC2024)
* **Overall**
* [CHANGE] Comments added to user.env files within samples/generic-base to clarify security best-practices; other
@@ -8,6 +25,7 @@ cleanup.
* [SECURITY] Set `seccompProfile` to `RuntimeDefault` for OpenSearch, OpenSearch Dashboards and Fluent Bit pods in
non-OpenShift environments.


## Version 1.2.31 (15NOV2024)
* **Logging**
* [UPGRADE] OpenSearch and OpenSearch Dashboards upgraded from 2.15.0 to 2.17.1
@@ -16,6 +34,7 @@ required a new serviceMonitor (elasticsearch-v2) be deployed.
* [UPGRADE] Fluent Bit upgraded from 3.1.3 to 3.1.9
* [UPGRADE] OpenSearch Data Source Plugin to Grafana upgraded from 2.18.0 to 2.21.1


## Version 1.2.30 (11OCT2024)
* **Logging**
* [SECURITY] OpenSearch Dashboards pod `securityContext` updated to set allowPrivilegeEscalation to 'false'
9 changes: 7 additions & 2 deletions bin/common.sh
@@ -289,7 +289,11 @@ function generateImageKeysFile {

#arg1 Full container image
#arg2 name of template file
#arg3 prefix to insert in placeholders (optional)
#arg3 prefix to insert in placeholders (optional; defaults to "")
#arg4 flag to override omit_image_key logic (optional; defaults to "false")

#NOTE: arg4 is required to handle 2 initContainers (for OpenSearch and Fluent Bit)
# for which the template file contains settings other than image specs

local pullsecret_text

@@ -299,6 +303,7 @@ function generateImageKeysFile {
fi

prefix=${3:-""}
ignoreOmitImageKeys=${4:-"false"}

imageKeysFile="$TMP_DIR/imageKeysFile.yaml"
template_file=$2
@@ -310,7 +315,7 @@ function generateImageKeysFile {
log_debug "Modifying an existing imageKeysFile"
fi

if [ "$V4M_OMIT_IMAGE_KEYS" == "true" ]; then
if [ "$V4M_OMIT_IMAGE_KEYS" == "true" ] && [ "$ignoreOmitImageKeys" != "true" ]; then
cp $TMP_DIR/empty.yaml $imageKeysFile
return 0
fi
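
The new fourth argument changes when an empty image-keys file is produced. The decision can be sketched in isolation as below. `should_omit_image_keys` is a hypothetical helper written for illustration and is not part of `bin/common.sh`; only the variable `V4M_OMIT_IMAGE_KEYS` and the "true"/"false" flag values come from the diff.

```shell
# Hypothetical helper (not in the repository): mirrors the condition that
# guards the early return in generateImageKeysFile.
# arg1: override flag, the equivalent of arg4 above (defaults to "false")
should_omit_image_keys() {
  local ignoreOmitImageKeys="${1:-false}"

  # An empty file is written only when omission is requested globally
  # AND the caller has not asked to override that request.
  if [ "${V4M_OMIT_IMAGE_KEYS:-false}" = "true" ] && [ "$ignoreOmitImageKeys" != "true" ]; then
    echo "omit"
  else
    echo "generate"
  fi
}
```

Under this sketch, the initContainer templates (called with "true" as the fourth argument) would still be rendered when `V4M_OMIT_IMAGE_KEYS` is "true", which matches the NOTE in the function header about templates that contain settings other than image specs.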
1 change: 1 addition & 0 deletions component_versions.env
@@ -19,6 +19,7 @@ FLUENTBIT_HELM_CHART_REPO=fluent
FLUENTBIT_HELM_CHART_NAME=fluent-bit
FLUENTBIT_HELM_CHART_VERSION=0.47.10
FB_FULL_IMAGE="cr.fluentbit.io/fluent/fluent-bit:3.1.9"
FB_INITCONTAINER_FULL_IMAGE="docker.io/library/busybox:latest"

#OpenSearch
OPENSEARCH_HELM_CHART_REPO=opensearch
40 changes: 28 additions & 12 deletions logging/bin/deploy_fluentbit_azmonitor.sh
@@ -39,7 +39,8 @@ fi
log_info "Deploying Fluent Bit (Azure Monitor)"

#Generate yaml file with all container-related keys
generateImageKeysFile "$FB_FULL_IMAGE" "logging/fb/fb_container_image.template"
generateImageKeysFile "$FB_FULL_IMAGE" "logging/fb/fb_container_image.template"
generateImageKeysFile "$FB_INITCONTAINER_FULL_IMAGE" "logging/fb/fb_initcontainer_image.template" "" "true"

# Fluent Bit user customizations
FB_AZMONITOR_USER_YAML="${FB_AZMONITOR_USER_YAML:-$USER_DIR/logging/user-values-fluent-bit-azmonitor.yaml}"
@@ -66,22 +67,22 @@ if [ "$(kubectl -n $LOG_NS get secret connection-info-azmonitor -o name 2>/dev/n

if [ "$AZMONITOR_CUSTOMER_ID" != "NotProvided" ] && [ "$AZMONITOR_SHARED_KEY" != "NotProvided" ]; then
log_info "Creating secret [connection-info-azmonitor] in [$LOG_NS] namespace to hold Azure connection information."
kubectl -n $LOG_NS create secret generic connection-info-azmonitor --from-literal=customer_id=$AZMONITOR_CUSTOMER_ID --from-literal=shared_key=$AZMONITOR_SHARED_KEY
kubectl -n "$LOG_NS" create secret generic connection-info-azmonitor --from-literal=customer_id="$AZMONITOR_CUSTOMER_ID" --from-literal=shared_key="$AZMONITOR_SHARED_KEY"
else
log_error "Unable to create secret [$LOG_NS/connection-info-azmonitor] because missing required information: [AZMONITOR_CUSTOMER_ID: $AZMONITOR_CUSTOMER_ID ; AZMONITOR_SHARED_KEY: $AZMONITOR_SHARED_KEY]."
log_error "You must provide this information via environment variables or create the secret [connection-info-azmonitor] before running this script."
exit 1
fi
else
log_info "Obtaining connection information from existing secret [$LOG_NS/connection-info-azmonitor]"
export AZMONITOR_CUSTOMER_ID=$(kubectl -n $LOG_NS get secret connection-info-azmonitor -o=jsonpath="{.data.customer_id}" |base64 --decode)
export AZMONITOR_SHARED_KEY=$(kubectl -n $LOG_NS get secret connection-info-azmonitor -o=jsonpath="{.data.shared_key}" |base64 --decode)
export AZMONITOR_CUSTOMER_ID=$(kubectl -n "$LOG_NS" get secret connection-info-azmonitor -o=jsonpath="{.data.customer_id}" |base64 --decode)
export AZMONITOR_SHARED_KEY=$(kubectl -n "$LOG_NS" get secret connection-info-azmonitor -o=jsonpath="{.data.shared_key}" |base64 --decode)
fi

# Check for an existing Helm release of stable/fluent-bit
if helm3ReleaseExists fbaz $LOG_NS; then
log_info "Removing an existing release of deprecated stable/fluent-bit Helm chart from the [$LOG_NS] namespace [$(date)]"
helm $helmDebug delete -n $LOG_NS fbaz
helm $helmDebug delete -n "$LOG_NS" fbaz

if [ $(kubectl get servicemonitors -A |grep fluent-bit-v2 -c) -ge 1 ]; then
log_debug "Updated serviceMonitor [fluent-bit-v2] appears to be deployed."
@@ -94,19 +95,19 @@ else
fi

# Multiline parser setup
LOG_MULTILINE_ENABLED="${LOG_MULTILINE_ENABLED}"
LOG_MULTILINE_ENABLED="${LOG_MULTILINE_ENABLED:-true}"
if [ "$LOG_MULTILINE_ENABLED" == "true" ]; then
LOG_MULTILINE_PARSER="docker, cri"
else
LOG_MULTILINE_PARSER=""
fi
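
The effect of the new `:-true` default can be shown on its own. In the sketch below, `resolve_multiline_parser` is a hypothetical function name used only for illustration; the variable name and the parser string "docker, cri" are taken from the script.

```shell
# Hypothetical wrapper (not in the repository) around the multiline setup.
# ${VAR:-true} expands to "true" when VAR is unset OR empty, so multiline
# parsing is on unless the user explicitly sets another value.
resolve_multiline_parser() {
  local enabled="${LOG_MULTILINE_ENABLED:-true}"

  if [ "$enabled" = "true" ]; then
    echo "docker, cri"
  else
    echo ""
  fi
}
```

With the previous form, `${LOG_MULTILINE_ENABLED}`, an unset variable expanded to an empty string, failed the comparison with "true", and left the parser list empty. Setting `LOG_MULTILINE_ENABLED=false` still disables the feature.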

# Create ConfigMap containing Fluent Bit configuration
kubectl -n $LOG_NS apply -f $FB_CONFIGMAP
kubectl -n "$LOG_NS" apply -f $FB_CONFIGMAP

# Create ConfigMap containing Viya-customized parsers (delete it first)
kubectl -n $LOG_NS delete configmap fbaz-viya-parsers --ignore-not-found
kubectl -n $LOG_NS create configmap fbaz-viya-parsers --from-file=logging/fb/viya-parsers.conf
kubectl -n "$LOG_NS" delete configmap fbaz-viya-parsers --ignore-not-found
kubectl -n "$LOG_NS" create configmap fbaz-viya-parsers --from-file=logging/fb/viya-parsers.conf

TRACING_ENABLE="${TRACING_ENABLE:-false}"
if [ "$TRACING_ENABLE" == "true" ]; then
@@ -146,13 +147,25 @@ fi
MON_NS="${MON_NS:-monitoring}"

# Create ConfigMap containing Kubernetes container runtime log format
kubectl -n $LOG_NS delete configmap fbaz-env-vars --ignore-not-found
kubectl -n $LOG_NS create configmap fbaz-env-vars \
kubectl -n "$LOG_NS" delete configmap fbaz-env-vars --ignore-not-found
kubectl -n "$LOG_NS" create configmap fbaz-env-vars \
--from-literal=KUBERNETES_RUNTIME_LOGFMT=$KUBERNETES_RUNTIME_LOGFMT \
--from-literal=LOG_MULTILINE_PARSER="${LOG_MULTILINE_PARSER}" \
--from-literal=MON_NS="${MON_NS}"

kubectl -n $LOG_NS label configmap fbaz-env-vars managed-by=v4m-es-script
kubectl -n "$LOG_NS" label configmap fbaz-env-vars managed-by=v4m-es-script

# Check to see if we are upgrading from earlier version requiring root access
if [ "$(kubectl -n "$LOG_NS" get configmap fbaz-dbmigrate-script -o name --ignore-not-found)" != "configmap/fbaz-dbmigrate-script" ]; then
log_debug "Removing FB pods (if they exist) to allow migration."
kubectl -n "$LOG_NS" delete daemonset v4m-fbaz --ignore-not-found
fi

# Create ConfigMap containing Fluent Bit database migration script
kubectl -n "$LOG_NS" delete configmap fbaz-dbmigrate-script --ignore-not-found
kubectl -n "$LOG_NS" create configmap fbaz-dbmigrate-script --from-file logging/fb/migrate_fbstate_db.sh
kubectl -n "$LOG_NS" label configmap fbaz-dbmigrate-script managed-by=v4m-es-script


## Get Helm Chart Name
log_debug "Fluent Bit Helm Chart: repo [$FLUENTBIT_HELM_CHART_REPO] name [$FLUENTBIT_HELM_CHART_NAME] version [$FLUENTBIT_HELM_CHART_VERSION]"
@@ -170,9 +183,12 @@ helm $helmDebug upgrade --install v4m-fbaz --namespace $LOG_NS \
--set fullnameOverride=v4m-fbaz \
$chart2install

#pause to allow migration script to complete (if necessary)
sleep 20

#Container Security: Disable Token Automounting at ServiceAccount; enable for Pod
disable_sa_token_automount $LOG_NS v4m-fbaz
# FB pods will restart after following call if automount is not already enabled
enable_pod_token_automount $LOG_NS daemonset v4m-fbaz

# Force restart of daemonset to ensure we pick up latest config changes
35 changes: 26 additions & 9 deletions logging/bin/deploy_fluentbit_opensearch.sh
@@ -55,7 +55,7 @@ helm2ReleaseCheck fb-$LOG_NS
# Check for an existing Helm release of stable/fluent-bit
if helm3ReleaseExists fb $LOG_NS; then
log_verbose "Removing an existing release of deprecated stable/fluent-bit Helm chart from the [$LOG_NS] namespace [$(date)]"
helm $helmDebug delete -n $LOG_NS fb
helm $helmDebug delete -n "$LOG_NS" fb

if [ $(kubectl get servicemonitors -A |grep fluent-bit-v2 -c) -ge 1 ]; then
log_debug "Updated serviceMonitor [fluent-bit-v2] appears to be deployed."
@@ -68,7 +68,8 @@ else
fi

#Generate yaml file with all container-related keys
generateImageKeysFile "$FB_FULL_IMAGE" "logging/fb/fb_container_image.template"
generateImageKeysFile "$FB_FULL_IMAGE" "logging/fb/fb_container_image.template"
generateImageKeysFile "$FB_INITCONTAINER_FULL_IMAGE" "logging/fb/fb_initcontainer_image.template" "" "true"

# Fluent Bit user customizations
FB_OPENSEARCH_USER_YAML="${FB_OPENSEARCH_USER_YAML:-$USER_DIR/logging/user-values-fluent-bit-opensearch.yaml}"
@@ -98,19 +99,19 @@ fi
log_debug "Using FB ConfigMap:" $FB_CONFIGMAP

# Multiline parser setup
LOG_MULTILINE_ENABLED=${LOG_MULTILINE_ENABLED}
LOG_MULTILINE_ENABLED=${LOG_MULTILINE_ENABLED:-true}
if [ "$LOG_MULTILINE_ENABLED" == "true" ]; then
LOG_MULTILINE_PARSER="docker, cri"
else
LOG_MULTILINE_PARSER=""
fi

# Create ConfigMap containing Fluent Bit configuration
kubectl -n $LOG_NS apply -f $FB_CONFIGMAP
kubectl -n "$LOG_NS" apply -f $FB_CONFIGMAP

# Create ConfigMap containing Viya-customized parsers (delete it first)
kubectl -n $LOG_NS delete configmap fb-viya-parsers --ignore-not-found
kubectl -n $LOG_NS create configmap fb-viya-parsers --from-file=logging/fb/viya-parsers.conf
kubectl -n "$LOG_NS" delete configmap fb-viya-parsers --ignore-not-found
kubectl -n "$LOG_NS" create configmap fb-viya-parsers --from-file=logging/fb/viya-parsers.conf

TRACING_ENABLE="${TRACING_ENABLE:-false}"
if [ "$TRACING_ENABLE" == "true" ]; then
@@ -150,14 +151,25 @@ fi
MON_NS="${MON_NS:-monitoring}"

# Create ConfigMap containing Kubernetes container runtime log format
kubectl -n $LOG_NS delete configmap fb-env-vars --ignore-not-found
kubectl -n $LOG_NS create configmap fb-env-vars \
kubectl -n "$LOG_NS" delete configmap fb-env-vars --ignore-not-found
kubectl -n "$LOG_NS" create configmap fb-env-vars \
--from-literal=KUBERNETES_RUNTIME_LOGFMT="$KUBERNETES_RUNTIME_LOGFMT" \
--from-literal=LOG_MULTILINE_PARSER="${LOG_MULTILINE_PARSER}" \
--from-literal=SEARCH_SERVICENAME="${ES_SERVICENAME}" \
--from-literal=MON_NS="${MON_NS}"

kubectl -n $LOG_NS label configmap fb-env-vars managed-by=v4m-es-script
kubectl -n "$LOG_NS" label configmap fb-env-vars managed-by=v4m-es-script

# Check to see if we are upgrading from earlier version requiring root access
if [ "$(kubectl -n "$LOG_NS" get configmap fb-dbmigrate-script -o name --ignore-not-found)" != "configmap/fb-dbmigrate-script" ]; then
log_debug "Removing FB pods (if they exist) to allow migration."
kubectl -n "$LOG_NS" delete daemonset v4m-fb --ignore-not-found
fi

# Create ConfigMap containing Fluent Bit database migration script
kubectl -n "$LOG_NS" delete configmap fb-dbmigrate-script --ignore-not-found
kubectl -n "$LOG_NS" create configmap fb-dbmigrate-script --from-file logging/fb/migrate_fbstate_db.sh
kubectl -n "$LOG_NS" label configmap fb-dbmigrate-script managed-by=v4m-es-script

## Get Helm Chart Name
log_debug "Fluent Bit Helm Chart: repo [$FLUENTBIT_HELM_CHART_REPO] name [$FLUENTBIT_HELM_CHART_NAME] version [$FLUENTBIT_HELM_CHART_VERSION]"
@@ -176,8 +188,13 @@ helm $helmDebug upgrade --install --namespace $LOG_NS v4m-fb \
--set fullnameOverride=v4m-fb \
$chart2install

#pause to allow migration script to complete (if necessary)
log_debug "Pausing to allow migration script to complete"
sleep 20

#Container Security: Disable Token Automounting at ServiceAccount; enable for Pod
disable_sa_token_automount $LOG_NS v4m-fb
# FB pods will restart after following call if automount is not already enabled
enable_pod_token_automount $LOG_NS daemonset v4m-fb

# Force restart of daemonset to ensure we pick up latest config changes
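
The fixed `sleep 20` in this hunk gives the migration initContainer time to finish. One alternative, offered only as an assumption and not something this commit does, is to poll a condition with a timeout. The helper below is a minimal sketch; `wait_until` is a hypothetical name that does not exist in the repository.

```shell
# Hypothetical helper (not in the repository).
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Re-runs the command until it succeeds; gives up once the elapsed time
# reaches the timeout. Returns 0 on success and 1 on timeout.
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local elapsed=0

  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

A call such as `wait_until 60 5 kubectl -n "$LOG_NS" rollout status daemonset/v4m-fb --timeout=5s` would retry for roughly a minute. Whether rollout status is the right signal that the migration script has completed is an assumption that would need to be verified against the deployment.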
1 change: 1 addition & 0 deletions logging/bin/deploy_opensearch.sh
@@ -40,6 +40,7 @@ fi
#Generate yaml files with all container-related keys
generateImageKeysFile "$OS_FULL_IMAGE" "logging/opensearch/os_container_image.template"
generateImageKeysFile "$OS_SYSCTL_FULL_IMAGE" "$imageKeysFile" "OS_SYSCTL_"
generateImageKeysFile "$OS_FULL_IMAGE" "logging/opensearch/os_initcontainer_image.template" "" "true"

# get credentials
export ES_ADMIN_PASSWD=${ES_ADMIN_PASSWD}
14 changes: 14 additions & 0 deletions logging/bin/deploy_openshift_prereqs.sh
@@ -25,6 +25,20 @@ fi
# link OpenSearch serviceAccounts to 'privileged' scc
oc adm policy add-scc-to-user privileged -z v4m-os -n $LOG_NS

# create the 'v4m-logging-v2' SCC, if it does not already exist
if oc get scc v4m-logging-v2 2>/dev/null 1>&2; then
log_info "Skipping scc creation; using existing scc [v4m-logging-v2]"
else
oc create -f logging/openshift/fb_v4m-logging-v2_scc.yaml
fi

# create the 'v4m-k8sevents' SCC, if it does not already exist
if oc get scc v4m-k8sevents 2>/dev/null 1>&2; then
log_info "Skipping scc creation; using existing scc [v4m-k8sevents]"
else
oc create -f logging/openshift/fb_v4m-k8sevents_scc.yaml
fi
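
Both SCC checks above follow the same create-if-missing pattern. A minimal sketch of that pattern as a single function follows. `ensure_scc` is a hypothetical name that is not in the repository, and it uses `echo` where the script uses `log_info`; the `oc get scc` and `oc create -f` calls are the ones the script itself makes.

```shell
# Hypothetical helper (not in the repository).
# Usage: ensure_scc <scc_name> <manifest_file>
# Creates the SCC from the manifest only when `oc get scc` cannot find it,
# so re-running the deployment does not fail on an existing object.
ensure_scc() {
  local scc_name="$1"
  local manifest="$2"

  if oc get scc "$scc_name" >/dev/null 2>&1; then
    echo "Skipping scc creation; using existing scc [$scc_name]"
  else
    oc create -f "$manifest"
  fi
}
```

With this helper, the first check could be written as `ensure_scc v4m-logging-v2 logging/openshift/fb_v4m-logging-v2_scc.yaml`. Note that an existing SCC is left untouched, so changes to the manifest are not applied to it.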

log_info "OpenShift Prerequisites have been deployed."

log_debug "Script [$this_script] has completed [$(date)]"
2 changes: 1 addition & 1 deletion logging/bin/remove_fluentbit_azmonitor.sh
@@ -30,7 +30,7 @@ kubectl -n $LOG_NS delete configmap fbaz-fluent-bit-config --ignore-not-found
kubectl -n $LOG_NS delete configmap fbaz-viya-parsers --ignore-not-found
kubectl -n $LOG_NS delete configmap fbaz-viya-tracing --ignore-not-found
kubectl -n $LOG_NS delete configmap fbaz-env-vars --ignore-not-found

kubectl -n $LOG_NS delete configmap fbaz-dbmigrate-script --ignore-not-found

# Should we leave secret in place?
log_info "Removing Connection information (secret)"
1 change: 1 addition & 0 deletions logging/bin/remove_fluentbit_opensearch.sh
@@ -27,6 +27,7 @@ kubectl -n $LOG_NS delete configmap fb-fluent-bit-config --ignore-not-found
kubectl -n $LOG_NS delete configmap fb-viya-parsers --ignore-not-found
kubectl -n $LOG_NS delete configmap fb-viya-tracing --ignore-not-found
kubectl -n $LOG_NS delete configmap fb-env-vars --ignore-not-found
kubectl -n $LOG_NS delete configmap fb-dbmigrate-script --ignore-not-found

log_debug "Script [$this_script] has completed [$(date)]"
echo ""
5 changes: 4 additions & 1 deletion logging/bin/remove_openshift_artifacts.sh
@@ -19,7 +19,10 @@ if [ "$OPENSHIFT_ARTIFACTS_REMOVE" != "true" ]; then
fi

# remove custom OpenShift SCC
oc delete scc v4mlogging --ignore-not-found
oc delete scc v4mlogging --ignore-not-found
oc delete scc v4m-logging-v2 --ignore-not-found
oc delete scc v4m-k8sevents --ignore-not-found



log_info "OpenShift Prerequisites have been removed."
3 changes: 3 additions & 0 deletions logging/esexporter/values-es-exporter.yaml
@@ -173,3 +173,6 @@ prometheusRule:
annotations:
description: The heap usage is over 90% for 15m
summary: Elasticsearch node {{$labels.node}} heap usage is high

securityContext:
privileged: false
26 changes: 26 additions & 0 deletions logging/fb/fb_initcontainer_image.template
@@ -0,0 +1,26 @@
initContainers:
- name: chowner-v4m-fb-storage
image: __IMAGE_REPO_3LEVEL__:__IMAGE_TAG__
imagePullPolicy: IfNotPresent
command: ['sh', '-c', "./usr/bin/migrate_fbstate_db.sh"]
securityContext:
privileged: true
allowPrivilegeEscalation: true
readOnlyRootFilesystem: true
capabilities:
drop: ["all"]
add: ["CHOWN"]
runAsUser: 0
runAsNonRoot: false
volumeMounts:
- name: v4m-fb-storage
mountPath: /var/log/v4m-fb-storage
- name: dbmigrate-script
mountPath: /usr/bin/migrate_fbstate_db.sh
readOnly: false
subPath: migrate_fbstate_db.sh
- mountPath: /var/log
name: varlog
readOnly: true

