2024/add/tempo #910

Draft
wants to merge 80 commits into
base: main
f0d8cf0
wip
mrnicegyu11 Sep 19, 2024
e906b41
Merge remote-tracking branch 'upstream/main' into main
mrnicegyu11 Oct 23, 2024
14c751d
Merge remote-tracking branch 'upstream/main' into main
mrnicegyu11 Oct 23, 2024
293f63c
Add csi-s3 and have portainer use it
mrnicegyu11 Oct 24, 2024
f7f72ec
Change request @hrytsuk 1GB max portainer volume size
mrnicegyu11 Oct 25, 2024
94cfb76
t push
mrnicegyu11 Oct 28, 2024
509c717
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Oct 29, 2024
1a65ecf
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 13, 2024
77ee45e
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Nov 25, 2024
20569c7
Fix wrong filename
mrnicegyu11 Nov 26, 2024
b2d13b7
Fix registry local deploy
mrnicegyu11 Nov 27, 2024
28660ac
Traefik local deployment fixes
mrnicegyu11 Nov 27, 2024
65907fc
Fix local deployment graylog provisioning
mrnicegyu11 Nov 27, 2024
0961600
Fix j2, double venv
mrnicegyu11 Nov 28, 2024
541df1c
Add python version
mrnicegyu11 Nov 28, 2024
c92ac11
Idempotency for admin-panels
mrnicegyu11 Dec 2, 2024
b3b3ae1
Remove faulty command
mrnicegyu11 Dec 2, 2024
36b193b
Local deploy fixes
mrnicegyu11 Dec 2, 2024
cd22e09
Clean Up Local Minio
mrnicegyu11 Dec 3, 2024
1369e50
init work
mrnicegyu11 Dec 3, 2024
c2c0440
Remove unused code
mrnicegyu11 Dec 3, 2024
511dc0f
Update Minio
mrnicegyu11 Dec 3, 2024
3c2ff2b
Merge branch '2024/makeLocalDeployWorkAgain' into 2024/add/tempo
mrnicegyu11 Dec 3, 2024
c9c70d6
Arch Linux Certificates Customization
mrnicegyu11 Dec 3, 2024
7b8be53
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 5, 2024
bcd61cd
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 12, 2024
c682845
Merge remote-tracking branch 'upstream/main' into 2024/add/tempo
mrnicegyu11 Dec 13, 2024
58e1030
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 Dec 13, 2024
5b1c3fb
Add grafana terrform tooling
mrnicegyu11 Dec 16, 2024
00bc0c7
Make osparc-config dotenv-precommit pass: Use all caps env-vars
mrnicegyu11 Dec 16, 2024
2e009bc
Refactoring: jinja2 takes .env file path as explicit argument (like i…
mrnicegyu11 Dec 16, 2024
8903007
Make CI_ENV_FILE vailable in makefile
mrnicegyu11 Dec 16, 2024
f520e7c
Refactor makefile targets
mrnicegyu11 Dec 16, 2024
764bb22
Add grafana terraform gitignore
mrnicegyu11 Dec 16, 2024
ead277a
Rename envvar: TF_STATE_S3_GRAFANAKEY
mrnicegyu11 Dec 16, 2024
13f92ba
Remove old scripts, makefile targets
mrnicegyu11 Dec 18, 2024
ebe66c0
Remove unused files
mrnicegyu11 Dec 18, 2024
fcab094
undue arch style commit
mrnicegyu11 Dec 18, 2024
2ba8070
Remove references to Tempo
mrnicegyu11 Dec 18, 2024
015cef0
Merge remote-tracking branch 'upstream/main' into 2024/add/grafanaTer…
mrnicegyu11 Jan 3, 2025
ebb9c2c
CHange request YH: Stop trying tor ecah grafana eventually
mrnicegyu11 Jan 3, 2025
a929e35
Change request YH: Move tf scripts to terraform folder
mrnicegyu11 Jan 3, 2025
989eaa3
Change request YH: stricter check
mrnicegyu11 Jan 3, 2025
75c79fe
Add files remove typo
mrnicegyu11 Jan 3, 2025
dc76837
Merge branch 'main' into 2024/add/grafanaTerraform
mrnicegyu11 Jan 6, 2025
bb59907
Add terraform fmt pre-commit hook
mrnicegyu11 Jan 3, 2025
c52b3f9
Use ansible.env file in lieu of ci.env if available
mrnicegyu11 Jan 3, 2025
c2c748f
Rename and refactor
mrnicegyu11 Jan 7, 2025
3404851
Merge remote-tracking branch 'upstream/main' into 2024/add/grafanaTer…
mrnicegyu11 Jan 8, 2025
73c384f
wip
mrnicegyu11 Jan 9, 2025
0aa010e
wip
mrnicegyu11 Jan 9, 2025
9f7be0b
remove line
mrnicegyu11 Jan 10, 2025
8eb2e73
Merge branch 'main' into 2024/add/grafanaTerraform
mrnicegyu11 Jan 13, 2025
ac627ea
Makefile repo base dir without git
mrnicegyu11 Jan 14, 2025
dc08b16
Grafana terraform ceph fixes
mrnicegyu11 Jan 14, 2025
22edf67
Fix indentation
mrnicegyu11 Jan 10, 2025
51e23c4
Add manual to traefik redirect capture all rule (#933)
YuryHrytsuk Jan 16, 2025
8728f6c
Introduce rolling docker config / secret update concept :tada: :rocke…
YuryHrytsuk Jan 30, 2025
ef8bf6b
Update traefik router hardcoded priorities (#953)
YuryHrytsuk Jan 30, 2025
bcecab7
Configure redis replicas via ENV (#957)
YuryHrytsuk Jan 31, 2025
e055bbd
Filestash: remove special docker node label (#959)
YuryHrytsuk Feb 3, 2025
998081a
rabbit: configurable replicas (#964)
YuryHrytsuk Feb 3, 2025
83e21a8
💄 minor: Change DNS Server to Quad9 (#967)
mrnicegyu11 Feb 5, 2025
0e86628
single replica (#968)
sanderegg Feb 10, 2025
54be62a
Remove docker api proxy from validate simcore settings (#972)
YuryHrytsuk Feb 13, 2025
9001e23
Add appmotiongateway add dalco
mrnicegyu11 Feb 26, 2025
f5d5e63
Add appmotiongateway add dalco - 2
mrnicegyu11 Feb 26, 2025
721069e
Add appmotiongateway add dalco - 3
mrnicegyu11 Feb 26, 2025
80b24ff
Seperate dalco-staging: disable redis special handling (#976)
mrnicegyu11 Mar 3, 2025
bf5e264
Fix deploy ops failure
mrnicegyu11 Mar 4, 2025
5959bd5
Merge branch 'main' into 2024/add/grafanaTerraform
mrnicegyu11 Mar 5, 2025
7433f2a
Make curl in ensure_grafana_online_ timeout after 10s
mrnicegyu11 Mar 7, 2025
2da8740
Timeout in wait_graylog_is_online
mrnicegyu11 Mar 7, 2025
a87ee6b
Fix osparc.local pydantic validation failure director-v0
mrnicegyu11 Mar 7, 2025
aa4aca6
Merge branch '2024/add/grafanaTerraform' into 2024/add/tempo
mrnicegyu11 Mar 7, 2025
6aa441e
Merge remote-tracking branch 'upstream/main' into 2024/add/tempo
mrnicegyu11 Mar 11, 2025
ea2dde4
Move create tempo bucket function to monitoring stack makefile
mrnicegyu11 Mar 20, 2025
690b997
wip
mrnicegyu11 Mar 20, 2025
8cffcc8
Merge remote-tracking branch 'upstream/main' into 2024/add/tempo
mrnicegyu11 Mar 20, 2025
6635f75
fix faulty commit
mrnicegyu11 Mar 20, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -129,7 +129,7 @@ docs/_build
/services/monitoring/pgsql_query_exporter_config.yaml
/services/monitoring/docker-compose.yml
/services/monitoring/smokeping_prober_config.yaml

services/monitoring/tempo_config.yaml

# Simcore: Contains location of repo.config file on the machine and of the whole config directory
.config.location
1 change: 0 additions & 1 deletion Makefile
@@ -71,7 +71,6 @@ down-maintenance: ## Stop the maintenance mode
fi \
,)


# Misc: info & clean
.PHONY: info info-vars info-local
info: ## Displays some important info
30 changes: 22 additions & 8 deletions services/monitoring/Makefile
@@ -9,13 +9,24 @@ REPO_BASE_DIR := $(abspath $(dir $(abspath $(lastword $(MAKEFILE_LIST))))../..)
# TARGETS --------------------------------------------------
include ${REPO_BASE_DIR}/scripts/common.Makefile

define create-s3-bucket
# ensure bucket is available in S3...
@set -o allexport; \
source .env; \
echo Creating bucket "$${TEMPO_S3_BUCKET}";\
${REPO_BASE_DIR}/scripts/create-s3-bucket.bash "$${TEMPO_S3_BUCKET}" && \
set +o allexport; \
# bucket is available in S3
endef

.PHONY: up
up: .init .env config.prometheus ${TEMP_COMPOSE} ## Deploys or updates current stack "$(STACK_NAME)". If MONITORED_NETWORK is not specified, it will create an attachable network
@docker stack deploy --with-registry-auth --prune --compose-file ${TEMP_COMPOSE} $(STACK_NAME)
$(MAKE) grafana-import

.PHONY: up-local
up-local: .init .env config.prometheus.simcore ${TEMP_COMPOSE}-local ## Deploys or updates current stack "$(STACK_NAME)". If MONITORED_NETWORK is not specified, it will create an attachable network
@$(create-s3-bucket)
@docker stack deploy --with-registry-auth --prune --compose-file ${TEMP_COMPOSE}-local $(STACK_NAME)
$(MAKE) grafana-import

@@ -49,28 +60,28 @@ up-master: .init .env config.monitoring config.prometheus.ceph.simcore ${TEMP_C
@docker stack deploy --with-registry-auth --prune --compose-file ${TEMP_COMPOSE}-master ${STACK_NAME}
$(MAKE) grafana-import

${TEMP_COMPOSE}: docker-compose.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}: docker-compose.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< > $@

${TEMP_COMPOSE}-letsencrypt-http: docker-compose.yml docker-compose.letsencrypt.http.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-letsencrypt-http: docker-compose.yml docker-compose.letsencrypt.http.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.letsencrypt.http.yml > $@

${TEMP_COMPOSE}-letsencrypt-dns: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-letsencrypt-dns: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.letsencrypt.dns.yml > $@

${TEMP_COMPOSE}-dalco: docker-compose.yml docker-compose.dalco.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-dalco: docker-compose.yml docker-compose.dalco.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.dalco.yml > $@

${TEMP_COMPOSE}-public: docker-compose.yml docker-compose.public.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-public: docker-compose.yml docker-compose.public.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.public.yml > $@

${TEMP_COMPOSE}-aws: docker-compose.yml docker-compose.aws.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-aws: docker-compose.yml docker-compose.aws.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.aws.yml > $@

${TEMP_COMPOSE}-master: docker-compose.yml docker-compose.master.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-master: docker-compose.yml docker-compose.master.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.master.yml > $@

${TEMP_COMPOSE}-local: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
${TEMP_COMPOSE}-local: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.letsencrypt.dns.yml > $@

docker-compose.yml: docker-compose.yml.j2 .env .venv pgsql_query_exporter_config.yaml
@@ -137,6 +148,9 @@ pgsql_query_exporter_config.yaml: pgsql_query_exporter_config.yaml.j2 ${REPO_CON
smokeping_prober_config.yaml: smokeping_prober_config.yaml.j2 ${REPO_CONFIG_LOCATION} .env .venv
$(call jinja, $<, .env, $@);

tempo_config.yaml: tempo_config.yaml.j2 ${REPO_CONFIG_LOCATION} .env .venv
$(call jinja, $<, .env, $@);

.PHONY: grafana/assets
grafana/assets: ${REPO_CONFIG_LOCATION}
@$(MAKE_C) grafana assets
26 changes: 26 additions & 0 deletions services/monitoring/docker-compose.yml.j2
@@ -17,6 +17,8 @@ networks:
configs:
  alertmanager_config:
    file: ./alertmanager/config.yml
  tempo_config:
    file: ./tempo_config.yaml
  node_exporter_entrypoint:
    file: ./node-exporter/docker-entrypoint.sh
  prometheus_config:
@@ -397,3 +399,27 @@ services:
        reservations:
          memory: 32M
          cpus: "0.1"
  tempo:
    image: grafana/tempo:2.6.1
    command: "-target=scalable-single-binary -config.file=/etc/tempo.yaml"
    configs:
      - source: tempo_config
        target: /etc/tempo.yaml
    networks:
      - monitored
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=${PUBLIC_NETWORK}
        - traefik.http.services.tempo.loadbalancer.server.port=9095
        - traefik.http.routers.tempo.rule=Host(`${MONITORING_DOMAIN}`) && PathPrefix(`/tempo`)
        - traefik.http.routers.tempo.priority=10
        - traefik.http.routers.tempo.entrypoints=https
        - traefik.http.routers.tempo.tls=true
        - traefik.http.middlewares.tempo_replace_regex.replacepathregex.regex=^/tempo/?(.*)$$
        - traefik.http.middlewares.tempo_replace_regex.replacepathregex.replacement=/$${1}
        - traefik.http.routers.tempo.middlewares=ops_whitelist_ips@swarm, ops_gzip@swarm, tempo_replace_regex
      resources:
        limits:
          memory: 2000M
          cpus: "2.0"
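
The two `replacepathregex` labels in the `tempo` service strip the `/tempo` prefix before a request reaches the container; the doubled `$$` is the Compose escape for a literal `$`. Below is a minimal Python sketch of that rewrite, assuming Python's `re` module behaves like Traefik's Go regex engine for this simple pattern. `strip_tempo_prefix` is a hypothetical helper for illustration and is not part of this PR.

```python
import re

# Pattern as it reaches Traefik once Compose has un-escaped "$$" to "$".
TEMPO_PREFIX = re.compile(r"^/tempo/?(.*)$")


def strip_tempo_prefix(path: str) -> str:
    """Return the path forwarded to Tempo for a request path under /tempo."""
    # Go's "${1}" replacement syntax corresponds to r"\1" in Python.
    return TEMPO_PREFIX.sub(r"/\1", path, count=1)


if __name__ == "__main__":
    for p in ("/tempo", "/tempo/", "/tempo/api/search", "/ready"):
        print(p, "->", strip_tempo_prefix(p))
```

Because the slash after `/tempo` is optional in the pattern, a path such as `/tempoX` would also be rewritten (to `/X`) if it reached this middleware; whether that matters depends on how the router rule matches, which this sketch does not model.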
8 changes: 8 additions & 0 deletions services/monitoring/grafana/terraform/datasources.tf
@@ -15,3 +15,11 @@ resource "grafana_data_source" "prometheuscatchall" {
is_default = false
uid = "RmZEr52nz"
}

resource "grafana_data_source" "tempo" {
type = "tempo"
name = "tempo"
url = var.TEMPO_URL
basic_auth_enabled = false
is_default = false
}
6 changes: 3 additions & 3 deletions services/monitoring/grafana/terraform/main.tf.j2
@@ -17,10 +17,10 @@ terraform {
skip_credentials_validation = true
skip_requesting_account_id = true
skip_metadata_api_check = true
skip_region_validation = true
skip_s3_checksum = true
skip_region_validation = true
skip_s3_checksum = true
use_path_style = true
endpoints = {
endpoints = {
s3 = "{{ GRAFANA_TERRAFORM_STATE_BACKEND_S3_ENDPOINT }}"
}
{% endif %}
4 changes: 4 additions & 0 deletions services/monitoring/grafana/terraform/variables.tf
@@ -2,6 +2,10 @@ variable "GRAFANA_URL" {
description = "grafana_url"
sensitive = false
}
variable "TEMPO_URL" {
description = "tempo_url"
sensitive = false
}
variable "GRAFANA_AUTH" {
description = "Username:Password"
sensitive = true
6 changes: 6 additions & 0 deletions services/monitoring/template.env
@@ -21,3 +21,9 @@ MONITORING_PROMETHEUS_PGSQL_GID_MONITORED=${MONITORING_PROMETHEUS_PGSQL_GID_MONI
MONITORING_PROMETHEUS_SMOKEPING_TARGETS=${MONITORING_PROMETHEUS_SMOKEPING_TARGETS}
PUBLIC_NETWORK=${PUBLIC_NETWORK}
MONITORED_NETWORK=${MONITORED_NETWORK}
TEMPO_S3_BUCKET=${TEMPO_S3_BUCKET}
STORAGE_DOMAIN=${STORAGE_DOMAIN}
S3_REGION=${S3_REGION}
S3_ACCESS_KEY=${S3_ACCESS_KEY}
S3_SECRET_KEY=${S3_SECRET_KEY}
TF_VAR_PROMETHEUS_CATCHALL_URL=${TF_VAR_PROMETHEUS_CATCHALL_URL}
52 changes: 52 additions & 0 deletions services/monitoring/tempo_config.yaml.j2
@@ -0,0 +1,52 @@
server:
  http_listen_port: 3200

distributor:
  receivers: # this configuration will listen on all ports and protocols that tempo is capable of.
    otlp:
      protocols:
        http:
        grpc:

#ingester:
#  max_block_duration: 5m # cut the headblock when this much time passes. this should probably be left alone normally

compactor:
  compaction:
    block_retention: 96h # overall Tempo trace retention.

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: {{ MACHINE_FQDN }}
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: {{ TF_VAR_PROMETHEUS_CATCHALL_URL }}/api/v1/write

storage:
  trace:
    backend: s3 # backend configuration to use
    wal:
      path: /var/tempo/wal # where to store the wal locally
    s3:
      bucket: {{ TEMPO_S3_BUCKET }} # how to store data in s3
      endpoint: {{STORAGE_DOMAIN}}
      region: {{S3_REGION}}
      access_key: {{S3_ACCESS_KEY}}
      secret_key: {{S3_SECRET_KEY}}
      insecure: false
      tls_insecure_skip_verify: true
      # For using AWS, select the appropriate regional endpoint and region
      # endpoint: s3.dualstack.us-west-2.amazonaws.com
      # region: us-west-2

querier:
  frontend_worker:
    frontend_address: localhost:9095

overrides:
  defaults:
    metrics_generator:
      processors: ['service-graphs', 'span-metrics']
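
The template above only renders to a usable config if every `{{ ... }}` placeholder has a value in `.env`. Here is a minimal Python sketch of a pre-render check, assuming a plain `KEY=VALUE` env file. `parse_env` and `missing_variables` are hypothetical helpers, not the repo's `jinja` Makefile function, and how that function treats undefined variables is not shown in this PR.

```python
import re

# Matches Jinja2 variable placeholders such as "{{ NAME }}" or "{{NAME}}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_variables(template: str, env: dict[str, str]) -> list[str]:
    """Return the placeholder names that are absent or empty in env, sorted."""
    names = set(PLACEHOLDER.findall(template))
    return sorted(name for name in names if not env.get(name))


if __name__ == "__main__":
    # Illustrative values only; "tempo-traces" is a made-up bucket name.
    template = "bucket: {{ TEMPO_S3_BUCKET }}\nendpoint: {{STORAGE_DOMAIN}}\n"
    env = parse_env("TEMPO_S3_BUCKET=tempo-traces\nSTORAGE_DOMAIN=\n")
    print(missing_variables(template, env))
```

Run against the real `tempo_config.yaml.j2` and `.env`, a non-empty result would point at variables such as `MACHINE_FQDN` or `S3_REGION` that still need to be set before `make up-local`.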