Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace loki-stack helm chart with loki + promtail #23

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mjtrangoni
Copy link
Contributor

@mjtrangoni mjtrangoni commented Jan 10, 2024

Hi @darkowlzz,

related to #19, this replaces the deprecated loki-stack helm chart with the more powerful loki + promtail helm charts.

By this change Loki will be upgraded from v2.6.1 to v2.9.3, and to get it running in my test environment I had to do some changes to the the loki dashboard. Let me know if you agree with those changes.

Also found out that the prometheus-mixin projects use / as separator in the dashboards names, so I changed the titles to make it more uniform as well.

This adds Dashboards Links as Dropdown menu via the tag flux, similar to what the Loki dashboards have.

Thank you!

"value": [
"$__all"
]
"current": {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to keep the default namespace value for this, otherwise the default dashboard shows logs of every application that's deployed across namespaces with controller label. Setting it to flux-system namespace ensures that only flux controller logs are displayed by default. It also limits the list of controllers shown in the controller drop-down menu to flux controller only, making it more of a flux logs dashboard and not a generic logs dashboard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will revert this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't seem to have flux-system as the default namespace value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are using fluxcd in different namespace in our project https://github.com/aenix-io/cozystack, so it would be better to rely on additional labels of release of FluxCD.

@mjtrangoni mjtrangoni force-pushed the replace-loki-stack branch 2 times, most recently from 0a7c54c to 0f3bcb4 Compare January 11, 2024 16:13
Comment on lines 308 to 311
"label": "app",
"refId": "LokiVariableQueryEditor-VariableQuery",
"stream": "{namespace=~\"$namespace\"}",
"type": 1
Copy link
Contributor

@darkowlzz darkowlzz Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is about. Wouldn't just label_values(app) work for the controller query like before?

@kingdonb
Copy link
Member

kingdonb commented Feb 7, 2024

@mjtrangoni It looks like there are some extra unnecessary changes in this PR that can be removed, so it only contains the changes that you intended - are you still there?

If we don't hear back from you soon, we will probably try to rebase the extra changes out

@mjtrangoni
Copy link
Contributor Author

@kingdonb sorry, you are right. I will remove all my changes around the dashboard, and let this be merged.
BTW, I found that the table_manager removal was not working and switched to the compactor, which in my case is working well

@kingdonb
Copy link
Member

I don't think we should configure storage, if the user attempts to install this example on a kind cluster then it will break because no persistent volume claims are allowed. I just tested it and on my cluster I had this issue, because there was no storage driver for CSI configured:

  Warning  FailedScheduling  3m14s  default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Other than that issue I think this is a great config update thank you for contributing it @mjtrangoni

I'm just going by precedent that the old version with loki-stack did not configure the filesystem storage, I think that means it's using impermanent / non-persistent or ephemeral (tmp) storage, and that means that data is lost when loki restarts. I think that is OK. This is just an example config, users are not meant to literally take this repo and implement it without any changes. But maybe some docs about how to configure the storage sensibly would be a helpful addition?

I just don't want the guide to break for kind users, so I guess we shouldn't be configuring storage as that will break it.

@stefanprodan
Copy link
Member

Both Prometheus and Grafana use an emptyDir for storage, we should do the same for Loki as it was before.

@kingdonb
Copy link
Member

grafana/loki#10621

IDK if the new Loki chart actually supports emptyDir, from the open issue on their git repository I suspect it doesn't.

Maybe there is some way that we can hack it, but I tried messing with the config for half an hour and wound up at this issue where it seems like this wouldn't be supported as no production environment is complete without storage.

I'm not sure what to do, it really sucks that loki doesn't seemingly have a way to install on kind clusters where there is no persistent volume (or maybe the local-path-storage is included now, I don't actually use kind myself, this is a talos cluster)

@stefanprodan
Copy link
Member

The local-path-storage provider is included in Kind.

@mjtrangoni
Copy link
Contributor Author

I will give it a try with the new loki v3.0 helm chart next week, and also rebase against the upstream changes.

@Reintjuu
Copy link

@mjtrangoni, did you get the chance to update it to v3.0?

@JordanP
Copy link
Contributor

JordanP commented Nov 6, 2024

I tested this with Helm Chart version 6.x which deploys Loki v3.2. There were quite some changes in the deployment options for Loki. For this repo, since it's an example, I recommand we install Loki in single-binary/monolithic mode. A basic values.yaml file is provided here in the official repo.

I came up with the following values, I tested it's working:

loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  schemaConfig:
    configs:
      - from: 2024-04-01
        object_store: filesystem
        store: tsdb
        schema: v13
        index:
          prefix: index_
          period: 24h
  storage:
    type: 'filesystem'
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
      admin_api_directory: /tmp/loki/admin
  limits_config:
    retention_period: 24h
deploymentMode: SingleBinary
singleBinary:
  replicas: 1

# chunksCache deploys memcached, which needs quite some RAM
chunksCache:
  enabled: false
# resultsCache deploys memcached, which needs quite some RAM
resultsCache:
  enabled: false

# Monitoring section determines which monitoring features to enable
monitoring:
  serviceMonitor:
    enabled: true
    labels:
      app.kubernetes.io/part-of: kube-prometheus-stack

# The Loki canary pushes logs to and queries from this loki installation
# to test that it's working correctly
lokiCanary:
  enabled: false
# -- Section for configuring optional Helm test
test:
  enabled: false
extraObjects:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: loki-datasource
      labels:
        app: loki
        chart: loki
        release: loki
        grafana_datasource: "1"
        app.kubernetes.io/part-of: kube-prometheus-stack
    data:
      loki-datasource.yaml: |-
        apiVersion: 1
        datasources:
        - name: Loki
          type: loki
          access: proxy
          url: http://loki:{{ .Values.loki.server.http_listen_port }}
          version: 1
          isDefault: false

# Zero out replica counts of other deployment modes
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0

We can discuss the chunks_directory and the resultsCache and chunksCache.

cc @Reintjuu @mjtrangoni

@georgek
Copy link

georgek commented Dec 18, 2024

The above worked for me but I did have to quote the data in the schemaConfig as per grafana/loki#14126

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants