
Fix 'now()' not working on label values #1

Closed

Conversation

@kir4h (Contributor) commented Jan 19, 2020

First of all, thanks for your project! While browsing for a way to generate metrics from arbitrary logs, I didn't want to go for a complex/grok-based solution, since my requirement was pretty simple. I thought about writing this functionality myself, only to find out someone had already done so :)

According to the documentation, the function now() should be valid for label values:

Supported values (for metric results and label values):

  • Static numbers, such as 1, 10.3e5
  • Regex group references, such as $pool from the regexp pool=(?P<pool>[^ ]+) msg
  • now() which returns the current Unix time
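
For illustration, here is a minimal sketch combining all three value types, based on the config format shown later in this thread (the metric names and help strings are made up; the pool regexp is the one from the docs):

logs:
- path: /logs/pool.log
  patterns:
  - match: pool=(?P<pool>[^ ]+) msg
    metric: pool_up
    type: gauge
    help: Whether a pool message was seen
    action: set
    continue: true
    value: 1            # static number
    labels:
      pool: $pool       # regex group reference
  - match: pool=(?P<pool>[^ ]+) msg
    metric: pool_last_seen
    type: gauge
    help: Unix time a pool message was last seen
    action: set
    value: now()        # current Unix time
    labels:
      pool: $pool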

This PR makes now() be evaluated for label values as well.

I have split the PR into 3 commits:

  • The fix itself
  • An amendment to the test of now() usage in metric values (please correct me if I'm wrong or misunderstood the test)
  • An update of the packages through dep ensure

Feel free to drop any commit, or let me know if any changes are required/suggested.

@kir4h changed the title from "Fix now not working on label values" to "Fix 'now()' not working on label values" on Jan 19, 2020
@kir4h requested a review from @hoffie on Jan 19, 2020 14:06
@hoffie (Owner) commented Jan 20, 2020

Thanks for the PR, looks interesting, will have a deeper look in the coming days. :)

@hoffie (Owner) commented Jan 29, 2020

I really like it code-wise, thanks for your effort!

I am a bit unsure about the semantics. Prometheus-wise, such metrics are going to be problematic, aren't they? Having a timestamp as a label might well lead to cardinality explosion.

Can you describe what the use case for this is? Based on that, we can better judge whether to merge this as-is or if there are any other options for achieving your goals.

I agree that something has to be fixed -- the docs are currently wrong. Either the feature gets implemented or the docs will have to be corrected.

Let me know what you think.

@kir4h (Contributor, Author) commented Jan 30, 2020

I am a bit unsure about the semantics. Prometheus-wise, such metrics are going to be problematic, aren't they? Having a timestamp as a label might well lead to cardinality explosion.

Indeed! I realized this some days after raising my PR, while making use of the output :)

Can you describe what the use case for this is?

Scenario

My journey began with a set of shell scripts I wanted to instrument. I started using Pushgateway for this, adding some small logic to a few of them to push their metrics. But then I realized I shouldn't "contaminate" the scripts with Prometheus awareness, and ended up thinking that "something" should derive that info from the logs and generate those metrics (that's how I ended up using this project).

When running such scripts regularly through cron jobs, a useful metric is knowing when the script was last executed successfully (so that I can generate an alert, for instance, if it doesn't happen when it's supposed to).

While using this project, I realized I had lost one of the goodies from Pushgateway: for every pushed metric group, it generates additional metrics (push_time_seconds and push_failure_time_seconds) that allow knowing when a metric was last reported.
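
(For illustration: with Pushgateway, a staleness alert on that metric typically looks roughly like the following alerting rule; the group/alert names and threshold are placeholders.)

groups:
- name: cron_freshness
  rules:
  - alert: CronJobStale
    # fires when a group has not pushed metrics for more than an hour
    expr: time() - push_time_seconds > 3600
    annotations:
      summary: 'Job {{ $labels.job }} has not pushed metrics for over an hour'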

First approach

To overcome this, I thought of using labels for my metric: whenever a regexp matched in a log file and a corresponding metric was to be issued, a label carrying the timestamp was added. This way, I could read this value in Prometheus/Grafana and do something about it.

It was my first contact with metric generation, so only later did I realize the cardinality issue: each metric, because it carried a different label value for the timestamp, was increasing cardinality, and this would keep happening every time the metric was issued. At that point I should have come back and commented on it in this PR, but it didn't cross my mind, sorry!
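
For illustration, that first approach looked roughly like this (the last_run label name is made up):

patterns:
- match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
  metric: duration_seconds
  type: gauge
  help: Execution time in seconds
  action: set
  value: $seconds
  labels:
    # now() yields a different value on every match, so every match
    # creates a brand-new time series: unbounded cardinality growth
    last_run: now()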

Second approach

My second (and current) approach has been to add a "sidecar" metric to every match that reports this value:

logs:
- path: /logs/my.log
  patterns:
  - match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
    metric: duration_seconds
    type: gauge
    help: Execution time in seconds
    action: set
    continue: true
    value: $seconds
    labels:
      instance: myInstance
      job: myJobName
  - match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
    metric: last_execution
    type: gauge
    help: Last execution (Unix timestamp)
    action: set
    continue: true
    value: now()
    labels:
      instance: myInstance
      job: myJobName

This isn't ideal, because for every match I need to inject a repetitive block that issues this "sidecar" metric.
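
(One way to trim the repetition, assuming the exporter's YAML parser supports standard anchors and merge keys, which I haven't verified, would be something like:)

logs:
- path: /logs/my.log
  patterns:
  # shared keys live in an anchored mapping, merged into each pattern
  - <<: &base
      match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
      type: gauge
      action: set
      continue: true
      labels:
        instance: myInstance
        job: myJobName
    metric: duration_seconds
    help: Execution time in seconds
    value: $seconds
  - <<: *base
    metric: last_execution
    help: Last execution (Unix timestamp)
    value: now()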

Possible third approach

I was thinking that a possible approach would be for multilog_exporter to mimic Pushgateway's behaviour: a boolean flag that, when set, would produce an additional metric reporting the last time the metric was updated for a given group.

I still have little Prometheus knowledge, so I might be missing something, but I guess that if Pushgateway does this, it's because there is a legitimate use case for it.
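
(Just to sketch the idea, with an entirely made-up flag name:)

patterns:
- match: .*total time execution:\s*(?P<seconds>\d+)\s*seconds
  metric: duration_seconds
  type: gauge
  action: set
  value: $seconds
  # hypothetical flag: when true, the exporter would also expose a
  # companion metric (e.g. duration_seconds_last_update) set to the
  # Unix time of the last update, mirroring Pushgateway's push_time_seconds
  track_last_update: true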

What do you think?

@hoffie (Owner) commented Feb 8, 2020

Your second approach is what I would have suggested with the current version, yes. :)

I was thinking that a possible approach would be for multilog_exporter to mimic Pushgateway's behaviour: a boolean flag that, when set, would produce an additional metric reporting the last time the metric was updated for a given group.

This sounds like a useful enhancement. It would probably be nice to be able to control that per metric? Would you want to work on a PR for that one? I guess a new one would make sense.

@kir4h (Contributor, Author) commented Feb 9, 2020

Closing this in favour of #3

@kir4h kir4h closed this Feb 9, 2020