Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(logfwd): add log labels #312

Merged
merged 68 commits into from
Nov 10, 2023
Merged

feat(logfwd): add log labels #312

merged 68 commits into from
Nov 10, 2023

Conversation

barrettj12
Copy link
Contributor

@barrettj12 barrettj12 commented Sep 27, 2023

This PR builds on the previous log forwarding PRs (#209, #252, #256, #267) by allowing the user to specify labels for logs forwarded from Pebble.

Labels will be defined in a new field labels inside the log target:

log-targets:
  tgt1:
    type: loki
    location: https://my.loki.server.com:3100/loki/api/v1/push
    labels:
      key1: val1
      key2: val2

Labels will support environment variable substitutions ($ENV_VAR), and these will be interpreted "dynamically" in the context of the relevant service.

Notes on the design

Labels can contain $ENVIRONMENT_VARIABLES, which will be substituted in the environment of the relevant service. However, the "service environment" could mean:

  1. the environment map as defined in the plan
  2. the actual environment in which the process is run

We decided to use (1) instead, as it is more explicit, can be "statically" evaluated just from the plan, and doesn't require extra data to be passed through from the service manager and cached.

Furthermore, anytime we change the labels, we first flush any logs currently in the buffer, so that these logs are sent with the correct (old) labels.

QA steps

Set up a simple logging service:

#!/bin/bash
while true; do
  echo "hello"
  sleep 1
done

Set up a plan using labels and environment variables:

services:
  svc1: &svc1
    command: /path/to/logsvc
    startup: enabled
    override: merge
    environment:
      OWNER: alice
      IP: 100.32.2.1
      PORT: 9090

  svc2:
    <<: *svc1
    environment:
      OWNER: bob
      IP: 100.32.2.5
      PORT: 3000

log-targets:
  tgt1:
    type: loki
    override: merge
    location: http://10.42.0.222:3100/loki/api/v1/push
    services: [all]
    labels:
      owner: user-$OWNER
      address: http://${IP}:${PORT}

Run Pebble:

PEBBLE_DEBUG=1 go run ./cmd/pebble run

and verify that the labels are being correctly interpreted.

Links

Jira card: JUJU-4643

Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good change. Just need to consider the problem we discovered next week.

internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/loki/loki.go Outdated Show resolved Hide resolved
@barrettj12 barrettj12 marked this pull request as ready for review October 3, 2023 05:14
Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor suggestions and a suggested fix for the SetLabels concurrency problem.

README.md Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/loki/loki.go Outdated Show resolved Hide resolved
internals/overlord/logstate/loki/loki.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/plan/plan.go Outdated Show resolved Hide resolved
@barrettj12

This comment was marked as resolved.

Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, thanks for sticking with this.

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. Some further notes on one unrelated change done, though:

internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
@jnsgruk jnsgruk requested a review from niemeyer October 27, 2023 10:35
@canonical canonical deleted a comment from barrettj12 Nov 7, 2023
Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone through this once more and added a bunch of nit comments. Let's fix those up and get the PR description into shape, do final QA, and then ping me and we can try to merge this week!

README.md Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/loki/loki.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager_test.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager_test.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer_test.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer_test.go Outdated Show resolved Hide resolved
@@ -218,6 +219,92 @@ func (s *gathererSuite) TestRetryLoki(c *C) {
}
}

// Test to catch race conditions in gatherer
func (s *gathererSuite) TestRace(c *C) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this name more specific, like TestXyzRace with what it's actually testing a race in or for? And after that let's remove the redundant doc comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty non-specific test, it's just designed to hammer the log gatherer and find any potential race conditions. It's now finding some artificial race conditions though. Needs a rewrite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the test so it passes. I'm still not sure what to call it though - do you have any suggestions?

Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'd like @hpidcock to give this one last look, but I'm ready to merge.

hpidcock

This comment was marked as resolved.

The i/o timeout was coming from the context being cancelled
during the "dial" stage. Increase the context timeout to avoid this.
@barrettj12 barrettj12 merged commit ae3351f into master Nov 10, 2023
17 checks passed
@barrettj12 barrettj12 deleted the labels branch November 10, 2023 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants