Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(logfwd): log forwarding implementation II #256

Merged
merged 31 commits into from
Aug 22, 2023

Conversation

barrettj12
Copy link
Contributor

@barrettj12 barrettj12 commented Jul 6, 2023

This PR contains the second part of #165, including the actual mechanics of log forwarding in Pebble. It builds upon #209 and #252. This includes an abstract logClient interface but doesn't include the specific Loki/syslog implementations - this will come in later PRs.

This is a modification of #218, designed to fix fundamental issues identified with that PR.

Current design

For each log target, there is a single "gatherer", which receives logs from a bunch of services and writes them to a "client". The gatherer runs a control loop in a separate goroutine, which waits to receive logs on a channel, and writes these to the client.

For each service, the gatherer spawns a "log puller". Each puller runs in a separate goroutine, pulling logs from an iterator on the service's ringbuffer. Pulled logs are sent to the gatherer's control loop on the shared channel.

The logClient interface represents a client to a specific type of backend. In future PRs, we will add lokiClient and syslogClient types which implement logClient.

logClient includes two methods

type logClient interface {
	Write(context.Context, servicelog.Entry) error
	Flush(context.Context) error
}

Client implementations have some freedom about the semantics of these methods.

  • For a buffering client (e.g. HTTP), Write could add the log to the client's internal buffer, while Flush would prepare and send an HTTP request with the buffered logs.
  • For a non-buffering client (TCP?), Write could serialise the log directly to the open connection, while Flush would be a no-op.

Teardown

Gracefully stopping a log gatherer is a complex process with multiple steps.

  • At the instant we attempt to stop the gatherer, it may be in the middle of flushing its client. So, wait a small amount of time for the client to finish flushing, but cancel the flush if this takes too long.
  • The service may have emitted some final logs on shutdown. Give each puller some time to pull the final logs from its iterator - but again, force kill it if this is taking too long.
  • Once the pullers are all finished, we must have received all the logs we're gonna get, so we can safely shut down the main loop.
  • Do a final flush of the client to send off any remaining buffered logs.

All this logic is encapsulated in the gatherer.stop() method.

QA

I've included some sample implementations of logClient here. They just print the logs to stdout. These can be used to verify that the log forwarding mechanics work properly.

Create a simple logging service, e.g.

#!/bin/bash
while true; do
  echo "Hello"
  sleep 1
done

and a simple plan using this service

services:
  svc1: &logsvc
    command: /home/jb/git/canonical/pebble/logfwd-impl2/pebble/logsvc
    startup: enabled
    override: merge
  svc2: *logsvc

log-targets:
  tgt1:
    override: merge
    services: [all]
    type: loki
    location: unnecessary

Add the fake.go file to the repo.

Comment out the following line

return nil, fmt.Errorf("unknown type %q for log target %q", target.Type, target.Name)

and replace it with e.g.

return &nonBufferingClient{}, nil          // unbuffered
return &bufferingClient{}, nil             // unlimited-size buffer, will flush on timeout only
return &bufferingClient{threshold: 3}, nil // buffer with max size: 3 logs

You might also want to change the gatherer's tick period:

tickPeriod = 1 * time.Second

Run Pebble with

go run ./cmd/pebble run

and verify the logs are printing to stdout.


JUJU-3776

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Jordan! Even if there are some loose ends, this is clearly a good step in the right direction. Here is some early feedback to help shaping it up.

internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far, I think we need to be a bit more careful around defining a graceful termination path for pullers (go until the end of the ring buffer), gatherers (wait up to a timeout for the pullers to stop, then wait for up to a timeout for the client to flush).

The graceful shutdown path would be used when a service restarts or when pebble shuts down. All cases around reconfiguration can be much more forceful in ending all log forwarding for unwanted targets/services.

internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
internals/overlord/logstate/manager.go Outdated Show resolved Hide resolved
Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left a few comments, all relatively minor.

internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
@jnsgruk jnsgruk requested a review from niemeyer August 1, 2023 08:11
@jnsgruk jnsgruk added the Priority Look at me first label Aug 1, 2023
Errors from the gatherer tomb are not being surfaced to the user, as there is
nothing checking tomb.Err().
Additionally, a write/flush failure should not bring down the gatherer - it
could be e.g. a temporary outage.
Better to log the failures rather than exiting the main loop.
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this, Jordan. We're clearly approaching the end of the road.

internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/stateengine.go Outdated Show resolved Hide resolved
internals/overlord/stateengine_test.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
Copy link
Contributor

@flotter flotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really impressive PR. I have learnt a lot from studying your code. I have really tried to give you an in depth review - I hope some things will be useful for you. If not, please discard.

internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/stateengine.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/overlord.go Outdated Show resolved Hide resolved
internals/overlord/servstate/handlers.go Outdated Show resolved Hide resolved
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Jordan. This is looking great.

There's still one important issue to address, unfortunately:

internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
hpidcock and others added 3 commits August 14, 2023 14:06
This avoids runtime panic from calling Go after all goroutines have returned.

Also, we need to kill the pullerGroup tomb in Stop() to tell the pullerGroup
that it's time to shut down.
Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Left a few very nitty comments (some of which are "take it or leave it" naming suggestions), but I'm happy with this.

internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Outdated Show resolved Hide resolved
internals/overlord/logstate/gatherer.go Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Show resolved Hide resolved
internals/overlord/logstate/puller.go Outdated Show resolved Hide resolved
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy where this ended up. Thank-you for all the hard work on it.

internals/overlord/servstate/manager.go Show resolved Hide resolved
internals/overlord/logstate/fake.go Outdated Show resolved Hide resolved
internals/overlord/logstate/puller.go Show resolved Hide resolved
@jnsgruk jnsgruk requested a review from niemeyer August 15, 2023 07:47
@barrettj12 barrettj12 changed the title [JUJU-3776] Log forwarding implementation II feat(logfwd): log forwarding implementation II Aug 16, 2023
// This goroutine lives for the lifetime of the pullerGroup. This is so that,
// if needed, we can safely remove all pullers and then add more, without
// causing a panic (tombs can't be reused once all the tracked goroutines
// have finished).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gustavo said this is a nice comment. :-)

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes. Looks good! +1

@benhoyt benhoyt merged commit 315ef31 into canonical:master Aug 22, 2023
13 checks passed
@benhoyt
Copy link
Contributor

benhoyt commented Aug 22, 2023

@barrettj12 We discussed this at our APAC spec review meeting live with Gustavo and agreed to merge. Merge party!

@benhoyt benhoyt deleted the logfwd-impl2 branch August 22, 2023 07:36
@jnsgruk jnsgruk mentioned this pull request Aug 31, 2023
niemeyer pushed a commit that referenced this pull request Sep 25, 2023
Add a loki.Client type that encodes sends Pebble logs to Loki. This implements the logstate.logClient interface introduced in #256.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority Look at me first
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants