
Connect reminder service to minder server to dispatch reminders #3630

Open
Vyom-Yadav wants to merge 1 commit into stacklok:main from Vyom-Yadav:send-reminder-events

Conversation

Contributor

@Vyom-Yadav Vyom-Yadav commented Jun 16, 2024

Summary

Provide a brief overview of the changes and the issue being addressed.
Explain the rationale and any background necessary for understanding the changes.
List dependencies required by this change, if any.

Issue #2262 (Task - I)

Adds logic for sending reminders to the minder server. The minder server processes these reminders and triggers a TopicQueueReconcileRepoInit event to reconcile the repository.
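
For orientation, here is a minimal sketch of the sending side, using the event shape and topic name that appear elsewhere in this PR (the publishReminder helper and its wiring are illustrative, not the PR's actual code):

package reminder

import (
    "encoding/json"

    "github.com/ThreeDotsLabs/watermill"
    "github.com/ThreeDotsLabs/watermill/message"
    "github.com/google/uuid"
)

// RepoReminderEvent mirrors the struct reviewed further down in this PR.
type RepoReminderEvent struct {
    Project      uuid.UUID `json:"project"`
    RepositoryID int64     `json:"repository" validate:"gte=0"`
    ProviderID   uuid.UUID `json:"provider"`
}

// publishReminder sends one reminder for the minder server to pick up
// and turn into a TopicQueueReconcileRepoInit event.
func publishReminder(pub message.Publisher, project, provider uuid.UUID, repoID int64) error {
    payload, err := json.Marshal(RepoReminderEvent{
        Project:      project,
        RepositoryID: repoID,
        ProviderID:   provider,
    })
    if err != nil {
        return err
    }
    return pub.Publish("repo.reminder.event", message.NewMessage(watermill.NewUUID(), payload))
}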

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Outline how the changes were tested, including steps to reproduce and any relevant configurations.
Attach screenshots if helpful.

Tested locally (no unit tests)

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

cc @evankanderson

@Vyom-Yadav Vyom-Yadav requested a review from a team as a code owner June 16, 2024 16:09
Comment on lines +198 to +203
// Commit the transaction i.e update reminder_last_sent
// only if the messages were sent successfully
if err = tx.Commit(); err != nil {
logger.Error().Err(err).Msg("unable to commit transaction")
errorSlice = append(errorSlice, err)
}
Contributor Author

UpdateReminderLastSentById queries are part of a transaction and are only committed if the messages are published successfully. Still, the commit could fail, which would result in reminders being sent but the reminder_last_sent field not being updated (better than a false positive, i.e. reminder_last_sent updated even though reminders weren't sent).
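
For readers skimming the thread, the pattern has roughly the following shape (a sketch; the store and qtx helper names are assumed from minder's store API rather than copied from the diff):

tx, err := store.BeginTransaction()
if err != nil {
    return []error{err}
}
defer tx.Rollback()

qtx := store.GetQuerierWithTransaction(tx)
for _, repo := range repos {
    // Record reminder_last_sent inside the transaction...
    if err := qtx.UpdateReminderLastSentById(ctx, repo.ID); err != nil {
        errorSlice = append(errorSlice, err)
    }
}

// ...but only commit after the batch is published, so a failed publish
// never marks reminders as sent.
if err := publisher.Publish(topic, messages...); err != nil {
    errorSlice = append(errorSlice, fmt.Errorf("error publishing messages: %w", err))
} else if err := tx.Commit(); err != nil {
    errorSlice = append(errorSlice, err)
}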

Contributor

I think this is fine. Did you consider splitting sending the messages into a separate function, or returning early if len(messages) == 0? This might be a personal preference, but I find code with fewer indentation levels easier to read.

Something like:

if len(messages) == 0 {
   return errorSlice
}

// the rest of the code

Member

It seems like the reminder_last_sent field is mostly diagnostic, because we're actually ticking forward on the (in-memory) repository cursor, rather than using reminder_last_sent to actually dictate when repos are reconsidered. I think this is correct, but that suggests to me that strict update correctness is less necessary.

(I'm not arguing against keeping this data, but it seems like the actual mechanism that prevents re-evaluation is when the reminder is received and a rule evaluation is complete, which could be minutes after the reminder is sent on a particularly bad day.)

Comment on lines +162 to +166
// TODO: Collect Metrics
// Potential metrics:
// - Gauge: Number of reminders in the current batch
// - UpDownCounter: Average reminders sent per batch
// - Histogram: reminder_last_sent time distribution
Contributor Author

How do we want to export the metrics? And do you have any thoughts on adding these metrics?

Contributor

Hmm, I wonder what metrics would be useful to a minder operator.

As for the histogram, I think one that shows how out of date (how far past the cutoff) a repo was when it was selected would be useful.

I think the metrics would be interesting for fine-tuning the algorithm; did you see some other value in them?

Since reminder is a separate service, I guess implementation-wise it should just have its own metrics server that can be scraped separately.

Member

My $0.02 would be:

  1. Create a MeterProvider à la https://github.com/stacklok/minder/blob/main/internal/controlplane/server.go#L188.
  2. Wire the metrics into prometheus. Since this has no external interface, you can simply set up an internal http.Server to serve the metrics -- we could put other debug information there later.
  3. If you call otel.SetMeterProvider in the server setup, you don't need to worry about plumbing directly from the code recording the metric to the server; OTel will handle this in the backend (with some globals, but that's fine in this case). That means that the metrics code can do the simple thing from the OTel docs:
    // Assumes imports: "fmt", "go.opentelemetry.io/otel",
    // and "go.opentelemetry.io/otel/metric".
    var timeShift metric.Float64Histogram

    func init() {
      meter := otel.GetMeterProvider().Meter("reminder") // Or could be ""
      var err error
      timeShift, err = meter.Float64Histogram("send_delay",
        metric.WithDescription("Delay in revisit after eligible"),
        metric.WithUnit("s"))
      if err != nil {
        panic(fmt.Sprintf("couldn't register meter: %v", err))
      }
    }

    // At call site:
    timeShift.Record(ctx, sendDelay) // No need for attributes
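
(For points 1 and 2, the wiring could look roughly like the sketch below, using the OTel Prometheus exporter; the function name and port are illustrative:)

package reminder

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/prometheus"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func setupMetrics() error {
    // The Prometheus exporter doubles as a Reader for the metrics SDK.
    exporter, err := prometheus.New()
    if err != nil {
        return err
    }
    // Set globally so recording code can use otel.GetMeterProvider().
    otel.SetMeterProvider(sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)))

    // Internal-only HTTP server; other debug endpoints could live here later.
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    go func() {
        _ = http.ListenAndServe("localhost:9090", mux)
    }()
    return nil
}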

@coveralls

Coverage Status

coverage: 53.167% (-0.2%) from 53.4%
when pulling d480686 on Vyom-Yadav:send-reminder-events
into cd7a0f2 on stacklok:main.

@coveralls

Coverage Status

coverage: 53.172% (-0.2%) from 53.4%
when pulling d480686 on Vyom-Yadav:send-reminder-events
into cd7a0f2 on stacklok:main.

@coveralls

Coverage Status

coverage: 53.177% (-0.2%) from 53.4%
when pulling d480686 on Vyom-Yadav:send-reminder-events
into cd7a0f2 on stacklok:main.

@evankanderson evankanderson self-requested a review June 20, 2024 13:02
@evankanderson evankanderson linked an issue Jun 20, 2024 that may be closed by this pull request
Contributor

@jhrozek jhrozek left a comment

The code looks quite good to me. I left replies to your comments; neither of them is blocking. It wasn't clear to me whether you wanted to tackle the metrics as part of this PR, but I normally prefer smaller PRs.

Member

@evankanderson evankanderson left a comment

It looks like the current architecture re-uses the existing events configuration for Minder, since it's assuming that the repo.reminder.event events are in a Watermill instance which Minder is configured to see. I think that's fine, but I didn't recall for sure where we'd landed on that earlier. This setup is 👍 for me from a simplicity PoV.

Comment on lines +29 to +34
// Project is the project that the event is relevant to
Project uuid.UUID `json:"project"`
// RepositoryID is id of the repository to be reconciled
RepositoryID int64 `json:"repository" validate:"gte=0"`
// ProviderID is the provider of the repository
ProviderID uuid.UUID `json:"provider"`
Member

My preference would be to order these Project > Provider > Repo, which is the ownership order.
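
That is, roughly (a sketch; the struct name is inferred from RepoReminderEventFromMessage used elsewhere in this PR):

// Ownership order: the project owns the provider, which owns the repository.
type RepoReminderEvent struct {
    // Project is the project that the event is relevant to
    Project uuid.UUID `json:"project"`
    // ProviderID is the provider of the repository
    ProviderID uuid.UUID `json:"provider"`
    // RepositoryID is the id of the repository to be reconciled
    RepositoryID int64 `json:"repository" validate:"gte=0"`
}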

// Project is the project that the event is relevant to
Project uuid.UUID `json:"project"`
// RepositoryID is id of the repository to be reconciled
RepositoryID int64 `json:"repository" validate:"gte=0"`
Member

IMO, this should use the id field from the repositories table, not the repo_id from GitHub. That will help insulate this from GitHub-specific fields that are different in (for example) GitLab or BitBucket in the future.

Member

Additionally, id is the primary key. It looks like our current indexes are set up off repo_id; sorry for not noticing this previously...

Member

Looking further, it looks like we use the GitHub-matching repo_id in NewRepoReconcilerMessage already, so that argues towards using it here as well.

Comment on lines +61 to +64
validate := validator.New()
if err := validate.Struct(&evt); err != nil {
return nil, err
}
Member

What's the benefit of calling validate to check that the numeric repo_id is non-negative? The caller of this method will still need to check whether repo_id actually exists in the database, so this seems to add extra code without accomplishing the necessary validation; that validation requires a database connection, and would presumably return the actual row if needed.
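
(The database-backed check described above might look like the sketch below; GetRepositoryByRepoID is a hypothetical query name, and the errors, database/sql, and fmt imports are assumed:)

// Existence in the database is the real validation, and the lookup
// returns the row for the caller to use.
repo, err := qtx.GetRepositoryByRepoID(ctx, evt.RepositoryID)
if err != nil {
    if errors.Is(err, sql.ErrNoRows) {
        return nil, fmt.Errorf("repository %d not found: %w", evt.RepositoryID, err)
    }
    return nil, err
}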


return []error{err}
}

defer tx.Rollback()
Member

Do we need all of the following database reads and writes in a single transaction, given that neither the getRepositoryBatch call above nor the r.updateRepositoryCursor in the outer getRepositoryBatch call is part of it?

Time("previously", repo.ReminderLastSent.Time).
Msg("updating reminder_last_sent")

err = qtx.UpdateReminderLastSentById(ctx, repo.ID)
Member

Is it intentional that you record the reminder as sent before you actually send the message?

... except that with transactions, this actually sort-of happens after the message was sent, too -- you may be taking out a lock on the repo row for the duration of the iteration. Transactions can make before/after behavior hard to reason about in some cases, so I'd be cautious about introducing them when they aren't needed / aren't producing specific guarantees.

Comment on lines 183 to +184
logger.Error().Err(err).Str("repo", repo.ID.String()).Msg("unable to update reminder_last_sent")
return []error{err}
errorSlice = append(errorSlice, err)
Member

In general:

Either handle an error (by logging, recovering, etc) OR propagate an error, but don't do both.

Logging the error and also propagating it often means that you'll end up with errors being double- or even triple-logged, as different parts of the stack log the error but then pass it up to be logged in the next part of the stack.
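
(A minimal illustration of the two alternatives, in generic Go rather than the PR's code:)

// Option A: handle the error here (log it) and swallow it; the caller never sees it.
if err := doWork(); err != nil {
    logger.Error().Err(err).Msg("doWork failed")
}

// Option B: propagate it with added context, and let exactly one caller log it.
if err := doWork(); err != nil {
    return fmt.Errorf("doing work: %w", err)
}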

Comment on lines +193 to +195
if err != nil {
errorSlice = append(errorSlice, fmt.Errorf("error publishing messages: %w", err))
} else {
Member

Rather than an if/else here, generally prefer to use early-return, particularly since this seems like a terminal case.

More generally, I'm not convinced that there's a large benefit to batching the Publish messages, rather than sending each in the previous part of the loop.
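
(That is, something like the sketch below, with the publisher and transaction names assumed from the surrounding diff:)

if err := publisher.Publish(topic, messages...); err != nil {
    errorSlice = append(errorSlice, fmt.Errorf("error publishing messages: %w", err))
    return errorSlice
}

// The success path continues without an extra indentation level.
if err := tx.Commit(); err != nil {
    errorSlice = append(errorSlice, err)
}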


Comment on lines +45 to +63
func (rp *ReminderProcessor) reminderMessageHandler(msg *message.Message) error {
evt, err := remindermessages.RepoReminderEventFromMessage(msg)
if err != nil {
return fmt.Errorf("error unmarshalling reminder event: %w", err)
}

log.Info().Msgf("Received reminder event: %v", evt)

repoReconcileMsg, err := reconcilermessages.NewRepoReconcilerMessage(evt.ProviderID, evt.RepositoryID, evt.Project)
if err != nil {
return fmt.Errorf("error creating repo reconcile event: %w", err)
}

// This is a non-fatal error, so we'll just log it and continue with the next ones
if err := rp.evt.Publish(events.TopicQueueReconcileRepoInit, repoReconcileMsg); err != nil {
log.Printf("error publishing reconciler event: %v", err)
}
return nil
}
Member

It feels like this is simply forwarding messages from one queue to another; why not send the TopicQueueReconcileRepoInit message directly from reminder.go and skip this extra intermediary? (There might be a good reason, but if there is, we should capture it in a comment.)
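
(The suggested shortcut would be roughly the sketch below, assuming reminder.go has access to the same events publisher as the processor:)

// In reminder.go: publish the reconcile message directly instead of
// round-tripping through repo.reminder.event.
msg, err := reconcilermessages.NewRepoReconcilerMessage(providerID, repoID, projectID)
if err != nil {
    return fmt.Errorf("error creating repo reconcile event: %w", err)
}
if err := evt.Publish(events.TopicQueueReconcileRepoInit, msg); err != nil {
    return fmt.Errorf("error publishing reconciler event: %w", err)
}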

Development

Successfully merging this pull request may close these issues.

Add Background Reconciliation for Entities
4 participants