Merge snapcore/snapd/state changes #259

dmitry-lyfar · 2023-07-23T21:52:38Z

Hello there! As discussed with @niemeyer, the latest changes in snapd's state management were merged into Pebble. Please find the summary below:

State:

Add WaitStatus support that allows a task to wait until further action to continue its execution. The WaitStatus is treated mostly as DoneStatus, except it is not ready status.
Add CopyState function.
Add Change.AbortUnreadyLanes.
Add Change.CheckTaskDependencies to check if tasks have circular dependencies.
Add task and change callbacks invoked on a status change.
Implement timings.GetSaver interface in state.State. Note, however, the snapcore/snapd/timings package that provides the interface is not included in this PR. Pebble has a similar "timing" package. It might be reasonable to introduce "timings" separately to replace the "timing" package to keep concerns (state and timings) separated.

State engine:

Add StateStarterUp interface to perform possible expensive initialisation in a separate StartUp method.

Overlord:

Implement the StateStarterUp interface.

Daemon:

Call Overlord StartUp during the start process to complete its initialisation. Note, that there is no obvious requirement to introduce this separation for Pebble. It can still initialise Overlord in New(). The separation was mimicked to maintain better compatibility with possible future mergers of the snapcore/snapd/overlord/* changes and unit tests.

Misc:

Add jsonutil package required by the new state's CopyState function.
Rename osutil.CanStat to osutil.FileExist.
Rename task.FakeTime to task.MockTime.
Add testutil.ErrorIs and testutil.DeepUnsortedMatches checkers.

Unit-testing:

Pebble's unit tests fail occasionally if ran as:

go test ./... -count 10

This PR does not appear to introduce any regression to the current tests' behaviour.

State: - Add WaitStatus support that allows a task to wait until further action to continue its execution. The WaitStatus is treated mostly as DoneStatus, except it is not a ready status. - Add CopyState function. - Add Change.AbortUnreadyLanes. - Add Change.CheckTaskDependencies to check if tasks have circular dependencies. - Add task and change callbacks invoked on a status change. - Implement timings.GetSaver interface in state.State. Note, however, the snapcore/snapd/timings package that provides the interface is not included in this PR. pebble has a similar "timing" package. It might be reasonable to introduce "timings" separately to replace the "timing" package to keep concerns (state and timings) separated. State engine: - Add StateStarterUp interface to perform possible expensive initialisation in a separate StartUp method. Overlord: - Implement the StateStarterUp interface Daemon: - Call Overlord StartUp during the start process to complete its initialisation. Note, that there is no obvious requirement to introduce this separation for pebble. It can still initialise Overlord in New(). The separation was mimicked to maintain better compatibility with possible future mergers of the snapcore/snapd/overlord/* changes. Misc: - Add jsonutil package required by the new state's CopyState function. - Rename osutil.CanStat to osutil.FileExist. - Rename task.FakeTime to task.MockTime. - Add testutil.ErrorIs and testutil.DeepUnsortedMatches checkers.

flotter

Enormous amount of work, thank you.

I have a couple of first pass comments added with a few questions here:

I found that some changes pulled in here belong to commits with additional changes and comments providing the original context. It is extremely difficult for me to review some of the code without the original context explained. I wonder if some of the groups of changes would be better to cherry-pick from the original commit as a whole, e.g:

commit 4815875ea89eace1349d30fe863371e251e08ebd
overlord: only initialize loopTomb when needed

I would break this PR up into smaller logical commits. It makes reviewing easier as one can select only on commit, and use that as context to understand which files are impacted?
I wonder if some things like jsonutil and testutil could be moved to x-go to prevent this exercise from repeating itself on those common code areas?

flotter · 2023-07-24T10:17:37Z

internals/cli/cmd_run.go

+	err = d.Start()
+	if err != nil {
+		return err
+	}


I think we can simplify this for error only returns to mimic d.Init() ?

if err := d.Start(); err != nil { return err }

flotter · 2023-07-24T10:37:01Z

internals/daemon/daemon.go

+func (d *Daemon) Overlord() *overlord.Overlord {
+	return d.overlord
+}
+


Currently, daemon tests are bundled as part of the same package (not daemon_test). There are currently roughly 68 occurrences of d.overlord.xxx obtaining the overlord instance directly using the private struct field.

I think we should either stick to the earlier scheme, or update all the d.overlord references in tests.

flotter · 2023-07-24T10:51:37Z

internals/daemon/daemon.go

 	}
 	if d.overlord == nil {
 		panic("internal error: no Overlord")
 	}

 	d.StartTime = time.Now()

+	// now perform expensive overlord/manages initialization


I think the comment intends to read:
// now perform expensive overlord/manager initialization => "manager"

flotter · 2023-07-24T10:57:41Z

internals/testutil/containschecker.go

+// -*- Mode: Go; indent-tabs-mode: t -*-
+
+/*
+ * Copyright (C) 2015-2018 Canonical Ltd


From a previous comment I've received in Pebble, I think the preference is to drop the formatting hint, as this is mostly not needed anymore today.

It looks like we could update the copyright date to current, as the patch moved it backwards actually.

Please consider everywhere.

flotter · 2023-07-24T11:02:38Z

internals/testutil/containschecker.go

An observation: testutil according to my investigation depends only on reaper and logger. It seems like this package and deps could be moved to x-go relatively easily, removing the need for future cross-project maintenance.

flotter · 2023-07-24T12:15:49Z

internals/osutil/stat.go

 // It may return false on permission issues.
-func CanStat(path string) bool {
+func FileExists(path string) bool {


I would group this rename (and the subsequent changes) in its own commit in the PR. It makes it very easy to review such a change in isolation of the rest of the PR by selecting only one commit at a time.

flotter · 2023-07-24T12:30:22Z

internals/overlord/overlord.go

+}
+
+// StartUp proceeds to run any expensive Overlord or managers initialization.
+// After this is done once it is a noop.


// After this is done once, it is a noop.

flotter · 2023-07-24T12:49:00Z

internals/overlord/overlord.go

+	if o.loopTomb == nil {
+		o.loopTomb = new(tomb.Tomb)
+	}


Could you add some context for this? However, it does not seem to include the whole feature for lazy overlord startup.

This appears to be related to:

commit 4815875ea89eace1349d30fe863371e251e08ebd Author: Alberto Mardegan <[email protected]> Date: Mon Dec 6 10:25:07 2021 +0300 overlord: only initialize loopTomb when needed

flotter · 2023-07-24T12:58:33Z

internals/overlord/state/change.go

@@ -1,7 +1,7 @@
 // -*- Mode: Go; indent-tabs-mode: t -*-

 /*
- * Copyright (c) 2016 Canonical Ltd
+ * Copyright (C) 2016-2023 Canonical Ltd


flotter · 2023-07-24T13:04:45Z

internals/overlord/state/change.go

@@ -157,7 +181,6 @@ type marshalledChange struct {
 	Clean   bool                        `json:"clean,omitempty"`
 	Data    map[string]*json.RawMessage `json:"data,omitempty"`
 	TaskIDs []string                    `json:"task-ids,omitempty"`
-	Lanes   int                         `json:"lanes,omitempty"`


Is this safe for us to just remove? What will be the impact for current systems ?

dmitry-lyfar · 2023-10-13T02:55:55Z

Hi @benhoyt , @flotter (cc @niemeyer).

After a discussion with @flotter over the comments above, it is suggested to try the following approach for this PR instead:

Separate changes into smaller logical blocks that can be reviewed and accepted independently. I’ll try to keep them close to ~500 lines.
Do not rename FakeTime to MockTime as in the snapd’s code base.
Do not introduce the StateStarterUp interface (i.e. separating the start up logic from a manager's creation in a StartUp method); all Pebble managers initialise swiftly enough at creation. Thus, we'll avoid numerous changes to the Daemon and its existing test suite.
Do not introduce GetMaybeTimings method to State as it involves another dependency on the snapcore/snapd/timings.GetSaver interface and is not really useful in Pebble. It appears the approaches to measuring spans in Pebble and snapd diverted substantially (perhaps, Pebble does not save/use timings now at all).

Thus, the changesets I imagine this PR to be divided into (roughly):

Extend testutil with testutil.ErrorIs to prepare for the move of the updated state package state, change and task data entry functions. These functions return NoStateError with a key name if a data key is not found contrary to the previously used ErrNoState. Thus, the direct comparison of a returned error with ErrNoState instead of errors.Is would fail.
Extend testutil with testutil.DeepUnsortedMatches. This checker is used by 2 new large TaskRunner unit tests. It’s a useful checker to have for other Pebble unit tests too.
(Optional) Update osutil’s CanStat -> FileExist and corresponding packages/tests using it. (CanStat, FileExists). This is something I’m on the fence with as CanStat is used twice in the state’s package code: to load and copy the state. The majority of osutil.CanStat usage is elsewhere in the Pebble’s code base. Should we, instead, just keep the state package to use CanStat and avoid noisy changes?
Add StartOfOperation time tracking to Overlord so that Overlord can use it to prune outdated changes (the new state package introduces the start of operation time to the state.Prune parameters so a change, when considered for pruning, will use the start of operation time as its spawn time if it was spawned before the latest overlord start).
Add new changes from the state package (excluding CopyState functionality). Also, update code and tests that would now fail on state entry functions error checking (patch, overlord, daemon).
(Optional) Add CopyState function (copy.go from the original state package) and the required safejson package. This is not used by Pebble and, given it is a completely isolated piece of functionality, it can now be dropped and introduced when/if required.

Please give me your thoughts on the above. If it looks reasonable, I can go ahead, cancel this PR and send changes in stages.

benhoyt · 2023-10-13T03:25:04Z

Yes, that looks very reasonable to me, thanks @dmitry-lyfar. Some specifics:

You're right that Pebble doesn't use those span timings at all now, so I think not introducing that is the right call.
I'm not sure why CanStat diverged from FileExists. There's only an "initial commit" in Pebble, so I suspect Gustavo went through and renamed / cleaned up a few things when he originally copied the Pebble code. I think CanStat says more what it's actually checking than FileExists, so CanStat is probably "better". So apart from the fact that it'd introduce a bunch of noise in the rest of Pebble, I'd leave that one out (and rename FileExists to CanStat in snapd if anything).
CopyState -- yeah, it sounds like we should wait with that one for now.

niemeyer · 2023-10-19T09:33:11Z

We talked about this today in our call and agreed to break it down. Some more detailed feedback:

Separate changes into smaller logical blocks that can be reviewed and accepted independently.

Sounds good, thank you.

Do not rename FakeTime to MockTime as in the snapd’s code base.

Yes, we should go the opposite direction: snapd still needs to move towards Fake* terminology. It was mixed, and when it was moved to Pebble it was standardized.

Do not introduce the StateStarterUp interface (i.e. separating the start up logic from a manager's creation in a StartUp method); all Pebble managers initialise swiftly enough at creation. Thus, we'll avoid numerous changes to the Daemon and its existing test suite.

We do want to do this. The change in snapd was actually made on my request precisely after Pebble was split, because it makes little sense for machine start/restart logic to live inside a package that is responsible for general state management and work dispatch. So let's please integrate this change too, ideally on its own PR like is being proposed for everything else.

Do not introduce GetMaybeTimings method to State as it involves another

Why would we not introduce it? As I remember, we're not making extensive use of timers yet, but this is a nice feature that would be nice to have avaialble, and the fact we're not using this extensively makes the job easier supposedly.

Thus, the changesets I imagine this PR to be divided into (roughly):

From those descriptions it's not clear what is being proposed that is a feature that does not exist in snapd itself and what is a feature that is just being migrated. So it's harder to agree/disagree. In general: what is being ported from snapd in support of general state logic is good. What is being created inside Pebble itself, or that is being ported from snapd but is unrelated to the state package itself, not so great but we can talk about it. Feel free to provide more details so we can consider specific changes, either in their own PR or in our calls.

For now, I suggest closing this PR.

dmitry-lyfar · 2023-10-20T06:21:36Z

We do want to do this. The change in snapd was actually made on my request precisely after Pebble was split, because it makes little sense for machine start/restart logic to live inside a package that is responsible for general state management and work dispatch. So let's please integrate this change too, ideally on its own PR like is being proposed for everything else.

Ack.

Do not introduce GetMaybeTimings method to State as it involves another

Why would we not introduce it? As I remember, we're not making extensive use of timers yet, but this is a nice feature that would be nice to have avaialble, and the fact we're not using this extensively makes the job easier supposedly.

The interface implementation itself is tiny as it stores/retrieves raw data to the state. It is largerly used by the independent timings package which I was thinking makes sense to merge with some changes that would use it as it is not directly related to the state management.

I will add the GetSaver implementation now so it can be used later when someone introduces timings and their usage in Pebble later on.

Thus, the changesets I imagine this PR to be divided into (roughly):

From those descriptions it's not clear what is being proposed that is a feature that does not exist in snapd itself and what is a feature that is just being migrated. So it's harder to agree/disagree. In general: what is being ported from snapd in support of general state logic is good. What is being created inside Pebble itself, or that is being ported from snapd but is unrelated to the state package itself, not so great but we can talk about it. Feel free to provide more details so we can consider specific changes, either in their own PR or in our calls.

Sounds good, closing this one now.

This PR is made in preparation to merge the latest snapd state package changes into pebble as per the plan discussed in #259. Both ErrorIs and DeepUnsortedMatches checkers are used by the latest state package test suites. These are cherry-picked from the snapd repository with some minor changes (fixed Pebble linter's warnings, updated license header).

flotter reviewed Jul 24, 2023

View reviewed changes

dmitry-lyfar closed this Oct 20, 2023

benhoyt mentioned this pull request Nov 15, 2023

feat(stateengine): merge StateStarterUp interface from snapd #327

Closed

This was referenced Nov 17, 2023

feat: add StateStarterUp to move expensive init ops to dedicated methods #330

Merged

feat(testutil): add ErrorIs and DeepUnsortedMatches checkers #331

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge snapcore/snapd/state changes #259

Merge snapcore/snapd/state changes #259

dmitry-lyfar commented Jul 23, 2023

flotter left a comment

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

flotter Jul 24, 2023

dmitry-lyfar commented Oct 13, 2023

benhoyt commented Oct 13, 2023

niemeyer commented Oct 19, 2023

dmitry-lyfar commented Oct 20, 2023

Merge snapcore/snapd/state changes #259

Merge snapcore/snapd/state changes #259

Conversation

dmitry-lyfar commented Jul 23, 2023

flotter left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmitry-lyfar commented Oct 13, 2023

benhoyt commented Oct 13, 2023

niemeyer commented Oct 19, 2023

dmitry-lyfar commented Oct 20, 2023