Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge snapcore/snapd/state changes #259

Closed

Conversation

dmitry-lyfar
Copy link
Contributor

Hello there! As discussed with @niemeyer, the latest changes in snapd's state management were merged into Pebble. Please find the summary below:

State:

  • Add WaitStatus support that allows a task to wait until further action to continue its execution. The WaitStatus is treated mostly as DoneStatus, except it is not ready status.
  • Add CopyState function.
  • Add Change.AbortUnreadyLanes.
  • Add Change.CheckTaskDependencies to check if tasks have circular dependencies.
  • Add task and change callbacks invoked on a status change.
  • Implement timings.GetSaver interface in state.State. Note, however, the snapcore/snapd/timings package that provides the interface is not included in this PR. Pebble has a similar "timing" package. It might be reasonable to introduce "timings" separately to replace the "timing" package to keep concerns (state and timings) separated.

State engine:

  • Add StateStarterUp interface to perform possible expensive initialisation in a separate StartUp method.

Overlord:

  • Implement the StateStarterUp interface.

Daemon:

  • Call Overlord StartUp during the start process to complete its initialisation. Note, that there is no obvious requirement to introduce this separation for Pebble. It can still initialise Overlord in New(). The separation was mimicked to maintain better compatibility with possible future mergers of the snapcore/snapd/overlord/* changes and unit tests.

Misc:

  • Add jsonutil package required by the new state's CopyState function.
  • Rename osutil.CanStat to osutil.FileExist.
  • Rename task.FakeTime to task.MockTime.
  • Add testutil.ErrorIs and testutil.DeepUnsortedMatches checkers.

Unit-testing:

Pebble's unit tests fail occasionally if ran as:

go test ./... -count 10

This PR does not appear to introduce any regression to the current tests' behaviour.

State:
- Add WaitStatus support that allows a task to wait until further action
  to continue its execution. The WaitStatus is treated mostly as
  DoneStatus, except it is not a ready status.
- Add CopyState function.
- Add Change.AbortUnreadyLanes.
- Add Change.CheckTaskDependencies to check if tasks have circular
  dependencies.
- Add task and change callbacks invoked on a status change.
- Implement timings.GetSaver interface in state.State. Note, however,
  the snapcore/snapd/timings package that provides the interface is not
  included in this PR. pebble has a similar "timing" package. It might
  be reasonable to introduce "timings" separately to replace the
  "timing" package to keep concerns (state and timings) separated.

State engine:
- Add StateStarterUp interface to perform possible expensive
  initialisation in a separate StartUp method.

Overlord:
- Implement the StateStarterUp interface

Daemon:
- Call Overlord StartUp during the start process to complete its
  initialisation. Note, that there is no obvious requirement to
  introduce this separation for pebble. It can still initialise Overlord
  in New(). The separation was mimicked to maintain better compatibility
  with possible future mergers of the snapcore/snapd/overlord/* changes.

Misc:
- Add jsonutil package required by the new state's CopyState function.
- Rename osutil.CanStat to osutil.FileExist.
- Rename task.FakeTime to task.MockTime.
- Add testutil.ErrorIs and testutil.DeepUnsortedMatches checkers.
Copy link
Contributor

@flotter flotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enormous amount of work, thank you.

I have a couple of first pass comments added with a few questions here:

  1. I found that some changes pulled in here belong to commits with additional changes and comments providing the original context. It is extremely difficult for me to review some of the code without the original context explained. I wonder if some of the groups of changes would be better to cherry-pick from the original commit as a whole, e.g:
commit 4815875ea89eace1349d30fe863371e251e08ebd
overlord: only initialize loopTomb when needed
  1. I would break this PR up into smaller logical commits. It makes reviewing easier as one can select only on commit, and use that as context to understand which files are impacted?

  2. I wonder if some things like jsonutil and testutil could be moved to x-go to prevent this exercise from repeating itself on those common code areas?

Comment on lines +185 to +188
err = d.Start()
if err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify this for error only returns to mimic d.Init() ?

	if err := d.Start(); err != nil {
		return err
	}

Comment on lines +388 to +391
func (d *Daemon) Overlord() *overlord.Overlord {
return d.overlord
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, daemon tests are bundled as part of the same package (not daemon_test). There are currently roughly 68 occurrences of d.overlord.xxx obtaining the overlord instance directly using the private struct field.

I think we should either stick to the earlier scheme, or update all the d.overlord references in tests.

}
if d.overlord == nil {
panic("internal error: no Overlord")
}

d.StartTime = time.Now()

// now perform expensive overlord/manages initialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment intends to read:
// now perform expensive overlord/manager initialization => "manager"

Comment on lines +1 to +4
// -*- Mode: Go; indent-tabs-mode: t -*-

/*
* Copyright (C) 2015-2018 Canonical Ltd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. From a previous comment I've received in Pebble, I think the preference is to drop the formatting hint, as this is mostly not needed anymore today.

  2. It looks like we could update the copyright date to current, as the patch moved it backwards actually.

Please consider everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An observation: testutil according to my investigation depends only on reaper and logger. It seems like this package and deps could be moved to x-go relatively easily, removing the need for future cross-project maintenance.

// It may return false on permission issues.
func CanStat(path string) bool {
func FileExists(path string) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would group this rename (and the subsequent changes) in its own commit in the PR. It makes it very easy to review such a change in isolation of the rest of the PR by selecting only one commit at a time.

}

// StartUp proceeds to run any expensive Overlord or managers initialization.
// After this is done once it is a noop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// After this is done once, it is a noop.

Comment on lines +307 to +309
if o.loopTomb == nil {
o.loopTomb = new(tomb.Tomb)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some context for this? However, it does not seem to include the whole feature for lazy overlord startup.

This appears to be related to:

commit 4815875ea89eace1349d30fe863371e251e08ebd
Author: Alberto Mardegan <[email protected]>
Date:   Mon Dec 6 10:25:07 2021 +0300

    overlord: only initialize loopTomb when needed

@@ -1,7 +1,7 @@
// -*- Mode: Go; indent-tabs-mode: t -*-

/*
* Copyright (c) 2016 Canonical Ltd
* Copyright (C) 2016-2023 Canonical Ltd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lower case (c) 2023 Canonical Ltd is the norm ? New copyright notices drop the starting date.

@@ -157,7 +181,6 @@ type marshalledChange struct {
Clean bool `json:"clean,omitempty"`
Data map[string]*json.RawMessage `json:"data,omitempty"`
TaskIDs []string `json:"task-ids,omitempty"`
Lanes int `json:"lanes,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe for us to just remove? What will be the impact for current systems ?

@dmitry-lyfar
Copy link
Contributor Author

Hi @benhoyt , @flotter (cc @niemeyer).

After a discussion with @flotter over the comments above, it is suggested to try the following approach for this PR instead:

  • Separate changes into smaller logical blocks that can be reviewed and accepted independently. I’ll try to keep them close to ~500 lines.
  • Do not rename FakeTime to MockTime as in the snapd’s code base.
  • Do not introduce the StateStarterUp interface (i.e. separating the start up logic from a manager's creation in a StartUp method); all Pebble managers initialise swiftly enough at creation. Thus, we'll avoid numerous changes to the Daemon and its existing test suite.
  • Do not introduce GetMaybeTimings method to State as it involves another dependency on the snapcore/snapd/timings.GetSaver interface and is not really useful in Pebble. It appears the approaches to measuring spans in Pebble and snapd diverted substantially (perhaps, Pebble does not save/use timings now at all).

Thus, the changesets I imagine this PR to be divided into (roughly):

  • Extend testutil with testutil.ErrorIs to prepare for the move of the updated state package state, change and task data entry functions. These functions return NoStateError with a key name if a data key is not found contrary to the previously used ErrNoState. Thus, the direct comparison of a returned error with ErrNoState instead of errors.Is would fail.
  • Extend testutil with testutil.DeepUnsortedMatches. This checker is used by 2 new large TaskRunner unit tests. It’s a useful checker to have for other Pebble unit tests too.
  • (Optional) Update osutil’s CanStat -> FileExist and corresponding packages/tests using it. (CanStat, FileExists). This is something I’m on the fence with as CanStat is used twice in the state’s package code: to load and copy the state. The majority of osutil.CanStat usage is elsewhere in the Pebble’s code base. Should we, instead, just keep the state package to use CanStat and avoid noisy changes?
  • Add StartOfOperation time tracking to Overlord so that Overlord can use it to prune outdated changes (the new state package introduces the start of operation time to the state.Prune parameters so a change, when considered for pruning, will use the start of operation time as its spawn time if it was spawned before the latest overlord start).
  • Add new changes from the state package (excluding CopyState functionality). Also, update code and tests that would now fail on state entry functions error checking (patch, overlord, daemon).
  • (Optional) Add CopyState function (copy.go from the original state package) and the required safejson package. This is not used by Pebble and, given it is a completely isolated piece of functionality, it can now be dropped and introduced when/if required.

Please give me your thoughts on the above. If it looks reasonable, I can go ahead, cancel this PR and send changes in stages.

@benhoyt
Copy link
Contributor

benhoyt commented Oct 13, 2023

Yes, that looks very reasonable to me, thanks @dmitry-lyfar. Some specifics:

  • You're right that Pebble doesn't use those span timings at all now, so I think not introducing that is the right call.
  • I'm not sure why CanStat diverged from FileExists. There's only an "initial commit" in Pebble, so I suspect Gustavo went through and renamed / cleaned up a few things when he originally copied the Pebble code. I think CanStat says more what it's actually checking than FileExists, so CanStat is probably "better". So apart from the fact that it'd introduce a bunch of noise in the rest of Pebble, I'd leave that one out (and rename FileExists to CanStat in snapd if anything).
  • CopyState -- yeah, it sounds like we should wait with that one for now.

@niemeyer
Copy link
Contributor

We talked about this today in our call and agreed to break it down. Some more detailed feedback:

Separate changes into smaller logical blocks that can be reviewed and accepted independently.

Sounds good, thank you.

Do not rename FakeTime to MockTime as in the snapd’s code base.

Yes, we should go the opposite direction: snapd still needs to move towards Fake* terminology. It was mixed, and when it was moved to Pebble it was standardized.

Do not introduce the StateStarterUp interface (i.e. separating the start up logic from a manager's creation in a StartUp method); all Pebble managers initialise swiftly enough at creation. Thus, we'll avoid numerous changes to the Daemon and its existing test suite.

We do want to do this. The change in snapd was actually made on my request precisely after Pebble was split, because it makes little sense for machine start/restart logic to live inside a package that is responsible for general state management and work dispatch. So let's please integrate this change too, ideally on its own PR like is being proposed for everything else.

Do not introduce GetMaybeTimings method to State as it involves another

Why would we not introduce it? As I remember, we're not making extensive use of timers yet, but this is a nice feature that would be nice to have avaialble, and the fact we're not using this extensively makes the job easier supposedly.

Thus, the changesets I imagine this PR to be divided into (roughly):

From those descriptions it's not clear what is being proposed that is a feature that does not exist in snapd itself and what is a feature that is just being migrated. So it's harder to agree/disagree. In general: what is being ported from snapd in support of general state logic is good. What is being created inside Pebble itself, or that is being ported from snapd but is unrelated to the state package itself, not so great but we can talk about it. Feel free to provide more details so we can consider specific changes, either in their own PR or in our calls.

For now, I suggest closing this PR.

@dmitry-lyfar
Copy link
Contributor Author

We do want to do this. The change in snapd was actually made on my request precisely after Pebble was split, because it makes little sense for machine start/restart logic to live inside a package that is responsible for general state management and work dispatch. So let's please integrate this change too, ideally on its own PR like is being proposed for everything else.

Ack.

Do not introduce GetMaybeTimings method to State as it involves another

Why would we not introduce it? As I remember, we're not making extensive use of timers yet, but this is a nice feature that would be nice to have avaialble, and the fact we're not using this extensively makes the job easier supposedly.

The interface implementation itself is tiny as it stores/retrieves raw data to the state. It is largerly used by the independent timings package which I was thinking makes sense to merge with some changes that would use it as it is not directly related to the state management.

I will add the GetSaver implementation now so it can be used later when someone introduces timings and their usage in Pebble later on.

Thus, the changesets I imagine this PR to be divided into (roughly):

From those descriptions it's not clear what is being proposed that is a feature that does not exist in snapd itself and what is a feature that is just being migrated. So it's harder to agree/disagree. In general: what is being ported from snapd in support of general state logic is good. What is being created inside Pebble itself, or that is being ported from snapd but is unrelated to the state package itself, not so great but we can talk about it. Feel free to provide more details so we can consider specific changes, either in their own PR or in our calls.

Sounds good, closing this one now.

benhoyt pushed a commit that referenced this pull request Nov 20, 2023
This PR is made in preparation to merge the latest snapd state package
changes into pebble as per the plan discussed in
#259. 

Both ErrorIs and DeepUnsortedMatches checkers are used by the latest
state package test suites. These are cherry-picked from the snapd
repository with some minor changes (fixed Pebble linter's warnings,
updated license header).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants