Release Process

Ryan Oaks edited this page Aug 14, 2023 · 81 revisions

Overview

The google and google-beta providers release new minor versions approximately once a week, and major versions approximately once a year. Patch releases are created on an as-needed basis for emergency bugfixes (such as if a popular resource has become unusable in a minor version). The google and google-beta provider versions are kept in sync.

Running a release

Releases are run by the Bug Onduty (BOD), who releases the prior BOD's tag on the Monday of their BOD week and creates a new tag for the following week on Wednesday.

On Monday

  1. Start the release before 12:00pm PST. If you miss that cutoff, make the release the next day.

  2. Confirm that the release branches are ready and the changelogs are merged - last week's onduty should have created them.

    • Check that the latest commit SHA on the release branch matches the TeamCity build number for the Tuesday night (Wednesday morning) run
    • Check the style of the changelog (the On Wednesday section has a bullet point with common violations to look for)
    • Merge the changelog PRs
  3. Kick off the release process for TPG. If the provider repo's remote is not named upstream (it is often origin), substitute your remote name in the commands below.

    • git pull upstream --tags
    • git checkout release-X.Y.Z
    • git tag vX.Y.Z
    • git push upstream vX.Y.Z

    This will trigger a GitHub Action to run the release for the tagged commit.

  4. Wait until the previous step is successful, and double-check that releases.hashicorp.com/terraform-provider-google shows the current release. A failed release often requires manual cleanup, so confirm that TPG succeeded before starting TPGB rather than risk manual work on multiple builds.

  5. Confirm that the release notes for the published GitHub release are correct.

    • The release workflow creates the release notes automatically from the CHANGELOG, and relies on it being in a standardised format.
  6. Kick off the release process for TPGB.

    • git pull upstream --tags
    • git checkout release-X.Y.Z
    • git tag vX.Y.Z
    • git push upstream vX.Y.Z

    This will trigger a GitHub Action to run the release for the tagged commit.

  7. Wait until the previous step is successful. Double-check releases.hashicorp.com/terraform-provider-google-beta shows the current release.

  8. Confirm that the release notes for the published GitHub release are correct.

    • The release workflow creates the release notes automatically from the CHANGELOG, and relies on it being in a standardised format.
  9. After the release has been confirmed, update the CHANGELOG on the main branch of both the TPG and TPGB repositories with the correct headers

    • The simplest way to make this change is to use the GitHub UI and commit directly to main: TPG changelog and TPGB changelog
    • The newly released header will need the correct release date. ex: ## 3.18.0 (April 20, 2020)
    • The top of the changelog should have the next unreleased version as a heading. ex: ## 3.19.0 (Unreleased)
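The two headers in the last step can be generated rather than typed by hand. The following is a minimal sketch, assuming the next release is always the next minor version; `changelog_headers` is a hypothetical helper and not part of the release tooling.

```shell
# Hypothetical helper (not part of the release tooling): print the two
# CHANGELOG headers needed after releasing version $1 on date $2.
changelog_headers() {
  released="$1"
  release_date="$2"
  major="${released%%.*}"   # X from X.Y.Z
  rest="${released#*.}"     # Y.Z
  minor="${rest%%.*}"       # Y
  # Assumption: the next unreleased version is the next minor release.
  printf '## %s.%s.0 (Unreleased)\n\n' "$major" "$((minor + 1))"
  printf '## %s (%s)\n' "$released" "$release_date"
}

changelog_headers 3.18.0 "April 20, 2020"
```

The output can be pasted over the existing unreleased header at the top of each repository's CHANGELOG.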

On Wednesday

  1. Identify the commit SHA used by the builds from this morning, for both TPG and TPGB.
    • To do this, navigate to the GA and Beta providers' nightly test projects in TeamCity Cloud.
    • Click on any of the package build configurations listed in a project to view its past builds.
    • Find the build that was triggered on today's date.
    • The commit SHA tested in the nightly tests for that provider can be found by either of these methods:
      • Looking at the build's id ; all builds triggered in nightly tests set their id as the shortened version of the commit SHA
      • Clicking on the build and searching for "build.vcs.number" in the Parameters tab
  2. For each repository, run the following release-cutting script, filling in values as appropriate.
# Fixed values: consider setting them in `.bash_profile` or `.bashrc`
# REMOTE is the name of the primary repo's remote on your machine. Typically `upstream` or `origin`
REMOTE=upstream
# MM_REPO should point to your checked-out copy of the GoogleCloudPlatform/magic-modules repo
MM_REPO="path/to/magic-modules"
# https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token, no permissions
export GITHUB_TOKEN=

# Fill these in each time
# COMMIT_SHA is build.vcs.number in TeamCity
COMMIT_SHA= 
RELEASE_VERSION=4.XX.0
PREVIOUS_RELEASE_VERSION=4.XX.0

COMMIT_SHA_OF_LAST_RELEASE=`git merge-base main v${PREVIOUS_RELEASE_VERSION}`
REPO_NAME=$(basename $(git rev-parse --show-toplevel))
# use [ -n "$COMMIT_SHA" ] to make sure COMMIT_SHA is set, `git checkout` is a valid command on its own
git pull $REMOTE main --tags && [ -n "$COMMIT_SHA" ] && git checkout $COMMIT_SHA && git checkout -b release-$RELEASE_VERSION && git push -u $REMOTE release-$RELEASE_VERSION
COMMIT_SHA_OF_LAST_COMMIT_IN_CURRENT_RELEASE=`git rev-list -n 1 HEAD`
go install github.com/paultyng/changelog-gen@master
changelog-gen -repo $REPO_NAME -branch main -owner hashicorp -changelog ${MM_REPO}/.ci/changelog.tmpl -releasenote ${MM_REPO}/.ci/release-note.tmpl -no-note-label "changelog: no-release-note" $COMMIT_SHA_OF_LAST_RELEASE $COMMIT_SHA_OF_LAST_COMMIT_IN_CURRENT_RELEASE
open https://github.com/hashicorp/$REPO_NAME/edit/release-$RELEASE_VERSION/CHANGELOG.md

Note: On Go 1.17+, use go install github.com/paultyng/changelog-gen@91f787dc0cf1eedf016a444740cd28a4f1370e4 instead. Using this commit is necessary to get a version that supports the -no-note-label flag.

Note: If you see the following error: zsh: command not found: changelog-gen, make sure to change the shell from zsh to bash.
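Because the script above checks out and pushes a branch, it can help to validate the per-release values first. The helpers below are a hedged sketch and not part of the script. The function names are hypothetical, and `valid_full_sha` assumes you copied the full 40-character SHA from build.vcs.number rather than the shortened build id.

```shell
# Hypothetical pre-flight checks; these helpers are not part of the script above.

# Succeeds only if $1 looks like X.Y.Z with numeric parts (rejects 4.XX.0).
valid_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Succeeds only if $1 looks like a full 40-character hexadecimal commit SHA.
valid_full_sha() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Succeeds only if the non-empty TeamCity build id $2 is a prefix of commit SHA $1.
sha_matches_build_id() {
  [ -n "$2" ] || return 1
  case "$1" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

One way to use them is to run `valid_version "$RELEASE_VERSION" && valid_full_sha "$COMMIT_SHA"` before the `git pull` line and stop if either check fails.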

  1. We want to be reasonably confident that new releases are not introducing new bugs. Go through every test failure in the nightly builds for both GA and Beta:
    • To see all test failures for a given nightly test run:
      • Look at the Change Log tab for a given project (GA Change Log, Beta Change Log) and find the commit used to cut the release for that provider. Click on the commit SHA and navigate to the Problems & Tests tab on the new page. The page will list all test failures linked to that change.
    • In general, failing tests will fall into the following categories:
      • Old bugs / test failures. These are tests which have been failing consistently with the same error since prior to the previous release. Search for the test name (and/or error message and/or resource) in Issues and create an issue if none exists. Label as a bug if the underlying behavior is also broken, and as a "test failure" otherwise.
      • Flaky tests. These are tests which have been failing occasionally with the same error (or set of errors) for some time. It could be an issue with the test code, an inconsistent API response, or a quota issue. Search for the test name (and/or error message and/or resource) in Issues and create an issue if none exists. If you're not sure if a test is flaky, try to get it to pass in CI / locally. Label as "test failure".
      • New bugs. This is a test which you can confirm is failing due to a change in behavior that is new to this release. If the bug meets the requirements for a cherry-pick, revert the change on the release branch, and open an Issue to fix the bug prior to the next release. Be sure to let next week's Bug Onduty know this has happened so that they can make sure the fix is put in place (or otherwise take the necessary action).
  2. Update the CHANGELOG as needed
    • Ask the team if you have questions about any of their changelogs
    • The generated changelog will have a section at the top for PRs it doesn’t know how to categorise: any PR that lacks the “changelog: no-release-note” label and either has no release note or has an unrecognised release note type
    • Any PRs that were already released (e.g., as part of a patch release) need to be removed
      • This can be done by just adding a “changelog: no-release-note” label to them
    • Any style guide violations should, as a best effort and as time permits, be fixed. The usual suspects are:
      • Not starting with a service
      • Starting with a resource instead of a service
      • Starting with a service, but including it within `` marks (this was a case of mistaken guidance telling people the wrong thing)
      • Not using the past tense. A changelog is read from the perspective of a user looking for what we did in a release, and so should be written in past tense
      • Not describing the change from a user’s perspective. Our users don’t know what “computed” means, so it’s not helpful to tell them we made a field computed in a release. A better way to phrase it is to focus on the user impact: “fixed spurious diffs in google_compute_instance when no description is set”.
    • After fixing style violations, make sure that the entries in each section are ordered alphabetically
    • Make sure to only include beta-only release notes in the terraform-provider-google-beta CHANGELOG
      • When you do include them, remove (beta) from the end of any beta-only release notes
  3. Double-check for semver compliance: make sure we’re not accidentally releasing breaking changes as part of a minor release, etc.
  4. Send the PR against the release branch to the team, allowing them to review the changelog. Ensure that you respond to comments and merge it before Friday so that the release shepherd next week is not delayed.
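Two of the changelog checks above are mechanical: alphabetical ordering within a section, and removing the (beta) marker from beta-only notes. The following is a minimal sketch, assuming each entry sits on one line and that plain byte-order sorting is acceptable. Both function names are hypothetical.

```shell
# Hypothetical helpers for reviewing one CHANGELOG section.
# Both read the section's entries on stdin, one entry per line.

# Succeeds if the entries are already in (byte-order) alphabetical order.
entries_sorted() {
  LC_ALL=C sort -c 2>/dev/null
}

# Removes a trailing "(beta)" marker, and the whitespace before it, from each entry.
strip_beta_suffix() {
  sed -E 's/[[:space:]]*\(beta\)[[:space:]]*$//'
}
```

For example, `printf '%s\n' '* compute: added foo (beta)' | strip_beta_suffix` prints the entry without the marker. `entries_sorted` only reports whether a section is ordered, so fixing the order is still a manual edit.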

Cherrypicking

When to cherry-pick

In general, cherry-picks should only be used to fix bugs with extreme impact, similar to the qualifications for a backport. For example, a cherry-pick may be appropriate if Terraform fails to work entirely (i.e. a crash during plan or apply), has a "perma-destroy" issue (a perma-diff on a ForceNew field), inadvertently drops resources from state, or deletes a resource while bypassing plan. A perma-diff alone is generally not a qualifier.

Features should generally never be cherry-picked, and should wait for the next release instead. Cherry-picking features incentivizes rushing a change in, and if we miss making an integration test run it's possible we won't catch failing tests introduced by the new feature.

Delaying a release so that it includes a change is roughly identical to a cherry-pick, and we should generally only do it for a reason we would cherry-pick.

Making a cherrypick

For both google and google-beta as appropriate, prior to creating the changelog:

  1. Identify the Git SHA of the commit you intend to include
  2. Run git cherry-pick {{sha}}
  3. Verify that the entry was added to the changelog correctly
  4. Determine the appropriate set of tests to run. As a general rule-of-thumb, consider the following categories. In the case of ambiguities, more testing should be preferred.
    1. If the change impacts provider-wide files like common helper functions, provider.go/config.go (other than resource registrations), resources referenced across services like google_compute_instance and google_storage_bucket, etc., run tests against the whole provider.
    2. If the change impacts multiple resources in a service, the central resource in a service (e.g. google_cloud_run_service in Cloud Run), or a reference value, run tests against the entire service.
    3. If the change impacts a single resource file, run tests against the resource.
  5. Next week, ensure that the entry is removed from the changelog. Responsibility falls on the cherrypicker.
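The rule of thumb for choosing tests can be roughly approximated from the list of files a commit touches. The sketch below is a hypothetical heuristic and not the team's tooling. It treats provider.go, config.go, or changes that span several directories as provider-wide. It cannot recognise resource registrations, central resources, shared helpers, or cross-service resources, so treat its answer as a lower bound and prefer more testing when in doubt.

```shell
# Hypothetical heuristic: suggest a test scope from the paths a cherry-picked
# commit touches. Prints "provider", "service", or "resource".
suggest_test_scope() {
  if [ "$#" -eq 0 ]; then
    echo "unknown"
    return 1
  fi
  # Provider-wide files always widen the scope to the whole provider.
  for path in "$@"; do
    case "${path##*/}" in
      provider.go|config.go) echo "provider"; return 0 ;;
    esac
  done
  # Count the distinct directories touched by the commit.
  dir_count=$(for path in "$@"; do dirname "$path"; done | sort -u | wc -l)
  if [ "$((dir_count))" -gt 1 ]; then
    echo "provider"
  elif [ "$#" -gt 1 ]; then
    echo "service"
  else
    echo "resource"
  fi
}
```

A possible invocation is `suggest_test_scope $(git show --name-only --pretty=format: "$SHA")`, where `$SHA` is the cherry-picked commit.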

Backports

When to backport

Given an out-of-band change to the API that causes issues for users of old TPG releases, we're able to release backports with cherry-picked changes into past major release series (1.X, 2.X, etc.). However, there are significant downsides to backports and we want to use them only when necessary.

  • Users using backports in N.X who wish to upgrade to the next major release series will need to upgrade to N+1.Y, where Y is the minor/patch version in which the change was introduced.
    • Massive jumps should be safe, but we've experienced in practice that there are often minor incompatibilities that can accumulate within a release. While going from X.Y -> X.Y+1 is almost always safe, jumping directly to X.Y+10 occasionally isn't.
  • Users may demand backports for issues impacting them specifically
  • Backports may cause users to stay on old major release series for longer. We don't run integration tests using old versions so they may behave unexpectedly as the API changes underneath them, particularly as defaults are added/changed or new fields added to existing nested objects. Additionally users will miss out on preventative fixes, new features, etc.

Releasing a backport immediately before another major release is especially dangerous, as it may cause users to stay 2 major releases behind HEAD. As such, we should consider these criteria when cutting one:

  • When was the last major release?
    • ~6 months after a release is probably sufficient time for users to have upgraded, and leaves a window until our next major release.
  • What is the issue? Is there a workaround?
    • Failures in Read cause users to be completely unable to work with a resource, including verifying that their current config is stable to upgrade.
    • Crashes, particularly in Read, are even worse. They make the provider unusable on entire configs unless -target is used.
  • How cleanly will the backport apply?
    • Resources like GKE may have drifted significantly since the last major version. If the backport behaves subtly differently than the code @ HEAD, users may have no viable alternative.

Cutting a backport

  1. Create a release-X.Y.Z+1 branch based off the last release in the major release series.
  2. Apply the backported commit with git cherry-pick {{SHA}} based off of the merged commit on main.
  3. Add an appropriate changelog. Add a note stating which future release includes the fix, e.g.: 2.20.3 is a backport release, and some changes will not appear in 3.X series releases until 3.11.0.
  4. Update the upgrade guide for that release with the same note.
  5. When releasing a backport to 2.20.X, replace PROTOCOL_VERSION and PROTOCOL_VERSIONS in /tcrelease with PROTOCOL_VERSION=4 PROTOCOL_VERSIONS="4.0, 5.0" (the space between 4.0, 5.0 is very important!)
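The branch name in step 1 follows from the last release in the series by incrementing its patch number. The following is a minimal sketch of that arithmetic. `backport_branch_name` is a hypothetical helper, and it assumes the last release is written as plain X.Y.Z.

```shell
# Hypothetical helper: print the backport branch name that follows release $1 (X.Y.Z).
backport_branch_name() {
  last_release="$1"
  prefix="${last_release%.*}"   # X.Y
  patch="${last_release##*.}"   # Z
  echo "release-${prefix}.$((patch + 1))"
}

backport_branch_name 2.20.2   # prints release-2.20.3
```

Assuming the last release is tagged in the vX.Y.Z form used elsewhere on this page, the branch could then be created with `git checkout -b "$(backport_branch_name 2.20.2)" v2.20.2`.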