Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adding goose provider and misc features #484

Closed
wants to merge 24 commits into from
Closed

feat: adding goose provider and misc features #484

wants to merge 24 commits into from

Conversation

mfridman
Copy link
Collaborator

@mfridman mfridman commented Mar 21, 2023

This will be the long-lived branch for the v4 implementation to add a Provider abstraction and other features that were awkward or impossible with the v3 package.

Fix #379

Below is a changelog of notable changes in the v4 branch. Once this is considered stable enough, we'll cut a v4.0.0-rc.1 release.

At a high level, the main objective with this breaking change is to take what we've learned about goose, all the PRs/issues that were raised, fruitful discussions, and idiomatic Go patterns that evolved over the years and make the next version the best we can. Building a solid foundation will enable us to add more features without working around the API.

Having said that, our goal is to keep goose simple while exposing a few knobs to satisfy the majority of users. We may not be able to add every feature, but collectively I hope we can build a tool we all enjoy using that does the right thing.

Added

  • goose.Provider. Takes a db *sql.DB, which allows the caller to decide on the driver and match it to a corresponding dialect. E.g., lib/pq or pgx and then goose.DialectPostgres
  • ctx support everywhere
  • Improved Create and Fix commands, added more tests
  • JSON support to CLI

Changed or Removed (breaking changes)

  • Unexport migration object, cannot run Up, Down, etc., on individual migration
  • Start using *sql.Conn internally instead of *sql.DB, in preparation for adding Lock support
  • Started using a (simple) CLI framework, ffcli from peterbourgon/ff
  • Removed GOOSE_DRIVER, the database can be derived from the DSN (using xo/dburl ). Callers of the CLI don't need to know the internal driver used, it should be sufficient to compose a proper DSN.
  • Removed goose reset and .Reset(...), callers can use down-to 0 or DownTo(ctx, 0)

Fixed

  • Improved lazy parsing of SQL files

...

Outstanding / TODOs

  • Add grouped migrations
  • Add locking implementation (postgres only, advisory lock only)
  • Add mutex to *Provider
  • Decide on lazy parsing, doing it all upfront or some combination of the two with a helper method that gives the caller control
  • Add back the CLI component which uses /v4 API. Consider using ffcli framework to compose the CLI.
  • Update goose_test.go to use t.TempDir() instead of building to the local filesystem
  • Revisit passing ctx to the Go register functions: GoMigration and GoMigrationNoTx

@mfridman mfridman changed the title V4 feat: adding goose provider and misc features Mar 21, 2023
@mfridman
Copy link
Collaborator Author

mfridman commented Mar 29, 2023

A work in progress, but here's where I got so far. The important bit is package goose is truly a package that encapsulates all the migration logic, it does not attempt to have CLI-like properties.

Instead, it returns well-defined objects that can be used to build additional programs.

All global state has been removed, with one exception, registering Go migrations. This is an acceptable exception because users will be using init() at build time.

Goose Provider

Takes an explicit dialect, a database connection, and options.

func NewProvider(dbDialect Dialect, db *sql.DB, opt Options) (*Provider, error) { ... }

The dialect defines which database we're connecting to. The caller is responsible for passing in *sql.DB, which means the goose package no longer cares what driver the caller registers, e.g., pgx or lib/pq. If the database technology (dialect) matches the driver used by the caller to instantiate *sql.DB then it should just work.

Lastly, Options holds all the configuration bits, and a goose.DefaultOptions() is available for convenience.

// Example:
db, err := sql.Open("pgx", dsn)

options := goose.DefaultOptions().
	SetDir("schema/migrations").
	SetVerbose(true).
	SetLogger(logger)
provider, err := goose.NewProvider(goose.DialectPostgres, db, options)

Each provider option is now well-documented, scoped to a single options.go file, and displays the default. Example:

// SetNoVersioning returns a new Options value with NoVersioning set to the given value.
// NoVersioning enables the ability to apply migrations without tracking the versions in the
// database schema table. Useful for seeding a database or running ad-hoc migrations.
//
// Default: false
func (o Options) SetNoVersioning(b bool) Options {
    // …
}

Once a *goose.Provider has been initialized, we have the following methods (subject to change):

Bonus, all methods take context.Context and propagate that down throughout the entire package.

(p *Provider) Ping(ctx context.Context) error
(p *Provider) Close() error

(p *Provider) ListMigrations() []Migration

(p *Provider) Up(ctx context.Context) ([]MigrationResult, error)
(p *Provider) UpByOne(ctx context.Context) (MigrationResult, error)
(p *Provider) UpTo(ctx context.Context, version int64) ([]MigrationResult, error)

(p *Provider) Down(ctx context.Context) (MigrationResult, error)
(p *Provider) DownTo(ctx context.Context, version int64) ([]MigrationResult, error)
(p *Provider) Reset(ctx context.Context) ([]MigrationResult, error)
(p *Provider) Redo(ctx context.Context) (RedoResult, error)

(p *Provider) ApplyVersion(ctx context.Context, version int64, direction bool) (MigrationResult, error)
(p *Provider) GetDBVersion(ctx context.Context) (int64, error)
(p *Provider) Status(ctx context.Context, opts *StatusOptions) ([]MigrationStatus, error)

There is now a public Migration and an internal *migration representation of a "migration".

type Migration struct {
	// Type is the type of migration (SQL or Go).
	Type MigrationType
	// Source is the full path to the migration file.
	Source string
	// Version is the parsed version from the migration file name.
	Version int64
	// Empty is true if the migration was a no-op, but was still recorded in the database (unless no
	// versioning is enabled).
	Empty bool
	// UseTx is true if the migration was applied in a transaction.
	UseTx bool
}

type migration struct {
	version int64
	source  string

	// A migration can be either a GoMigration or a SQL migration, but not both.
	// The migrationType field is used to determine which one is set.
	//
	// Note, the migration type may be sql but *sqlMigration may be nil.
	// This is because the SQL files are parsed in either the Provider
	// constructor or at the time of starting a migration operation.
	migrationType MigrationType
	goMigration   *goMigration
	sqlMigration  *sqlMigration
}

The caller no longer has access to the internal *migration representation and can no longer get a migration and apply it directly by passing a *sql.DB. Instead, this is now all handled internally. This unlocks the ability to add interesting features, not limited to, but some that come to mind:

  • Locking
  • Grouped Migrations
  • ...

This was referenced Mar 29, 2023
@mfridman
Copy link
Collaborator Author

mfridman commented Apr 1, 2023

Updated the logic to lazily parse SQL migration files. This was the original default behaviour of goose, but with an improvement.

Before, we used to parse a SQL file and apply, going on to the next one. The problem is this can leave the database in a partially migrated state if any of the subsequent files failed to parse.

Now, we determine the migrations that need to be applied up-front, parse those migrations, and only if all of them are successfully parsed do we attempt to apply them.

@mfridman mfridman added this to the Up next in v4 milestone Apr 1, 2023
@mfridman
Copy link
Collaborator Author

mfridman commented Apr 7, 2023

Something that's been bothering me about the design is the *goose.Provider is initialized with all possible options, but not all options are applicable to all commands.

This makes it a bit challenging to grok which methods use which options. Granted this may not be an issue for most users.

However, there is an upside. All the options are defined in a single place and are well-documented. E.g., the "no versioning" feature is supported by all up / down commands. But if we don't define it on the provider, it means we'd have something like this:

// UpOptions is a set of options that can be passed to Up, UpByOne, and UpTo.
type UpOptions struct{
	NoVersioning bool
}

// DownOptions is a set of options that can be passed to Down and DownTo.
type DownOptions struct{
	NoVersioning bool
}

The nice part of this design is that each command (or set of commands) can evolve independently of the Provider and other functions.

On the flip side (again) other commands, such as reset and redo, would need to take all these explicit options.

Hmm, need to sleep on this one.

@mfridman
Copy link
Collaborator Author

Added a common way to compose usage information with flags

CleanShot 2023-04-12 at 22 13 21@2x

@mfridman
Copy link
Collaborator Author

Something that still doesn't feel idiomatic is the runMigrations function.

Previously, we used to log directly within the package, so it was easy to see migration output as they were applied. And if a migration failed (after a few were applied, you would still have that output).

But now, we return a []*MigrationResult. But what should the correct behaviour be when a migration fails half-way through?

Or more concretely, say we have 3 new migrations, and we successfully run 1,2 .. but the 3rd one fails. We can't undo 1,2 because that's already been committed by the DB. One option is to continue logging directly within the package (not great, because we don't have a clean separation of concerns between package and CLI).

Or, we return a concrete error and let the caller assert if they need more information. E.g.,

type PartialError struct {
	Applied []*MigrationResult
	Failed  *Migration
	Err     error
}

This way the caller could assert the error, output the already applied migrations (1,2 in the example above) and then output an error for the failed migration.

OK   00002_rename_root.sql (1.17ms)
OK   00001_create_users_table.sql (838.58µs)

partial migration error: failed to run migration 00003_no_transaction.sql: SQL logic error: near "CREAT": syntax error (1)

Also need to think about what the --json output should be.

@mfridman
Copy link
Collaborator Author

Overall I think the /v4 code is in a descent spot with a tonne of improvements, new features, bug fixes, additional tests, etc.

I'm going to close this PR in favor of a clean slate and track the various improvements left.

@mfridman mfridman mentioned this pull request Apr 16, 2023
12 tasks
@mfridman
Copy link
Collaborator Author

Closing this in favor of #502

Still, a work in progress and most of these comments are a stream of consciousness and thoughts throughout the refactor.

At the end there will be a nicely summarized changelog of all changes (most internal). And we'll dive into each of these in the documentation and blog posts.

@mfridman mfridman closed this Apr 16, 2023
@mfridman mfridman mentioned this pull request Apr 25, 2023
7 tasks
@mfridman mfridman deleted the v4 branch September 17, 2024 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a goose provider
1 participant