
Proposal: application schema coordination and versioning

Richard Newman edited this page Oct 19, 2016 · 10 revisions

This document briefly describes categories of applications and how they might coordinate vocabulary. It closes with a simple proposal for how to support such coordination.

Definitions

A datom store is a Datomish database. It consists of datoms, some of which describe vocabulary itself, and some of which use the vocabulary.

A collection of attributes (our vocabulary) is called a schema fragment. The collection of schema fragments in the datom store (including the built-in bootstrap vocabulary) constitutes its schema.

An application is a piece of software that reads from or writes to a datom store. One example would be a Firefox add-on.

One application

In this case, no other application expects to be able to read from or write to the datom store.

You have three possible approaches to schema handling.

One is "head in the sand". If your vocabulary doesn't change, or only grows new attributes, then you can safely re-transact your schema fragments each time you open the datom store. This is likely to be fine during development, and perhaps even for longer periods if you're careful about your data modeling.
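As a rough illustration of why re-transacting is safe only for unchanged or growing vocabulary, here is a minimal sketch. `ToyStore`, `transact_schema`, and the attribute maps are hypothetical stand-ins invented for this example; they are not the Datomish API.

```python
# Hypothetical in-memory stand-in for a datom store's schema; not the Datomish API.
class ToyStore:
    def __init__(self):
        self.attributes = {}  # ident -> attribute definition

    def transact_schema(self, fragment):
        """Add new attributes; re-adding an identical definition is a no-op."""
        added = []
        for ident, definition in fragment.items():
            existing = self.attributes.get(ident)
            if existing is None:
                self.attributes[ident] = dict(definition)
                added.append(ident)
            elif existing != definition:
                # A changed definition is the case this approach cannot handle.
                raise ValueError(f"{ident} changed; re-transacting is not enough")
        return added


# Illustrative vocabulary; the attribute properties are assumptions.
PAGE_FRAGMENT = {
    ":page/url": {"valueType": "string", "cardinality": "one", "unique": "identity"},
    ":page/title": {"valueType": "string", "cardinality": "one"},
}

store = ToyStore()
first_open = store.transact_schema(PAGE_FRAGMENT)   # adds both attributes
second_open = store.transact_schema(PAGE_FRAGMENT)  # adds nothing
```

In this sketch, opening the store a second time changes nothing, and a fragment that only gains attributes adds just the new ones. A fragment that redefines an existing attribute is rejected, which is where versioning becomes necessary.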

If your vocabulary changes, however, you will need a way to evolve it using the primitives available to you: altering attributes, renaming idents, and retracting and transacting schema fragments and data datoms.

One approach to doing so is to track some kind of version identifier outside of the datom store. This is straightforward, but prone to error when the datom store and external version identifier don't change together (e.g., when a database file is restored from backup).

Another approach is to track the version identifier inside the datom store itself. This is more foolproof, but requires vocabulary for versioning.
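The difference between the two tracking approaches can be sketched with a toy store modeled as a set of (entity, attribute, value) tuples. The attribute name `:app/schema-version` and both helper functions are assumptions made for this example only.

```python
import copy

# Hypothetical attribute name used only in this sketch.
VERSION_ATTR = ":app/schema-version"


def read_version(datoms):
    """Return the version recorded inside the store, or 0 if none is recorded."""
    versions = [v for (_e, a, v) in datoms if a == VERSION_ATTR]
    return versions[0] if versions else 0


def write_version(datoms, version):
    """Return a new datom set with the recorded version replaced."""
    kept = {d for d in datoms if d[1] != VERSION_ATTR}
    kept.add(("schema", VERSION_ATTR, version))
    return kept


store = write_version(set(), 1)
backup = copy.deepcopy(store)     # backup taken while at version 1

store = write_version(store, 2)   # the application upgrades the store
external_version = 2              # version tracked outside the store

store = backup                    # the database is restored from backup
in_store_version = read_version(store)
```

After the restore, the external identifier still claims version 2 while the store itself reports version 1. Only the in-store value travels with the data.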

Multiple applications with disjoint schema

These applications can behave as if the others don't exist, with the exception of patterns or expressions that either query schema datoms themselves or match against wildcard patterns. For example, the following query will behave differently when another application begins writing fulltext-indexed datoms to the datom store:

[:find ?x :in $ :where [(fulltext $ :any "some text") [[?x]]]]

Multiple applications with read-only interrelationships

This is the case when multiple applications have well-managed vocabularies. Exactly one piece of software claims to 'own' each attribute, and other applications can rely on the owner having managed the schema correctly.

This is equivalent to having multiple applications with disjoint vocabularies, with the notable exception that the 'owner' might alter the schema in such a way that the reader's assumptions are rendered incorrect.

In this situation, the in-store schema version approach described above can be used: the 'owner' performs upgrades, and the 'reader' enters an error state if it sees a schema fragment version that it doesn't understand.
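A reader-side check might look like the following minimal sketch. The exception type, the `require_version` helper, and the fragment name `org.example.pages` are all hypothetical; the store is reduced to a plain mapping from fragment name to version.

```python
class UnsupportedSchemaVersion(Exception):
    """Raised when a reader meets a fragment version it was not written for."""


def require_version(fragment_versions, name, understood):
    """Return the stored version of `name`, or raise if the reader can't handle it."""
    version = fragment_versions.get(name)
    if version not in understood:
        raise UnsupportedSchemaVersion(
            f"{name} is at version {version!r}; "
            f"this reader understands {sorted(understood)}"
        )
    return version


# Hypothetical state written by the owning application.
store_versions = {"org.example.pages": 3}

# A reader built against versions 2 and 3 proceeds normally.
current = require_version(store_versions, "org.example.pages", {2, 3})
```

A reader built only against versions 1 and 2 would get `UnsupportedSchemaVersion` here and could stop reading rather than misinterpret the data.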

Multiple applications with co-owned vocabularies

Consider two add-ons that both wish to use vocabulary like :page/url. There is no third-party add-on that can ensure that the datom store contains current vocabulary.

In this case, both add-ons take responsibility for transacting schema fragments. They also need to coordinate their upgrades and downgrades. Neither can unilaterally decide to downgrade, because a loop can result: one add-on upgrades the vocabulary, the other downgrades it again, and so on. This strongly implies an in-store way of tracking and examining versions.

A modest proposal

We expose a simple default vocabulary for schema fragment management: :schema/name and :schema/version. Names should ideally use reverse domain name notation. Versions should increment whenever a schema alteration is required. Notably, no version change is necessary when new attributes are added.

Schema names and versions are added or updated as schema fragments are transacted. Transacting the same schema fragment twice is a no-op.

Applications should ensure that no attribute is present in more than one schema fragment; an error will be thrown in that case.
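The single-owner rule for attributes can be checked with a short sketch like the one below. The function name and the fragment names are hypothetical, and fragments are modeled simply as a mapping from fragment name to a list of attribute idents.

```python
def find_overlaps(fragments):
    """Map each attribute ident claimed by more than one fragment to its claimants."""
    claimants = {}
    for name, idents in fragments.items():
        for ident in idents:
            claimants.setdefault(ident, set()).add(name)
    return {
        ident: sorted(names)
        for ident, names in claimants.items()
        if len(names) > 1
    }


# Hypothetical fragments from two add-ons that both declare :page/url.
fragments = {
    "org.example.history": [":page/url", ":visit/date"],
    "org.example.bookmarks": [":page/url", ":bookmark/title"],
}

overlaps = find_overlaps(fragments)
```

In this sketch a non-empty result corresponds to the error case: the two add-ons would need to move `:page/url` into one shared fragment that both of them ship.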

These pieces of metadata are themselves stored in the datom store. Applications can listen for changes in order to pick up new vocabulary added by other applications accessing the same datom store.

The API exposes two operations:

  • Check whether this schema fragment is at this version. This allows for read-only applications to adjust their behavior accordingly.
  • Ensure that this fragment is at this version; if not:
    • Create it if needed.
    • Optionally, attempt to automatically transact the difference between the two schema fragments. For example, an attribute with :db/cardinality :db.cardinality/one can always be safely altered to :db.cardinality/many.
    • Optionally, run an upgrade step from the existing version to the desired version. Typically this will prepare the store for an automatic change.
    • Finally, raise an error on failure.
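The two operations above can be sketched as follows. This is an illustration under stated assumptions, not the proposed API: the function names, return values, and `FragmentError` are invented here, the store is reduced to a name-to-version mapping, automatic diffing is left out, and refusing to downgrade is an assumption made to sidestep the downgrade loop described earlier.

```python
class FragmentError(Exception):
    """Raised when a fragment cannot be brought to the requested version."""


def check_fragment(versions, name, version):
    """Read-only check: is `name` present at exactly `version`?"""
    return versions.get(name) == version


def ensure_fragment(versions, name, version, upgrade=None):
    """Create or upgrade `name` to `version`; raise FragmentError on failure."""
    current = versions.get(name)
    if current is None:
        versions[name] = version            # create it if needed
        return "created"
    if current == version:
        return "unchanged"
    if current > version:
        # Assumption of this sketch: never downgrade unilaterally.
        raise FragmentError(f"{name} is already at {current}, newer than {version}")
    if upgrade is None:
        raise FragmentError(f"no upgrade step from {current} to {version}")
    try:
        upgrade(current, version)           # caller-supplied upgrade step
    except Exception as exc:
        raise FragmentError(f"upgrade of {name} failed") from exc
    versions[name] = version
    return "upgraded"


versions = {}
created = ensure_fragment(versions, "org.example.pages", 1)
repeated = ensure_fragment(versions, "org.example.pages", 1)
```

Calling `ensure_fragment` a second time with the same version reports no change, which mirrors the rule that transacting the same schema fragment twice is a no-op.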

This is similar to the 'user version' functionality in SQLite, with important differences:

  • The datom store's schema consists of {name, version} pairs, not a single version.
  • Schema fragments have a globally unique identifier, allowing them to be shared across applications.
  • Applications are made aware at runtime when schema fragments change.
  • Many schema changes (adding attributes, altering indexing choices, or weakening constraints) can be performed automatically with no need to supply migration code.
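To make the last point concrete, the sketch below separates changes that could be applied automatically from those that would need migration code. The attribute properties (`cardinality`, `index`, `unique`) and the rules are assumptions for illustration: only cardinality one to many is named as safe in this proposal, while treating dropped uniqueness as a weakened constraint and treating attribute removal as needing migration are guesses made for this example.

```python
def needs_migration(old_fragment, new_fragment):
    """Return idents whose change cannot be applied automatically in this sketch."""
    blocked = []
    for ident, old in old_fragment.items():
        new = new_fragment.get(ident)
        if new is None:
            blocked.append(ident)   # removal: assumed to need migration code
            continue
        for key in set(old) | set(new):
            before, after = old.get(key), new.get(key)
            if before == after:
                continue
            if key == "index":
                continue            # indexing choices may change freely
            if key == "cardinality" and (before, after) == ("one", "many"):
                continue            # one -> many is always safe
            if key == "unique" and after is None:
                continue            # dropping uniqueness weakens a constraint
            blocked.append(ident)
            break
    # Attributes present only in new_fragment are additions and need no check.
    return blocked


old = {
    ":page/url": {"valueType": "string", "cardinality": "one", "unique": "identity"},
    ":page/tag": {"valueType": "string", "cardinality": "one"},
    ":page/visits": {"valueType": "long", "cardinality": "many"},
}
new = {
    ":page/url": {"valueType": "string", "cardinality": "one", "index": True},
    ":page/tag": {"valueType": "string", "cardinality": "many"},
    ":page/visits": {"valueType": "long", "cardinality": "one"},
    ":page/title": {"valueType": "string", "cardinality": "one"},
}

blocked = needs_migration(old, new)
```

Here only `:page/visits` is reported, because narrowing it from many to one could discard data; the other changes relax constraints, change indexing, or add an attribute.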

Under this proposal, different applications can each ship shared schema fragments, coordinate upgrades, avoid conflicts in the large majority of cases, and safely detect real conflicts when they arise.