For information on ADRs (Architecture Decision Records) see Documenting Architecture Decisions by Michael Nygard.
Status: ACCEPTED
Metabase views the world as consisting of Tables and Fields. Datomic thinks in Entities with Attributes. We need some way to map the latter model onto the former that corresponds with people’s intuition about their data.
While in theory you can mix and match attributes freely in Datomic, in
practice people’s data tends to be neatly divided into types of entities,
each with their own distinct set of attributes. This tends to roughly map to
how people use attribute namespaces in their schemas, so while there is no
explicit modeling of a entity “type”, you can informally derive that entities
mostly consisting of attributes named :user/*
are “user”-type entities.
Attributes in Datomic are namespaced keywords, we treat each namespace
(a.k.a. prefix) that occurs in the Datomic schema as a metabase Table, with
the exception of some internal prefixes, namely ["db" "db.alter" "db.excise"
"db.install" "db.sys" "fressian"]
.
An entity that has any attributes with the given prefix is mapped to a Row in this Table.
Any attribute with the given prefix, or any attribute that co-occurs with attributes with the given prefix, is mapped to a Field (column) of this Table. For attributes which have the same prefix as the table name, the field name is the attribute name without the prefix. For other attributes we use the full attribute name as the field name.
Example:
{:artist/name "Beatles"
:artist/startYear 1960
:group/location "Liverpool"}
This becomes a row in an “artist” table, with fields “name”, “startYear”, and “group/location”.
It also becomes a row in the “group” table.
If entities have attributes with multiple namespaces, then these entities occur in multiple “tables”.
When deriving Tables and Fields from a Datomic database that has a schema but no data, then only the prefixes can be examined, so each attribute only occurs as a Field in a single Table. Once data is added an attribute could become a Field in other Tables as well, and a re-sync is necessary.
To test “table membership”, we need to check if any of the given attributes
is present, so the equivalent of SELECT * FROM artists;
becomes:
{:find [?eid]
:where [[(or [?eid :artist/name] [?eid :artist/startYear])]]}
This derived schema may be sub-optimal, and it might be necessary to provide people with a way to edit the mapping.
Status: ACCEPTED
Metabase drivers perform their queries in two steps, first they convert the MBQL (Metabase Query Language) into a “native” format (typically SQL), then they execute this native query and return the results.
A Metabase user can at any point switch to a “native” query mode, where the query can be edited by hand, so a driver does not only need to support the queries it generates, but any query a user can reasonably pass it.
As the “native” representation for Datomic queries we use the map format of Datomic’s Datalog, with certain restrictions. E.g. we do not allow pull queries, as they can lead to arbitrary nesting, which isn’t suitable for the table-based representation that Metabase works with.
We need to not only support the queries we generate, but other arbitrary datalog queries as well. We need to decide and define which (classes of) queries we accept, so that the user knows which features are available when writing queries.
Status: ACCEPTED / Partially superseded by 007
We are challenged with the task of converting Metabase’s internal query language MBQL to something Datomic understands: Datalog. MBQL by and large follows SQL semantics, which is in some areas quite different from Datalog.
Consider this query:
SELECT first_name, last_name FROM users WHERE age > 18;
Naively this would translate to
[:find ?first ?last :where [?u :user/first-name ?first] [?u :user/last-name ?last] [?u :user/age ?age] [(< 18 ?age)]]
But this won’t find entities where :user/first-name
or :user/last-name
aren’t present, whereas the SQL will. You could address this with a pull query
in the :find
clause instead, but these are harder to construct
algorithmically, and harder to process, since results will now have arbitrary
nesting.
Another example is ORDER BY
, a functionality that Datlog does not provide
and must instead be performed in application code.
We need to capture these requirements in a “native” query format that the user is able to manipulate, since Metabase allows to convert any query it generates to “native” for direct editing.
In order to stick to MBQL/SQL semantics we process queries in two parts: we perform a Datalog query to fetch all entities under consideration, then do a second pass in application code, to pull out the necessary fields, and do sorting.
To this end we add two extra fields to Datalog: :select
, and :order-by
.
The first determines which fields each returned row has, so the main query
only returns entity ids and aggregates like count
, the second determines the
sorting of the result.
{:find [?eid]
:where [[(or [?eid :user/first-name]
[?eid :user/last-name]
[?eid :user/age])]
[?eid :user/age ?age]
[(< 18 ?age)]]
:select [(:user/first-name ?eid)
(:user/last-name ?eid)]
:order-by [(:user/last-name ?eid) :desc]}
We will still have to be able to handle native queries that don’t have a
:select
clause.
Status: ACCEPTED
Datalog uses logic variables (the ones that start with a question mark) in its queries. These names are normally arbitrary, but since we need to analyze the query after the fact (because of the MBQL/Native query split in Metabase), we need to be able to re-interpret these names.
In Datalog adding a field to the :find
clause has an impact on how grouping
is handled, and so we treat MBQL :fields
references differently from
:breakout
references. Fields from :breakouts
are directly put in the
:find
clause, for the ones in :fields
we only look up the entity id.
When sorting afterwards this becomes problematic, because we no longer have a unified way of looking up field references. The solution is to have enough information in the names, so that we can look up field references, either by finding them directly in the result, or by going through the Datomic entity.
A name like ?artist
refers to an entity that logically belongs to the
“artist” table.
A name like ?artist|artist|name
refers to the :artist/name
attribute,
used on an entity that logically belongs to the "artist"
table. The form
here is ?table|attribute-namespace|attribute-name
.
Another example: ?venue|db|id
, refers to the :db/id
of a venue.
In the :select
and :order-by
clauses the form (:foo/bar ?eid)
can also
be used, which is equivalent to ?eid|foo|bar
.
We’ll have to see how this continues to behave when foreign keys and other
constructs are added to the mix, but so far this handles the use cases of
sorting with both :fields
and :breakouts
based queries, and it seems a
more future proof approach in general.
Status: ACCEPTED
References in Datomic are similar to foreign keys in RDBMS systems, but
datomic supports many-to-many references (:db.cardinality/many
), whereas
SQL requires a JOIN table to model the same thing.
This means that values in a result row can actually be sets, something Metabase is not able to handle well.
Expand cardinality/many
to its cartesian product, in other words for every
referenced entity we emit a new “row” in the result set.
If the result set is empty then we emit a single row, with the referenced
field being nil
, this mimics the behavior of an SQL left (outer) join.
This decision impacts post-processing, as we need to loop over all result rows to expand sets. It also impacts sorting, as we can only really sort after expansion.
Status: ACCEPTED
In Datomic any entity can have a :db/ident
, a symbolic identifier that can
be used interchangably with its :db/id
.
In practice this is mainly used for attributes (part of the database schema), and for “enums” or categories. E.g. gender, currency, country, etc.
In these cases showing the symbolic identifier to the user is preferable, as it carries much more information. Compare:
["Jonh" "Doe" 19483895]
vs
["Jonh" "Doe" :gender/male]
If a referenced entity has a :db/ident
, then return that (as a string),
rather than the :db/id
. This way enum-type fields are always shown with
their symbolic identifier rather than their numeric :db/id
.
This is mainly a post-processing concern. For fields that are looked via the
entity API (i.e. breakout) this is straightforward, for other cases we do
some schema introspection to see if we’re dealing with a :db/id
, and then
check if we have a :db/ident
for the given entity.
When we return these values as foreign key values, then Metabase will also hand them back to us when constructing filter clauses, so we need to convert them back to keywords so Datomic recognizes them as idents.
Status: ACCEPTED
Datomic does not have nil
values for attributes, either an attribute is
present on an entity or it is not, but it can not be present with a value of
nil
. Metabase and MBQL are modeled on RDBMS/SQL semantics, where fields can
be NULL
. We want to mimic this by treating missing attributes as if they
are attributes with nil
values.
A datalog :where
form like [?venue :venue/name ?venue|name]
does two
things, it binds the ?venue
and ?venue|name
logical variables (lvars),
and it filters the result set (unification). This means that entities where
the :venue/name
is absent (conceptually nil
), these entities will not
show up in the result.
We partially worked around this in 003 by avoiding using binding forms to pull out attribute values, and instead adopted a two step process where we first query for entity ids only, then use the entity API to pull out the necessary attributes.
This turned out to be less than ideal, because it made it hard to implement filtering operations correctly.
Instead we do bind attribute values to lvars, but we use get-else
and
nil
placeholder value to deal with missing attributes.
[:where
[(get-else $ ?venue :venue/name ::nil) ?venue/name]]
This way we get the binding inside the query, and can implement filtering
operations inside the query based on those values. It also means :fields
and :breakout
based queries differ less than before, and so share more
code and logic.
For reference attributes with :cardinality/many get-else
is not supported,
in these cases we fall back to emulating get-else
with an or-join
.
In this case using a special keyword as a placeholder does not work, Datomic
expects the lvar to be bound to an integer (a possible :db/id
), so instead
of ::nil
as a placeholder, we use Long/MIN_VALUE
.
(or-join [?ledger ?ledger|ledger|tax-entries]
[?ledger :ledger/tax-entries ?ledger|ledger|tax-entries]
(and [(missing? $ ?ledger :ledger/tax-entries)]
[(ground Long/MIN_VALUE) ?ledger|ledger|tax-entries]))
Note that the two-step process from 003 is still in place, we still honor the
extra :select
clause to perform post-processing on the result, but it isn’t
used as extensively as before, as most of the values we need are directly
returned in the query result.
Once this was implemented implemeting filter operations was a breeze. This
adds some new post-processing because we need to replace the placeholders
with actual nil
values again.
Query generating code needs to pay attention to use the necessary helper functions to set up lvar bindings, so these semantics are preserved.
Status: ACCEPTED
Datomic / Datalog does not explicitly handle nil
. To emulate the behavior
of an RDBMS where fields and relationships can be NULL
we use placeholder
values in our queries, so that in the result set we can find these sentinels
and replace them with nil
.
So far we used Long/MIN_VALUE
for refs, as they need to resemble Datomic
ids, and the special keyword ::nil
(i.e.
:metabase.driver.datomic.query-processor/nil
) elsewhere.
This turned out to be too simplistic, because when combined with aggregate functions Datomic attempts to group values, which means it needs to be able to compare values, and when it does that with values of disparate types, things blow up.
We look at the database type of the attribute when choosing an appropriate
substitute value for nil
. This can be the keyword used before, the same
keyword turned into a string, Long/MIN_VALUE
for numeric types and refs,
and a specially crafted datetime for inst fields.
There is one big downside to this approach, if an attribute in the database
actually has the placeholder value as a value, it will be shown as
~nil~/absent. The keyword/string versions are namespaced and so unlikely to
cause problems, the datetime is set at the start of the common era, so only
potentially problematic when dealing with ancient history (and even then it
needs to match to the millisecond), but number fields that contain
Long/MIN_VALUE
may actually occur in the wild. We currently don’t have a
solution for this.
Status: ACCEPTED / SUPERSEDED BY XXX
In 003 we introduced an “extended datalog” format, where a :select
form can
be added to specify how to turn the result from the :find
clause into the
final results.
So far you could put the following things in :select
:
- forms that are identical to those used in
find
, the value will be copied verbatim - s-expression forms using keywords in function position, these will either
be converted to a symbol that is looked up in the
:find
clause, or if not find the entity API will be used instead to look up the attribute value
Sometimes more specific post-processing is needed, in these cases we
communicate this from the query generation step to the execution step by
using s-expressions with namespaced symbols in function position. These
symbols are used to dispatch to the select-field-form
multimethod, which
returns a function that extracts and tranforms a value from a result row.
So far we implement two of these special cases
metabase.driver.datomic.query-processor/datetime
which will do the necessary truncation on a datetime valuemetabase.driver.datomic.query-processor/field
this is used to communicate field metadata, so that we can do type specific coercions or transformations. This is used so far to transform ref:db/id
values to idents.
This elegantly provides a future-proof mechanism for communicating extra metadata to the query execution stage.
Status: ACCEPTED
While Metabase’s foreign key functionality is very handy, it is at the same time quite limited. You can only navigate one “hop” away from your source table, and can only navigate in the direction the relationship is declared.
Datomic being (among other things) a graph database makes it very easy to traverse chains of relationships, but we are stuck with the Metabase UI. Can we “hack” the UI to support more general relationship traversal.
Allow configuring “custom” relationships. These have a source “table”, a destination “table”, a “path”, and a “name”. The UI will display these as regular foreign keys, allowing you to navigate from one table to the next, but in reality they are backed by an arbitrary path lookup, including reverse lookups and Datomic rules.
To configure these you paste a snippet of EDN configuration into the admin UI.
When syncing the schema we use this information to create synthetic foreign keys, which we give the special database type “metabase.driver.datomic/path” to distinguish them from “real” foreign keys.
During query generation :field-id
of :fk->
clauses need to be checked
first, and potentially handled by a separate logic which generates the
necessary Datalog :where
binding based on the path.
This adds some special cases that need to be considered, but it also adds a tremendous amount of flexibility of use.
We try to generate queries in such a way that when an attribute is missing on
an entity that instead we returns placeholder value, which is then in
post-processing replaced with actual nil
. This way we don’t remove entities
from a result set just because the entity doesn’t have all the attributes we
expected.
We use a type-specific placeholder because Datomic internally compares result values to group results, and this fails if values returned for the same logic variable have types that can’t be compared (e.g. keyword and boolean).
The most common placeholders are
:metabase.driver.datomic.query-processor/nil
, and Long/MIN_VALUE
.
For boolean columns we use false
as the placeholder, so that we maintain
the correct filtering semantics, and so that queries don’t blow up because of
type differences.
This is different from the other cases, because it is no longer possible to
discern which entities have an actual value of false
, and which ones are
missing/~nil~. So in this case we leave the filled-in placeholders in the
result, rather than replacing them with nil
.
In other words: entities where a boolean attribute is missing will be shown
as having false
for that attribute.
Status: ACCEPTED / SUPERSEDED BY XXX