chore: mql specs #92

Merged
merged 25 commits into from
Feb 6, 2025
41be40b
chore: draft of mql-query spec
kmruiz Nov 14, 2024
86ead3e
chore: add serialization
kmruiz Nov 14, 2024
23dc67f
Update packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
kmruiz Nov 14, 2024
2f535e1
chore: specify that components are a sorted list
kmruiz Nov 14, 2024
d02dc8a
chore: specify that a component can be more than once in a node
kmruiz Nov 14, 2024
50f04e7
chore: fix typo
kmruiz Nov 14, 2024
f6f6fda
chore: typo
kmruiz Nov 14, 2024
655314f
chore: add the query to the table
kmruiz Nov 14, 2024
7d7f45f
chore: initial bson-type spec
kmruiz Nov 19, 2024
04c22dd
chore: type assignability table
kmruiz Nov 21, 2024
b9c4b7f
chore: small typos
kmruiz Nov 21, 2024
922d0eb
chore: java type mapping table
kmruiz Nov 21, 2024
4beefa3
chore: fixing typo
kmruiz Nov 21, 2024
aa6c360
Update packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
kmruiz Nov 21, 2024
b17b90c
chore: simplify BsonString explanation and drop links to code
kmruiz Nov 21, 2024
42f7a90
chore: add an example for the superset
kmruiz Nov 21, 2024
af3c6b1
chore: fix typos
kmruiz Nov 22, 2024
2529027
Merge branch 'main' into chore/mql-specs
kmruiz Nov 26, 2024
3f0ac52
chore: definition of the computed types
kmruiz Nov 28, 2024
097e94b
Merge branch 'main' into chore/mql-specs
kmruiz Nov 28, 2024
39cb569
chore: components
kmruiz Nov 28, 2024
52657dc
chore: iteration 2
kmruiz Nov 28, 2024
ee54905
Merge branch 'main' into chore/mql-specs
kmruiz Feb 4, 2025
451aa59
chore: add dialects and engines
kmruiz Feb 4, 2025
b46bc41
chore: add example of type assignability
kmruiz Feb 4, 2025
167 changes: 167 additions & 0 deletions packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
@@ -0,0 +1,167 @@
# MQL BSON Type
-----------

## Abstract

This specification documents the different kinds of BSON types and how they are related to the
original source code of an [MQL Query](../mql-query/mql-query.md). This document aims to provide
information about the behaviour of dialects and linters on the computation of the original
expression BSON type.

## META

The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

[BSON](https://bsonspec.org/spec.html) is a binary format that is used to communicate between the
MongoDB Client (through a driver) and a MongoDB Cluster. MQL BSON (from now on, just BSON)
is a superset of the original BSON types. For example, some types, such as BsonAnyOf, are not part
of the original BSON.

A BSON Type represents the data type inferred from the original source code or from a MongoDB sample
of documents. A BSON Type MUST be consumable by a MongoDB Cluster and its serialization MUST be
BSON 1.1 compliant.

### Primitive BSON Types

#### BsonString

A BsonString is a sequence of Unicode characters.

#### BsonBoolean

A BsonBoolean represents one of two disjoint values: true or false. The actual internal encoding is left to the
original BSON 1.1 specification.

#### BsonDate

A BsonDate represents a date and a time, serializable to a UNIX timestamp. This specific type MAY be
represented differently in some dialects.

In any Java-based dialect, a BsonDate can be represented as:

* [java.util.Date](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/util/Date.html)
* [java.time.Instant](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/Instant.html)
* [java.time.LocalDate](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDate.html)
* [java.time.LocalDateTime](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDateTime.html)

#### BsonObjectId

A BsonObjectId represents a 12-byte unique identifier for an object.

#### BsonInt32

A signed 32-bit integer. In Java it's mapped to the `int` type.

#### BsonInt64

A signed 64-bit integer. In Java it's mapped to both `long` and `BigInteger`.

#### BsonDouble

A 64-bit floating-point number. In Java it's mapped to both `float` and `double`.

#### BsonDecimal128

A 128-bit decimal floating-point number. In Java it's mapped to `BigDecimal`.

#### BsonNull

Represents the absence of a value.

#### BsonAny

Represents any possible type. Essentially, every type is a subtype of BsonAny.

#### BsonAnyOf

Represents a union of types. For example, BsonAnyOf([BsonString, BsonInt32]).

#### BsonObject

Represents the shape of a BSON document.

#### BsonArray

Represents a list of elements of a single type. For example: [ 1, 2, 3 ] is a BsonArray.

#### ComputedBsonType

A ComputedBsonType represents the type of an expression that is evaluated outside the user's
source code. The typical use case is an MQL expression (like $expr) that runs on a
MongoDB Cluster.

It contains a `baseType`, which is the inferred type of the result of computing the expression. If
the `baseType` cannot be inferred, it MUST be BsonAny.
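
The fallback rule above can be sketched in a few lines of Java. This is only an illustration: `BsonType`, `SimpleBsonType` and the `of` factory are hypothetical names chosen for the example, not the classes of the actual model.

```java
import java.util.Optional;

// Hypothetical names for illustration; not the project's actual classes.
interface BsonType {}

enum SimpleBsonType implements BsonType { BSON_ANY, BSON_STRING, BSON_INT32 }

// A computed type wraps the inferred result type of an expression
// that is evaluated on the cluster.
record ComputedBsonType(BsonType baseType) implements BsonType {
    // When the result type could not be inferred, the base type
    // falls back to BsonAny, as the specification requires.
    static ComputedBsonType of(Optional<BsonType> inferred) {
        return new ComputedBsonType(inferred.orElse(SimpleBsonType.BSON_ANY));
    }
}
```

Calling `ComputedBsonType.of(Optional.empty())` yields a computed type whose `baseType()` is `BSON_ANY`, so consumers never have to deal with a missing base type.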

### Type Assignability

Assignable types MUST NOT change the semantics of a query when they are swapped. Suppose that
we have a query $Q$ and two variants, $Q_A$ and $Q_B$, that differ only in the type specified
in either a field or a value reference.

We will say that type $A$ is assignable to type $B$ if $Q_A$ and $Q_B$ are
[equivalent queries](/main/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md#query-equivalence).

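The most common use of this property is type checking: ensuring that the type of a value in a query
is compatible with the type that the schema declares for the field.

Type assignability is not necessarily symmetric: $A$ being assignable to $B$ does not imply that $B$ is
assignable to $A$. For example, BsonInt32 is assignable to BsonInt64, but BsonInt64 is not assignable to BsonInt32.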

#### Assignability table

| ⬇️ can be assigned to ➡️ | BsonString | BsonBoolean | BsonDate | BsonObjectId | BsonInt32 | BsonInt64 | BsonDouble | BsonDecimal128 | BsonNull | BsonAny | BsonAnyOf | BsonObject | BsonArray | ComputedBsonType |
|--------------------------|:----------:|:-----------:|:--------:|:------------:|:---------:|:---------:|:----------:|:--------------:|:--------:|:-------:|:---------:|:----------:|:---------:|:-----------------|
| BsonString | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonBoolean | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonDate | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonObjectId | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonInt32 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonInt64 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonDouble | 🔴 | 🔴 | 🔴 | 🔴 | 🟠$^2$ | 🟠$^2$ | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonDecimal128 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonNull | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonAny | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonAnyOf | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟢 | 🟠$^1$ | 🟠$^1$ | 🟠$^4$ | 🟠$^6$ |
| BsonObject | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🟠$^3$ | 🟠$^4$ | 🟠$^6$ |
| BsonArray | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^5$ | 🟠$^6$ |
| ComputedBsonType | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ |

* 🟠$^1$: $A$ is assignable to $BsonAnyOf(B)$ only if $A$ is assignable to $B$.
* 🟠$^2$: It's assignable but there might be a significant loss of precision.
* 🟠$^3$: $BsonObject A$ is assignable to $B$ if $A$ is a subset of $B$.
* 🟠$^4$: $A$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$.
* 🟠$^5$: $BsonArray(A)$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$.
* 🟠$^6$: $A$ is assignable to $ComputedBsonType(BaseType)$ only if $A$ is assignable to $BaseType$.
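
As a rough illustration, part of the table can be expressed as a recursive check. The sketch below is written in Java with hypothetical names (`Primitive`, `AnyOf`, `ArrayOf`, `Computed`, `Assignability`); it is not the project's implementation. It makes two assumptions that the table does not spell out: a BsonAnyOf on the source side is accepted when at least one of its members is accepted, and a ComputedBsonType on either side is compared through its base type. BsonObject (note 3) is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration; BsonObject (note 3) is omitted.
interface BsonType {}

enum Primitive implements BsonType {
    STRING, BOOLEAN, DATE, OBJECT_ID, INT32, INT64, DOUBLE, DECIMAL128, NULL, ANY
}

record AnyOf(List<BsonType> types) implements BsonType {}
record ArrayOf(BsonType element) implements BsonType {}
record Computed(BsonType baseType) implements BsonType {}

final class Assignability {
    // Extra targets for each numeric type, besides itself and ANY.
    private static final Map<Primitive, Set<Primitive>> NUMERIC_TARGETS = Map.of(
        Primitive.INT32, Set.of(Primitive.INT64, Primitive.DOUBLE, Primitive.DECIMAL128),
        Primitive.INT64, Set.of(Primitive.DECIMAL128),
        // Note 2: DOUBLE to INT32/INT64 is accepted but may lose precision.
        Primitive.DOUBLE, Set.of(Primitive.INT32, Primitive.INT64, Primitive.DECIMAL128));

    static boolean isAssignable(BsonType a, BsonType b) {
        // Note 6 (assumed to apply on both sides): compare through the base type.
        if (a instanceof Computed computedSource) return isAssignable(computedSource.baseType(), b);
        if (b instanceof Computed computedTarget) return isAssignable(a, computedTarget.baseType());
        if (b == Primitive.ANY) return true;
        // Note 1: a union target accepts A if any of its members accepts A.
        if (b instanceof AnyOf targetUnion) {
            return targetUnion.types().stream().anyMatch(t -> isAssignable(a, t));
        }
        // Assumption: a union source is accepted if at least one member is accepted.
        if (a instanceof AnyOf sourceUnion) {
            return sourceUnion.types().stream().anyMatch(t -> isAssignable(t, b));
        }
        // Notes 4 and 5: array targets are compared through their element type.
        if (b instanceof ArrayOf targetArray) {
            BsonType element = (a instanceof ArrayOf sourceArray) ? sourceArray.element() : a;
            return isAssignable(element, targetArray.element());
        }
        if (a instanceof Primitive p && b instanceof Primitive q) {
            return p == q || NUMERIC_TARGETS.getOrDefault(p, Set.of()).contains(q);
        }
        return false;
    }
}
```

With this sketch, `isAssignable(Primitive.INT32, Primitive.INT64)` is true while the reverse call is false, which mirrors the corresponding cells of the table.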

### Type mapping

#### Java

| Java Type | Bson Type |
|:--------------|:------------------------------------|
| null | BsonNull |
| float | BsonDouble |
| Float | BsonAnyOf(BsonNull, BsonDouble) |
| double | BsonDouble |
| Double | BsonAnyOf(BsonNull, BsonDouble) |
| BigDecimal | BsonAnyOf(BsonNull, BsonDecimal128) |
| boolean | BsonBoolean |
| short | BsonInt32 |
| Short | BsonAnyOf(BsonNull, BsonInt32) |
| int | BsonInt32 |
| Integer | BsonAnyOf(BsonNull, BsonInt32) |
| BigInteger | BsonAnyOf(BsonNull, BsonInt64) |
| long | BsonInt64 |
| Long | BsonAnyOf(BsonNull, BsonInt64) |
| CharSequence | BsonAnyOf(BsonNull, BsonString) |
| String | BsonAnyOf(BsonNull, BsonString) |
| Date | BsonAnyOf(BsonNull, BsonDate) |
| Instant | BsonAnyOf(BsonNull, BsonDate) |
| LocalDate | BsonAnyOf(BsonNull, BsonDate) |
| LocalDateTime | BsonAnyOf(BsonNull, BsonDate) |
| Collection<T> | BsonAnyOf(BsonNull, BsonArray(T)) |
| Map<K, V> | BsonAnyOf(BsonNull, BsonObject) |
| Object | BsonAnyOf(BsonNull, BsonObject) |
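
The mapping table can be read as a lookup from a Java class to a BSON type. The following Java sketch returns the type as plain text to stay short; `JavaTypeMapping` and `toBsonType` are hypothetical names. Two choices are assumptions of the sketch rather than rows of the table: `Boolean` is treated like the other boxed types, and the element type of a collection, which is erased at runtime, is reported as BsonAny.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Collection;
import java.util.Date;

// Hypothetical helper for illustration; returns BSON type names as text.
final class JavaTypeMapping {
    static String toBsonType(Class<?> type) {
        // Java primitives can never be null, so they map to a bare type.
        if (type.isPrimitive()) return primitive(type);
        // Every reference type is nullable, hence the union with BsonNull.
        return "BsonAnyOf(BsonNull, " + reference(type) + ")";
    }

    private static String primitive(Class<?> type) {
        if (type == boolean.class) return "BsonBoolean";
        if (type == short.class || type == int.class) return "BsonInt32";
        if (type == long.class) return "BsonInt64";
        if (type == float.class || type == double.class) return "BsonDouble";
        throw new IllegalArgumentException("Not in the mapping table: " + type);
    }

    private static String reference(Class<?> type) {
        // Assumption: Boolean is not listed in the table; treated like other boxed types.
        if (type == Boolean.class) return "BsonBoolean";
        if (type == Float.class || type == Double.class) return "BsonDouble";
        if (type == BigDecimal.class) return "BsonDecimal128";
        if (type == Short.class || type == Integer.class) return "BsonInt32";
        if (type == Long.class || type == BigInteger.class) return "BsonInt64";
        if (CharSequence.class.isAssignableFrom(type)) return "BsonString";
        if (type == Date.class || type == Instant.class
                || type == LocalDate.class || type == LocalDateTime.class) return "BsonDate";
        // Assumption: the element type is erased at runtime, so BsonAny is used.
        if (Collection.class.isAssignableFrom(type)) return "BsonArray(BsonAny)";
        // Map, Object and anything else fall back to the last rows of the table.
        return "BsonObject";
    }
}
```

For instance, `toBsonType(int.class)` gives `BsonInt32`, whereas `toBsonType(Integer.class)` gives `BsonAnyOf(BsonNull, BsonInt32)` because a boxed value may be null.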
123 changes: 123 additions & 0 deletions packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md
@@ -0,0 +1,123 @@
# MQL Component
---------------

## Abstract

This specification documents the structure of an MQL Component from a mixed perspective of both
the original source code and the target server that might run the query. It primarily aims
to provide developers of [dialects](/packages/mongodb-mql-model/src/docs/md/mql-dialect/mql-dialect.md)
and [engines](/packages/mongodb-mql-model/src/docs/md/mql-engine/mql-engine.md) with a common and
flexible structure for query processing.

## META

The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

MQL Components (from now on just components) encapsulate units of meaning of an MQL query. Components
MAY be related to how a target MongoDB Cluster can process a query. Components MAY contain other components
or MQL Nodes.

Components are categorised as:

* Leaf components: they don't contain other components or nodes.
* Non-leaf components: they contain other components or nodes.

Components MUST be part of a Node; they are meaningless outside of it. Components MAY be found
more than once in the same node.
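
To make the relationship concrete, here is a minimal Java sketch of a node that owns a list of components and allows the same kind of component to appear several times. The record shapes and the `all` helper are hypothetical and only serve the illustration.

```java
import java.util.List;

// Hypothetical shapes for illustration; the real components carry more data.
interface Component {}

record Named(String name) implements Component {}
record HasFieldReference(String fieldName) implements Component {}

// A node owns its components; the same kind of component may appear more than once.
record Node(List<Component> components) {
    // Returns every component of the requested kind, in the order they were added.
    <T extends Component> List<T> all(Class<T> type) {
        return components.stream().filter(type::isInstance).map(type::cast).toList();
    }
}
```

A node built with two `HasFieldReference` components and one `Named` component returns a list of two elements for `all(HasFieldReference.class)`, so callers should not assume that a component kind is unique within a node.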

## List of Components

### HasAccumulatedFields

Contains a list of Nodes that represent the accumulated fields of a group operation. Each
node MUST represent one accumulated field and its accumulator.

### HasAddedFields

Contains a list of Nodes that represent fields added to a document. For example, through the
$addFields aggregation stage. Each node MUST represent one added field.

### HasAggregation

Contains a list of Nodes, where each node MUST represent one single aggregation stage.

### HasCollectionReference

Indicates whether this query or a specific subquery targets a specific collection. The
reference MUST be one of the following variants:

* **Unknown**: there is a collection reference, but the collection is not known.
* **OnlyCollection**: there is a collection reference, but only the collection is known, not the full namespace.
* **Known**: both the collection and database are known.
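
The three variants can be modelled as a closed set of cases. The Java sketch below uses hypothetical record names and a hypothetical `namespace` helper to show that only the Known variant carries enough information to build a full namespace.

```java
import java.util.Optional;

// Hypothetical model of the three variants, for illustration only.
interface CollectionReference {
    record Unknown() implements CollectionReference {}
    record OnlyCollection(String collection) implements CollectionReference {}
    record Known(String database, String collection) implements CollectionReference {}

    // The full namespace ("database.collection") is only available for Known.
    default Optional<String> namespace() {
        if (this instanceof Known known) {
            return Optional.of(known.database() + "." + known.collection());
        }
        return Optional.empty();
    }
}
```

Code that needs the namespace, for example to look up a schema, can call `namespace()` and skip its work when the result is empty.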

### HasFieldReference

Contains information about a field. The field MAY be used for filtering, computing or aggregating data.
There are different variants depending on the amount of information we have at the moment of parsing the query.
The variant MUST be one of the following:

* **Unknown**: no information could be inferred about the field.
* **FromSchema**: the field MUST be in the schema of the target collection.
* **Inferred**: refers to a field that is not explicitly specified in the code. For example,
`Filters.eq(A)` refers to the `_id` field.
* **Computed**: refers to a field that is not part of the schema because it's newly computed.

### HasFilter

Contains a list of Nodes that represent the filter of a query. It MAY be empty
for empty queries.

### HasProjections

Contains a list of Nodes that represent the projections of a $project stage. It MAY be
empty for empty projections.

### HasSorts

Contains a list of Nodes that represent the sorting criteria of a $sort stage. It MAY be
empty if the sort criteria are not defined yet.

### HasSourceDialect

Identifies the source dialect that parsed this query. It MUST be one of the valid dialects:

* Java Driver
* Spring Criteria
* Spring @Query

### HasTargetCluster

Identifies the version of the cluster that MAY run the query. It MUST be a valid released MongoDB
version.

### HasUpdates

Contains a list of Nodes representing updates to a document. It MAY be empty if no updates are
specified yet.

### HasValueReference

Identifies a value in a query. Usually a value is the right side of a comparison,
but it can be used in different places, like for computing aggregation expressions.

It MUST be one of these variants:

* **Unknown**: We don't have any information about the provided value.
* **Constant**: It's a value that can be resolved without evaluating it. A literal value is a constant.
* **Inferred**: It's a value that could be inferred from other operations. For example, Sort.ascending("field") would have an Inferred(1).
* **Runtime**: It's a value that could not be resolved without evaluating it, but we have enough information
to infer its runtime type. For example, a parameter from a method.
* **Computed**: Refers to a computed expression in the MongoDB Cluster, like a $expr node.
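
A compact way to picture these variants is a small closed hierarchy. The Java sketch below uses hypothetical record shapes; `hasStaticValue` and `forSort` are illustrative helpers, and the `-1` used for a descending sort follows the usual MongoDB sort convention rather than anything stated in this document.

```java
// Hypothetical model of the five variants, for illustration only.
interface ValueReference {
    record Unknown() implements ValueReference {}
    record Constant(Object value) implements ValueReference {}
    record Inferred(Object value) implements ValueReference {}
    record Runtime(String bsonTypeName) implements ValueReference {}
    record Computed(String expression) implements ValueReference {}

    // True when the concrete value is available without running the program.
    default boolean hasStaticValue() {
        return this instanceof Constant || this instanceof Inferred;
    }

    // A call such as Sort.ascending("field") has no explicit value argument;
    // the direction (1 or -1) is implied by the method that was called.
    static ValueReference forSort(boolean ascending) {
        return new Inferred(ascending ? 1 : -1);
    }
}
```

Here `ValueReference.forSort(true)` equals `new ValueReference.Inferred(1)`, while a `Runtime` reference only reports a type name and therefore has no static value.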

### IsCommand

References the command that will be evaluated in the MongoDB cluster. The list of
valid commands can be found in the IsCommand.kt file.

### Named

References the name of the operation that is being referenced in the node. The list
of valid names can be found in the Named.kt file.
56 changes: 56 additions & 0 deletions packages/mongodb-mql-model/src/docs/md/mql-dialect/mql-dialect.md
@@ -0,0 +1,56 @@
# MQL Dialect
---------------

## Abstract

This specification documents the architecture of an MQL Dialect, its goal in the whole system and
how it interacts with other parts of the MQL model. It is aimed mainly at developers implementing
support for new languages and frameworks.

## META

The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

MQL Dialects (from now on just dialects) represent a way to define MQL queries by means of source
code. Dialects are usually a pair of a language and a framework. For example, the Java Driver
dialect would parse queries written in Java code using the official MongoDB driver.

Dialects MUST generate valid queries for any target cluster if the query is valid in the source code. However,
they MAY generate partial queries that cannot be run directly.

A dialect MAY have a parser, a formatter and a connection context extractor. A dialect also depends
on the source and the context of the dialect itself.

### Source

The Source is the token that represents part of an MQL Query in the dialect. In the Java Driver dialect, as it
depends on IntelliJ's PSI, the source would be a PsiElement. Sources, however,
are optional and MAY be omitted by the dialect.

### Context

The Context is an opaque data structure that represents additional data, not specified in the Source,
that is relevant to understand a query. For example, in Spring Data, the namespace of the query is
not provided in the query as is. The database is defined in a configuration file, and the collection
is specified in a Java class as an annotation.

### Parser

The parser is the actual implementation that will read code structured in the Source type. For example,
for the Java Driver, it parses code as PsiElements. For the signature of the parser, take a look at
the DialectParser interface in the [Dialect.kt](/packages/mongodb-dialects/src/main/kotlin/com/mongodb/jbplugin/dialects/Dialect.kt)
file.

### Formatter

The formatter is the component that, given a query in MQL model, knows how to generate Source tokens
that are specific to this dialect.

### ConnectionContextExtractor

The extractor is the component that, given the context of where the query is written, extracts additional
metadata relevant for the query. For example, for Spring Data, it retrieves the configuration file from
the IntelliJ Project.
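
The way the three parts fit together can be sketched with a toy dialect whose Source is a plain string. Everything in this Java example is hypothetical (`ToyQuery`, `ToyParser`, `ToyFormatter`, `ToyContextExtractor`, `ToyDialect`, `ToyDialects`) and deliberately simplified; the real signatures live in the Dialect.kt file mentioned above and differ from this sketch.

```java
import java.util.Optional;

// Toy model for illustration; S is the Source type and C is the Context type.
record ToyQuery<S>(S source, String collection) {}

interface ToyParser<S> { ToyQuery<S> parse(S source); }
interface ToyFormatter<S> { S format(ToyQuery<S> query); }
interface ToyContextExtractor<C> { C extract(); }

// Each part is optional because a dialect MAY provide any of them.
record ToyDialect<S, C>(
    Optional<ToyParser<S>> parser,
    Optional<ToyFormatter<S>> formatter,
    Optional<ToyContextExtractor<C>> contextExtractor) {}

final class ToyDialects {
    // A made-up dialect for sources such as "users.find()".
    static ToyDialect<String, String> stringDialect() {
        // The collection name is whatever precedes the first dot.
        ToyParser<String> parser =
            source -> new ToyQuery<>(source, source.substring(0, source.indexOf('.')));
        // Formatting regenerates a source token from the query model.
        ToyFormatter<String> formatter = query -> query.collection() + ".find()";
        // Stands in for metadata read from a configuration file.
        ToyContextExtractor<String> extractor = () -> "test";
        return new ToyDialect<>(Optional.of(parser), Optional.of(formatter), Optional.of(extractor));
    }
}
```

Parsing `"users.find()"` with this toy dialect yields a query whose collection is `users`, and formatting that query gives back `"users.find()"`. The database name is not part of the source at all and comes from the context extractor instead.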
47 changes: 47 additions & 0 deletions packages/mongodb-mql-model/src/docs/md/mql-engine/mql-engine.md
@@ -0,0 +1,47 @@
# MQL Engine
---------------

## Abstract

This specification documents what MQL Engines are and how they interact with the overall MQL Model
and the system.

## META

The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

MQL Engines (from now on, engines) are modules that provide functionality that only depends on the
MQL Model. Their main purpose is to provide insights based on the query model to users of the engine.

Engines MUST be independent of the origin of the query: for them it MUST NOT matter if the query
is a Java query or a shell query. All engines are implemented in the mongodb-mql-engines package.

Currently, there are three engines.

### Autocomplete

Autocomplete calculates the list of suggested fields for a given query.

### Indexing

Suggests, based on existing queries, the best indexes to serve them.

### Linting

Analyses the code and warns about defects. There are currently three linters:

#### FieldChecking

Detects whether the specified field in the query exists in the target collection or if the field
type provided is invalid.
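
A field check of this kind only needs the query model and a schema, which is what keeps it independent of the source dialect. The Java sketch below is a simplified illustration with hypothetical names (`FieldUse`, `FieldCheckingSketch`, `lint`); types are compared by name, whereas a real linter would rely on type assignability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One field usage found in a query, with the type of the value compared against it.
record FieldUse(String field, String providedType) {}

final class FieldCheckingSketch {
    // schema maps each field name of the target collection to its type name.
    static List<String> lint(Map<String, String> schema, List<FieldUse> uses) {
        List<String> warnings = new ArrayList<>();
        for (FieldUse use : uses) {
            String expected = schema.get(use.field());
            if (expected == null) {
                warnings.add("Field '" + use.field() + "' does not exist in the collection.");
            } else if (!expected.equals(use.providedType())) {
                // Simplification: compares names instead of checking assignability.
                warnings.add("Field '" + use.field() + "' expects " + expected
                    + " but got " + use.providedType() + ".");
            }
        }
        return warnings;
    }
}
```

Given a schema with a single `age` field of type `BsonInt32`, a query that compares `age` with a string and also references a misspelled `nme` field produces two warnings, one for each kind of defect.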

#### IndexChecking

Detects whether the specified query can be covered by an index or not.

#### NamespaceChecking

Detects whether the target namespace exists in the cluster.