This approach is often sufficient; however, it suffers from two major problems:

1. The recursive nature of GraphQL means that a single request can lead to many database requests.
For example, if a request pulls in 100 results, and each of those results calls a resolver that
requires a further lookup in another collection, then that single query will result in 101 database fetches.
It's easy to see how we can quickly reach hundreds of lookups for a single Apollo query.

2. The second problem, and the most difficult one to solve, occurs when we need to fetch data from a
primary collection, join with a secondary collection and then sort (or filter) the results based on a
field in that *second* collection.

A typical scenario would occur when fetching data from multiple collections to display in a table where
the user can click a column to change the sort order. In this case it's not sufficient to perform a simple
Mongo `find` on the top-level collection (since the sub-fields won't be available to sort on), and as a
result it just isn't possible to use the simplified GraphQL approach of fetching the joins using resolvers.
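
To make the first problem concrete, here's the naive per-field approach (a sketch, assuming mongoose-style
collections): every fetched user triggers its own extra query.

```
// Naive resolver map: `company` runs once per fetched user,
// so a page of 100 users costs 1 + 100 database round trips.
const User = {
  company: ({ companyId }) => CompaniesCollection.findOne(companyId),
};
```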

Both of these issues can be solved by performing a *single* Mongo aggregation that fetches all the data in one go,
performing lookups on the related collections, so that we can then sort or filter on any field in the result.

*Apongo* does all the heavy lifting for you:

1. It analyses the `resolveInfo` data passed to the top-level resolver in order to extract the hierarchy of
fields that have been requested. It does this to ensure that it only performs the joins required for the
actual query.

2. From this information it builds a __single__ Mongo aggregation pipeline that recursively performs joins
for the other tables used in the request (using `$lookup`).
2. From this information it builds a __single__ Mongo aggregation pipeline that recursively performs lookups
for the other collections used in the request.

You can then include the pipeline as part of a larger aggregation pipeline that sorts and filters the result.
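
For example, sorting on a field from a joined collection becomes an ordinary `$sort` stage (a sketch —
`company.name` assumes the `company` lookup shown later in this README and a `name` field on the joined document):

```
const pipeline = [
  // Apongo's generated $lookup/$unwind stages come first...
  ...createPipeline(null, resolveInfo, context),
  // ...so later stages can reference joined fields directly.
  { $sort: { 'company.name': 1 } },
];
```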


## Specifying the Joins

Apongo needs to know which fields are joins, and how to join them. In order to make this both easy to specify and declarative,
a custom GraphQL directive, `@apongo`, is used to specify this information directly in the types declaration. Here's an example:

```
type User {
  _id: String!
  company: Company @apongo(lookup: { collection: "companies", localField: "companyId", foreignField: "_id" })
}

type Query {
  users(limit: Int): [User]
}
```

## Writing the Resolvers

In your resolvers you'll call `createPipeline` to create the aggregation pipeline:

```
import { createPipeline } from 'apongo';

...

const users = (_, { limit = 20 }, context, resolveInfo) => {
  // Create a pipeline to first perform any initial matching, then do the lookups, and finally limit the results
  const pipeline = [
    // Perform any initial matching that you need.
    // This would typically depend on the parameters passed to the query.
    { $match: { type: 'client' } },

    // Include all the pipeline stages generated by Apongo to do the lookups.
    // We pass `null` since the `users` query is mapped directly to the result
    // of an aggregation on the Users collection.
    ...createPipeline(null, resolveInfo, context),

    // Filter, sort or limit the result.
    { $limit: limit },
  ];

  // How you call Mongo will depend on your code base. You'll need to pass your pipeline to Mongo's aggregate.
  // This is how you'd do it using `mongoose`:
  return UsersCollection.aggregate(pipeline);
};
```

`createPipeline` is called with three parameters:

| Parameter | Description
| --------------- | -----------
| _mainFieldName_ | The name of the field containing the result of the aggregation, or `null` if the entire query is the result of an aggregation over a specific collection. See below.
| _resolveInfo_ | The `resolveInfo` passed to your resolver
| _context_ | The `context` passed to your resolver

This function will analyse the query and construct an aggregation pipeline containing the required lookups.

In the example above, the `users` query directly returns the result of an aggregation over the `Users` collection.
If the GraphQL request includes the `company` field then Apongo will fetch the data from the `Companies` collection
using `$lookup`, adding stages like these to the pipeline:

```
[
  {
    '$lookup': {
      from: 'companies',
      localField: 'companyId',
      foreignField: '_id',
      as: 'company'
    }
  },
  {
    '$unwind': { path: '$company', preserveNullAndEmptyArrays: true }
  },
]
```

By default `createPipeline` assumes that the fields in the current GraphQL request map directly to the result of
the aggregation. Sometimes, however, the aggregation's result needs to be returned in a *field* of the response.
A typical example is pagination, where a query returns the total count alongside the results:

```
type PaginatedUsers {
  users: [User]
  count: Int
}

type Query {
  paginatedUsers: PaginatedUsers!
}
```

Here, the `paginatedUsers` resolver should return two fields, `count` and `users`. `users` needs to be the result of
an aggregation on the `Users` collection, so we need to tell `createPipeline` this by passing it the field name:


```
...createPipeline('users', resolveInfo, context),
```

See below for more information about handling pagination.

### The *lookup* request

The `lookup` request accepts a number of fields:

| Parameter | Description
| --------------------------- | -----------
| _collection_ | The name of the collection to look up
| _localField_ | The name of the local field used by the $lookup
| _foreignField_ | The name of the foreign field used by the $lookup
| _preserveIfNull_ (Optional) | Boolean to determine if the parent should be kept if no join is found (default: `true`)
| _conds_ (Optional) | A *stringified* JSON array of additional conditions used by the lookup
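
For example, to drop users whose company can't be found, rather than keeping them with a missing `company`,
a sketch reusing the lookup from above could set `preserveIfNull` to `false`:

```
type User {
  company: Company @apongo(lookup: {
    collection: "companies",
    localField: "companyId",
    foreignField: "_id",
    preserveIfNull: false
  })
}
```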

Sometimes your lookup will need extra conditions to perform the join between the two collections. Mongo's `$lookup`
command has an advanced feature that allows us to use a sub-pipeline within the primary lookup. Apongo uses this feature to
allow us to supply an array of extra conditions that are used when matching the collection.

Internally, this is what gets added to the sub-pipeline within the `$lookup`:

```
$and: [
  { $eq: [`$${foreignField}`, '$$localField'] }, // Match on the keys
  ...JSON.parse(apongo.lookup.conds), // Extra conditions specified in the directive
],
```

The `conds` needs to be a JSON array, but we have to stringify it in order to pass it to the directive in the types file.

Here's an example:
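
For illustration, suppose each user has tasks stored in a hypothetical `tasks` collection keyed by `userId` with a
`status` field, and we only want to join the active ones:

```
type User {
  activeTasks: [Task] @apongo(lookup: {
    collection: "tasks",
    localField: "_id",
    foreignField: "userId",
    conds: "[{ \"$eq\": [\"$status\", \"active\"] }]"
  })
}
```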

### The *compose* request

The `compose` request lets you build a field by composing strings and fields from the joined collections.

This is useful when you need to sort or filter on a composed field as part of your pipeline.

Note that Apongo takes care of replacing fields accessed by `$` with the full path to that field following any lookups.

### The *expr* request

Wherever you need to access a field using `$` you should include the token `@path`, which Apongo replaces with the
full path to that field within the pipeline.

## Notes

1. The `@apongo` directives are only processed by resolvers that use `createPipeline` to build an
aggregation pipeline. They are ignored by all other resolvers.

2. It's very important to understand that resolvers are __always__ called, even for fields which have already
been fetched by `createPipeline`. In our example above, if we provide a `company` resolver for the User type
then it will be called for each fetched user, even though it would have already been fetched by the aggregation.

It would be very costly to allow the server to refetch all of these fields unnecessarily, so the resolvers
need to be written to only fetch the field if it doesn't already exist in the root object that's passed to the
resolver.

Our User resolver might look like this:

```
const User = {
  // We only fetch fields that haven't been fetched by createPipeline.
  // companyId comes from the database collection, company is the result fetched via the pipeline
  company: ({ companyId, company }) => company || CompaniesCollection.findOne(companyId),
  ...
```

In the above example we simply test if `company` has already been fetched into the root object
(via the `$lookup` stage created by Apongo), and if it hasn't we perform the lookup in the traditional way.

There's a slight performance limitation that occurs if the `$lookup` returns a null value.
In that case the resolver receives `null` for that field, and it can't know that an attempt was already made to
do the join, so we end up __unnecessarily__ calling the database (which will again return `null`).
Such is life.

## Recipes

### Pagination

Suppose we want to return a page of users together with the total count of matching documents.
By enhancing the aggregation pipeline we can do this quite easily. The types might look like this:

```
type PaginatedUsers {
  users: [User]
  count: Int
}

type Query {
  paginatedUsers: PaginatedUsers!
}
```

And the resolver:

```
const paginatedUsers = (_, { limit = 20, offset = 0 }, context, resolveInfo) => {
  // Create a main pipeline to first perform the match and lookups
  const pipeline = [
    // Perform any initial matching that you need
    { $match: { type: 'client' } },

    // Include all the pipeline stages generated by Apongo to do the lookups.
    // Note that we *must* specify the field for which we're creating the pipeline.
    // The `users` field contains the result of aggregating over the `Users` collection.
    ...createPipeline('users', resolveInfo, context),
  ];

  // Create a secondary pipeline to paginate the results
  const resultsPipeline = [
    { $skip: offset },
    { $limit: limit },
  ];

  // Split the main pipeline into two facets, one to return the paginated result using the pipeline
  // above, and the other to get the total count of matched documents.
  pipeline.push(
    {
      $facet: {
        users: resultsPipeline,
        count: [{ $count: 'count' }],
      },
    },
  );

  // Call the aggregation function. Here's how we could do that using mongoose.
  return UsersCollection.aggregate(pipeline).exec().then(([{ users, count }]) => {
    return { users, count: count.length === 0 ? 0 : count[0].count };
  });
};
```

## FAQ

### Will this work with Meteor?

Yes! Meteor doesn't natively provide access to Mongo's aggregation command. Fortunately this oversight can be
rectified with this [tiny meteor package](https://github.com/meteorhacks/meteor-aggregate).
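
With that package installed, server-side collections gain an `aggregate` method, so the resolvers above work
essentially unchanged. A sketch (assuming the `UsersCollection` from the earlier examples):

```
// Server-side Meteor code: meteorhacks:aggregate adds `aggregate`
// to Mongo.Collection instances.
const users = (_, { limit = 20 }, context, resolveInfo) => {
  const pipeline = [
    ...createPipeline(null, resolveInfo, context),
    { $limit: limit },
  ];
  return UsersCollection.aggregate(pipeline);
};
```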
