Skip to content
This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

A mongoose plugin that indexes models into Elasticsearch 2 / 5 / 6 and 7

Notifications You must be signed in to change notification settings

jbdemonte/mongoose-elasticsearch-xp

Repository files navigation

mongoose-elasticsearch-xp

NPM version Build Status Coverage Status

mongoose-elasticsearch-xp is a mongoose plugin that can automatically index your models into elasticsearch. This plugin is compatible with Elasticsearch version 2,5,6 and 7.

Prerequisite

mongoose-elasticsearch-xp requires:

  • mongoose 4.9.0, 5.0.0 or later
  • elasticsearch 2.0, 5.0, 6.0, 7.0 or later

Why this plugin?

Although mongoosastic is a great tool, it didn't fit my needs. I needed something more flexible and up to date than mongoosastic.
I started by sending some pull requests to mongoosastic. When I was facing to a full rewrite need, I choosed to start a new project based on the mongoosastic idea / syntax.
This plugin handle both callback and promise syntaxes. It uses the mongoose Promise which can be redefined

Installation

The latest version of this package will be as close as possible to the latest elasticsearch and mongoose packages.

npm install --save mongoose-elasticsearch-xp

Important

This plugin is configured to work with the latest version (7.x.y). In order to use it with Elasticsearch 2.x.y, you need to use the v2 version:

It is very strongly recommended to fix your version by using the require with the elastic to prevent breaking changes

var mexp = require('mongoose-elasticsearch-xp').v2;

Likewise for .v5, .v6, and .v7, v5 is default for now.

The examples below use the version 5 syntax.

Limitation

  • This plugin requires mongoose object to be indexed, not lean object
  • Indexing using findOneAndUpdate requires {new: true} as options to be updated, else previous data will be saved

Setup

Model.plugin(mongoose-elasticsearch-xp, options)

Options are:

  • index - the index in Elasticsearch to use. Defaults to the collection name.
  • type - the type this model represents in Elasticsearch. Defaults to the model name. It may be a function (modelName) => typeName.
  • client - an existing Elasticsearch Client instance.
  • hosts - an array hosts Elasticsearch is running on.
  • host - the host Elasticsearch is running on.
  • port - the port Elasticsearch is running on.
  • auth - the authentication needed to reach Elasticsearch server. In the standard format of 'username:password'.
  • protocol - the protocol the Elasticsearch server uses. Defaults to http.
  • hydrate - whether or not to replace ES source by mongo document.
  • filter - the function used for filtered indexing.
  • transform - the function used for transforming a document before indexing it, accepts the document as an argument, expects transformed document to be returned (if returned value is falsy, the original document will be used).
  • idsOnly - whether or not returning only mongo ids in esSearch.
  • countOnly - whether or not returning only the count value in esCount.
  • mappingSettings - default settings to use with esCreateMapping.
  • refreshDelay - time in ms to wait after esRefresh. Defaults to 0.
  • script - whether or not the inline script are enabled in elasticsearch. Defaults to false.
  • bulk - options to use when synchronising.
  • bulk.batch - batchSize to use on synchronise options. Defaults to 50.
  • bulk.size - bulk element count to wait before calling client.bulk function. Defaults to 1000.
  • bulk.delay - idle time to wait before calling the client.bulk function. Defaults to 1000.
  • onlyOnDemandIndexing - whether or not to demand indexing on CRUD operations. If set to true middleware hooks for save, update, delete do not fire. Defaults to false.

To have a model indexed into Elasticsearch simply add the plugin.

var mongoose = require('mongoose');
var mexp = require('mongoose-elasticsearch-xp');

var UserSchema = new mongoose.Schema({
    name: String,
    email: String,
    city: String
});

UserSchema.plugin(mexp);

var User = mongoose.model('User', UserSchema);

This will by default simply use the collection name as the index while using the model name itself as the type. So if you create a new User object and save it, you can see it by navigating to http://localhost:9200/users/user/_search (this assumes Elasticsearch is running locally on port 9200).

The default behavior is all fields get indexed into Elasticsearch. This can be a little wasteful especially considering that the document is now just being duplicated between mongodb and Elasticsearch so you should consider opting to index only certain fields by specifying es_indexed on the fields you want to store:

var UserSchema = new mongoose.Schema({
    name: {type: String, es_indexed: true},
    email: String,
    city: String
});

UserSchema.plugin(mexp);

var User = mongoose.model('User', UserSchema);

In this case only the name field will be indexed for searching.

Now, by adding the plugin, the model will have a new method called esSearch which can be used to make simple to complex searches.
The esSearch method accepts standard Elasticsearch query DSL

User
  .esSearch({
    query_string: {
      query: "john"
    }
  })
  .then(function (results) {
    // results here
  });

The esSearch also handle the full Elasticsearch ...

User
  .esSearch({
    bool: {
      must: {
        match_all: {}
      },
      filter: {
        range: {
          age: {lt: 35}
        }
      }
    }
  })
  .then(function (results) {
    // results here
  });

... and Lucene syntax:

User
  .esSearch("name:john")
  .then(function (results) {
    // results here
  });

To connect to more than one host, you can use an array of hosts.

MyModel.plugin(mexp, {
  hosts: [
    'localhost: 9200',
    'anotherhost: 9200'
  ]
})

Also, you can re-use an existing Elasticsearch Client instance

var esClient = new elasticsearch.Client({host: 'localhost: 9200'});

MyModel.plugin(mexp, {
  client: esClient
});

Indexing

Saving a document

The indexing takes place after saving inside the mongodb and is a deferred process. One can check the end of the indexion catching es-indexed event. This event is emitted both from the document and the model (which make unit tests easier).

doc.save(function (err) {
  if (err) throw err;
  /* Document indexation on going */
  doc.on('es-indexed', function (err, res) {
    if (err) throw err;
    /* Document is indexed */
    });
  });

Indexing Nested Models

In order to index nested models you can refer following example.

var CommentSchema = new mongoose.Schema({
    title: String,
    body: String,
    author: String
});

var UserSchema = new mongoose.Schema({
    name: {type: String, es_indexed: true},
    email: String,
    city: String,
    comments: {type: [CommentSchema], es_indexed: true}
});

UserSchema.plugin(mexp);

var User = mongoose.model('User', UserSchema);

Indexing Populated Models

To index populated models (ref model), it is mandatory to provide a schema to explain what to index in the es_type key. This plugin will never populate models by its own, you have to populate the models.

var CountrySchema = new mongoose.Schema({
    name: String,
    code: String
});

var Country = mongoose.model('Country', CountrySchema);

var CitySchema = new mongoose.Schema({
    name: String,
    pos: {
        type: [Number],
        index: '2dsphere'
    },
    country: {
        type: mongoose.Schema.Types.ObjectId,
        ref: 'Country'
    }
});

var City = mongoose.model('City', CitySchema);

var UserSchema = new mongoose.Schema({
    name: String,
    city: {
        type: mongoose.Schema.Types.ObjectId,
        ref: 'City',
        es_type: {
            name: {
                es_type: 'string'
            },
            pos: {
                es_type: 'geo_point'
            },
            country: {
                es_type: {
                    name: {
                        es_type: 'string'
                    },
                    code: {
                        es_type: 'string'
                    }
                }
            }
        }
    }
});

UserSchema.plugin(mexp);

var User = mongoose.model('User', UserSchema);

Indexing An Existing Collection

Already have a mongodb collection that you'd like to index using this plugin? No problem! Simply call the esSynchronize method on your model to open a mongoose stream and start indexing documents individually.

var BookSchema = new mongoose.Schema({
  title: String
});

BookSchema.plugin(mexp);

var Book = mongoose.model('Book', BookSchema);

Book.on('es-bulk-sent', function () {
  console.log('buffer sent');
});

Book.on('es-bulk-data', function (doc) {
  console.log('Adding ' + doc.title);
});

Book.on('es-bulk-error', function (err) {
  console.error(err);
});

Book
  .esSynchronize()
  .then(function () {
    console.log('end.');
  });

esSynchronise use same parameters as find method or alternatively you can pass a mongoose query instance in order to use any specific methods like .populate(). It allows to synchronize a subset of documents, modifying the default projection...

Book
  .esSynchronize({author: 'Arthur C. Clarke'}, '+resume')
  .then(function () {
    console.log('end.');
  });
// using a mongoose query instance, populating the author `ref`
const query = Book.find({author: 'Arthur C. Clarke'}).populate('author')
Book
  .esSynchronize(query, '+resume')
  .then(function () {
    console.log('end.');
  });

Filtered Indexing

You can specify a filter function to index a model to Elasticsearch based on some specific conditions. If document satisfies conditions it will be added to the elastic index. If not, it will be removed from index.

Filtering function must return True for conditions that will be indexing to Elasticsearch (like Array.filter & unlike moogoosastic.filter)

var MovieSchema = new mongoose.Schema({
  title: {type: String},
  genre: {type: String, enum: ['horror', 'action', 'adventure', 'other']}
});

MovieSchema.plugin(mexp, {
  filter: function (doc) {
    return doc.genre === 'action';
  }
});

Transforming a document before indexing

You can specify a function to transform a document before indexing it in ElasticSearch.

var MovieSchema = new mongoose.Schema({
  title: {type: String},
  genre: {type: String, enum: ['horror', 'action', 'adventure', 'other']}
});

MovieSchema.plugin(mexp, {
  transform: function (doc) {
    delete doc.genre;
    return doc;
  }
});

Instances of Movie model having 'action' as their genre will be indexed to Elasticsearch.

Indexing On Demand

You can do on-demand indexes using the esIndex function

esIndex([update], [callback])

Dude.findOne({name: 'Jeffrey Lebowski', function (err, dude) {
  dude.awesome = true;
  dude.esIndex(function (err, res) {
    console.log("egads! I've been indexed!");
  });
});

update parameter allows to update a partial document (documentation). It is especially useful when dealing with not loaded properties (when setting select = false in schema properties).

Note that indexing a model does not mean it will be persisted to mongodb. Use save for that.

Unsetting fields

By default, inline scripts are disabled in Elasticsearch. In this case, unsetting fields result in setting fields to null.

Dude.findOne({name: 'Jeffrey Lebowski', function (err, dude) {
  dude.job = undefined;
  dude.save(); // => job fields will be set to null on Elasticsearch
});

If dynamic-scripting is enabled, setting script to true will use ctx._source.remove and fields will be removed in Elasticsearch.

Adding fields

es_extend allows to add some fields which does not exist in the mongoose schema. It is defined in the options of the schema definition. When adding some fields, es_type and es_value are mandatories.

var UserSchema = new mongoose.Schema(
  {
    name: String
  },
  {
    es_extend: {
      length: {
        es_type: 'integer',
        es_value: function (document) {
          return document.name.length;
        }
      }
    }
  }
);

The es_value parameter can be either a value or a function returning a value, in this case, here are its parameter:

  • document is the mongoose document

Change fields value

es_value allows to replace the value of a field. It can be either a value or a function which will return the value to index. If the type changes, it is mandatory to set the correct es_type.

var TagSchema = new mongoose.Schema({
  _id: false,
  value: String
});

var UserSchema = new mongoose.Schema({
  name: String,
  xyz: {
    type: Number,
    es_value: 123               // <= whatever the model.xyz value is, the xyz indexed will be 123 in ES
  },
  tags: {
    type: [TagSchema],
    es_type: 'string',          // <= because the type change from a TagSchema (object) to an array of string
    es_value: function (tags) {
      return tags.map(function (tag) {
        return tag.value;
      });
    }
  }
});

UserSchema.plugin(plugin);

var User = mongoose.model('User', UserSchema);

var john = new User({
  name: 'John',
  tags: [
    {value: 'cool'},
    {value: 'green'}
  ]
});

// users index will contain {"name": "John", "xyz": 123, "tags": ["cool", "green"]}

When es_value is a function, it takes theses parameters:

  • value the original value
  • context a context object

context contains:

  • document the mongoose document
  • container the container of the original value (which is equal to the document when it is not a nested object)
  • field the key name

Using with mongoose discriminators

You may save discriminator models' data in different Elasticsearch types with different mappings. To make it possible you should provide type option as a function. You will get modelName as an argument and must return type name for Elasticsearch.

// define base Schema with base Model
const BaseSchema = new mongoose.Schema({
  name: String,
});
const BaseModel = mongoose.model('Base', BaseSchema);

// define discriminator models
const UserModel = BaseModel.discriminator('User', new mongoose.Schema({
  age: Number,
}));

const AdminModel = BaseModel.discriminator('Admin', new mongoose.Schema({
  access: Boolean,
}));

// add mexp plugin to the base Schema, with `type` as a function
BaseSchema.plugin(mexp, {
  index: 'user',
  type: kind => {
    if (kind === 'User') return 'userType';
    if (kind === 'Admin') return 'adminType';
    return 'base';
  },
});

Mapping

Schemas can be configured to have special options per field. These match with the existing mapping parameters defined by Elasticsearch with the only difference being they are all prefixed by es_.

So for example. If you wanted to index a book model and have the boost for title set to 2.0 (giving it greater priority when searching) you'd define it as follows:

var BookSchema = new mongoose.Schema({
    title: {type: String, es_boost: 2.0},
    author: {type: String, es_null_value: "Unknown Author"},
    publicationDate: {type: Date, es_type: 'date'}
});

This example uses a few other mapping fields... such as null_value and type (which overrides whatever value the schema type is, useful if you want stronger typing such as float).

There are various mapping options that can be defined in Elasticsearch. Check out https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html/ for more information.

Creating Mappings On Demand

You can do on-demand create a mapping using the esCreateMapping function.

Creating the mapping is a one time operation and can be done as follows:

var UserSchema = new mongoose.Schema({
    name: String,
    email: String,
    city: String
});

UserSchema.plugin(mexp);

var User = mongoose.model('User', UserSchema);

User
  .esCreateMapping({
    "analysis" : {
      "analyzer": {
        "content": {
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  })
  .then(function (mapping) {
    // do neat things here
  });

⚠️ For v7 analysis needs to be wrapped in a settings object. Please refer to: test/es7/model-mapping.js

You'll have to manage whether or not you need to create the mapping, mongoose-elasticsearch-xp will make no assumptions and simply attempt to create the mapping. If the mapping already exists, an Exception detailing such will be populated in the err argument.

Queries

The full query DSL of Elasticsearch is exposed through the esSearch method. For example, if you wanted to find all people between ages 21 and 30:

Person
  .esSearch({
    range: {
      age: {
        from: 21,
        to: 30
      }
    }
  })
  .then(function (results) {
    // all the people who fit the age group are here!
  });

See the Elasticsearch query DSL docs for more information.

You can also specify full query:

Person
  .esSearch({
    {
      query: {
        bool: {
          must: {match_all: {}},
          filter: {range: {age: {gte: 35}}}
        }
      },
      sort: [
        {age: {order: "desc"}}
      ]
    }

  })
  .then(function (results) {
    // ...
  });

Hydration

By default objects returned from performing a search will be the objects as is in Elasticsearch. This is useful in cases where only what was indexed needs to be displayed (think a list of results) while the actual mongoose object contains the full data when viewing one of the results.

However, if you want the results to be actual mongoose objects you can provide {hydrate: true} as the second argument to a search call.

User
  .esSearch({query_string: {query: "john"}}, {hydrate: true})
  .then(function (results) {
  // results here
  });

To modify default hydratation, provide an object to hydrate instead of "true". hydrate accept {select: string, options: object, docsOnly: boolean}

User
  .esSearch({query_string: {query: "john"}}, {hydrate: {select: 'name age', options: {lean: true}}})
  .then(function (results) {
    // results here
  });

When using hydration, hits._source is replaced by hits.doc.

If you only want the models, instead of the complete ES results, use the option "docsOnly".

User
  .esSearch({query_string: {query: "john"}}, {hydrate: {select: 'name age', docsOnly; true}})
  .then(function (users) {
    // users is an array of User
  });

Hydration with population

To populate hydrated models, simply use the populate key of the hydrate object.

Use it the same way mongoose populate works (string, object, array of object).

User
  .esSearch(
    {query_string: {query: "john"}},
    {hydrate: {
      populate: {
        path: 'city',
        select: 'name'
      }
    }}
  )
  .then(function (results) {
    // results here
  });

When having different populate to handle, you can use an array of populate. In the example below, two main key are populated city and books. The sub-key book.author is also populated (mongoose feature).

User
  .esSearch(
    {query_string: {query: "john"}},
    {hydrate: {
      populate: [
        {
          path: 'city'
        },
        {
          path: 'books',
          populate: {
            path: 'author',
            select: 'name'
          }
        }
      ]
    }}
  )
  .then(function (results) {
    // results here
  });

Getting only Ids

A variant to hydration may be to get only ids instead of the complete Elasticsearch result. Using idsOnly will return the ids cast in mongoose ObjectIds.

User
  .esSearch({query_string: {query: "john"}}, {idsOnly: true})
  .then(function (ids) {
  // ids is an array of mongo id
  });

Count

The count API is available using the esCount function. It handle the same queries as the esSearch method (string query, full query...).

User
  .esCount({match: {age: 34}})
  .then(function (result) {
    // result = {
    //    "count" : 1,
    //    "_shards" : {
    //        "total" : 5,
    //        "successful" : 5,
    //        "failed" : 0
    //    }
    // }
  });

Getting only count value

Count result can be simplified to the count value using the countOnly options whether in the plugin options or in the function options.

User
  .esCount(
    {
      bool: {
        must: {match_all: {}},
        filter: {range: {age: {gte: 35}}}
      }
    },
    {countOnly: true}
  )
  .then(function (count) {
    // count is a number
  })

Refreshing model index

esRefresh explicitly refresh the model index by calling indices-refresh.

User
  .esRefresh()
  .then(function () {
    // index has been refreshed
  });

You also can provide explicit options:

User
  .esRefresh({refreshDelay: 1000})
  .then(function () {
    // index has been refreshed, and then, 1000ms has been waited
  });

Breaking changes for elastic v7.0

List

1 - Elasticsearch _Type has been removed 2 - Elasticsearch SQL 3 - Index lifecycle management 4 - Standard token filter has been removed 5 - nGram and edgeNGram token filter cannot be used on new indices should be replaces by ngram or edge_ngram 6 - Shards number on index creation is now 1 instead of 5

This library handles types fine for now but keep that in mind that they will be gone for v8.0.

Contributing

You will need a mongodb running locally either via docker or your own The tests currently write in a test collection.

Ideally you would run: (example for v7)

Your mongodb then, In one terminal: npm run docker-v7 In another: npm run test-v7

All the docker images load their own elasticsearch.yml config, In the case of es7 you might need to edit the line

network.host: 127.0.0.1

for network.host: _eth0_ in order to test locally (don't commit this file change or it will break travis).

About

A mongoose plugin that indexes models into Elasticsearch 2 / 5 / 6 and 7

Resources

Stars

Watchers

Forks

Packages

No packages published