D-I-L/master
D-I-L/develop
django-elastic is a Django app to run Elastic search queries.
Installation:
pip install -e git://github.com/D-I-L/django-elastic.git#egg=elastic
If you need to start a Django project:
django-admin startproject [project_name]
Add "elastic" to your
INSTALLED_APPS
insettings.py
:INSTALLED_APPS = ( ... 'elastic', )
Add the Elastic settings to the settings.py:
# elastic search engine ELASTIC = { 'default': { 'ELASTIC_URL': 'http://127.0.0.1:9200/', 'IDX': { 'MARKER': 'dbsnp142', 'DEFAULT': 'dbsnp142', }, 'TEST': 'test_suffix', 'REPOSITORY': 'my_backup', 'TEST_REPO_DIR': '/path/repos/test_snapshot/', } }
Tests can be run as follows:
./manage.py test elastic.tests
The plugin comes with some in-built loaders. These can be listed using the Django command line management tool:
./manage.py index_search --help
For example there is a generic GFF file loader:
./manage.py index_search --indexName [index name] --indexType [gff] \ --indexGFF file.gff [--isGTF]
To write custom loaders there are example loaders in the management.loaders package. These inherit from the management.loaders.loader.Loader class and can be run by extending the commands in management.commands.index_search.py.
The classes in the elastic_model
module are used to build Elastic queries.
A query can be used to retrieve search hits, get a count of the hits or
to get the index mapping. The query may also be a filter or be combined
with a filter component. This plugin attempts to provide flexibility in
the generation of queries but also provides shortcuts to common query
structures.
An ElasticQuery
object is used to build an instance of Search
.
Search.get_json_response() then runs the search request and returns
the elastic JSON results. Alternatively Search.get_result()
can be used to return a processed form of the results without
leading underscores (e.g. _type) which Django template does not like.
Search.get_count() uses the Elastic count API to return the number
of hits for a query.
Example of a filtered boolean query:
query_bool = BoolQuery() query_bool.must_not([Query.term("seqid", 2)]) \ .should(RangeQuery("start", gte=10054)) \ .should(Query.term("id", "rs373328635")) query = ElasticQuery.filtered_bool(Query.match_all(), query_bool, sources=["id", "seqid"]) search = Search(query, idx=ElasticSettings.idx('DEFAULT')) results = search.get_json_response()
An ElasticQuery
object can be built from Query
and Filter
objects. There are factory methods within ElasticQuery
and Query
classes that provide shortcuts to building common types of queries/filters.
When creating a new query the first port of call would therefore be
the factory methods in ElasticQuery
. If this does not provide the
exact components needed for the query then look into building it
from the Query
and Filter
parent and child classes.
The snapshot and restore Elastic module is used by the custom django-elastic
management commands described below. These can be used in creating and managing
repositories containing snapshots of indices.
./manage.py show_snapshot --help ./manage.py snapshot --help ./manage.py repository --help ./manage.py restore_snapshot --help
Note that apart from the repository
argument each command takes a --repo
flag to specify the repository name. If the --repo
flag is not provided the
REPOSITORY
defined in the ELASTIC
setting in settings.py
is used.
The repository
argument is used in the creation and deletion of a
repository. To create a 'test_backup' repository:
./manage.py repository test_backup --dir /path_to_backup/snapshot/test_snapshot/
To delete the 'test_backup' repository:
./manage.py repository test_backup --delete
The snapshot
argument is used is used in the creation and
deletion of a snapshot. To create a 'snapshot_1' snapshot of the
indices 'disease_region_grch38' and 'disease':
./manage.py snapshot snapshot_1 --indices disease_region_grch38,disease
To delete the 'snapshot_1' snapshot:
./manage.py snapshot snapshot_1 --delete
To copy a snapshot to an instance of Elastic on the same network, use
the url
flag to point at the other cluster to copy to:
./manage.py restore_snapshot snapshot_1 --repo tmp_restore \ --url http://cluster_host:9200
A repository can be used to copy indices to another cluster that is on a different network. To do this tar and move data to the machine with the cluster to copy the indices to. Un-tar and ensure the directory has read-write permissions for everyone (note that for a multi-node cluster make sure the file system repository is available to all nodes - /tmp is fine for a temporary repository on a single node cluster):
tar cvf /tmp/snapshot_test/test_snapshot.tar test_snapshot/ chmod a+rwx -R test_snapshot
Change the REPOSITORY
and ELASTIC_URL
settings in Django to
point at the correct Elastic cluster. Then create a new repository
that points to the snapshot repository:
./manage.py repository tmp_restore --dir /tmp/snapshot_test/test_snapshot/
View the repository and snapshot:
./manage.py show_snapshot --repo tmp_restore ./manage.py show_snapshot --all
Now use restore_snapshot
to copy the data from the repository:
./manage.py restore_snapshot snapshot_1 --repo tmp_restore \ --url http://localhost:9200
The URL parameter can be used to copy to other Elastic instances on the network. Now list the available indices to check that they have been created:
curl 'http://localhost:9200/_cat/indices?v'
Delete the repository and remove the data:
./manage.py repository tmp_restore --delete rm -rf /tmp/snapshot_test/