Upgrade to Elasticsearch 7.x
- Upgrade Docker image and add `discovery.type` environment variable.
- Upgrade elasticsearch-dsl dependency requirement.
- Use new Document class instead of old DocType.
- Remove document type from index, update and delete operations.
- Do not disable `_all` meta field, removed in 7.x.
- Update README.md.
jraddaoui committed May 4, 2020
1 parent 123bb54 commit 688ba2f
Showing 21 changed files with 68 additions and 78 deletions.
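The core of the migration is Elasticsearch 7.x's removal of mapping types: `DocType` becomes `Document` in elasticsearch-dsl, and the `doc_type` argument disappears from index, update and delete operations. A minimal sketch of the call-site change, using `unittest.mock.Mock` as a stand-in for the low-level client so it runs without a cluster (`scope_dips` and the id are hypothetical values, not taken from this repository):

```python
from unittest.mock import Mock

# Stand-in for an elasticsearch.Elasticsearch client instance.
es = Mock()

# 6.x style: every document operation carried a mapping type, e.g.
#   es.delete(index="scope_dips", doc_type="_doc", id="42")

# 7.x style: mapping types are gone, so `doc_type` is dropped.
es.delete(index="scope_dips", id="42")

es.delete.assert_called_with(index="scope_dips", id="42")
```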
34 changes: 19 additions & 15 deletions README.md
@@ -68,7 +68,7 @@ By default, the application has five levels of permissions:

## Technologies involved

SCOPE is a Django application that uses Elasticsearch 6.x as a search engine, Celery 4.2 to process asynchronous tasks, a SQLite database and Redis as a message broker (probably, in the future, as a cache system too).
SCOPE is a Django application that uses Elasticsearch as a search engine, Celery to process asynchronous tasks, a SQLite database and Redis as a message broker (probably, in the future, as a cache system too).

### Django, Celery and SQLite

@@ -88,9 +88,9 @@ Redis is used as broker in the current Celery implementation and it will probabl

### Elasticsearch

Elasticsearch could also be installed on the same or different servers and its URL(s) can be configured through an environment variable read in the Django settings. The application expects Elasticsearch 6.x, which requires at least Java 8 in order to run. Only Oracle’s Java and the OpenJDK are supported and the same JVM version should be used on all Elasticsearch nodes and clients.
Elasticsearch could also be installed on the same or different servers and its URL(s) can be configured through an environment variable read in the Django settings. The application requires Elasticsearch 7.x, which includes a bundled version of [OpenJDK](http://openjdk.java.net/) from the JDK maintainers (GPLv2+CE). To use your own version of Java, see the [JVM version requirements](https://www.elastic.co/guide/en/elasticsearch/reference/7.x/setup.html#jvm-version).

The Elasticsearch node/cluster configuration can be fully customized; however, for the current implementation, a single node with the default JVM heap size of 1GB set by Elasticsearch would be more than enough. It could even be reduced to 512MB if more memory is needed for other parts of the application or to reduce its requirements. For more info on how to change the Elasticsearch configuration, check [their documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html), especially [the JVM heap size page](https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html).
The Elasticsearch node/cluster configuration can be fully customized; however, for the current implementation, a single node with the default JVM heap size of 1GB set by Elasticsearch would be more than enough. It could even be reduced to 512MB if more memory is needed for other parts of the application or to reduce its requirements. For more info on how to change the Elasticsearch configuration, check [their documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.x/settings.html), especially [the JVM heap size page](https://www.elastic.co/guide/en/elasticsearch/reference/7.x/heap-size.html).

The size of the Elasticsearch indexes will vary based on the application data and they will require some disk space, but it’s hard to tell how much at this point.
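The heap size is set in Elasticsearch's `jvm.options` file. A sketch, assuming the Debian package's default location; Elasticsearch's documentation recommends keeping the minimum and maximum heap equal:

```
# /etc/elasticsearch/jvm.options (relevant lines only)
-Xms512m
-Xmx512m
```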

@@ -115,8 +115,8 @@ The following steps are just an example of how to run the application in a produ
### Requirements

* Python 3.6 to 3.8
* Elasticsearch 6.x
* Redis
* Elasticsearch 7.x
* Redis (tested with 5.x)

### Environment

@@ -154,9 +154,9 @@ pip install virtualenv
Install Java 8 and Elasticsearch:

```
apt-get install apt-transport-https openjdk-8-jre
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
apt-get install apt-transport-https
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt-get update
apt-get install elasticsearch
systemctl daemon-reload
systemctl enable elasticsearch
```
Verify Elasticsearch is running:

```
curl -XGET http://localhost:9200
curl -X GET "localhost:9200/?pretty"
{
"name" : "ofgAtrJ",
"name" : "scope",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "3h9xSrVlRJmDHgQ8FLnByA",
"cluster_uuid" : "wcacahSSSAWHfx0WOrw4jw",
"version" : {
"number" : "6.3.0",
"build_hash" : "db0d481",
"build_date" : "2017-02-09T22:05:32.386Z",
"number" : "7.6.2",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
"build_date" : "2020-03-26T06:34:37.794943Z",
"build_snapshot" : false,
"lucene_version" : "6.4.1"
"lucene_version" : "8.4.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
```
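The version reported in that response can also be checked programmatically; a sketch that parses a pasted, abbreviated copy of the JSON above with the standard library, so it runs without a live cluster:

```python
import json

# Abbreviated copy of the response shown above.
body = """
{
  "name": "scope",
  "cluster_name": "elasticsearch",
  "version": {"number": "7.6.2", "lucene_version": "8.4.0"},
  "tagline": "You Know, for Search"
}
"""

info = json.loads(body)
major = int(info["version"]["number"].split(".")[0])
assert major == 7, "SCOPE now requires Elasticsearch 7.x"
```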
3 changes: 2 additions & 1 deletion docker-compose.yml
@@ -35,8 +35,9 @@ services:
command: 'watchmedo auto-restart -p *.py -i ./.tox/* -R -- celery -A scope worker -l info'

elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.6.2
environment:
- discovery.type=single-node
- bootstrap.memory_lock=true
- 'ES_JAVA_OPTS=-Xms512m -Xmx512m'
- cluster.routing.allocation.disk.threshold_enabled=false
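For a one-node Docker setup, Elasticsearch 7.x expects `discovery.type=single-node` to skip cluster formation and its bootstrap checks, which is why the commit adds that environment variable. Trimmed to the relevant keys (a sketch, not the repository's full service definition):

```yaml
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.6.2
  environment:
    - discovery.type=single-node          # skip multi-node bootstrap checks
    - 'ES_JAVA_OPTS=-Xms512m -Xmx512m'    # modest heap for development
  ports:
    - '9200:9200'
```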
2 changes: 1 addition & 1 deletion requirements/base.txt
@@ -7,7 +7,7 @@ django-modeltranslation==0.15
django-npm==1.0.0
djangorestframework==3.11.0
django-widget-tweaks==1.4.8
elasticsearch-dsl==6.4.0 # pyup: <7.0
elasticsearch-dsl==7.1.0 # pyup: <8.0
envparse==0.2.0
gevent==20.4.0
gunicorn==20.0.4
14 changes: 5 additions & 9 deletions scope/models.py
@@ -3,7 +3,7 @@
To connect Django models to elasticsearch-dsl documents declared in
search.documents, an AbstractEsModel has been created with the ABC and
Django model metas. The models extending AbstractEsModel must implement
an `es_doc` attribute with the related DocType class from search.documents
an `es_doc` attribute with the related Document class from search.documents
and a `get_es_data` method to transform to a dictionary representation of
the ES document.
"""
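The contract the docstring describes can be sketched with plain stand-ins; `FakeDocument` and the `Collection` fields below are hypothetical, used in place of elasticsearch-dsl and the real models so the sketch is self-contained:

```python
from abc import ABC, abstractmethod

class FakeDocument:
    """Hypothetical stand-in for an elasticsearch_dsl.Document subclass."""
    def __init__(self, meta=None, **fields):
        self.meta = meta or {}
        self.fields = fields

class AbstractEsModel(ABC):
    es_doc = FakeDocument  # concrete models point at their Document class

    @abstractmethod
    def get_es_data(self):
        """Return a dict representation including an `_id` key."""

    def to_es_doc(self):
        # Pop `_id` into the document metadata; the rest become fields.
        data = self.get_es_data()
        return self.es_doc(meta={"id": data.pop("_id")}, **data)

class Collection(AbstractEsModel):
    def __init__(self, pk, title):
        self.pk, self.title = pk, title

    def get_es_data(self):
        return {"_id": self.pk, "title": self.title}

doc = Collection(1, "Sample").to_es_doc()
```

After the call, `doc.meta` holds the id and `doc.fields` the remaining data, mirroring how `to_es_doc` feeds the Document constructor.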
@@ -73,7 +73,7 @@ class AbstractModelMeta(ABCMeta, type(models.Model)):


class AbstractEsModel(models.Model, metaclass=AbstractModelMeta):
"""Abstract base model for models related to ES DocTypes."""
"""Abstract base model for models related to ES Documents."""

class Meta:
abstract = True
@@ -112,7 +112,7 @@ def delete(self, *args, **kwargs):
@property
@abstractmethod
def es_doc(self):
"""Related ES DocType from search.documents."""
"""Related ES Document from search.documents."""

@abstractmethod
def get_es_data(self):
@@ -127,17 +127,13 @@ def requires_es_descendants_delete(self):
"""Checks if descendants need to be updated in ES."""

def to_es_doc(self):
"""Model transformation to related DocType."""
"""Model transformation to related ES Document."""
data = self.get_es_data()
return self.es_doc(meta={"id": data.pop("_id")}, **data)

def delete_es_doc(self):
"""Call to remove related document from the ES index."""
delete_document(
index=self.es_doc._index._name,
doc_type=self.es_doc._doc_type.name,
id=self.pk,
)
delete_document(index=self.es_doc._index._name, id=self.pk)


class DublinCore(models.Model):
2 changes: 1 addition & 1 deletion scope/tests/test_DIP.py
@@ -12,7 +12,7 @@


class DIPTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
User = get_user_model()
User.objects.create_user("temp", "[email protected]", "temp")
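The same one-line change repeats across the test suite: every `@patch("elasticsearch_dsl.DocType.save")` becomes `@patch("elasticsearch_dsl.Document.save")`. The patching pattern itself can be sketched with a stand-in class, so it runs without elasticsearch-dsl installed:

```python
from unittest.mock import patch

class Document:
    """Hypothetical stand-in for elasticsearch_dsl.Document."""

    def save(self):
        raise RuntimeError("would hit Elasticsearch")

# Within the patch, save() is absorbed by a mock and no cluster is needed.
with patch.object(Document, "save") as mock_es_save:
    Document().save()

mock_es_save.assert_called_once()
```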
4 changes: 2 additions & 2 deletions scope/tests/test_api_views.py
@@ -19,7 +19,7 @@ def setUp(self):
self.admin_token = Token.objects.create(user=admin)

@patch("scope.api_views.chain")
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def test_dip_stored_webhook_success(self, mock_es_save, mock_chain):
self.client.credentials(HTTP_AUTHORIZATION="Token %s" % self.admin_token.key)
origin = "http://192.168.1.128:62081"
@@ -73,7 +73,7 @@ def test_dip_stored_webhook_unknown_origin(self):
response.data["detail"], "SS host not configured for Origin: %s" % origin
)

@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def test_dip_stored_webhook_dip_already_exists(self, mock_es_save):
DIP.objects.create(ss_uuid=self.dip_uuid)
self.client.credentials(HTTP_AUTHORIZATION="Token %s" % self.admin_token.key)
2 changes: 1 addition & 1 deletion scope/tests/test_collection.py
@@ -11,7 +11,7 @@


class CollectionTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
User = get_user_model()
User.objects.create_user("temp", "[email protected]", "temp")
2 changes: 1 addition & 1 deletion scope/tests/test_dc_deletion.py
@@ -8,7 +8,7 @@


class DcDeletionTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
dc = DublinCore.objects.create(identifier="1")
self.collection = Collection.objects.create(dc=dc)
2 changes: 1 addition & 1 deletion scope/tests/test_delete_by_dc_form.py
@@ -10,7 +10,7 @@


class DcByDcFormTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
User = get_user_model()
User.objects.create_superuser("admin", "[email protected]", "admin")
4 changes: 2 additions & 2 deletions scope/tests/test_dip_download.py
@@ -31,7 +31,7 @@ def _sized_tmp_file(path, size):


class DipDownloadTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
User.objects.create_superuser("admin", "[email protected]", "admin")
self.client.login(username="admin", password="admin")
@@ -76,7 +76,7 @@ def test_local_dip_download_zip_headers(self, mock_is_zipfile):
)
self.assertEqual(response["X-Accel-Redirect"], "/media/fake.zip")

@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
@patch("scope.views.zipfile.is_zipfile", return_value=False)
def test_local_dip_download_tar_headers(self, mock_is_zipfile, mock_es_save):
self.local_dip.objectszip = "fake.tar"
16 changes: 4 additions & 12 deletions scope/tests/test_es_models_save_delete.py
@@ -12,7 +12,7 @@


class EsModelsSaveDeleteTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
dc = DublinCore.objects.create(identifier="1")
self.collection = Collection.objects.create(dc=dc)
@@ -64,9 +64,7 @@ def test_digital_file_delete(self, mock_es_delete, mock_send_task):
uuid = self.digital_file.uuid
self.digital_file.delete()
mock_es_delete.assert_called_with(
index=DigitalFile.es_doc._index._name,
doc_type=DigitalFile.es_doc._doc_type.name,
id=uuid,
index=DigitalFile.es_doc._index._name, id=uuid
)
mock_send_task.assert_not_called()

@@ -75,9 +73,7 @@ def test_digital_file_delete(self, mock_es_delete, mock_send_task):
def test_dip_delete(self, mock_es_delete, mock_send_task):
pk = self.dip.pk
self.dip.delete()
mock_es_delete.assert_called_with(
index=DIP.es_doc._index._name, doc_type=DIP.es_doc._doc_type.name, id=pk
)
mock_es_delete.assert_called_with(index=DIP.es_doc._index._name, id=pk)
mock_send_task.assert_called_with(
"search.tasks.delete_es_descendants", args=("DIP", 1)
)
@@ -87,11 +83,7 @@ def test_dip_delete(self, mock_es_delete, mock_send_task):
def test_collection_delete(self, mock_es_delete, mock_send_task):
pk = self.collection.pk
self.collection.delete()
mock_es_delete.assert_called_with(
index=Collection.es_doc._index._name,
doc_type=Collection.es_doc._doc_type.name,
id=pk,
)
mock_es_delete.assert_called_with(index=Collection.es_doc._index._name, id=pk)
mock_send_task.assert_called_with(
"search.tasks.delete_es_descendants", args=("Collection", 1)
)
2 changes: 1 addition & 1 deletion scope/tests/test_import_failure_display.py
@@ -9,7 +9,7 @@


class ImportFailureDisplayTests(TestCase):
@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def setUp(self, mock_es_save):
User = get_user_model()
User.objects.create_superuser("admin", "[email protected]", "admin")
6 changes: 3 additions & 3 deletions scope/tests/test_models_to_docs.py
@@ -26,7 +26,7 @@ def test_collection(self):
}
self.assertEqual(doc_dict, collection.get_es_data())

# Verify DocType creation, avoid already tested transformation
# Verify Document creation, avoid already tested transformation
with patch.object(Collection, "get_es_data", return_value=doc_dict):
doc = collection.to_es_doc()
self.assertEqual(collection.pk, doc.meta.id)
@@ -48,7 +48,7 @@ def test_dip(self):
}
self.assertEqual(doc_dict, dip.get_es_data())

# Verify DocType creation, avoid already tested transformation
# Verify Document creation, avoid already tested transformation
with patch.object(DIP, "get_es_data", return_value=doc_dict):
doc = dip.to_es_doc()
self.assertEqual(dip.pk, doc.meta.id)
@@ -76,7 +76,7 @@ def test_digital_file(self):
}
self.assertEqual(doc_dict, digital_file.get_es_data())

# Verify DocType creation, avoid already tested transformation
# Verify Document creation, avoid already tested transformation
with patch.object(DigitalFile, "get_es_data", return_value=doc_dict):
doc = digital_file.to_es_doc()
self.assertEqual(digital_file.uuid, doc.meta.id)
2 changes: 1 addition & 1 deletion scope/tests/test_new_collection.py
@@ -18,7 +18,7 @@ def test_csrf(self):
response = self.client.get(url)
self.assertContains(response, "csrfmiddlewaretoken")

@patch("elasticsearch_dsl.DocType.save")
@patch("elasticsearch_dsl.Document.save")
def test_new_topic_valid_post_data(self, mock_es_save):
# Make collection
url = reverse("new_collection")
