Elasticsearch Adapter

Dayue Bai edited this page Nov 11, 2019 · 21 revisions

This page explains how to use Elasticsearch and Cloudberry to set up a small instance of TwitterMap on a local machine.

Requirements:

  • System: Linux or MacOS

  • Python 3.0+ (configure your system so that Python scripts run with the command python3)

  • Java 8 SDK and sbt

  • At least 2GB memory

1. Set up Elasticsearch

Step 1.1: Create a directory named quick-start under your home directory and enter it:

mkdir ~/quick-start
cd ~/quick-start

Step 1.2: Download Elasticsearch

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.tar.gz

Step 1.3: Extract the archive

tar -xzf elasticsearch-6.7.2.tar.gz

Step 1.4: Enter the elasticsearch-6.7.2/ directory

cd elasticsearch-6.7.2/

Step 1.5: Run Elasticsearch

  • ./bin/elasticsearch

  • Or start in daemon mode: ./bin/elasticsearch -d -p pid

    • To shut down Elasticsearch in daemon mode, kill the process whose ID is recorded in the pid file:

      pkill -F pid

  • Wait until you see messages similar to the following:

[INFO ][o.e.n.Node               ] [7Z9-8gl] initialized
[INFO ][o.e.n.Node               ] [7Z9-8gl] starting ...
[INFO ][o.e.t.TransportService   ] [7Z9-8gl] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[INFO ][o.e.c.s.MasterService    ] [7Z9-8gl] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[INFO ][o.e.c.s.ClusterApplierService] [7Z9-8gl] new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[INFO ][o.e.h.n.Netty4HttpServerTransport] [7Z9-8gl] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node               ] [7Z9-8gl] started
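In daemon mode these messages go to a log file instead of the terminal. The following is a minimal sketch for waiting on the final "started" line. It assumes the default log location logs/elasticsearch.log under the Elasticsearch directory, which changes if the cluster name or path.logs is customized. The helper names es_started and wait_for_es are hypothetical and not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: not shipped with Elasticsearch.

# Succeed if the given log file contains the final
# "[o.e.n.Node ...] [node-name] started" line.
es_started() {
  grep -Eq '\[o\.e\.n\.Node[[:space:]]*\][[:space:]]*\[[^]]+\][[:space:]]+started[[:space:]]*$' "$1"
}

# Poll the log once per second until the node reports "started".
# Gives up and returns 1 after the timeout in seconds (default 60).
wait_for_es() {
  local log_file=$1 timeout=${2:-60} waited=0
  until es_started "$log_file" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

After defining the functions, a call such as wait_for_es logs/elasticsearch.log 120 (run from the elasticsearch-6.7.2/ directory) blocks until the node is up or two minutes have passed.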

Step 1.6: Check the health status of your Elasticsearch cluster

Open a new terminal window and run:

  • curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s&pretty"

    The cluster health status must be green or yellow. A red status indicates that at least one primary shard is not allocated in the cluster.
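When scripting this check, the status field can be pulled out of the health response and tested directly. The sketch below is one way to do it with sed, so no JSON tool is required. The function names health_status and health_ok are hypothetical, and the parsing assumes the response contains a single top-level "status" field, as the cluster health API returns.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: not part of Elasticsearch.

# Read a cluster-health JSON response on stdin and print its "status" value.
health_status() {
  sed -nE 's/.*"status"[[:space:]]*:[[:space:]]*"([a-z]+)".*/\1/p' | head -n 1
}

# Succeed only when the status read from stdin is green or yellow.
health_ok() {
  case "$(health_status)" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, curl -s "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s" | health_ok && echo "cluster is ready" prints the message only when the cluster is usable.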

2. Install Cloudberry & TwitterMap

  • Clone the Cloudberry GitHub repository:
cd ~/quick-start

git clone https://github.com/ISG-ICS/cloudberry.git

3. Download and ingest sample tweets into Elasticsearch

Step 3.1: Download the sample tweets data file (about 100K tweets)

cd ~/quick-start/cloudberry/examples/twittermap/script/

wget http://cloudberry.ics.uci.edu/img/sample.json.zip

Note: This file is sample.json.zip, which is different from the sample.adm.gz file used in the Quick Start tutorial.

Step 3.2: Ingest the sample tweets into the Elasticsearch cluster

cd ~/quick-start/cloudberry/examples/twittermap/

./script/ingestTweetToElasticCluster.sh

When the script completes, you should see something similar to the following messages:

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   twitter.ds_tweet Gz7FQaxlQ5SNnwTUwpbkvA   4   0      99943            0     92.7mb         92.7mb

Finish ingesting data!
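The table above has the same layout as the output of Elasticsearch's _cat/indices API, so the document count can be re-checked at any time after ingestion. A small sketch follows, assuming that column layout (docs.count as the seventh column of an open index). The function name index_doc_count is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not part of the Cloudberry scripts.

# Read `_cat/indices?v` output on stdin and print the docs.count
# column for the index named in the first argument.
index_doc_count() {
  awk -v idx="$1" '$3 == idx { print $7 }'
}
```

For example, curl -s "localhost:9200/_cat/indices?v" | index_doc_count twitter.ds_tweet should print a number close to the 99943 shown above.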

4. Configure Cloudberry & TwitterMap

Edit file: cloudberry/cloudberry/neo/conf/application.conf

Step 4.1: Comment out lines 89 and 96, which are the AsterixDB configurations

  • line 89: asterixdb.url = "http://localhost:19002/query/service"

  • line 96: asterixdb.lang = SQLPP

Step 4.2: Uncomment lines 93 and 101, which are the Elasticsearch configurations

  • line 93: #elasticsearch.url = "http://localhost:9200"

  • line 101: #asterixdb.lang = elasticsearch

Step 4.3: Update lines 86 and 87 to tune the DRUM parameters to be friendlier to Elasticsearch

  • line 86: berry.firstquery.gap = "60 days"

  • line 87: berry.query.gap = "180 days"
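Line numbers in application.conf can shift between revisions of the repository, so the same three steps can also be expressed by matching on the setting names. The sketch below is an illustration rather than part of Cloudberry. It assumes each setting sits on its own line in the form shown above, and the function name use_elasticsearch_conf is hypothetical. It prints the edited configuration to stdout and leaves the input file untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not part of the Cloudberry repository.

# Print an Elasticsearch-ready copy of the given application.conf.
use_elasticsearch_conf() {
  # Expressions 1-2: comment out the AsterixDB settings (Step 4.1).
  # Expressions 3-4: uncomment the Elasticsearch settings (Step 4.2).
  # Expressions 5-6: set the DRUM gap parameters (Step 4.3).
  sed -E \
    -e 's/^([[:space:]]*)(asterixdb\.url[[:space:]]*=)/\1#\2/' \
    -e 's/^([[:space:]]*)(asterixdb\.lang[[:space:]]*=[[:space:]]*SQLPP)/\1#\2/' \
    -e 's/^([[:space:]]*)#[[:space:]]*(elasticsearch\.url[[:space:]]*=)/\1\2/' \
    -e 's/^([[:space:]]*)#[[:space:]]*(asterixdb\.lang[[:space:]]*=[[:space:]]*elasticsearch)/\1\2/' \
    -e 's/^([[:space:]]*)(berry\.firstquery\.gap[[:space:]]*=).*/\1\2 "60 days"/' \
    -e 's/^([[:space:]]*)(berry\.query\.gap[[:space:]]*=).*/\1\2 "180 days"/' \
    "$1"
}
```

A cautious way to use it is to write the result to a separate file, for example use_elasticsearch_conf cloudberry/cloudberry/neo/conf/application.conf > application.conf.new, compare it with the original using diff, and only then replace the original.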

5. Now you can compile and run the applications as usual!