Elasticsearch Adapter
This page explains how to use Elasticsearch and Cloudberry to set up a small instance of TwitterMap on a local machine.
- System: Linux or macOS
- Python 3.0+ (configure your system so that Python scripts run with the command python3)
- Java 8 SDK and sbt
- At least 2GB memory
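The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the Cloudberry repository: the tool list is an assumption drawn from the prerequisites and from the commands used later on this page (wget, git, curl), and it does not check the memory requirement or the Java version.

```python
import shutil
import sys

# Assumed tool list: the prerequisites plus the commands used later on this page.
REQUIRED_TOOLS = ["python3", "java", "sbt", "wget", "git", "curl"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_recent_enough(version_info=sys.version_info):
    """True when the given interpreter version is Python 3.0 or newer."""
    return version_info[0] >= 3


if __name__ == "__main__":
    print("Python 3 interpreter:", python_is_recent_enough())
    print("Missing tools:", missing_tools(REQUIRED_TOOLS) or "none")
```

An empty list of missing tools only shows that the commands exist on PATH; it says nothing about their versions.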
Step 1.1: Create a directory named quick-start under your home directory and enter it:
mkdir ~/quick-start
cd ~/quick-start
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.tar.gz
tar -xzf elasticsearch-6.7.2.tar.gz
cd elasticsearch-6.7.2/
- Start Elasticsearch in the foreground:
./bin/elasticsearch
- Or start it in daemon mode:
./bin/elasticsearch -d -p pid
- To shut down Elasticsearch in daemon mode, kill the process whose ID is stored in the pid file:
pkill -F pid
- Wait until you see the following messages:
[INFO ][o.e.n.Node ] [7Z9-8gl] initialized
[INFO ][o.e.n.Node ] [7Z9-8gl] starting ...
[INFO ][o.e.t.TransportService ] [7Z9-8gl] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[INFO ][o.e.c.s.MasterService ] [7Z9-8gl] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[INFO ][o.e.c.s.ClusterApplierService] [7Z9-8gl] new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[INFO ][o.e.h.n.Netty4HttpServerTransport] [7Z9-8gl] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node ] [7Z9-8gl] started
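When Elasticsearch runs in daemon mode, the messages above do not appear in the terminal. One alternative is to poll the HTTP port until the node answers. The sketch below assumes the default address http://localhost:9200 shown in the log; the function name and the timeout values are illustrative, not part of Elasticsearch or Cloudberry.

```python
import time
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200", timeout_s=60.0, interval_s=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Returns True if the node answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers urllib.error.URLError (a subclass of OSError), e.g.
            # "connection refused" while the node is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_elasticsearch()` right after `./bin/elasticsearch -d -p pid` blocks until the node responds or one minute has passed.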
Open a new terminal window and check the cluster health:
curl -X GET "localhost:9200/_cluster/health?pretty"
The cluster health status must be green or yellow. If your cluster's status is red, at least one primary shard is not allocated in the cluster.
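The same check can be scripted. This is a minimal sketch that assumes the default address http://localhost:9200 and reads only the `status` field of the `_cluster/health` response; the function names are illustrative.

```python
import json
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Return the parsed JSON body of GET /_cluster/health."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def cluster_is_usable(health):
    """True when the reported status is green or yellow."""
    return health.get("status") in ("green", "yellow")
```

With Elasticsearch running, `cluster_is_usable(fetch_cluster_health())` should evaluate to True before you continue with the next step.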
- Clone the Cloudberry GitHub repository:
cd ~/quick-start
git clone https://github.com/ISG-ICS/cloudberry.git
cd ~/quick-start/cloudberry/examples/twittermap/script/
wget http://cloudberry.ics.uci.edu/img/sample.json.zip
Note: This file is sample.json.zip, which is different from the sample.adm.gz file used in the Quick Start tutorial.
cd ~/quick-start/cloudberry/examples/twittermap/
./script/ingestTweetToElasticCluster.sh
When the script completes, you should see something similar to the following messages:
[info] Showing high-level information about indices in Elasticsearch cluster AFTER ingesting data...
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open twitter.ds_tweet fQiZx9wBQNKkqRB9fMw9Xw 4 0 73348 0 58mb 58mb
[success] Finish ingesting tweets
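To confirm the ingestion programmatically, the index table printed by the script can be parsed and the document count compared with the expected value. The sketch below is an illustration only: it assumes every column holds a value, as in the output above, so a row with blank columns would be misaligned.

```python
def parse_cat_indices(table_text):
    """Parse whitespace-separated index-table output into a list of dicts."""
    lines = [line for line in table_text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def doc_count(rows, index_name):
    """Return docs.count for `index_name`, or None if the index is absent."""
    for row in rows:
        if row.get("index") == index_name:
            return int(row["docs.count"])
    return None


# The two table lines from the sample output above.
SAMPLE = """\
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open twitter.ds_tweet fQiZx9wBQNKkqRB9fMw9Xw 4 0 73348 0 58mb 58mb
"""

rows = parse_cat_indices(SAMPLE)
print(doc_count(rows, "twitter.ds_tweet"))
```

A count of None means the twitter.ds_tweet index was not created, which suggests the ingestion script did not finish.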
Edit file: ~/quick-start/cloudberry/cloudberry/neo/conf/application.conf
- line 89: asterixdb.url = "http://localhost:19002/query/service"
- line 96: asterixdb.lang = SQLPP
- line 93: #elasticsearch.url = "http://localhost:9200"
- line 101: #asterixdb.lang = elasticsearch
- line 86: berry.firstquery.gap = "60 days"
- line 87: berry.query.gap = "180 days"
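Line numbers in application.conf can shift between versions of the repository, so it may help to locate these settings by key rather than by position. The sketch below assumes each setting sits on a single line in the dotted `key = value` form quoted above; `find_settings` and the sample excerpt are hypothetical and not part of Cloudberry.

```python
def find_settings(conf_text, keys):
    """Return (line_number, line) pairs for lines that set one of `keys`.

    Commented-out lines (leading '#') are included. Line numbers are
    1-based, matching what a text editor shows.
    """
    matches = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        setting = line.strip().lstrip("#").strip()
        name = setting.split("=", 1)[0].strip()
        if "=" in setting and name in keys:
            matches.append((number, line.strip()))
    return matches


KEYS = {
    "asterixdb.url",
    "asterixdb.lang",
    "elasticsearch.url",
    "berry.firstquery.gap",
    "berry.query.gap",
}

# Hypothetical excerpt for illustration; the real file is much longer.
SAMPLE_CONF = '''\
berry.firstquery.gap = "60 days"
berry.query.gap = "180 days"
asterixdb.url = "http://localhost:19002/query/service"
#elasticsearch.url = "http://localhost:9200"
asterixdb.lang = SQLPP
#asterixdb.lang = elasticsearch
unrelated.setting = 1
'''

for number, line in find_settings(SAMPLE_CONF, KEYS):
    print(number, line)
```

To inspect the real file, pass in its contents, for example `pathlib.Path("~/quick-start/cloudberry/cloudberry/neo/conf/application.conf").expanduser().read_text()`.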
Edit file: ~/quick-start/cloudberry/examples/twittermap/web/conf/application.conf
- line 94: startDate = "2019-01-04T18:29:23.000"
- line 96: endDate = "2019-11-10T09:00:23.000"
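A typo in either timestamp is easy to miss, so the two values can be sanity-checked before restarting the web application. This is a minimal sketch: the timestamp format is inferred from the values shown above, and `parse_range` is a hypothetical helper rather than Cloudberry code.

```python
from datetime import datetime

# Format inferred from the values above (milliseconds, no time zone suffix).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_range(start_date, end_date):
    """Parse both timestamps and return (start, end); raise if out of order."""
    start = datetime.strptime(start_date, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_date, TIMESTAMP_FORMAT)
    if start >= end:
        raise ValueError("startDate must be earlier than endDate")
    return start, end


start, end = parse_range("2019-01-04T18:29:23.000", "2019-11-10T09:00:23.000")
print("Range spans", (end - start).days, "full days")
```

A malformed timestamp raises ValueError from `datetime.strptime`, and a reversed range raises the error defined in the helper.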