Elasticsearch Adapter

This page explains how to use Elasticsearch with Cloudberry to set up a small instance of TwitterMap on a local machine.

Requirements:

- System: Linux or macOS
- Python 3.0+ (the scripts must be runnable with the command `python3`)
- Java 8 SDK and sbt
- At least 2 GB of memory

1. Set up Elasticsearch

Step 1.1: Create a directory named `quick-start` under your home directory and enter it:

mkdir ~/quick-start
cd ~/quick-start

Step 1.2: Download Elasticsearch

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.tar.gz

Step 1.3: Extract the archive

tar -xzf elasticsearch-6.7.2.tar.gz

Step 1.4: Enter the `elasticsearch-6.7.2/` directory

cd elasticsearch-6.7.2/

Step 1.5: Run Elasticsearch

./bin/elasticsearch

Or start it in daemon mode, recording the process ID in a file named `pid`:

./bin/elasticsearch -d -p pid

- To shut down Elasticsearch in daemon mode, kill the process whose ID is stored in the `pid` file:

  pkill -F pid

Wait until you see the following messages:

[INFO ][o.e.n.Node               ] [7Z9-8gl] initialized
[INFO ][o.e.n.Node               ] [7Z9-8gl] starting ...
[INFO ][o.e.t.TransportService   ] [7Z9-8gl] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[INFO ][o.e.c.s.MasterService    ] [7Z9-8gl] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[INFO ][o.e.c.s.ClusterApplierService] [7Z9-8gl] new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[INFO ][o.e.h.n.Netty4HttpServerTransport] [7Z9-8gl] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node               ] [7Z9-8gl] started
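
When Elasticsearch runs in daemon mode, these messages go to a log file rather than the terminal, so it can help to check for the final `started` line programmatically. The following is a minimal sketch, not part of Cloudberry: the function name `node_started` is hypothetical, and the log location mentioned in the comment is an assumption that you should verify against your installation.

```python
import re

# Matches the last startup message shown above, for example:
# "[INFO ][o.e.n.Node               ] [7Z9-8gl] started"
STARTED_LINE = re.compile(r"\[o\.e\.n\.Node\s*\]\s+\[[^\]]+\]\s+started\s*$")


def node_started(log_text: str) -> bool:
    """Return True if any line of the log output is the Node "started" message."""
    return any(STARTED_LINE.search(line) for line in log_text.splitlines())


# Usage (assumption: the daemon writes its log to logs/elasticsearch.log):
#   with open("logs/elasticsearch.log") as log_file:
#       print(node_started(log_file.read()))
```

The pattern anchors on the word `started` at the end of the line, so the earlier `starting ...` message does not count as a match.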

Step 1.6: Check the health status of your Elasticsearch cluster

Open a new terminal window and run:

curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s&pretty"

The cluster health status must be green or yellow. A red status means that at least one primary shard is not allocated in the cluster.
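
The same check can be scripted with the Python standard library. This is a minimal sketch under stated assumptions: the function names `fetch_cluster_health` and `is_cluster_usable` are illustrative, and the base URL assumes the default local address used elsewhere on this page.

```python
import json
import urllib.request

# Statuses that this page treats as acceptable.
USABLE_STATUSES = {"green", "yellow"}


def is_cluster_usable(health: dict) -> bool:
    """Return True when a parsed _cluster/health response reports green or yellow."""
    return health.get("status") in USABLE_STATUSES


def fetch_cluster_health(base_url: str = "http://localhost:9200") -> dict:
    """Query the same endpoint as the curl command and parse the JSON body."""
    url = base_url + "/_cluster/health?wait_for_status=yellow&timeout=50s"
    # urlopen raises an exception if the server is unreachable or
    # answers with an error status.
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with Elasticsearch running locally:
#   health = fetch_cluster_health()
#   print(health["status"], is_cluster_usable(health))
```

Keeping the status check separate from the HTTP call lets you test the decision logic without a running cluster.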

2. Install Cloudberry & TwitterMap

Clone the Cloudberry GitHub repository:

cd ~/quick-start

git clone https://github.com/ISG-ICS/cloudberry.git

3. Download and ingest sample tweets into Elasticsearch

Step 3.1: Download the sample tweets data file (about 100K tweets)

cd ~/quick-start/cloudberry/examples/twittermap/script/

wget http://cloudberry.ics.uci.edu/img/sample.json.zip

Note: This file is `sample.json.zip`, which is different from the `sample.adm.gz` file used in the Quick Start tutorial.

Step 3.2: Ingest the sample tweets into the Elasticsearch cluster

cd ~/quick-start/cloudberry/examples/twittermap/

./script/ingestTweetToElasticCluster.sh

When the script completes, you should see something similar to the following messages:

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   twitter.ds_tweet Gz7FQaxlQ5SNnwTUwpbkvA   4   0      99943            0     92.7mb         92.7mb

Finish ingesting data!
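
To confirm the ingestion from a script, you can parse the index table shown above and read its `docs.count` column. The sketch below assumes the output is the whitespace-separated table produced by Elasticsearch's `_cat/indices?v` endpoint, as the sample suggests. The helper names `parse_cat_indices` and `doc_count` are hypothetical.

```python
def parse_cat_indices(table: str) -> list:
    """Parse a whitespace-separated index table (header row first) into dicts."""
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        # Skip lines that are not table rows, such as "Finish ingesting data!".
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def doc_count(table: str, index_name: str = "twitter.ds_tweet"):
    """Return docs.count for the named index as an int, or None if it is absent."""
    for row in parse_cat_indices(table):
        if row.get("index") == index_name:
            return int(row["docs.count"])
    return None
```

With the sample output above, `doc_count` returns 99943 for the `twitter.ds_tweet` index. A count close to 100K suggests the ingestion completed.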

4. Configure Cloudberry & TwitterMap

Edit the file `cloudberry/cloudberry/neo/conf/application.conf` (relative to `~/quick-start`).

Step 4.1: Comment out lines 89 and 96, which are the AsterixDB settings

line 89: asterixdb.url = "http://localhost:19002/query/service"
line 96: asterixdb.lang = SQLPP

Step 4.2: Uncomment lines 93 and 101, which are the Elasticsearch settings

line 93: #elasticsearch.url = "http://localhost:9200"
line 101: #asterixdb.lang = elasticsearch

Step 4.3: Update lines 86 and 87 to tune the `DRUM` parameters so that they are friendlier to Elasticsearch

line 86: berry.firstquery.gap = "60 days"
line 87: berry.query.gap = "180 days"
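
Line numbers can shift between versions of `application.conf`, so it may be safer to match the settings by content. The following sketch applies Steps 4.1 to 4.3 to the text of the file. It is an illustration rather than a Cloudberry tool: the function name `switch_to_elasticsearch` is hypothetical, and the code assumes the settings appear exactly as quoted in the steps above, one per line.

```python
# Settings to comment out (AsterixDB) and to uncomment (Elasticsearch),
# copied from Steps 4.1 and 4.2.
TO_COMMENT = (
    'asterixdb.url = "http://localhost:19002/query/service"',
    "asterixdb.lang = SQLPP",
)
TO_UNCOMMENT = (
    'elasticsearch.url = "http://localhost:9200"',
    "asterixdb.lang = elasticsearch",
)
# DRUM parameters and their new values from Step 4.3.
NEW_VALUES = {"berry.firstquery.gap": '"60 days"', "berry.query.gap": '"180 days"'}


def switch_to_elasticsearch(conf_text: str) -> str:
    """Return the configuration text with the Step 4.1 to 4.3 edits applied."""
    out = []
    for line in conf_text.splitlines():
        indent = line[: len(line) - len(line.lstrip())]
        stripped = line.strip()
        uncommented = stripped.lstrip("#").strip()
        key = uncommented.split("=")[0].strip()
        if stripped in TO_COMMENT:
            line = indent + "#" + stripped  # Step 4.1
        elif stripped.startswith("#") and uncommented in TO_UNCOMMENT:
            line = indent + uncommented  # Step 4.2
        elif not stripped.startswith("#") and key in NEW_VALUES:
            line = indent + key + " = " + NEW_VALUES[key]  # Step 4.3
        out.append(line)
    return "\n".join(out) + "\n"


# Usage (back up the file first):
#   path = "cloudberry/cloudberry/neo/conf/application.conf"
#   with open(path) as conf_file:
#       text = conf_file.read()
#   with open(path, "w") as conf_file:
#       conf_file.write(switch_to_elasticsearch(text))
```

Applying the function a second time leaves the text unchanged, so rerunning it is harmless. Review the result by hand before starting Cloudberry.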

5. Now you can compile and run applications as normal!
