QIUSHI BAI edited this page Aug 13, 2019 · 30 revisions

Software Architecture

Setup Twittermap

This page explains how to use Cloudberry and AsterixDB to set up a small instance of TwitterMap on a local machine. The following diagram illustrates its architecture:

System requirements:

  • Linux or Mac
  • At least 4GB memory
  • (if using Virtual Machine) At least 2 CPUs
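The requirements above can be checked from a terminal before starting. The following is a minimal sketch, not part of the original guide: the function names are hypothetical, and it assumes /proc/meminfo on Linux and sysctl hw.memsize on Mac. Note that the reported total is usually slightly below the installed RAM, so a machine with exactly 4GB may fail this strict comparison.

```shell
# 4GB expressed in kB (the unit used by /proc/meminfo).
REQUIRED_MEM_KB=$((4 * 1024 * 1024))

# Print total physical memory in kB (Linux first, then the Mac fallback).
total_mem_kb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
  else
    # hw.memsize is reported in bytes on Mac.
    echo $(( $(sysctl -n hw.memsize) / 1024 ))
  fi
}

# Print the number of online CPUs.
cpu_count() {
  getconf _NPROCESSORS_ONLN
}

# Succeed if the given amount of memory (kB) is at least 4GB.
has_enough_memory() {
  [ "$1" -ge "$REQUIRED_MEM_KB" ]
}

# Succeed if the given CPU count is at least 2 (relevant for virtual machines).
has_enough_cpus() {
  [ "$1" -ge 2 ]
}
```

For example, has_enough_memory "$(total_mem_kb)" && has_enough_cpus "$(cpu_count)" && echo "requirements met" prints the message only when both checks pass.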

0. Install Java and sbt

Follow these instructions to install Java and sbt.

1. Setup AsterixDB

Step 1.1: Move to your home directory:

$ cd ~

Step 1.2: Download asterix-server-0.9.5-SNAPSHOT-binary-assembly.zip:

$ wget http://cloudberry.ics.uci.edu/img/asterix-server-0.9.5-SNAPSHOT-binary-assembly.zip

Step 1.3: Uncompress the file:

$ unzip asterix-server-0.9.5-SNAPSHOT-binary-assembly.zip

Step 1.4: Move to the apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin directory:

$ cd apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/

Step 1.5: Execute start-sample-cluster.sh to start the sample instance. Wait until you see the message “INFO: Cluster started and is ACTIVE.”

$ ./start-sample-cluster.sh 
CLUSTERDIR=/home/x/apache-asterixdb-0.9.5-SNAPSHOT/opt/local 
INSTALLDIR=/home/x/apache-asterixdb-0.9.5-SNAPSHOT/ 
LOGSDIR=/home/x/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/logs

Using Java version: 1.8.0_XX
INFO: Starting sample cluster...
Using Java version: 1.8.0_XX
INFO: Waiting up to 30 seconds for cluster 127.0.0.1:19002 to be available.
INFO: Cluster started and is ACTIVE.

Step 1.6: Execute jps to check that one instance of “CCDriver” and two instances each of “NCService” and “NCDriver” are running:

$ jps 
59264 NCService
59280 NCDriver
59265 CCDriver
59446 Jps
59263 NCService
59279 NCDriver
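The process count in Step 1.6 can also be verified automatically. This is a small sketch under the assumption that jps prints one "PID Name" pair per line, as in the output above; check_asterixdb_processes is a hypothetical helper name, not something shipped with AsterixDB.

```shell
# Read jps output on stdin and succeed only if exactly one CCDriver,
# two NCService, and two NCDriver processes are listed.
check_asterixdb_processes() {
  awk '
    $2 == "CCDriver"  { cc++ }
    $2 == "NCService" { ncs++ }
    $2 == "NCDriver"  { ncd++ }
    END { exit !(cc + 0 == 1 && ncs + 0 == 2 && ncd + 0 == 2) }
  '
}
```

Typical use would be jps | check_asterixdb_processes && echo "AsterixDB processes look fine".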

Step 1.7: Open the AsterixDB Web interface at http://localhost:19001 and issue the following query to verify that the AsterixDB instance is running.

Query:

select * from Metadata.`Dataverse`;

Expected result:

{ "Dataverse": { "DataverseName": "Default", "DataFormat": "org.apache.asterix.runtime.formats.NonTaggedDataFormat", "Timestamp": "Wed Mar 07 16:13:37 PST 2018", "PendingOp":0}} 
{ "Dataverse": { "DataverseName": "Metadata", "DataFormat": "org.apache.asterix.runtime.formats.NonTaggedDataFormat", "Timestamp": "Wed Mar 07 16:13:37 PST 2018", "PendingOp":0}}
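To compare a result against the expected one without reading the full records, the dataverse names can be pulled out of the text. The sketch below assumes the result contains fields formatted like "DataverseName": "Default", as shown above; dataverse_names is a hypothetical helper.

```shell
# Read query results on stdin and print one dataverse name per line.
dataverse_names() {
  grep -o '"DataverseName": *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/'
}
```

On a fresh instance the output should be the two names Default and Metadata. As an assumption not covered by this guide, the same query can probably be sent from a terminal through the AsterixDB HTTP query service, for example curl -s --data-urlencode 'statement=select * from Metadata.`Dataverse`;' http://localhost:19002/query/service | dataverse_names.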

Note: When you want to stop AsterixDB, use the following command:

$ cd ~/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin
$ ./stop-sample-cluster.sh

The next time you want to start or stop your AsterixDB instance, use the following commands:

$ ~/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/start-sample-cluster.sh
$ ~/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/stop-sample-cluster.sh
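If you start and stop the instance often, the two scripts can be wrapped in one shell function, for instance in your shell profile. This is only a convenience sketch: the asterixdb function and the ASTERIX_BIN variable are hypothetical names, and the default path assumes the installation location used in this guide.

```shell
# Directory holding the sample cluster scripts; override ASTERIX_BIN if
# AsterixDB was unpacked somewhere other than the home directory.
ASTERIX_BIN=${ASTERIX_BIN:-"$HOME/apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin"}

# Start or stop the sample cluster: asterixdb start | asterixdb stop
asterixdb() {
  case "${1:-}" in
    start) "$ASTERIX_BIN/start-sample-cluster.sh" ;;
    stop)  "$ASTERIX_BIN/stop-sample-cluster.sh" ;;
    *)     echo "usage: asterixdb start|stop" >&2; return 2 ;;
  esac
}
```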

2. Setup Cloudberry and TwitterMap:

Step 2.1: Clone the Cloudberry GitHub repository.

~> git clone https://github.com/ISG-ICS/cloudberry.git

Suppose the repository is cloned to the folder ~/cloudberry.

Step 2.2: Compile and run the Cloudberry server.

~/cloudberry> cd ~/cloudberry/cloudberry
~/cloudberry> sbt compile
~/cloudberry> sbt "project neo" "run"

Note: if you see errors like the following:

[ERROR] Failed to construct terminal; falling back to unsupported
java.lang.NumberFormatException: For input string: "0x100"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.valueOf(Integer.java:766)
	... ...

This is caused by an incompatibility in some versions of sbt. To fix it, do the following:

Add export TERM=xterm-color to the top of /usr/share/sbt/bin/sbt.

The errors above should now be gone, and you can continue with this guide. If this doesn’t resolve them, please refer to this discussion for other solutions.
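The edit to the sbt launcher can be scripted. The sketch below is one possible way to do it, assuming the first line of the launcher is its interpreter line, so the export is placed directly after it; add_term_export is a hypothetical helper, and editing /usr/share/sbt/bin/sbt will usually require root privileges.

```shell
# Insert "export TERM=xterm-color" after the first line of the given script,
# unless the line is already present. The file is rewritten in place so its
# permissions are kept.
add_term_export() {
  script=$1
  grep -q '^export TERM=xterm-color$' "$script" && return 0
  tmp=$(mktemp) || return 1
  awk 'NR == 1 { print; print "export TERM=xterm-color"; next } { print }' \
    "$script" > "$tmp" && cat "$tmp" > "$script"
  status=$?
  rm -f "$tmp"
  return $status
}
```

It would be called as add_term_export /usr/share/sbt/bin/sbt. Running it twice is harmless because the second call finds the line and does nothing.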

Wait until the shell prints messages like the following:

$ sbt "project neo" "run"
[info] Loading global plugins from /Users/white/.sbt/0.13/plugins
[info] Loading project definition from /Users/white/cloudberry/cloudberry/project
[info] Set current project to cloudberry (in build file:/Users/white/cloudberry/cloudberry/)
[info] Set current project to neo (in build file:/Users/white/cloudberry/cloudberry/)

--- (Running the application, auto-reloading is enabled) ---

[info] p.c.s.NettyServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000

(Server started, use Ctrl+D to stop and go back to the console...)
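If the sbt output is saved to a file, the readiness line can be detected by a script instead of by eye. A minimal sketch, assuming the "Listening for HTTP on" line has the form shown above; listening_port is a hypothetical helper and neo.log in the usage note is a hypothetical file name.

```shell
# Read sbt output on stdin and print the port from the first
# "Listening for HTTP on ..." line; print nothing if no such line exists.
listening_port() {
  sed -n 's/.*Listening for HTTP on .*:\([0-9][0-9]*\)$/\1/p' | head -n 1
}
```

For example, listening_port < neo.log prints the port once the Cloudberry server is up, and prints nothing while it is still starting.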

Step 2.3: Download the synthetic sample tweet data (about 1M) and ingest it into AsterixDB.

Open a new terminal window.

(1) Download the synthetic sample tweet data (about 1M):

~/cloudberry> cd ../examples/twittermap/script/
~/script> wget http://cloudberry.ics.uci.edu/img/sample.adm.gz -O sample.adm.gz

(2) Ingest the data into AsterixDB.

~/script> cd ..
~/twittermap> ./script/ingestAllTwitterToLocalCluster.sh

When it finishes, you should see messages like the following:

Socket 127.0.0.1:10005 - # of ingested records: 260000
Socket 127.0.0.1:10005 - # of total ingested records: 268497
>>> # of ingested records: 268497 Elapsed (s) : 2 (m) : 0 record/sec : 134248.5
>>> An ingestion process is done.
[success] Total time: 3 s, completed Nov 19, 2018 8:44:51 PM
Ingested city population dataset.
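To confirm how many records were ingested, the total can be extracted from the script output. This sketch assumes the "# of total ingested records" line format shown above; total_ingested is a hypothetical helper and ingest.log in the usage note is a hypothetical file name. If the output contains several such lines, the last one is used.

```shell
# Read the ingestion output on stdin and print the last reported total.
total_ingested() {
  sed -n 's/.*# of total ingested records: \([0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

One way to use it is ./script/ingestAllTwitterToLocalCluster.sh | tee ingest.log followed by total_ingested < ingest.log.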

Step 2.4: Start the TwitterMap Web server (on port 9001) by running the following command in another shell:

~/twittermap> sbt "project web" "run 9001"

Wait until the shell prints messages like the following:

$ sbt "project web" "run 9001"
[info] Loading global plugins from /Users/white/.sbt/0.13/plugins
...
--- (Running the application, auto-reloading is enabled) ---

[info] p.c.s.NettyServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9001

(Server started, use Ctrl+D to stop and go back to the console...)

Step 2.5: Open a browser to access http://localhost:9001 to see the TwitterMap frontend. The first time you open the page, it could take up to several minutes (depending on your machine’s speed) to show the following Web page:

(Note: Firefox users have to go to about:config and change privacy.trackingprotection.enabled to false)

[Screenshot: twittermap-screen]

Congratulations! You have successfully set up TwitterMap using Cloudberry and AsterixDB!