Continue cleanup (#339)
JavierJia authored May 6, 2017
1 parent 081b66d commit 337e574
Showing 9 changed files with 35 additions and 23 deletions.
Empty file removed .gitattributes
Empty file.
17 changes: 12 additions & 5 deletions docs/quick-start.md
@@ -36,14 +36,14 @@ The second command will download and run a prebuilt AsterixDB docker container f
After it finishes, you should see the messages as shown in the following screenshot:
![docker][docker]

**Step 4**: Run the following command to ingest sample tweets (about 324K) and US population data into AsterixDB.
**Step 4**: Run the following command to ingest sample tweets (about 47K) and US population data into AsterixDB.


```
~/cloudberry> ./script/ingestAllTwitterToLocalCluster.sh
```

This step is downloading about 70MB of data, and it may take 5 minutes, again, depending on your network speed. You should see the messages as shown in the following screenshot:
When it finishes, you should see the messages as shown in the following screenshot:
![ingestion][ingestion]
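
As an optional sanity check (not part of this commit), you can count the ingested records through the same AQL endpoint the scripts use. This assumes the default local endpoint and the `twitter.ds_tweet` dataset created by the ingestion script:

```
~/cloudberry> curl -XPOST --data-binary 'use dataverse twitter; count(for $t in dataset ds_tweet return $t);' http://localhost:19002/aql
```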

**Step 5**: Compile and run the Cloudberry server.
@@ -80,17 +80,24 @@ The instructions above assume that we use an AsterixDB instance in a Docker cont

**Step 8**: Follow the instructions on the [AsterixDB Installation Guide](https://ci.apache.org/projects/asterixdb/index.html) to install an AsterixDB cluster. Select your preferred installation option.

**Step 9**: Ingest twitter data.
**Step 9**: Ingest Twitter data into AsterixDB.

**Step 10**: Change the Cloudberry middleware configuration to connect to this new AsterixDB cluster. You can modify the AsterixDB hostname in the configuration file `neo/conf/application.conf` and change the `asterixdb.url` value to the AsterixDB hostname.
You need to give the RESTful API link of the AsterixDB cluster and one of its NC names to the ingestion script, as follows:

```
~/cloudberry> ./script/ingestAllTwitterToLocalCluster.sh http://YourAsterixDBServerIP:19002/aql ONE_OF_NC_NAMES
```
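
For example, with a hypothetical cluster whose REST endpoint is `http://10.1.1.5:19002/aql` and whose NC is named `nc1` (the default name used by the local scripts), the call would look like:

```
~/cloudberry> ./script/ingestAllTwitterToLocalCluster.sh http://10.1.1.5:19002/aql nc1
```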

**Step 10**: Change the Cloudberry middleware configuration to connect to this new AsterixDB cluster.
You can modify the AsterixDB hostname in the configuration file `neo/conf/application.conf` by changing the `asterixdb.url` value.

```
asterixdb.url = "http://YourAsterixDBHostName:19002/query/service"
```
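
To verify that this URL is reachable before starting the middleware, one option (an assumption, not part of this commit) is to post a trivial SQL++ statement to the query service:

```
~/cloudberry> curl --data-urlencode "statement=SELECT VALUE 1;" http://YourAsterixDBHostName:19002/query/service
```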

## Build your own application

For more information about Cloudberry, please read its [documentation](/documentation).
TwitterMap is one example of how to use Cloudberry. To develop your own application, please see the [documentation](/documentation) for more information.

[architecture]: /img/quick-start-architecture.png
{: width="800px"}
Empty file removed script/.gitattributes
Empty file.
2 changes: 1 addition & 1 deletion script/dockerClean.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash
#clean up the existing images
docker stop -f cc nc1
docker stop cc nc1
docker rm -f cc nc1
docker volume rm dbstore
# remove the local image to fetch the newest remote version
6 changes: 5 additions & 1 deletion script/fileFeed.sh
@@ -1,3 +1,7 @@
#!/usr/bin/env bash
link=${1-"localhost"}
host=$(basename $(dirname $link))
host=${host%%:*}
port=${2-"10001"}
sbt "project noah" --error "run-main edu.uci.ics.cloudberry.noah.feed.FileFeedDriver \
-u localhost -p ${1:-10001}"
-u $host -p $port"
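
The new argument handling derives the NC hostname from the AQL URL passed as the first argument. A minimal sketch of the same parameter expansions (illustration only, not part of the commit), assuming the URL form used in the quick-start guide:

```
link="http://YourAsterixDBServerIP:19002/aql"
host=$(basename $(dirname $link))   # -> YourAsterixDBServerIP:19002
host=${host%%:*}                    # -> YourAsterixDBServerIP
```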
5 changes: 3 additions & 2 deletions script/ingestAllTwitterToLocalCluster.sh
@@ -18,10 +18,11 @@
#===============================================================================

host=${1:-'http://localhost:19002/aql'}
nc=${2:-"nc1"}
echo "Ingesting sample tweets..."
./script/ingestTwitterToLocalCluster.sh $host
./script/ingestTwitterToLocalCluster.sh $host $nc

echo "Ingesting population data..."
./script/ingestPopulationToLocalCluster.sh $host
./script/ingestPopulationToLocalCluster.sh $host $nc

echo "Data ingestion completed!"
15 changes: 8 additions & 7 deletions script/ingestPopulationToLocalCluster.sh
@@ -20,8 +20,9 @@
set -o nounset # Treat unset variables as an error

host=${1:-'http://localhost:19002/aql'}
nc=${2:-"nc1"}
# ddl to register the twitter dataset
cat <<'EOF' | curl -XPOST --data-binary @- $host
cat <<EOF | curl -XPOST --data-binary @- $host
use dataverse twitter;
create type typeStatePopulation if not exists as open{
name:string,
@@ -52,7 +53,7 @@ create dataset dsCityPopulation(typeCityPopulation) if not exists primary key ci
create feed StatePopulationFeed using socket_adapter
(
("sockets"="nc1:10002"),
("sockets"="$nc:10002"),
("address-type"="nc"),
("type-name"="typeStatePopulation"),
("format"="adm")
@@ -62,7 +63,7 @@ start feed StatePopulationFeed;
create feed CountyPopulationFeed using socket_adapter
(
("sockets"="nc1:10003"),
("sockets"="$nc:10003"),
("address-type"="nc"),
("type-name"="typeCountyPopulation"),
("format"="adm")
@@ -72,7 +73,7 @@ start feed CountyPopulationFeed;
create feed CityPopulationFeed using socket_adapter
(
("sockets"="nc1:10004"),
("sockets"="$nc:10004"),
("address-type"="nc"),
("type-name"="typeCityPopulation"),
("format"="adm")
@@ -83,13 +84,13 @@ EOF

echo 'Created population datasets in AsterixDB.'
#Serve socket feed using local file
cat ./noah/src/main/resources/population/adm/allStatePopulation.adm | ./script/fileFeed.sh 10002
cat ./noah/src/main/resources/population/adm/allStatePopulation.adm | ./script/fileFeed.sh $host 10002
echo 'Ingested state population dataset.'

cat ./noah/src/main/resources/population/adm/allCountyPopulation.adm | ./script/fileFeed.sh 10003
cat ./noah/src/main/resources/population/adm/allCountyPopulation.adm | ./script/fileFeed.sh $host 10003
echo 'Ingested county population dataset.'

cat ./noah/src/main/resources/population/adm/allCityPopulation.adm | ./script/fileFeed.sh 10004
cat ./noah/src/main/resources/population/adm/allCityPopulation.adm | ./script/fileFeed.sh $host 10004
echo 'Ingested city population dataset.'

cat <<'EOF' | curl -XPOST --data-binary @- $host
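
Note the switch from a quoted to an unquoted heredoc delimiter (`<<'EOF'` vs. `<<EOF`): the unquoted form lets the shell expand `$nc` inside the DDL before it is posted to AsterixDB, while the last heredoc shown above keeps the quoted form, presumably because it contains no shell variables. A small sketch of the difference (illustration only), assuming `nc=nc1`:

```
nc=nc1
cat <<'EOF'   # quoted delimiter: no expansion, prints the literal $nc
("sockets"="$nc:10002")
EOF
cat <<EOF     # unquoted delimiter: $nc is expanded to nc1
("sockets"="$nc:10002")
EOF
```
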
13 changes: 6 additions & 7 deletions script/ingestTwitterToLocalCluster.sh
@@ -20,8 +20,9 @@
set -o nounset # Treat unset variables as an error

# ddl to register the twitter dataset
host=${1:-'http://localhost:19002/aql'}
cat <<'EOF' | curl -XPOST --data-binary @- $host
host=${1:-"http://localhost:19002/aql"}
nc=${2:-"nc1"}
cat <<EOF | curl -XPOST --data-binary @- $host
drop dataverse twitter if exists;
create dataverse twitter if not exists;
use dataverse twitter
@@ -84,7 +85,7 @@ create index text_idx if not exists on ds_tweet("text") type fulltext;
create feed TweetFeed using socket_adapter
(
("sockets"="nc1:10001"),
("sockets"="$nc:10001"),
("address-type"="nc"),
("type-name"="typeTweet"),
("format"="adm")
@@ -94,11 +95,9 @@ start feed TweetFeed;
EOF


[ -f ./script/sample.adm.gz ] || { echo "Downloading the data..."; ./script/getSampleTweetsFromGDrive.sh; }
#Serve socket feed using local file
#git lfs fetch
#[ -f ./script/sample.adm.gz ] || { echo "Downloading the data..."; ./script/getSampleTweetsFromGDrive.sh; }

echo "Start ingestion ..."
gunzip -c ./script/sample.adm.gz | ./script/fileFeed.sh
gunzip -c ./script/sample.adm.gz | ./script/fileFeed.sh $host 10001
echo "Ingested sample tweets."

Binary file added script/sample.adm.gz
Binary file not shown.
