Commit 1fc0508

Keep polishing the QuickStart page (#335)
1 parent 48436e5 commit 1fc0508

16 files changed (+163 -339 lines changed)

docs/README.md

+1 -229
Large diffs are not rendered by default.

docs/_config.yml

+1
Original file line number | Diff line number | Diff line change
@@ -30,6 +30,7 @@ navbar-links:
3030
Demo :
3131
- TwitterMap: "http://cloudberry.ics.uci.edu/demos/twittermap/"
3232
Resources:
33+
- Quick Start: "quick-start"
3334
- Documentation: "documentation"
3435
- GitHub: "https://github.com/ISG-ICS/cloudberry"
3536
Pubs: "pubs"

docs/aboutme.md

+6 -1
@@ -12,12 +12,17 @@ subtitle:
1212
## Contributors
1313
* [Jianfeng Jia](https://github.com/JavierJia) (Ph.D. student)
1414
* [Taewoo Kim](https://github.com/waans11) (Ph.D. student)
15+
* [Chen Luo](https://luochen01.github.io) (Ph.D. student)
1516
* [Hao Chen](https://github.com/haochen07)
16-
* [Nishad Gurav](https://github.com/nishadg)
17+
* [Te-Yu Chen](https://github.com/DeyuChen)
1718
* [Vidhyasagar Thirumaraiselvan](https://github.com/vidhya567)
19+
* [Vignesh Sankar](https://github.com/vignesh-sankar)
1820
* [Shengjie Xu](https://github.com/HotLemonJuice)
21+
* [Liangju Chu](https://github.com/liangjuc)
22+
* [Sicong Liu](https://github.com/lsclovecode)
1923

2024
## Earlier Contributors
25+
* [Nishad Gurav](https://github.com/nishadg)
2126
* [Aishwarya Kapse](https://github.com/aishwaryakapse)
2227
* [Chen Li (The student version :-))](https://github.com/JeremyLi28)
2328
* [Kaiyi Ma](https://github.com/kaiyim)

docs/documentation.md

+30 -103
@@ -1,73 +1,8 @@
11
---
22
layout: page
3-
title: Quick Start
3+
title: Documentation
44
toc: true
55
---
6-
## Setup TwitterMap locally
7-
8-
This page includes instructions on how to set up a small instance of the
9-
[TwitterMap](http://cloudberry.ics.uci.edu/demos/twittermap/) on a local machine.
10-
11-
System requirements:
12-
13-
- Linux or Mac
14-
- At least 4GB memory
15-
16-
Step 1: Install `sbt` by following the instructions on this [`page`](http://www.scala-sbt.org/release/docs/Setup.html).
17-
18-
Step 2: Clone the codebase.
19-
20-
```
21-
shell> git clone https://github.com/ISG-ICS/cloudberry.git
22-
```
23-
24-
Suppose the repository is cloned to the folder `~/cloudberry`.
25-
26-
Step 3: Use the following steps to install an AsterixDB cluster on the local machine in order to run the Cloudberry middleware.
27-
28-
1. Install [Docker](https://www.docker.com/products/docker) (version at least 1.10) on the local machine;
29-
2. Run the following commands to create an AsterixDB cluster locally:
30-
31-
```
32-
~> cd cloudberry
33-
~/cloudberry> ./script/dockerRunAsterixDB.sh
34-
```
35-
This command will download and run a prebuilt AsterixDB docker image from [here](https://hub.docker.com/r/jianfeng/asterixdb/). This step may take 5-10 minutes or even longer, depending on your network speed.
36-
37-
Step 4: Run the following command to ingest sample tweets (about 324K) and US population data into AsterixDB.
38-
39-
```
40-
~/cloudberry> ./script/ingestAllTwitterToLocalCluster.sh
41-
```
42-
43-
This step downloads about 70MB of data and may take 5 minutes, again depending on your network speed. It has succeeded once you see the message "Data ingestion completed!" in the shell.
44-
45-
Step 5: Compile and run the Cloudberry server.
46-
47-
```
48-
~/cloudberry> sbt compile
49-
~/cloudberry> sbt "project neo" "run"
50-
```
51-
52-
Wait until the shell prints a message "Server started, use Ctrl+D to stop and go back to the console....".
53-
54-
Step 6: Start the TwitterMap frontend by running the following command in another shell:
55-
56-
```
57-
~/cloudberry> sbt "project twittermap" "run 9001"
58-
```
59-
60-
Step 7: Open a browser to access [http://localhost:9001](http://localhost:9001) to see the TwitterMap frontend. Notice that the first time you open the page, it could take up to several minutes (depending on your machine) to load the front-end data.
61-
62-
**Congratulations!** You have successfully set up TwitterMap using AsterixDB and Cloudberry!
63-
64-
65-
66-
* Run your own front-end server
67-
68-
TwitterMap is our homemade front-end that shows how to use the Cloudberry server. You can implement your own front-end service
69-
and let it talk to Cloudberry to achieve the same interactive user experience.
70-
716

727
## Concepts
738

@@ -157,8 +92,9 @@ The following JSON request can be used to register the Twitter dataset inside As
15792

15893
The front-end application can send the ddl JSON file to the Cloudberry `/admin/register` path using the HTTP `POST` method.
15994
E.g., we can register the previous ddl using the following command line:
95+
16096
```
161-
curl -XPOST -d @JSON_FILE_NAME http://localhost:9000/berry
97+
curl -XPOST -d @JSON_FILE_NAME http://localhost:9000/admin/register
16298
```
16399

164100
*Note*:
@@ -197,11 +133,18 @@ After defining the dataset, the front-end can `POST` a JSON request to `/berry`
197133
illustration purpose, clients can use the `curl` command to send the JSON file as following.
198134

199135
```
200-
curl -XPOST -d @JSON_FILE --header "Content-Type:application/json" http://localhost:9000/berry
136+
curl -XPOST -d @JSON_FILE http://localhost:9000/berry
201137
```
202138

203139
In a production system, the front-end application can send the request to the `/berry` path using JavaScript.
204-
We also provide the websocket connection at `ws://cloudberry_host_name/ws`. It will return the same result as HTTP POST requests.
140+
We also provide the websocket connection at `ws://cloudberry_host_name/ws`. The front-end can talk directly
141+
to the Cloudberry server in JavaScript as follows:
142+
143+
```
144+
var ws = new WebSocket("ws://localhost:9000/ws");
145+
```
146+
147+
It will return the same result as HTTP POST requests.
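
As a minimal sketch of that WebSocket flow, the snippet below sends one JSON request and parses each reply. The `sendRequest` helper and its `onResult` callback are illustrative names, not part of Cloudberry, and it assumes replies arrive as JSON text. The socket is passed in, so the same code works with a browser `WebSocket` or a stand-in object.

```javascript
// Hypothetical helper: send one Cloudberry request over an open WebSocket
// and hand every parsed reply to onResult.
function sendRequest(ws, request, onResult) {
  ws.onmessage = function (event) {
    // Assumption: each reply is JSON text, like an HTTP POST response body.
    onResult(JSON.parse(event.data));
  };
  ws.send(JSON.stringify(request));
}

// Usage in a browser (assumes a local Cloudberry server on port 9000 and a
// request object `myRequest` built as described in this documentation):
//   var ws = new WebSocket("ws://localhost:9000/ws");
//   ws.onopen = function () {
//     sendRequest(ws, myRequest, function (result) { console.log(result); });
//   };
```

Assigning `onmessage` replaces any earlier handler, so this sketch suits one request at a time per socket.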
205148

206149

207150
A request is composed of the following parameters:
@@ -263,7 +206,14 @@ A request is composed of the following parameters:
263206
}
264207
```
265208

266-
Using `curl` command, you should see the following responses:
209+
You can test the query by putting the above JSON record into a file and using `curl` command to send it to Cloudberry.
210+
211+
```
212+
curl -XPOST -d @JSON_FILE http://localhost:9000/berry
213+
```
214+
215+
You should see the following responses:
216+
267217
```
268218
[[
269219
{"state":6,"hour":"2016-04-09T10:00:00.000Z","count":1},
@@ -310,6 +260,7 @@ Using `curl` command, you should see the following responses:
310260
```
311261

312262
The expected results are as follows:
263+
313264
```
314265
[[
315266
{"tag":"Zika","count":6},
@@ -339,6 +290,7 @@ The expected results are as following:
339290
```
340291

341292
The expected results are as follows:
293+
342294
```
343295
[[
344296
{"create_at":"2016-10-04T10:00:17.000Z","id":783351045829357568},
@@ -363,6 +315,7 @@ Cloudberry supports automatic query-slicing on the `timeField`. The front-end ca
363315
```
364316

365317
For example, the following query asks for the top-10 hashtags with an option to accept updated results every 200ms.
318+
366319
```json
367320
{
368321
"dataset": "twitter.ds_tweet",
@@ -398,6 +351,7 @@ For example, the following query asks the top-10 hashtags with an option to acce
398351
```
399352

400353
Cloudberry will return a stream of results as follows:
354+
401355
```
402356
[[{"tag":"Zika","count":3},{"tag":"ColdWater","count":1},{"tag":"Croatia","count":1}, ... ]]
403357
[[{"tag":"Zika","count":4},{"tag":"Croatia","count":1},{"tag":"OperativoNU","count":1}, ... ]]
@@ -495,17 +449,16 @@ queries should be sliced synchronized.
495449
```
496450

497451
The response is as follows:
452+
498453
```
499454
[
500455
[ {"state":6,"hour":"2016-08-05T10:00:00.000Z","count":1}, {"state":12,"hour":"2016-07-26T10:00:00.000Z","count":1}, ...],
501456
[ {"tag":"trndnl","count":6},{"tag":"Zika","count":5},{"tag":"ColdWater","count":1}, ...]
502457
]
503-
504458
[
505459
[ {"state":72,"hour":"2016-05-06T10:00:00.000Z","count":1},{"state":48,"hour":"2016-09-09T10:00:00.000Z","count":2}, ...],
506460
[ {"tag":"trndnl","count":6},{"tag":"Zika","count":6},{"tag":"Croatia","count":1}, ...]
507461
]
508-
509462
...
510463
```
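
Judging from the sample above, each message in a batch response is an array holding one result list per query. The sketch below splits one such message. `splitBatch` and the names passed to it are hypothetical, and it assumes the result lists come back in the same order as the queries in the request.

```javascript
// Hypothetical helper: pair each result list in one batch message with a
// caller-chosen name, assuming results come back in request order.
function splitBatch(message, names) {
  var results = JSON.parse(message);
  var byName = {};
  names.forEach(function (name, i) {
    // A missing result list becomes an empty array.
    byName[name] = results[i] || [];
  });
  return byName;
}

var batchMessage =
  '[[{"state":6,"hour":"2016-08-05T10:00:00.000Z","count":1}],' +
  '[{"tag":"Zika","count":5}]]';
var batchParts = splitBatch(batchMessage, ["byStateHour", "topTags"]);
console.log(batchParts.topTags[0].tag); // prints "Zika"
```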
511464

@@ -558,38 +511,12 @@ The response is as below:
558511

559512
The `wrap` transformation is often preferable when the front-end sends many different requests over the same WebSocket interface.
560513

561-
### Advanced users
562-
563-
#### Multi-node AsterixDB cluster
514+
### Deregister Dataset
564515

565-
Some applications may require a multi-node AsterixDB cluster.
566-
You can follow the official [documentation](https://ci.apache.org/projects/asterixdb/install.html) to set it up.
567-
568-
After the cluster is set up, you should make the following changes
569-
570-
* Change the AsterixDB NC name for feed connection
571-
572-
In the script `./script/ingestTwitterToLocalCluster.sh`, line 86:
516+
You may also deregister a dataset from Cloudberry. To do so, send a JSON record like the following to the `/admin/deregister` path.
573517

574518
```
575-
("sockets"="my_asterix_nc1:10001")
576-
```
577-
578-
where *my_asterix* is the name of your cluster instance, and *nc1* is the name of one NC node.
579-
580-
581-
* Modify the AsterixDB hostname
582-
583-
In the configuration file `neo/conf/application.conf`, change the `asterixdb.url` value to the previously set AsterixDB CC RESTful API address.
584-
585-
```
586-
asterixdb.url = "http://YourAsterixDBHostName:19002/query/service"
587-
```
588-
589-
#### Deregister Tweets and US Population Data Models
590-
591-
You may also deregister these data models from the Cloudberry to experiment with other data models. But be careful when doing this as the TwitterMap won't work without these data models.
592-
593-
```
594-
./script/deregisterTwitterMapDataModel.sh
519+
{
520+
"dataset": "twitter.dsCountyPopulation"
521+
}
595522
```
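
A front-end could send the deregister record from JavaScript as well. The sketch below is one way to do it under stated assumptions: `deregisterDataset` is a hypothetical helper, the base URL assumes a local Cloudberry server on port 9000, and the source does not say whether the server expects a particular `Content-Type` header, so none is set.

```javascript
// Hypothetical helper: POST a deregister request for one dataset.
// fetchImpl defaults to the global fetch (browsers, Node 18+).
function deregisterDataset(baseUrl, dataset, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  return doFetch(baseUrl + "/admin/deregister", {
    method: "POST",
    body: JSON.stringify({ dataset: dataset })
  });
}

// Usage (assumes a local Cloudberry server on port 9000):
//   deregisterDataset("http://localhost:9000", "twitter.dsCountyPopulation");
```

Passing the fetch function as a parameter lets you exercise the helper with a stub before pointing it at a real server.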

docs/img/docker.png

269 KB

docs/img/ingestion.png

105 KB

docs/img/neo.png

138 KB

docs/img/quick-start-architecture.png

76.9 KB

docs/img/twittermap.png

171 KB

docs/img/web.png

1.02 MB

docs/quick-start.md

+102 -1
@@ -3,4 +3,105 @@ layout: page
33
title: Quick Start
44
---
55

6-
## TODO
6+
## Setup TwitterMap locally
7+
8+
This page includes instructions on how to set up a small instance of the
9+
[TwitterMap](http://cloudberry.ics.uci.edu/demos/twittermap/) on a local machine.
10+
The relation between TwitterMap and Cloudberry is shown in the following figure:
11+
![architecture][architecture]
12+
13+
System requirements:
14+
15+
- Linux or Mac
16+
- At least 4GB memory
17+
18+
**Step 1**: Install `sbt` by following the instructions on this [`page`](http://www.scala-sbt.org/release/docs/Setup.html).
19+
20+
**Step 2**: Clone the codebase.
21+
22+
```
23+
shell> git clone https://github.com/ISG-ICS/cloudberry.git
24+
```
25+
26+
Suppose the repository is cloned to the folder `~/cloudberry`.
27+
28+
**Step 3**: Use the following steps to install an AsterixDB cluster on the local machine in order to run the Cloudberry middleware.
29+
30+
1. Install [Docker](https://www.docker.com/products/docker) (version at least 1.10) on the local machine;
31+
2. Run the following commands to create an AsterixDB cluster locally:
32+
33+
```
34+
~> cd cloudberry
35+
~/cloudberry> ./script/dockerRunAsterixDB.sh
36+
```
37+
This command will download and run a prebuilt AsterixDB docker image from [here](https://hub.docker.com/r/jianfeng/asterixdb/). This step may take 5-10 minutes or even longer, depending on your network speed.
38+
After it finishes, you should see the messages as shown in the following screenshot:
39+
![docker][docker]
40+
41+
**Step 4**: Run the following command to ingest sample tweets (about 324K) and US population data into AsterixDB.
42+
43+
44+
```
45+
~/cloudberry> ./script/ingestAllTwitterToLocalCluster.sh
46+
```
47+
48+
This step downloads about 70MB of data and may take 5 minutes, again depending on your network speed. It has succeeded once you see the message "Data ingestion completed!" in the shell.
49+
After it finishes, you should see the messages as shown in the following screenshot:
50+
![ingestion][ingestion]
51+
52+
**Step 5**: Compile and run the Cloudberry server.
53+
54+
```
55+
~/cloudberry> sbt compile
56+
~/cloudberry> sbt "project neo" "run"
57+
```
58+
59+
Wait until the shell prints the messages as shown in the following screenshot:
60+
![neo][neo]
61+
62+
**Step 6**: Start the TwitterMap frontend by running the following command in another shell:
63+
64+
```
65+
~/cloudberry> sbt "project twittermap" "run 9001"
66+
```
67+
68+
Wait until the shell prints the messages as shown in the following screenshot:
69+
![twittermap][twittermap]
70+
71+
72+
**Step 7**: Open a browser to access [http://localhost:9001](http://localhost:9001) to see the TwitterMap frontend. Notice that the first time you open the page, it could take up to several minutes (depending on your machine) to show the following webpage:
73+
![web][web]
74+
75+
76+
**Congratulations!** You have successfully set up TwitterMap using AsterixDB and Cloudberry!
77+
78+
79+
### Connect to your own AsterixDB cluster
80+
81+
To connect to an existing AsterixDB cluster, edit the configuration file `neo/conf/application.conf`
82+
and set the `asterixdb.url` value to the query service URL of your AsterixDB host.
83+
84+
```
85+
asterixdb.url = "http://YourAsterixDBHostName:19002/query/service"
86+
```
87+
88+
### Run your own front-end server
89+
90+
TwitterMap is our homemade front-end that shows how to use the Cloudberry server. You can implement your own front-end service
91+
and let it talk to Cloudberry to achieve the same interactive user experience.
92+
93+
### Customize front-end requests
94+
See the [documentation](/documentation) for how to write Cloudberry requests for your own front-end applications.
95+
96+
[architecture]: /img/quick-start-architecture.png
97+
{: width="800px"}
98+
[docker]: /img/docker.png
99+
{: width="800px"}
100+
[ingestion]: /img/ingestion.png
101+
{: width="800px"}
102+
[neo]: /img/neo.png
103+
{: width="800px"}
104+
[twittermap]: /img/twittermap.png
105+
{: width="800px"}
106+
[web]: /img/web.png
107+
{: width="800px"}

noah/src/main/scala/edu/uci/ics/cloudberry/noah/feed/FeedSocketAdapterClient.java

+3 -1
@@ -43,7 +43,9 @@ public void finalize() {
4343

4444
public void ingest(String record) throws IOException{
4545
recordCount++;
46-
System.err.println("send record: " + recordCount);
46+
if (recordCount % 5000 == 0) {
47+
System.err.println("send record: " + recordCount);
48+
}
4749
byte[] b = record.replaceAll("\\s+", " ").getBytes();
4850
try {
4951
out.write(b);

script/dockerClean.sh

+9
@@ -0,0 +1,9 @@
1+
#!/usr/bin/env bash
2+
# clean up the existing containers
3+
docker stop cc nc1
4+
docker rm -f cc nc1
5+
docker volume rm dbstore
6+
# remove the local image to fetch the newest remote version
7+
docker rmi jianfeng/asterixdb
8+
9+

script/dockerRunAsterixDB.sh

+4 -2
@@ -21,9 +21,11 @@
2121
set -o nounset # Treat unset variables as an error
2222

2323
ncs=${1:-1} # the number of NCs in local cluster, default is 1 ncs
24-
NC_JVM_MEM=1024 # the JVM -Xmx2048m memory budget for each NC; the unit is megabytes
25-
24+
NC_JVM_MEM=2048 # the JVM -Xmx2048m memory budget for each NC; the unit is megabytes
2625
docName=dbstore
26+
27+
./script/dockerClean.sh
28+
2729
docker volume create --driver local --name $docName
2830

2931
echo "build the cc"
