
Commit d4c65c1

demo: update Apache Beam Analysis
Signed-off-by: Xiang Dai <[email protected]>
1 parent cc38a1a

7 files changed: +79 −80 lines changed

# Data Analytics with Apache Beam

## Description

![High level architecture](images/High_level_Arch.png "High Level Architecture")

The main aim of the analytics engine is to get data from an MQTT broker as a stream, apply rules to the incoming data in real time, and publish alerts/actions back to the MQTT broker. Reading data through the pipeline and applying the analysis functions is done using Apache Beam.

### Apache Beam

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, we can build a program that defines the pipeline.

#### Why use Apache Beam for analytics

There are many frameworks, such as Hadoop, Spark, Flink, and Google Cloud Dataflow, for stream processing, but there was no unified API that binds all such frameworks and data sources together; the application logic needed to be abstracted out of these Big Data frameworks. Apache Beam provides this abstraction between your application logic and the big data ecosystem:

- A generic dataflow-based model for building an abstract pipeline which can run on any runtime such as Flink, Samza, etc.
- The same pipeline code can be executed in the cloud (e.g. Huawei Cloud Stream, based on Apache Flink) and on the edge with a custom backend which can efficiently schedule workloads in an edge cluster and perform distributed analytics.
- Apache Beam integrates well with TensorFlow for machine learning, which is a key use case at the edge.
- Beam supports most of the functions required for stream processing and analytics; a minimal pipeline sketch follows this list.
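
To make the pipeline model concrete, below is a minimal sketch of a Beam pipeline in Go that mirrors the filter-and-alert flow of the demos. It is not the demo's actual source: the input values, the 30.0 threshold, and the alert format are illustrative stand-ins for data that would arrive from an MQTT topic.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	// In the demo these values arrive from an MQTT topic; here they are
	// hard-coded so the sketch is self-contained.
	readings := beam.Create(s, 25.0, 28.5, 31.2, 30.1)

	// Keep only readings above an (illustrative) threshold of 30.
	alerts := beam.ParDo(s, func(v float64, emit func(string)) {
		if v > 30.0 {
			emit(fmt.Sprintf("ALERT: reading %.1f exceeds threshold", v))
		}
	}, readings)

	// The demo publishes alerts back to an MQTT topic; here we just print.
	beam.ParDo0(s, func(a string) { fmt.Println(a) }, alerts)

	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf("pipeline failed: %v", err)
	}
}
```

Run with `go run`, this executes on Beam's local direct runner; handing the same code to Flink or another runner is what gives the portability described above.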

#### Demo 1.1 [Real-time alert]: Read batch data from MQTT, filter, and generate alerts

- Basic MQTT read/write support in Apache Beam for batch data
- Read data from an MQTT topic
- Create a PCollection from the read data and use it as the initial data for the pipeline
- Filter the data
- Publish an alert to a topic if a reading exceeds the threshold value

![Demo1.1](images/Demo1.1.png "Demo1.1: Read batch data from MQTT, filter and generate alerts")

#### Demo 1.2 [Filter Streaming Data]: Read streaming data from MQTT, filter at regular intervals

- Read streaming data using MQTT
- Filter the data at fixed time intervals (see the windowing sketch below the figure)

![Demo1.2](images/Demo1.2.png "Demo1.2: Read streaming data from MQTT, filter at regular intervals")
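
Demo 1.2's fixed-interval behavior corresponds to Beam's fixed windows. Below is a minimal, illustrative sketch (again, not the demo's source) that assigns elements to 10-second windows before filtering; the window width and values are assumptions:

```go
package main

import (
	"context"
	"flag"
	"log"
	"time"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	// Stand-in for the MQTT stream; a real source would be unbounded.
	readings := beam.Create(s, 25.0, 28.5, 31.2, 30.1)

	// Assign elements to fixed 10-second windows so that downstream
	// filtering/aggregation happens once per interval rather than over
	// the whole stream.
	windowed := beam.WindowInto(s, window.NewFixedWindows(10*time.Second), readings)

	// Per-window filter: log only readings above the threshold.
	beam.ParDo0(s, func(v float64) {
		if v > 30.0 {
			log.Printf("filtered reading: %.1f", v)
		}
	}, windowed)

	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf("pipeline failed: %v", err)
	}
}
```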

### Prerequisites

- Golang (version 1.14+)
- KubeEdge (version v1.5+)
- Docker (version 18.09-ce+)

### Deploy pipeline application

For demo 1.1, pull the Docker image from Docker Hub with the following command:

```console
$ sudo docker pull containerise/ke_apache_beam:ke_apache_analysis_v1.1
```

For demo 1.2, pull the Docker image from Docker Hub with the following command:

```console
$ sudo docker pull containerise/ke_apache_beam:ke_apache_analysis_v1.2
```

Run the following command:

```console
$ docker images
```

This lists all local images; check for the image named ke_apache_analysis_v1.1 or ke_apache_analysis_v1.2.

Follow the steps from [here](https://github.com/kubeedge/kubeedge/blob/master/README.md) to set up the prerequisite environment.

Try out the application deployment with the following steps:
```console
$ kubectl apply -f examples/apache-beam-analysis/deployment.yaml
```

Then use the command below to check whether the application is running normally:

```console
$ kubectl get pods
```

To check the result, publish dummy data using [testmachine](publisher/testmachine.go); a minimal publisher sketch follows the build steps below.

Add the following vendor packages:
```console
$ go get -u github.com/yosssi/gmq/mqtt
$ go get -u github.com/yosssi/gmq/mqtt/client
```

Then build and run it:

```console
$ go build testmachine.go
$ ./testmachine
```
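
For reference, here is a minimal publisher along the lines of testmachine, using the gmq packages pulled in above. The broker address, client ID, topic, and payload are illustrative assumptions; the real topic and data format are defined in publisher/testmachine.go.

```go
package main

import (
	"log"

	"github.com/yosssi/gmq/mqtt"
	"github.com/yosssi/gmq/mqtt/client"
)

func main() {
	// Create an MQTT client; the handler fires on asynchronous errors.
	cli := client.New(&client.Options{
		ErrorHandler: func(err error) { log.Println(err) },
	})
	defer cli.Terminate()

	// Connect to the broker; localhost:1883 assumes a broker running
	// on the local (edge) node.
	if err := cli.Connect(&client.ConnectOptions{
		Network:  "tcp",
		Address:  "localhost:1883",
		ClientID: []byte("testmachine"),
	}); err != nil {
		log.Fatal(err)
	}

	// Publish one dummy reading; topic and payload are illustrative.
	if err := cli.Publish(&client.PublishOptions{
		QoS:       mqtt.QoS0,
		TopicName: []byte("sensors/temperature"),
		Message:   []byte("31.2"),
	}); err != nil {
		log.Fatal(err)
	}

	if err := cli.Disconnect(); err != nil {
		log.Fatal(err)
	}
}
```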
