Amber Prototype based on Orleans

Introduction

Long-running analytic tasks on big data frameworks often provide little or no feedback about the status of the execution. Some big data processing frameworks provide status updates for running jobs, but these systems only allow users to monitor their jobs passively. Even if users notice anomalies during the execution, they can only kill the job or wait for it to run to completion.

Amber is a distributed data processing engine built on top of an existing actor model implementation. It has the unique capability of supporting responsive debugging during the execution of a dataflow. Users can pause/resume the execution, investigate the state of operators, change the behavior of an operator, and set conditional breakpoints. Amber provides these features along with support for fault tolerance. In case of a failure, it not only ensures the correctness of the final computation result, but also recovers the same consistent debugging state.

Paper: Amber: A Debuggable Dataflow System Based on the Actor Model (VLDB 2020)

Contributors: Shengquan Ni, Avinash Kumar, Zuozhi Wang, Chen Li.

Affiliation: University of California, Irvine.

Install Frontend

Install Node JS

For Windows / Mac

Download and install the latest LTS version of NodeJS (Version 12)

For Linux

sudo apt-get install curl software-properties-common
curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
sudo apt-get install nodejs
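
Before building the frontend, it can help to confirm that the installed Node.js matches the version this guide targets. The following is a minimal sketch; the helper names `node_major` and `is_node_12` are illustrative and not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) for checking the Node.js version.

# Extract the major version from a string such as "v12.22.9".
node_major() {
  local version="${1#v}"   # drop the leading "v"
  echo "${version%%.*}"    # keep everything before the first dot
}

# Succeed only if the given version string has major version 12.
is_node_12() {
  [ "$(node_major "$1")" = "12" ]
}

# Usage (requires node on PATH):
#   is_node_12 "$(node --version)" && echo "Node.js 12 detected"
```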

Build Frontend

Clone this repo then do the following:

cd AmberOnOrleans/Frontend
npm install
npm run build

Running npm install takes a long time, usually 5 to 10 minutes. You can ignore the vulnerability warnings at the end.

Install Amber

Install dotnet-sdk 3.0.
Install MySQL and log in as admin. Use the following command to create a user with username "orleansbackend" and password "orleans-0519-2019" (this can be changed in Constants.cs):

CREATE USER 'orleansbackend'@'%' IDENTIFIED BY 'orleans-0519-2019';

Create a MySQL database called 'amberorleans' and grant all privileges on it to the new user with the following commands:

CREATE DATABASE amberorleans;
GRANT ALL PRIVILEGES ON amberorleans.* TO 'orleansbackend'@'%';
FLUSH PRIVILEGES;
USE amberorleans;

Run the scripts MySQL-Main.sql and MySQL-Clustering.sql to create the necessary tables and insert entries in the database.
We have generated some sample datasets for you to benchmark Amber. Here are 2 datasets you can use:
- tiny TPC-H dataset (MBs)
- TPC-H sample dataset (1 GB)
Download one dataset from the links above to your local machine.
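
One way to run the two SQL scripts mentioned above is through the mysql command-line client. This is only a sketch: the function name `load_schema`, the `MYSQL` override variable, and the assumption that both scripts sit in the directory you pass in are illustrative, so adjust the path to wherever the scripts live in your checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load both SQL scripts into the amberorleans database.
# $1 is the directory that contains the scripts (an assumption; adjust as needed).
# MYSQL may be set to override the client binary, which is useful for a dry run.
load_schema() {
  local dir="$1" script
  for script in MySQL-Main.sql MySQL-Clustering.sql; do
    if [ ! -f "$dir/$script" ]; then
      echo "missing: $dir/$script" >&2
      return 1
    fi
    # -p with no value makes the client prompt for the password.
    "${MYSQL:-mysql}" -u orleansbackend -p amberorleans < "$dir/$script" || return 1
  done
}

# Usage:
#   load_schema /path/to/sql/scripts
```

The scripts are loaded in the order the text lists them, and the function stops at the first script that is missing or fails.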

Run Amber on your local machine:

1. Start MySQL Server on the local machine.

2. Start Silo:

A silo is a container of actors in Orleans where all the computation takes place. The silo must be started first so that Amber knows where to allocate actors.

Open a terminal and enter:

cd AmberOnOrleans/SiloHost
dotnet run -c Release

You can ignore the warnings. Establishing the connection takes some time.

Make sure you see "Silo Started!" before proceeding to step 3.
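
If you start the silo in the background and redirect its output to a file, the wait for "Silo Started!" can be scripted. The sketch below assumes a log file of your choosing (silo.log here is a made-up name) and a hypothetical helper `wait_for_silo`; it polls the file once per second until the marker appears or a timeout expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until "Silo Started!" shows up in a log file.
# $1 = log file path, $2 = timeout in seconds (defaults to 120).
wait_for_silo() {
  local log="$1" timeout="${2:-120}" waited=0
  until grep -q "Silo Started!" "$log" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # marker never appeared within the timeout
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage (silo.log is an assumed file name):
#   cd AmberOnOrleans/SiloHost
#   dotnet run -c Release > silo.log 2>&1 &
#   wait_for_silo silo.log 300 && echo "Silo is up, continue with step 3"
```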

3. Start Console Application:

Open another terminal and enter:

cd AmberOnOrleans/ConsoleApp
dotnet run

It will prompt you to choose a sample workflow and enter the path of the dataset on your local machine.

After entering all the parameters, the workflow will automatically run and the results will be displayed.

4. Create workflow through Web GUI (Optional):

To try the web-based frontend of Amber, follow this step-by-step guide for creating and running a sample workflow using one of the datasets above.

Open another terminal and enter:

cd AmberOnOrleans/WebApp
dotnet run

Go to http://localhost:7070 to see the web GUI for Amber:

Drag the Source -> Scan operator from the left panel and drop it on the canvas:

Then drag and drop Utilities -> Comparison, LocalGroupBy, GlobalGroupBy, and Sort -> Sort, in that order. Each one is automatically linked to the previous operator. Your workflow should look like this:

You can specify properties for each operator in the right panel. Each operator should have the following properties:

Scan:

Comparison:

LocalGroupBy:

GlobalGroupBy:

Sort:

Click the "Run" button in the upper-right corner to run the workflow. After completion, the following result will pop up from the bottom:

Run Amber on a cluster:

1. Clone this repo:

Clone it on one cluster machine that has MySQL Server installed (call it A) and make the following changes in Constants.cs:

public static string ClientIPAddress = <Current Machine's IP address>;
...
public volatile static int DefaultNumGrainsInOneLayer = <# of Machines in the cluster - 1>;
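
The second value above is derived from the cluster size. As a minimal sketch, assume a hypothetical file hosts.txt that lists every machine in the cluster (including A), one address per line; the helper name `num_grains` is also illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute "<# of machines in the cluster - 1>" from a
# hosts file with one machine per line. Blank lines are ignored.
num_grains() {
  local hosts_file="$1" count
  count=$(grep -cv '^[[:space:]]*$' "$hosts_file")
  echo $((count - 1))
}

# Usage (hosts.txt is an assumed file name):
#   num_grains hosts.txt
# On Linux, the first address printed by `hostname -I` is one way to find the
# current machine's IP address for ClientIPAddress.
```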

2. Start MySQL Server on machine A.

3. Copy the edited repo to all other machines in the cluster.
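
Copying can be done by hand or scripted. The sketch below only prints one rsync command per remote machine so that you can review them before running anything. It assumes SSH access to each machine, a hypothetical hosts.txt listing all cluster machines one per line, and that the repo sits in a directory named AmberOnOrleans in the current directory; `copy_commands` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an rsync command for every machine in the hosts
# file except machine A itself.
# $1 = hosts file (one machine per line), $2 = address of machine A.
copy_commands() {
  local hosts_file="$1" self="$2" host
  while IFS= read -r host; do
    if [ -z "$host" ] || [ "$host" = "$self" ]; then
      continue   # skip blank lines and machine A
    fi
    printf 'rsync -az AmberOnOrleans/ %s:AmberOnOrleans/\n' "$host"
  done < "$hosts_file"
}

# Usage (addresses are placeholders):
#   copy_commands hosts.txt 10.0.0.1        # review the commands
#   copy_commands hosts.txt 10.0.0.1 | sh   # then run them
```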

4. Start Silos:

A silo is a container of actors in Orleans where all the computation takes place. The silos must be started first so that Amber knows where to allocate actors.

Open a terminal on each of the other machines in the cluster and enter:

cd AmberOnOrleans/SiloHost
dotnet run -c Release

You can ignore the warnings. Establishing the connection takes some time.

Make sure you see "Silo Started!" on all the machines before proceeding to step 5.

5. On machine A, continue with step 3 or step 4 of the local-machine tutorial above.

Note: The table file should be stored in HDFS so that the other machines can access it, and you will need to use the HDFS RESTful link as the path of the table file (e.g. http://128.295.2.45:9870/webhdfs/v1/datasets/lineitem.tbl).
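
The link in the note follows the WebHDFS layout of host, port, the /webhdfs/v1 prefix, and then the file path inside HDFS. A small sketch of assembling such a link is shown below; the helper name `webhdfs_url` and the host name in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a WebHDFS link for a file stored in HDFS.
# $1 = NameNode host, $2 = NameNode HTTP port, $3 = absolute HDFS path.
webhdfs_url() {
  local host="$1" port="$2" path="$3"
  printf 'http://%s:%s/webhdfs/v1%s\n' "$host" "$port" "$path"
}

# Usage (namenode.example is a placeholder host):
#   url=$(webhdfs_url namenode.example 9870 /datasets/lineitem.tbl)
#   # Ask WebHDFS for the file's metadata to confirm it is reachable:
#   curl -s "${url}?op=GETFILESTATUS"
```

Checking the link with GETFILESTATUS from one of the other machines before starting a workflow is a quick way to catch a wrong host, port, or path.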
