MADlib Port

This project brings in-DBMS data analytics to Impala. It builds on previous work done by two projects:

Each of these projects uses User Defined Aggregates (UDAs) to train analytic models, relying on an existing DBMS's data management and processing abilities.
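To make the UDA idea concrete, here is a minimal sketch of how model training can be phrased as an aggregate. It is not code from this repository: the function names (`uda_init`, `uda_update`, `uda_merge`, `uda_finalize`, `run_epoch`), the hinge-loss SGD update, the step size, and the regularization constant are all assumptions chosen for illustration. The shape follows the init/update/merge/finalize phases that UDA interfaces commonly expose.

```python
# Illustrative sketch only: a UDA-style aggregate that trains a linear SVM
# with stochastic gradient descent. Every name and constant is hypothetical.
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, label in {-1, +1})


def uda_init(num_features: int) -> List[float]:
    """Init: each partition starts from a zero model."""
    return [0.0] * num_features


def uda_update(model: List[float], features: Sequence[float], label: float,
               step: float = 0.1, reg: float = 0.01) -> List[float]:
    """Update: fold one row into the state with a hinge-loss SGD step."""
    margin = label * sum(w * x for w, x in zip(model, features))
    new_model = [w * (1.0 - step * reg) for w in model]  # L2 shrinkage
    if margin < 1.0:  # row is misclassified or inside the margin
        new_model = [w + step * label * x
                     for w, x in zip(new_model, features)]
    return new_model


def uda_merge(a: List[float], b: List[float]) -> List[float]:
    """Merge: combine two partial models by averaging (a simplification)."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def uda_finalize(model: List[float]) -> List[float]:
    """Finalize: return the trained model for this pass over the data."""
    return list(model)


def run_epoch(partitions: List[List[Row]], num_features: int) -> List[float]:
    """Simulate what the DBMS does: update per partition, then merge."""
    states = []
    for rows in partitions:
        state = uda_init(num_features)
        for features, label in rows:
            state = uda_update(state, features, label)
        states.append(state)
    merged = states[0]
    for other in states[1:]:
        merged = uda_merge(merged, other)
    return uda_finalize(merged)
```

In a real deployment the DBMS, not `run_epoch`, decides how rows are partitioned and when partial states are merged; the driver here only simulates that flow on in-memory lists.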

Dependencies: yum install -y eigen3-devel.noarch

Also install Boost 1.54.0.

Code Base

This code base includes the following components.

There is a fork of MADlib 1.0 that has been modified for use with Impala. The specific changes were:

To run the example SVM:

Create the database toysvm.
Register the UDFs with the database (without re-making the binaries): python python/deploy.py -mp toysvm
Create a synthetic table of examples named toy in the database toysvm: python python/gen_classify_data.py toysvm toy
Run the SVM on that table: python python/impala_svm.py lbl e0 e1 e2 --db toysvm --table toy -e 1
Print the model recorded at each iteration: impala-shell -q 'use toysvm; select iter, printarray(model) from history;'
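The steps above can be collected into a small driver script. This is a sketch under stated assumptions, not part of the repository: `build_commands` and `run_all` are hypothetical helper names, and creating the database through `impala-shell -q` with `create database if not exists` is an assumption, since the README does not say how that step is performed. The remaining commands are copied from the steps as written and must be run from the repository root.

```python
# Hypothetical driver for the example SVM steps; helper names are
# illustrative and not part of this repository.
import shlex
import subprocess
from typing import List


def build_commands(db: str = "toysvm", table: str = "toy") -> List[List[str]]:
    """Return the example's commands, in order, as argv lists."""
    return [
        # Assumption: the database is created through impala-shell.
        ["impala-shell", "-q", f"create database if not exists {db}"],
        # Register the UDFs without re-making the binaries.
        ["python", "python/deploy.py", "-mp", db],
        # Generate the synthetic table of examples.
        ["python", "python/gen_classify_data.py", db, table],
        # Run the SVM with the arguments given in the README.
        ["python", "python/impala_svm.py", "lbl", "e0", "e1", "e2",
         "--db", db, "--table", table, "-e", "1"],
        # Print the model recorded at each iteration.
        ["impala-shell", "-q",
         f"use {db}; select iter, printarray(model) from history;"],
    ]


def run_all(db: str = "toysvm", table: str = "toy",
            dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(db, table):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands, which is a cheap way to review them; `run_all(dry_run=False)` executes them and requires a working Impala installation.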
