How to set up TPC-H Benchmark and train SmartIX
I. Requirements
- MySQL version 14.14 (see Section II)
- TPC-H version: 2.18.0 (see Section III)
- Python 3.6 (see Section V)
II. Installing MySQL
$ sudo apt-get install mysql-server
$ sudo mysql_secure_installation
VALIDATE PASSWORD PLUGIN (...) n
New password: YOUR PASSWORD
Re-enter new password: YOUR PASSWORD
Remove anonymous users? y
Disallow root login remotely? n
Remove test database and access to it? n
Reload privilege tables now? y
$ sudo ufw allow mysql
$ sudo systemctl start mysql
$ sudo systemctl enable mysql
$ sudo systemctl status mysql
Press Ctrl+C to exit MySQL status.
- Create a new user:
$ sudo mysql -u root -p
mysql> CREATE USER 'smartix'@'localhost' IDENTIFIED BY 'smartix';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'smartix'@'localhost' WITH GRANT OPTION;
mysql> CREATE USER 'smartix'@'%' IDENTIFIED BY 'smartix';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'smartix'@'%' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> EXIT;
- Create the database:
$ mysql -u smartix -p
mysql> CREATE DATABASE tpch;
mysql> EXIT;
III. Using TPC-H Tools
-
Click at TPC-H source code, available at the TPC website.
-
You have to fill in a register, and then they will send you a unique link by e-mail, so you can download and unzip the tools.
-
Go to the folder named
dbgen
, where there are the.c
files. -
Edit the
makefile.suite
file to match what we are showing below. We will use MySQL, but we have to set it to Oracle as the database here.
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE = ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
- Compile using the following command:
$ make -f makefile.suite
- It will generate several compiled files. Now, we have to generate the database files. We use a scale factor of 1GB:
$ ./dbgen -s 1
-
It will generate one file for each table, to be inserted in the database, in the format tbl_name.tbl“.
-
Create the tables using the file
dss.ddl
:
$ mysql -u smartix -psmartix tpch < dss.ddl
- Log in on MySQL with --local-infile variable enabled and select the TPCH database. Then, import the data to the database (pay attention to replace _~/path/ _for the actual path where the folder is):
mysql --local-infile=1 -u smartix -psmartix
USE tpch;
SET GLOBAL local_infile=1;
load data local infile '~/path/dbgen/region.tbl' into table REGION fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/nation.tbl' into table NATION fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/customer.tbl' into table CUSTOMER fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/supplier.tbl' into table SUPPLIER fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/part.tbl' into table PART fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/orders.tbl' into table ORDERS fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/partsupp.tbl' into table PARTSUPP fields terminated by '|' lines terminated by '\n';
load data local infile '~/path/dbgen/lineitem.tbl' into table LINEITEM fields terminated by '|' lines terminated by '\n';
exit;
- Check the amount of rows for each table:
- REGION: 5
- NATION: 25
- CUSTOMER: 150000
- SUPPLIER: 10000
- PART: 200000
- ORDERS: 1500000
- PARTSUPP: 800000
- LINEITEM: 6001215
- Generate the files to be used in refresh functions. It will generate files named
delete.[num]
, and inserts toorders.[num]
, andlineitem.[num]
tables. Create a folder named1
(one) and move the files intto this folder. We do this in order to separate the refresh files created with respect to a scale factor of 1.
$ ./dbgen -s 1 -U 10000
$ mkdir 1
$ mv delete.* 1
$ mv orders.* 1
$ mv lineitem.* 1
IV. Configuring the TPC-H Benchmark
We need to create procedures, tables, and views to run our experiments.
-
Download the folder from this link and save it to your home folder.
-
Create folder
/QSRF
at/home/user/
, and move files to this folder (/home/user/QSRF
). -
The files contained in the folder should be:
README.txt
: order to execute the filesCREATE TEMPORARY TABLES.sql
: create the tables to be used in data insertion and deletion;CREATE VIEWS.sh
: create views to test database query perfomance;CREATE_PROCEDURE_DELETE_REFRESH_FUNCTION.sql
: create procedure to delete data;CREATE_PROCEDURE_INSERT_REFRESH_FUNCTION.sql
: create procedure to insert data;CREATE_PROCEDURE_QUERY_STREAM.sql
: create procedure for query stream (sequential query in the 22 views we created).
- Create temporary tables:
$ mysql -u dbuser -p tpch
mysql> source ~/QSRF/CREATE TEMPORARY TABLES.sql;
mysql> exit;
- Create views (check the path in the
.sh
file):
$ cd ~/QSRF
$ ./CREATE_VIEWS.sh
- Create procedures:
$ mysql -u dbuser -p tpch < ~/QSRF/CREATE_PROCEDURE_DELETE_REFRESH_FUNCTION.sql;
$ mysql -u dbuser -p tpch < ~/QSRF/CREATE_PROCEDURE_INSERT_REFRESH_FUNCTION.sql;
$ mysql -u dbuser -p tpch < ~/QSRF/CREATE_PROCEDURE_QUERY_STREAM.sql;
- Check if views and procedures are created:
$ mysql -u dbuser -p tpch
mysql> use tpch;
mysql> show tables;
34 rows in set (0.00 sec)
mysql> show procedure status where db = 'tpch';
(...)
| tpch | DELETE_REFRESH_FUNCTION | PROCEDURE | dbuser@localhost |
| tpch | INSERT_REFRESH_FUNCTION | PROCEDURE | dbuser@localhost |
| tpch | QUERY_STREAM | PROCEDURE | dbuser@localhost |
(...)
3 rows in set (0.00 sec)
mysql> exit;
V. Configuring the Python Environment
- unixodbc (
$ sudo apt-get install unixodbc-dev
) - pyodbc (
$ pip3 install pyodbc
) - mysql-connector (
$ pip3 install mysql-connector mysql-connector-python
)
- Download the MySQL Connector ODBC from this link.
VI. Training SmartIX
-
Download SmartIX source code (or clone the repository).
-
Configure the database connection string in the database.py class init method: put your user, password, and database name to the connection string variable.
-
The same has to be done in the TPCH.py class: put your database connection info to the DB_CONFIG constant, as well as setting the REFRESH_FILES_PATH constant to the path you generated the refresh files back in Step 13.
-
Then you can finally start training the agent by running:
$ python3 environment.py > training.log
- Finally, you can view training data in the
data
folder.