-
Notifications
You must be signed in to change notification settings - Fork 162
809: Setup of TPC‐H benchmark set
KaiGai Kohei edited this page Jun 26, 2024
·
2 revisions
This note introduces the steps to setup TPC-H benchmark dataset.
Checkout the latest version of PG-Strom from GitHub
% git clone https://github.com/heterodb/pg-strom.git
Move to the test/tpch
subdirectory
% cd pg-strom/test/tpch
Run the script to create tables
% psql DBNAME -f ./tpch-ddl.sql
Build dbgen-tpch
which is based on v3.0.1
.
Just run make
.
% make
(echo "const char *static_dists_dss ="; \
cat dists.dss | while read line; \
do \
echo $line | sed -e 's/^/ "/g' -e 's/$/\\n"/g'; \
done; echo ";") > dists.dss.h
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DPOSTGRESQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen-tpch build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
Load the dataset using dbgen-tpch
.
% psql DBNAME
=# \copy customer from program './dbgen-tpch -s 1000 -X -Tc' delimiter '|'
=# \copy lineitem from program './dbgen-tpch -s 1000 -X -TL' delimiter '|'
=# \copy nation from program './dbgen-tpch -s 1000 -X -Tn' delimiter '|'
=# \copy orders from program './dbgen-tpch -s 1000 -X -TO' delimiter '|'
=# \copy part from program './dbgen-tpch -s 1000 -X -Tp' delimiter '|'
=# \copy partsupp from program './dbgen-tpch -s 1000 -X -TS' delimiter '|'
=# \copy region from program './dbgen-tpch -s 1000 -X -Tr' delimiter '|'
=# \copy supplier from program './dbgen-tpch -s 1000 -X -Ts' delimiter '|'
=# VACUUM ANALYZE;
After that, TPC-H benchmark database will be built.
-s
is scaling factor. SF=1 is almost 1GB, therefore, -s 1000
will generate 1TB dataset in total.
You can see the following database size if SF=1000. It is not just 1000GB, because PostgreSQL adds 24bytes header for each rows in front of the payload.
tpch=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+----------+-------+--------+-------------+---------------+------------+-------------
public | customer | table | kaigai | permanent | heap | 28 MB |
public | lineitem | table | kaigai | permanent | heap | 882 GB |
public | nation | table | kaigai | permanent | heap | 40 kB |
public | orders | table | kaigai | permanent | heap | 205 GB |
public | part | table | kaigai | permanent | heap | 31 GB |
public | partsupp | table | kaigai | permanent | heap | 134 GB |
public | region | table | kaigai | permanent | heap | 8192 bytes |
public | supplier | table | kaigai | permanent | heap | 1736 MB |
(8 rows)
You can see the test queries in test/tpch/queries
directory.
$ ls queries/
tpch-01.sql tpch-03.sql tpch-05.sql tpch-07.sql tpch-09.sql tpch-11.sql tpch-13.sql tpch-15.sql tpch-17.sql tpch-19.sql tpch-21.sql
tpch-02.sql tpch-04.sql tpch-06.sql tpch-08.sql tpch-10.sql tpch-12.sql tpch-14.sql tpch-16.sql tpch-18.sql tpch-20.sql tpch-22.sql
And, you can give the query as follows:
tpch=# \i queries/tpch-01.sql