This project aims to be a base for replication of selected results from: "Blanas, Spyros, Yinan Li, and Jignesh M. Patel. "Design and evaluation of main memory hash join algorithms for multi-core CPUs." Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. 2011."
This project provides a comparison of hash join algorithms on artificially generated OLAP-like datasets (with only key and value fields). The algorithms currently implemented are no-partitioning hash join and radix-partitioning hash join (as described by the study). Each program run provides a separate execution of the selected algorithm. Current implementation does not materialize join results (only counts matched keys). See phjoin --help for some more information about provided configuration options.
Project defines a variable that specifies whether to use mimalloc library for overriding malloc calls:
PHJ_USE_MIMALLOC - CMAKE option that specifies whether to use Mimalloc or not.
PHJ_MIMALLOC_DLL_DIR - if PHJ_USE_MIMALLOC is ON and OS is Windows, then this CMake variable must be set to the directory containing all DLLs required by Mimalloc.
Also one can build GTest suite by setting the option:
PHJ_BUILD_TESTS
- Boost 1.7.2+
- GTest 1.11.0+
- XXHash 0.7.4+
- CMake 3.8+
mkdir build
cd build
cmake ..
cmake --build .
Small Google Test suite is provided.
Compilers:
- MSVC 14.25
- GCC 7.4.0
Libraries:
- Boost 1.7.2
- GTest 1.11.0
- XXHash 0.7.4
- CMake 3.8
This project is licensed under the MIT License.
Experiment results were obtained with default relation size values, i.e. 10 million tuples for the primary and 200 million tuples for the secondary relation. The tests were run on a local machine with AMD Ryzen 7 3700X 8-Core Processor (16 logical cores) 3.6Ghz and 32GB RAM (2400Mhz).
See directory results for details.
- add support for measuring CPU cache misses