keith-turner/Accumulo-Parallel-Splitter
This project contains a Java program that works around the slow split issue identified in [ACCUMULO-348][1]. It does this by making the split calls in parallel.

To use this project, build the jar with Maven:

```
mvn package
```

Then place the jar in `<ACCUMULO_HOME>/lib/ext` and run the following command:

```
$ ./bin/accumulo ParallelSplitter
Usage : ParallelSplitter <instance> <zoo keepers> <table> <user> <pass> <num threads> <file>
```

Some experiments were done varying the number of splits to create and the number of threads to use. These results were obtained on a 10-node cluster running Accumulo 1.4.0. The table being split was empty; if it had contained data, the times would probably be different. The times were obtained by timing the whole process, so they include Java startup time. The results are below.

ParallelSplitter times for 999 splits:

| Threads | Time |
|---|---|
| 4 | 5.4s |
| 8 | 3.0s |
| 16 | 3.7s |

This is the time the `addsplits` command took for 999 splits:

```
$ time ./bin/accumulo shell -u root -p secret -e "addsplits -t foo -sf splits.txt"
real 0m13.386s
```

ParallelSplitter times for 4999 splits:

| Threads | Time |
|---|---|
| 4 | 53.6s |
| 8 | 15.0s |
| 16 | 7.4s |
| 32 | 20.2s |

This is the time the `addsplits` command took for 4999 splits:

```
$ time ./bin/accumulo shell -u root -p secret -e "addsplits -t foo -sf splits.txt"
real 1m37.254s
```

ParallelSplitter times for 99,999 splits:

| Threads | Time |
|---|---|
| 8 | 408.3s |
| 16 | 227.1s |
| 32 | 117.7s |
| 64 | 92.3s |
| 128 | 119.5s |

This is the time the `addsplits` command took for 99,999 splits:

```
$ time ./bin/accumulo shell -u root -p secret -e "addsplits -t foo -sf splits.txt"
real 152m15.531s
```

About halfway through the above command, I discovered that flushing the metadata table would speed things up. Flushing it more frequently would have dramatically changed the time above.

[1]: https://issues.apache.org/jira/browse/ACCUMULO-348
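The idea of "making the split calls in parallel" can be illustrated with a small, self-contained sketch. This is not the ParallelSplitter source, and the way it divides the work is an assumption: it first adds `numThreads - 1` evenly spaced "seed" splits in a single call, then gives each worker thread one contiguous range of the remaining splits, so each worker's splits should fall inside a different tablet. `SplitAdder` is a hypothetical interface; in a real client it would presumably wrap something like `connector.tableOperations().addSplits(table, splits)`, which in the Accumulo client API takes a `SortedSet<Text>` rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSplitSketch {

    /** Hypothetical stand-in for the real split call (e.g. a wrapper around TableOperations.addSplits). */
    public interface SplitAdder {
        void addSplits(SortedSet<String> splits) throws Exception;
    }

    /**
     * Adds every split in allSplits using up to numThreads worker threads.
     * Step 1 (serial): add evenly spaced seed splits so the table has one tablet per worker.
     * Step 2 (parallel): each worker adds the rest of one contiguous range, which lies
     * entirely inside one of the tablets created in step 1.
     */
    public static void addSplitsInParallel(SortedSet<String> allSplits, int numThreads, SplitAdder adder)
            throws Exception {
        List<String> sorted = new ArrayList<>(allSplits);
        int n = sorted.size();
        if (n == 0) {
            return;
        }
        int chunks = Math.max(1, Math.min(numThreads, n));

        // Step 1: the first key of chunks 1..chunks-1 becomes a seed split.
        SortedSet<String> seeds = new TreeSet<>();
        for (int i = 1; i < chunks; i++) {
            seeds.add(sorted.get(i * n / chunks));
        }
        if (!seeds.isEmpty()) {
            adder.addSplits(seeds);
        }

        // Step 2: one task per contiguous chunk, skipping the seeds already added.
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (int c = 0; c < chunks; c++) {
                SortedSet<String> chunk =
                        new TreeSet<>(sorted.subList(c * n / chunks, (c + 1) * n / chunks));
                chunk.removeAll(seeds);
                if (chunk.isEmpty()) {
                    continue;
                }
                futures.add(pool.submit(() -> {
                    adder.addSplits(chunk);
                    return null;
                }));
            }
            // Wait for all workers; get() rethrows any failure wrapped in an ExecutionException.
            for (Future<Object> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage example with a fake adder that only reports what each call would do.
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 0; i < 20; i++) {
            splits.add(String.format("row%02d", i));
        }
        addSplitsInParallel(splits, 4, s ->
                System.out.println(Thread.currentThread().getName() + " adding " + s.size() + " splits"));
    }
}
```

Seeding first is one way to avoid having all threads contend on the single tablet of an initially empty table. The thread-count results above suggest that the best setting depends on the number of splits, so `numThreads` is left as a parameter rather than fixed.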