ES-Fastloader uses the fault tolerance and parallelism of Hadoop to build individual ElasticSearch shards on multiple reducer nodes, then transfers the shards to the ElasticSearch cluster for serving. The loader creates a Hadoop job that reads data files from HDFS, repartitions the data on a per-node basis, and finally writes the generated indices to ES shards. At DiDi, we have been using ES-Fastloader to create large-scale ElasticSearch indices from TB/PB-level sequence files in Hive.
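The repartitioning step described above can be pictured with a small, self-contained sketch. It is not taken from the ES-Fastloader source: the class name `ShardPartitionSketch`, the methods `shardFor` and `partition`, and the use of `String.hashCode()` are all assumptions made for illustration. ElasticSearch itself routes documents with a Murmur3 hash of the routing value, so a real loader would have to match that routing rather than the simplified hash used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: groups document ids by target shard, the way a
// Hadoop partitioner would route records to the reducer that builds each shard.
public class ShardPartitionSketch {

    // Hypothetical routing rule (assumption): hashCode modulo shard count.
    // ElasticSearch's real routing uses Murmur3, not String.hashCode().
    static int shardFor(String docId, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative hashes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Buckets every document id under the shard number it is routed to.
    static Map<Integer, List<String>> partition(List<String> docIds, int numShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (int shard = 0; shard < numShards; shard++) {
            byShard.put(shard, new ArrayList<>());
        }
        for (String docId : docIds) {
            byShard.get(shardFor(docId, numShards)).add(docId);
        }
        return byShard;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("order-1", "order-2", "order-3", "order-4", "order-5");
        // Each bucket stands in for the input of one reducer building one shard.
        partition(ids, 3).forEach((shard, docs) ->
                System.out.println("shard " + shard + " -> " + docs));
    }
}
```

In this sketch each bucket corresponds to the records one reducer would receive, which is why adding reducers (and shards) spreads the indexing work across more machines.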
- Supports batch construction of ES indices: dozens of terabytes of data can be processed in 1-2 hours, which solves the low efficiency of building massive ES index files.
- Supports horizontal scaling of computing power: adding machine resources further increases the index construction speed and the amount of data that can be processed.
- JDK: 8 or greater
- ElasticSearch: 6.6.X or greater
- cd mr
- mvn clean package -Dmaven.test.skip
- Launch: run on the Hadoop cluster
- hadoop jar mr-1.0.0-SNAPSHOT-with-dep.jar com.didichuxing.datachannel.arius.fastindex.FastIndex $PARAM
- API document wiki
- Read core library source code
- Read main class
- Read Release notes
Contributions are welcome: create issues or send pull requests. See the Contributing Guide for guidelines.
ES-Fastloader is licensed under the Apache License 2.0. See the LICENSE file.