Hadoop Validator
The Hadoop Plugin includes the Hadoop Validator, which provides Gradle tasks that perform local validation of your Hadoop jobs. In particular, the Hadoop Validator includes tasks for data validation, schema validation and syntax checking for Hadoop ecosystem jobs.
These tasks should significantly improve developer productivity. Because they validate your Hadoop jobs locally at build time, you no longer have to wait for a job to be submitted to the cluster only to see it fail because of a trivial error.
Currently, the Hadoop Validator provides validation tasks only for Apache Pig jobs. However, it is designed so that it can be extended to other Hadoop ecosystem components, such as Apache Hive or Apache Spark.
Please note that the Hadoop Validator is currently an experimental feature.
Many of the validation tasks depend on information stored in the .hadoopValidatorProperties file in your project directory. If this file does not exist, the Hadoop Plugin creates it automatically.
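This page does not describe the layout of .hadoopValidatorProperties. The Python sketch below assumes a simple key=value layout, similar to a Java properties file, and uses hypothetical key names (nameNode and ivyRepository) to show how a validation step might load the file, create it when it is missing, and report settings that still need values. It is an illustration only, not the plugin's code, and its parser handles only a subset of the properties format.

```python
from pathlib import Path

PROPERTIES_FILE = ".hadoopValidatorProperties"

# Hypothetical key names; the real plugin may use different ones.
REQUIRED_KEYS = ("nameNode", "ivyRepository")


def load_validator_properties(project_dir):
    """Read key=value lines from the properties file, creating an empty file if absent.

    Simplified parsing: blank lines and lines starting with '#' or '!' are skipped,
    and only '=' is treated as a separator (no escapes or line continuations).
    """
    path = Path(project_dir) / PROPERTIES_FILE
    if not path.exists():
        # Mirrors the documented behavior of creating the file when it is missing.
        path.write_text("")
    props = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]
```

A check built this way can fail early with a clear message, for example by listing the keys returned by missing_keys, instead of failing later when it first tries to contact HDFS or an Ivy repository.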
To run the Hadoop Validator tasks for your project, run ./gradlew hadoopValidate. The Hadoop Validator examines all the jobs configured with the Hadoop DSL for your project and attempts to validate them.
Currently, the hadoopValidate task executes the pigValidate task for Apache Pig jobs.
The pigValidate task finds the Apache Pig jobs configured with the Hadoop DSL for your project and executes the pigDataExists, pigDependencyExists and pigSyntaxValidator tasks described below.
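One way to picture how these tasks relate is as a dependency graph. The sketch below models each task as depending on the tasks it executes. Because this page states that pigSyntaxValidator produces a parameter substitution file that the other Pig validation tasks read, the sketch also assumes that pigDataExists and pigDependencyExists depend on pigSyntaxValidator. This is a simplified Python model (graphlib requires Python 3.9+), not how the plugin actually wires its Gradle tasks; in a real build, Gradle resolves this ordering itself.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (a simplified model).
TASK_GRAPH = {
    "hadoopValidate": {"pigValidate"},
    "pigValidate": {"pigDataExists", "pigDependencyExists", "pigSyntaxValidator"},
    # Assumed edges: these checks read the parameter substitution file
    # that pigSyntaxValidator generates.
    "pigDataExists": {"pigSyntaxValidator"},
    "pigDependencyExists": {"pigSyntaxValidator"},
    "pigSyntaxValidator": set(),
}


def execution_order(graph, target):
    """Return the tasks needed for `target`, ordered so dependencies come first."""
    needed = {}
    stack = [target]
    while stack:
        task = stack.pop()
        if task not in needed:
            needed[task] = graph.get(task, set())
            stack.extend(needed[task])
    return list(TopologicalSorter(needed).static_order())
```

With this model, execution_order(TASK_GRAPH, "hadoopValidate") places pigSyntaxValidator first and hadoopValidate last. The relative order of the two existence checks is not fixed, since neither depends on the other.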
The pigDataExists task checks that the data files loaded by your Apache Pig scripts exist in HDFS. The HDFS NameNode address must be declared in the .hadoopValidatorProperties file.
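The idea behind the pigDataExists check can be sketched roughly as follows: collect the quoted paths that follow LOAD in a Pig script, then ask a storage backend whether each one exists. In this Python sketch, the exists callback is a hypothetical stand-in for a real HDFS lookup against the configured NameNode; it is injected so the example runs without a cluster. The regular expression is an assumption that handles only single-quoted literal paths. It does not skip commented-out lines or expand globs, so it is far less thorough than a real Pig parser. Paths that contain $parameters would need to be substituted before this check runs.

```python
import re
from typing import Callable, List

# Matches LOAD 'path' (case-insensitive); only single-quoted literal paths are handled.
LOAD_PATTERN = re.compile(r"\bLOAD\s+'([^']+)'", re.IGNORECASE)


def load_paths(script: str) -> List[str]:
    """Return the input paths named in LOAD statements, in script order."""
    return LOAD_PATTERN.findall(script)


def missing_inputs(script: str, exists: Callable[[str], bool]) -> List[str]:
    """Return the LOAD paths for which the injected `exists` check fails."""
    return [path for path in load_paths(script) if not exists(path)]
```

For a quick local experiment, a set's __contains__ method can play the role of the HDFS lookup, for example missing_inputs(script, {"/data/events"}.__contains__).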
The pigDependencyExists task checks that the jar dependencies declared in your Apache Pig scripts exist. These dependencies can be local files, or they can be located in HDFS or in an Ivy repository. For dependencies in HDFS or in an Ivy repository, the HDFS NameNode address or the Ivy repository URL, respectively, must be declared in the .hadoopValidatorProperties file.
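The dependency check can be sketched in a similar way: find the REGISTER statements in a script and sort each target by where it must be looked up. This Python sketch assumes that HDFS and Ivy dependencies are written as hdfs:// and ivy:// URIs and treats everything else as a path relative to a local base directory. It verifies only the local jars. The HDFS and Ivy groups are returned so that a caller holding the NameNode address or Ivy repository URL could check them; that lookup is not shown here.

```python
import os
import re
from typing import Dict, List

# Captures the target of each REGISTER statement, with or without single quotes.
REGISTER_PATTERN = re.compile(
    r"^\s*REGISTER\s+'?([^\s';]+)'?", re.IGNORECASE | re.MULTILINE
)


def classify_dependencies(script: str) -> Dict[str, List[str]]:
    """Group REGISTER targets as local paths, HDFS URIs, or Ivy coordinates."""
    groups: Dict[str, List[str]] = {"local": [], "hdfs": [], "ivy": []}
    for target in REGISTER_PATTERN.findall(script):
        if target.startswith("hdfs://"):
            groups["hdfs"].append(target)
        elif target.startswith("ivy://"):
            groups["ivy"].append(target)
        else:
            groups["local"].append(target)
    return groups


def missing_local_jars(groups: Dict[str, List[str]], base_dir: str = ".") -> List[str]:
    """Return local dependencies that do not exist relative to base_dir."""
    return [
        path for path in groups["local"]
        if not os.path.exists(os.path.join(base_dir, path))
    ]
```

Separating the three groups mirrors the configuration requirement above: only the HDFS and Ivy groups need settings from .hadoopValidatorProperties, while local jars can be checked directly on disk.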
The pigSyntaxValidator task checks the syntax of your Apache Pig scripts and generates a parameter substitution file, which the other Apache Pig validation tasks use as input.
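This page does not say how pigSyntaxValidator builds its parameter substitution file or which format it uses. The Python sketch below is based on assumptions. It resolves $name references in a script from a dictionary, reports the names it cannot resolve, and renders the dictionary as one "name = value" pair per line, which is the layout Pig's -param_file option reads. Positional field references such as $0 are deliberately left alone. Pig's own preprocessor also handles %declare, %default, and escaping, which this sketch ignores.

```python
import re
from typing import Dict, List, Tuple

# $name references; positional fields such as $0 and $1 do not match.
PARAM_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(script: str, params: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace $name references with values; return the result and unresolved names."""
    unresolved: List[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in params:
            return params[name]
        unresolved.append(name)
        return match.group(0)  # leave unknown references untouched

    result = PARAM_REF.sub(replace, script)
    # Drop duplicate names while keeping first-seen order.
    return result, list(dict.fromkeys(unresolved))


def render_param_file(params: Dict[str, str]) -> str:
    """Render params as sorted 'name = value' lines."""
    return "".join(f"{name} = {value}\n" for name, value in sorted(params.items()))
```

Reporting unresolved names, instead of failing on the first one, lets a single validation run list every missing parameter in a script.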