Azkaban Features
The Hadoop Plugin comes with features that enable you to interact with Azkaban from the command line.
(Since version 0.5.14 - Contributed by Akshay Rai. Interactive mode since version 0.10.8 - Contributed by Pranay Hasan) To upload to Azkaban, run gradle azkabanUpload --no-daemon. When you run this command, the task enters an interactive mode that helps you configure your upload information and saves it to the .azkabanPlugin.json file in your project directory.
You should add this file to your project's .gitignore so that each developer on your team can have their own copy. Once you have an .azkabanPlugin.json file, you can skip interactive mode by including the -PskipInteractive command line option.
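The .gitignore step above can be sketched as a small shell helper. This is only an illustration: the function name ignore_plugin_json is hypothetical and not part of the Hadoop Plugin, and the helper assumes it is given the path to your project's .gitignore (defaulting to the current directory).

```shell
# Append .azkabanPlugin.json to a .gitignore file unless it is already listed.
# ignore_plugin_json is a hypothetical helper name, not part of the Hadoop Plugin.
ignore_plugin_json() {
  gitignore="${1:-.gitignore}"
  entry='.azkabanPlugin.json'
  touch "$gitignore"
  # Nothing to do if the exact line is already present.
  if grep -qxF -- "$entry" "$gitignore"; then
    return 0
  fi
  # If the file does not end with a newline, add one so the entry gets its own line.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    echo >> "$gitignore"
  fi
  echo "$entry" >> "$gitignore"
}
```

After calling ignore_plugin_json from your project directory, later uploads can reuse the saved settings with gradle azkabanUpload --no-daemon -PskipInteractive.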
At the end of interactive mode, the task will ask for your password and start an Azkaban session. Your session information will be saved under ~/.azkaban in a file that only you can read. As long as your session is still valid, you will not have to re-enter your password. The upload then starts, and progress updates on the screen show how close it is to completion, as in the following sample session:
Entering interactive mode. You can use the -PskipInteractive command line parameter to skip interactive mode and ONLY read from the .azkabanPlugin.json file.
Azkaban Project Name: abain-xgboost-demo
Azkaban URL: https://ltx1-holdemaz01.grid.linkedin.com:8443
Azkaban User Name: abain
Azkaban Zip Task: azkabanHadoopZip
> Building 93% > :xgboost-demo:azkabanUpload > Want to change any of the above? [y/N]:
Resuming previous Azkaban session
Once the zip is uploaded, Azkaban will validate your zip with Byte-Ray to complete the upload
Zip upload progress...
0% 20% 40% 60% 80% 100% (42393 KB)
| | | | | | | | | | |
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Building 93% > :xgboost-demo:azkabanUpload
(Since version 0.5.14 - Contributed by Akshay Rai) When you run the azkabanUpload command, the task enters an interactive mode that helps you configure your upload information. If you would rather configure it manually, run gradle writeAzkabanPluginJson to write a default .azkabanPlugin.json file to your project directory.
Edit this file to tell the Hadoop Plugin how to upload to Azkaban. You should add this file to your project's .gitignore so that each developer on your team can have their own copy. This file has the following format:
{
  "azkabanUrl": "https://theAzkabanServer.linkedin.com:8443",
  "azkabanProjName": "abain-hello-pig-azkaban",
  "azkabanZipTask": "azkabanHadoopZip",
  "azkabanValidatorAutoFix": "true",
  "azkabanUserName": "abain",
  "azkabanPassword": null
}
The value for the azkabanZipTask field in the .azkabanPlugin.json file should be the name of a Gradle Zip task. Although you can use the name of any Gradle Zip task, you usually want to use the name of a Zip task generated by configuring the hadoopZip block using the zip method. To find the names of these tasks, see the section on the generated Zip tasks at Hadoop Zip Artifacts.
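If you edit the file by hand, a quick sanity check can catch a misspelled or deleted field before you run the upload. The sketch below is a rough grep-based check, not a JSON parser, and it assumes that all six fields shown in the format above should be present. The function name check_plugin_json is hypothetical and not part of the Hadoop Plugin.

```shell
# Report any field from the documented format that is absent from the file.
# check_plugin_json is a hypothetical helper name, not part of the Hadoop Plugin.
# Assumption: all six fields shown in the sample file are expected to be present.
check_plugin_json() {
  file="${1:-.azkabanPlugin.json}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file"
    return 1
  fi
  missing=0
  for key in azkabanUrl azkabanProjName azkabanZipTask \
             azkabanValidatorAutoFix azkabanUserName azkabanPassword; do
    # Look for the quoted field name followed by a colon.
    if ! grep -Eq "\"$key\"[[:space:]]*:" "$file"; then
      echo "missing field: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_plugin_json in your project directory prints one line per missing field and returns a non-zero status if any field is absent.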
The azkabanUpload task specifically requires access to a console that supports password masking. At LinkedIn, the task may tell you to set your JAVA_HOME correctly to match the version of Java declared in your product-spec.json (a LinkedIn-specific file) and to pass --no-daemon on the command line (or set org.gradle.daemon=false in your gradle.properties file). Once you take these steps, you will be able to use the task.
At LinkedIn, if you get a message about setting your JAVA_HOME, make one of the edits described below:
# If you get this message on your Mac, add this to your ~/.bashrc:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home # If your product-spec.json java version is 1.8
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home # If your product-spec.json java version is 1.8.0_40
# If you get this message on your Linux box, add this to your ~/.bashrc:
export JAVA_HOME=/export/apps/jdk/JDK-1_8_0_5 # If your product-spec.json java version is 1.8
export JAVA_HOME=/export/apps/jdk/JDK-1_8_0_40 # If your product-spec.json java version is 1.8.0_40
# Be sure to close and reopen all your terminal windows before you run gradle azkabanUpload --no-daemon so that your .bashrc changes take effect
You will also get this message if you have set the property org.gradle.jvmargs in your gradle.properties file. Setting this property prevents you from using the task. People usually set this property to configure JVM memory for unit tests.
Instead of setting this property, configure the Gradle test block with the minHeapSize and maxHeapSize options (see https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html).
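To find out whether this property is what triggers the message, you can search your gradle.properties file for it. The following is a minimal sketch. The function name has_jvmargs is hypothetical and not part of the Hadoop Plugin, and the pattern assumes the property is written in the usual key=value or key:value form.

```shell
# has_jvmargs is a hypothetical helper name, not part of the Hadoop Plugin.
# Succeeds (exit status 0) when the given properties file sets org.gradle.jvmargs.
# Lines commented out with a leading # do not match.
has_jvmargs() {
  props="${1:-gradle.properties}"
  [ -f "$props" ] &&
    grep -Eq '^[[:space:]]*org\.gradle\.jvmargs[[:space:]]*[=:]' "$props"
}

if has_jvmargs gradle.properties; then
  echo "gradle.properties sets org.gradle.jvmargs; remove it before running azkabanUpload"
fi
```

Because the function takes a path, it can also be pointed at another properties file, such as the one in your Gradle user home directory, if you suspect the property is set there.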
(Since version 0.10.14 - Contributed by Pranay Hasan) The Hadoop Plugin includes a number of features that allow you to interact with Azkaban from the command line. These features allow you to create a project, start or cancel a flow, and check the status of a flow without logging into the Azkaban user interface.
These tasks rely on the project information contained in your .azkabanPlugin.json file. See the previous section about using the azkabanUpload task to configure this file (or the section about configuring it manually).
In addition, these tasks may ask you for your Azkaban password. Be sure to read the previous section about password masking if any of these tasks fail with an error that involves access to a console that supports password masking.
The azkabanCancelFlow task allows you to cancel a running flow in Azkaban. Run ./gradlew azkabanCancelFlow --no-daemon to cancel the flows for your project.
The azkabanCreateProject task allows you to create a new project in Azkaban. Run ./gradlew azkabanCreateProject --no-daemon to create the project described in your .azkabanPlugin.json file.
The azkabanExecuteFlow task allows you to start a flow in Azkaban. Run ./gradlew azkabanExecuteFlow --no-daemon to start a flow for your project or ./gradlew azkabanExecuteFlow -Pflow=<comma-delimited flow names> --no-daemon to start a particular set of flows in your project.
The azkabanFlowStatus task allows you to check the status of a flow that is currently running in Azkaban. Run ./gradlew azkabanFlowStatus --no-daemon to check the status of the flows for your project or ./gradlew azkabanFlowStatus --no-daemon -Pflow=<comma-delimited flow names> to check the status of a particular set of flows in your project. When you use the option to specify a flow name, you will get detailed information about the status of jobs in that flow.
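The tasks described above can be chained into a single upload-and-run step. The wrapper below is a sketch under several assumptions: the Azkaban project already exists (for example, created with azkabanCreateProject), your .azkabanPlugin.json file is already configured, and the function name upload_and_run and the GRADLE variable are hypothetical conveniences rather than part of the Hadoop Plugin. The tasks may still prompt for your Azkaban password if your session has expired.

```shell
# Hypothetical wrapper around the documented tasks.
# Override GRADLE (for example, GRADLE=echo) to print the commands instead of running them.
GRADLE="${GRADLE:-./gradlew}"

# Upload the zip, start the given flows, then print their status.
# $1 is a comma-delimited list of flow names.
# Each step runs only if the previous one succeeded.
upload_and_run() {
  flows="$1"
  "$GRADLE" azkabanUpload --no-daemon -PskipInteractive &&
    "$GRADLE" azkabanExecuteFlow -Pflow="$flows" --no-daemon &&
    "$GRADLE" azkabanFlowStatus --no-daemon -Pflow="$flows"
}
```

For example, upload_and_run myFlow (where myFlow stands in for one of your own flow names) uploads the project, starts that flow, and then reports the status of its jobs.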