Azkaban Features

Alex Bain edited this page Jun 28, 2017 · 23 revisions

The Hadoop Plugin comes with features that enable you to interact with Azkaban from the command line.

Upload to Azkaban

(Since version 0.5.14 - Contributed by Akshay Rai. Interactive mode since version 0.10.8 - Contributed by Pranay Hasan) To upload to Azkaban, run gradle azkabanUpload --no-daemon. The task enters an interactive mode that helps you configure your upload information and saves it to the .azkabanPlugin.json file in your project directory.

You should add this file to your project's .gitignore so that each developer on your team can have their own copy. Once you have an .azkabanPlugin.json file, you can skip interactive mode by including the -PskipInteractive command line option.
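
The .gitignore recommendation above can be scripted with a small shell helper. This is a minimal sketch, not part of the Hadoop Plugin: the function name ignore_azkaban_plugin_json is hypothetical, and it assumes a POSIX shell with grep and tail available.

```shell
# Hypothetical helper (not part of the Hadoop Plugin): add .azkabanPlugin.json
# to the .gitignore of the given project directory, unless it is already listed.
ignore_azkaban_plugin_json() {
  gitignore="${1:-.}/.gitignore"
  touch "$gitignore"
  # Nothing to do if the exact entry is already present.
  if grep -qxF '.azkabanPlugin.json' "$gitignore"; then
    return 0
  fi
  # Make sure the existing file ends with a newline before appending.
  if [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  printf '%s\n' '.azkabanPlugin.json' >> "$gitignore"
}
```

Call it as ignore_azkaban_plugin_json . from your project directory. Once the file is ignored and filled in, later uploads can run non-interactively with gradle azkabanUpload --no-daemon -PskipInteractive.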

At the end of interactive mode, the task will ask for your password and start an Azkaban session. Your session information will be saved under ~/.azkaban in a file that only you can read. As long as your session is still valid, you will not have to re-enter your password. The upload then starts, and a progress bar on the screen shows how close it is to completion.

Entering interactive mode. You can use the -PskipInteractive command line parameter to skip interactive mode and ONLY read from the .azkabanPlugin.json file.
Azkaban Project Name: abain-xgboost-demo
Azkaban URL: https://ltx1-holdemaz01.grid.linkedin.com:8443
Azkaban User Name: abain
Azkaban Zip Task: azkabanHadoopZip
> Building 93% > :xgboost-demo:azkabanUpload > Want to change any of the above? [y/N]: 
Resuming previous Azkaban session
Once the zip is uploaded, Azkaban will validate your zip with Byte-Ray to complete the upload
Zip upload progress...
0%                20%                 40%                 60%                 80%                 100% (42393 KB)
|        |         |         |         |         |         |         |         |         |         |
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Building 93% > :xgboost-demo:azkabanUpload

Configure Azkaban Upload Information

(Since version 0.5.14 - Contributed by Akshay Rai) When you run the azkabanUpload command, the task will enter into an interactive mode that will help you configure your upload information. If you would rather configure it manually, run gradle writeAzkabanPluginJson to write a default .azkabanPlugin.json file to your project directory.

Edit this file to tell the Hadoop Plugin how to upload to Azkaban. You should add this file to your project's .gitignore so that each developer on your team can have their own copy. This file has the following format:

{
    "azkabanUrl": "https://theAzkabanServer.linkedin.com:8443",
    "azkabanProjName": "abain-hello-pig-azkaban",
    "azkabanZipTask": "azkabanHadoopZip",
    "azkabanValidatorAutoFix": "true",
    "azkabanUserName": "abain",
    "azkabanPassword": null
}

The value for the azkabanZipTask field in the .azkabanPlugin.json file should be the name of a Gradle Zip task. Although you can use the name of any Gradle Zip task, you usually want to use the name of a Zip task generated by configuring the hadoopZip block using the zip method. To find the names of these tasks, see the section on the generated Zip tasks at Hadoop Zip Artifacts.
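
One way to confirm that the azkabanZipTask value names a real task is to read it back from the file and run that task on its own. The sketch below assumes the file is laid out with one field per line, as in the format shown above. The sed expression is not a general JSON parser, and the function name azkaban_zip_task is hypothetical.

```shell
# Hypothetical helper: print the azkabanZipTask value from .azkabanPlugin.json.
# Assumes one "key": "value" pair per line, as in the example format.
azkaban_zip_task() {
  sed -n 's/.*"azkabanZipTask"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "${1:-.azkabanPlugin.json}"
}
```

For example, gradle "$(azkaban_zip_task)" builds the zip named in the file. Gradle reports an error if no task with that name exists.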

Password Masking

The azkabanUpload task specifically requires access to a console that supports password masking. At LinkedIn, the task may tell you to set your JAVA_HOME correctly to match the version of Java declared in your product-spec.json (a LinkedIn-specific file) and to pass --no-daemon on the command line (or set org.gradle.daemon=false in your gradle.properties file). Once you take these steps, you will be able to use the task.

At LinkedIn, if you get a message about setting your JAVA_HOME, make one of the edits described below:

# If you get this message on your Mac, add this to your ~/.bashrc:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home    # If your product-spec.json java version is 1.8
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home    # If your product-spec.json java version is 1.8.0_40
 
# If you get this message on your Linux box, add this to your ~/.bashrc:
export JAVA_HOME=/export/apps/jdk/JDK-1_8_0_5     # If your product-spec.json java version is 1.8
export JAVA_HOME=/export/apps/jdk/JDK-1_8_0_40    # If your product-spec.json java version is 1.8.0_40
 
# Be sure to close and reopen all your terminal windows before you run gradle azkabanUpload --no-daemon so that your .bashrc changes take effect

You will also get this message if you have set the property org.gradle.jvmargs in your gradle.properties file. Setting this property prevents you from using the task. People usually set it to configure JVM memory for unit tests.

Instead of setting this property, configure the Gradle test block with the minHeapSize and maxHeapSize options: https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html.
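
To find out whether org.gradle.jvmargs is set, you can search your properties file for it. The helper below is a minimal sketch with a hypothetical function name, has_gradle_jvmargs. It only looks at uncommented lines in the one file you pass it. Gradle also reads ~/.gradle/gradle.properties, so it may be worth checking that file too.

```shell
# Hypothetical helper: succeed if the given properties file sets
# org.gradle.jvmargs on an uncommented line.
has_gradle_jvmargs() {
  props="${1:-gradle.properties}"
  [ -f "$props" ] &&
    grep -Eq '^[[:space:]]*org\.gradle\.jvmargs[[:space:]]*[=:]' "$props"
}

# Example usage:
#   has_gradle_jvmargs gradle.properties && echo "org.gradle.jvmargs is set"
```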

Azkaban CLI Features

(Since version 0.10.14 - Contributed by Pranay Hasan) The Hadoop Plugin includes a number of features that allow you to interact with Azkaban from the command line. These features allow you to create a project, start or cancel a flow, and check the status of a flow without logging into the Azkaban user interface.

These tasks rely on the project information contained in your .azkabanPlugin.json file. See the previous section about using the azkabanUpload task to configure this file (or the section about configuring it manually).

In addition, these tasks may ask you for your Azkaban password. Be sure to read the previous section about password masking if any of these tasks fail with an error that involves access to a console that supports password masking.

azkabanCancelFlow Task

This task allows you to cancel a running flow in Azkaban. Run ./gradlew azkabanCancelFlow --no-daemon to cancel the flows for your project.

azkabanCreateProject Task

This task allows you to create a new project in Azkaban. Run ./gradlew azkabanCreateProject --no-daemon to create the project described in your .azkabanPlugin.json file.

azkabanExecuteFlow Task

This task allows you to start a flow in Azkaban. Run ./gradlew azkabanExecuteFlow --no-daemon to start a flow for your project or ./gradlew azkabanExecuteFlow -Pflow=<comma-delimited flow names> --no-daemon to start a particular set of flows in your project.

azkabanFlowStatus Task

This task allows you to check the status of a flow that is currently running in Azkaban. Run ./gradlew azkabanFlowStatus --no-daemon to check the status of the flows for your project or ./gradlew azkabanFlowStatus --no-daemon -Pflow=<comma-delimited flow names> to check the status of a particular set of flows in your project. When you use the option to specify a flow name, you will get detailed information about the status of jobs in that flow.
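
The -Pflow option used by azkabanExecuteFlow and azkabanFlowStatus takes its flow names as a single comma-delimited value. As an illustration, the sketch below assembles that command line from separate arguments and prints it rather than running it. The function name azkaban_flow_cmd and the flow names flowA and flowB are hypothetical.

```shell
# Hypothetical helper: print the Gradle command line for an Azkaban CLI task,
# joining any flow names into one comma-delimited -Pflow value.
azkaban_flow_cmd() {
  task="$1"
  shift
  if [ "$#" -eq 0 ]; then
    printf './gradlew %s --no-daemon\n' "$task"
  else
    # Inside the subshell, "$*" joins the remaining arguments with the first
    # character of IFS, here a comma.
    flows=$(IFS=,; printf '%s' "$*")
    printf './gradlew %s --no-daemon -Pflow=%s\n' "$task" "$flows"
  fi
}

azkaban_flow_cmd azkabanFlowStatus flowA flowB
# prints: ./gradlew azkabanFlowStatus --no-daemon -Pflow=flowA,flowB
```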