The default value of each configuration property can be changed by setting the corresponding property in `$HBOX_HOME/conf/hbox-site.xml` on the Hbox client, or by passing the `--conf` parameter when submitting the application.
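
As a rough illustration of the first option, the sketch below generates the body of an `hbox-site.xml` file from a dictionary of overrides. It assumes that the file uses the usual Hadoop-style `<configuration>`/`<property>`/`<name>`/`<value>` layout, which this document does not state explicitly; the helper name `build_site_xml` and the chosen override values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_site_xml(overrides):
    """Return configuration XML text for the given property overrides.

    Assumption: hbox-site.xml follows the Hadoop-style
    <configuration><property><name/><value/></property></configuration> layout.
    """
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical overrides: property names come from the tables in this section,
# the values are only examples.
overrides = {
    "hbox.worker.num": 4,
    "hbox.ps.num": 1,
    "hbox.app.priority": 4,
}

site_xml = build_site_xml(overrides)
print(site_xml)
```

The printed text could then be saved as `$HBOX_HOME/conf/hbox-site.xml`; properties that are not listed keep the defaults shown in the tables.
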
Property Name | Default | Meaning |
---|---|---|
hbox.driver.memory | 2048 | amount of memory to use for the AM process, in MB |
hbox.driver.cores | 1 | number of cores to use for the AM process |
hbox.worker.num | 1 | number of worker containers to use for the application |
hbox.worker.memory | 1024MB | amount of memory to use for the worker process |
hbox.worker.cores | 1 | number of cores to use for the worker process |
hbox.chief.worker.memory | 1024 | amount of memory for the chief worker, that is, the index 0 worker of a TensorFlow application; defaults to the worker memory setting |
hbox.evaluator.worker.memory | 1024 | amount of memory for the evaluator worker of a TensorFlow Estimator application; defaults to the worker memory setting |
hbox.ps.num | 0 | number of ps containers to use for the application |
hbox.ps.memory | 1024MB | amount of memory to use for the ps process |
hbox.ps.cores | 1 | number of cores to use for the ps process |
hbox.app.queue | DEFAULT | the queue to which the application is submitted |
hbox.app.priority | 3 | the priority of the application, from level 0 to 5, corresponding to DEFAULT, VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH |
hbox.input.strategy | DOWNLOAD | loading strategy for input files: DOWNLOAD, STREAM, or PLACEHOLDER |
hbox.inputfile.rename | false | whether to rename the downloaded files in the DOWNLOAD input strategy |
hbox.stream.epoch | 1 | number of times the input files are loaded in the STREAM input strategy |
hbox.input.stream.shuffle | false | whether to shuffle the input splits in the STREAM strategy of input file |
hbox.inputformat.class | org.apache.hadoop.mapred.TextInputFormat.class | which InputFormat implementation to use in the STREAM input strategy |
hbox.inputformat.cache | false | whether to cache the InputFormat file locally when the stream epoch is greater than 1 |
hbox.inputformat.cachefile.name | inputformatCache.gz | the local cache file name for inputformat |
hbox.inputformat.cachesize.limit | 100*1024 | the limit size of the local cache file (in MB) |
hbox.output.local.dir | output | local directory for output files, used when no local output path is specified |
hbox.output.strategy | UPLOAD | saving strategy for output files: UPLOAD or STREAM |
hbox.outputformat.class | TextMultiOutputFormat.class | which outputformat implementation to use in the STREAM strategy of output file |
hbox.interresult.dir | /interResult_ | the HDFS subdirectory to which intermediate output files are uploaded |
hbox.interresult.upload.timeout | 30 * 60 * 1000 | upload timeout for saving the intermediate output (in milliseconds) |
hbox.interresult.save.inc | false | whether to upload intermediate output files incrementally; by default, all output files are uploaded each time |
hbox.tf.evaluator | false | whether to set the last worker as the evaluator of a distributed TensorFlow job that uses the Estimator API |
hbox.tf.distribution.strategy | false | whether to use the TensorFlow distribution strategy API |
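
To see how the resource properties in the table above combine, the following sketch adds up the memory and cores requested for one application. It is only a back-of-the-envelope estimate under stated assumptions: it ignores the separate chief and evaluator memory settings and any rounding that YARN applies to container sizes, and the function name `estimate_resources` is hypothetical.

```python
# Defaults copied from the table above (memory in MB).
defaults = {
    "hbox.driver.memory": 2048,
    "hbox.driver.cores": 1,
    "hbox.worker.num": 1,
    "hbox.worker.memory": 1024,
    "hbox.worker.cores": 1,
    "hbox.ps.num": 0,
    "hbox.ps.memory": 1024,
    "hbox.ps.cores": 1,
}


def estimate_resources(conf):
    """Return (memory_mb, cores) requested by the AM, workers and ps containers.

    Assumption: plain sums of the per-process settings, without chief or
    evaluator overrides and without YARN allocation rounding.
    """
    memory_mb = (
        conf["hbox.driver.memory"]
        + conf["hbox.worker.num"] * conf["hbox.worker.memory"]
        + conf["hbox.ps.num"] * conf["hbox.ps.memory"]
    )
    cores = (
        conf["hbox.driver.cores"]
        + conf["hbox.worker.num"] * conf["hbox.worker.cores"]
        + conf["hbox.ps.num"] * conf["hbox.ps.cores"]
    )
    return memory_mb, cores


# With every property left at its default: one AM and one worker.
default_total = estimate_resources(defaults)
print(default_total)

# A hypothetical job with 4 workers and 2 ps containers.
job = {**defaults, "hbox.worker.num": 4, "hbox.ps.num": 2}
job_total = estimate_resources(job)
print(job_total)
```
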
Property Name | Default | Meaning |
---|---|---|
hbox.tf.board.enable | true | whether the Board service is enabled; set to false if the Board service is not needed |
hbox.tf.board.worker.index | 0 | the index of the worker that starts the Board service |
hbox.tf.board.log.dir | eventLog | the directory saving TensorBoard event log |
hbox.tf.board.history.dir | /tmp/hbox/eventLog | the HDFS path to which the TensorBoard event log is uploaded |
hbox.tf.board.reload.interval | 1 | how often the TensorBoard backend loads new event log data (in seconds) |
hbox.board.modelpb | "" | model proto in ONNX format for VisualDL |
hbox.board.cache.timeout | 20 | memory cache timeout duration in seconds for VisualDL |
hbox.tf.board.path | tensorboard | the path of TensorBoard |
hbox.board.path | visualDL | the path of VisualDL |
Property Name | Default | Meaning |
---|---|---|
hbox.container.extra.java.opts | "" | a string of extra JVM options passed to the ApplicationMaster for launching containers |
hbox.allocate.interval | 1000ms | interval at which the AM gets the container allocation state from the RM |
hbox.status.update.interval | 1000ms | interval at which the AM reports its state to the RM |
hbox.task.timeout | 5 * 60 * 1000 | communication timeout between the AM and a container (in milliseconds) |
hbox.task.timeout.check.interval | 3 * 1000 | how often the AM checks for container timeouts (in milliseconds) |
hbox.localresource.timeout | 5 * 60 * 1000 | timeout for downloading the local resources (in milliseconds) |
hbox.messages.len.max | 1000 | Maximum size (in bytes) of message queue |
hbox.execute.node.limit | 200 | maximum number of nodes that the application can use |
hbox.staging.dir | /tmp/hbox/staging | HDFS directory to which the application's local resources are uploaded |
hbox.cleanup.enable | true | whether to delete the resources after the application finishes |
hbox.container.maxFailures.rate | 0.5 | maximum fraction of containers that are allowed to fail |
hbox.download.file.retry | 3 | maximum number of retries for downloading an input file in the DOWNLOAD input strategy |
hbox.download.file.thread.nums | 10 | number of download threads of the input file in the strategy of DOWNLOAD |
hbox.upload.output.thread.nums | 10 | number of upload threads of the output file in the strategy of UPLOAD |
hbox.container.heartbeat.interval | 10 * 1000 | interval at which each container sends a heartbeat to the AM (in milliseconds) |
hbox.container.heartbeat.retry | 3 | maximum number of retries when a container sends a heartbeat to the AM |
hbox.container.update.appstatus.interval | 3 * 1000 | how often the containers check the state of the application process (in milliseconds) |
hbox.container.auto.create.output.dir | true | If set to true, the containers create the local output path automatically |
hbox.log.pull.interval | 10000 | interval at which the client pulls the log output of the AM (in milliseconds) |
hbox.user.classpath.first | true | whether user job jar should be the first one on class path or not. |
hbox.worker.mem.autoscale | 0.5 | automatic memory scaling ratio for workers when the application retries after a failure |
hbox.ps.mem.autoscale | 0.2 | automatic memory scaling ratio for ps containers when the application retries after a failure |
hbox.app.max.attempts | 1 | the number of application attempts; by default, the application is not retried after a failure |
hbox.report.container.status | true | whether the client reports the status of the containers |
hbox.env.maxlength | 102400 | the maximum length of an environment variable when a container executes the user program |
hbox.am.env.[EnvironmentVariableName] | (none) | Add the environment variable specified by EnvironmentVariableName to the AM process. The user can specify multiple of these to set multiple environment variables. |
hbox.container.env.[EnvironmentVariableName] | (none) | Add the environment variable specified by EnvironmentVariableName to the Container process. The user can specify multiple of these to set multiple environment variables. |
hbox.am.nodeLabelExpression | (none) | A YARN node label expression that restricts the set of nodes the AM will be scheduled on. |
hbox.worker.nodeLabelExpression | (none) | A YARN node label expression that restricts the set of nodes the workers will be scheduled on. |
hbox.ps.nodeLabelExpression | (none) | A YARN node label expression that restricts the set of nodes the ps containers will be scheduled on. |
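
Several defaults in the table above are written as products such as `5 * 60 * 1000`. The sketch below evaluates these to plain millisecond values, which is handy when overriding a timeout. It assumes that the product notation is shorthand for the evaluated integer and that an integer literal is what a configuration value should contain; the helper name `eval_product` is hypothetical.

```python
import math


def eval_product(expr):
    """Evaluate a default such as '5 * 60 * 1000' to a single integer."""
    return math.prod(int(factor.strip()) for factor in expr.split("*"))


# Millisecond defaults copied from the table above.
millisecond_defaults = {
    "hbox.task.timeout": "5 * 60 * 1000",
    "hbox.task.timeout.check.interval": "3 * 1000",
    "hbox.localresource.timeout": "5 * 60 * 1000",
    "hbox.container.heartbeat.interval": "10 * 1000",
    "hbox.log.pull.interval": "10000",
}

for name, expr in millisecond_defaults.items():
    ms = eval_product(expr)
    # Show the raw value and the same duration in seconds.
    print(f"{name} = {ms} ms ({ms / 1000:g} s)")
```
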
Property Name | Default | Meaning |
---|---|---|
hbox.history.log.dir | /tmp/hbox/history | the HDFS directory that saves the history log |
hbox.history.log.delete-monitor-time-interval | 24 * 60 * 60 * 1000 | interval at which the application history logs are checked for cleanup (in milliseconds) |
hbox.history.log.max-age-ms | 24 * 60 * 60 * 1000 | how long a history log is kept (in milliseconds) |
hbox.history.port | 10021 | port for the history service |
hbox.history.address | 0.0.0.0:10021 | address for the history service |
hbox.history.webapp.port | 19886 | port for the history HTTP web service |
hbox.history.webapp.address | 0.0.0.0:19886 | address for the history HTTP web service |
hbox.history.webapp.https.port | 19885 | port for the history HTTPS web service |
hbox.history.webapp.https.address | 0.0.0.0:19885 | address for the history HTTPS web service |
Property Name | Default | Meaning |
---|---|---|
hbox.mpi.install.dir | /usr/local/openmpi | the installation path of Open MPI |
hbox.mpi.extra.ld.library.path | (none) | the extra library path that Open MPI needs |
hbox.mpi.container.update.status.retry | 3 | the number of retries for the container status update |
Property Name | Default | Meaning |
---|---|---|
hbox.container.type | yarn | container running type |
hbox.docker.registry.host | (none) | Docker registry host |
hbox.docker.registry.port | (none) | Docker registry port |
hbox.docker.image | (none) | Docker image name |
hbox.docker.worker.dir | /work | the working directory of the Docker container |