Skip to content

Latest commit

 

History

History
126 lines (105 loc) · 9.49 KB

configure.md

File metadata and controls

126 lines (105 loc) · 9.49 KB

Configuration

中文文档

The default value of each configuration can be modified by setting the corresponding properties in the $HBOX_HOME/conf/hbox-site.xml at the Hbox client or the parameter of --conf when submitting the application.

Application Configuration

Property Name Default Meaning
hbox.driver.memory 2048 amount of memory to use for the AM process, in MB
hbox.driver.cores 1 number of cores to use for the AM process
hbox.worker.num 1 number of worker containers to use for the application
hbox.worker.memory 1024MB amount of memory to use for the worker process
hbox.worker.cores 1 number of cores to use for the worker process
hbox.chief.worker.memory 1024 amount of memory for chief worker,especially for the index 0 worker of the TensorFlow application, default as the setting of the worker memory.
hbox.evaluator.worker.memory 1024 amount of memory for evaluator worker, especially for the TensorFlow Estimator application, default as the setting of the worker memory.
hbox.ps.num 0 number of ps containers to use for the application
hbox.ps.memory 1024MB amount of memory to use for the ps process
hbox.ps.cores 1 number of cores to use for the ps process
hbox.app.queue DEFAULT the queue which application submitted to
hbox.app.priority 3 the priority of the application, divided into level 0 to 5, corresponding to DEFAULT, VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH
hbox.input.strategy DOWNLOAD loading strategy of input file, including DOWNLOAD, STREAM, PLACEHOLDER
hbox.inputfile.rename false whether to rename the download file in the DOWNLOAD strategy of input file
hbox.stream.epoch 1 the number of the input file loading in the STREAM strategy of input file
hbox.input.stream.shuffle false whether to shuffle the input splits in the STREAM strategy of input file
hbox.inputformat.class org.apache.hadoop.mapred.TextInputFormat.class which inputformat implementation to use in the STREAM strategy of input file
hbox.inputformat.cache false whether cache the inputformat file to local when the stream epoch longer than 1
hbox.inputformat.cachefile.name inputformatCache.gz the local cache file name for inputformat
hbox.inputformat.cachesize.limit 100*1024 the limit size of the local cache file (in MB)
hbox.output.local.dir output If the local output path is not specified, the local directory of the output file is the default value.
hbox.output.strategy UPLOAD loading strategy of output file, including DOWNLOAD, STREAM
hbox.outputformat.class TextMultiOutputFormat.class which outputformat implementation to use in the STREAM strategy of output file
hbox.interresult.dir /interResult_ specify the HDFS subdirectory that the intermediate output file upload to
hbox.interresult.upload.timeout 30 * 60 * 1000 upload timeout to save the intermediate output (in milliseconds)
hbox.interresult.save.inc false increment upload the intermediate output file, default not (upload all output file each time)
hbox.tf.evaluator false whether to set the last worker as evaluator of the distributed TensorFlow job type for the estimator api
hbox.tf.distribution.strategy false whether use the distribution strategy API for the TensorFlow, default as false

Board Service Configuration

Property Name Default Meaning
hbox.tf.board.enable true If set to false, Board service is not necessary
hbox.tf.board.worker.index 0 the index of the worker which start the service of Board
hbox.tf.board.log.dir eventLog the directory saving TensorBoard event log
hbox.tf.board.history.dir /tmp/hbox/eventLog specify the HDFS path which the TensorBoard event log upload to
hbox.tf.board.reload.interval 1 how often the backend should load more data of event log (in seconds) for tensorboard
hbox.board.modelpb "" model proto in ONNX format for VisualDL
hbox.board.cache.timeout 20 memory cache timeout duration in seconds for VisualDL
hbox.tf.board.path tensorboard the path of the tensorboard
hbox.board.path visualDL the path of the visualDL

System Configuration

Property Name Default Meaning
hbox.container.extra.java.opts "" A string of extra JVM options to pass to ApplicationMaster to launch container
hbox.allocate.interval 1000ms interval between the AM get the container assigned state from RM
hbox.status.update.interval 1000ms interval between the AM report the state to RM
hbox.task.timeout 5 * 60 * 1000 communication timeout between the AM and container (in milliseconds)
hbox.task.timeout.check.interval 3 * 1000 how often the AM check the timeout of the container (in milliseconds)
hbox.localresource.timeout 5 * 60 * 1000 set the timeout of the download the localResources (in milliseconds)
hbox.messages.len.max 1000 Maximum size (in bytes) of message queue
hbox.execute.node.limit 200 Maximum number of nodes that application use
hbox.staging.dir /tmp/hbox/staging HDFS directory that application local resources upload to
hbox.cleanup.enable true whether delete the resources after the application finished
hbox.container.maxFailures.rate 0.5 maximum percentage of the failure containers
hbox.download.file.retry 3 Maximum number of retries for the input file download when the strategy of input file is DOWNLOAD
hbox.download.file.thread.nums 10 number of download threads of the input file in the strategy of DOWNLOAD
hbox.upload.output.thread.nums 10 number of upload threads of the output file in the strategy of UPLOAD
hbox.container.heartbeat.interval 10 * 1000 interval between each container to the AM (in milliseconds)
hbox.container.heartbeat.retry 3 Maximum number of retries for the container send the heartbeat to the AM
hbox.container.update.appstatus.interval 3 * 1000 how often the containers get the state of the application process (in milliseconds)
hbox.container.auto.create.output.dir true If set to true, the containers create the local output path automatically
hbox.log.pull.interval 10000 interval between the client get the log output of the AM (in milliseconds)
hbox.user.classpath.first true whether user job jar should be the first one on class path or not.
hbox.worker.mem.autoscale 0.5 automatic memory scale ratio of worker when application retry after failed.
hbox.ps.mem.autoscale 0.2 automatic memory scale ratio of ps when application retry after failed.
hbox.app.max.attempts 1 the number of application attempts, default not retry after failed.
hbox.report.container.status true whether the client report the status of the container.
hbox.env.maxlength 102400 the maximum length of environment variable when container execute the user program.
hbox.am.env.[EnvironmentVariableName] (none) Add the environment variable specified by EnvironmentVariableName to the AM process. The user can specify multiple of these to set multiple environment variables.
hbox.container.env.[EnvironmentVariableName] (none) Add the environment variable specified by EnvironmentVariableName to the Container process. The user can specify multiple of these to set multiple environment variables.
hbox.am.nodeLabelExpression (none) A YARN node label expression that restricts the set of nodes AM will be scheduled on.
hbox.worker.nodeLabelExpression (none) A YARN node label expression that restricts the set of nodes Worker will be scheduled on.
hbox.ps.nodeLabelExpression (none) A YARN node label expression that restricts the set of nodes PS will be scheduled on.

History Configuration

Property Name Default Meaning
hbox.history.log.dir /tmp/hbox/history the HDFS directory that saves the history log
hbox.history.log.delete-monitor-time-interval 24 * 60 * 60 * 1000 set the time interval by which the application history logs will be checked to clean (in milliseconds)
hbox.history.log.max-age-ms 24 * 60 * 60 * 1000 how long the history log can be saved (in milliseconds)
hbox.history.port 10021 port for the history service
hbox.history.address 0.0.0.0:10021 address for the history service
hbox.history.webapp.port 19886 port for the history http web service
hbox.history.webapp.address 0.0.0.0:19886 address for the history http web service
hbox.history.webapp.https.port 19885 port for the history https web service
hbox.history.webapp.https.address 0.0.0.0:19885 address for the history https web service

MPI Configuration

Property Name Default Meaning
hbox.mpi.install.dir /usr/local/openmpi the installation path of the openmpi
hbox.mpi.extra.ld.library.path (none) the extra library path that openmpi need
hbox.mpi.container.update.status.retry 3 the retry times for the container status update

Docker Configuration

Property Name Default Meaning
hbox.container.type yarn container running type
hbox.docker.registry.host (none) docker register host
hbox.docker.registry.port (none) docker register port
hbox.docker.image (none) docker image name
hbox.docker.worker.dir /work the work dir of the docker container