
Commit 0d4e4df

Ngone51 authored and jiangxb1987 committed
[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone
### What changes were proposed in this pull request?

Update the documentation and shell script to warn users about the deprecation of support for multiple workers on the same host.

### Why are the changes needed?

This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to remove support for multiple workers entirely in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0.

### Does this PR introduce any user-facing change?

Yes, users will see a warning when they run the worker start script.

### How was this patch tested?

Tested manually.

Closes apache#27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
1 parent 2b10d70 commit 0d4e4df

File tree

3 files changed: +7 −5 lines changed


docs/core-migration-guide.md

+2 −0

@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: <number>`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker.
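As a rough sketch of the recommended migration, assuming a hypothetical 16-core, 64 GiB node; the variable and property names below are standard Spark standalone settings, but the specific sizes are illustrative only, not a recommendation:

```sh
# conf/spark-env.sh -- deprecated layout: 4 small workers per node,
# each expected to host a single executor.
SPARK_WORKER_INSTANCES=4    # deprecated since Spark 3.0
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=15g

# conf/spark-env.sh -- recommended layout: one worker owns the whole node
# and launches executors according to each application's requests.
SPARK_WORKER_CORES=16
SPARK_WORKER_MEMORY=60g
```

With the single-worker layout, the same parallelism comes from the application side, e.g. submitting with `--conf spark.executor.cores=4 --conf spark.executor.memory=15g` so the one worker launches four executor JVMs.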

docs/hardware-provisioning.md

+4 −4

@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple
+executors according to its available memory and cores, and each executor will be launched in a
+separate Java VM.
 
 # Network
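To make the new wording concrete, here is a hedged example of carving a large-memory node into several moderately sized executor JVMs rather than several workers; the 256 GiB / 32-core sizing, master URL, class, and jar names are made up for illustration:

```sh
# Submit so that one standalone worker on a 256 GiB / 32-core node launches
# four executor JVMs (32 / 8 cores each, 4 x 56g = 224g total), keeping every
# JVM heap well under the ~200 GiB threshold mentioned above.
./bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.YourApp \
  --conf spark.executor.cores=8 \
  --conf spark.executor.memory=56g \
  your-app.jar
```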

sbin/start-slave.sh

+1 −1

@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#                           slave. Default is 1.
+#                           slave. Default is 1. Note it has been deprecated since Spark 3.0.
 #   SPARK_WORKER_PORT       The base port number for the first worker. If set,
 #                           subsequent workers will increment this number. If
 #                           unset, Spark will find a valid port number, but
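For context, a minimal sketch of the kind of shell guard this change introduces in the start script (not the verbatim patch; the actual wording and placement may differ):

```sh
# Emit a deprecation warning when more than one worker instance is requested.
if [ "${SPARK_WORKER_INSTANCES:-1}" -gt 1 ]; then
  echo "WARNING: SPARK_WORKER_INSTANCES is deprecated since Spark 3.0," 1>&2
  echo "         please launch multiple executors on one worker instead." 1>&2
fi
```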
