
Commit 0d4e4df

Ngone51 authored and jiangxb1987 committed
[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone
### What changes were proposed in this pull request?

Update the documentation and shell script to warn users about the deprecation of support for multiple workers on the same host.

### Why are the changes needed?

This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to remove support for multiple workers entirely in Spark 3.1. This PR takes the first step by deprecating it in Spark 3.0.

### Does this PR introduce any user-facing change?

Yes, users will see a warning when they run the worker start script.

### How was this patch tested?

Tested manually.

Closes apache#27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
1 parent 2b10d70 commit 0d4e4df

File tree

3 files changed: +7 −5 lines changed


docs/core-migration-guide.md

+2 −0

@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: <number>`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker.
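As a rough sketch of the recommended migration, assuming a hypothetical 16-core, 64 GiB node; the variable and property names below are standard Spark standalone settings, but the specific sizes are illustrative only, not a recommendation:

```sh
# conf/spark-env.sh -- deprecated layout: 4 small workers per node,
# each expected to host a single executor.
SPARK_WORKER_INSTANCES=4    # deprecated since Spark 3.0
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=15g

# conf/spark-env.sh -- recommended layout: one worker owns the whole node
# and launches executors according to each application's requests.
SPARK_WORKER_CORES=16
SPARK_WORKER_MEMORY=60g
```

With the single-worker layout, the same parallelism comes from the application side, e.g. submitting with `--conf spark.executor.cores=4 --conf spark.executor.memory=15g` so the one worker launches four executor JVMs.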

docs/hardware-provisioning.md

+4 −4

@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple
+executors according to its available memory and cores, and each executor will be launched in a
+separate Java VM.
 
 # Network
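To make the new wording concrete, here is a hedged example of carving a large-memory node into several moderately sized executor JVMs rather than several workers; the 256 GiB / 32-core sizing, master URL, class, and jar names are made up for illustration:

```sh
# Submit so that one standalone worker on a 256 GiB / 32-core node launches
# four executor JVMs (32 / 8 cores each, 4 x 56g = 224g total), keeping every
# JVM heap well under the ~200 GiB threshold mentioned above.
./bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.YourApp \
  --conf spark.executor.cores=8 \
  --conf spark.executor.memory=56g \
  your-app.jar
```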

sbin/start-slave.sh

+1 −1

@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#                           slave. Default is 1.
+#                           slave. Default is 1. Note it has been deprecated since Spark 3.0.
 #   SPARK_WORKER_PORT       The base port number for the first worker. If set,
 #                           subsequent workers will increment this number. If
 #                           unset, Spark will find a valid port number, but
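For context, a minimal sketch of the kind of shell guard this change introduces in the start script (not the verbatim patch; the actual wording and placement may differ):

```sh
# Emit a deprecation warning when more than one worker instance is requested.
if [ "${SPARK_WORKER_INSTANCES:-1}" -gt 1 ]; then
  echo "WARNING: SPARK_WORKER_INSTANCES is deprecated since Spark 3.0," 1>&2
  echo "         please launch multiple executors on one worker instead." 1>&2
fi
```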
