
Conversation

@erikerlandson
Collaborator

Add resource requests to the driver and executor containers, corresponding to the Spark resource settings for cores and memory.
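For context, a minimal sketch of what aligning container resources with the Spark settings could look like, assuming the fabric8 kubernetes-client model used elsewhere in this branch; the conf keys are the standard Spark settings, but the defaults, memory conversion, and container name here are illustrative rather than the exact code in this PR.

```scala
// Illustrative sketch: map spark.executor.cores / spark.executor.memory onto
// Kubernetes resource requests using the fabric8 kubernetes-client model.
import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Quantity}
import org.apache.spark.SparkConf

def executorContainer(conf: SparkConf, image: String): Container = {
  val cores    = conf.getInt("spark.executor.cores", 1)
  val memoryMb = conf.getSizeAsMb("spark.executor.memory", "1g")

  new ContainerBuilder()
    .withName("executor")
    .withImage(image)
    .withNewResources()
      .addToRequests("cpu", new Quantity(cores.toString))
      // Memory overhead is ignored here for brevity; a real request would add it.
      .addToRequests("memory", new Quantity(s"${memoryMb}Mi"))
    .endResources()
    .build()
}
```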

foxish and others added 13 commits November 15, 2016 09:21
* Use images with spark pre-installed

* simplify staging for client.jar

* Remove some tarball-uri code.  Fix kube client URI in scheduler backend.  Number executors default to 1

* tweak client again, works across my testing environments

* use executor.sh shim

* allow configuration of service account name for driver pod

* spark image as a configuration setting instead of env var

* namespace from spark.kubernetes.namespace

* configure client with namespace; smooths out cases when not logged in as admin

* Assume the downloaded jar lives in /opt/spark/kubernetes to avoid dropping protections on /opt

* Add support for dynamic executors

* fill in some sane logic for doKillExecutors

* doRequestTotalExecutors signals graceful executor shutdown, and favors idle executors
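As a rough illustration of the "favors idle executors" behavior mentioned in the last commit, here is a self-contained sketch of how excess executors might be chosen for graceful shutdown when the requested total shrinks; the ExecutorInfo type and idle flag are hypothetical, not the actual scheduler backend code.

```scala
// Hypothetical helper: given the current executors and a new requested total,
// prefer idle executors when picking which ones to stop.
case class ExecutorInfo(id: String, idle: Boolean)

def executorsToRelease(current: Seq[ExecutorInfo], requestedTotal: Int): Seq[String] = {
  val excess = current.size - requestedTotal
  if (excess <= 0) {
    Seq.empty  // scaling up or holding steady: nothing to shut down
  } else {
    val (idleExecs, busyExecs) = current.partition(_.idle)
    // Idle executors go first; busy ones are only drafted if the excess
    // exceeds the idle pool.
    (idleExecs ++ busyExecs).take(excess).map(_.id)
  }
}
```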
@erikerlandson
Collaborator Author

This initial push contains the logic to align the driver and executor containers' resource requests with the Spark resource settings. However, when a pod fails to schedule due to insufficient resources, it stays in the Pending state, so dynamic executor allocation can still create an arbitrarily large number of pods that hang around in Pending.

Before this merges, I want to add logic for detecting when new executors are stuck in Pending, so the scheduler backend can skip trying to spin up more.
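A minimal sketch of that Pending check, assuming the fabric8 KubernetesClient and a hypothetical spark-role=executor label on executor pods; a real implementation would also need some notion of how long a pod has been Pending before treating it as stuck.

```scala
// Hypothetical check: count executor pods still in Pending before asking for more.
import scala.collection.JavaConverters._
import io.fabric8.kubernetes.client.KubernetesClient

def pendingExecutorPods(client: KubernetesClient, namespace: String): Int = {
  client.pods()
    .inNamespace(namespace)
    .withLabel("spark-role", "executor")  // assumed label applied to executor pods
    .list()
    .getItems.asScala
    .count(pod => Option(pod.getStatus).exists(_.getPhase == "Pending"))
}

// The scheduler backend could then hold off on scaling up, e.g.:
//   if (pendingExecutorPods(client, namespace) == 0) { /* request more executors */ }
```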

@foxish force-pushed the k8s-support branch 2 times, most recently from 7566d27 to f6ccb54 on December 7, 2016 00:28
@mccheah
Collaborator

mccheah commented Feb 22, 2017

@erikerlandson - close this in favor of what we have on https://github.com/apache-spark-on-k8s/spark/?
