This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@duyanghao commented Mar 14, 2018

Signed-off-by: duyanghao [email protected]

What changes were proposed in this pull request?

Add recovery logic for failed pod and fix MEM_EXCEEDED_EXIT_CODE constant.

How was this patch tested?

Manual tests show successful recovery of a failed pod, as below:

  1. make one executor pod fail (it fails to register itself with the driver)
  2. the driver discovers the failed pod
  3. the driver allocates a new executor pod to replace it
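The recovery loop described above can be sketched as a simple reconciliation: count the executor pods that are not in a failed state and request enough replacements to reach the `spark.executor.instances` target. This is a hypothetical simplification in Python for illustration only; the actual change is in the Spark on Kubernetes scheduler backend (Scala), and the `Pod`/`replacements_needed` names here are not real Spark identifiers.

```python
from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    phase: str  # Kubernetes pod phases: "Running", "Succeeded", "Failed", ...

def replacements_needed(pods, target_instances):
    """Return how many new executor pods the driver should allocate.

    Pods in the "Failed" phase no longer count toward the target, so the
    driver requests one replacement per failed executor.
    """
    healthy = sum(1 for p in pods if p.phase != "Failed")
    return max(0, target_instances - healthy)

# Mirrors the test scenario: 5 requested executors, 3 of them failed.
pods = [
    Pod("exec-1", "Succeeded"),
    Pod("exec-2", "Succeeded"),
    Pod("exec-3", "Failed"),
    Pod("exec-4", "Failed"),
    Pod("exec-5", "Failed"),
]
print(replacements_needed(pods, 5))  # 3 replacements (exec-6, exec-7, exec-8)
```

In the manual test below this corresponds to the driver creating `exec-6` through `exec-8` after `exec-3` through `exec-5` errored out.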

spark.executor.instances=5

# kubectl get pods -n=xxx -a -o wide|grep spark-debug-sar-test8
NAME                            READY     STATUS        RESTARTS   AGE       IP               NODE
spark-debug-sar-test8           1/1       Completed     0          3m        192.168.25.92    x.x.x.x
spark-debug-sar-test8-exec-1    1/1       Completed     0          3m        192.168.25.94    x.x.x.x
spark-debug-sar-test8-exec-2    1/1       Completed     0          3m        192.168.25.93    x.x.x.x
spark-debug-sar-test8-exec-3    0/1       Error         0          3m        192.168.11.31    x.x.x.x
spark-debug-sar-test8-exec-4    0/1       Error         0          3m        192.168.11.37    x.x.x.x
spark-debug-sar-test8-exec-5    0/1       Error         0          3m        192.168.11.44    x.x.x.x
spark-debug-sar-test8-exec-6    1/1       Completed     0          48s       192.168.25.99    x.x.x.x
spark-debug-sar-test8-exec-7    1/1       Completed     0          48s       192.168.25.95    x.x.x.x
spark-debug-sar-test8-exec-8    1/1       Completed     0          48s       192.168.25.97    x.x.x.x

