KEP: Multi-cluster workload scheduling & balancing #31
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: yue9944882. The full list of commands accepted by this bot can be found here.
This is a good idea. But I think we should also consider whether the proportional distribution could be done in the placement API as an alternative, and what the pros and cons would be.
For example, we could have a replicas field in the placement API and, in each placementDecision, a field to specify the replica count for a cluster. It seems possible because, in this case, the placement defines how to put N replicas onto M clusters, and each decision result tells how many replicas should be put on the selected clusters. WDYT?
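A rough sketch of what that alternative shape might look like; the replicas fields below do not exist in the current Placement/PlacementDecision APIs, and the apiVersion and cluster names are only illustrative:

```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: example-placement
spec:
  numberOfClusters: 3
  replicas: 10                  # hypothetical: total replicas to spread over the decision
---
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: PlacementDecision
metadata:
  name: example-placement-decision-1
status:
  decisions:
    - clusterName: cluster-a
      replicas: 4               # hypothetical: per-cluster share decided for this cluster
    - clusterName: cluster-b
      replicas: 3
    - clusterName: cluster-c
      replicas: 3
```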
# The target namespace to deploy the workload in the spoke cluster.
spokeNamespace: default
# The content of target workload, supporting:
# - Inline: Embedding a static manifest.
I think we should limit the types of resources allowed here? For example, it could be restricted to resources that can scale.
You mean clarify the limit in the comment/doc? In the implementation we can check whether a resource has /scale via api-discovery; the RESTMapper in the native client library only requires the group-version-kind to verify that precondition, so I guess it's not necessary to assert the resource metadata explicitly in the API spec?
Should we have admission control for the allowed resources, or what happens if a user specifies a resource here that cannot scale?
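For context, a resource opts into scaling by declaring the scale subresource in its CRD; both the discovery-based check mentioned above and any admission check would key off this. Standard CRD syntax, with illustrative example names:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com          # illustrative CRD
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
      subresources:
        # Declaring /scale is what makes the resource "scalable" and
        # therefore a valid target for replica distribution.
        scale:
          specReplicasPath: .spec.replicas
          statusReplicasPath: .status.replicas
          labelSelectorPath: .status.selector
```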
kind: ElasticWorkload
spec:
# The target namespace to deploy the workload in the spoke cluster.
spokeNamespace: default |
Should we just let this workload be deployed on the spoke in the same namespace as this resource on the hub?
Practically, I think that will work for most cases because we are usually managing one namespace per application, but I'm not sure whether that will apply to all cases.
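For illustration, if that defaulting were adopted, an ElasticWorkload with spokeNamespace omitted would follow its own hub namespace; the apiVersion and names below are placeholders, and this defaulting is only the suggestion above, not current behavior:

```yaml
apiVersion: work.open-cluster-management.io/v1alpha1   # placeholder group/version
kind: ElasticWorkload
metadata:
  name: hello
  namespace: app-1          # namespace of this resource on the hub
spec:
  # spokeNamespace omitted: under the suggested defaulting, the workload
  # would land in "app-1" on each selected spoke cluster as well.
  distributionStrategy:
    totalReplicas: 10
    type: Even
```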
# - Even: Filling the min replicas upon every round. i.e. max-min
# - Weighted: Setting a default weight and overriding the weight for a
# few clusters on demand.
distributionStrategy: |
If a cluster is added to or removed from the decision of the related placement, will the distribution be recalculated?
yes i think so
how can we ensure that limitRange.min is satisfied in this case? I think the API can only ensure how evenly the replicas are distributed.
My original idea is that the final distribution is calculated in two phases: (1) an initial distribution, i.e. distributionStrategy, and (2) a second-pass re-distribution, i.e. balanceStrategy. So if the initial result from the even distribution strategy doesn't conform to the requirement of .limitRange.min, the final distributed result will be rounded up to .limitRange.min. Additionally, if .limitRange.min * selectedClusters >= the expected total replicas, the reconcile loop should return without applying any actual changes.
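To make the two phases concrete, here is a hypothetical Weighted example; the cluster names and the exact balanceStrategy field layout are assumptions (only .limitRange.min and the strategy names are quoted in this thread):

```yaml
spec:
  distributionStrategy:
    totalReplicas: 10
    type: Weighted
    weighted:
      defaultWeight: 10
      overrides:
        - clusterName: cluster-a
          weight: 100
  # Phase 1 (distributionStrategy): weights 100/10/10 over three selected
  # clusters give roughly cluster-a: 8, cluster-b: 1, cluster-c: 1.
  balanceStrategy:
    type: LimitRange
    limitRange:
      min: 2
  # Phase 2 (balanceStrategy): cluster-b and cluster-c are rounded up to the
  # min of 2, giving 8 / 2 / 2.
  # Bail-out case: if min * selectedClusters >= totalReplicas (e.g. min: 4
  # with three clusters, 4 * 3 = 12 >= 10), the reconcile loop returns
  # without applying any actual changes.
```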
# few clusters on demand.
distributionStrategy:
totalReplicas: 10
type: [ Even | Propotional ] |
Typo: Proportional.
How is Proportional specified? And are Even/Proportional hard or soft limits? Should we mimic the pod spreading policy in kube with a MaxSurge, so MaxSurge=1 actually means an even distribution?
Revised to Weighted. As for MaxSurge, I'm leaning a bit towards Weighted because it looks more intuitive from the user's perspective.
the kubernetes community, our multi-cluster workload controller should not raise any
additional requirement on the managing workload API except for enabling the standard
[scale](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource)
subresource via the CRD. Hence, to scale up or down the local workload, the controller
I think the issue is that manifestwork applies "any" resources, and most of them do not scale. To support this, we probably need a field in the manifestwork to override the replica path in the manifests of the manifestwork.
I'm not sure what kind of built-in support we want from the manifestwork API in the current phase. A random idea I can think of is to add a new remediation type, e.g. UpdateScaleSubresource, which optionally updates the locally delivered resources via the /scale subresource iff the replica count is the only difference between the existing state and the expectation.
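If something like that were added, it could surface in a ManifestWork roughly as below; UpdateScaleSubresource is only the idea floated here, not an existing update strategy type, and hooking it into a per-resource manifestConfigs entry is an assumption:

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: hello-workload
  namespace: cluster-a               # the managed-cluster namespace on the hub
spec:
  workload:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: hello
          namespace: default
        spec:
          replicas: 3
          selector:
            matchLabels:
              app: hello
          template:
            metadata:
              labels:
                app: hello
            spec:
              containers:
                - name: hello
                  image: quay.io/example/hello:latest
  manifestConfigs:
    - resourceIdentifier:
        group: apps
        resource: deployments
        namespace: default
        name: hello
      updateStrategy:
        # Hypothetical: only patch via /scale when the replica count is the
        # sole drift between the desired and the existing state.
        type: UpdateScaleSubresource
```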
# those clusters under the "min" will be the primary choices.
# * "max": the controller will exclude the cluster exceeding the "max"
# from the list of candidates upon re-scheduling.
# - Classful: A classful prioritized rescheduling policy. |
It seems Classful covers all the cases of LimitRange. Why do we need the two options?
I think in the first stage we will leave Classful unimplemented; just None and LimitRange in the alpha API should be sufficient.
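Under that plan the alpha API would only need the first two options, roughly as below; the field layout is assumed from the quoted snippet, with the min/max semantics as described there:

```yaml
balanceStrategy:
  # Classful is documented but deferred; only None and LimitRange in alpha.
  type: [ None | LimitRange ]
  limitRange:
    min: 2    # clusters under "min" are the primary choices upon re-scheduling
    max: 5    # clusters exceeding "max" are excluded from candidates upon re-scheduling
```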
# few clusters on demand.
distributionStrategy:
totalReplicas: 10
type: [ Even | Weighted ] |
How is the cluster weight specified? At "Placement.prioritizerPolicy"?
For clarification, "Placement.prioritizerPolicy" only takes effect during cluster selection, while the Weighted distribution describes how the replicas of the workload are spread over the selected clusters. A sample weighted distribution will look something like:
spec:
  distributionStrategy:
    totalReplicas: 10
    type: Weighted
    weighted:
      defaultWeight: 10
      overrides:
        - clusterName: xx
          weight: 100
cc @qiujian16 @deads2k