schedule: impl balance range scheduler #9005
Signed-off-by: 童剑 <[email protected]>
}

opInfluence := s.OpController.GetOpInfluence(cluster.GetBasicCluster(), operator.WithRangeOption(job.Ranges))
plan, err := s.prepare(cluster, opInfluence, job)
Does every scheduling round scan the range from the beginning?
Yes, because the region and store information in the given ranges may have been updated.
Will it affect other requests, such as get region or heartbeat, if the range is large?
Yes. Scanning regions acquires an RLock on the lock that is shared with region heartbeats. We could reuse the previous distribution for a fixed interval instead of recomputing it on every scheduling round.
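The reuse idea in the reply above could look roughly like the following. This is only a sketch under assumptions: `distribution`, `cachedDistribution`, and the `scan` callback are hypothetical names invented for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// distribution is a hypothetical stand-in for the per-store scores
// obtained by scanning the regions in a job's key ranges.
type distribution map[uint64]int64

// cachedDistribution recomputes the distribution at most once per ttl,
// so the expensive scan (and its read lock) is not taken on every round.
type cachedDistribution struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastScan time.Time
	value    distribution
	scan     func() distribution // hypothetical expensive region scan
}

func newCachedDistribution(ttl time.Duration, scan func() distribution) *cachedDistribution {
	return &cachedDistribution{ttl: ttl, scan: scan}
}

// get returns the cached value while it is fresh and rescans otherwise.
func (c *cachedDistribution) get() distribution {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.value == nil || time.Since(c.lastScan) >= c.ttl {
		c.value = c.scan()
		c.lastScan = time.Now()
	}
	return c.value
}

func main() {
	scans := 0
	c := newCachedDistribution(time.Minute, func() distribution {
		scans++
		return distribution{1: 10, 2: 4}
	})
	c.get()
	c.get() // served from the cache, no second scan
	fmt.Println("scans:", scans)
}
```

The trade-off is staleness: within one interval the scheduler would act on a distribution that may no longer match the latest heartbeats.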
balanceRangeCounter.Inc()
job := s.conf.peek()
if job == nil {
balanceRangeNoJobCounter.Inc()
It seems we don't need to count this. Setting it to true when there is no job is enough.
It only counts when all the jobs are finished.
How about using `RunningJobCounter` rather than `NoJobCounter`?
If `conf.peek()` is nil, all the jobs in the configuration are finished, so the metric just tells the DBA that there is no job left to run in this scheduler.
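The behaviour described in this reply can be sketched as follows. The `job` type, the `pending` and `finished` statuses, and this standalone `peek` function are assumptions made for illustration; only the `running` status and the nil-means-all-finished contract come from the thread.

```go
package main

import "fmt"

type jobStatus int

const (
	pending jobStatus = iota
	running
	finished
)

// job is a hypothetical, trimmed-down job record.
type job struct {
	ID     uint64
	Status jobStatus
}

// peek returns the first job that is not finished,
// or nil when every job in the list is finished.
func peek(jobs []*job) *job {
	for _, j := range jobs {
		if j.Status != finished {
			return j
		}
	}
	return nil
}

func main() {
	jobs := []*job{{ID: 1, Status: finished}, {ID: 2, Status: pending}}
	fmt.Println("next job:", peek(jobs).ID)

	jobs[1].Status = finished
	fmt.Println("all finished:", peek(jobs) == nil)
}
```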
now := time.Now()
job.Start = &now
job.Status = running
if err := conf.save(); err != nil {
It is not recommended to:
- modify the input parameter
- call conf.save() to persist the modified input struct
I would prefer a more explicit implementation. For example, we could use `jobs map[JobID]balanceRangeSchedulerJob` and pass a `JobID` here rather than a job.
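One possible shape of the suggestion above, as a sketch only: the `jobStore` type, the struct fields, and `errJobNotFound` are hypothetical, and the persistence step is reduced to a comment because the real `conf.save()` is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type JobID uint64

type jobStatus int

const (
	pending jobStatus = iota
	running
	finished
)

// balanceRangeSchedulerJob is trimmed to the fields this sketch needs.
type balanceRangeSchedulerJob struct {
	Status jobStatus
	Start  *time.Time
}

// jobStore is a hypothetical container that owns the jobs map.
type jobStore struct {
	mu   sync.Mutex
	jobs map[JobID]balanceRangeSchedulerJob
}

var errJobNotFound = errors.New("job not found")

// begin marks the job as running. The caller passes only an ID,
// so no caller-owned struct is mutated behind its back.
func (s *jobStore) begin(id JobID) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	job, ok := s.jobs[id]
	if !ok {
		return errJobNotFound
	}
	now := time.Now()
	job.Start = &now
	job.Status = running
	s.jobs[id] = job // map values are copies, so write the update back
	return nil       // a real implementation would persist the config here
}

func main() {
	s := &jobStore{jobs: map[JobID]balanceRangeSchedulerJob{1: {}}}
	fmt.Println(s.begin(1), s.jobs[1].Status == running)
	fmt.Println(s.begin(2))
}
```

Looking the job up by ID also makes the "job must exist in the list" precondition an explicit error path rather than an assumption on the caller.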
@rleungx The function must ensure that the jobs list contains the given job.
targetInfluence := p.opInfluence.GetStoreInfluence(p.targetStoreID())
targetInf := p.job.Role.getStoreInfluence(targetInfluence)
if targetInf < 0 {
Yes, if the store has some remove-peer operators.
I must consider the scenario involving `removing-peer`: when the target influence is negative, the target score becomes small, which causes too many peers to be added to this store.
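To make the concern above concrete, here is a small sketch of the comparison. The body of the `if targetInf < 0` branch is not visible in this thread, so clamping the influence to zero is purely an assumption for illustration, as is the standalone `shouldBalance` function and its integer score type.

```go
package main

import "fmt"

// shouldBalance reports whether moving a peer from the source store to the
// target store still makes sense, following the comparison shown in the diff:
// sourceScore >= targetScore + targetInf + tolerate.
func shouldBalance(sourceScore, targetScore, targetInf, tolerate int64) bool {
	if targetInf < 0 {
		// Assumption: ignore in-flight removals on the target so that
		// they do not make the target look emptier than it is.
		targetInf = 0
	}
	return sourceScore >= targetScore+targetInf+tolerate
}

func main() {
	// Equal scores with 5 in-flight removals on the target: without the
	// clamp the right-hand side would be 10-5+1 = 6 and the move would pass.
	fmt.Println(shouldBalance(10, 10, -5, 1))

	// A clear imbalance with no in-flight operators.
	fmt.Println(shouldBalance(12, 10, 0, 1))
}
```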
}
targetScore := p.targetScore + targetInf + p.tolerate

shouldBalance := sourceScore >= targetScore
Can you add some comments to explain the principle? Why is the score calculated this way? How does the score differ from the one in the balanceRegionScheduler? How are conflicts between different balance schedulers handled?
The `balanceRegionScheduler` and `balanceLeaderScheduler` only consider regions globally, not the given key ranges, so their scores can be calculated from the store status. But the `balance-range-scheduler` is different, the given key ranges should set it
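The reply above is cut off, but the distinction it draws (scores derived from the regions inside the given key ranges rather than from global store status) might be sketched like this. Everything here is an assumption for illustration: the `keyRange` and `region` types, the overlap rule, and using a plain peer count as the score.

```go
package main

import (
	"bytes"
	"fmt"
)

// keyRange is a half-open interval [start, end); an empty end means unbounded.
type keyRange struct{ start, end []byte }

// region is a hypothetical, trimmed-down region record.
type region struct {
	start, end []byte
	stores     []uint64 // IDs of the stores holding a peer of this region
}

// overlaps reports whether the region intersects the key range.
func overlaps(r region, kr keyRange) bool {
	startsBeforeRangeEnds := len(kr.end) == 0 || bytes.Compare(r.start, kr.end) < 0
	endsAfterRangeStarts := len(r.end) == 0 || bytes.Compare(r.end, kr.start) > 0
	return startsBeforeRangeEnds && endsAfterRangeStarts
}

// peerCountInRanges counts, per store, the peers of regions that fall
// inside any of the given ranges. Each region is counted at most once.
func peerCountInRanges(regions []region, ranges []keyRange) map[uint64]int64 {
	scores := make(map[uint64]int64)
	for _, r := range regions {
		for _, kr := range ranges {
			if overlaps(r, kr) {
				for _, s := range r.stores {
					scores[s]++
				}
				break
			}
		}
	}
	return scores
}

func main() {
	regions := []region{
		{start: []byte("a"), end: []byte("c"), stores: []uint64{1, 2}},
		{start: []byte("c"), end: []byte("e"), stores: []uint64{2, 3}},
		{start: []byte("x"), end: []byte("z"), stores: []uint64{1, 3}},
	}
	ranges := []keyRange{{start: []byte("b"), end: []byte("d")}}
	fmt.Println(peerCountInRanges(regions, ranges))
}
```

A store that looks balanced globally can still be heavily loaded inside one key range, which is why a store-level score is not sufficient for this scheduler.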
func (conf *balanceRangeSchedulerConfig) begin(index int) *balanceRangeSchedulerJob {
conf.Lock()
defer conf.Unlock()
job := conf.jobs[index]
It seems we already have the job. Why do we need to get it again?
func (conf *balanceRangeSchedulerConfig) finish(index int) *balanceRangeSchedulerJob {
conf.Lock()
defer conf.Unlock()
job := conf.jobs[index]
ditto
switch r {
case leader:
return influence.LeaderCount
case follower:
Will it conflict with balance region?
Yes, the next MR will solve it.
Overall LGTM
@@ -463,7 +463,7 @@ func (s *balanceRangeScheduler) prepare(cluster sche.SchedulerCluster, opInfluen
 averageScore: averageScore,
 job: job,
 opInfluence: opInfluence,
-tolerate: tolerate,
+tolerate: tolerantSizeRatio,
tolerantSizeRatio
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: okJiang, rleungx
What problem does this PR solve?
Issue Number: Close #9006
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
Release note