Use CurrentReplicas as a minimum for fallback replicas #6450

Open
rickbrouwer opened this issue Dec 31, 2024 · 8 comments · May be fixed by #6464
Labels: feature-request, needs-discussion

Comments

rickbrouwer (Contributor) commented Dec 31, 2024

Proposal

This follows from discussion #6334 (Editing ScaledObject on Workload in Production).

My proposal is to add a new (boolean) parameter to the fallback: useCurrentReplicasAsMinimum.
If this parameter is set to true and the ScaledObject ends up in fallback, at least the number of replicas running at that moment will keep running. If fallback.replicas is higher, that value will be used instead.
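
A minimal sketch of how the fallback section could look under this proposal (useCurrentReplicasAsMinimum is the proposed field, not an existing KEDA option; the other values are placeholders):

fallback:
    failureThreshold: 3
    replicas: 10
    useCurrentReplicasAsMinimum: true   # proposed: keep at least the replicas running when the fallback kicks in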

Use-Case

When you are running a large number of pods and something goes wrong, KEDA uses the fallback (if configured). However, the difference between fallback.replicas and the current number of replicas can be large.

The failure can also be temporary, in which case the workload suddenly scales down considerably and then returns to the correct number of replicas. With this feature we ensure that this does not happen.

Is this a feature you are interested in implementing yourself?

Yes

Anything else?

If you have any objections or questions regarding this issue, please let me know :)

rickbrouwer added the feature-request and needs-discussion labels on Dec 31, 2024
rickbrouwer linked a pull request (#6464) on Jan 5, 2025 that will close this issue
rickbrouwer (Contributor, Author) commented

The proposal has been changed to a new parameter, behavior, with two options: static (default) and useCurrentReplicasAsMinimum.
static will use fallback.replicas.
useCurrentReplicasAsMinimum will use the current running replicas, unless fallback.replicas is a higher value.
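
A sketch of the revised shape (behavior is the proposed field; the values are assumptions based on this comment):

fallback:
    failureThreshold: 3
    replicas: 10
    behavior: useCurrentReplicasAsMinimum   # or "static" (default), which always uses fallback.replicas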

JorTurFer (Member) commented Jan 18, 2025

I see the potential of this feature, but as I said in the original discussion, this only applies in one direction (and I'm not 100% sure the fallback is the root issue). I think that we could improve the fallback feature with a dynamic replica calculation, just setting the boundaries in both directions, something like:

fallback:
    failureThreshold: 3
    replicas: 10
    allowIn: bool
    allowOut: bool

We could use the two new fields to calculate the fallback replicas as follows (a worked sketch is included after the list):

  • if allowIn == false and currentReplicas > replicas, scale to currentReplicas
  • if allowIn == true and currentReplicas > replicas, scale to replicas
  • if allowOut == false and currentReplicas < replicas, scale to currentReplicas
  • if allowOut == true and currentReplicas < replicas, scale to replicas
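
A worked sketch under the suggested (not implemented) allowIn/allowOut fields, with placeholder values:

fallback:
    failureThreshold: 3
    replicas: 10
    allowIn: false    # suggested: block scaling in while in fallback
    allowOut: true    # suggested: allow scaling out while in fallback
# with currentReplicas = 25 → stays at 25 (scaling in is blocked)
# with currentReplicas = 4  → scales out to 10 (scaling out is allowed)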

This could give more control over how the fallback works. For example, I'd like to only ADD replicas in case of a lost upstream, but never remove them without upstream information.

WDYT?
ping @zroubalik @wozniakjan

rickbrouwer (Contributor, Author) commented

For a better understanding, here is a summary of how it is now implemented in #6464.

A new parameter, behavior, has been created.
Two values can be chosen here:

static (default)
useCurrentReplicasAsMinimum

If you choose static or nothing (because static is the default), it will always go back to fallback.replicas in case of a fallback. This can be higher or lower than the current replicas.

If you choose useCurrentReplicasAsMinimum, during a fallback it only uses the current number of replicas if that is higher than fallback.replicas. So it will never scale down.
Now, this might seem the same as not setting a fallback and not passing deployment.replicas; after all, the workload would then also keep running on its current replicas if something doesn't work. The difference is that you explicitly indicate that, if a fallback occurs, you want to run at least the current replicas, but if fallback.replicas has a higher value, it scales up to that value as a safety measure. In summary, this is a method to ensure that the workload never scales down during a fallback.
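
To make that concrete, a sketch with assumed values (the behavior field follows the proposal in #6464, which is not merged yet):

fallback:
    failureThreshold: 3
    replicas: 5
    behavior: useCurrentReplicasAsMinimum
# fallback with currentReplicas = 8 → keeps 8 (current is higher than fallback.replicas)
# fallback with currentReplicas = 2 → scales up to 5 (fallback.replicas is higher)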

JorTurFer (Member) commented

Yeah, I see the use case of this new behavior. My point is that it's limited to scaling out and keeping the current value as a minimum, but the opposite direction could also be needed: if I'm consuming messages from a broker and I have included a DB in the formula, I could want to block scaling out when I miss the DB info, to ensure that I won't overload it; something like useCurrentReplicasAsMaximum. The naming that I proposed was basically like having useCurrentReplicasAsMinimum and useCurrentReplicasAsMaximum, just with other names, but supporting both directions.
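
A sketch of that opposite direction, with useCurrentReplicasAsMaximum as a purely hypothetical value:

fallback:
    failureThreshold: 3
    replicas: 10
    behavior: useCurrentReplicasAsMaximum   # hypothetical: never scale out while in fallback
# fallback with currentReplicas = 4  → stays at 4 (never exceeds the current value)
# fallback with currentReplicas = 25 → scales down to 10 (fallback.replicas is lower than current)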

JorTurFer (Member) commented

And maybe it's something that we could implement in the future if someone needs it; I just suggested the other direction to cover both cases at once, if possible.

rickbrouwer (Contributor, Author) commented Jan 23, 2025

Yeah, that's a good idea indeed!

So your behavior options could be:

static (always use fallback.replicas)
useCurrentReplicasAsMinimum (use the highest of current or fallback)
useCurrentReplicasAsMaximum (use the lowest of current or fallback)

For example:

fallback.replicas: 5

With useCurrentReplicasAsMinimum:

  • If current = 3 → use 5 (fallback because higher)
  • If current = 7 → use 7 (current because higher)

With useCurrentReplicasAsMaximum:

  • If current = 3 → use 3 (current because lower)
  • If current = 7 → use 5 (fallback because lower)

wozniakjan (Member) commented

could improve the fallback feature with a dynamic replica calculation, just setting the boundaries in both directions

That is a good point regarding boundaries in both directions, and imho one more reason to use a behavior field instead of a set of booleans. It would be easier to add new options with an intuitive description and keep the test matrix smaller. With flags, we would need to either prevent setting both to true or figure out what the desired scaling action is. And more complex behaviors might complicate the boolean API even more.

static (always use fallback.replicas)
useCurrentReplicasAsMinimum (use the highest of current or fallback)
useCurrentReplicasAsMaximum (use the lowest of current or fallback)

Just before we merge it, is it worth exploring alternative names? I mean, they are explanatory and correct, but I always have to stop for a second to be sure which one means what. Alternatives that I thought about:

proposed   useCurrentReplicasAsMinimum          useCurrentReplicasAsMaximum
alt 1      CurrentReplicasUpToFallback          FallbackUpToCurrentReplicas
alt 2      CurrentReplicasWithFallbackCeiling   CurrentReplicasWithFallbackFloor
alt 3      CurrentReplicasIfLower               CurrentReplicasIfHigher

rickbrouwer (Contributor, Author) commented Jan 23, 2025

I agree with better naming. It should be intuitively clear what the value means.
I think alt 3 should be reversed:

proposed   useCurrentReplicasAsMinimum   useCurrentReplicasAsMaximum
alt 3      CurrentReplicasIfHigher       CurrentReplicasIfLower

This is because when useCurrentReplicasAsMinimum is set, the current replicas must be used if they are higher, hence CurrentReplicasIfHigher (use the current replicas if they are higher than fallback.replicas).
And with useCurrentReplicasAsMaximum, the current replicas must only be used if they are lower (scaling down), hence CurrentReplicasIfLower (use the current replicas if they are lower than fallback.replicas).

Or am I seeing this wrong?
