
balloons: add support for explicit CPUs #352

Closed
wants to merge 3 commits

Conversation

@fmuyassarov (Collaborator)

Introduce a new parameter (preferCpus) in the balloons policy for giving a preferred CPU set to each balloon type.
When combining preferIsolCpus with preferCpus, priority is given to the isolated CPUs over the user-specified set if the two sets differ.

This commit introduces a new field in the balloons configuration CR,
enabling users to specify their CPU preferences for workloads.
When a user explicitly requests a CPU set, the policy will attempt
to allocate CPUs from the specified set. If the request cannot be
satisfied, the policy will fall back to other available CPUs.

If a user adds `preferIsolCpus` in addition to specifying preferred
CPUs, the policy will prioritize allocating isolated CPUs over the
preferred CPUs set, in case these sets are different.

Signed-off-by: Feruzjon Muyassarov <[email protected]>
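
For illustration, a hypothetical balloon type combining both parameters might look like this; the type name and CPU IDs are made up:

```yaml
balloonTypes:
  - name: example-balloon
    preferCpus: "4-7,12"    # proposed parameter: preferred CPU IDs
    preferIsolCpus: true    # isolated CPUs win over preferCpus if the sets differ
```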
@fmuyassarov requested review from klihub, marquiz and askervin and removed the request for klihub and marquiz on July 10, 2024 09:58
@klihub (Collaborator) left a comment


Is this a good idea, or something we both want and need? So far, we have been deliberate and careful not to allow explicitly setting CPUs by ID for workloads.

@fmuyassarov (Collaborator, Author)

> Is this a good idea, or something we both want and need? So far, we have been deliberate and careful not to allow explicitly setting CPUs by ID for workloads.

I was expecting this question 😄
To be honest, I don't know whether it is a good or bad idea to let users set CPUs explicitly. I just had this task on my list. Perhaps @askervin has something to add.

@klihub (Collaborator)

klihub commented Jul 10, 2024

> Is this a good idea, or something we both want and need? So far, we have been deliberate and careful not to allow explicitly setting CPUs by ID for workloads.

> I was expecting this question 😄
> To be honest, I don't know whether it is a good or bad idea to let users set CPUs explicitly. I just had this task on my list. Perhaps @askervin has something to add.

So @askervin knows about this and is okay with it? I'll wait for his review then. I'd rather not open this Pandora's box if we can avoid it in any way. Usually the user does not really want any specific CPU or cpuset by ID(s). Instead they want their workload to be topologically close to some memory controller or to one or more PCI devices, and that's why they ask for the ability to set specific cpusets for workloads. However, it is almost always better to give the user/workload a way to describe what they need ("this workload needs to be close to a PCI device of a particular vendor and class" or "this workload needs to be close to that other workload") and then let the policy in use implement how that is achieved. So, on UI-/UX-visible interfaces (such as workload annotations or policy configuration) we usually prefer things to be described in terms of what is needed, not in terms of how that could be achieved on some particular piece of hardware, especially since the latter might not be portable across cluster nodes. Of course, there are exceptions to every rule...

@@ -154,6 +154,10 @@ Balloons policy parameters:
separate `cpu.classes` objects, see below.
- `preferCloseToDevices`: prefer creating new balloons close to
listed devices. List of strings
- `preferCpus`: prefer allocating CPUs from an explicitly defined
set of CPUs. Warning: when combining `preferIsolCpus` with `preferCpus`,
Collaborator

I think you can remove the word "Warning" from this sentence. This is a good specification of the priorities of the two preferences.

This feature is definitely worth a warning, though, because it breaks the idea that the policy can decide which CPUs it puts into which balloon instance. Using this feature makes the policy configuration platform-dependent. Things are very likely to go wrong if the same configuration is used for all nodes in a heterogeneous cluster.

Collaborator

@fmuyassarov, you can see excellent content for the warning in @klihub's comment: #352 (comment)

@@ -56,6 +56,9 @@ const (
// virtDevIsolatedCpus is the name of a virtual device close to
// host isolated CPUs.
virtDevIsolatedCpus = "isolated CPUs"
// virtDevPreferredCpus represents a virtual device that is
// associated with a user-preferred set of CPUs
virtDevPreferredCpus = "user preferred CPUs"
Collaborator

Here is a single static virtual device name, but there can be many balloon types that have different preferCpus parameter sets.

Maybe you should have two or three balloon types with preferCpus parameters in the e2e tests. Two of those could even have overlapping preferCpus sets. In addition, there should be at least one balloon type without this parameter. Then you can make sure that balloons of the type without the preferCpus parameter get CPUs that are not mentioned in any of the preferCpus sets of the other balloon types.

Thinking about this case, it sounds to me that you will need a bunch of new virtual devices, one for each balloon type with this parameter.
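
As a rough illustration of that idea, a per-type device name could be derived from the balloon type along these lines; the naming scheme is made up here, and `fmt` is assumed to be imported:

```go
// Hypothetical variant: one virtual device name per balloon type instead
// of the single static virtDevPreferredCpus, so overlapping preferCpus
// sets in different balloon types stay distinguishable.
for _, blnDef := range blnDefs {
	if blnDef.PreferCpus != "" {
		virtDev := fmt.Sprintf("user preferred CPUs (%s)", blnDef.Name)
		blnDef.PreferCloseToDevices = append(blnDef.PreferCloseToDevices, virtDev)
	}
}
```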

@@ -1202,6 +1206,9 @@ func (p *balloons) fillCloseToDevices(blnDefs []*BalloonDef) {
if blnDef.PreferIsolCpus {
blnDef.PreferCloseToDevices = append(blnDef.PreferCloseToDevices, virtDevIsolatedCpus)
}
if cpuset.MustParse(blnDef.PreferCpus).Size() != 0 {
@askervin (Collaborator) commented Jul 10, 2024

There is no error handling for invalid preferCpus, and the value is parsed more than once. The opposite would be better.

Would you change this so that each preferCpus is parsed only once and the value is stored in an internal attribute of the balloon type? If you did this as part of configuration validation, you could give a nice error message on parse errors.
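
A minimal sketch of that approach, assuming a hypothetical validation helper and a made-up internal field for the cached cpuset; note that cpuset.Parse (from k8s.io/utils/cpuset) returns an error where cpuset.MustParse would panic:

```go
import (
	"fmt"

	"k8s.io/utils/cpuset"
)

// validatePreferCpus parses preferCpus exactly once during configuration
// validation and caches the result, so a typo yields a readable error
// instead of a crash.
func validatePreferCpus(blnDef *BalloonDef) error {
	if blnDef.PreferCpus == "" {
		return nil
	}
	cpus, err := cpuset.Parse(blnDef.PreferCpus)
	if err != nil {
		return fmt.Errorf("balloon type %q: invalid preferCpus %q: %v",
			blnDef.Name, blnDef.PreferCpus, err)
	}
	blnDef.parsedPreferCpus = cpus // hypothetical cached attribute
	return nil
}
```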

@fmuyassarov (Collaborator, Author)

MustParse doesn't return an error, so I guess moving the caller to validateConfig won't change anything in terms of error handling. But I think we can use cpuset.Parse() similarly to this?

@fmuyassarov (Collaborator, Author)

And I'm okay with moving the MustParse/Parse call to validateConfig().

@fmuyassarov (Collaborator, Author) commented Jul 11, 2024

But we should probably all agree first on whether this PR is something beneficial that can be merged. Otherwise I will waste my time and yours trying to fix something that we don't want to take in.

Collaborator

> MustParse doesn't return an error, so I guess moving the caller to validateConfig won't change anything in terms of error handling. But I think we can use cpuset.Parse() similarly to this?

Yes, exactly. We definitely do not want to crash on a typo in a config.

> But we should probably all agree first on whether this PR is something beneficial that can be merged. Otherwise I will waste my time and yours trying to fix something that we don't want to take in.

I agree. I think this PR is a very good start for this feature, thank you for writing it! Let's continue if @klihub and @kad show a green(ish) light for this. I think you can put this PR into the draft state.

Would you proceed with preferCoreType (or preferPCores/preferECores... feel free to decide on the most suitable parameter name) instead? You should be able to test it in our e2e VM by adding environment variables to the balloons policy container: look for OVERRIDE_SYS_CORE_CPUS and OVERRIDE_SYS_ATOM_CPUS in the code. Those variables enable faking that there are P and E cores in the system.

@fmuyassarov (Collaborator, Author)

> Would you proceed with preferCoreType (or preferPCores/preferECores... feel free to decide on the most suitable parameter name) instead? You should be able to test it in our e2e VM by adding environment variables to the balloons policy container: look for OVERRIDE_SYS_CORE_CPUS and OVERRIDE_SYS_ATOM_CPUS in the code. Those variables enable faking that there are P and E cores in the system.

Yes, I will proceed with that in the meantime.
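
For reference, overriding the detected core types in the e2e VM could look roughly like this; the environment variable names come from the comment above, while the container spec fragment and CPU ranges are illustrative:

```yaml
# Fragment of the balloons policy container spec (values made up):
env:
  - name: OVERRIDE_SYS_CORE_CPUS   # CPUs to report as P-cores
    value: "0-7"
  - name: OVERRIDE_SYS_ATOM_CPUS   # CPUs to report as E-cores
    value: "8-15"
```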

@askervin (Collaborator)

> Is this a good idea, or something we both want and need? So far, we have been deliberate and careful not to allow explicitly setting CPUs by ID for workloads.

Based on the requests and wishes seen so far regarding CPU affinity, I'm leaning towards handing the gun to the user, with all the big fat warnings on it (many picked from your inputs above). Use this only in case of emergency. You will lose all the nice features. The warranty is void if this seal is broken. Not for production use.

The reason is that I'm sure there will also be special cases in the future where balloons simply is not smart enough out of the box. There could be a special process, feature, or whatnot running/watching/tracing/enabled exactly on a certain core, for instance. Or it might be that (once again) balloons is not smart enough to fulfill a requirement because the precedence of its preferences is not configurable, or something similar. And because there is no easy alternative for handling such special cases, it would be sad if the affinity of a single difficult workload prevented using the balloons policy (and naturally the topology-aware policy, too). For cases like these, I'd like users to have this parameter up their sleeve if everything else fails and there is no time to wait until balloons has proper support for their request.

But to be honest, I'm not leaning in this direction without hesitation. I see it as a risk if anyone builds a production system that uses this option. I also see it as a risk that we might not get a feature request if someone can hack around their problem by using this feature. These are big risks. Maybe we could add to the list of warnings: "never use this parameter without telling the resource policy developers what they should implement in this policy so that you could stop using this parameter".

If you prefer, I think we can postpone this feature and let @fmuyassarov work on P/E cores, which is definitely worth supporting. And adding that support immediately removes one reason to use preferCpus. :D

@klihub (Collaborator)

klihub commented Jul 12, 2024

> Is this a good idea, or something we both want and need? So far, we have been deliberate and careful not to allow explicitly setting CPUs by ID for workloads.

> Based on the requests and wishes seen so far regarding CPU affinity, I'm leaning towards handing the gun to the user, with all the big fat warnings on it (many picked from your inputs above). Use this only in case of emergency. You will lose all the nice features. The warranty is void if this seal is broken. Not for production use.

> The reason is that I'm sure there will also be special cases in the future where balloons simply is not smart enough out of the box. There could be a special process, feature, or whatnot running/watching/tracing/enabled exactly on a certain core, for instance. Or it might be that (once again) balloons is not smart enough to fulfill a requirement because the precedence of its preferences is not configurable, or something similar. And because there is no easy alternative for handling such special cases, it would be sad if the affinity of a single difficult workload prevented using the balloons policy (and naturally the topology-aware policy, too). For cases like these, I'd like users to have this parameter up their sleeve if everything else fails and there is no time to wait until balloons has proper support for their request.

> But to be honest, I'm not leaning in this direction without hesitation. I see it as a risk if anyone builds a production system that uses this option. I also see it as a risk that we might not get a feature request if someone can hack around their problem by using this feature. These are big risks. Maybe we could add to the list of warnings: "never use this parameter without telling the resource policy developers what they should implement in this policy so that you could stop using this parameter".

> If you prefer, I think we can postpone this feature and let @fmuyassarov work on P/E cores, which is definitely worth supporting. And adding that support immediately removes one reason to use preferCpus. :D

@askervin If you tell me that we need this, and that we can paint it with big enough "dontcha shoot yourself in the foot" warnings, then I trust your judgement and am fine with it, too.

@kad (Collaborator)

kad commented Jul 12, 2024

Let's "sleep over" with that configuration option. I agree that giving exact cpuset to the user might be usable in some rare scenarios, as well, it would lead to many of corner cases: e.g. multiple balloons that have overlapping cpusets, or in some cpus being allocated to other ballons in normal policy way if this balloon can be shrink to 0, etc,etc...
What we can use as a potential "middle ground" is the option for balloon that would specify multiple cpusets in priority order, so normal cpu allocator would be using those more as hint rather than explicit list... but again, that might cause many other corner cases. So, let's get back to "drawing board" with this for near future?

@klihub (Collaborator)

klihub commented Jul 12, 2024

Let's "sleep over" with that configuration option. I agree that giving exact cpuset to the user might be usable in some rare scenarios, as well, it would lead to many of corner cases: e.g. multiple balloons that have overlapping cpusets, or in some cpus being allocated to other ballons in normal policy way if this balloon can be shrink to 0, etc,etc... What we can use as a potential "middle ground" is the option for balloon that would specify multiple cpusets in priority order, so normal cpu allocator would be using those more as hint rather than explicit list... but again, that might cause many other corner cases. So, let's get back to "drawing board" with this for near future?

I am totally on board with that, if that is good enough for @askervin.
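
Purely as a sketch of that middle ground, a hint-style parameter might look like the following; the parameter name preferCpuSets and the values are hypothetical and do not exist in the policy:

```yaml
balloonTypes:
  - name: example-balloon
    preferCpuSets:          # hypothetical: ordered hints, highest priority first
      - "4-7"
      - "12-15"
```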

@klihub (Collaborator)

klihub commented Jul 16, 2024

Let's mark this as a draft, while we sleep/sit on it.

@klihub marked this pull request as draft on July 16, 2024 10:29
@NoamNakash

> But to be honest, I'm not leaning in this direction without hesitation. I see it as a risk if anyone builds a production system that uses this option. I also see it as a risk that we might not get a feature request if someone can hack around their problem by using this feature. These are big risks. Maybe we could add to the list of warnings: "never use this parameter without telling the resource policy developers what they should implement in this policy so that you could stop using this parameter".

Hi, I have a use case where we limit certain non-k8s workloads on our clusters (systemd services like kube-api, kube-scheduler, and other "infra" services) to only use a certain CPU set. The goal is to keep "infra" and "workload" completely separate using balloons, and I don't see a way to create that separation without this feature. Would this be something the balloons policy is able to do? Or do you have an idea for a less dangerous feature that would allow us to do this? (I don't mind coding it, but I don't have any other idea on how to solve it.)

Optionally, would an "excludeCPUs" option work, listing CPUs to exclude from balloon scheduling and reserve for host workloads that are not pods?

tl;dr: we have CPU cores where we don't want workloads running (more generally, we would prefer allowing certain workloads to run and disallowing others).

P.S. This is my first time commenting; please tell me if this comment doesn't fit here and where I should be asking this.

@fmuyassarov (Collaborator, Author)

@NoamNakash
Sorry for the radio silence; most of the people were, and still are, on vacation, hence the lack of response.
Thank you for sharing your use case. I'm still leaning towards having this option available, but I can't decide on my own since this is not my project. I will wait for the others to come back from vacation to decide what we do with this feature and with potential user requests like yours.

@askervin (Collaborator)

askervin commented Aug 23, 2024

@NoamNakash, sorry about the delay. After quite a few discussions we ended up deciding not to implement this feature, at least not at this point. I will elaborate on this in a separate comment.

For your particular use case, that is, separating non-k8s processes onto dedicated CPUs, there is an option called availableResources. It was definitely not easy to find, as it was not documented among the other balloons options, so I added it in PR #355. This option is not a perfect match, as it lists the CPUs available to the policy rather than the CPUs not available to it.

Do you still think it works for you?

@askervin (Collaborator)

We have decided not to implement this feature for now.

The reason is that preferring explicitly listed CPUs invites users to misuse this policy in non-scalable and error-prone ways that will lead to suboptimal CPU affinity. This option also breaks our design principle by turning the policy configuration from descriptive (what kind of CPUs are needed) into imperative (which CPUs are needed).

That said, we acknowledge that the policy will never offer every possible way to express what kind of CPUs are needed. If you find an option missing, we invite you to share your use case and possibly propose options that would help solve it.

Until the missing option is integrated, or if you wish to use the balloons or the topology-aware policy to manage your Kubernetes workloads except for a few special workloads that you want to run on explicitly listed CPUs, we advise you to use the following workaround (a configuration sketch follows the list).

  1. Do not let the policy manage the explicitly listed CPUs: exclude those CPUs from the availableResources in the policy configuration.
  2. Annotate your special workloads so that the policy will not touch their CPU or memory pinning. Use the *.preserve.* annotations, which are honored by both the balloons and topology-aware policies.
  3. Use an NRI plugin that sets the CPU affinity of your special workloads to this special set of CPUs. You can write your own NRI plugin for this, or you can use the memory-qos plugin to pass through cpuset.cpus values, among other unified annotations, directly to cgroups v2. (Using the memory-qos plugin will not work for cgroups v1.)
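
A rough sketch of steps 1 and 2, with illustrative CPU IDs; the availableResources syntax follows the style documented in PR #355 as far as I can tell, and the annotation key is my best guess at one of the *.preserve.* annotations, so verify both against the policy documentation:

```yaml
# Step 1: keep CPUs 14-15 (reserved for the special workloads) out of
# the policy's hands.
availableResources:
  cpu: cpuset:0-13
---
# Step 2: annotate the special pod so the policy preserves its pinning.
metadata:
  annotations:
    cpu.preserve.resource-policy.nri.io: "true"
```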

@askervin closed this Aug 26, 2024