
Releases: containers/nri-plugins

v0.10.1

19 Aug 10:02

This is a new minor release of NRI Reference Plugins. It updates dependencies, enables RDT 'discovery mode', brings a few documentation improvements, and fixes a startup failure on machines with an asymmetric NUMA distance matrix.

What's Changed

  • build(deps): bump golang.org/x/oauth2 from 0.21.0 to 0.27.0 by @dependabot[bot] in #548
  • docs: update documentation for RDT monitoring/metrics. by @klihub in #549
  • rdt,resmgr: allow running RDT in discovery mode. by @klihub in #551
  • docs: update balloons debugging guidance and add examples by @askervin in #552
  • go.mod,Makefile: bump golang to latest 1.24.x. by @klihub in #555
  • sysfs: patch up asymmetric NUMA distances. by @klihub in #554

Full Changelog: v0.10.0...v0.10.1

v0.10.0

02 Jul 14:30

This new release of NRI Reference Plugins brings a new NRI plugin, new features in the resource policy plugins, a number of bug fixes, new end-to-end tests, and a few new use cases in the documentation.

What's New

Balloons Policy

  • Composite balloons enable allocating a diverse set of CPUs for containers with complex CPU requirements, for example "allocate an equal number of CPUs from both NUMA nodes on CPU socket 0". This enables efficient parallelism inside an AI inference engine container that runs inference on CPU, while still isolating inference engines from each other.

    balloonTypes:
    - name: balance-pkg0-nodes
      components:
      - balloonType: node0
      - balloonType: node1
    - name: node0
      preferCloseToDevices:
      - /sys/devices/system/node/node0
    - name: node1
      preferCloseToDevices:
      - /sys/devices/system/node/node1
  • Documentation includes recipes for preventing creation of certain containers on a worker node, and for resetting the CPU and memory pinning of all containers in a cluster.

Topology Aware Policy

  • Pick CPU and Memory by Topology Hints

    Normally, topology hints are only used to pick the assigned pool for a workload. Once a pool is selected, the available resources within the pool are considered equally good for satisfying the topology hints. When the policy allocates exclusive CPUs and picks pinned memory for the workload, only other potential criteria and attributes are considered when picking the individual resources.

    When multiple devices are allocated to a single container, this default assumption of all resources within the pool being topologically equal may not hold: the container may be allocated misaligned devices, that is, devices with different memory or CPU locality. To overcome this, containers can now be annotated to prefer hint-based selection and pinning of CPU and memory resources using the pick-resources-by-hints.resource-policy.nri.io annotation. For example,

    apiVersion: v1
    kind: Pod
    metadata:
      name: data-pump
      annotations:
        k8s.v1.cni.cncf.io/networks: sriov-net1
        prefer-isolated-cpus.resource-policy.nri.io/container.ctr0: "true"
        pick-resources-by-hints.resource-policy.nri.io/container.ctr0: "true"
    spec:
      containers:
      - name: ctr0
        image: dpdk-pump
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 2
            memory: 100M
            vendor.com/sriov_netdevice_A: '1'
            vendor.com/sriov_netdevice_B: '1'
          limits:
            vendor.com/sriov_netdevice_A: '1'
            vendor.com/sriov_netdevice_B: '1'
            cpu: 2
            memory: 100M

    When annotated like that, the policy will try to pick one exclusive isolated CPU with locality to one device and another with locality to the other. It will also try to pick and pin to memory aligned with these devices.

Common Policy Improvements

These are improvements to common infrastructure and as such are available for the balloons and topology-aware policy plugins, as well as for the wireframe template policy plugin.

  • Cache Allocation

    Plugins can be configured to exercise class-based control over the L2 and L3 cache allocated to containers' processes. In practice, containers are assigned to classes, and each class has a corresponding cache allocation configuration. This configuration is applied to all containers in the class and subsequently to all processes started in those containers. To enable cache control, use the control.rdt.enable option, which defaults to false.

    Plugins can be configured to assign containers by default to a cache class named after the Pod QoS class of the container: one of BestEffort, Burstable, and Guaranteed. The configuration setting controlling this behavior is control.rdt.usePodQoSAsDefaultClass and it defaults to false.

    Additionally, containers can be explicitly annotated to be assigned to a class using the rdtclass.resource-policy.nri.io annotation key. For instance:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
      annotations:
        rdtclass.resource-policy.nri.io/pod: poddefaultclass
        rdtclass.resource-policy.nri.io/container.special-container: specialclass
    ...

    This will assign the container named special-container within the pod to the specialclass RDT class and any other container within the pod to the poddefaultclass RDT class. Effectively these containers' processes will be assigned to the RDT CLOSes corresponding to those classes.

    Cache Class/Partitioning Configuration

    RDT configuration is supplied as part of the control.rdt configuration block. Here is a sample snippet, as a Helm chart value, which assigns 33%, 66%, and 100% of cache lines to BestEffort, Burstable, and Guaranteed Pod QoS class containers, respectively:

    config:
      control:
        rdt:
          enable: true
          usePodQoSAsDefaultClass: true
          options:
            l2:
              optional: true
            l3:
              optional: true
            mb:
              optional: true
          partitions:
            fullCache:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
              classes:
                BestEffort:
                  l2Allocation:
                    all:
                      unified: 33%
                  l3Allocation:
                    all:
                      unified: 33%
                Burstable:
                  l2Allocation:
                    all:
                      unified: 66%
                  l3Allocation:
                    all:
                      unified: 66%
                Guaranteed:
                  l2Allocation:
                    all:
                      unified: 100%
                  l3Allocation:
                    all:
                      unified: 100%

    Cache Allocation Prerequisites

    Note that for cache allocation control to work, you must have

    • a hardware platform that supports cache allocation
    • the resctrlfs pseudo-filesystem enabled in your kernel, and loaded if it is built as a module
    • the resctrlfs filesystem mounted (possibly with extra options for your platform)

New plugin: nri-memory-policy

  • The NRI memory policy plugin sets Linux memory policy for new containers.
  • For instance, the plugin can advise the kernel to interleave memory pages of a container across all NUMA nodes in the system, or across all NUMA nodes attached to the socket where the container's allowed CPUs are located.
  • The plugin works both stand-alone and together with NRI resource policy plugins and Kubernetes resource managers. It recognizes CPU and memory pinning set by resource management components. The memory policy plugin should be placed after the resource policy plugins in the NRI plugin chain.
  • Memory policy for a container is defined in pod annotations.
  • At the time of this release, the latest released containerd and CRI-O do not support NRI Linux memory policy adjustments, nor the NRI container command line adjustments that can serve as a workaround. Using this plugin requires a container runtime built with an NRI version that includes command line adjustments (NRI version > 0.9.0).
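Since the plugin is configured through pod annotations, here is a purely hypothetical sketch of what such an annotation could look like. The annotation key and value below are illustrative assumptions, not the plugin's confirmed API; consult the plugin's documentation for the actual keys:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: interleaved-workload
  annotations:
    # hypothetical key and value, for illustration only
    memory-policy.nri.io/container.ctr0: interleave
spec:
  containers:
  - name: ctr0
    image: my-app    # hypothetical image
```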

What's Changed

  • resmgr,config: allow configuring cache allocation via goresctrl. by @klihub in #541
  • resmgr: expose RDT metrics. by @klihub in #543
  • Balloons with components by @askervin in #526
  • topology-aware: try picking resources by hints first by @klihub in #545
  • memory-policy: NRI plugin for setting memory policy by @askervin in #517
  • mempolicy: go interface for set_mempolicy and get_mempolicy syscalls by @askervin in #514
  • mpolset: get/set memory policy and exec a command by @askervin in #515
  • topology-aware: fix format of container-exported memsets. by @klihub in #532
  • resmgr: update container-exported resource data. by @klihub in #537
  • sysfs: add a helper for gathering whatever IDs related to CPUs by @askervin in #513
  • sysfs: fix CPU.GetCaches() to not return empty slice. by @klihub in #533
  • sysfs: export CPUFreq.{Min,Max}. by @klihub in #534
  • helm: add Chart for memory-policy deployment by @askervin in #519
  • go.{mod,sum}: use new goresctrl tag v0.9.0. by @klihub in #544
  • Drop tools.go in favor of native tool directive support in go 1.24 by @fmuyassarov in #535
  • golang: bump go version to 1.24[.3]. by @klihub in #528

Full Changelog: v0.9.4...v0.10.0

v0.9.4

14 Apr 07:43

This is a new minor release of NRI Reference Plugins. It fixes incorrect caching of Pod Resource API query results, which in some cases could result in incorrectly generated topology hints.

What's Changed

  • resmgr: purge cached pod resource list upon pod stop/removal. by @klihub in #507
  • github: explicitly ensure contents-only copying by @fmuyassarov in #508

Full Changelog: v0.9.3...v0.9.4

v0.9.3

02 Apr 11:28

This is a new minor release of NRI Reference Plugins. It brings several new features, a number of bug fixes, new end-to-end tests, and improved test coverage.

What's New

Balloons Policy

  • Cluster-level visibility into CPU affinity. The configuration option agent.nodeResourceTopology: true enables observing balloons as zones in NodeResourceTopology custom resources. Furthermore, if showContainersInNrt: true is set, information on each container, including its CPU affinity, is shown as a subzone of its balloon.

    Example configuration:

    showContainersInNrt: true
    agent:
      nodeResourceTopology: true

    This enables listing balloons and their cpusets on node K8SNODE with

    kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="balloon") | {"balloon":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'

    and containers with their cpusets on the same node:

    kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="allocation for container") | {"container":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'
  • System load balancing. Even if two containers run on disjoint sets of logical CPUs, they may nevertheless affect each other's performance. This happens, for instance, if two memory-intensive containers share the same level 2 cache, or if they are compute-intensive, use the same compute resources of a physical CPU core, and run on two hyperthreads of the same core.

    The new system load balancing in the balloons policy is based on classifying the loads generated by containers using the new loadClasses configuration option. Based on the load classes associated with balloonTypes via loads, the policy allocates CPUs to new and existing balloons so that it avoids overloading level 2 caches or physical CPU cores.

    Example: the policy prefers selecting CPUs for all "inference-engine" and "computational-fluid-dynamics" balloons within separate level 2 cache blocks, to prevent cache thrashing by any two containers in these balloons.

    balloonTypes:
    - name: inference-engine
      loads:
      - memory-intensive
      ...
    - name: computational-fluid-dynamics
      loads:
      - memory-intensive
    ...
    loadClasses:
    - name: memory-intensive
      level: l2cache

Topology Aware Policy

  • Improved topology hint control: the topologyhints.resource-policy.nri.io annotation key can be used to enable or disable topology hint generation for one or more containers altogether, or selectively for the mounts, devices, and pod resources hint types.
    For example:
metadata:
  annotations:
    # disable topology hint generation for all containers by default
    topologyhints.resource-policy.nri.io/pod: none
    # disable other than mount-based hints for the 'diskwriter' container
    topologyhints.resource-policy.nri.io/container.diskwriter: mounts
    # disable other than device-based hints for the 'videoencoder' container
    topologyhints.resource-policy.nri.io/container.videoencoder: devices
    # disable other than pod resource-based hints for the 'dpdk' container
    topologyhints.resource-policy.nri.io/container.dpdk: pod-resources
    # enable device and pod resource-based hints for 'networkpump' container
    topologyhints.resource-policy.nri.io/container.networkpump: devices,pod-resources

It is also possible to enable and disable topology hint generation based on mount or device path, using allow and deny lists. See the updated documentation for more details.

  • relaxed system topology restrictions: the policy no longer refuses to start up if a NUMA node is shared by more than one pool at the same topology hierarchy level. In particular, a single NUMA node shared by all sockets no longer prevents startup.

  • improved Burstable QoS class container handling: the policy now allocates memory to burstable QoS class containers based on memory request estimates. This should lower the probability for unexpected allocation failures when burstable containers are used to allocate a node close to full capacity.

  • better global shared allocation preference: a preferSharedCPUs: true global configuration option now applies to all containers, unless they are annotated to opt out using the prefer-shared-cpus.resource-policy.nri.io annotation.
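The global option and the per-container opt-out described above could be combined roughly as follows; the option and annotation names come from the text, while the container name lowlatency is made up for illustration:

```yaml
# Global topology-aware policy configuration: prefer shared CPUs for all containers
preferSharedCPUs: true
---
# Per-pod opt-out for one container (container name is hypothetical)
metadata:
  annotations:
    prefer-shared-cpus.resource-policy.nri.io/container.lowlatency: "false"
```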

Common Policy Improvements

  • Container cache and memory bandwidth allocation enables class-based management of system L2 and L3 cache and memory bandwidth. These are modeled as class-based, uncountable, shareable resources. Containers can be assigned to predefined classes of service (CLOSes), or RDT classes for short. Each class defines a specific configuration for cache and memory bandwidth allocation, which is applied to all containers within that class. The assigned container class is resolved and mapped to a CLOS in the runtime using the goresctrl library. RDT control must be enabled in the runtime and the assigned classes must be defined in the runtime configuration; otherwise the runtime might fail to create containers that are assigned to an RDT class. Refer to the containerd, cri-o, and goresctrl documentation for more details about configuration.

A container can be assigned either to an RDT class matching its pod's QoS class (BestEffort, Burstable or Guaranteed), or it can be assigned to an arbitrary class using the rdtclass.resource-policy.nri.io annotation. To enable QoS-class based default assignment you can use a configuration fragment similar to this:

apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy # or 'BalloonsPolicy' for the 'balloons' policy
metadata:
  name: default
spec:
  ...
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true

RDT class assignment is also possible using annotations. For instance, the following assigns the packetpump container to the highprio class and the scheduler container to the midprio class; any other container in the pod is assigned to the class matching its pod's QoS class:

metadata:
  annotations:
    rdtclass.resource-policy.nri.io/container.packetpump: highprio
    rdtclass.resource-policy.nri.io/container.scheduler: midprio
  • Container block I/O prioritization allows class-based control of block I/O prioritization and throttling. Containers can be assigned to predefined block I/O classes. Each class defines a specific configuration of prioritization and throttling parameters, which is applied to all containers assigned to the class. The assigned container class is resolved and mapped to actual parameters in the runtime using the goresctrl library. Block I/O control must be enabled in the runtime and the classes must be defined in the runtime configuration; otherwise the runtime fails to create containers that are assigned to a block I/O class. Refer to the containerd, cri-o, and goresctrl documentation for more details about configuration.

A container can be assigned either to a Block I/O class matching its pod's QoS class (BestEffort, Burstable or Guaranteed), or it can be assigned to an arbitrary class using the blockioclass.resource-policy.nri.io annotation. To enable QoS-class based default assignment you can use a configuration fragment similar to this:

apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy
metadata:
  name: default
spec:
  ...
  control:
    blockio:
      enable: true
      usePodQoSAsDefaultClass: true

Class assignment is also possible using annotations. For instance, the following assigns the database container to the highprio class and the logger container to the lowprio class; any other container in the pod is assigned to the class matching its pod's QoS class:

metadata:
  annotations:
    blockioclass.resource-policy.nri.io/container.database: highprio
    blockioclass.resource-policy.nri.io/container.logger: lowprio

What's Changed

  • balloons: do not require minFreq and maxFreq in CPU classes by @askervin in #455
  • balloons: expose balloons and optionally containers with affinity in NRT by @askervin in #469
  • balloons: introduce loadClasses for avoiding unwanted overloading in critical locations by @askervin in #493
  • topology-aware: exclude isolated CPUs from policy-picked reserved cpusets. by @klihub in #474
  • topology-aware: rework building the topology pool tree. by @klihub in #477
  • topology-aware: allocate burstable container memory by requests. by @klihub in #491
  • topology-aware: better semantics for globally configured shared CPU preference. by @klihub in #498
  • topology-aware: more consistent setup error handling. by @klihub in #502
  • memtierd: allow overriding go version for image build. by @klihub in #456
  • resmgr: improve annotated topology hint control. by @klihub in #499
  • resmgr: eliminate extra container state 'overlay'. by @klihub in https://github.com/containers/nr...

v0.8.0

18 Dec 08:15

This is a new major release of NRI Reference Plugins. It brings several new features, a number of bug fixes, improvements to the build system, to CI, end-to-end tests, and test coverage.

What's New

Balloons Policy

  • New preserve policy option enables matching containers whose CPU
    and memory affinity must not be modified by the resource policy.

    This allows selected containers to access all CPUs and
    memories. For example, to allow pcm-sensor-server
    to access MSRs on every CPU for low-level metrics:

    preserve:
      matchExpressions:
        - key: pod/labels/app.kubernetes.io/name
          operator: In
          values:
            - pcm-sensor-server
    

    Earlier this required cpu.preserve.resource-policy.nri.io and
    memory.preserve.resource-policy.nri.io pod annotations.

  • New freqGovernor CPU class option enables setting CPU frequency
    governor based on the CPU class of a balloon. Example:

    balloonTypes:
    - name: powersaving
      cpuClass: mypowersave
    control:
      cpu:
        classes:
          mypowersave:
            freqGovernor: powersave
    
  • New memoryTypes balloon type option specifies required memory
    types when setting memory affinity. For example, containers in
    high-memory-bandwidth balloons will use only HBM when configured as:

    balloonTypes:
    - name: high-memory-bandwidth
      memoryTypes:
      - HBM
    
  • Support memory-type.resource-policy.nri.io pod annotation for
    setting memory affinity into closest HBM, DRAM, PMEM, or any
    combination. This annotation is a pod level override to the
    memoryTypes balloon type option.
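Following the container-scoped annotation format used elsewhere in these notes, such an override might look like this (the container name ctr0 is hypothetical):

```yaml
metadata:
  annotations:
    # pin memory of container ctr0 to the closest HBM and DRAM nodes
    memory-type.resource-policy.nri.io/container.ctr0: hbm,dram
```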

  • L2-cache group aware CPU allocation and sharing. For example,
    containers in a balloon can be allowed to burst on idle
    (unallocated) CPUs that share the same L2 cache as CPUs allocated to
    the balloon.

    balloonTypes:
    - name: l2burst
      shareIdleCPUsInSame: l2cache
    
  • The pinMemory policy option can now be overridden at the balloon
    type level. This enables setting memory affinity of containers only
    in certain balloons while leaving others unset, and vice versa. Example:

    pinMemory: false
    balloonTypes:
    - name: latency-sensitive
      pinMemory: true
      preferIsolCpus: true
      preferNewBalloons: true
    
  • New default configuration runs Guaranteed containers on dedicated
    CPUs while BestEffort and Burstable containers are allowed to share
    remaining CPUs on the same socket, but not cross socket boundaries.

  • Balance BestEffort containers between balloons with an equal amount
    of available resources.

  • Smaller risk of OOMs with pinMemory: true, as memory affinity was
    refactored to use the smarter libmem library.

Topology Aware Policy

The Topology Aware policy can now export Prometheus metrics per topology zone. Exported metrics include the pool CPU set and memory set; the shared CPU subpool's total capacity, allocations, and available capacity; the memory total capacity, allocations, and available amount; and the number of assigned containers and containers in the shared subpool.

To enable exporting these metrics, make sure that you are running with the latest policy configuration custom resource definition and you have policy included in the spec/instrumentation/metrics/enabled slice, like this:

...
spec:
...
  instrumentation:
  ...
    metrics:
      enabled:
      - policy
...

The Topology Aware policy can now use data from the kubelet's Pod Resource API to generate extra topology hints for resource allocation and alignment. These hints are disabled in the default configuration installed by Helm charts. To enable them, make sure that you are running with the latest policy configuration custom resource definition and you have spec/agent/podResourceAPI set to true in the configuration, like this:

spec:
  agent:
    ...
    podResourceAPI: true
...
  • Support memory-type.resource-policy.nri.io pod annotation for
    setting memory affinity into closest HBM, DRAM or PMEM, or any
    combination.

What's Changed

Balloons Policy Fixes and Improvements

  • balloons: add "preserve" option to match containers whose pinning must not be modified by @askervin in #368
  • balloons: add support for cpu frequency governor tuning by @fmuyassarov in #374
  • balloons: set frequency scaling governor only when requested by @fmuyassarov in #379
  • balloons: improve handling of containers with no CPU requests by @askervin in #386
  • balloons: add debug logging to selecting a balloon type by @askervin in #396
  • balloons: support for L2 cache cluster allocation by @askervin in #384
  • balloons: add memoryTypes to balloon types by @askervin in #395
  • Add balloon type specific pinMemory option by @askervin in #451

Topology Aware Policy Fixes and Improvements

  • metrics: add topology-aware policy metrics collection. by @klihub in #406
  • topology-aware: correctly reconfigure implicit affinities for configuration changes. by @klihub in #394
  • fixes: copy assigned memory zone in grant clone. by @klihub in #413

New Policy Agnostic Metrics, Common De Facto Exporters

  • metrics: cleanup metrics registration, collection and gathering. by @klihub in #403
  • metrics: add de-facto standard collectors. by @klihub in #404
  • metrics: simplify policy/backend metrics collection interface. by @klihub in #408
  • metrics: add policy system collector. by @klihub in #405

Topology Hints Based on Pod Resource API

  • podresapi: agent,config,helm: make agent runtime configurable. by @klihub in #418
  • podresapi: resmgr,agent: generate topology hints from Pod Resource API. by @klihub in #419
  • podresapi: topology-aware: use Pod Resource API hints if present. by @klihub in #420
  • agent,resmgr: merge PodResources{List,Map}, cache last List() result. by @klihub in #423

Common Resource Management Fixes and Improvements

  • resmgr: fix "qosclass" in policy expressions by @askervin in #387
  • resmgr,agent: propagate startup config error back to CR. by @klihub in #416
  • libmem: implement policy-agnostic memory allocation/accounting. by @klihub in #332
  • libmem: typo and thinko fixes. by @klihub in #381
  • sysfs: enable faking CPU cache configurations using OVERRIDE_SYS_CACHES by @askervin in #383
  • cpuallocator, plugins: handle priority as an option. by @klihub in #414
  • Fix typos in expression code doc and matchExpression yamls by @askervin in #370

Helm Chart and Configuration Fixes and Improvements

  • helm: enable prometheus autodiscovery by @klihub in #393
  • helm: new balloons default configuration by @askervin in #391
  • apis/config: use consistent assignment in +kubebuilder:validation tags. by @klihub in #397
  • sample-configs: fix a copy-pasted comment thinko. by @klihub in #402

End-to-end Testing Fixes and Improvements

  • e2e: pull and save runtime logs after each test. by @klihub in #367
  • e2e: adjust metrics test for updated PrettyName(). by @klihub in #366
  • e2e: switch default test distro to fedora/40-cloud-base. by @klihub in #375
  • e2e: fix provisioning for Ubuntu cloud image. by @klihub in #377
  • e2e: enable vagrant debugging. by @klihub in #376
  • e2e: adjust $VM_HOSTNAME for policy node config usage. by @klihub in #378
  • e2e: skip long running tests by default. by @klihub in #373
  • e2e: fix command filenames in test output directories by @askervin in #390
  • e2e: containerd 2.0.0. provisioning fixup. by @klihub in #400
  • e2e/balloons: remove unknown/unused helm-launch argument. by @klihub in #407

Build Environment Fixes and Improvements

  • build: enable building debug binaries and images by @askervin in #388
  • build: update controller-tools to v0.16.5. by @klihub in #398
  • build: enable race-detector in DEBUG=1 builds. by @klihub in #409
  • build: enable race-detector in image build, too. by @klihub in #410
  • d...

v0.7.1

23 Sep 08:21

This release of NRI Reference Plugins brings new features, a few bug fixes, and updates to the documentation.

Highlights

  • balloons policy now supports assigning kernel-isolated CPU cores to balloons when available. To prefer isolated CPU cores for a balloon, use the new preferIsolCpus boolean configuration option. For instance,
balloonTypes:
  - name: high-prio-physical-core
    minCPUs: 2
    maxCPUs: 2
    preferNewBalloons: true
    preferIsolCpus: true
    hideHyperthreads: true
...
  • balloons policy now supports assigning performance optimized or energy efficient CPU cores to balloons when available. For instance, to define a balloon with energy efficient core preference and another one with performance core preference use the new preferCoreType configuration option like this:
balloonTypes:
  - name: low-prio
    namespaces:
      - logging
      - monitoring
    preferCoreType: efficient
...
  - name: high-prio
    preferCoreType: performance
...
  • Topology-aware policy now allocates CPU cores in clusters sharing the last-level cache. Whenever this yields a different grouping than the rest of the topology, for instance with hyperthreads, the CPU allocator now divides cores into groups defined by shared last-level cache. The topology-aware policy tries to allocate as few LLC groups to a container as possible, and tries to avoid sharing an LLC group among multiple containers.

What's New

  • balloons: add support for isolated cpus. by @fmuyassarov in #344
  • balloons: add support for power efficient & high performance cores by @fmuyassarov in #354
  • cpuallocator: implement clustered allocation based on cache groups. by @klihub in #343

What Changed

Resource assignment policies should now try harder to detect when a new container is a restarted instance of an existing container which has just exited or crashed. This should fix problems where a crashing container could not be restarted on a nearly fully allocated node.

  • deps: bump NRT dependencies to v0.1.2. by @fmuyassarov in #348
  • topology-aware: add missing SingleThreadForCPUs() to mockSysfs. by @klihub in #349
  • balloons: add support for isolated cpus. by @fmuyassarov in #344
  • cpuallocator: implement clustered allocation based on cache groups. by @klihub in #343
  • fixes: fix host-wait-vm-ssh-server, improve vm-reboot. by @klihub in #350
  • fix: clean up plugin at the beginning/end of tests. by @klihub in #351
  • doc: add availableResources in the balloons policy documentation by @askervin in #355
  • build: allow building a single plugin image. by @klihub in #357
  • balloons: add support for power efficient & high performance cores by @fmuyassarov in #354
  • e2e: fix cni_plugin=bridge in provisioning a vm by @askervin in #359
  • e2e: bridge CNI setup fixes for Fedora/containerd. by @klihub in #361
  • e2e: use bridge CNI plugin by default. by @klihub in #362
  • CI: verify in smaller steps, verify binary builds. by @klihub in #364
  • resmgr: lifecycle overlap detection and workaround. by @klihub in #358

Full Changelog: v0.7.0...v0.7.1

v0.7.0

03 Jul 07:30

This release of NRI Reference Plugins brings in new features and important bug fixes.

Highlights

  • Topology-aware and balloons resource policies now support soft-disabling of hyperthreads per container. This improves the performance of some classes of workloads. Both policies support a new pod annotation:
    hide-hyperthreads.resource-policy.nri.io/container.<CONTAINER-NAME>: "true"
    
    and the balloons policy has a new balloon-type option, hideHyperthreads, that soft-disables hyperthreads for all containers assigned to a balloon of this type.
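A minimal balloon-type sketch using this option might look like the following; the balloon name is illustrative:

```yaml
balloonTypes:
- name: exclusive-cores
  hideHyperthreads: true
```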
  • The topology-aware policy supports pinning containers to high-bandwidth memory (HBM), or both HBM and DRAM, when pods are annotated with
    memory-type.resource-policy.nri.io/container.<CONTAINER-NAME>: hbm
    memory-type.resource-policy.nri.io/container.<CONTAINER-NAME>: hbm,dram
    
  • Automatic hardware topology hint generation has been fixed in the topology-aware policy. For instance, if a container uses a PCI device, the policy prefers pinning the container to CPUs and memory that are close to the device.

What's New

  • balloons: hideHyperthreads balloon type option and annotation by @askervin in #338
  • topology-aware: add support for hide-hyperthreads annotation. by @askervin in #331

What Changed

  • topology-aware: don't ignore HBM memory nodes without close CPUs. by @klihub in #329
  • topology-aware: relax NUMA node topology checks. by @klihub in #336
  • resmgr: exit when ttrpc connection goes down. by @klihub in #319
  • cpuallocator: don't filter based on single CoreKind. by @klihub in #345
  • sysfs,cpuallocator: fix CPU cluster discovery. by @klihub in #337
  • sysfs: survive NUMA nodes without memory. by @klihub in #339
  • sysfs: allow non-uniform thread count. by @klihub in #340
  • helm: flip podPriorityClassNodeCritical to true. by @klihub in #312
  • config-manager: allow configuring NRI timeouts. by @klihub in #318

New Contributors

Full Changelog: v0.5.0...v0.7.0

v0.5.1

29 Mar 15:54

This release of the NRI Reference Plugins brings a few improvements to hardware topology detection and resource assignment.

What's New

  • cpuallocator: topology discovery fixes and improvements. by @klihub in #206
  • cpuallocator: add support for hybrid core discovery, preferred allocation. by @klihub in #295
  • topology-aware: configurable allocation priority by @klihub in #282
  • resmgr: enable opentelemetry tracing (span propagation) over the NRI ttrpc connection. by @klihub in #293

Updates, Fixes, and Other Improvements

  • sysfs: dump system discovery results in a more predictable order. by @klihub in #294
  • github: package and publish interim unstable Helm charts from the main and release branches by @marquiz, @klihub in #303

Full Changelog: v0.4.1...v0.5.1

v0.4.1

16 Mar 14:48
659d042

This major release of the NRI Reference Plugins brings new features to a few plugins, numerous other smaller improvements, and several bug fixes.

Highlights

  • balloons policy: add groupBy balloon type option
    Group containers into the same balloon instance if their groupBy expressions evaluate to the same group. For example, the following expression prefers assigning all containers in a pod to a balloon that already contains containers from the same namespace with the same nsballoon pod label value:
  ...
  balloonTypes:
    - name: my-pods
      groupBy: ${pod/namespace}-${pod/labels/nsballoon}
  ...

If there is no such balloon, or if such instances do not have enough CPUs, finding a suitable balloon continues as before: assign to some other existing balloon, or create a new balloon if that is preferred.

  • balloons policy: add balloon matchExpressions option
    Assign containers to balloon instances using balloon match expressions, similar to the affinity expressions of the topology-aware policy. Expressions are evaluated for containers that are not explicitly assigned to any balloon by an annotation. If an expression matches a container, the container is assigned to an instance of the corresponding balloon. For instance, the following matchExpressions entry assigns all containers with matching pod names to the associated balloon:
  ...
  balloonTypes:
    - name: my-pods
      matchExpressions:
        - key: pod/name
          operator: MatchesAny
          values: [ myPod*, nginx* ]
  ...

What's New

  • balloons: implement groupBy option by @askervin in #278
  • balloons: allow assigning containers to balloons by runtime-evaluated expressions by @klihub in #260
  • balloons policy: more regular built-in balloons, treat them much like user-defined ones
    Built-in reserved and default balloon types are no longer special cases. They can be configured with the same parameters as user-defined balloons.
  • balloons: support preserving CPU and memory pinning by @askervin in #257
  • topology-aware: support preserving CPU and memory pinning by @askervin in #258
  • feat(helm): Introduce priorityClassName system-node-critical by @ffuerste in #220
  • helm: allow setting NRI plugin index via values by @klihub in #227
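The "more regular built-in balloons" item above means the built-in reserved and default balloon types can now be configured with the same parameters as user-defined types. A minimal sketch, reusing only options shown elsewhere in these notes (the chosen option values are illustrative):

```yaml
# Sketch: the built-in "reserved" and "default" balloon types configured
# with regular balloon-type parameters. The values are illustrative.
balloonTypes:
  - name: reserved
    hideHyperthreads: true
  - name: default
    groupBy: ${pod/namespace}
```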

Updates, Fixes, and Other Improvements

  • balloons: fix the order of assigning containers into balloons by @askervin in #273
  • balloons: fix logged balloon name by @klihub in #259
  • balloons & topology-aware policies: better handling of UpdateContainer[Resources] requests
    Fill in missing bits in partial container resource updates from the current resource assignment. Filter out redundant resource updates without invoking the policy.
  • memtierd: update the nri-memtierd plugin to use memtierd v0.1.1 by @askervin in #287
  • operator: ensure to kustomize operator manifests before local deployment by @fmuyassarov in #240
  • resmgr: better expression validation, cleaner key resolution by @klihub in #256
  • resmgr: inject mount before container state update by @klihub in #223
  • resmgr: log containers by pretty name during startup by @klihub in #245
  • instrumentation: fix resource creation, use parent-based sampler by @klihub in #233
  • instrumentation: allow proper reconfiguration of tracing by @klihub in #234
  • cache: support annotations to preserve CPU and memory pinning by @askervin in #249
  • cache, resmgr: expose key evaluation, implement key substitution by @klihub in #277
  • cache: fix generated pod scope and simple affinity expressions by @klihub in #285
  • cache: store creation time of pod and containers cache objects by @askervin in #272
  • topology-aware: log resource operations at info level by @klihub in #252
  • doc: clarify selecting balloon type by @askervin in #281
  • doc: more consistent terminology in balloons documentation by @askervin in #269
  • fixes: rename default config group label, support/fall back to deprecated labels. by @klihub in #231

New Contributors

Full List of Merged PRs

For a full list of changes see v0.3.2...v0.4.1

v0.3.2

29 Dec 10:28

This patch release fixes image versioning for the operator.

What's New

  • operator: point containerImage tag to the latest release (v0.3.2) by @fmuyassarov in #218