@zstas zstas commented Sep 22, 2025

Co-developed-by: Jarrod Baumann [email protected]


sayboras and others added 30 commits September 11, 2025 08:22
The currently accepted values for KPR are true and false only, so any other
value will break helm install even if upgradeCompatibility is set to an
older version (e.g. 1.8, 1.9, 1.12).

Signed-off-by: Tam Mach <[email protected]>
Documents the request-timeout ingress annotation.

Signed-off-by: iofq <[email protected]>
…Blocks

cilium#41616 found a data race in setting Block.predict. Block was never
meant to be changed during reachability analysis; all bookkeeping must be done
out-of-band (like the live bitmap).

This test reliably triggers the race (and others like it) when run with -race.

Signed-off-by: Timo Beckers <[email protected]>
Block should not be modified as it's shared between users of (copies of) a
CollectionSpec. Store prediction results out-of-band instead.

Changing visitBlock() to take a Block instead of a *Block was considered
in addition, but proved too costly due to the amount of copying required.
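
As a rough illustration of the out-of-band approach (not the actual analyzer code), the sketch below keeps per-analysis results in a side table keyed by block ID, so concurrent analyses only ever read the shared blocks. The `Block`, `predictions` and `analyze` names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Block is a hypothetical stand-in for a basic block shared between users.
type Block struct {
	ID int
}

// predictions stores per-analysis results keyed by block ID, so the shared
// Blocks themselves are never written to.
type predictions map[int]bool

// analyze records a result for every block without mutating any of them.
func analyze(blocks []*Block, live func(*Block) bool) predictions {
	p := make(predictions, len(blocks))
	for _, b := range blocks {
		p[b.ID] = live(b)
	}
	return p
}

func main() {
	shared := []*Block{{ID: 0}, {ID: 1}, {ID: 2}}

	var wg sync.WaitGroup
	results := make([]predictions, 2)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine applies its own criterion and only reads the blocks.
			results[i] = analyze(shared, func(b *Block) bool { return b.ID%2 == i })
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0], results[1])
}
```

Running a program shaped like this with -race should stay quiet, since each goroutine writes only to its own map and its own slice element.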

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/bpf/analyze
cpu: AMD Ryzen 7 3700X 8-Core Processor
                 │   old.txt   │              new.txt              │
                 │   sec/op    │   sec/op     vs base              │
ComputeBlocks-16   642.5µ ± 1%   645.0µ ± 2%       ~ (p=0.699 n=6)
Reachability-16    31.39µ ± 4%   33.83µ ± 2%  +7.78% (p=0.002 n=6)
geomean            142.0µ        147.7µ       +4.02%

                 │   old.txt    │              new.txt               │
                 │     B/op     │     B/op      vs base              │
ComputeBlocks-16   372.6Ki ± 0%   372.6Ki ± 0%  +0.01% (p=0.002 n=6)
Reachability-16    8.172Ki ± 0%   8.328Ki ± 0%  +1.91% (p=0.002 n=6)
geomean            55.18Ki        55.71Ki       +0.96%

                 │   old.txt   │               new.txt                │
                 │  allocs/op  │  allocs/op   vs base                 │
ComputeBlocks-16   8.054k ± 0%   8.054k ± 0%        ~ (p=1.000 n=6) ¹
Reachability-16     3.000 ± 0%    4.000 ± 0%  +33.33% (p=0.002 n=6)
geomean             155.4         179.5       +15.47%
¹ all samples are equal

Signed-off-by: Timo Beckers <[email protected]>
This change introduces a new feature that allows tracing IPv4 packets that carry
a trace ID embedded in an IP option. Code changes include creating the
feature flag, the option parsing, and the BPF map that stores the trace ID.

The following changes are included:

- A new feature-gate, `ip-tracing-option-type`, is added to enable and configure
the IP option type to be used for tracing.
- Helper functions are implemented to parse IPv4 options and extract the trace ID.
- Logic is added to save the parsed IPv4 options into a per-CPU array map, which
can then be used by other parts of the system. 
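
The real parsing lives in BPF C; the Go sketch below only illustrates the general shape of walking the IPv4 options area. The option type 222 and the "payload is a big-endian integer of at most 8 bytes" layout are assumptions made for the example, not taken from this change.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	optEnd = 0 // End of Option List (single byte)
	optNop = 1 // No Operation (single byte)
)

var errMalformed = errors.New("malformed IPv4 options")

// traceID walks the IPv4 options area (the header bytes after the fixed 20)
// and returns the payload of the first option whose type equals traceType,
// read as a big-endian integer. The payload layout is an assumption.
func traceID(opts []byte, traceType byte) (uint64, bool, error) {
	for i := 0; i < len(opts); {
		switch opts[i] {
		case optEnd:
			return 0, false, nil
		case optNop:
			i++
			continue
		}
		if i+1 >= len(opts) {
			return 0, false, errMalformed
		}
		l := int(opts[i+1]) // length covers the type, length and data bytes
		if l < 2 || i+l > len(opts) {
			return 0, false, errMalformed
		}
		if opts[i] == traceType {
			data := opts[i+2 : i+l]
			if len(data) == 0 || len(data) > 8 {
				return 0, false, errMalformed
			}
			var id uint64
			for _, b := range data {
				id = id<<8 | uint64(b)
			}
			return id, true, nil
		}
		i += l
	}
	return 0, false, nil
}

func main() {
	// NOP, then a hypothetical option type 222 carrying the 2-byte ID 0x1234.
	opts := []byte{optNop, 222, 4, 0x12, 0x34, optEnd, 0, 0}
	id, ok, err := traceID(opts, 222)
	fmt.Println(id, ok, err)
}
```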

Signed-off-by: Ben Bigdelle <[email protected]>
This change introduces the ability to parse IP options in ingress BPF
programs. This is a prerequisite for implementing IP-based tracing on
the ingress path. If the feature is enabled, and a trace ID exists for a
packet, it is stored into the BPF map to be used in event messages.

- Implemented parsing logic for IP options in ingress programs.
- Store extracted IP options into per-CPU array map.

Signed-off-by: Ben Bigdelle <[email protected]>
This change refactors the drop notify tests to be version-aware. This
is a preparatory step to allow for the introduction of new fields to
the `DropNotify` struct in a backward-compatible manner.

The tests are updated to:

- Define separate test cases for different versions of the `DropNotify`
struct.

Signed-off-by: Ben Bigdelle <[email protected]>
This change extends the `DropNotify` struct to include the IP trace ID.

The following changes are included:
- The `DropNotify` struct in the control plane is updated to include
the `IPTraceID` field.
- The BPF code is updated to check whether a trace ID is stored in the
BPF map and, if so, to populate the new field with it.
- The `cilium-monitor` output is updated to display the IP trace ID when
present in a drop notify message.

Signed-off-by: Ben Bigdelle <[email protected]>
This change extends the `TraceNotify` struct to include the IP trace ID.

The following changes are included:
- The `TraceNotify` struct in the control plane is updated to include
the `IPTraceID` field.
- When a TraceNotify event is created, the BPF code checks whether an IP trace
ID is stored in the BPF map and, if so, populates it in the message.
- The `cilium-monitor` output is updated to display the IP trace ID when
present in a trace notify message.

Signed-off-by: Ben Bigdelle <[email protected]>
This change introduces the IPTraceID field to the Hubble protobuf. This allows
IP-based tracing information to be propagated and associated with flows
observed by Hubble.

The following changes are included:
- A new `IPTraceID` message type is defined in `flow.proto`, containing
the trace ID and the IP option type.
- The Hubble parser is updated to decode the IP trace ID from monitor
events (both drop and trace notifications) and populate the `ip_trace_id`
field in the resulting `Flow` message.
- The Hubble printer is updated to display the IP trace ID in the
output.

Signed-off-by: Ben Bigdelle <[email protected]>
This change introduces the ability to filter Hubble flows by IP trace ID directly
from the Hubble CLI.

The following changes are included:
- A new `--ip-trace-id` flag is added to the `hubble observe` command,
which can be specified multiple times to filter for multiple trace IDs.
- A new `IPTraceIDFilter` is implemented to perform the filtering logic based
on the provided trace IDs.
- The `IPTraceIDFilter` is added to the list of default filters.
- The help text for the `hubble observe` command is updated to include the new flag.
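
To make the filtering semantics concrete, here is a minimal sketch of matching flows against a set of trace IDs, as a repeatable flag would supply them. `Flow` and `filterByTraceID` are hypothetical stand-ins and not the Hubble filter API.

```go
package main

import "fmt"

// Flow is a hypothetical, trimmed-down stand-in for a Hubble flow.
type Flow struct {
	Summary   string
	IPTraceID uint64
}

// filterByTraceID keeps the flows whose trace ID matches any wanted ID,
// mirroring a flag that may be specified multiple times.
func filterByTraceID(flows []Flow, wanted ...uint64) []Flow {
	set := make(map[uint64]struct{}, len(wanted))
	for _, id := range wanted {
		set[id] = struct{}{}
	}
	var out []Flow
	for _, f := range flows {
		if _, ok := set[f.IPTraceID]; ok {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	flows := []Flow{
		{Summary: "pod-a -> pod-b", IPTraceID: 7},
		{Summary: "pod-a -> pod-c", IPTraceID: 9},
		{Summary: "pod-b -> pod-c", IPTraceID: 11},
	}
	for _, f := range filterByTraceID(flows, 7, 11) {
		fmt.Println(f.Summary, f.IPTraceID)
	}
}
```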

Signed-off-by: Ben Bigdelle <[email protected]>
We stored the port name in the frontend mapping struct,
but we didn't add it to the frontend params.

If we have this type of LRP:

apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "lrp-addr"
spec:
  redirectFrontend:
    addressMatcher:
      ip: "169.254.169.254"
      toPorts:
        - port: "8080"
          name: "test"
          protocol: TCP
        - port: "8081"
          name: "test1"
          protocol: TCP
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        app: proxy
    toPorts:
      - port: "80"
        name: "test"
        protocol: TCP
      - port: "81"
        name: "test1"
        protocol: TCP
and this pod:

apiVersion: v1
kind: Pod
metadata:
  name: lrp-pod
  labels:
    app: proxy
spec:
  containers:
    - name: lrp-pod
      image: nginx
      ports:
        - containerPort: 80
          name: test
          protocol: TCP
        - containerPort: 81
          name: test1
          protocol: TCP
we end up with every frontend mapped to every backend port:
6    169.254.169.254:8080/TCP   LocalRedirect   1 => 10.244.1.75:80/TCP (active)
                                                2 => 10.244.1.75:81/TCP (active)
7    169.254.169.254:8081/TCP   LocalRedirect   1 => 10.244.1.75:80/TCP (active)
                                                2 => 10.244.1.75:81/TCP (active)

With this PR, each frontend gets the correct backend:

8    169.254.169.254:8080/TCP   LocalRedirect   1 => 10.244.1.30:80/TCP (active)
9    169.254.169.254:8081/TCP   LocalRedirect   1 => 10.244.1.30:81/TCP (active)
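
The intended pairing can be sketched as a lookup by port name. The types and the `backendsFor` helper below are hypothetical and only model the behaviour shown in the tables above, not the actual LRP controller code.

```go
package main

import "fmt"

// Port is a hypothetical named port, used for frontends and backends alike.
type Port struct {
	Name string
	Port uint16
}

// backendsFor returns only the backend ports whose name matches the
// frontend's port name.
func backendsFor(frontend Port, backends []Port) []Port {
	var out []Port
	for _, b := range backends {
		if b.Name == frontend.Name {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	frontends := []Port{{Name: "test", Port: 8080}, {Name: "test1", Port: 8081}}
	backends := []Port{{Name: "test", Port: 80}, {Name: "test1", Port: 81}}
	for _, fe := range frontends {
		fmt.Println(fe.Port, "=>", backendsFor(fe, backends))
	}
}
```

Without the name on the frontend side, the lookup has nothing to match on, which is consistent with every frontend receiving both backend ports.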

Signed-off-by: Liyi Huang <[email protected]>
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
We already had logic to prune statically unreachable tail calls, tail
calls which were unreachable due to compile time macros. However, we
did not prune tail calls which were unreachable due to load time config.

This commit updates the existing logic to also prune tail calls which
are unreachable due to load time config. This should allow us to migrate
macros that control reachability of tail calls to load time config.

Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: Timo Beckers <[email protected]>
Move the logic for pruning unused tail calls from collection.go to
a new file.

Signed-off-by: Dylan Reimerink <[email protected]>
The `LoadCollectionSpec` function was a wrapper around
`ebpf.LoadCollectionSpec` that additionally did the unused tail call
pruning. Now that this functionality has been moved into the
`LoadCollection` function, this wrapper is no longer needed.

Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
Fixes CI flake cilium#41550

With multiple NodeUpdate/NodeDelete events firing at once, the order
in which the state is committed to the file is non-deterministic.

Instead of deleting the checkpoint file and waiting for the correct
state to be written back, we poll the file, reading until the correct
state is found or timeout.

Signed-off-by: Charlie Kenney <[email protected]>
Update procedure of Cilium Open Source installation on Rancher managed
cluster using HelmCharts and Helm Operator during bootstrapping
of the cluster.

Signed-off-by: Filip Wardzichowski <[email protected]>
Matching port numbers on different L4 protocols doesn't work as
expected. This commit fixes it by grouping all ports on a pod
and producing the service, backend and frontend links from that group
instead of doing it on a per-port, per-container basis.
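
One way to picture the grouping is an index keyed by both port number and protocol, so that the same number on TCP and UDP stays distinct. This is a sketch with hypothetical types, not the code from this commit.

```go
package main

import "fmt"

// ContainerPort is a hypothetical, simplified container port.
type ContainerPort struct {
	Name     string
	Port     uint16
	Protocol string
}

// portKey identifies a port by number and L4 protocol together.
type portKey struct {
	Port     uint16
	Protocol string
}

// groupPorts collects the ports of all containers of a pod into one index.
func groupPorts(containers [][]ContainerPort) map[portKey]ContainerPort {
	idx := make(map[portKey]ContainerPort)
	for _, ports := range containers {
		for _, p := range ports {
			idx[portKey{Port: p.Port, Protocol: p.Protocol}] = p
		}
	}
	return idx
}

func main() {
	pod := [][]ContainerPort{
		{{Name: "dns-tcp", Port: 53, Protocol: "TCP"}},
		{{Name: "dns-udp", Port: 53, Protocol: "UDP"}},
	}
	idx := groupPorts(pod)
	fmt.Println(len(idx), idx[portKey{Port: 53, Protocol: "UDP"}].Name)
}
```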

Signed-off-by: Bernardo Soares <[email protected]>
Use a single command line for both cilium agent and cilium operator.

Signed-off-by: André Martins <[email protected]>
This removes the usage of GlobalServiceCache in the agent, which was only useful
for counting the number of global Services. That count didn't account for the
local cluster and was thus misleading. While the performance impact was not
tested, this removes the management of two levels of nested maps and a global
lock taken on each remote endpoint update, which should be a worthwhile saving.

The global services count reported through cilium-dbg and the CLI
is no longer supported or exposed. Users with an older version of the CLI will
always see a count of 0. Global Service counts will continue to be
reported per cluster, alongside the counts of other resources.

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
Report per-cluster metrics using the watch store for endpoints, global
services and MCS service exports, as we already do for remote
nodes. Unfortunately, this doesn't include identities, which don't use
the watch store.

We no longer attempt to report the Global Service and Global Service
Export counts. Note that those global counts did not account for the local
cluster, which was misleading.

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
This was only needed for the upgrade from v1.17, and can now safely go away
in the v1.19 release.

Signed-off-by: Julian Wiedmann <[email protected]>
This was only needed for the upgrade from v1.17, and can now safely go away
in the v1.19 release.

Signed-off-by: Julian Wiedmann <[email protected]>
Ideally we're not using the ipcache to determine the source identity, but
fully rely on the identity transported via VNI.

Condense the inbound path a bit to make it clearer in which cases we
still require an ipcache lookup.

Signed-off-by: Julian Wiedmann <[email protected]>
… to v1.19.1

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
L4AddrFromString takes in a string formatted L4Addr (as in 443/tcp)
and returns an equivalent L4Addr struct
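
For illustration, a parser of this shape could look like the sketch below. The `L4Addr` fields, the `parseL4Addr` name and the upper-casing of the protocol are assumptions; the real struct and function may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// L4Addr is a hypothetical stand-in for the real struct.
type L4Addr struct {
	Protocol string
	Port     uint16
}

// parseL4Addr parses "<port>/<protocol>", for example "443/tcp".
func parseL4Addr(s string) (L4Addr, error) {
	portStr, proto, ok := strings.Cut(s, "/")
	if !ok || proto == "" {
		return L4Addr{}, fmt.Errorf("invalid L4 address %q: want <port>/<protocol>", s)
	}
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil {
		return L4Addr{}, fmt.Errorf("invalid port in %q: %w", s, err)
	}
	return L4Addr{Protocol: strings.ToUpper(proto), Port: uint16(port)}, nil
}

func main() {
	addr, err := parseL4Addr("443/tcp")
	fmt.Println(addr.Port, addr.Protocol, err)

	_, err = parseL4Addr("443")
	fmt.Println(err != nil)
}
```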

Signed-off-by: Bernardo Soares <[email protected]>
MrFreezeex and others added 25 commits September 22, 2025 19:42
The BPF programs will contain the SkipLB map definition regardless of
whether it is used or not. This can cause flakes in CI if the loader
races loading endpoint programs in parallel and each of them tries
to pin the skiplb maps. To avoid this, always open and pin the skiplb maps.

The error in logs was:

level=error msg="Error while reloading endpoint BPF program" ...
  error="loading eBPF collection into the kernel: map cilium_skip_lb6: pin map to .../cilium_skip_lb6: file exists"

Signed-off-by: Jussi Maki <[email protected]>
This is another goroutine forked by workqueue which we can't reliably wait for
from ShutDown().

Signed-off-by: Jussi Maki <[email protected]>
It is possible to update the local node to hold an inconsistent ENI state which prevents correct ENI device configuration.
This change makes setOwnNodeWithoutPoolUpdate handle local node updates consistently with how updateLocalNodeResource handles them.

Fixes: cilium#41626

Signed-off-by: Jason Aliyetti <[email protected]>
Some cleanup missed in commit:
Commit: e202a4e
Author: Louis DeLosSantos <[email protected]>
Date:   Wed Oct 30 11:01:50 2024 -0400

    ipsec: despecify decrypted overlay

We no longer use the EncryptedOverlayReqID constant anywhere in the
codebase to specify a reqid for overlay traffic. Remove this constant.

Signed-off-by: Louis DeLosSantos <[email protected]>
This is highly verbose and doesn't see enough use to be enabled with
debug. It can still be enabled explicitly.

Signed-off-by: David Bimmler <[email protected]>
There's a shared file in statedir/endpoint-policy.log that is written to
by all endpoints which have policy debug logging enabled. Prior to this
commit, they'd all allocate their own lumberjack.Logger wrapping a file
descriptor for this file. That's broken since a lumberjack logger tracks the
size of its own writes to determine when to rotate the logfile. Since each
logger only tracked the writes of its own endpoint, the file could grow
large without ever being rotated. Worse, once rotated, all other endpoints
would still write to their FD, i.e. the old file.

Clean this up by sharing a single logger, which is likely a bottleneck,
but one that shouldn't be hit in prod since this is clearly a debug option.
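
The size-tracking problem can be reproduced with a toy writer that, like a size-based rotating logger, only counts the bytes written through its own instance. `rotatingWriter` is a simplified stand-in written for this example and is not the lumberjack API.

```go
package main

import (
	"fmt"
	"sync"
)

// rotatingWriter is a simplified stand-in for a size-based rotating logger:
// it only counts the bytes written through this instance.
type rotatingWriter struct {
	mu        sync.Mutex
	maxSize   int
	size      int
	rotations int
}

func (w *rotatingWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.size+len(p) > w.maxSize {
		w.rotations++ // a real logger would rename and reopen the file here
		w.size = 0
	}
	w.size += len(p)
	return len(p), nil
}

func main() {
	msg := []byte("policy verdict\n")

	// One writer per endpoint: the combined output exceeds maxSize several
	// times over, but no single instance sees enough bytes to rotate.
	perEndpoint := make([]*rotatingWriter, 4)
	total := 0
	for i := range perEndpoint {
		perEndpoint[i] = &rotatingWriter{maxSize: 100}
		for j := 0; j < 5; j++ {
			perEndpoint[i].Write(msg)
		}
		total += perEndpoint[i].rotations
	}

	// A single shared writer sees every write and rotates as intended.
	shared := &rotatingWriter{maxSize: 100}
	for i := 0; i < 20; i++ {
		shared.Write(msg)
	}

	fmt.Println("per-endpoint rotations:", total, "shared rotations:", shared.rotations)
}
```

The mutex in the shared writer is the bottleneck mentioned above: all endpoints serialize on it.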

Signed-off-by: David Bimmler <[email protected]>
This commit fixes an `Owns` call that was updated
to use EndpointSlice (instead of Endpoint) in

This change was missed in the refactor in cilium#41323.

Signed-off-by: Nick Young <[email protected]>
This is a move/rename only commit preparing the structure to refactor
the namespace manager as a Cell.

Signed-off-by: Alexandre Perrin <[email protected]>
For testing purposes, reducing the use of the public
namespace.NewManager.

Cosmetic dedup in local_observer_test.go on the way, making noopParser
accept testing.TB as param.

Signed-off-by: Alexandre Perrin <[email protected]>
Set up the namespace cleanup as a job.Timer instead of open-coding it in our
own goroutine.

Signed-off-by: Alexandre Perrin <[email protected]>
This commit extends the pkg/shell to allow configuring the shell socket path
via cell config. This is useful in all those cases in which we may want
to leverage pkg/shell for IPC (eg. in tests with multiple forked processes)
or if we just want to change the default path for convenience.
Documentation updates have been generated accordingly.

Signed-off-by: Simone Magnani <[email protected]>
Signed-off-by: Jarno Rajahalme <[email protected]>
When Cilium starts in tunnel mode, a route for each remote node's pod
CIDR is added to the current node. For these routes, the MTU is set to
1450 to account for the tunnel overhead.

If the routing mode is then changed to native and Cilium is restarted,
the stale routes should be deleted at startup. Unfortunately, this
currently does not happen because the MTU is set to 1500 in the deletion
request, resulting in the following error from netlink:

msg="Unable to delete route" ... error="no such process"

In other words, the route cannot be found because the MTU value set in
the deletion request is not matching the one of the installed route.

To solve this, just disregard the MTU value while deleting a route. This
should still allow stale IPsec-related routes to be removed correctly, as
intended in commit 35ca979
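
The mismatch can be modelled with a toy lookup in which every attribute set in the delete request must equal the installed route's attribute, while unset attributes are ignored. This is a simplified model for illustration only, with hypothetical types and illustrative values; it is not the netlink library or the kernel logic.

```go
package main

import "fmt"

// Route is a hypothetical, simplified route; MTU 0 means "not specified".
type Route struct {
	Prefix string
	Device string
	MTU    int
}

// matches models a delete lookup: attributes set in the request must equal
// those of the installed route, unset attributes are ignored.
func matches(installed, req Route) bool {
	if installed.Prefix != req.Prefix || installed.Device != req.Device {
		return false
	}
	return req.MTU == 0 || req.MTU == installed.MTU
}

func main() {
	// Illustrative values: a stale route installed while in tunnel mode.
	installed := Route{Prefix: "10.0.1.0/24", Device: "cilium_host", MTU: 1450}

	withMTU := Route{Prefix: "10.0.1.0/24", Device: "cilium_host", MTU: 1500}
	withoutMTU := Route{Prefix: "10.0.1.0/24", Device: "cilium_host"}

	fmt.Println(matches(installed, withMTU), matches(installed, withoutMTU))
}
```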

Fixes: 35ca979 ("datapath/linux/route: Fix Delete")
Fixes: cilium#41811

Signed-off-by: Fabio Falzoi <[email protected]>
Fill the [metav1.TypeMeta] for objects added via the Clientset if the
TypeMeta is unset.

E.g. CoreV1().Nodes().Create(&Node{ObjectMeta{Name: "foo"}}) would have
previously created a node object with Kind="" and APIVersion="" and with
this it'll have Kind="Node" and APIVersion="v1".
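
The defaulting can be sketched with stand-in types as follows. `TypeMeta`, `Node` and `fillTypeMeta` here are simplified local definitions for the example, not the Kubernetes or Cilium types.

```go
package main

import "fmt"

// TypeMeta and Node are simplified stand-ins for the Kubernetes types.
type TypeMeta struct {
	Kind       string
	APIVersion string
}

type Node struct {
	TypeMeta
	Name string
}

// fillTypeMeta sets Kind and APIVersion only when the TypeMeta is unset,
// so values provided by the caller are preserved.
func fillTypeMeta(tm *TypeMeta, kind, apiVersion string) {
	if *tm == (TypeMeta{}) {
		tm.Kind, tm.APIVersion = kind, apiVersion
	}
}

func main() {
	n := &Node{Name: "foo"}
	fillTypeMeta(&n.TypeMeta, "Node", "v1")
	fmt.Printf("Kind=%q APIVersion=%q\n", n.Kind, n.APIVersion)
}
```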

Signed-off-by: Jussi Maki <[email protected]>
Hostfw and ipsec aren't compatible.

Signed-off-by: darox <[email protected]>
* Fix v6 utils that had `svc_one` hardcoded.
* Improve `pkt_defs.py` by using `()` instead of \.

Signed-off-by: Marc Suñé <[email protected]>
The changes introduced in df5501e failed to assign the interface_mac
in the IPv6 ND tests, resulting in packets being sent with the NULL MAC
address.

The test passes as asserts use the same value to check against.

Use another (valid) MAC for the interface.

Signed-off-by: Marc Suñé <[email protected]>
Commit 11c329f fixed handling of ICMPv6 neighbour solicitations
that didn't have Link Layer Source option. For NA, it adds the
8 additional bytes of the option.

While moving the IPv6 NDP unit tests to scapy, the tests failed due to
an incorrect ICMPv6 checksum for NS packets without the LLSRC option. The
problem is that the ICMPv6 pseudoheader contains the payload length, and
the code was not considering it as part of the csum diff.

This was not spotted because:

* Unit tests (before scapy) don't check csums.
* When L4 csum is offloaded, it really doesn't matter.

This commit changes `icmp6_send_ndisc_adv()`:

* Fixes the csum accordingly
* Removes an unnecessary call to `l4_csum_replace()`, by
  accumulating the csum diff in `sum`.

NOTE: it would be a good idea to refactor `icmp6_send_ndisc_adv()`
to use direct packet access _and_ avoid superfluous copies of the
(new) icmpv6 hdr and (new) opts. Seems like a good first issue :).

Signed-off-by: Marc Suñé <[email protected]>
Fix ASSERT_CTX_BUF_OFF() implementation incorrectly accessing
data (in stack) instead of __data (in the body of the assert).

Move aux pointers to __DATA, __DATA_END to avoid this in the
future.

Signed-off-by: Marc Suñé <[email protected]>
This commit adapts the IPv6 NDP BPF unit test to scapy.

Signed-off-by: Marc Suñé <[email protected]>
Create an auxiliary function that encapsulates the return code
checks, drastically reducing the number of lines.

Signed-off-by: Marc Suñé <[email protected]>
This commit fixes an issue for encrypted packets arriving in bpf_host.
When checking that they are encrypted using the packet mark, we
shouldn't expect the mark to be equal to MARK_MAGIC_DECRYPT. Instead, we
should check that the MARK_MAGIC_DECRYPT bit is set.
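
The difference between the two checks can be shown with plain integer arithmetic. The constant value and the extra bits in the sample mark below are illustrative assumptions, and the datapath code itself is C rather than Go.

```go
package main

import "fmt"

// markMagicDecrypt is an illustrative value standing in for MARK_MAGIC_DECRYPT.
const markMagicDecrypt uint32 = 0x0D00

// isDecryptedEq is the old-style check: the whole mark must equal the magic.
func isDecryptedEq(mark uint32) bool {
	return mark == markMagicDecrypt
}

// isDecryptedMask only requires the magic bits to be set and tolerates
// any other bits carried in the mark.
func isDecryptedMask(mark uint32) bool {
	return mark&markMagicDecrypt == markMagicDecrypt
}

func main() {
	plain := markMagicDecrypt
	withExtra := markMagicDecrypt | 0x12340000 // magic plus other (made-up) bits

	fmt.Println(isDecryptedEq(plain), isDecryptedMask(plain))
	fmt.Println(isDecryptedEq(withExtra), isDecryptedMask(withExtra))
}
```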

This issue isn't affecting anything today, but will once we support
IPsec + BPF Host Routing.

Fixes: 1dadae3 ("bpf: Don't skip local delivery for plain-text packets")
Signed-off-by: Paul Chaignon <[email protected]>
@zstas zstas force-pushed the vtep_policy_cleanup branch 2 times, most recently from 33c7688 to d623d6a Compare September 24, 2025 13:07
Co-developed-by: Jarrod Baumann <[email protected]>
Signed-off-by: Jarrod Baumann <[email protected]>
Signed-off-by: Stanislav Zaikin <[email protected]>
@zstas zstas force-pushed the vtep_policy_cleanup branch from d623d6a to d89617a Compare September 24, 2025 14:06
@jarrodb jarrodb force-pushed the vtep_policy_cleanup branch from 95fbccb to d89617a Compare September 24, 2025 15:16