Conversation

@karina-ranadive commented on Oct 17, 2025

Reason for Change:
Test Steps:

  1. Initial Validation - Performs an initial DNS test to verify LRP is working correctly with metrics validation
  2. Cilium Command Validation - Uses Cilium CLI commands to validate that LRP is properly configured in the Cilium dataplane, checking both cilium lrp list and cilium service list outputs
  3. Pod Restart Testing - Restarts client pods and verifies that LRP functionality persists after pod restarts, ensuring the policy survives pod lifecycle changes
  4. Post-Restart Validation - Re-runs DNS tests and Cilium validation to confirm LRP still works after client pod restarts
  5. Resource Recreation Testing:
  • Deleting client daemonsets and LRP resources
  • Recreating the LRP policy
  • Restarting node-local-dns pods to pick up the new configuration
  • Recreating client daemonsets
  6. Port Forward Re-establishment - Sets up a new port forward to the newly restarted node-local-dns pod (using a different local port to avoid conflicts)
  7. Final Metrics Validation - Validates that DNS requests are properly redirected to the new node-local-dns pod through metrics monitoring
  8. Final Cilium Validation - Performs a final check using Cilium commands to ensure LRP is still active after all the resource recreation

Purpose: This lifecycle test ensures that Local Redirect Policy functionality is robust and survives common operational scenarios, including pod restarts, resource deletion/recreation, and service restarts, validating both functional behavior through DNS metrics and dataplane configuration through Cilium CLI inspection. A condensed sketch of the flow follows.
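For orientation, here is how the steps above could hang together in Go. Helper names such as testLRPCase, validateCiliumLRP, restartClientPodsAndGetPod, and deleteAndRecreateResources come from the review discussion below; setupLRPTest, portForwardNodeLocalDNS, and the DNS command are assumptions, and the wiring is illustrative rather than the actual test body.

package lrp

import (
	"context"
	"testing"
)

func TestLRPLifecycle(t *testing.T) {
	ctx := context.Background()
	dnsCmd := []string{"nslookup", "example.com"} // illustrative command

	// Hypothetical setup: clientset, a client pod, and a port-forwarded
	// Prometheus endpoint on the node-local-dns pod.
	cs, clientPod, promAddr := setupLRPTest(t, ctx)

	// Steps 1-2: baseline DNS behavior plus Cilium dataplane state.
	testLRPCase(t, ctx, clientPod, dnsCmd, promAddr, true)
	validateCiliumLRP(t, ctx, cs)

	// Steps 3-4: the policy must survive client pod restarts.
	clientPod = restartClientPodsAndGetPod(t, ctx, cs, clientPod)
	testLRPCase(t, ctx, clientPod, dnsCmd, promAddr, true)
	validateCiliumLRP(t, ctx, cs)

	// Step 5: delete and recreate daemonsets, the LRP, and node-local-dns.
	clientPod = deleteAndRecreateResources(t, ctx, cs, clientPod)

	// Step 6: re-establish a port forward on a fresh local port.
	promAddr = portForwardNodeLocalDNS(t, ctx, cs)

	// Steps 7-8: metrics must show redirection to the new pod, and the
	// LRP must still be present in the Cilium dataplane.
	testLRPCase(t, ctx, clientPod, dnsCmd, promAddr, true)
	validateCiliumLRP(t, ctx, cs)
}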

Issue Fixed:

Requirements:

Notes:

@karina-ranadive requested a review from a team as a code owner on October 17, 2025 20:32
Copilot AI (Contributor) left a comment

Pull Request Overview

Adds negative Local Redirect Policy (LRP) test scenarios and parameterizes Prometheus endpoint usage in existing LRP tests.

  • Extends testLRPCase to accept a Prometheus address for flexible metric validation (a sketch of the parameterized helper follows this list).
  • Introduces comprehensive negative and resilience LRP tests (resource recreation, pod restarts, cilium validation).
  • Adds helper functions for validating Cilium LRP state and recreating test resources.
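A plausible shape for the parameterized helper is sketched below; the prometheusAddr name, the trailing bool, and the scrape/exec helpers are assumptions inferred from the call sites and assertions quoted later in this thread:

// testLRPCase runs cmd in clientPod and checks the node-local-dns request
// counter at prometheusAddr, expecting it to rise only when traffic is
// redirected by the LRP. (Sketch; parameter names are assumptions.)
func testLRPCase(t *testing.T, ctx context.Context, clientPod corev1.Pod,
	cmd []string, prometheusAddr string, expectDNSMetricIncrease bool) {
	before := scrapeDNSRequestCount(t, prometheusAddr) // hypothetical scrape helper
	execInPod(t, ctx, clientPod, cmd)                  // hypothetical exec helper
	after := scrapeDNSRequestCount(t, prometheusAddr)
	if expectDNSMetricIncrease {
		require.Greaterf(t, after, before, "dns metric count did not increase - before: %.0f, after: %.0f", before, after)
	} else {
		require.Equalf(t, after, before, "dns metric count increased - before: %.0f, after: %.0f", before, after)
	}
}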

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Changed files:
  • test/integration/lrp/lrp_test.go - Adds negative LRP test flow, Prometheus address parameter, and Cilium/LRP validation helpers.
  • test/integration/lrp/lrp_fqdn_test.go - Updates calls to testLRPCase to include the new Prometheus address parameter.


require.Greater(t, afterValue, beforeValue, "dns metric count did not increase after command - before: %.0f, after: %.0f", beforeValue, afterValue)
} else {
require.Equal(t, afterMetric.GetCounter().GetValue(), beforeMetric.GetCounter().GetValue(), "dns metric count increased after command")
require.Equal(t, afterValue, beforeValue, "dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
Copilot AI commented on Oct 17, 2025

The testify assertions use formatted messages with placeholders but require.Greater and require.Equal do not perform format substitution; use require.Greaterf / require.Equalf or pre-format the message with fmt.Sprintf to ensure values appear in failure output.

Suggested change
require.Equal(t, afterValue, beforeValue, "dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
require.Equalf(t, afterValue, beforeValue, "dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
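If the Equal assertion gets this fix, the Greater assertion on the first quoted line would presumably need the matching change:

require.Greaterf(t, afterValue, beforeValue, "dns metric count did not increase after command - before: %.0f, after: %.0f", beforeValue, afterValue)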

Comment on lines +475 to +511
// Wait for deletion to complete
time.Sleep(10 * time.Second)
Copilot AI commented on Oct 17, 2025

Fixed sleep introduces unnecessary delay and potential flakiness; replace with a poll-based wait (e.g., repeatedly checking for resource absence/recreation with a timeout) to reduce test duration and improve reliability.

Suggested change
// Wait for deletion to complete
time.Sleep(10 * time.Second)
// Wait for deletion to complete (poll for absence of client DaemonSet and LRP)
retry.DoWithTimeout(ctx, "wait for client DaemonSet deletion", 30*time.Second, func(ctx context.Context) (bool, error) {
_, err := dsClient.Get(ctx, clientDS.Name, metav1.GetOptions{})
if err != nil {
// DaemonSet not found
return true, nil
}
// Still exists
return false, nil
})
retry.DoWithTimeout(ctx, "wait for LRP deletion", 30*time.Second, func(ctx context.Context) (bool, error) {
_, err := lrpClient.Get(ctx, lrp.Name, metav1.GetOptions{})
if err != nil {
// LRP not found
return true, nil
}
// Still exists
return false, nil
})
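Note that retry.DoWithTimeout is not a stock Kubernetes helper; an equivalent using the standard apimachinery wait package (reusing the dsClient and clientDS names from the suggestion above, which are themselves illustrative) might look like:

// Requires:
//   apierrors "k8s.io/apimachinery/pkg/api/errors"
//   "k8s.io/apimachinery/pkg/util/wait"
err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 30*time.Second, true,
	func(ctx context.Context) (bool, error) {
		_, err := dsClient.Get(ctx, clientDS.Name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // DaemonSet is gone
		}
		return false, err // still exists, or propagate a real error
	})
require.NoError(t, err)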

@karina-ranadive force-pushed the karanadive/lrp-negative-test branch from 321a3ea to ba2b602 on October 17, 2025 20:39
@karina-ranadive force-pushed the karanadive/lrp-negative-test branch from ba2b602 to b947a40 on October 27, 2025 21:25
@karina-ranadive force-pushed the karanadive/lrp-negative-test branch from 49f04a2 to 1b852ce on November 7, 2025 20:34
@karina-ranadive force-pushed the karanadive/lrp-negative-test branch from dfb53d7 to 4e6b260 on November 10, 2025 17:42
@karina-ranadive changed the title from "LRP negative test" to "LRP lifecycle test" on Nov 10, 2025
@karina-ranadive changed the title from "LRP lifecycle test" to "test: expand LRP test to include lifecycle events" on Nov 12, 2025
@jpayne3506 requested a review from Copilot on November 12, 2025 22:20
Copilot finished reviewing on behalf of jpayne3506 on November 12, 2025 22:24
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

test/integration/lrp/lrp_test.go:517

  • The rand.Seed() call is made every time TakeOne() is invoked, which can cause issues in concurrent tests or when called multiple times in quick succession (as the seed would be very similar). Additionally, rand.Seed() has been deprecated since Go 1.20 in favor of using the global generator which is automatically seeded.

Recommendation: Remove the rand.Seed() call and use the global random generator directly, which is automatically seeded in Go 1.20+:

func TakeOne[T any](slice []T) T {
    if len(slice) == 0 {
        var zero T
        return zero
    }
    return slice[rand.Intn(len(slice))]
}

Alternatively, if you need better randomness, use crypto/rand or math/rand/v2 for Go 1.22+.

// TakeOne takes one item from the slice randomly; if empty, it returns the empty value for the type
// Use in testing only
func TakeOne[T any](slice []T) T {
	if len(slice) == 0 {
		var zero T
		return zero
	}
	rand.Seed(uint64(time.Now().UnixNano()))
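For the math/rand/v2 option mentioned in the recommendation, the v2 package seeds its global generator automatically and renames Intn to IntN; a sketch:

import "math/rand/v2"

// TakeOne takes one item from the slice randomly; if empty, it returns the
// zero value for the type. Use in testing only.
func TakeOne[T any](slice []T) T {
	if len(slice) == 0 {
		var zero T
		return zero
	}
	return slice[rand.IntN(len(slice))] // no manual seeding in rand/v2
}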


Comment on lines +482 to +483
// Wait for deletion to complete
time.Sleep(10 * time.Second)
Copilot AI commented on Nov 12, 2025

[nitpick] Using a hardcoded 10-second sleep to wait for resource deletion is fragile and can lead to flaky tests. The sleep might be too short in slow environments or unnecessarily long in fast environments.

Recommendation: Replace the hardcoded sleep with proper wait functions to ensure deletion has completed. You can use WaitForPodsDelete() for pods and consider implementing similar wait functions for LRP deletion, or use Kubernetes' metadata.deletionTimestamp to poll for actual deletion:

// Wait for client pods to be deleted
err = kubernetes.WaitForPodsDelete(ctx, cs, originalPod.Namespace, clientLabelSelector)
require.NoError(t, err)

// Wait for LRP to be deleted by polling
err = waitForLRPDelete(ctx, lrpClient, lrp.Name)
require.NoError(t, err)
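The waitForLRPDelete helper referenced above does not exist in the diff; a minimal sketch using the generated Cilium clientset and apimachinery polling (the type and package names are assumptions):

// waitForLRPDelete polls until the named CiliumLocalRedirectPolicy is gone.
func waitForLRPDelete(ctx context.Context, lrpClient ciliumv2.CiliumLocalRedirectPolicyInterface, name string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 30*time.Second, true,
		func(ctx context.Context) (bool, error) {
			_, err := lrpClient.Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil // LRP deleted
			}
			return false, err
		})
}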

A contributor commented:

+1.


// Step 1: Initial DNS test to verify LRP is working
t.Log("Step 1: Initial DNS test - verifying LRP functionality")
testLRPCase(t, ctx, clientPod, []string{
A contributor commented:

is this the same as the basic lrp test called in TestLRP?

if strings.Contains(line, "nodelocaldns") && strings.Contains(line, "kube-system") {
// Validate that the line contains expected components
require.Contains(t, line, "kube-system", "LRP line should contain kube-system namespace")
require.Contains(t, line, "nodelocaldns", "LRP line should contain nodelocaldns name")
A contributor commented:

these two require checks are redundant right? If we are in the if statement the line should already contain kube-system and nodelocaldns.
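In other words, the body could drop both asserts and just record the match; a sketch (the foundLRP flag is an assumption about the surrounding loop):

if strings.Contains(line, "nodelocaldns") && strings.Contains(line, "kube-system") {
	// Both substrings are guaranteed by the condition above; assert only
	// what it does not already cover, or simply record the match.
	foundLRP = true
}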


// restartClientPodsAndGetPod restarts the client daemonset and returns a new pod reference
func restartClientPodsAndGetPod(t *testing.T, ctx context.Context, cs *k8sclient.Clientset, originalPod corev1.Pod) corev1.Pod {
// Find the daemonset name by looking up the pod's owner
A contributor commented:

is this comment still accurate? we seem to be finding the pod's node name with the pod detail information

ciliumCS, err := ciliumClientset.NewForConfig(config)
require.NoError(t, err)

nodeName := originalPod.Spec.NodeName
A contributor commented:

Comparing this with restartClientPodsAndGetPod-- this seems like a simpler way to get the original pod's nodename. Is there a difference between this and doing podDetails, err := cs.CoreV1().Pods(originalPod.Namespace).Get(ctx, originalPod.Name, metav1.GetOptions{}) in restartClientPodsAndGetPod ?

recreatedPod := deleteAndRecreateResources(t, ctx, cs, clientPod)

// Step 7: Final verification after recreation
t.Log("Step 7: Final verification after resource recreation - skipping basic DNS test, will validate with metrics in Step 8")
A contributor commented:

is there supposed to be code for step 7?
