Network churn Load test - Add network policy enforcement latency measurement #431

Open
wants to merge 26 commits into
base: main
26 commits
d69ce5d
fix service churn feature pipeline name (#417)
jshr-w Dec 2, 2024
5024f5c
Merge branch 'main' of https://github.com/Azure/telescope into networ…
agrawaliti Dec 30, 2024
a0599f3
Refactor YAML files for network churn: clean up formatting and add OW…
agrawaliti Dec 30, 2024
9814101
Merge branch 'main' into network-churn
agrawaliti Jan 6, 2025
b9aa5f6
Update nodes_per_nodepool value in validate-resources.yml to 500
agrawaliti Jan 8, 2025
e976e31
Enable existing namespaces in load-config.yaml
agrawaliti Jan 8, 2025
9c20982
Remove redundant parameters from validate-resources.yml
agrawaliti Jan 9, 2025
9b55220
Disable existing namespaces in autoscale and slo configurations; adju…
agrawaliti Jan 13, 2025
bdb7267
Enable existing namespaces in autoscale configuration and update temp…
agrawaliti Jan 13, 2025
6ced26d
Fix indentation in collect-clusterloader2.yml for template path consi…
agrawaliti Jan 13, 2025
2e3e8ee
Update load-config.yaml and slo.py for deployment size adjustments an…
agrawaliti Jan 14, 2025
568b69c
Add new parameters for namespaces, network policies, and service test…
agrawaliti Jan 15, 2025
6ddeace
To test Commit
agrawaliti Jan 15, 2025
a3c8a83
Revert and update deployment size
agrawaliti Jan 15, 2025
56df80f
test load condition
agrawaliti Jan 15, 2025
b7e79fc
Test valid datatype
agrawaliti Jan 15, 2025
95e7006
test string type
agrawaliti Jan 15, 2025
ed4c562
test type
agrawaliti Jan 15, 2025
293b0cb
add commnet
agrawaliti Jan 15, 2025
6f6950f
test if
agrawaliti Jan 15, 2025
725c6c2
conditional logic
agrawaliti Jan 15, 2025
b27b806
refactor: update group size parameters and maintain backward compatib…
agrawaliti Jan 15, 2025
ec21cce
fix: update nginx image to use the latest version from the Azure cont…
agrawaliti Jan 15, 2025
c26601d
add metrics for npm
agrawaliti Jan 15, 2025
00c1e73
adding condition
agrawaliti Jan 15, 2025
6beb6b1
adding trigger
agrawaliti Jan 19, 2025
8 changes: 8 additions & 0 deletions jobs/competitive-test.yml
@@ -33,6 +33,9 @@ parameters:
- name: run_id
type: string
default: ''
- name: run_id_2
Collaborator


What is run_id_2 for?

Author

@agrawaliti agrawaliti Jan 9, 2025


I am using two pre-created clusters, one for azure_cilium and one for azure_cni_overlay, and passing them in through run_id and run_id_2. Creating two new clusters with 1000 nodes each for every run takes a very long time, so I pass two cluster tags and run the tests on the existing clusters.

On second thought, I think I can do this with Terraform and schedule it to run periodically.
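
A minimal sketch of the cluster selection this comment describes, for illustration only. It assumes that `use_secondary_cluster` (the boolean parameter added in the same diff) decides which of the two tags a job targets. The diff does not show how the pipeline actually consumes these values, so `select_cluster_tag` and the example tags are hypothetical.

```python
def select_cluster_tag(run_id: str, run_id_2: str, use_secondary_cluster: bool) -> str:
    """Return the run_id tag of the pre-created cluster a job should target.

    Assumption: the secondary tag is used only when use_secondary_cluster is true.
    """
    if use_secondary_cluster:
        if not run_id_2:
            # Fail early rather than silently falling back to the primary cluster.
            raise ValueError("use_secondary_cluster is true but run_id_2 is empty")
        return run_id_2
    return run_id


if __name__ == "__main__":
    # Hypothetical tags for the two pre-created 1000-node clusters.
    print(select_cluster_tag("cilium-cluster", "overlay-cluster", False))
    print(select_cluster_tag("cilium-cluster", "overlay-cluster", True))
```

Raising on an empty `run_id_2` is a design choice of this sketch, so that a misconfigured run does not quietly test the wrong cluster.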

type: string
default: ''
- name: timeout_in_minutes
type: number
default: 60 # default when not specified is 60 minutes
@@ -48,6 +51,9 @@ parameters:
- name: ssh_key_enabled
type: boolean
default: true
- name: use_secondary_cluster
Collaborator


What is the secondary cluster for?

type: boolean
default: false

jobs:
- job: ${{ parameters.cloud }}
@@ -62,10 +68,12 @@ jobs:
cloud: ${{ parameters.cloud }}
region: ${{ parameters.regions[0] }}
run_id: ${{ parameters.run_id }}
run_id_2: ${{ parameters.run_id_2 }}
test_modules_dir: ${{ parameters.test_modules_dir }}
retry_attempt_count: ${{ parameters.retry_attempt_count }}
credential_type: ${{ parameters.credential_type }}
ssh_key_enabled: ${{ parameters.ssh_key_enabled }}
use_secondary_cluster: ${{ parameters.use_secondary_cluster }}
- template: /steps/provision-resources.yml
parameters:
cloud: ${{ parameters.cloud }}
Original file line number Diff line number Diff line change
@@ -3,6 +3,13 @@

{{$Image := DefaultParam .Image "mcr.microsoft.com/oss/kubernetes/pause:3.6"}}

{{$EnableNetworkPolicyEnforcementLatencyTest := DefaultParam .EnableNetworkPolicyEnforcementLatencyTest false}}
{{$TargetLabelValue := DefaultParam .TargetLabelValue "enforcement-latency"}}
# Run a server pod for network policy enforcement latency test only on every Nth pod.
# Default run on every pod.
{{$NetPolServerOnEveryNthPod := 1}}
{{$RunNetPolicyTest := and $EnableNetworkPolicyEnforcementLatencyTest (eq (Mod .Index $NetPolServerOnEveryNthPod) 0)}}

apiVersion: apps/v1
kind: Deployment
metadata:
@@ -16,7 +23,7 @@ spec:
replicas: {{.Replicas}}
selector:
matchLabels:
name: {{.Name}}
name: {{if $RunNetPolicyTest}}policy-load-{{end}}{{.Name}}
strategy:
type: RollingUpdate
rollingUpdate:
@@ -25,29 +32,43 @@ spec:
template:
metadata:
labels:
name: {{.Name}}
name: {{if $RunNetPolicyTest}}policy-load-{{end}}{{.Name}}
group: {{.Group}}
{{if .SvcName}}
svc: {{.SvcName}}-{{.Index}}
{{end}}
restart: {{.deploymentLabel}}
{{if $RunNetPolicyTest}}
net-pol-test: {{$TargetLabelValue}}
{{end}}
spec:
nodeSelector:
slo: "true"
{{if $RunNetPolicyTest}}
hostNetwork: false
containers:
- image: acnpublic.azurecr.io/scaletest/nginx:latest
name: nginx-server
ports:
- containerPort: 80
resources:
requests:
cpu: {{$CpuRequest}}
memory: {{$MemoryRequest}}
{{else}}
containers:
- env:
- name: ENV_VAR
value: a
image: {{$Image}}
imagePullPolicy: IfNotPresent
name: {{.Name}}
ports:
ports: []
resources:
requests:
cpu: {{$CpuRequest}}
memory: {{$MemoryRequest}}
# Add not-ready/unreachable tolerations for 15 minutes so that node
# failure doesn't trigger pod deletion.
{{end}}
tolerations:
- key: "node.kubernetes.io/not-ready"
operator: "Exists"
@@ -60,4 +81,4 @@ spec:
- key: "slo"
operator: "Equal"
value: "true"
effect: "NoSchedule"
effect: "NoSchedule"
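
The label logic in this template can be modeled in Python roughly as follows. This is an illustrative sketch, not part of the PR: `pod_labels` and its `every_nth` argument are hypothetical (the template hard-codes `$NetPolServerOnEveryNthPod` to 1), and the `group`, `svc`, and `restart` labels are left out for brevity.

```python
def pod_labels(name, index, enable_test=False, every_nth=1,
               target_label_value="enforcement-latency"):
    """Model the name / net-pol-test labels the deployment template renders.

    Mirrors: and $EnableNetworkPolicyEnforcementLatencyTest
                 (eq (Mod .Index $NetPolServerOnEveryNthPod) 0)
    """
    run_net_policy_test = enable_test and index % every_nth == 0
    # The "policy-load-" prefix is applied only when the test is active for this index.
    labels = {"name": ("policy-load-" if run_net_policy_test else "") + name}
    if run_net_policy_test:
        labels["net-pol-test"] = target_label_value
    return labels


if __name__ == "__main__":
    for i in range(4):
        # every_nth=2 is a hypothetical setting: only even indexes get the server labels.
        print(i, pod_labels(f"small-deployment-{i}", i, enable_test=True, every_nth=2))
```

With the flag left at its default of `false`, the function returns the same `name` label as before the change, which matches the template leaving existing deployments untouched.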
71 changes: 67 additions & 4 deletions modules/python/clusterloader2/slo/config/load-config.yaml
@@ -2,6 +2,7 @@ name: load-config

# Config options for test type
{{$SERVICE_TEST := DefaultParam .CL2_SERVICE_TEST true}}
{{$NETWORK_TEST := DefaultParam .CL2_NETWORK_TEST false}}

# Config options for test parameters
{{$nodesPerNamespace := DefaultParam .CL2_NODES_PER_NAMESPACE 100}}
@@ -12,12 +13,12 @@ name: load-config
{{$groupName := DefaultParam .CL2_GROUP_NAME "service-discovery"}}

# TODO(jshr-w): This should eventually use >1 namespace.
{{$namespaces := 1}}
{{$namespaces := DefaultParam .CL2_NO_OF_NAMESPACES 1}}
{{$nodes := DefaultParam .CL2_NODES 1000}}

{{$deploymentQPS := DivideFloat $loadTestThroughput $deploymentSize}}
{{$operationTimeout := DefaultParam .CL2_OPERATION_TIMEOUT "15m"}}
{{$totalPods := MultiplyInt $namespaces $nodes $podsPerNode}}
{{$totalPods := MultiplyInt $namespaces $nodesPerNamespace $podsPerNode}}
{{$podsPerNamespace := DivideInt $totalPods $namespaces}}
{{$deploymentsPerNamespace := DivideInt $podsPerNamespace $deploymentSize}}

@@ -31,7 +32,17 @@ name: load-config
{{$BIG_GROUP_SIZE := DefaultParam .BIG_GROUP_SIZE 4000}}
{{$SMALL_GROUP_SIZE := DefaultParam .SMALL_GROUP_SIZE 20}}
{{$bigDeploymentsPerNamespace := DefaultParam .bigDeploymentsPerNamespace 1}}
{{$smallDeploymentPods := SubtractInt $podsPerNamespace (MultiplyInt $bigDeploymentsPerNamespace $BIG_GROUP_SIZE)}}

# Use explicit conditional block to assign smallDeploymentPods to maintain backward compatibility
{{$calculatedPods := SubtractInt $podsPerNamespace (MultiplyInt $bigDeploymentsPerNamespace $BIG_GROUP_SIZE)}}

{{$smallDeploymentPods := 0}}
{{if $NETWORK_TEST}}
{{$smallDeploymentPods = $podsPerNamespace}}
{{else}}
{{$smallDeploymentPods = $calculatedPods}}
{{end}}

{{$smallDeploymentsPerNamespace := DivideInt $smallDeploymentPods $SMALL_GROUP_SIZE}}

namespace:
@@ -53,7 +64,7 @@ tuningSets:
qps: {{$deploymentQPS}}

steps:
- name: Log - namespaces={{$namespaces}}, nodesPerNamespace={{$nodesPerNamespace}}, podsPerNode={{$podsPerNode}}, totalPods={{$totalPods}}, podsPerNamespace={{$podsPerNamespace}}, deploymentsPerNamespace={{$deploymentsPerNamespace}}, deploymentSize={{$deploymentSize}}, deploymentQPS={{$deploymentQPS}}
- name: Log - namespaces={{$namespaces}}, nodes={{$nodes}}, nodesPerNamespace={{$nodesPerNamespace}}, podsPerNode={{$podsPerNode}}, totalPods={{$totalPods}}, podsPerNamespace={{$podsPerNamespace}}, deploymentsPerNamespace={{$deploymentsPerNamespace}}, deploymentSize={{$deploymentSize}}, deploymentQPS={{$deploymentQPS}}
measurements:
- Identifier: Dummy
Method: Sleep
@@ -74,6 +85,13 @@ steps:
action: start
{{end}}

{{if $NETWORK_TEST}}
- module:
path: /modules/network-policy/net-policy-metrics.yaml
params:
action: start
{{end}}

{{range $i := Loop $repeats}}
{{if $SERVICE_TEST}}
- module:
@@ -85,6 +103,15 @@ steps:
bigServicesPerNamespace: {{$bigDeploymentsPerNamespace}}
{{end}}

{{if $NETWORK_TEST}}
- module:
path: modules/network-policy/net-policy-enforcement-latency.yaml
params:
setup: true
run: true
testType: "pod-creation"
{{end}}

- module:
path: /modules/reconcile-objects.yaml
params:
@@ -101,6 +128,27 @@ steps:
Group: {{$groupName}}
deploymentLabel: start

{{if $NETWORK_TEST}}
- module:
path: modules/network-policy/net-policy-metrics.yaml
params:
action: gather
usePolicyCreationMetrics: true
usePodCreationMetrics: true

- module:
path: modules/network-policy/net-policy-enforcement-latency.yaml
params:
complete: true
testType: "pod-creation"

- module:
path: modules/network-policy/net-policy-enforcement-latency.yaml
params:
run: true
testType: "policy-creation"
{{end}}

- module:
path: /modules/reconcile-objects.yaml
params:
@@ -152,3 +200,18 @@ steps:
params:
action: gather
group: {{$groupName}}

{{if $NETWORK_TEST}}
- module:
path: modules/network-policy/net-policy-metrics.yaml
params:
action: gather
usePolicyCreationMetrics: true
usePodCreationMetrics: true

- module:
path: modules/network-policy/net-policy-enforcement-latency.yaml
params:
complete: true
testType: "policy-creation"
{{end}}
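
To make the arithmetic in the load-config header concrete, here is a rough Python model of the values it computes after this change. The function name and the `pods_per_node=50` input are assumptions for illustration (the default for `$podsPerNode` is not visible in this diff), and `//` stands in for `DivideInt` on the assumption that operands are non-negative.

```python
def small_deployment_plan(pods_per_node, namespaces=1, nodes_per_namespace=100,
                          big_group_size=4000, small_group_size=20,
                          big_deployments_per_namespace=1, network_test=False):
    """Model the template variables that size the small deployments."""
    # totalPods now multiplies by nodesPerNamespace rather than the cluster-wide node count.
    total_pods = namespaces * nodes_per_namespace * pods_per_node
    pods_per_namespace = total_pods // namespaces
    # Pre-existing behavior: pods left over after the big deployments are placed.
    calculated_pods = pods_per_namespace - big_deployments_per_namespace * big_group_size
    # With the network test enabled, every pod in the namespace goes to small deployments.
    small_deployment_pods = pods_per_namespace if network_test else calculated_pods
    return {
        "total_pods": total_pods,
        "pods_per_namespace": pods_per_namespace,
        "small_deployment_pods": small_deployment_pods,
        "small_deployments_per_namespace": small_deployment_pods // small_group_size,
    }


if __name__ == "__main__":
    # pods_per_node=50 is a hypothetical input chosen so the subtraction stays positive.
    print(small_deployment_plan(pods_per_node=50))
    print(small_deployment_plan(pods_per_node=50, network_test=True))
```

Note that with the defaults shown in the diff, the non-network path goes negative whenever a namespace holds fewer than 4000 pods, so the inputs matter when reading these numbers.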
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
{{$NETWORK_POLICY_ENFORCEMENT_LATENCY_BASELINE := DefaultParam .CL2_NETWORK_POLICY_ENFORCEMENT_LATENCY_BASELINE false}}
{{$NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_KEY := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_KEY "net-pol-test"}}
{{$NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_VALUE := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_VALUE "enforcement-latency"}}
{{$NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_KEY := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_KEY "test"}}
{{$NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_VALUE := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_VALUE "net-policy-client"}}
{{$NET_POLICY_ENFORCEMENT_LATENCY_MAX_TARGET_PODS_PER_NS := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LATENCY_MAX_TARGET_PODS_PER_NS 100}}
{{$NET_POLICY_ENFORCEMENT_LOAD_COUNT := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LOAD_COUNT 1000}}
{{$NET_POLICY_ENFORCEMENT_LOAD_QPS := DefaultParam .CL2_NET_POLICY_ENFORCEMENT_LOAD_QPS 10}}
{{$NET_POLICY_ENFORCEMENT_LOAD_TARGET_NAME := DefaultParam .CL2_POLICY_ENFORCEMENT_LOAD_TARGET_NAME "small-deployment"}}

{{$setup := DefaultParam .setup false}}
{{$run := DefaultParam .run false}}
{{$complete := DefaultParam .complete false}}
{{$testType := DefaultParam .testType "policy-creation"}}
# Target port needs to match the server container port of target pods that have
# "targetLabelKey: targetLabelValue" label selector.
{{$targetPort := 80}}

steps:
{{if $setup}}
- name: Setup network policy enforcement latency measurement
measurements:
- Identifier: NetworkPolicyEnforcement
Method: NetworkPolicyEnforcement
Params:
action: setup
targetLabelKey: {{$NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_KEY}}
targetLabelValue: {{$NET_POLICY_ENFORCEMENT_LATENCY_TARGET_LABEL_VALUE}}
baseline: {{$NETWORK_POLICY_ENFORCEMENT_LATENCY_BASELINE}}
testClientNodeSelectorKey: {{$NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_KEY}}
testClientNodeSelectorValue: {{$NET_POLICY_ENFORCEMENT_LATENCY_NODE_LABEL_VALUE}}
{{end}}

{{if $run}}
- name: "Run network policy enforcement latency measurement (testType={{$testType}})"
measurements:
- Identifier: NetworkPolicyEnforcement
Method: NetworkPolicyEnforcement
Params:
action: run
testType: {{$testType}}
targetPort: {{$targetPort}}
maxTargets: {{$NET_POLICY_ENFORCEMENT_LATENCY_MAX_TARGET_PODS_PER_NS}}
policyLoadCount: {{$NET_POLICY_ENFORCEMENT_LOAD_COUNT}}
policyLoadQPS: {{$NET_POLICY_ENFORCEMENT_LOAD_QPS}}
policyLoadTargetBaseName: {{$NET_POLICY_ENFORCEMENT_LOAD_TARGET_NAME}}
{{end}}

{{if $complete}}
- name: "Complete network policy enforcement latency measurement (testType={{$testType}})"
measurements:
- Identifier: NetworkPolicyEnforcement
Method: NetworkPolicyEnforcement
Params:
action: complete
testType: {{$testType}}
{{end}}
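
The `setup`, `run`, and `complete` flags let load-config.yaml call this one module several times with different phases. The sketch below models that expansion for a single pass through the load config, ignoring the `$repeats` loop. It is an illustration only: `module_actions`, `LOAD_CONFIG_CALLS`, and `timeline` are hypothetical names, and the call list is transcribed from the four module invocations visible in the load-config.yaml diff.

```python
def module_actions(setup=False, run=False, complete=False, test_type="policy-creation"):
    """Expand one module call into the measurement actions it emits, in order."""
    actions = []
    if setup:
        actions.append(("setup", None))  # the setup step takes no testType
    if run:
        actions.append(("run", test_type))
    if complete:
        actions.append(("complete", test_type))
    return actions


# The four invocations of net-policy-enforcement-latency.yaml in load-config.yaml.
LOAD_CONFIG_CALLS = [
    {"setup": True, "run": True, "test_type": "pod-creation"},
    {"complete": True, "test_type": "pod-creation"},
    {"run": True, "test_type": "policy-creation"},
    {"complete": True, "test_type": "policy-creation"},
]


def timeline(calls):
    """Flatten the module calls into one ordered list of (action, testType) pairs."""
    return [action for call in calls for action in module_actions(**call)]


if __name__ == "__main__":
    for action, test_type in timeline(LOAD_CONFIG_CALLS):
        print(action, test_type or "")
```

Read this way, each test type gets a `run` followed later by a matching `complete`, with a single `setup` before the first run.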