Skip to content

Commit

Permalink
Fix infinite requeue for Role/Policy with null description (#98)
Browse files Browse the repository at this point in the history
Fixes aws-controllers-k8s/community#1919
Fixes aws-controllers-k8s/community#1772
Fixes aws-controllers-k8s/community#1610
Fixes aws-controllers-k8s/community#1490
Fixes aws-controllers-k8s/community#2000
Fixes aws-controllers-k8s/community#1939
Fixes aws-controllers-k8s/community#1932

Currently, the controller requeues infinitely if a Role or Policy resource
is created with a null (empty string) description. This issue stems from the
late initialization process used to populate fields defaulted by the AWS API.

For `CreateRole` and `CreatePolicy` APIs, AWS accept a Description field but
fails to return it. This disrupts the controller's runtime logic causing
unnecessary infinite requeues and preventing resources from being marked
Synced

While a fix could target code-gen and runtime, this is a unique behavior
specific to IAM APIs. As a temporary solution, we disable late initialization
for the Description field and avoid unpopulating it after a Create call

This commit fixes the infinite requeue bug for Roles and Policies with null
descriptions and adds e2e tests to prevent regressions

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
  • Loading branch information
a-hilaly authored Mar 12, 2024
1 parent 15a6365 commit ae12460
Show file tree
Hide file tree
Showing 12 changed files with 238 additions and 34 deletions.
8 changes: 4 additions & 4 deletions apis/v1alpha1/ack-generate-metadata.yaml
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
ack_generate_info:
build_date: "2024-03-06T21:39:35Z"
build_hash: a5ba3c851434263128a1464a2c41e528779eeefa
build_date: "2024-03-12T15:06:30Z"
build_hash: d86061f5f579fe5d3a07528917d95e34e79c4dc0
go_version: go1.22.0
version: v0.32.1
version: v0.32.1-1-gd86061f-dirty
api_directory_checksum: 761a2c708651b0273bf39d98dddaf029de23d337
api_version: v1alpha1
aws_sdk_go_version: v1.49.0
generator_config_info:
file_checksum: ee030a9c18c7fa34d8f7f4777997df355c029971
file_checksum: 8f85dee132779d172c0fd19537c3bbd709e079d6
original_file_name: generator.yaml
last_modification:
reason: API generation
18 changes: 16 additions & 2 deletions apis/v1alpha1/generator.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,13 @@ resources:
- InvalidInput
- MalformedPolicyDocument
fields:
# Left for historical purposes. It looks like late_initialize is was
# causing the controller to infinitely requeue (every 5 seconds) when the
# description was set to nil. Not it looks like this is not needed
# anymore.
# Note(a-hilaly): Very likely the API behavior has changed and the
# late_initialize is no longer needed.
#
Description:
# You might be wondering why description is late-initialized, since
# there isn't a default server-side value for description.
Expand All @@ -165,7 +172,10 @@ resources:
# we set the late initialize property of the Description field here to
# override the Spec.Description to the original value we set in the
# CreatePolicy *input* shape.
late_initialize: {}
#late_initialize: {}
set:
- ignore: true
method: Create
Path:
late_initialize: {}
Tags:
Expand Down Expand Up @@ -211,9 +221,13 @@ resources:
set:
# The input and output shapes are different...
- from: PermissionsBoundary.PermissionsBoundaryArn
# Left for historical purposes.
Description:
set:
- ignore: true
method: Create
# See above in Policy resource about why this is here.
late_initialize: {}
# late_initialize: {}
Path:
late_initialize: {}
# In order to support attaching zero or more policies to a role, we use
Expand Down
18 changes: 16 additions & 2 deletions generator.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,13 @@ resources:
- InvalidInput
- MalformedPolicyDocument
fields:
# Left for historical purposes. It looks like late_initialize is was
# causing the controller to infinitely requeue (every 5 seconds) when the
# description was set to nil. Not it looks like this is not needed
# anymore.
# Note(a-hilaly): Very likely the API behavior has changed and the
# late_initialize is no longer needed.
#
Description:
# You might be wondering why description is late-initialized, since
# there isn't a default server-side value for description.
Expand All @@ -165,7 +172,10 @@ resources:
# we set the late initialize property of the Description field here to
# override the Spec.Description to the original value we set in the
# CreatePolicy *input* shape.
late_initialize: {}
#late_initialize: {}
set:
- ignore: true
method: Create
Path:
late_initialize: {}
Tags:
Expand Down Expand Up @@ -211,9 +221,13 @@ resources:
set:
# The input and output shapes are different...
- from: PermissionsBoundary.PermissionsBoundaryArn
# Left for historical purposes.
Description:
set:
- ignore: true
method: Create
# See above in Policy resource about why this is here.
late_initialize: {}
# late_initialize: {}
Path:
late_initialize: {}
# In order to support attaching zero or more policies to a role, we use
Expand Down
8 changes: 1 addition & 7 deletions pkg/resource/policy/manager.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 0 additions & 5 deletions pkg/resource/policy/sdk.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 1 addition & 7 deletions pkg/resource/role/manager.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 0 additions & 5 deletions pkg/resource/role/sdk.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions test/e2e/resources/policy_no_description.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Policy
metadata:
name: $POLICY_NAME
spec:
name: $POLICY_NAME
policyDocument: '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"s3:ListAllMyBuckets","Resource":"arn:aws:s3:::*"},{"Effect":"Allow","Action":["s3:List*"],"Resource":["*"]}]}'
19 changes: 19 additions & 0 deletions test/e2e/resources/role_no_description.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
name: $ROLE_NAME
spec:
name: $ROLE_NAME
assumeRolePolicyDocument: >
{
"Version":"2012-10-17",
"Statement": [{
"Effect":"Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": ["sts:AssumeRole"]
}]
}
155 changes: 155 additions & 0 deletions test/e2e/tests/test_descriptions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
# Copyright Amazon.com Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You may
# not use this file except in compliance with the License. A copy of the
# License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.

"""Integration tests for resource description fields
The reason we are dedicating a test file for this is because the description/lateinitialize
bug is the most common bug we have seen in the ACK project. This test file is dedicated
to testing the description field for policy and role resources.
See bugs:
-
"""

import json
import time

import pytest

from acktest.k8s import condition
from acktest.k8s import resource as k8s
from acktest.resources import random_suffix_name
from e2e import service_marker, CRD_GROUP, CRD_VERSION, load_resource
from e2e.common.types import POLICY_RESOURCE_PLURAL
from e2e.replacement_values import REPLACEMENT_VALUES
from e2e import policy
from e2e import role

DELETE_WAIT_SECONDS = 10
CREATE_WAIT_SECONDS = 10
MODIFY_WAIT_SECONDS = 10


@pytest.fixture(scope="module")
def policy_with_no_description():
policy_name = random_suffix_name("my-simple-policy", 24)

replacements = REPLACEMENT_VALUES.copy()
replacements['POLICY_NAME'] = policy_name

resource_data = load_resource(
"policy_no_description",
additional_replacements=replacements,
)

ref = k8s.CustomResourceReference(
CRD_GROUP, CRD_VERSION, POLICY_RESOURCE_PLURAL,
policy_name, namespace="default",
)
k8s.create_custom_resource(ref, resource_data)
cr = k8s.wait_resource_consumed_by_controller(ref)

cr = k8s.get_resource(ref)
assert cr is not None
assert 'status' in cr
assert 'ackResourceMetadata' in cr['status']
assert 'arn' in cr['status']['ackResourceMetadata']
policy_arn = cr['status']['ackResourceMetadata']['arn']

policy.wait_until_exists(policy_arn)

assert cr is not None
assert k8s.get_resource_exists(ref)

yield (ref, cr)

_, deleted = k8s.delete_custom_resource(
ref,
period_length=DELETE_WAIT_SECONDS,
)
assert deleted

policy.wait_until_deleted(policy_arn)

@pytest.fixture(scope="module")
def role_with_no_description():
role_name = random_suffix_name("my-simple-role", 24)

replacements = REPLACEMENT_VALUES.copy()
replacements['ROLE_NAME'] = role_name

resource_data = load_resource(
"role_no_description",
additional_replacements=replacements,
)

ref = k8s.CustomResourceReference(
CRD_GROUP, CRD_VERSION, "roles",
role_name, namespace="default",
)
k8s.create_custom_resource(ref, resource_data)
cr = k8s.wait_resource_consumed_by_controller(ref)

role.wait_until_exists(role_name)

assert cr is not None
assert k8s.get_resource_exists(ref)

yield (ref, cr)

_, deleted = k8s.delete_custom_resource(
ref,
period_length=DELETE_WAIT_SECONDS,
)
assert deleted

role.wait_until_deleted(role_name)

@service_marker
@pytest.mark.canary
class TestRole:
def test_role_empty_description(self, role_with_no_description):
ref, res = role_with_no_description
role_name = ref.name

time.sleep(CREATE_WAIT_SECONDS)
condition.assert_synced(ref)
condition.assert_type_status(
ref,
cond_type_match=condition.CONDITION_TYPE_LATE_INITIALIZED,
cond_status_match=True,
)

updates = {
"spec": {
"description": "non empty description",
},
}
k8s.patch_custom_resource(ref, updates)
time.sleep(MODIFY_WAIT_SECONDS)
condition.assert_synced(ref)

latest = role.get(role_name)
assert latest is not None
assert latest["Description"] == "non empty description"

def test_policy_empty_description(self, policy_with_no_description):
ref, res = policy_with_no_description

time.sleep(CREATE_WAIT_SECONDS)
condition.assert_synced(ref)
condition.assert_type_status(
ref,
cond_type_match=condition.CONDITION_TYPE_LATE_INITIALIZED,
cond_status_match=True,
)

4 changes: 4 additions & 0 deletions test/e2e/tests/test_policy.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,8 @@ def test_crud(self, simple_policy):
k8s.patch_custom_resource(ref, updates)
time.sleep(MODIFY_WAIT_AFTER_SECONDS)

condition.assert_synced(ref)

latest_tags = policy.get_tags(policy_arn)
after_update_expected_tags = [
{
Expand All @@ -129,6 +131,8 @@ def test_crud(self, simple_policy):
k8s.patch_custom_resource(ref, updates)
time.sleep(MODIFY_WAIT_AFTER_SECONDS)

condition.assert_synced(ref)

latest_tags = policy.get_tags(policy_arn)
after_update_expected_tags = [
{
Expand Down
Loading

0 comments on commit ae12460

Please sign in to comment.