Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature to skip upgrade if no changes detected #379

Merged
merged 33 commits into from
Aug 27, 2024

Conversation

tylermaginnis
Copy link
Contributor

@tylermaginnis tylermaginnis commented Aug 25, 2024

Description

This PR introduces a feature to skip the Helm upgrade if no changes are detected. This is achieved by evaluating the detailed exit code returned by the helm diff command. The exit codes are interpreted as follows:

Revised this enum for accuracy on 8.26

  1. No changes detected.
  2. Errors encountered.
  3. Changes detected.

This feature addresses the need to conditionally trigger Helm deployments based on whether there are actual changes, as discussed in Issue #54 - helm-diff.
.
Changes Made
Added the --detailed-exitcode flag to the helm diff command.
Evaluated the exit code to determine if changes were detected or if errors were encountered.
Updated the diff function to return a DiffResult with the appropriate exit code.

Testing

Ensure that the helm diff command returns the correct exit codes.
Verify that the upgrade is skipped if no changes are detected.
Confirm that changes are applied if the exit code indicates changes.
Handle errors appropriately based on the exit code.

References
Issue #54 - helm-diff
Issue #4 - helm-sops

@tylermaginnis
Copy link
Contributor Author

tylermaginnis commented Aug 25, 2024

FYI, I notice a Test is failing. Implementing --detailed-exitcodes causes the helm-diff process to write to stderr, which causes an application exception to bubble up. Exit codes are usually reserved (0 for success, non-zero for failure), but here exit code 2 is also reserved for changes detected).

@tylermaginnis
Copy link
Contributor Author

Or... Is that a linting error because I prefixed exit_code with _ to suppress warnings...

@tylermaginnis
Copy link
Contributor Author

Ah... Lemme run a test cycle against the fmt command and see what issues I caused. My changes did touch the worker/thread/job_result pattern.

@tylermaginnis
Copy link
Contributor Author

Updated the code formatting according to CI/CD checks.

@tylermaginnis
Copy link
Contributor Author

tylermaginnis commented Aug 25, 2024

Ah, I thought this could fail here on the exit codes, since this PR... stderr is output from the rust-diff plugin when rust-diff is called with --detailed-exitcodes. Technically, this is expected behavior, because E101 (Panic) is the code returned by Rust applications when any stderr is present.

How would you like me to fix this?

A) Revise the test case to accept E101 as the valid result.
B) Implement E0 (OK) when E0 or E2 (no-diff detected/diff-detected), but propogate E1 (other exeption) as E101.

Copy link
Collaborator

@brianmay brianmay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR. I have some feedback, please correct me if I misunderstood anything.

src/helm.rs Outdated
@@ -130,6 +132,7 @@ impl HelmResult {
installation: installation.clone(),
result,
command,
_exit_code: 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clippy errors are due to prefixing the exit_code with '', maybe remove the prefix?

Copy link
Contributor Author

@tylermaginnis tylermaginnis Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the prefix initially to resolve warnings, but it caused errors. Is there a difference between our rust versions?

$ rustc --version rustc 1.76.0 (07dca489a 2024-02-04)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using the rust 1.80.0 for development, and looks like the CI is using the latest stable version, which I would assume should be 1.80.0 also.

src/helm.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
@brianmay
Copy link
Collaborator

brianmay commented Aug 25, 2024

It is perhaps worth noting that occasionally helm-diff will occasionally get things wrong, e.g. it might say that there are no changes, when in actual fact there are changes.

I think this typically might happen if a user has manually edited a value. helm-diff will compare against the previous chart install and not notice anything ha changed. It will say that they are no changes, but the upgrade will reset the value to the helm chart value.

I think it is worth this change regardless, but do need to be aware of its limitations.

Maybe we need a way to bypass the check, in case we need to manually force a change through.

I assume you mean the helm diff plugin, not rust diff plugin. Not sure I understand what you are getting at, the only errors I see are due to the name of the _exit_code value.

@tylermaginnis
Copy link
Contributor Author

tylermaginnis commented Aug 26, 2024

It is perhaps worth noting that occasionally helm-diff will occasionally get things wrong, e.g. it might say that there are no changes, when in actual fact there are changes.

I think this typically might happen if a user has manually edited a value. helm-diff will compare against the previous chart install and not notice anything ha changed. It will say that they are no changes, but the upgrade will reset the value to the helm chart value.

I think it is worth this change regardless, but do need to be aware of its limitations.

Maybe we need a way to bypass the check, in case we need to manually force a change through.

I assume you mean the helm diff plugin, not rust diff plugin. Not sure I understand what you are getting at, the only errors I see are due to the name of the _exit_code value.

Yes, I think you're right. So - I'm not sure what context you run this from (in an automation pipeline, or manually).

Does this tool also need the ability to run autonomously (e.g., if no user input after 10 seconds, do something to continue the script)?

I added two options - if you pass

$ cargo run -- --vdir ./example/helm-values upgrade --bypass-skip-upgrade-on-no-changes

or

$ cargo run -- --vdir ./example/helm-values upgrade -b

This will upgrade everything regardless if DiffResult is NoChanges.

Without this flag, the user will see:

$ No changes detected. Upgrade anyway? (Y/n)

Here, the user can input an empty string, 'Y', 'y', or 'yes', and then it will proceed with the upgrade. Any other entry will result in no upgrade for that job.

Let me know if you think there's anything else we need to include in this branch prior to merging it. As always, I disclaim that this worked perfectly fine in my environment w/ minikube, but I would advise you checking to see how this works for you - as I assume you have more than one namespace or cluster, and I only have 1 in my dev environment.

@brianmay
Copy link
Collaborator

I don't think you have pushed your latest changes yet, so just responding to your comments above.

I think that there should be a way to bypass the prompt, and just assume N. For cases when there is no STDIN, e.g. CI. Maybe we could autotest if there is a STDIN, I generally try to avoid that though.

I am using the rust 1.80.0 for development, and looks like the CI is using the latest stable version, which I would assume should be 1.80.0 also.

The CI should be pinned to 1.80 actually, so it doesn't automagically break when 1.81 is released with new clippy checks, but I have been lazy.

…nchanged

Feature/tm skip upgrade if unchanged + akip upgrade if unchanged and CR
@tylermaginnis
Copy link
Contributor Author

I don't think you have pushed your latest changes yet, so just responding to your comments above.

I think that there should be a way to bypass the prompt, and just assume N. For cases when there is no STDIN, e.g. CI. Maybe we could autotest if there is a STDIN, I generally try to avoid that though.

I am using the rust 1.80.0 for development, and looks like the CI is using the latest stable version, which I would assume should be 1.80.0 also.

The CI should be pinned to 1.80 actually, so it doesn't automagically break when 1.81 is released with new clippy checks, but I have been lazy.

Thanks, I merged my fork's branch to main. Those updates should be reflected here now.

In a few moments, I will add a default behavior for the Upgrade -b subflag that assumes the preferred behavior is "no" and add an optional subflag -b -y for assuming "yes..."

Technically, we can detect no stdin for CI encironments, but we would need to add a new crate (atty).

Let me know if this is preferred over adding additional commandline options.

@brianmay
Copy link
Collaborator

That sounds good to me.

Thanks!

@tylermaginnis
Copy link
Contributor Author

That sounds good to me.

Thanks!

I fixed the format-checks. and I will be back in less than an hour to complete those command line argument extensions we discussed.

I'm trying to get this "completed," but not merged (that's at your discretion) before my interview with a Director of Game Content Engineering @ EA tomorrow.

Best regards,
Tyler

@brianmay
Copy link
Collaborator

Any preference on how to address this?

I will have a look.

@brianmay
Copy link
Collaborator

On one hand don't let perfect be the enemy here. Clippy is suppose to be an aid, not hinder development.

On the other hand, wondering if we can reduce these variables to 1.

e.g. Maybe internally we could have somewhere:

enum BypassControl {
    BypassAndAssumeYes,
    BypassAndAssumeNo,
    Normal,
}

That instance of exit code can be ignored, as per message, with exit_code: _.

That way you only need to pass one value to the functions.

I am terrible with names. Not sure that BypassControl is the best name here. Bypass what? Maybe we need a simple name to call this new feature?

@tylermaginnis
Copy link
Contributor Author

On one hand don't let perfect be the enemy here. Clippy is suppose to be an aid, not hinder development.

On the other hand, wondering if we can reduce these variables to 1.

e.g. Maybe internally we could have somewhere:

enum BypassControl {
    BypassAndAssumeYes,
    BypassAndAssumeNo,
    Normal,
}

That instance of exit code can be ignored, as per message, with exit_code: _.

That way you only need to pass one value to the functions.

I am terrible with names. Not sure that BypassControl is the best name here. Bypass what? Maybe we need a simple name to call this new feature?

For clarity, we can name it:

enum UpgradeControl { BypassAndAssumeYes, BypassAndAssumeNo, Normal, }

I will push this up in the next 5 minutes.

@tylermaginnis
Copy link
Contributor Author

On one hand don't let perfect be the enemy here. Clippy is suppose to be an aid, not hinder development.

On the other hand, wondering if we can reduce these variables to 1.

e.g. Maybe internally we could have somewhere:

enum BypassControl {
    BypassAndAssumeYes,
    BypassAndAssumeNo,
    Normal,
}

That instance of exit code can be ignored, as per message, with exit_code: _.

That way you only need to pass one value to the functions.

I am terrible with names. Not sure that BypassControl is the best name here. Bypass what? Maybe we need a simple name to call this new feature?

Took a little longer than 5 minutes. I added an internal UpgradeControl enum without changing the Args, tested the behavior, ran clippy and cargo fmt linting.

@brianmay
Copy link
Collaborator

One problem is that the diff operation now returns an error if there were differences.

That is I see the message:

aws-load-balancer-controller diff Bad Exit Code 2

Which while understandable why this happens, not great UI to print errors when there are no errors.

Wonder if we need to somehow pass CommandLine a list of acceptable return codes that are not actually errors. Up to now I have had the luxury of assuming that anything other then 0 is an error, which no longer applies.

Or, we could just do this after this PR is merged if you want to do that also.

@tylermaginnis
Copy link
Contributor Author

One problem is that the diff operation now returns an error if there were differences.

That is I see the message:

aws-load-balancer-controller diff Bad Exit Code 2

Which while understandable why this happens, not great UI to print errors when there are no errors.

Wonder if we need to somehow pass CommandLine a list of acceptable return codes that are not actually errors. Up to now I have had the luxury of assuming that anything other then 0 is an error, which no longer applies.

Or, we could just do this after this PR is merged if you want to do that also.

Hm ... Lemme try to come up w/ a quick-fix if you're gonna be online for the next 30 minutes. Controlling the exit codes should be fairly straightforward.

@tylermaginnis
Copy link
Contributor Author

tylermaginnis commented Aug 27, 2024

One problem is that the diff operation now returns an error if there were differences.

That is I see the message:

aws-load-balancer-controller diff Bad Exit Code 2

Which while understandable why this happens, not great UI to print errors when there are no errors.

Wonder if we need to somehow pass CommandLine a list of acceptable return codes that are not actually errors. Up to now I have had the luxury of assuming that anything other then 0 is an error, which no longer applies.

Or, we could just do this after this PR is merged if you want to do that also.

        let kind = match output {
            Err(err) => Err(CommandErrorKind::FailedToStart { err }),
            Ok(output) => {
                if output.status.success() || exit_code == 2 {
                    Ok(())
                } else {
                    Err(CommandErrorKind::BadExitCode {})
                }
            }
        };

Now, we won't hit the BadExitCode case for E2.

@brianmay
Copy link
Collaborator

Just tested it myself:

  • It seems that the upgrade is not running even though there are differences. Used -bn. Might be somehow connected with the error exit, although I am not seeing it. Might try debugging this a bit more.

  • The debug! macro is printing stuff directly to the screen. Instead of being intercepted and sent to the Output. Which interferes with the full screen tui output (when --output=tui). I suspect this might be a pre-existing bug, in which case don't worry about it.

  • Prompting for questions is a problem. There are two issues I see here. 1. It doesn't play nicely with the Output module. 2. It involves blocking an async thread, which is generally considered bad.

@tylermaginnis
Copy link
Contributor Author

Also -- went through and selectively suppressed the single remaining clippy warning for exit_code in text.rs.

This should not effect other unused variables that are introduced later.

@tylermaginnis
Copy link
Contributor Author

  • It seems that the upgrade is not running even though there are differences. Used -bn. Might be somehow connected with the error exit, although I am not seeing it. Might try debugging this a bit more.

Working on this.

@tylermaginnis
Copy link
Contributor Author

tylermaginnis commented Aug 27, 2024

  • It seems that the upgrade is not running even though there are differences. Used -bn. Might be somehow connected with the error exit, although I am not seeing it. Might try debugging this a bit more.

So -- the way this is setup now: -by will run the upgrades, and -bn will not run the upgrades.

I know these bools can get a bit counter-intuitive, but the logic here is:

-by (bypass skip upgrades and upgrade anyway if no changes)
-bn (bypass skip upgrades and never upgrade if no changes)

Taking a closer look at the -- changes -- scenario.

You're falling into this case:

                      UpgradeControl::BypassAndAssumeNo => {
                            // Do nothing, as bypass_assume_no is an implicit case
                        }

The behavior you want is in this case:

                        UpgradeControl::BypassAndAssumeYes => {
                            helm::upgrade(installation, helm_repos, tx, true).await?;
                            helm::upgrade(installation, helm_repos, tx, false).await?;
                        }

@brianmay
Copy link
Collaborator

Disregard the first bullet point, I stuffed up. I was running the "diff" command not the "upgrade" command which I thought I was. This bit works as expected, when I actually run it correctly.

messages are written to avoid interfering with TUI mode
…unchanged

Remove UI prompt intererfered with TUI. Alter the way specific debug
@tylermaginnis
Copy link
Contributor Author

Disregard the first bullet point, I stuffed up. I was running the "diff" command not the "upgrade" command which I thought I was. This bit works as expected, when I actually run it correctly.

Cool, I didn't make any changes regarding that.

I did remove the text inputs, as it was blocking the TUI, and merged a default case (-b and -bn are now the same, but -by will do installs). I also changed the way those debug messages were written to the TUI so now they're not interfering with the output.

@brianmay
Copy link
Collaborator

Good.

I do have code to intercept the debug! macro, but currently it only works on the first/main task. And I have mixed feelings about, I tend to feel that maybe this is the wrong tool for this task, and is excessively complicated. IMHO: Tracing is about saving information for the helmci developer, not displaying it to the end user. I probably should delete it, and do something more like what you have done.

Will retest.

@brianmay
Copy link
Collaborator

Not a bug here, just an observation: Same helm charts will autogenerate certificates for the webhook (e.g. aws-load-balancer-controller), and as such helm-diff will always show changes.

I suspect that this might be a "me" problem, I probably need to pass the helm chart a static values. Or something.

@brianmay
Copy link
Collaborator

Everything currently looking pretty good.

@tylermaginnis
Copy link
Contributor Author

Not a bug here, just an observation: Same helm charts will autogenerate certificates for the webhook (e.g. aws-load-balancer-controller), and as such helm-diff will always show changes.

I suspect that this might be a "me" problem, I probably need to pass the helm chart a static values. Or something.

Hm... I would suggest managing those autogenerated certs in kubernetes secrets, so the helm chart will ref the kube secret values and never change. Not exactly sure how your charts are setup.

@brianmay
Copy link
Collaborator

At the moment we have secrets in a values.secrets files (which helmci reads), which is encrypted by transcrypt. https://github.com/elasticdog/transcrypt/

But transcrypt has not been updated in ages, can be annoying at times, and probably has security issues. So I implemented support for helm-secrets in helmci, which will be better.

Will take a 30 minute break, and if I haven't thought of anything by then merge this PR.

Thanks again for your contribution.

@brianmay brianmay merged commit 2f66a84 into electronicarts:main Aug 27, 2024
1 check passed
@brianmay
Copy link
Collaborator

I made some minor changes. CommandLine now supports allowed_exit_codes, so list of success exit codes can by customized for each invocation, and we now print the exit codes for each command result.

brianmay pushed a commit that referenced this pull request Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants