Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add encryption to remote_state #3586

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

norman-zon
Copy link

add the most basic and naive implementation of an encryption attribute to the remote_state block, to generate encryption config for the backend

Description

Fixes #3495.

This PR adds an attribute encryption to the remote_state block that generates a encryption block in the backend config. This allow to use the OpenTofu state encryption directly from within terragrunt, thus keeping encryption config DRY.

TODOs

Read the Gruntwork contribution guidelines.

  • Update the docs.
  • Run the relevant tests successfully, including pre-commit checks.
  • Ensure any 3rd party code adheres with our license policy or delete this line if its not applicable.
  • Include release notes. If this PR is backward incompatible, include a migration guide.

Release Notes (draft)

Added / Removed / Updated [X].

Migration Guide

@norman-zon norman-zon mentioned this pull request Nov 21, 2024
4 tasks
@yhakbar
Copy link
Collaborator

yhakbar commented Nov 22, 2024

First: Great job, @norman-zon ! This is looking good!

If we're going to have another attribute on the remote_state, why not have it be part of the config? The reason I was saying that the most straightforward approach to this was to have an extra configuration block for remote_state was to support multiple encryption blocks like it is in the terraform.backend block.

Given all that, would you be willing to adjust the implementation so that it's either a configuration block or an attribute that's a part of config? If not, is there a reason that the current approach is better?

@norman-zon
Copy link
Author

@yhakbar thanks for the kind words.

I did not nest encryption in the config block, because it seemed to me this would make it necessary to split the config, to clarify it creates two different terraform blocks .
So this would lead to:

remote_state {
  backend = "xxx"
  config = {
    bucket {
      name = mybucket
      prefix = ....
    }
    encryption {
       key_provider = ....
    }
}

Otherwise mixing up attributes for the bucket (backend block) and encryption (encryption block) in config seems confusing to me. But this would make it necessary to refactor the whole config block and be a breaking change.

Does this make sense to you?

@yhakbar
Copy link
Collaborator

yhakbar commented Nov 22, 2024

@yhakbar thanks for the kind words.

I did not nest encryption in the config block, because it seemed to me this would make it necessary to split the config, to clarify it creates two different terraform blocks .
...
Does this make sense to you?

Can you confirm that the configuration has to create two terraform blocks? Is it the case that the following is invalid OpenTofu configuration?

terraform {
  backend "s3" {
    bucket         = "bucket"
    key            = "something/tf.tfstate"
  }
  encryption {
    key_provider "test" "default" {
  }
}

I would expect that I would prefer for them to be configured together, but I haven't used this feature, so I'm speaking out of ignorance.

@norman-zon
Copy link
Author

Sorry, I think I was not clear enough. We don't need to have two different terraform blocks, but two different blocks inside terraform.

Can you confirm that the configuration has to create two terraform blocks? Is it the case that the following is invalid OpenTofu configuration?

terraform {
  backend "s3" {
    bucket         = "bucket"
    key            = "something/tf.tfstate"
  }
  encryption {
    key_provider "test" "default" {
  }
}

This is valid OpentTofu config.

For me it would be counter-intuitive, to have the remote_state.config block, which as of now is de-facto translated into a backend tofu/terraform block mixed up with attributes that have to end up in the encryption tofu/terraform block.

For example we would have something like:

remote_state {
  backend = "xxx"
  config = {
    bucket = "my-bucket"
    key_provider = "pbkdf2"
  }

So one config block in terragrunt is the source for two different block types in tofu/terraform. I would rather separate this.

Does this clear up my rationale?

@norman-zon norman-zon marked this pull request as ready for review December 3, 2024 13:55
@norman-zon norman-zon force-pushed the 3495-state-encryption branch from 41e4fc1 to b9c07cd Compare December 4, 2024 12:47
@norman-zon
Copy link
Author

@yhakbar , could you take a look at this again? I would love to wrap it up.

@yhakbar
Copy link
Collaborator

yhakbar commented Dec 4, 2024

Hey @norman-zon ,

Sorry I haven't had a ton of time to devote to reviewing this. I'll try my best to install this version locally and play with it sometime soon to be able to speak to whether it provides a good experience for a Terragrunt user.

My delay is partially due to the fact that I'm inexperienced with the OpenTofu feature.

@yhakbar yhakbar self-assigned this Dec 6, 2024
@yhakbar
Copy link
Collaborator

yhakbar commented Dec 6, 2024

Hey @norman-zon ,

First, I'm taking a deeper look at this PR, and I want to say great work! This is really nice. It definitely looks like you're on the right path, and I'm gonna be trying it out locally to make sure it does work in a fixture I create.

Here are there the main issues that I'm seeing that would prevent this from being merged:

  1. There is/are no added fixture(s) for this behavior. It really helps to have valid implementation of a new feature as a fixture to demonstrate that it works, and to explain how it can be used.

  2. There is/are not added integration test(s) for this behavior. It really helps to have integration tests for new features so that we know no regression is happening in future updates.

  3. Lints are failing. It's important to ensure lints pass so that we know we're keeping code quality at a manageable level. I know you said in your issue that you're not super familiar with Golang, but golangci-lint is the linter for Golang, and when adding the relevant plugin for your code editor (see this), you'll get live feedback as you edit to make sure you pass the lints.

    These are the failures we're seeing in CI:

    codegen/generate.go:295:3: commentFormatting: put a space between `//` and comment text (gocritic)
                    //extract key_provider first to create key_provider block
                    ^
    remote/remote_state.go:244:52: non-wrapping format verb for fmt.Errorf. Use `%w` to format errors (errorlint)
                    return fmt.Errorf("error creating provider: %v", err)
                                                                     ^
    remote/remote_state.go:254:57: non-wrapping format verb for fmt.Errorf. Use `%w` to format errors (errorlint)
                    return fmt.Errorf("error decoding struct to map: %v", err)
                                                                          ^
    remote/remote_encryption.go:28:50: the given struct should be annotated with the `mapstructure` tag (musttag)
            if err := mapstructure.Decode(encryptionConfig, &b); err != nil {
                                                            ^
    remote/remote_encryption_test.go:10:1: Function TestUnmarshalConfig missing the call to method parallel (paralleltest)
    func TestUnmarshalConfig(t *testing.T) {
    ^
    remote/remote_encryption_test.go:108:2: Range statement for test TestUnmarshalConfig missing the call to method parallel in test Run (paralleltest)
            for _, tt := range tests {
            ^
    remote/remote_encryption_test.go:124:1: Function TestToMap missing the call to method parallel (paralleltest)
    func TestToMap(t *testing.T) {
    ^
    remote/remote_encryption_test.go:185:2: Range statement for test TestToMap missing the call to method parallel in test Run (paralleltest)
            for _, tt := range tests {
            ^
    codegen/generate.go:298:16: fmt.Errorf can be replaced with errors.New (perfsprint)
                            return nil, fmt.Errorf(encryptionKeyProviderKey + " is mandatory but not found in the encryption map")
                                        ^
    remote/remote_encryption.go:81:2: ST1003: struct field KmsKeyId should be KmsKeyID (stylecheck)
            KmsKeyId                        int    `mapstructure:"kms_key_id"`
            ^
    remote/remote_encryption_test.go:201:5: require-error: for error assertions use require (testifylint)
                                    assert.NoError(t, err)
                                    ^
    codegen/generate.go:300:3: assignments should only be cuddled with other assignments (wsl)
                    keyProviderTraversal := hcl.Traversal{
                    ^
    codegen/generate.go:344:4: assignments should only be cuddled with other assignments (wsl)
                            ctyVal, err := convertValue(encryption[key])
                            ^
    codegen/generate.go:345:4: only one cuddle assignment allowed before if statement (wsl)
                            if err != nil {
                            ^
    codegen/generate.go:348:4: if statements should only be cuddled with assignments (wsl)
                            if keyProviderBlockBody != nil {
                            ^
    remote/remote_encryption.go:38:2: only one cuddle assignment allowed before if statement (wsl)
            if err != nil {
            ^
    remote/remote_encryption.go:41:2: if statements should only be cuddled with assignments (wsl)
            if err := decoder.Decode(encryptionConfig); err != nil {
            ^
    remote/remote_encryption.go:51:2: only one cuddle assignment allowed before if statement (wsl)
            if err != nil {
            ^
    remote/remote_encryption.go:54:2: return statements should not be cuddled if block has more than two lines (wsl)
            return result, nil
            ^
    config/config.go:288:3: assignments should only be cuddled with other assignments (wsl)
                    config.Encryption = remoteStateEncryption
                    ^
    
  4. Current integration tests are failing. It can be hard to run all the tests locally, especially if you don't have an active AWS subscription, so I'll give you the names of the failing tests. You can run only the specific failing test by running go test -run 'NAME OF TEST' ./.... If you can't have the test pass locally because of stuff missing in the cloud or something, don't worry about it. Just know why the test is failing, and that it's not failing because of a panic or anything, and we can re-run the tests in CI to see if they pass there.

    Failing Tests:

    • TestReadTerragruntAuthProviderCmdRemoteState
    • TestDependenciesOptimisation
    • TestRenderJSONConfig
    • TestDependencyMockOutputMergeWithStateDefault
    • TestDependencyMockOutputMergeWithStateFalse
    • TestOrderedMapOutputRegressions1102
    • TestDependencyMockOutputMergeStrategyWithStateDeepMapOnly
    • TestDependencyMockOutputMergeStrategyWithStateCompatFalse
    • TestDependencyMockOutputMergeStrategyWithStateCompatTrue
    • TestDependencyMockOutputMergeStrategyWithStateNoMerge
    • TestDependencyMockOutputMergeStrategyWithStateCompatConflict
    • TestDependencyMockOutputMergeStrategyWithStateDefault
    • TestDependencyMockOutputMergeStrategyWithStateShallow
    • TestTerragruntRemoteStateCodegenGeneratesBackendBlock
    • TestDependencyMockOutputMergeWithStateNoOverride
    • TestDependencyMockOutputMergeWithStateTrueNotAllowed
    • TestTerragruntRemoteStateCodegenDoesNotGenerateWithSkip
    • TestDependencyMockOutputMergeWithStateTrue
    • TestTerragruntRemoteStateCodegenErrorsIfExists
    • TestTerragruntRemoteStateCodegenOverwrites
  5. Docs aren't updated. Features that aren't documented aren't used, so we need to make sure this is well documented for folks to take advantage of it!

I know that's a lot of feedback, and I hope that's not intimidating. Tag me again if you encounter any issue you can't overcome or want another look, and I'm happy to help. Feel free to reach out in Discord too if you need a quick question answered on Golang development, etc. You might also be able to get more eyes on this from other members of the community to help you out.

@norman-zon norman-zon marked this pull request as draft December 10, 2024 16:06
@norman-zon norman-zon force-pushed the 3495-state-encryption branch 3 times, most recently from 3d2711d to f0c0e4e Compare December 12, 2024 15:23
add the most basic and naive implementation of an encryption attribute
to the remote_state block, to generate encryption config for the backend
add strict typing to enforce usage of attributes for the encryption
config that are recongnized by tofu to avoid defering error handling to
them
@norman-zon norman-zon force-pushed the 3495-state-encryption branch from f0c0e4e to 0334462 Compare December 13, 2024 11:37
@norman-zon norman-zon marked this pull request as ready for review December 13, 2024 11:38
@norman-zon
Copy link
Author

@yhakbar
Thank you for the detailed feedback, this helped a lot.
I incorporated all requested changes.

For the integration tests to work, you would have to create a GCP and AWS KMS key and add them here.

As for how to test the encryption, checking for the encrypted_data values to be base64 was the best I could come up with.

The documentation is a bit scarce, I know. But it felt like everything is explained in the Tofu docs already. Again on the topic of this being a OpenTofu only feature, I was not sure how to indicate this best.

@norman-zon norman-zon changed the title [WIP]feat: add encryption to remote_state feat: add encryption to remote_state Dec 13, 2024
@yhakbar
Copy link
Collaborator

yhakbar commented Dec 13, 2024

Hey @norman-zon ,

This is looking really good! Well done, adding all of this.

Do you mind having the AWS key depend on this?

- arn: arn:aws:kms:us-east-1:087285199408:key/bd372994-d969-464a-a261-6cc850c58a92

We already setup a test key to support integrations with AWS KMS. I'm syncing up with the team on the GCP side of things. We have a GCP project we put aside for Terragrunt integration tests that is used for testing remote state storage, but I don't think we leave around a test key for GCP KMS testing.

To get this moving along, I would actually ask that you just skip the GCP tests for now (t.Skip), with a message stating that you need to setup your own key for it to work right, so it's skipped by default. We'll make another ticket to circle back to this and add integration tests for it.

As for how to test it properly, I think you're doing the right thing. As long as you verify that the state file does indeed have base64 encoded values, and OpenTofu is continuing to work, we should be fine.

I'm gonna read through things more carefully, so you might get more feedback soon.

Copy link
Collaborator

@yhakbar yhakbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits on the docs. Once you update the AWS KMS key and add the skip for GCP tests, I'll run the tests locally, then approve.


encryption = {
key_provider = "pbkdf2"
passphrase = "SUPERSECRETPASSPHRASE"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend documenting this with get_env here, as even if you were going for passphrase based authentication, you have no need to store the secret in plain text.

Make sure the HCL is formatted too (put some space after passphrase)!

}
encryption {
key_provider "pbkdf2" "default" {
passphrase = "SUPERSECRETPASSPHRASE"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can leave this in plain text just to demonstrate the point.

@norman-zon
Copy link
Author

norman-zon commented Dec 18, 2024

Thanks again for the feedback.

I updated the PR.

Unfortunately I cannot get AWS KMS to work. I added the key you mentioned but get this error:

=== NAME  TestTofuStateEncryptionAWSKMS
    integration_tofu_state_encryption_test.go:70: Failed to run Terragrunt command 'terragrunt apply -auto-approve --terragrunt-non-interactive --terragrunt-working-dir /var/folders/x5/ch39pzb94s503sxr04shwxbjmnk0d6/T/terragrunt-test4001692128/fixtures/tofu-state-encryption/aws-kms' due to error: util.ProcessExecutionError Failed to execute "tofu init" in /var/folders/x5/ch39pzb94s503sxr04shwxbjmnk0d6/T/terragrunt-test4001692128/fixtures/tofu-state-encryption/aws-kms
        ╷
        │ Error: Unable to build encryption key data
        │
        │ key_provider.aws_kms.default failed with error: errors were encountered in
        │ aws kms configuration
        │ No valid credential sources found : Please see
        │ https://opentofu.org/docs/language/settings/backends/s3
        │ for more information about providing credentials.
        │
        │ Error: failed to refresh cached credentials, no EC2 IMDS role found,
        │ operation error ec2imds: GetMetadata, exceeded maximum number of attempts,
        │ 3, request send failed, Get
        │ "http://169.254.169.254/latest/meta-data/iam/security-credentials/": dial
        │ tcp 169.254.169.254:80: connect: host is down
        │

Maybe someone with more AWS experience can chip in here?

EDIT: Of course this was just me not having permissions on the AWS KMS key. I tested with a key I created in my own org and it worked. So it should work like it is in your CI

@norman-zon norman-zon force-pushed the 3495-state-encryption branch from 36caf96 to 0558538 Compare December 18, 2024 08:59
@yhakbar
Copy link
Collaborator

yhakbar commented Dec 20, 2024

Hey @norman-zon , great work! I feel like we're almost there!

I see failing unit tests here:

  • TestUnmarshalConfig/AWSKMS_invalid_config
  • TestUnmarshalConfig/AWSKMS_full_config
  • TestRemoteStateAsCtyDrift

If you can fix them soon, feel free to do so, otherwise, I'll see if I can chain a PR into this before the end of the day ET to fix the tests so that we can merge your PR in. We are on company holiday starting next week, so we might not be able to merge this until next year otherwise.

@norman-zon
Copy link
Author

I was on holiday myself, so no worries. The last 3 tests are fixed now. Thanks for your ongoing support and patience!

@yhakbar
Copy link
Collaborator

yhakbar commented Jan 9, 2025

I owe you a re-review on this. Will look to get to this soon, @norman-zon !

@yhakbar
Copy link
Collaborator

yhakbar commented Jan 9, 2025

Seeing failures that do seem related to the changes you were trying to effect earlier:

    remote_encryption_test.go:224: failed to unmarshal config: failed to decode key provider: 1 error(s) decoding:
        
        * 'kms_key_id' expected type 'string', got unconvertible type 'int', value: '123456789'
=== CONT  TestDiffersFrom/gcs_null_values_ignored
�[0;90m15:04:43.193�[0m �[0;94mDEBUG �[0m Backend gcs has not changed.
=== CONT  TestToMap/PBKDF2_partial_config
=== CONT  TestToMap/GCPKMS_full_config
--- FAIL: TestToMap (0.00s)
    --- PASS: TestToMap/PBKDF2_full_config (0.00s)
    --- FAIL: TestToMap/AWSKMS_full_config (0.00s)
    --- PASS: TestToMap/PBKDF2_partial_config (0.00s)
    --- PASS: TestToMap/GCPKMS_full_config (0.00s)
=== NAME  TestReadTerragruntConfigFull
    integration_test.go:2308: 
        	Error Trace:	/home/circleci/project/test/integration_test.go:2308
        	Error:      	Not equal: 
        	            	expected: map[string]interface {}{"backend":"local", "config":map[string]interface {}{"path":"foo.tfstate"}, "disable_dependency_optimization":false, "disable_init":false, "generate":map[string]interface {}{"if_exists":"overwrite_terragrunt", "path":"backend.tf"}}
        	            	actual  : map[string]interface {}{"backend":"local", "config":map[string]interface {}{"path":"foo.tfstate"}, "disable_dependency_optimization":false, "disable_init":false, "encryption":map[string]interface {}{"path":"foo.tfstate"}, "generate":map[string]interface {}{"if_exists":"overwrite_terragrunt", "path":"backend.tf"}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-(map[string]interface {}) (len=5) {
        	            	+(map[string]interface {}) (len=6) {
        	            	  (string) (len=7) "backend": (string) (len=5) "local",
        	            	@@ -7,2 +7,5 @@
        	            	  (string) (len=12) "disable_init": (bool) false,
        	            	+ (string) (len=10) "encryption": (map[string]interface {}) (len=1) {
        	            	+  (string) (len=4) "path": (string) (len=11) "foo.tfstate"
        	            	+ },
        	            	  (string) (len=8) "generate": (map[string]interface {}) (len=2) {
        	Test:       	TestReadTerragruntConfigFull
--- FAIL: TestReadTerragruntConfigFull (3.20s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Initalizing dependencies fails when state is encrypted
2 participants