cloud: allow to call S3 cleanup via defer #1145

Open · wants to merge 1 commit into main
Conversation

@lzap lzap (Contributor) commented Jan 15, 2025

I noticed that when upload or registration of an AMI fails, the image is left in the S3 bucket forever. To fix this, we need to be able to use defer to delete the object even if an error occurs, so this PR introduces a new Cleanup function that does exactly that. The process still cleans the S3 bucket, but we can now do this in composer/bib:

	// call cleanup and ignore possible error
	defer a.Cleanup(name, bucket, key)

	// do upload and processing
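
(For context, a fuller sketch of how a caller in composer/bib could wire this up; the uploadAndRegister wrapper and its signature are illustrative, not part of this PR:)

	// Sketch of a caller using the proposed Cleanup via defer.
	func uploadAndRegister(a *awscloud.AWS, name, bucket, key, filename string) error {
		// Register the cleanup before any work starts, so the object is
		// removed even if the upload itself fails halfway through.
		defer a.Cleanup(name, bucket, key) // error deliberately ignored

		if _, err := a.Upload(filename, bucket, key); err != nil {
			return fmt.Errorf("Upload() failed: %w", err)
		}
		// ... registration and further processing ...
		return nil
	}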

@mvo5 mvo5 (Contributor) left a comment

Thanks. I have many thoughts here, but some questions first:

You write that this is needed for ibcli/bib. I looked at bib, and it does AWS.Upload() and then AWS.Register(). Looking at Register(), it seems it does its own cleanup. Should Upload() just do the same? I.e., I wonder whether the user of the library should carry the burden of the cleanup, or whether we can handle it internally for the user (which is less for the library user to think about).
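
(For illustration, handling it internally could look roughly like this; doUpload and deleteObject are hypothetical internal helpers, and the return type is assumed:)

	// Hypothetical: Upload removes its own partially uploaded object on
	// failure, so callers never have to think about leaked S3 objects.
	func (a *AWS) Upload(filename, bucket, key string) (*s3manager.UploadOutput, error) {
		out, err := a.doUpload(filename, bucket, key) // assumed internal helper
		if err != nil {
			_ = a.deleteObject(bucket, key) // best-effort cleanup, error ignored
			return nil, err
		}
		return out, nil
	}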

What I'm getting at is that it would be nice to get to some transaction API (if we need the user to handle the cleanup) that is similar to the rest of Go, e.g.:

	resource, err := aws.NewTransactionOrWhateverNameWeComeUpWith(bucket, key)
	if err != nil {
		return err
	}
	defer resource.Close()
	// or
	defer func() { if err != nil { resource.Destroy() } }()

depending on what exactly we need. But first I'm curious to understand what exactly is happening (and where) that leads to the leakage.
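
(To make that concrete, such a transaction handle could be as small as the following; every name here is a placeholder:)

	// Hypothetical handle tying an uploaded S3 object to its cleanup.
	type S3Object struct {
		a      *AWS
		bucket string
		key    string
	}

	// Close removes the object; intended to be called via defer.
	func (o *S3Object) Close() error {
		return o.a.Cleanup("", o.bucket, o.key) // reuses this PR's Cleanup; name appears unused
	}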

	// Cleanup removes any temporary resources created during the image registration process.
	// Make sure to call this function via defer to ensure cleanup is always performed.
	// It will return an error if resources no longer exist or if there was an error deleting them.
	func (a *AWS) Cleanup(name, bucket, key string) error {
@mvo5 (Contributor) commented on the diff:

Should we call this "DeleteObject", as that is what it is doing? Why is "name" passed in? AFAICT it is not used.

@lzap (Contributor, Author) replied:

Since I saw an interface in the bib codebase, I got the impression there is an idea to have a common uploader interface at some point. Azure also has a cleanup similar to S3's, though in that case it is done entirely by the caller.

Idiomatic Go would be to create an interface called Cleaner with only this method, which clients can then use to perform cleaning. I don't mind changing this to anything else, but note that the standard library has Closer and Reader interfaces as well. For DeleteObject we would need to call the interface Deleter. But what if, in the future, cleaning involves more steps than just deleting "objects"?
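
(Concretely, the interface I have in mind is just this; the method signature mirrors the one proposed in this PR:)

	// Cleaner is implemented by uploaders that can remove the temporary
	// resources they created (S3 objects, Azure blobs, ...).
	type Cleaner interface {
		Cleanup(name, bucket, key string) error
	}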

@achilleas-k achilleas-k (Member) left a comment

Looking at the whole flow in cmd/boot-aws/main.go, in doSetup(), is useful here.

I think the problem here is that when running this part

	uploadOutput, err := a.Upload(filename, bucketName, keyName)
	if err != nil {
		return fmt.Errorf("Upload() failed: %s", err.Error())
	}

	fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))

	// ... snip ...

	ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
	if err != nil {
		return fmt.Errorf("Register(): %s", err.Error())
	}

there's no indication that AWS.Register() will delete the object at bucketName/keyName (though it's mentioned in the docstring).

Whether it makes sense to delete the object from the S3 bucket depends on how you look at it:

  • If the purpose of the object is only to become an AMI, there's no reason to keep it around.
  • If you have a disk image in S3 that you might want to turn into an AMI but also maybe reuse, perhaps transfer it to another region to make another AMI, or download it to troubleshoot an error, it's kinda rude to have the API delete it from your bucket.

I think the solution that makes the most sense here is to separate the object deletion from the Register() function and instead leave it up to the caller of Upload() to do what they want with the object. Then, a user of the API can do:

	uploadOutput, err := a.Upload(filename, bucketName, keyName)
	if err != nil {
		return fmt.Errorf("Upload() failed: %s", err.Error())
	}

	fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))
	defer a.DeleteObject(uploadOutput)

	// ... snip ...

	ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
	if err != nil {
		return fmt.Errorf("Register(): %s", err.Error())
	}

or they can choose to

	uploadOutput, err := a.Upload(filename, bucketName, keyName)
	if err != nil {
		return fmt.Errorf("Upload() failed: %s", err.Error())
	}

	fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))

	// ... snip ...

	ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
	if err != nil {
		return fmt.Errorf("Register(): %s", err.Error())
	}
	// Only delete the object after it's finished converting to an AMI.
	// If the conversion/registration fails, keep it for troubleshooting.
	if err := a.DeleteObject(uploadOutput); err != nil {
		...
	}

@lzap lzap commented Jan 16, 2025

Yes, I agree that separating it completely is the best way, but I did not have the guts to make an API-breaking change. If we miss some place, S3 files could actually be accumulating there for some users.

Frankly, the API is still clunky and I see no reason why this isn't a single call. Let me rewrite this in idiomatic Go:

	type ImporterEC2 struct {
		bucketName string
		keyName    string
		bootMode   string
		importRole string
		arch       string
		// whatever is needed and must not be part of the interface
	}

	func (im *ImporterEC2) Import(name string, image io.Reader) error {
		defer im.cleanup()

		// upload and register
		return nil
	}

This is safer, easier to use, a common interface can possibly be introduced, and finally, it is really easy to build on top of the existing API, which can then be deprecated. Thoughts?
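
(Usage would then collapse to something like this; NewImporterEC2 is a constructor name I am making up here:)

	importer := awscloud.NewImporterEC2(bucketName, keyName, bootMode, importRole, arch) // hypothetical constructor
	f, err := os.Open("disk.raw")
	if err != nil {
		return err
	}
	defer f.Close()
	if err := importer.Import("my-image", f); err != nil {
		return err
	}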

@mvo5 mvo5 commented Jan 16, 2025

Just a quick note: I think breaking the API to introduce a better API is fine (and actually encouraged, as we have some non-ideal API in some places), as long as the person also takes responsibility for fixing the callers (usually osbuild-composer, bib, and now ibcli). I see no problem with that (and we do it all the time).

@achilleas-k achilleas-k (Member) commented

> Yes, I agree that separating it completely is the best way, but I did not have the guts to make an API-breaking change. If we miss some place, S3 files could actually be accumulating there for some users.
>
> Frankly, the API is still clunky and I see no reason why this isn't a single call. Let me rewrite this in idiomatic Go:
>
> [ImporterEC2 sketch quoted from the previous comment]
>
> This is safer, easier to use, a common interface can possibly be introduced, and finally, it is really easy to build on top of the existing API, which can then be deprecated. Thoughts?

This is a common enough use case for us that I agree: it would be useful. But I wouldn't deprecate the separate calls; they're always useful for breaking down the steps for more control.

@lzap lzap commented Jan 16, 2025

> This is a common enough use case for us that I agree: it would be useful. But I wouldn't deprecate the separate calls; they're always useful for breaking down the steps for more control.

Well, if the old functions do not get removed, I see little to no value in introducing a new API that ensures cleanup. A massive advantage of defer is that it handles all code paths, even errors, but it must live in a dedicated function.

So I will refactor this into the @achilleas-k proposal, with the small detail that the defer must be registered before the upload even begins, to cover transmission errors (see the sketch below). I do agree this is perhaps the cleanest way; my only concern is that we might miss a call site and then someone else's S3 bucket will grow.
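
(Since uploadOutput does not exist before the upload, the deferred call would have to take the bucket/key pair; this assumes a DeleteObject variant with that shape:)

	// Register the cleanup before the upload starts, so even a failed or
	// interrupted transmission leaves no object behind.
	defer a.DeleteObject(bucketName, keyName) // hypothetical bucket/key variant

	if _, err := a.Upload(filename, bucketName, keyName); err != nil {
		return fmt.Errorf("Upload() failed: %w", err)
	}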

Remind me of all the possible call sites:

  • composer
  • bib
  • ?

@achilleas-k achilleas-k (Member) commented

> This is a common enough use case for us that I agree: it would be useful. But I wouldn't deprecate the separate calls; they're always useful for breaking down the steps for more control.
>
> Well, if the old functions do not get removed, I see little to no value in introducing a new API that ensures cleanup. A massive advantage of defer is that it handles all code paths, even errors, but it must live in a dedicated function.

I think there's value in having a function that does everything for the common case, and also having separate functions that let you break it down, do a partial operation, or add things in between calls.
In this case, what I'm imagining is a function that does the whole upload, import, etc., which is more or less a copy of the doSetup() function I linked above but without the boot step. All it would do is call the existing functions in order and perform the cleanup, roughly like this:
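
(A sketch only: UploadAndRegister is a placeholder name, the parameter types are simplified to strings, and the Upload/Register/DeleteObject shapes are taken from the snippets above:)

	// UploadAndRegister is a convenience wrapper for the common case:
	// upload, register the image as an AMI, and always clean up the S3 object.
	func UploadAndRegister(a *AWS, filename, bucket, key, imageName, arch, bootMode, importRole string) error {
		uploadOutput, err := a.Upload(filename, bucket, key)
		if err != nil {
			return fmt.Errorf("Upload() failed: %w", err)
		}
		// Clean up on every path from here on, success or failure.
		defer a.DeleteObject(uploadOutput)

		if _, _, err := a.Register(imageName, bucket, key, nil, arch, bootMode, importRole); err != nil {
			return fmt.Errorf("Register(): %w", err)
		}
		return nil
	}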

The value of keeping the existing functions (but fixing the cleanup part) is that we might at some point want to upload an image and import it without deleting the S3 object. Or upload an S3 object and just keep it there. Or import and boot an image that's already in S3.

> So I will refactor this into the @achilleas-k proposal, with the small detail that the defer must be registered before the upload even begins, to cover transmission errors. I do agree this is perhaps the cleanest way; my only concern is that we might miss a call site and then someone else's S3 bucket will grow.
>
> Remind me of all the possible call sites:
>
>   • composer
>   • bib
>   • ?

Sadly, this is another case where discussions about where certain bits of code should live stalled and we got stuck with two copies.
osbuild-composer has a separate (and different) awscloud package.

@thozza thozza (Member) commented Jan 24, 2025

> Sadly, this is another case where discussions about where certain bits of code should live stalled and we got stuck with two copies. osbuild-composer has a separate (and different) awscloud package.

I think we should really remove it from composer and standardize on the images version...
