cloud: allow to call S3 cleanup via defer #1145
Conversation
Thanks. I have many thoughts here, but some questions first:
You write that this is needed for ibcli/bib. I looked at bib and it does `AWS.Upload()` and then `aws.Register()`. Looking at `Register()`, it seems it does its own cleanup. Should `Upload()` just do the same? I.e. I wonder whether the user of the library should carry the burden of the cleanup or whether we can handle it internally for the user (which is less for the library user to think about).
What I'm getting at is that, if we do need the user to handle the cleanup, it would be nice to get to some transaction API that is similar to the rest of Go, e.g.:
```go
resource, err := aws.NewTransactionOrWhateverNameWeComeUpWith(bucket, key)
if err != nil {
	return err
}
defer resource.Close()
// or
defer func() { if err != nil { resource.Destroy() } }()
```
depending on what exactly we need. But first I'm curious to understand what exactly is happening (and where) that leads to the leakage.
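For illustration, here is a minimal sketch of what such a transaction-style handle could look like; every name in it (`s3Object`, `NewObjectTransaction`, the `client` field) is hypothetical and only demonstrates the pattern, not an actual API in this library:

```go
// s3Object is a hypothetical handle for an uploaded object; Close()
// deletes it, so callers can defer cleanup right after the upload.
type s3Object struct {
	client *AWS
	bucket string
	key    string
}

// NewObjectTransaction uploads the file and returns a handle whose
// Close() removes the object from the bucket again.
func (a *AWS) NewObjectTransaction(filename, bucket, key string) (*s3Object, error) {
	if _, err := a.Upload(filename, bucket, key); err != nil {
		return nil, err
	}
	return &s3Object{client: a, bucket: bucket, key: key}, nil
}

// Close deletes the uploaded object; safe to call via defer.
func (o *s3Object) Close() error {
	return o.client.Cleanup("", o.bucket, o.key)
}
```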
```go
// Cleanup removes any temporary resources created during the image registration process.
// Make sure to call this function via defer to ensure cleanup is always performed.
// It will return an error if resources no longer exist or if there was an error deleting them.
func (a *AWS) Cleanup(name, bucket, key string) error {
```
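For context, a plausible body for this function might look like the following; the `s3` client field and the use of aws-sdk-go v1 are assumptions for the sketch, not taken from the actual PR:

```go
func (a *AWS) Cleanup(name, bucket, key string) error {
	// NOTE: name is currently unused, as pointed out in the review below.
	_, err := a.s3.DeleteObject(&s3.DeleteObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	return err
}
```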
Should we call this `DeleteObject`, since that is what it is doing? And why is `name` passed in? AFAICT it is not used.
Since I saw an interface in the bib codebase, I got the impression there is an idea to have a common uploader interface at some point. Azure also has a cleanup similar to S3's, though in that case it is done entirely by the caller.
Idiomatic Go is to create an interface called `Cleaner` with only this method, and then clients can use that to perform cleaning. I don't mind changing this to anything else, but there is `Closer` in the standard library, as well as `Reader`. For `DeleteObject` we would need to call this `Deleter`. But what if in the future cleaning involves more steps, not just deleting "objects"?
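As a sketch, the single-method interface being discussed would look like this (assuming the `Cleanup` signature from the diff above):

```go
// Cleaner abstracts over cloud providers (S3, Azure, ...) that need to
// remove intermediate upload artifacts; callers can depend on this
// interface instead of a concrete type like AWS.
type Cleaner interface {
	Cleanup(name, bucket, key string) error
}
```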
Looking at the whole flow in `cmd/boot-aws/main.go`, in `doSetup()`, is useful here.
I think the problem here is that when running this part
```go
uploadOutput, err := a.Upload(filename, bucketName, keyName)
if err != nil {
	return fmt.Errorf("Upload() failed: %s", err.Error())
}
fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))

// ... snip ...

ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
if err != nil {
	return fmt.Errorf("Register(): %s", err.Error())
}
```
there's no indication that `AWS.Register()` will delete the object at `bucketName/keyName` (though it is mentioned in the docstring).
Whether it makes sense to delete the object from the S3 bucket depends on how you look at it:
- If the purpose of the object is only to become an AMI, there's no reason to keep it around.
- If you have a disk image in S3 that you might want to turn into an AMI but also maybe reuse, perhaps transfer it to another region to make another AMI, or download it to troubleshoot an error, it's kinda rude to have the API delete it from your bucket.
I think the solution that makes the most sense here is to separate the object deletion from the `Register()` function and instead leave it up to the caller of `Upload()` to do what they want with the object. Then, a user of the API can do:
```go
uploadOutput, err := a.Upload(filename, bucketName, keyName)
if err != nil {
	return fmt.Errorf("Upload() failed: %s", err.Error())
}
fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))
defer a.DeleteObject(uploadOutput)

// ... snip ...

ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
if err != nil {
	return fmt.Errorf("Register(): %s", err.Error())
}
```
or they can choose to:
```go
uploadOutput, err := a.Upload(filename, bucketName, keyName)
if err != nil {
	return fmt.Errorf("Upload() failed: %s", err.Error())
}
fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))

// ... snip ...

ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
if err != nil {
	return fmt.Errorf("Register(): %s", err.Error())
}

// Only delete the object after it's finished converting to an AMI.
// If the conversion/registration fails, keep it for troubleshooting.
if err := a.DeleteObject(uploadOutput); err != nil {
	...
}
```
Yes, I agree that separating it completely is the best way, but I did not have the guts to make an API-breaking change. If we miss some place, S3 files could actually be collecting there for some users. Frankly, the API is still clunky and I see no reason why this isn't a single call; let me rewrite this in idiomatic Go:
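(The code block that followed this comment is not preserved above; the following is a hedged reconstruction of what a single-call API in idiomatic Go could look like. The name `UploadAndRegister` and the parameter and return types are guesses, not the author's actual proposal.)

```go
// UploadAndRegister uploads the image, registers it as an AMI, and
// always deletes the intermediate S3 object, even when registration
// fails, so nothing is left behind in the bucket.
func (a *AWS) UploadAndRegister(filename, bucket, key, imageName, arch, bootMode string, importRole *string) (ami, snapshot *string, err error) {
	if _, err = a.Upload(filename, bucket, key); err != nil {
		return nil, nil, fmt.Errorf("Upload() failed: %w", err)
	}
	// The object is only an intermediate artifact for the AMI, so it is
	// removed regardless of whether Register() succeeds.
	defer func() {
		if cerr := a.Cleanup(imageName, bucket, key); cerr != nil && err == nil {
			err = cerr
		}
	}()
	ami, snapshot, err = a.Register(imageName, bucket, key, nil, arch, bootMode, importRole)
	return ami, snapshot, err
}
```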
This is safer, easier to use, a common interface can possibly be introduced, and finally it is really easy to build on top of the existing API, which can then be deprecated. Thoughts?
Just a quick note: I think breaking the API to introduce a better API is fine (and actually encouraged, as we have some non-ideal APIs in some places). As long as the person also takes responsibility for fixing the callers (usually osbuild-composer, bib, and now ibcli), I see no problem (and we do it all the time).
This is a common enough use case for us that I agree, it would be useful. But I wouldn't deprecate the separate calls; they're always useful for breaking down the steps for more control.
Well, if the old functions do not get removed, I see little to no value in introducing a new API that ensures it; a massive advantage of the single call is that the cleanup cannot be forgotten. So I will then refactor this into the @achilleas-k proposal. Remind me of all possible callsites?
I think there's value in having a function that does everything for the common case and also having separate functions that let you break it down to do a partial operation or add things in between calls. The value of keeping the existing functions (but fixing the cleanup part) is that we might at some point want to upload an image and import it without deleting the S3 object. Or upload an S3 object and just keep it there. Or import and boot an image that's already in S3.
Sadly, this is another case where discussions about where certain bits of code should live stalled, and we got stuck with two copies.
I think that we should really remove it from composer and standardize on the images version...
I noticed that when the upload or registration of an AMI fails, the image is left in the S3 bucket forever. To fix this, we need to be able to use `defer` to delete the object even if an error occurs. Therefore a new Cleanup function is introduced that does exactly that. The process still cleans the S3 bucket, but we can now do this in composer/bib:
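(The snippet that followed is not included above; this is a sketch of the intended caller-side usage, assuming the `Cleanup` signature shown earlier in the review.)

```go
uploadOutput, err := a.Upload(filename, bucketName, keyName)
if err != nil {
	return fmt.Errorf("Upload() failed: %s", err.Error())
}
fmt.Printf("file uploaded to %s\n", aws.StringValue(&uploadOutput.Location))

// Remove the object from the bucket no matter what happens below;
// this is the behavior the new Cleanup function enables.
defer func() {
	if err := a.Cleanup(imageName, bucketName, keyName); err != nil {
		fmt.Printf("cleanup failed: %v\n", err)
	}
}()

ami, snapshot, err := a.Register(imageName, bucketName, keyName, nil, arch, bootMode, importRole)
if err != nil {
	return fmt.Errorf("Register(): %s", err.Error())
}
```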