Add ability to evaluate versioned buckets #51

Status: Open, wants to merge 1 commit into base branch `master`.

enticedwanderer

Summary:

This PR adds the ability to list and evaluate every object version in a bucket. The existing implementation uses the ListObjectsV2 API, which lists only the current (latest) version of each object in a versioned bucket. As a result, none of the metrics accounted for older versions that are still stored in the bucket and taking up space. For my use case this was unacceptable, because I need to know when I'm approaching my bucket's storage limit.
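As a rough illustration of the undercount, the sketch below models a bucket as an in-memory list of object versions. `ObjectVersion`, `currentOnlySize`, and `allVersionsSize` are hypothetical names, not part of the exporter or the AWS SDK. Summing only the latest versions corresponds to what a ListObjectsV2-based count sees, while summing every version corresponds to what ListObjectVersions reports.

```go
package main

import "fmt"

// ObjectVersion is a hypothetical, simplified stand-in for one listing entry;
// it is not an AWS SDK type.
type ObjectVersion struct {
	Key      string
	Size     int64
	IsLatest bool
}

// currentOnlySize mimics a ListObjectsV2-based count: only the latest
// version of each key contributes.
func currentOnlySize(versions []ObjectVersion) int64 {
	var total int64
	for _, v := range versions {
		if v.IsLatest {
			total += v.Size
		}
	}
	return total
}

// allVersionsSize mimics a ListObjectVersions-based count: every stored
// version contributes, including noncurrent ones.
func allVersionsSize(versions []ObjectVersion) int64 {
	var total int64
	for _, v := range versions {
		total += v.Size
	}
	return total
}

func main() {
	bucket := []ObjectVersion{
		{Key: "report.csv", Size: 100, IsLatest: true},
		{Key: "report.csv", Size: 90, IsLatest: false},
		{Key: "report.csv", Size: 80, IsLatest: false},
		{Key: "logo.png", Size: 50, IsLatest: true},
	}
	fmt.Println(currentOnlySize(bucket)) // 150
	fmt.Println(allVersionsSize(bucket)) // 320
}
```

In this toy bucket the two noncurrent versions of `report.csv` account for 170 of the 320 stored bytes, all of which a current-objects-only listing misses.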

To address this, the PR introduces the following changes:

  1. Add a new flag, `--s3.with-versions`, that switches which listing API is called. It is strictly opt-in (default `false`), so existing behavior is unchanged and backwards compatibility is preserved.
  2. Move the counters into a new `ItemAggregator` struct that tracks the statistics, and add a separate method, parallel to the existing one, that evaluates all objects in a bucket using the ListObjectVersions API.
  3. Choose between the two implementations, `CountViaListObjectsV2` and `CountViaListObjectVersions`, based on the flag.
  4. Extend the test-case semantics to support the new API usage, and add unit tests that exercise it.

Further notes:

  1. For now, the flag is global rather than per bucket. This is acceptable because ListObjectVersions also works on non-versioned buckets, returning each object as a single version, so it gives correct results for any bucket. A per-bucket setting would additionally require querying each bucket's versioning status.
  2. In integration testing against my S3 provider (IDrive e2), ListObjectVersions was about 10-15% slower than ListObjectsV2 on the same non-versioned bucket. On a versioned bucket the gap can be much larger, since every noncurrent version is listed too. This matters when evaluation time is already close to the Prometheus scrape timeout, e.g. for a very large bucket with many objects.
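For comparison, a per-bucket choice would look roughly like the hypothetical helper below. `needsVersionListing` is an illustrative name, not exporter code; it takes the `Status` value that S3's GetBucketVersioning returns (`Enabled`, `Suspended`, or empty when versioning was never configured) and decides whether a version listing is needed. Suspended buckets are the subtle case, since they can still hold noncurrent versions created while versioning was on. The extra per-bucket call and this branching are what the global flag avoids.

```go
package main

import "fmt"

// needsVersionListing is a hypothetical per-bucket alternative to the global flag.
// status is the Status value from S3 GetBucketVersioning: "Enabled", "Suspended",
// or "" when versioning has never been configured on the bucket.
func needsVersionListing(status string) bool {
	switch status {
	case "Enabled", "Suspended":
		// Suspended buckets keep any noncurrent versions created while
		// versioning was enabled, so they need a version listing too.
		return true
	default:
		// Never-versioned buckets hold only current objects.
		return false
	}
}

func main() {
	for _, s := range []string{"Enabled", "Suspended", ""} {
		fmt.Printf("%q -> %v\n", s, needsVersionListing(s))
	}
}
```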

Tested end to end on my account across three buckets: versioned, non-versioned, and empty non-versioned.

Happy to make any changes you'd like or to consider any other suggestions you might have.
