Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time-based cleanup for Maven snapshot versions #33420

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dianaStr7
Copy link
Contributor

Pull Request: Implement Cleanup Function for Maven Snapshot Versions

Overview

This pull request introduces a cleanup function for Maven snapshot versions in Gitea, enabling more efficient management of package storage. The new feature allows users to specify how many of the most recent Maven snapshot builds to retain, optimizing storage by automatically removing older files.

Features

  • New Configurable Variable: RETAIN_MAVEN_SNAPSHOT_BUILDS allows setting the number of Maven snapshot builds to keep.
    • Default is -1, which keeps all builds.
  • Automated Cleanup: Tied to the cleanup expired data process, this function targets files within Maven snapshots, not the versions themselves, ensuring that only essential files are kept.

Implementation

The feature extends the existing cleanup job to include files from Maven snapshot versions based on the specified retention policy set in app.ini. It checks against the highest build number from the Maven metadata file to determine which files to retain.

Impact

This enhancement helps manage disk space more effectively by providing control over how many builds of Maven snapshots are retained, potentially reducing storage requirements for projects using Maven within Gitea.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 27, 2025
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 27, 2025
@github-actions github-actions bot added modifies/go Pull requests that update Go code docs-update-needed The document needs to be updated synchronously labels Jan 27, 2025
@dianaStr7
Copy link
Contributor Author

Considered Alternative Approach

I considered another way to handle Maven snapshot cleanup that uses the properties field for package files to store build numbers right when files are uploaded. This would make database searches simpler.

This method would need a database migration to work, so it's a bit more complicated. I have the code ready for this alternative if we think it's worth the extra steps later on.

@delvh delvh changed the title Added cleanup method for files in Maven snapshot versions Add time-based cleanup for Maven snapshot versions Jan 31, 2025
@delvh
Copy link
Member

delvh commented Jan 31, 2025

Note to any reviewer: Review this PR carefully.
It seems like an LLM at the very least helped with creating this PR.
That in and of itself is nothing bad, as long as the output is indeed correct.
Given my experience with ChatGPT, I find it rather likely that it missed some edge cases though.

@dianaStr7
Copy link
Contributor Author

A remark: Yes, it's true that I used ChatGPT for help since I'm not an experienced Go developer, but I handled the core logic and testing myself. I'd appreciate any comments or critiques to help us improve this, as we really need the feature :)

Note to any reviewer: Review this PR carefully. It seems like an LLM at the very least helped with creating this PR. That in and of itself is nothing bad, as long as the output is indeed correct. Given my experience with ChatGPT, I find it rather likely that it missed some edge cases though.

@@ -109,3 +131,20 @@ func ParsePackageMetaData(r io.Reader) (*Metadata, error) {
Dependencies: dependencies,
}, nil
}

// ParseMavenMetadata parses the Maven metadata XML to extract the build number.
func ParseMavenMetaData(r io.Reader) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thoroughly confused by this method signature and docs: Nothing indicates to me that this method parses the entire XML, and throws the result except for the build number away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the signature and added a new snapshotMetadata type which can be extended later, does it make it more readable?

}
}

log.Info("Filtered %d files out of %d total files for version ID %d with maxBuildNumber %d", len(filteredFiles), len(files), versionID, maxBuildNumber)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… I do not like seeing log statements inside models/.
That should be the job of the services/ layer.
Thus, what about adding an extra return value (skippedFiles) that contains all skipped files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks, will do!

}
}

log.Info("Filtered %d files out of %d total files for version ID %d with maxBuildNumber %d", len(filteredFiles), len(files), versionID, maxBuildNumber)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Info logs are rare within Gitea.
At most, gather statistics inside a success message after successfully running the cron job, or downgrade it to debug.


thresholdBuildNumber := maxBuildNumber - retainBuilds
if thresholdBuildNumber <= 0 {
log.Info("No files to clean up, as the threshold build number is less than or equal to zero for versionID %d", versionID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the error messages and logs in general, I recommend explicitly stating during what sort of action they occurred.
When you actually see these logs, it will be hard to guess that they belong to the Maven snapshot cleanup otherwise, especially when a request happens in the meantime.

return nil
}

func extractMaxBuildNumberFromMetadata(ctx context.Context, metadataFile *packages.PackageFile) (int, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you define this method in models already?

@GiteaBot GiteaBot added lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Feb 20, 2025
@delvh delvh self-requested a review March 8, 2025 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-update-needed The document needs to be updated synchronously lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged modifies/go Pull requests that update Go code size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants