Periodically clean up old temporary uploads #19

Open
pasevin opened this issue Dec 12, 2019 · 7 comments
Labels
enhancement New feature or request
Milestone

Comments

@pasevin
Contributor

pasevin commented Dec 12, 2019

It would be awesome to have this library integrate with Celery to periodically clean up stale files from the temporary upload folder and from the database.

Thanks for your work! I'm almost done with integrating this library in my project :)

@jcohen02
Collaborator

Great to hear that you're integrating this into your project. :-)

I agree that periodic cleanup of stale temporary uploads would be good.

Celery could be a good way to do this, but I wonder if it may be a little heavyweight? Perhaps something more lightweight would be easier to integrate and would keep the app lighter on large dependencies. For example, maybe django-background-tasks or django-periodically could be suitable (I've never used either of these)?

Thanks for your interest in the project and your contributions, @pasevin.

@jcohen02 jcohen02 added the enhancement New feature or request label Dec 14, 2019
@pasevin
Contributor Author

pasevin commented Dec 16, 2019

If we're talking about a built-in clean-up feature, then you're probably right.
I was thinking of doing something like what this library does: https://github.com/xaralis/django-static-sitemaps#running-as-celery-task

Basically, you as a developer have to set up your Celery task runner, and then django-drf-filepond just works with it.

Your suggestion is probably better in that, as a developer, I wouldn't have to worry about setting up my task runner; the library would handle that.

@jcohen02
Collaborator

I think that was my concern when I mentioned having something more lightweight - the need to be running a task runner which adds another element of complexity and another dependency for the person using django-drf-filepond in their app (if they're not already using Celery, that is).

I guess Celery is going to be more feature-rich and likely more robust than the other options I highlighted, but I've not used them before so I don't know for sure.

I'm intending to investigate the two Django apps mentioned above to see how practical it would be to use these to provide the sort of clean-up functionality you suggested. If they're not looking like the right sort of thing then Celery will be the way to go, I think.

@larsschellhas

Instead of using a scheduled cleanup, you could trigger an async cleanup every time a new file is uploaded into the temporary storage. A setting FILEPOND_TEMP_LIFETIME could be added as a datetime.timedelta value, and temporary uploads would be cleaned up once they have existed for longer than this lifetime.

This would require no additional dependencies and would only run when the storage is actually used. While it would probably run more often on high-load applications than a periodic approach would, each run would also be less resource-intensive, since temporary uploads wouldn't stockpile for long.
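To make the idea concrete, here is a minimal sketch of a lifetime check that runs whenever a new upload arrives. It is not django-drf-filepond code: `TempUpload`, `find_expired`, `handle_new_upload`, and the dict-based `store` are hypothetical stand-ins, the 24-hour value is an arbitrary example, and the cleanup runs synchronously here rather than asynchronously as suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Setting name taken from the comment above; the value is an arbitrary example.
FILEPOND_TEMP_LIFETIME = timedelta(hours=24)


@dataclass
class TempUpload:
    """Hypothetical stand-in for a temporary upload record."""
    upload_id: str
    uploaded: datetime


def find_expired(uploads, now=None, lifetime=FILEPOND_TEMP_LIFETIME):
    """Return the uploads that have existed for longer than `lifetime`."""
    now = now or datetime.now(timezone.utc)
    return [u for u in uploads if now - u.uploaded > lifetime]


def handle_new_upload(store, upload, now=None):
    """Register `upload` in `store` (a dict), then drop any expired uploads.

    Returns the IDs of the uploads that were removed.
    """
    now = now or datetime.now(timezone.utc)
    store[upload.upload_id] = upload
    expired = find_expired(store.values(), now=now)
    for u in expired:
        # A real implementation would also delete the file from disk here.
        del store[u.upload_id]
    return [u.upload_id for u in expired]
```

In a real integration the `store` would presumably be the database table of temporary uploads, and the removal step would be handed off to a background thread or task so the upload request is not delayed.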

@jcohen02
Collaborator

Thanks @larsschellhas, I think this would be a good way to address this longstanding issue without having to introduce additional dependencies or set up cron jobs external to the application.

I guess a further extension to this, to help manage resources on high-load applications, might be to simply store a timestamp when the last cleanup was done and then have something like a TEMPFILE_CLEANUP_MAX_FREQUENCY value that prevents the cleanup task from running again until some time period has passed.
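A rough sketch of such a throttle, under stated assumptions: `CleanupThrottle` is a hypothetical helper, `TEMPFILE_CLEANUP_MAX_FREQUENCY` is interpreted as the minimum interval between two cleanup runs, and the 10-minute value is only an example. The timestamp is kept in memory here; a multi-process deployment would need to store it somewhere shared, such as the database or a cache.

```python
from datetime import datetime, timedelta, timezone

# Name taken from the comment above, read here as the minimum
# interval between two cleanup runs. The value is an example.
TEMPFILE_CLEANUP_MAX_FREQUENCY = timedelta(minutes=10)


class CleanupThrottle:
    """Hypothetical helper that remembers when the last cleanup ran."""

    def __init__(self, min_interval=TEMPFILE_CLEANUP_MAX_FREQUENCY):
        self.min_interval = min_interval
        self.last_run = None  # timestamp of the last cleanup, if any

    def should_run(self, now=None):
        """Return True (and record `now`) if enough time has passed."""
        now = now or datetime.now(timezone.utc)
        if self.last_run is not None and now - self.last_run < self.min_interval:
            return False
        self.last_run = now
        return True
```

The upload handler would call `should_run()` first and skip the cleanup entirely when it returns `False`, so a burst of uploads triggers at most one cleanup per interval.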

@larsschellhas

Agreed, that might be a helpful solution for high-load scenarios, too. However, I'm wondering whether (especially) in high-load scenarios, there is a trade-off between the larger peak loads of lower frequencies vs. smaller peak loads of higher frequencies, @jcohen02

@jcohen02 jcohen02 added this to the v0.6.0 milestone Dec 5, 2022
@DeD1rk

DeD1rk commented Apr 23, 2023

While integrating with Celery etc. would be cool, I think just providing a management command (and a Python function) would be flexible enough for a start.
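As an illustration of the "function plus command" split, here is a minimal, framework-free sketch. It assumes temporary uploads are plain files in a single flat directory and ignores the database records, which may not match how django-drf-filepond actually stores them; `cleanup_temp_uploads`, `main`, and the `--max-age` option are hypothetical names. In a Django project the same function could be called from the `handle()` method of a `BaseCommand` subclass instead of from `argparse`.

```python
import argparse
import os
import time


def cleanup_temp_uploads(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    # Snapshot the directory first so we do not delete while iterating.
    with os.scandir(directory) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


def main(argv=None):
    """Command-line wrapper around cleanup_temp_uploads()."""
    parser = argparse.ArgumentParser(description="Remove stale temporary uploads")
    parser.add_argument("directory")
    parser.add_argument("--max-age", type=int, default=86400,
                        help="maximum file age in seconds")
    args = parser.parse_args(argv)
    removed = cleanup_temp_uploads(args.directory, args.max_age)
    print(f"Removed {len(removed)} stale file(s)")
    return removed
```

Keeping the logic in a plain function means it can be triggered from cron via the command, from a Celery task, or from the upload handler itself, without the library depending on any particular scheduler.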


4 participants