Avoiding duplicate uploads #696

Open
Pingou opened this issue Aug 21, 2024 · 6 comments

Comments

@Pingou

Pingou commented Aug 21, 2024

I modified generate_location to handle duplicate uploads, so that only one copy of a file is ever stored even when different users upload the same file.
It works, but there is a problem: I return the path of the existing file from generate_location, and Shrine then copies the newly uploaded file to that path, replacing the stored file with an identical one. Usually that would not matter, but if the write is interrupted, for example by a server restart, the file ends up broken for everyone.
Is there a better way to handle duplicates?
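
To make the goal concrete, here is a minimal sketch in plain Ruby of "store once, never rewrite". It does not use Shrine's API: `DedupStore` and its `store` method are hypothetical names, and the sketch assumes the uploaded IO is rewindable and that storage is a local directory. The content is hashed first, the write is skipped entirely when a file with that digest already exists, and new content is written to a temporary file and then renamed into place, so an interrupted write never leaves a half-written file at the final path.

```ruby
require "digest"
require "fileutils"

# Hypothetical helper, not part of Shrine: keeps one file per distinct content,
# named after the SHA-256 digest of that content.
class DedupStore
  def initialize(root)
    @root = root
    FileUtils.mkdir_p(root)
  end

  # Returns the location of the content. Writes to disk only when no file
  # with the same digest is stored yet.
  def store(io, extension: "")
    location = "#{checksum(io)}#{extension}"
    final    = File.join(@root, location)
    return location if File.exist?(final) # duplicate: stored file is untouched

    # New content: write to a temporary name, then rename into place.
    tmp = "#{final}.#{Process.pid}.tmp"
    File.open(tmp, "wb") { |file| IO.copy_stream(io, file) }
    File.rename(tmp, final)
    location
  end

  private

  # Hashes the IO in chunks and rewinds it so it can be read again.
  def checksum(io)
    digest = Digest::SHA256.new
    while (chunk = io.read(16 * 1024))
      digest << chunk
    end
    io.rewind
    digest.hexdigest
  end
end
```

Whether and how the same "skip if already present" check can be hooked into a Shrine uploader or storage is not settled in this thread; the sketch only illustrates the behaviour being asked for.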

@benkoshy
Contributor

benkoshy commented Aug 22, 2024

This is how I might handle the problem. It is untested and just my two cents, so I will let the community correct me or suggest better ideas.

Handle via a background job

Make the job "idempotent" (strictly speaking it is not idempotent), or at least "repeatable" without ill effects.

  1. Find two files that are the same, e.g. both are stored in the "avatar" field of the User model.
  2. Update the avatar of one User to point to the avatar of the other User. If the operation fails (for whatever reason), simply redo it.
  3. Then delete the unneeded file.

The key is to make the job repeatable without any unintended effects. Search for 'idempotent background jobs' in your search engine of choice for more info.
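
The three steps above could be sketched roughly as follows. This is untested against Shrine and uses plain Ruby only: `Record`, `merge_duplicate`, and the explicit `redundant_path` argument are hypothetical names introduced for illustration, and in a real app the record update would be a database write. Passing the redundant path as a job argument is what makes a retry safe, since the job can still clean up after a crash between steps 2 and 3.

```ruby
require "fileutils"

# Hypothetical stand-in for a model with an attached file.
Record = Struct.new(:id, :file_path)

# Points `duplicate` at the file used by `canonical`, then deletes the
# redundant copy. Each step tolerates having been done already, so the
# whole job can simply be run again after a failure.
def merge_duplicate(canonical, duplicate, redundant_path)
  # Never delete the file that is being kept.
  return if redundant_path == canonical.file_path

  # On a first run, refuse to merge files whose contents differ.
  if File.exist?(redundant_path) &&
     !FileUtils.compare_file(canonical.file_path, redundant_path)
    raise ArgumentError, "files differ, refusing to merge"
  end

  duplicate.file_path = canonical.file_path # step 2: repoint the record
  FileUtils.rm_f(redundant_path)            # step 3: no error if already gone
end
```

Running the job twice with the same arguments leaves the same end state as running it once: both records point at one file and the redundant copy is gone.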

@Pingou
Author

Pingou commented Aug 22, 2024

Thank you @benkoshy, that seems like a good idea but my issue is that the client will have stored the wrong url. I guess I could delete the file and replace it with a symlink to the other file, but that doesn't sound optimal.
Also, performance-wise, I'd love to avoid writing another file to disk.
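
For reference, the symlink idea could look something like the sketch below. It is plain Ruby, not Shrine API, and `replace_with_symlink` is a hypothetical helper. It assumes local filesystem storage on a platform with symbolic links, and that whatever serves the files follows symlinks. The link is created under a temporary name and renamed over the duplicate, so the old path is never missing or half-written.

```ruby
require "fileutils"

# Replaces the file at `duplicate_path` with a symbolic link to
# `canonical_path`, so the old location keeps resolving to the same content.
def replace_with_symlink(canonical_path, duplicate_path)
  return if File.symlink?(duplicate_path) # already replaced on an earlier run

  unless FileUtils.compare_file(canonical_path, duplicate_path)
    raise ArgumentError, "files differ, refusing to link"
  end

  tmp_link = "#{duplicate_path}.#{Process.pid}.lnk"
  FileUtils.rm_f(tmp_link) # clear a leftover from an interrupted run
  File.symlink(File.expand_path(canonical_path), tmp_link)
  File.rename(tmp_link, duplicate_path) # swap the link in over the duplicate
end
```

This still requires the duplicate to have been written once, so it saves space but not the extra write.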

@benkoshy
Contributor

Same issue: #695

The alternative is to search for the file and return the existing one if it is already stored.

> that seems like a good idea but my issue is that the client will have stored the wrong url.

Where and how is the client saving the URL? Please show your code. Also, it might be beneficial to close the issue and reopen it in the Discussions section, to spare the maintainer the noise.

@Pingou
Author

Pingou commented Aug 28, 2024

Sorry for the delay, thank you for the reply @benkoshy. I am already returning the file (or rather the path to the existing file), my problem is that it is overwriting it again. I cannot show the client part, but I do not think it changes anything. I can close the issue, but I feel like this is something that could be useful to some other people, if the functionality is missing.

@benkoshy
Contributor

benkoshy commented Aug 28, 2024

> Sorry for the delay, thank you for the reply @benkoshy. I am already returning the file (or rather the path to the existing file), my problem is that it is overwriting it again. I cannot show the client part, but I do not think it changes anything. I can close the issue, but I feel like this is something that could be useful to some other people, if the functionality is missing.

Could you demo it with pseudocode if you cannot show the actual code?

It may help conceptualize the problem and allow someone to suggest solutions.

@Pingou
Author

Pingou commented Aug 28, 2024

I cannot change the client code; I need a server-side solution. The client just uploads files, some of which are already stored by other users, and I want to save space on the server. Once the client has received the response from the server, the file location is stored client-side and cannot be updated, and it may be needed in the future.
