Calculate hashes of images and include it in image_details field to improve image caching #5238
Comments
Images are already served with all the necessary headers for caching:
That caches the image at that unique link. If there are duplicates of the same image, this would not work. If, for instance, the same image is uploaded again by another user (on the same instance or another instance), the client would not use the cached image but would make a new request for it, even though an image with the same hash is already in the image cache, because the duplicate has a different link.
Seems like pictrs could handle this case, maybe via redirects or something on duplicate hashes to the same image. cc @asonix
Well, I guess that would save storage and solve this problem when the duplicate image is on the same instance. What you suggested could probably be a separate feature request for saving storage, as it does not quite achieve what I meant in my feature request.
Pictrs already deduplicates images if they are identical, although this doesn't seem to be documented. This is only for storage; I believe the API serves the full binary data for each duplicate instead of a redirect. Anyway, improvements for this should be suggested to pictrs directly. https://git.asonix.dog/asonix/pict-rs https://matrix.to/#/%23pictrs:matrix.asonix.dog?via=matrix.asonix.dog
No no, that is not what my feature request is about. That is what dessalines suggested. This feature request is for the lemmy clients out there, so that image caching across different instances becomes possible. For example, I (from instance X) see a cat post in a lemmy community and decide to download it and post it in another community. The image I downloaded gets uploaded to my instance and the post gets posted. Now another user (from instance Y) comes along, downloads the cat picture from my post and posts it in another community. It gets uploaded to instance Y. An hour later, a lemmy user scrolls through their feed and sees two identical cat pictures in two different lemmy communities. Since no hash is delivered within the image_details table, the lemming's client fetches the first image and then proceeds to fetch the second, identical cat image from the other lemmy community, even though they are the same image, just hosted on different instances. If lemmy's backend included an image hash in the image_details table, the client could have fetched and cached the first cat picture, then loaded the second one from cache just by comparing the cached image's hash with the second post's image hash in the image_details table. With only the link of the image, there is no way to solve this. I also thought of maybe using the BlurHash from #5142? It could probably be used for caching instead of the traditional hashes.
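The client-side behaviour described in the comment above could look roughly like the following. This is only a sketch under stated assumptions: the `HashKeyedImageCache` class, the `fetch` callback, and the idea that `image_details` would carry a hex-encoded SHA-256 string are all hypothetical and not part of the current Lemmy API. It also re-hashes the downloaded bytes before trusting the advertised hash, since a client cannot assume a remote instance reports it honestly.

```python
import hashlib
from typing import Callable, Dict, Optional


class HashKeyedImageCache:
    """Hypothetical client cache keyed by content hash, falling back to the link."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # assumed callback: downloads a URL
        self._by_hash: Dict[str, bytes] = {}
        self._by_link: Dict[str, bytes] = {}

    def get(self, link: str, sha256_hex: Optional[str] = None) -> bytes:
        # 1. Same content already cached under another link: no request needed.
        if sha256_hex is not None and sha256_hex in self._by_hash:
            return self._by_hash[sha256_hex]
        # 2. Same link already cached (what link-based caching gives today).
        if link in self._by_link:
            return self._by_link[link]
        # 3. Cache miss: download, then index by link and, if verified, by hash.
        data = self._fetch(link)
        self._by_link[link] = data
        if sha256_hex is not None:
            actual = hashlib.sha256(data).hexdigest()
            # Only index by hash when the bytes really match the advertised hash.
            if actual == sha256_hex.lower():
                self._by_hash[actual] = data
        return data
```

With this shape, two posts whose images live at different links on instances X and Y but advertise the same hash trigger a single download. When no hash is supplied, the cache behaves like a plain link-keyed cache.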
So this would only help in the specific case where a user browses two different Lemmy instances from the same app and then views posts with identical images but different URLs. That's a very minor use case, and I don't think it's worth the effort to optimize for it.
Yep, I do know that happens though, with multiple communities that serve the same purpose and all that. But like I said at the end, I think the blurhash field, which seems likely to be added, can be used for this case.
Yep, blurhash will be added in the next pictrs and lemmy release. Image hosting in general badly needs a decentralized hosted option, ideally one based on torrents or IPFS, because the situation right now is horrible. The exact same image gets shared to tons of sites and platforms, each having to host their own copy, while sharing none of the bandwidth to serve them, and wasting tons of disk space. We're just exacerbating that problem with lemmy (although the new proxying image feature of pictrs helps). If I had a lot more time I'd work on something.
Requirements
Is your proposal related to a problem?
The image_details table in (for example) a getpost JSON response does not include the hash of the image. A hash could be used to cache images better.
I assume the link field in image_details could be used for image caching, but this would not catch duplicate images on the same instance or duplicates on other instances.
Describe the solution you'd like.
A SHA256 hash would be calculated and stored when an image is uploaded to an instance. This would then be returned in the image_details table.
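As an illustration of the proposal above, the hash could be computed in a streaming fashion while the upload is being read, so the whole image never has to sit in memory. This is a minimal sketch, not how Lemmy or pictrs implement uploads: the helper names and the `sha256` field in the returned dictionary are assumptions made for the example, and only `link` is taken from the issue text.

```python
import hashlib
from typing import BinaryIO, Dict


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a binary stream, read chunk by chunk."""
    digest = hashlib.sha256()
    # read() returns b"" at end of stream, which ends the loop.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def build_image_details(link: str, stream: BinaryIO) -> Dict[str, str]:
    """Hypothetical image_details payload with an added `sha256` field."""
    return {"link": link, "sha256": sha256_of_stream(stream)}
```

The digest depends only on the bytes, so two instances that store the same file unmodified would report the same value. If an instance re-encodes or strips metadata from the upload, the bytes and therefore the hash would differ.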
Describe alternatives you've considered.
None.
Additional context
No response