Conversation

@lpulley
Contributor

@lpulley lpulley commented May 10, 2025

It doesn't seem quite right to me that resize_image's output filename hash reads the path of the input file; shouldn't it read the contents instead? That way:

  • if an input file is modified in place, Zola will not reuse the now-stale processed file
  • if an input file is moved but retains the original filename, Zola will be able to reuse the existing processed file
    • (This one raises the question: should the original filename be included in the processed output filename at all? Should the processed output filename just be a hash digest and an extension? That would minimize the impact of renames and maybe even allow multiple identical input files to reuse a single processed output file.)

(Also, GetImageMetadata::call can just use the full path to the file as its cache key, so the return signature of search_for_file can be simplified.)

Since #2862/#2872 have changed the hash behavior on next, I figure this is maybe a good time to consider this, too, as it also affects the imageproc hash behavior.
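
The contents-based naming proposed above can be sketched roughly as follows. This is only an illustration, not Zola's actual code: `processed_filename`, `op_params`, and the "digest plus extension" layout are hypothetical names and choices made for the example. It uses `DefaultHasher` because the diff under review does; note that the standard library does not guarantee that hasher's output is stable across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical helper: derives the processed file's name from the input's
/// *contents* and a description of the operation, never from the input path.
pub fn processed_filename(input_contents: &[u8], op_params: &str, extension: &str) -> String {
    let mut hasher = DefaultHasher::new();
    // Length prefix so the contents/params boundary cannot be confused.
    hasher.write_u64(input_contents.len() as u64);
    // Hash the bytes of the input file instead of its path.
    hasher.write(input_contents);
    // The same input resized differently must get a different name.
    hasher.write(op_params.as_bytes());
    // Name is just a fixed-width hex digest and an extension.
    format!("{:016x}.{}", hasher.finish(), extension)
}
```

With a scheme like this, renaming or moving an input leaves the output name unchanged, while editing the input in place produces a new name, which are the two properties listed in the bullets above.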

@lpulley
Contributor Author

lpulley commented Jun 6, 2025

@Keats do I have a good idea here? Bumping because of:

Since #2862/#2872 have changed the hash behavior on next, I figure this is maybe a good time to consider this, too, as it also affects the imageproc hash behavior.

@lpulley
Contributor Author

lpulley commented Jun 8, 2025

Hm... this is vulnerable to the input contents changing between enqueueing and image processing. I suppose the queue would have to hold the input contents instead of the input path?
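
One way to picture a queue that holds contents rather than paths is sketched below. All names here (`QueuedOp`, `enqueue`, `is_consistent`) are hypothetical and not taken from Zola. The point is only that the bytes hashed at enqueue time are the same bytes later handed to processing, so a file changing on disk in between cannot make the two disagree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::sync::Arc;

/// Hypothetical queue entry: owns a shared handle to the input bytes
/// instead of a path that would have to be re-read later.
pub struct QueuedOp {
    pub input: Arc<[u8]>,
    pub params: String,
    pub content_hash: u64,
}

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Hashes the bytes once, at enqueue time, and stores them with the operation.
pub fn enqueue(queue: &mut Vec<QueuedOp>, input: Arc<[u8]>, params: &str) {
    let content_hash = hash_bytes(&input);
    queue.push(QueuedOp { input, params: params.to_string(), content_hash });
}

/// At processing time the stored bytes still match the stored hash,
/// regardless of what has happened to the file on disk since.
pub fn is_consistent(op: &QueuedOp) -> bool {
    hash_bytes(&op.input) == op.content_hash
}
```

The trade-off is memory: every enqueued input stays resident until its operations have run.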

@Keats
Collaborator

Keats commented Jun 9, 2025

I think that makes sense

Collaborator

@Keats Keats left a comment

I like the idea, but the performance changes seem to be a net negative, even when you take into account regenerating the image if we change the filename.

) -> Result<String> {
    let mut hasher = DefaultHasher::new();
    hasher.write(input_src.as_ref());
    hasher.write(
Collaborator

Does that change perf significantly? I can imagine that reading a whole site's worth of images is going to be much more expensive than hashing filenames.

Collaborator

Also, it doesn't make sense to read the file multiple times: once for the filename and once to actually operate on it.

Contributor Author

@lpulley lpulley Jun 9, 2025

Yeah, I've been thinking about how to approach this. We'd essentially want to read each input once (at most), and for each operation to hold a reference or key to the in-memory contents of the input for processing.

I'll push something for that if I figure out a good solution.
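
A minimal sketch of the "read each input at most once" idea, assuming a simple per-build cache keyed by path. `InputCache` and its `get` method are hypothetical names for illustration, not something from the PR. Each operation would keep the returned `Arc<[u8]>` and use it both for hashing and for processing.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical cache: each input path is read from disk at most once.
#[derive(Default)]
pub struct InputCache {
    loaded: HashMap<PathBuf, Arc<[u8]>>,
}

impl InputCache {
    /// Returns a shared handle to the file's bytes, reading the file only
    /// the first time a given path is requested.
    pub fn get(&mut self, path: &Path) -> io::Result<Arc<[u8]>> {
        if let Some(bytes) = self.loaded.get(path) {
            // Already read: hand out another reference to the same buffer.
            return Ok(Arc::clone(bytes));
        }
        let bytes: Arc<[u8]> = std::fs::read(path)?.into();
        self.loaded.insert(path.to_path_buf(), Arc::clone(&bytes));
        Ok(bytes)
    }
}
```

Cloning the `Arc` only bumps a reference count, so several operations on one image share a single buffer. This does not remove the cost raised in the review of reading every image at least once just to compute its name.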

@lpulley lpulley force-pushed the imageproc-hash-files-not-paths branch from 3801386 to 35110a5 Compare July 8, 2025 02:20
@lpulley lpulley force-pushed the imageproc-hash-files-not-paths branch from 35110a5 to a7b7acb Compare July 26, 2025 17:45