Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(server): generate all thumbnails for an asset in one job #13012

Merged
merged 6 commits into from
Sep 28, 2024

Conversation

mertalev
Copy link
Contributor

Description

This PR changes image thumbnail generation to first decode and resize the input image to a preview-sized raw buffer, then reuse that buffer as the input for the preview, thumbnail and thumbhash. This has several advantages:

  • As reading from disk and decoding the image are the slowest steps of thumbnail generation, this makes generating thumbnails significantly faster
  • It is more cache-efficient due to handling everything to do with a file at once instead of spreading this out into jobs that run far apart from one another
  • It is easier to understand thumbnail generation progress when there is a 1:1 relationship between jobs and assets. It's currently a common source of confusion that the number of jobs keeps growing upward, and it's difficult to understand how much progress has been made
  • Other steps during thumbnail generation only need to run once, such as extracting embedded images or running FFprobe

A concern I had was around RAM usage. Remarkably, this approach uses less RAM than the current one despite the use of an in-memory buffer. Peak memory usage dropped from 2.47GiB to 1.22GiB during testing, and the average is also lower compared to preview generation.

Video thumbnails are not optimized in this PR to keep it smaller and more focused. A later PR will use a similar approach through FFmpeg. In the meantime, I refactored the code such that it will be easy to make this change. I intentionally made it generate video thumbnails sequentially instead of using Promise.all to avoid excessive resource usage.

Testing

I ran thumbnail generation on all assets and confirmed that jobs were processed successfully. Setting different image settings and enabling the "extract embedded preview" setting all functioned as expected. I additionally ran an image-only comparison on all assets:

thumbnail-generation-memory-usage

In order from left to right:

  • Thumbnail generation start on main
  • Thumbnail generation previews completed on main (starting thumbnails)
  • Thumbnail generation finished on main
  • Thumbnail generation started in PR
  • Thumbnail generation finished in PR

On main, generating the previews took 43 minutes and completing all jobs took 76 minutes , while this PR completed all thumbnail generation tasks in 44 minutes. For this particular library with default settings, this is a ~73% speedup for images. The gap will widen with more RAW or high megapixel images and when the "extract embedded preview" setting is enabled.

cleanup

add success logs, rename method

do thumbhash too

fixes

fix tests

handle `notify`

wip refactor

refactor
@mertalev mertalev force-pushed the perf/server-preview-thumbnail-together branch from a233104 to ad1b572 Compare September 28, 2024 07:28
Copy link
Contributor

@jrasm91 jrasm91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

server/src/repositories/media.repository.ts Outdated Show resolved Hide resolved
server/src/services/media.service.spec.ts Outdated Show resolved Hide resolved
await this.assetRepository.upsertFile({ assetId: asset.id, type: AssetFileType.THUMBNAIL, path: thumbnailPath });
await this.assetRepository.update({ id: asset.id, updatedAt: new Date() });
await this.assetRepository.upsertJobStatus({ assetId: asset.id, thumbnailAt: new Date() });
await this.assetRepository.upsertJobStatus({ assetId: asset.id, previewAt: new Date(), thumbnailAt: new Date() });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future should we combine these into one column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. There's not much point to having both when they get set at the same time.

@mertalev mertalev enabled auto-merge (squash) September 28, 2024 17:34
@mertalev mertalev merged commit 2bcd27e into main Sep 28, 2024
35 checks passed
@mertalev mertalev deleted the perf/server-preview-thumbnail-together branch September 28, 2024 17:47
Copy link
Member

@martabal martabal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, that's great. Thanks !

@aviv926
Copy link
Contributor

aviv926 commented Sep 28, 2024

It's great. I wonder how many assets you ran the test on?

@mertalev
Copy link
Contributor Author

It's great. I wonder how many assets you ran the test on?

I ran it on 3,500 images.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants