
Conversation

FraserThompson (Contributor) commented Aug 28, 2023

Changes:

Fixes #459 and fixes #59

  • Instead of computing the MD5 hash of each local file and comparing it to the ETag from S3 to decide whether the file has changed and needs re-uploading, just compare the file size and modification time (see the sketch after this list).
  • Skipped files are logged to the console, so it's easier to see that the deploy is making progress on large sites.
  • The number of uploaded files is counted and printed at the end, so it's easier to tell whether any changes were actually uploaded.
  • Minor refactoring on the upload loop for consistency and readability.
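
For reference, here's roughly what the new check looks like. This is a minimal sketch rather than the exact code in the PR: `needsUpload` and `RemoteObjectMeta` are hypothetical names, and the remote metadata is assumed to come from something like a `ListObjectsV2` response.

```typescript
import { statSync } from "fs";

// Hypothetical shape of the metadata we already have for each object in S3
// (size and last-modified time are both returned by ListObjectsV2).
interface RemoteObjectMeta {
  size: number;       // Content-Length of the object in S3
  lastModified: Date; // LastModified timestamp of the object in S3
}

// Upload if there's no remote object, the sizes differ, or the local file
// was modified after the remote object was last uploaded.
function needsUpload(localPath: string, remote?: RemoteObjectMeta): boolean {
  if (!remote) return true;

  const stats = statSync(localPath);
  const sameSize = stats.size === remote.size;
  const notModifiedSince = stats.mtime.getTime() <= remote.lastModified.getTime();

  return !(sameSize && notModifiedSince);
}
```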

Explanation:

Comparing ETags to the MD5 hash of each local file is, imo, an inferior method for these reasons:

  1. MD5-hashing files (especially big ones) takes a while, so if your site has a lot of large files, or you're on a particularly weak computer, deploys will be slow.
  2. ETags for files uploaded to S3 via multipart upload (i.e. any file over 16MB by default) are NOT simply the MD5 hash of the file, which means files over 16MB will always be re-uploaded (Large files will always be re-uploaded #59). It's possible to calculate what the ETag will be for a multipart upload, but it's a bit fiddly (see the sketch after this list).
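
To illustrate why it's fiddly: a multipart ETag is the MD5 of the concatenated per-part MD5 digests, with the part count appended after a dash, so you'd also need to know the exact part size the uploader used. A rough sketch (the `multipartEtag` helper is hypothetical and reads the whole file into memory for brevity):

```typescript
import { createHash } from "crypto";
import { readFileSync } from "fs";

// Compute the ETag S3 would report for a multipart upload: MD5 each part,
// MD5 the concatenation of those raw digests, then append "-<part count>".
// The part size must match whatever the uploader used, which is exactly
// what makes this approach fragile.
function multipartEtag(path: string, partSizeBytes: number): string {
  const data = readFileSync(path);
  const partDigests: Buffer[] = [];

  for (let offset = 0; offset < data.length; offset += partSizeBytes) {
    const part = data.subarray(offset, offset + partSizeBytes);
    partDigests.push(createHash("md5").update(part).digest());
  }

  const combined = createHash("md5").update(Buffer.concat(partDigests)).digest("hex");
  return `${combined}-${partDigests.length}`;
}
```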

Instead, comparing the size and the modification time is robust enough that I think it's fine as the default method. It's the same check the sync command in the AWS CLI uses, so if it's good enough there I reckon it's good enough here.

