I encountered the following error while using django-storages' S3 backend:
botocore.exceptions.ClientError: An error occurred (BadDigest) when calling the PutObject operation (reached max retries: 1): The Content-MD5 you specified did not match what we received.
I tracked it down to the upload of text files containing non-ASCII characters to S3.
After a bit of spelunking in django-storages and boto3, I identified storages.utils.ReadBytesWrapper as the culprit.
The problem is that, while it encodes text content as expected in its read method, it does not handle seek and tell correctly. It delegates those calls to the underlying text file handle, which reports positions in characters rather than bytes, so the results are inconsistent with what the read method returns:
> from django.core.files.base import ContentFile
> from storages.utils import ReadBytesWrapper
> file = ReadBytesWrapper(ContentFile("é"))
> len(file.read())
2
> # Seek to the "end" of the file. This should return 2, since the binary data has a
> # length of 2, but it returns 1 because the text data has a length of 1.
> file.seek(0, 2)
1
> # Similar results with tell
> file.tell()
1
boto3 uses seek and tell to determine the length of the content to upload, gets an incorrect value from them, and then uploads truncated content, which does not pass the MD5 checksum check that S3 (thankfully :)) performs.
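To make the mismatch concrete without needing Django or boto3, here is a minimal stdlib-only sketch. `NaiveBytesWrapper` and `measure_length` are hypothetical names I made up for illustration: the first imitates the behaviour described above (read encodes, seek/tell are delegated), and the second is only my assumption of roughly how a seek/tell-based length check works, not boto3's actual code.

```python
import io


class NaiveBytesWrapper:
    """Hypothetical stand-in: read() encodes, seek()/tell() are delegated."""

    def __init__(self, text_file, encoding="utf-8"):
        self._file = text_file
        self._encoding = encoding

    def read(self, *args, **kwargs):
        # Text is encoded on the way out, so callers receive bytes.
        return self._file.read(*args, **kwargs).encode(self._encoding)

    def seek(self, *args, **kwargs):
        # Delegated as-is: positions are those of the text stream.
        return self._file.seek(*args, **kwargs)

    def tell(self):
        return self._file.tell()


def measure_length(fileobj):
    """Assumed seek/tell-based length check (illustrative only)."""
    start = fileobj.tell()
    fileobj.seek(0, 2)  # jump to the end of the stream
    end = fileobj.tell()
    fileobj.seek(start)  # restore the original position
    return end - start


wrapped = NaiveBytesWrapper(io.StringIO("é"))
reported = measure_length(wrapped)  # length according to seek/tell
actual = len(wrapped.read())        # number of bytes read() really yields
print(reported, actual)
```

The reported length is one short of the two bytes that read produces, which is the kind of discrepancy that would lead to a truncated upload.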
The fix in our codebase is very simple: we encode the text data ourselves instead of delegating that to django-storages. It would obviously be better if this were fixed upstream.
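For anyone hitting the same thing, the workaround boils down to handing over bytes rather than text. The sketch below uses only the standard library so it runs anywhere; `to_binary_file` is a hypothetical helper name. In a Django project I would expect the equivalent to be passing something like `ContentFile("é".encode("utf-8"))` to the storage, but treat that as an assumption about your setup.

```python
import io


def to_binary_file(text, encoding="utf-8"):
    """Encode text up front so the file object is binary end to end."""
    return io.BytesIO(text.encode(encoding))


binary_file = to_binary_file("é")

# seek/tell now operate on bytes, so they agree with what read() returns.
size = binary_file.seek(0, 2)
binary_file.seek(0)
payload = binary_file.read()
print(size, len(payload))
```

Because the stream is binary from the start, the length measured via seek matches the number of bytes actually read.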