Seems cli does not verify the checksum when downloading an object #6710

Open
1 of 3 tasks
buptwxd2 opened this issue Feb 11, 2022 · 13 comments
Assignees
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue s3

Comments

@buptwxd2

buptwxd2 commented Feb 11, 2022

Confirm by changing [ ] to [x] below:

Issue is about usage on:

  • Service API : I want to do X using Y service, what should I do?
  • CLI : passing arguments or cli configurations.
  • Other/Not sure.

Platform/OS/Hardware/Device
What are you running the cli on?

Describe the question
According to the "AWS S3 CLI FAQ" (https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html#cli-aws-help-s3-faq), the AWS CLI will try to verify the checksum of downloads when possible.
However, when I corrupted an object and downloaded it with the AWS CLI, the download succeeded without detecting the mismatch.

Steps

  1. Prepare a local 4 MB file filled with zeros

  2. Upload the file to S3 using the AWS CLI
    aws s3api put-object --bucket xd-bk-1 --key test --body 4M_zero
    root@vm102 ~/x/test# aws s3api head-object --bucket xd-bk-1 --key test
    {
    "AcceptRanges": "bytes",
    "LastModified": "2022-02-11T07:14:36+00:00",
    "ContentLength": 4194304,
    "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
    "ContentType": "binary/octet-stream",
    "Metadata": {},
    "StorageClass": "STANDARD"
    }

  3. Corrupt the object

  4. Download the object
    root@vm102 ~/x/test# aws s3api get-object --bucket xd-bk-1 --key test d_test
    {
    "AcceptRanges": "bytes",
    "LastModified": "2022-02-11T07:14:36+00:00",
    "ContentLength": 4194304,
    "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
    "ContentType": "binary/octet-stream",
    "Metadata": {},
    "StorageClass": "STANDARD"
    }
    root@vm102 ~/x/test# md5sum d_test
    6c8b11cda139dbb04a83190975220d98 d_test

As a comparison, the s3cmd tool detected the mismatch, as shown below:
root@vm102 ~/x/test [64]# s3cmd get s3://xd-bk-1/test dtest
download: 's3://xd-bk-1/test' -> 'dtest' [1 of 1]
4194304 of 4194304 100% in 0s 168.38 MB/s done
WARNING: MD5 signatures do not match: computed=6c8b11cda139dbb04a83190975220d98, received=b5cfa9d6c8febd618f91ac2843d50a1c
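
The manual check that s3cmd performs here can be approximated after any download. The following is a minimal sketch, not the AWS CLI's or s3cmd's actual implementation. It assumes a single-part, non-KMS-encrypted upload, where the ETag is the hex MD5 of the object body wrapped in double quotes. The function names `file_md5` and `matches_etag` are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_etag(path, etag):
    """Compare a file's MD5 with an ETag as returned by head-object.

    Assumption: the ETag is a plain MD5 (single-part upload). The
    surrounding double quotes from the API response are stripped first.
    """
    expected = etag.strip('"').lower()
    return file_md5(path) == expected
```

With the values from the steps above, `matches_etag("d_test", '"b5cfa9d6c8febd618f91ac2843d50a1c"')` would be expected to return `False` for the corrupted download, since md5sum reported a different digest for d_test.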

Logs/output
Get full traceback and error logs by adding --debug to the command.

@buptwxd2 buptwxd2 added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Feb 11, 2022
@kdaily kdaily added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Feb 14, 2022
@kdaily kdaily self-assigned this Feb 14, 2022
@kdaily
Member

kdaily commented Feb 14, 2022

Hi @buptwxd2,

Thanks for your post. Can you provide more details as to how you corrupted the object in step 3?

@kdaily kdaily added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Feb 14, 2022
@buptwxd2
Author

Hi @kdaily, I am using the open-source Ceph project for testing, which is compatible with AWS S3.
So I could overwrite the backend data internally and thereby corrupt the object.

Here I want to double-check whether the AWS CLI checks data integrity as claimed in the FAQ.
As a comparison, the s3cmd tool detected the ETag mismatch.

It would be great if the AWS CLI could support this behavior.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 15, 2022
@buptwxd2
Author

Hi @kdaily, any update on this thread?

Thanks

@kdaily
Member

kdaily commented Feb 18, 2022

@buptwxd2,

The documentation you referred to is for the high-level aws s3 commands, not for the low-level aws s3api commands. The aws s3api commands you are using map directly to the AWS S3 API, and no MD5-based check of the content is performed for a download. For uploads using PutObject via aws s3api put-object, an MD5 check is performed, as noted in the documentation.

If you are using aws s3 cp or aws s3 sync to transfer from S3 to local file storage, then an MD5 check is performed except in the cases outlined in the documentation. These operations are the closest comparison to s3cmd. For example, if the object was uploaded via multipart upload, there is no MD5 for the entire object; MD5s are only checked on each part. If you want to be able to check the MD5 of the entire object, you would need to set it in the object metadata yourself.
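
To illustrate the multipart caveat, here is a rough sketch of how a client might tell the two kinds of ETag apart and prepare a whole-object MD5 to store as user metadata. It rests on two assumptions: multipart ETags follow the commonly observed form of 32 hex characters followed by `-<part count>`, and the metadata key `md5` is a hypothetical name, not an S3 convention. The helper names `part_count` and `md5_metadata` are also hypothetical.

```python
import hashlib
import re

# Assumption: multipart ETags look like "<32 hex chars>-<part count>".
_MULTIPART_ETAG = re.compile(r"^[0-9a-f]{32}-(\d+)$")


def part_count(etag):
    """Return the part count if the ETag looks multipart, else None."""
    match = _MULTIPART_ETAG.match(etag.strip('"').lower())
    return int(match.group(1)) if match else None


def md5_metadata(path, key="md5", chunk_size=1024 * 1024):
    """Build a user-metadata dict holding the whole-file hex MD5.

    The key name "md5" is a hypothetical choice made for this sketch.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Stream the file so large objects do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {key: digest.hexdigest()}
```

A dict like this could be supplied at upload time, for example through the `--metadata` option of aws s3 cp or aws s3api put-object, and compared against a locally computed digest after download.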

I hope this answers your questions!

@kdaily kdaily closed this as completed Feb 18, 2022
@kdaily kdaily added the s3api label Feb 18, 2022
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

@buptwxd2
Author

Thanks @kdaily.
So, based on your response, the high-level "aws s3 cp" command should check the MD5?
I tried the "aws s3 cp" command, but still no mismatch was detected.

@kdaily kdaily reopened this Feb 21, 2022
@kdaily
Member

kdaily commented Feb 23, 2022

Hi @buptwxd2,

Can you please provide debug logs (add --debug to your command) showing what happens in this case? Please redact any sensitive information. Thanks!

Edit: I'm also reviewing the documentation to confirm it's valid.

@kdaily kdaily added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. s3 and removed s3api labels Feb 23, 2022
@buptwxd2
Author

Hi @kdaily

Please see the attached file for the detailed logs.
I corrupted the object 4M and used "aws s3 cp" to download the object.

root@sds5 ~/x/test# aws s3api head-object --bucket xd-bk-2 --key 4M
{
"AcceptRanges": "bytes",
"LastModified": "2022-02-24T10:02:43+00:00",
"ContentLength": 4194304,
"ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
"ContentType": "binary/octet-stream",
"Metadata": {},
"StorageClass": "STANDARD"
}
root@sds5 ~/x/test# aws s3 cp s3://xd-bk-2/4M d_4M
download: s3://xd-bk-2/4M to ./d_4M
root@sds5 ~/x/test# md5sum d_4M
4ad9688f6ce9fe176dcfecf94f96e635 d_4M
log.txt

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 24, 2022
@buptwxd2
Author

buptwxd2 commented Mar 1, 2022

Hi @kdaily, any update on this thread?

Thanks.

@kdaily kdaily added the needs-review This issue or pull request needs review from a core team member. label Mar 5, 2022
@kdaily
Member

kdaily commented Mar 5, 2022

Hi @buptwxd2,

Still looking into this. Thanks for your patience.

@kdaily kdaily added feature-request A feature should be added or improved. and removed guidance Question that needs advice or information. needs-review This issue or pull request needs review from a core team member. labels Mar 8, 2022
@kdaily
Member

kdaily commented Mar 8, 2022

Hi @buptwxd2,

Thanks for your patience. It seems that this functionality was not migrated when the AWS CLI started using the s3transfer implementation. I'm going to update the docs so that they are current, and we will explore what next steps there would be.

@buptwxd2
Author

Hi @kdaily,

Do we have a conclusion on this issue?
Will checksum verification on download be supported?

Thanks a lot.

@tim-finnigan tim-finnigan added the p2 This is a standard priority issue label Nov 10, 2022
@shaiksuhel1999

shaiksuhel1999 commented Dec 13, 2023

Yes, the AWS CLI doesn't do checksum validation on download.

I verified this with both the high-level and low-level APIs of the AWS CLI with debug mode enabled, and I could not see checksum validation being performed anywhere.

High Level Api : aws s3 cp s3://bucket/object-key /path/to/file
Note: s3 is high level api of aws-cli

Low Level Api : aws s3api get-object --bucket bucket-name --key object-key /path/to/file
Note: s3api is low level api of aws-cli


4 participants