aws s3 sync downloading unchanged files. #7228
Comments
Hi @MrJoy thanks for reaching out. Have you tried using the |
I attempted it just now, and it did not change anything. It's still attempting to download ~everything. Note that after submitting this ticket, I updated to 2.7.27 -- so this test was done on 2.7.27 not 2.7.26. |
Thanks for the update. There is an older issue tracking problems with S3 sync here: #599. Some users have reported anomalies when certain files sync that should not, but I wouldn't expect the problem at the scale you're describing where it's happening with hundreds of files. I don't know if I'd be able to reproduce the issue as described but could try. If you can get the debug logs by adding |
@tim-finnigan I'm sorry, I was unclear in my last message: When I said "I attempted it just now", I meant "I attempted to use If you'd like, I can temporarily give you read-only credentials for this bucket and you can see if you are able to recreate the problem from the same source. My personal AWS bill is... not data I'm terribly worried about sharing. I'll get |
I've stripped tokens/signatures/key IDs from the file, but it's otherwise as produced from running: aws-vault exec mrjoy -- aws s3 sync --debug --size-only s3://mrjoy-billing-data/ ~/personal/Finance/AWS_Billing_Data/ |
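For readers following along: the AWS CLI documentation describes --size-only as making object size the only criterion for deciding whether to sync. A minimal sketch of that rule, as a simplified model and not the CLI's actual implementation, could look like the following. The function name needs_download and its arguments are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Simplified model of a size-only comparison (NOT the aws-cli source code).
# needs_download REMOTE_SIZE LOCAL_PATH
#   returns 0 (true)  if the local file is missing or its size differs
#   returns 1 (false) if the local file exists with the same size in bytes
needs_download() {
  local remote_size="$1" local_path="$2"
  # A file that does not exist locally always needs to be fetched.
  [ -f "$local_path" ] || return 0
  local local_size
  # wc -c is portable across macOS and Linux; the arithmetic expansion
  # strips the leading spaces that macOS wc prints.
  local_size=$(( $(wc -c < "$local_path") ))
  [ "$local_size" -ne "$remote_size" ]
}
```

Under this model, a file whose local size matches the size reported for the S3 object should be skipped, which is why identical files being re-downloaded with --size-only is surprising.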
Thanks @MrJoy for following up and sharing your logs. I couldn't identify any anomalies after scanning through the logs. I think attempting to recreate the issue is a good idea, but for that I recommend reaching out through AWS Support to open a private communication channel. I'd also recommend trying to use |
I went ahead and tried Going through AWS Support is not an option, as this is my personal account and I'm on the Basic plan. |
Using |
Hi again, thanks for your patience, I lost track of this issue. Per the s3 sync documentation
And
Do you have any updates on your end as far as what you've tried? I still can't reproduce the issue but invite others to share their insights here if they know what the problem could be. |
The totality of my script is, at present, this:
#!/bin/bash
IFS=$'\n\t'
set -euo pipefail
(
  cd ~/personal
  git add .
  git commit --all --allow-empty -m "AWS bill snapshot, pre-fetch..."
  aws-vault exec mrjoy -- aws s3 sync --size-only s3://mrjoy-billing-data/ ~/personal/Finance/AWS_Billing_Data/
  git add .
  git commit --all --allow-empty -m "AWS bill snapshot, post-fetch..."
)
(
  cd ~/mjbackup/aws
  git add .
  git commit --all --allow-empty -m "AWS log snapshot, pre-fetch..."
  aws-vault exec mrjoy -- aws s3 sync --size-only s3://mrjoy-logs/ ~/mjbackup/aws/access/
  aws-vault exec mrjoy -- aws s3 sync --size-only s3://mrjoy-api-logs/ ~/mjbackup/aws/api/
  git add .
  git commit --all --allow-empty -m "AWS log snapshot, post-fetch..."
)
echo 'Done.'
As of today, that first sync job has an issue and the other two do not. So the problem is clearly dependent upon the data in S3 and/or my local filesystem. In the case of the first sync job, it's notably that only the Currently, I'm using aws-cli version:
I'm doing a test real quick to have the first sync happen to a different folder, so I can see if it's something to do with the local FS side of things. Will post results momentarily. I'm happy to give you temporary access to that bucket so you can see if that's helpful in reproducing the issue. |
(Just to clarify: When I say no changes result, I mean I wind up with an empty commit despite aws-cli downloading 744.4MB of data.) |
Ok. Re-running (twice) against a clean sub-folder produces the same behavior of the data being re-synced. So it seems to be an issue on the S3 side, not something related to how the data (originally) got stored on disk locally. |
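A repeat test like this can also be measured without transferring any data, since aws s3 sync accepts a --dryrun flag that only prints the operations it would perform. The sketch below counts those planned downloads. It assumes each planned transfer is printed as a line beginning with "(dryrun) download:", and the helper name count_dryrun_downloads is hypothetical.

```shell
#!/bin/bash
# Count the objects that `aws s3 sync --dryrun` reports it would download.
# Reads the dry-run output on stdin.
# Assumption: each planned transfer is a line starting with "(dryrun) download:".
count_dryrun_downloads() {
  # grep -c prints the match count; it exits non-zero on zero matches,
  # so "|| true" keeps the function safe under `set -e`.
  grep -c '^(dryrun) download:' || true
}

# Possible usage with the command from this thread (nothing is transferred):
#   aws-vault exec mrjoy -- aws s3 sync --dryrun --size-only \
#     s3://mrjoy-billing-data/ ~/personal/Finance/AWS_Billing_Data/ \
#     | count_dryrun_downloads
```

Running it twice in a row, once after a real sync and once immediately afterwards, would show whether the count stays near the number of re-downloaded files or drops to only the current billing period.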
An example of the details of one object that's getting re-synced. |
@tim-finnigan Would it be helpful if I gave you access to the relevant S3 bucket? |
Describe the bug
I have a maintenance script I run to keep a local copy of billing & usage data for my personal AWS account. It's identifying almost every file as changed on every run, even though most of the files haven't been modified in years.
Expected Behavior
Only changed files -- in this case, files representing the current billing period -- should be downloaded.
Current Behavior
Of 6,279 files that do not represent the current billing period, it's consistently re-downloading 5,831 of them. The files it downloads are byte-for-byte identical to the existing ones. I spot-checked one of the files, and aws s3 ls reports the exact same size and timestamp as ls does.
Reported by aws s3 sync:
Reported by aws s3 ls:
Reported by ls:
The post-fetch commit in all cases shows diffs for the files in the current billing period (as would be expected), and no changes to any of the other files that aws s3 sync reports as being downloaded.
All told, aws s3 sync appears to be downloading around 700MB of files on each run that it shouldn't be.
Reproduction Steps
The relevant portion of my script is:
The data in the bucket is written by AWS itself.
Possible Solution
No response
Additional Information/Context
No response
CLI version used
2.7.26
Environment details (OS name and version, etc.)
macOS 12.5.1