Changing output from JSON to text changes the query behaviour #7504

Closed
acdha opened this issue Dec 7, 2022 · 14 comments
Labels
documentation This is a problem with documentation. feature-request A feature should be added or improved. p3 This is a minor priority issue

Comments

@acdha

acdha commented Dec 7, 2022

Describe the bug

I have a query which returns the correct result using JSON output:

$ aws rds describe-db-cluster-snapshots --db-cluster-identifier "example-production" --region us-east-1 --query "reverse(sort_by(DBClusterSnapshots[*], &SnapshotCreateTime))[0] | DBClusterSnapshotArn"
"arn:aws:rds:us-east-1:example:cluster-snapshot:rds:example-production-2022-12-07-03-29"

If I modify that command to add --output=text, it returns two results with the first one being out of order:

$ aws rds describe-db-cluster-snapshots --db-cluster-identifier "example-production" --region us-east-1 --query "reverse(sort_by(DBClusterSnapshots[*], &SnapshotCreateTime))[0] | DBClusterSnapshotArn" --output=text
arn:aws:rds:us-east-1:example:cluster-snapshot:awsbackup:job-34b421c1-9c32-dec6-920e-ce8630a0fd9b
arn:aws:rds:us-east-1:example:cluster-snapshot:rds:example-production-2022-12-07-03-29

Expected Behavior

Changing the output format should not change the query behaviour

Current Behavior

Changing the format from JSON to text causes some kind of structural change which is not visible in the source JSON (i.e. these are returned as a single list, not two lists).

Reproduction Steps

$ aws rds describe-db-cluster-snapshots --db-cluster-identifier "example-production" --region us-east-1 --query "reverse(sort_by(DBClusterSnapshots[*], &SnapshotCreateTime))[0] | DBClusterSnapshotArn" --output=text
arn:aws:rds:us-east-1:example:cluster-snapshot:awsbackup:job-34b421c1-9c32-dec6-920e-ce8630a0fd9b
arn:aws:rds:us-east-1:example:cluster-snapshot:rds:example-production-2022-12-07-03-29

Possible Solution

No response

Additional Information/Context

No response

CLI version used

aws-cli/2.9.4 Python/3.11.0 Darwin/22.1.0 source/arm64 prompt/off

Environment details (OS name and version, etc.)

ProductName: macOS ProductVersion: 13.0.1 BuildVersion: 22A400

@acdha acdha added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 7, 2022
@RyanFitzSimmonsAK RyanFitzSimmonsAK added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Dec 7, 2022
@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this Dec 7, 2022
@RyanFitzSimmonsAK RyanFitzSimmonsAK removed the needs-triage This issue or PR still needs to be triaged. label Dec 7, 2022
@RyanFitzSimmonsAK
Contributor

Hi @acdha, thanks for raising this issue. I wasn't able to reproduce this behavior using your provided code snippet; could you provide the debug logs by adding --debug to your input? Thanks!

@RyanFitzSimmonsAK RyanFitzSimmonsAK added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 7, 2022
@acdha
Author

acdha commented Dec 9, 2022

@RyanFitzSimmonsAK Does your system have both RDS and AWS Backup snapshots? I suspect those are grouped differently in the underlying API response and the CLI is papering over the differences in the transform code because I only see this on Aurora clusters which have both of those enabled.

@RyanFitzSimmonsAK
Contributor

Yes, I tried to reproduce it on an Aurora cluster with both types of snapshots, and was not able to reproduce this behavior. If you can provide debug logs, it will help me find the root cause of this issue. Please don't forget to redact any sensitive info. Thanks!

@acdha
Author

acdha commented Dec 12, 2022

Here's a redacted log of the two commands back to back:

query.log

@RyanFitzSimmonsAK RyanFitzSimmonsAK removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 12, 2022
@RyanFitzSimmonsAK
Contributor

Hi @acdha. We were still unable to reproduce this behavior or figure it out from the logs. Can you explain how you have your client-side pagination configured? Also, does this behavior occur if you use --output json instead of --output text? Thanks!

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Dec 16, 2022
@RyanFitzSimmonsAK RyanFitzSimmonsAK added the p3 This is a minor priority issue label Feb 7, 2023
@enescakir

I have a similar problem with aws s3api list-objects-v2. It was working in previous weeks.
The query tries to find the latest file.

aws s3api list-objects-v2 \
  --bucket "my-bucket" \
  --prefix "development" \
  --query 'reverse(sort_by(Contents[?contains(Key, `template_config`)], &LastModified))[:1]' \
  --output=text

# Output
"f4edd198b40bbdd55fe02ab819132ee4"      development-b7781/template_config.json  2023-02-09T14:30:32+00:00       196     STANDARD
"c66e58a1e42da7f06794971663f36305"      development-e7755/template_config.json  2023-01-31T14:01:14+00:00       196     STANDARD

aws s3api list-objects-v2 \
  --bucket "my-bucket" \
  --prefix "development" \
  --query 'reverse(sort_by(Contents[?contains(Key, `template_config`)], &LastModified))[:1]' \
  --output=json

# Output
[
    {
        "Key": "development-b7781/template_config.json",
        "LastModified": "2023-02-09T14:30:32+00:00",
        "ETag": "\"f4edd198b40bbdd55fe02ab819132ee4\"",
        "Size": 196,
        "StorageClass": "STANDARD"
    }
]

@enescakir

I added the --no-paginate flag, and it appears to work correctly.

aws s3api list-objects-v2 \
  --bucket "my-bucket" \
  --prefix "development" \
  --query 'reverse(sort_by(Contents[?contains(Key, `template_config`)], &LastModified))[:1]' \
  --output=text \
  --no-paginate

# Output
"f4edd198b40bbdd55fe02ab819132ee4"      development-b7781/template_config.json  2023-02-09T14:30:32+00:00       196     STANDARD

@reidg44

reidg44 commented Mar 3, 2023

Seeing this issue as well. In my case:

aws stepfunctions list-executions \
   --state-machine-arn arn:aws:states:us-east-1:XXXXXXXXXXXXX:stateMachine:my-step-function \
   --query 'executions[0].executionArn' \
   --output text

# Output
arn:aws:states:us-east-1:XXXXXXXXXXXXX:execution:my-step-function:EXAMPLE1
arn:aws:states:us-east-1:XXXXXXXXXXXXX:execution:my-step-function:EXAMPLE2

aws stepfunctions list-executions \
   --state-machine-arn arn:aws:states:us-east-1:XXXXXXXXXXXXX:stateMachine:my-step-function \
   --query 'executions[0].executionArn' \
   --output json

# Output
arn:aws:states:us-east-1:XXXXXXXXXXXXX:execution:my-step-function:EXAMPLE1

Running with debug shows pagination happening (this only showed up after our 100th step function execution). It looks like, when the --output text flag is on, the JMESPath query is run on each response page rather than once at the end on the combined results.
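
To make the per-page behaviour described above concrete, here is a minimal simulation that does not call AWS at all. The two "pages", the snapshot names, and the helper functions `newest`, `per_page`, and `combined` are all made up for illustration; `newest` is only a stand-in for a "sort descending, take the first item" query, not the CLI's actual implementation.

```shell
# Two simulated API pages; each line is "<created> <name>" (made-up data).
page1='2022-12-01 snap-a
2022-12-03 snap-b'
page2='2022-12-07 snap-d
2022-12-05 snap-c'

# Stand-in for the query "newest item": sort descending, keep the first line.
newest() { LC_ALL=C sort -r | head -n 1; }

# Streaming behaviour: the query runs once per page, so each page
# contributes its own "newest" line.
per_page() {
  printf '%s\n' "$page1" | newest
  printf '%s\n' "$page2" | newest
}

# Buffered behaviour: pages are combined first, then queried once.
combined() {
  printf '%s\n%s\n' "$page1" "$page2" | newest
}

per_page   # prints "2022-12-03 snap-b" then "2022-12-07 snap-d"
combined   # prints "2022-12-07 snap-d"
```

The per-page variant prints one result for every page, which matches the symptom reported in this thread: an extra, seemingly out-of-order line appears once the result set spans more than one page.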

@jeffrey-aguilera

jeffrey-aguilera commented Mar 25, 2023

Here's another example:

aws ec2 describe-instance-types --region us-east-1 --query 'sort_by(InstanceTypes[*],&InstanceType)|[].[InstanceType,VCpuInfo.DefaultVCpus,MemoryInfo.SizeInMiB]' --no-cli-pager --filters Name=current-generation,Values=true Name=processor-info.supported-architecture,Values=x86_64 Name=vcpu-info.default-vcpus,Values=2 Name=instance-storage-supported,Values=true --output json

and

aws ec2 describe-instance-types --region us-east-1 --query 'sort_by(InstanceTypes[*],&InstanceType)|[].[InstanceType,VCpuInfo.DefaultVCpus,MemoryInfo.SizeInMiB]' --no-cli-pager --filters Name=current-generation,Values=true Name=processor-info.supported-architecture,Values=x86_64 Name=vcpu-info.default-vcpus,Values=2 Name=instance-storage-supported,Values=true --output text

give different results.

Incidentally, json, table, and yaml output are all in the correct order. Only text output is changed.

reidg44's comment is spot on. The reason this test example surfaces the issue is that the server-side filters are complex and slow-ish.

aws-cli/2.11.5 Python/3.11.2 Darwin/22.3.0 source/x86_64 prompt/off

@RyanFitzSimmonsAK RyanFitzSimmonsAK added needs-review This issue or pull request needs review from a core team member. p2 This is a standard priority issue investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed p3 This is a minor priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. needs-review This issue or pull request needs review from a core team member. labels Mar 28, 2023
@RyanFitzSimmonsAK
Contributor

RyanFitzSimmonsAK commented Mar 28, 2023

Hey, thanks everyone for providing more details and examples. This behavior is technically expected; @reidg44 is correct that it occurs because text output is processed one page at a time, and the query is applied to each page separately, so each page returns its own result. This similar issue has a more detailed explanation. To avoid this behavior, you can either turn off pagination with --no-paginate, or set a higher page size using --page-size. Please let me know if you have any follow-up questions. Thanks!
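
Besides the two options above, reports in this thread indicate that json output applies the query to the combined result. A possible sketch built on that observation, using the original RDS query, is shown here. `latest_snapshot_arn` is a hypothetical helper name, not part of the AWS CLI, and stripping quotes with `tr` assumes the value is a simple JSON string with no embedded quotes.

```shell
# Sketch: fetch the newest cluster snapshot ARN as plain text without
# using --output text, so the query sees the combined result.
# latest_snapshot_arn is a hypothetical helper, not an AWS CLI feature.
latest_snapshot_arn() {
  cluster_id=$1
  region=$2
  # Strip the JSON string quotes at the end of the pipeline.
  aws rds describe-db-cluster-snapshots \
    --db-cluster-identifier "$cluster_id" \
    --region "$region" \
    --query "reverse(sort_by(DBClusterSnapshots[*], &SnapshotCreateTime))[0] | DBClusterSnapshotArn" \
    --output json |
    tr -d '"'
}
```

It could be called as `latest_snapshot_arn example-production us-east-1`. Note that --no-paginate makes a single API call, so with that option the query only sees the first page of results.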

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed bug This issue is a bug. investigating This issue is being investigated and/or work is in progress to resolve the issue. p2 This is a standard priority issue labels Mar 28, 2023
@RyanFitzSimmonsAK RyanFitzSimmonsAK added the p3 This is a minor priority issue label Mar 28, 2023
@acdha
Author

acdha commented Mar 29, 2023

@RyanFitzSimmonsAK @aBurmeseDev that explanation makes sense but I think there's still at least one thing which could be improved to avoid continued reports of this as a bug. The documentation for --output says this:

--output (string)

The formatting style for command output.

  • json
  • text
  • table
  • yaml
  • yaml-stream

Nothing about that says that one of the modes silently changes from buffered to streaming processing (the existence of the special-case yaml-stream option implies the opposite), and streaming would also be desirable for JSON output on large queries (e.g. I use plenty of other tools which emit line-oriented JSON to improve parallelism).

The path of least resistance here would be to update the --output documentation to identify the change in functionality implied by text mode, but it seems like it would be better to have an option like --output-stream or --streaming-output which could control that behaviour for json, text, and yaml and supersede yaml-stream (table seems impossible to support this way without knowing the column widths in advance).

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 29, 2023
@RyanFitzSimmonsAK RyanFitzSimmonsAK added the documentation This is a problem with documentation. label Apr 4, 2023
@RyanFitzSimmonsAK RyanFitzSimmonsAK added the feature-request A feature should be added or improved. label Apr 26, 2023
@amberkushwaha

Nothing about that says that one of the modes silently changes from buffered to streaming processing (the existence of the special-case yaml-stream option implies the opposite), and streaming would also be desirable for JSON output on large queries (e.g. I use plenty of other tools which emit line-oriented JSON to improve parallelism).
So please do simplify the module initials to run the program, as it's been a while for the new version updates.

@RyanFitzSimmonsAK
Contributor

The warning clarifying this behavior on the query page of the user guide has been added to the output page as well. Closing this issue.


github-actions bot commented Sep 4, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.


6 participants