Http 400 Preflight validation check error the operation failed or was cancelled needs detailed message #23350

Open
hemarina opened this issue Aug 19, 2024 · 14 comments
Labels
ARM customer-reported Issues that are reported by GitHub users external to the Azure organization. Mgmt This issue is related to a management-plane library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team needs-team-triage Workflow: This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@hemarina

Bug Report

I'm testing the preflight validation function with an unsupported location (e.g., australiacentral) for todo-nodejs-mongo-aks. The only error message returned is "the operation failed or was cancelled". It should instead explain in detail why validation is failing.

While debugging, I found the error message that gets skipped:

The template deployment '****' is not valid according to the validation procedure. The tracking id is '****'. See inner errors for details. Preflight validation check for resource(s) for container service *** in resource group ****** failed. Message: Virtual Machine size: '' is not supported for subscription ***** in location 'australiacentral'. Please refer to aka.ms/aks/vm-size-selector to find supported VM sizes in location 'australiacentral'.

Users need this message in order to debug, so it should be included in the returned error.

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Aug 19, 2024
@lirenhe lirenhe added the Mgmt This issue is related to a management-plane library. label Aug 20, 2024
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Aug 20, 2024
@tadelesh
Member

Could you provide the Go code where you call the BeginValidate method?

@tadelesh tadelesh added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Aug 20, 2024

Hi @hemarina. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@hemarina
Author

hemarina commented Aug 20, 2024

@tadelesh

func (ds *deployments) ValidatePreflightToResourceGroup(
	ctx context.Context,
	subscriptionId, resourceGroup, deploymentName string,
	armTemplate azure.RawArmTemplate,
	parameters azure.ArmParameters,
	tags map[string]*string,
) (*armresources.DeploymentPropertiesExtended, error) {
	deploymentClient, err := ds.createDeploymentsClient(ctx, subscriptionId)
	if err != nil {
		return nil, fmt.Errorf("creating deployments client: %w", err)
	}

	validate, err := deploymentClient.BeginValidate(ctx, resourceGroup, deploymentName,
		armresources.Deployment{
			Properties: &armresources.DeploymentProperties{
				Template:   armTemplate,
				Parameters: parameters,
				Mode:       to.Ptr(armresources.DeploymentModeIncremental),
			},
			Tags: tags,
		}, nil)
	if err != nil {
		return nil, fmt.Errorf("calling preflight validate api failing: %w", err)
	}

	validateResult, err := validate.PollUntilDone(ctx, nil)
	if err != nil {
		deploymentError := createDeploymentError(err)
		return nil, fmt.Errorf(
			"validating preflight to resource group:\n\nDeployment Error Details:\n%w",
			deploymentError,
		)
	}

	return validateResult.DeploymentValidateResult.Properties, nil
}

https://github.com/hemarina/azure-dev/blob/4b4f6a3779d40b9d95914b26a28d63688da96edb/cli/azd/pkg/azapi/deployments.go#L296

I got this error message: ERROR: deployment failed: error deploying infrastructure: calling preflight validate api failing: the operation failed or was cancelled

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Aug 20, 2024
@tadelesh
Member

Refer to the following code to get the error code and error message:

var rawResponse *http.Response
ctx := context.TODO() // your context
ctxWithResp := runtime.WithCaptureResponse(ctx, &rawResponse)
resp, err := resourceGroupsClient.CreateOrUpdate(ctxWithResp, resourceGroupName, resourceGroupParameters, nil)
if err != nil {
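    // with error, you can get RawResponse from context
    // (rawResponse stays nil if the request failed before any response arrived)
    if rawResponse != nil {
        log.Printf("Status code: %d", rawResponse.StatusCode)
    }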
    var respErr *azcore.ResponseError
    if errors.As(err, &respErr) {
        // with error, you can also get RawResponse from error
        log.Fatalf("Status code: %d", respErr.RawResponse.StatusCode)
    } else {
        log.Fatalf("Other error: %+v", err)
    }
}
// without error, you can get RawResponse from context
log.Printf("Status code: %d", rawResponse.StatusCode)
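
Once the raw response is captured as above, the nested detail (the "inner errors" from the original report) would be in the response body. Here is a minimal sketch of flattening it, assuming the usual ARM error envelope (`error.code`, `error.message`, `error.details[]`). `flattenARMError` and `describeARMErrorBody` are hypothetical helper names, the sample body is made up for illustration, and the SDK may already have consumed the body by the time you try to read it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// armError mirrors the common ARM error shape: a code, a message, and
// optional nested details that carry the inner errors.
type armError struct {
	Code    string     `json:"code"`
	Message string     `json:"message"`
	Details []armError `json:"details"`
}

// armErrorEnvelope is the top-level {"error": {...}} wrapper.
type armErrorEnvelope struct {
	Error armError `json:"error"`
}

// flattenARMError (hypothetical helper) walks the error tree depth-first and
// returns one indented line per error so inner messages are not lost.
func flattenARMError(e armError, depth int) []string {
	indent := strings.Repeat("  ", depth)
	lines := []string{fmt.Sprintf("%s%s: %s", indent, e.Code, e.Message)}
	for _, d := range e.Details {
		lines = append(lines, flattenARMError(d, depth+1)...)
	}
	return lines
}

// describeARMErrorBody (hypothetical helper) decodes a raw response body and
// renders every level of the error, outermost first.
func describeARMErrorBody(body []byte) (string, error) {
	var env armErrorEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", fmt.Errorf("decoding ARM error body: %w", err)
	}
	return strings.Join(flattenARMError(env.Error, 0), "\n"), nil
}

func main() {
	// Made-up sample body; codes and messages are placeholders.
	body := []byte(`{"error":{"code":"OuterCode","message":"See inner errors for details.",` +
		`"details":[{"code":"InnerCode","message":"VM size is not supported in location 'australiacentral'."}]}}`)

	out, err := describeARMErrorBody(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Walking `details` recursively matters here because the useful message in this issue sits one level below the top-level error.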

@weikanglim

I can add some debugging notes here since I was fortunate enough to be present with @hemarina at the time, but I do believe this is a bug in the Swagger spec that is used to generate deployment_clients.go.

Here's the execution flow we observed:

  1. Client calls BeginValidate in deployment_clients.go.
  2. ARM returns 400 Bad Request.
  3. validate checks for Bad Request but treats it as a success state: it returns httpResp, nil instead of nil, err.
  4. Execution continues until NewPoller, where it's rejected by the validation check.

In summary, the bug observed by the user is:

  • When ARM returns 400 bad request for deployments/validate, the error the operation failed or was cancelled is returned instead of the expected azcore.ResponseError.

We are also happy to provide a minimal repro setup, but it boils down to calling deploymentsClient.BeginValidate with parameters that will result in an ARM 400 response.

@tadelesh
Member

@weikanglim Thanks for the details. If possible, could you provide detailed logs by following this instruction? I want to confirm that the service returns 400 on the initial LRO call.

@rajeshkamal5050

@hemarina can you enable logging and share?

@hemarina
Author

hemarina commented Aug 29, 2024

POST https://management.azure.com/subscriptions/***/resourcegroups/***providers/Microsoft.Resources/deployments/***/validate?api-version=2021-04-01
   Accept: application/json
   Authorization: REDACTED
   Content-Length: 10435
   Content-Type: application/json
   User-Agent: azsdk-go-armresources.DeploymentsClient/v1.1.1 (go1.22.1; Windows_NT),azdev/0.0.0-dev.0 (Go go1.22.1; windows/amd64)
   X-Ms-Correlation-Request-Id: ***
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
... Detailed version sent through Teams...

Retry: response 400
Retry: exit due to non-retriable status code

Status code: 400
Other error: the operation failed or was cancelled

@tadelesh
Member

@jhendrixMSFT For an LRO whose definition includes a 400 response, could the 400 be a valid result of the initial request? (see spec here)
@hemarina Until we confirm whether this is a service issue, you could also try the code from my earlier comment on the first call of the LRO to get the error info; it should work for LROs as well.

@jhendrixMSFT
Member

Including a 400 in the responses indicates that it should be treated as a success response. However, if you look at the schema for the 400, it's an error type. The service team needs to either remove the 400 or add x-ms-error-response to it.

@tadelesh
Member

Got it. Thanks Joel. Let me add service attention label.

@tadelesh tadelesh added Service Attention Workflow: This issue is responsible by Azure service team. ARM labels Aug 30, 2024
@Azure Azure deleted a comment from github-actions bot Aug 30, 2024
@tadelesh tadelesh added Service Attention Workflow: This issue is responsible by Azure service team. and removed Service Attention Workflow: This issue is responsible by Azure service team. labels Aug 30, 2024
@github-actions github-actions bot added the needs-team-triage Workflow: This issue needs the team to triage. label Aug 30, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.

@hemarina
Author

hemarina commented Sep 3, 2024

@tadelesh @armleads-azure @jhendrixMSFT azd <-> preflight integration is currently blocked due to this.

Could you prioritize fixing this issue and provide an ETA?

@tadelesh
Member

tadelesh commented Sep 4, 2024

Our SDK is auto-generated from the service's spec. @armleads-azure, could you help prioritize this?
