Impossible to configure shm_size when launching a CommandJob with AzureML SDK v2 #6571

Closed
tmignot63 opened this issue Jul 27, 2023 · 4 comments
Labels: Auto-Assign, bug, customer-reported, CXP Attention, extension/ml, Machine Learning, question

Comments

@tmignot63

Describe the bug

I get a validation error when I try to launch a CommandJob with a custom shm_size.

Related command

az ml job create --file file.yaml

Errors

Configured default 't-bs-mf-explore-iris2-phd' for arg resource_group_name
Configured default 'aml-dcy-int-iris2-phd' for arg workspace_name
Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.
Command ran in 2.107 seconds (init: 0.441, invoke: 1.666)
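
The `Unknown field.` message is what marshmallow reports when the input contains a key that the loaded schema does not declare, which suggests that the `CommandJobSchema` bundled with the installed extension has no `shm_size` entry under `resources`. As a rough illustration only: the sketch below is a plain-Python stand-in, not the azure-ai-ml code, and `KNOWN_RESOURCE_KEYS` is a hypothetical field list chosen to mimic an older schema.

```python
# Stand-in for a strict schema check; NOT the azure-ai-ml implementation.
# KNOWN_RESOURCE_KEYS is a hypothetical field list for an older schema.
KNOWN_RESOURCE_KEYS = {"instance_count", "instance_type"}


def unknown_fields(section: dict, known: set) -> dict:
    """Return {key: ["Unknown field."]} for every key not listed in `known`."""
    return {key: ["Unknown field."] for key in section if key not in known}


# Same shape as the `resources` block in the failing job file.
job = {"resources": {"shm_size": "28g"}}
errors = {"resources": unknown_fields(job["resources"], KNOWN_RESOURCE_KEYS)}
print(errors)
```

Under that assumption the value `28g` is never inspected; the key itself is rejected, which matches the error reported here.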

Issue script & Debug output

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 143, in load_from_dict
return schema(context=context).load(data, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
return self._do_load(
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 909, in _do_load
raise exc
marshmallow.exceptions.ValidationError: {'resources': {'shm_size': ['Unknown field.']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 60, in ml_job_create
job = load_job(path=file, params_override=params_override)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 74, in load_job
return load_common(Job, path, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 59, in load_common
return cls._load(data=yaml_dict, yaml_path=path, params_override=params_override, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/job.py", line 235, in _load
return job_type._load_from_dict(
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/command_job.py", line 166, in _load_from_dict
loaded_data = load_from_dict(CommandJobSchema, data, context, additional_message, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 146, in load_from_dict
raise ValidationError(decorate_validation_error(schema, pretty_error, additional_message))
marshmallow.exceptions.ValidationError: Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
cli: None
cli: Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 143, in load_from_dict
return schema(context=context).load(data, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
return self._do_load(
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 909, in _do_load
raise exc
marshmallow.exceptions.ValidationError: {'resources': {'shm_size': ['Unknown field.']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 60, in ml_job_create
job = load_job(path=file, params_override=params_override)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 74, in load_job
return load_common(Job, path, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 59, in load_common
return cls._load(data=yaml_dict, yaml_path=path, params_override=params_override, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/job.py", line 235, in _load
return job_type._load_from_dict(
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/command_job.py", line 166, in _load_from_dict
loaded_data = load_from_dict(CommandJobSchema, data, context, additional_message, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 146, in load_from_dict
raise ValidationError(decorate_validation_error(schema, pretty_error, additional_message))
marshmallow.exceptions.ValidationError: Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code

cli.azure.cli.core.azclierror: Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 143, in load_from_dict
return schema(context=context).load(data, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
return self._do_load(
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 909, in _do_load
raise exc
marshmallow.exceptions.ValidationError: {'resources': {'shm_size': ['Unknown field.']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 60, in ml_job_create
job = load_job(path=file, params_override=params_override)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 74, in load_job
return load_common(Job, path, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 59, in load_common
return cls._load(data=yaml_dict, yaml_path=path, params_override=params_override, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/job.py", line 235, in _load
return job_type._load_from_dict(
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/command_job.py", line 166, in _load_from_dict
loaded_data = load_from_dict(CommandJobSchema, data, context, additional_message, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 146, in load_from_dict
raise ValidationError(decorate_validation_error(schema, pretty_error, additional_message))
marshmallow.exceptions.ValidationError: Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/anaconda/envs/torch12/lib/python3.8/site-packages/knack/cli.py", line 233, in invoke
cmd_result = self.invocation.execute(args)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
raise ex
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
results.append(self._run_job(expanded_arg, cmd_copy))
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
result = cmd_copy(params)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
return self.handler(*args, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
return op(**command_args)
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 77, in ml_job_create
log_and_raise_error(err, debug)
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/raise_error.py", line 117, in log_and_raise_error
raise cli_error
knack.util.CLIError: Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.

cli.azure.cli.core.azclierror: Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.
az_command_data_logger: Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x7fd1e33cb040>]
az_command_data_logger: exit code: 1
cli.main: Command ran in 1.322 seconds (init: 0.568, invoke: 0.753)
telemetry.main: Begin splitting cli events and extra events, total events: 1
telemetry.client: Accumulated 0 events. Flush the clients.
telemetry.main: Finish splitting cli events and extra events, cli events: 1
telemetry.save: Save telemetry record of length 4340 in cache
telemetry.check: Returns Positive.
telemetry.main: Begin creating telemetry upload process.
telemetry.process: Creating upload process: "/anaconda/envs/torch12/bin/python /anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/telemetry/__init__.py /home/azureuser/.azure"
telemetry.process: Return from creating process
telemetry.main: Finish creating telemetry upload process.

Expected behavior

The job should be launched with the requested shm_size.

Environment Summary

azure-cli 2.50.0

core 2.50.0
telemetry 1.0.8

Extensions:
ml 2.5.0

Dependencies:
msal 1.22.0
azure-mgmt-resource 23.1.0b2

Python location '/anaconda/envs/torch12/bin/python'
Extensions directory '/opt/az/extensions'

Python (Linux) 3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0]

Legal docs and information: aka.ms/AzureCliLegal

Additional context

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json

experiment_name: model_run_semantic_v2
display_name: DUAT_model_run

resources:
  shm_size: 28g

environment_variables:
  DATASET_MOUNT_CACHE_SIZE: '50 GB'
  DATASET_MOUNT_FILE_CACHE_PRUNE_TARGET: '0'

code: ../../../../ # Relative from yaml file in order to get the root directory.
command: >-
  python pipelines/scripts/04_model_run_segmentation.py 
  --input_data ${{inputs.input_data}}
  --input_labels ${{inputs.input_labels}}
  --input_model_root ${{inputs.input_model_root}}
  --pretrained_model_path ${{inputs.pretrained_model_path}}
  --label_version ${{inputs.label_version1}}
  --n_channels ${{inputs.n_channels}} 
  --zones_camera ${{inputs.zones_cameras}} 
  --separate_all_cameras ${{inputs.separate_all_cameras}} 
  --log_dir ${{outputs.output_dir}} 
  --alpha_fce ${{inputs.alpha_fce}}
  --input_height ${{inputs.input_height}}
  --input_width ${{inputs.input_width}}
  --detection_mode ${{inputs.detection_mode}}
  --include_metadata ${{inputs.include_metadata}}
  --model_seg_type ${{inputs.model_seg_type}}
  --lb_model_run_name ${{inputs.lb_model_run_name}}
  --tyre_type ${{inputs.tyre_type}}
  --usines ${{inputs.usine1}}
  --dataset_mode ${{inputs.dataset_mode}}
  --thresholds_model_run ${{inputs.thresholds_model_run}}
  --input_model_segmentation_path ${{inputs.input_model_segmentation_path}}
  --lb_model_name ${{inputs.lb_model_name}}

inputs:
  input_data: 
    type: uri_folder
    path: azureml://datastores/ara_b2b_qualif_data/paths/
    mode: ro_mount

  input_labels: 
    type: uri_folder
    path: azureml://datastores/ara_b2b_qualif_label/paths/
    mode: ro_mount

  input_model_root: 
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/
    mode: ro_mount

outputs:
  output_dir:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/semantic_seg
    mode: rw_mount

environment: azureml:torch@latest
compute: azureml:T4-TC-light-illimited

I have removed personal arguments from the YAML above.
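
To rule out the value itself as the cause, here is a minimal sketch that checks an `shm_size` string. It assumes the `<number><unit>` format described in the command job YAML reference (a positive integer followed by `b`, `k`, `m`, or `g`) and assumes binary multiples, as Docker's `--shm-size` uses; both are assumptions, not something confirmed in this thread. The helper name `shm_size_to_bytes` is made up for illustration.

```python
import re

# Assumed format: a positive integer followed by one of b, k, m, g.
# Binary multiples (1k = 1024 bytes) are also an assumption.
_SHM_SIZE = re.compile(r"([1-9][0-9]*)([bkmg])")
_UNIT_BYTES = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def shm_size_to_bytes(value: str) -> int:
    """Convert a string such as '28g' to bytes, or raise ValueError."""
    match = _SHM_SIZE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unexpected shm_size format: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


print(shm_size_to_bytes("28g"))
```

If `28g` passes a check like this, the failure is more likely caused by the schema version than by the value.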

tmignot63 added the bug label Jul 27, 2023
@yonzhan
Collaborator

yonzhan commented Jul 27, 2023

Thank you for opening this issue, we will look into it.

microsoft-github-policy-service bot added the question, customer-reported, Auto-Assign, CXP Attention, Machine Learning, and extension/ml labels Jul 27, 2023
@navba-MSFT navba-MSFT self-assigned this Jul 28, 2023
@navba-MSFT
Contributor

@tmignot63 Thanks for reaching out and reporting this issue. Could you please update your ml extension by running the command below and check whether that helps?

az extension update -n ml

Awaiting your reply.
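
To confirm which ml extension version is installed before and after the update, one option is to read the JSON that `az version` prints. A minimal sketch, assuming that output has an `extensions` object keyed by extension name; the function name `ml_extension_version` is hypothetical, and the sample string only mirrors the versions from the Environment Summary in this report.

```python
import json


def ml_extension_version(az_version_json: str):
    """Return the ml extension version as a tuple of ints, or None if absent.

    Assumes a purely numeric version such as "2.5.0"; a pre-release
    suffix would make int() raise ValueError.
    """
    extensions = json.loads(az_version_json).get("extensions", {})
    version = extensions.get("ml")
    if version is None:
        return None
    return tuple(int(part) for part in version.split("."))


# Sample shaped like `az version` output, using the versions from this report.
sample = '{"azure-cli": "2.50.0", "extensions": {"ml": "2.5.0"}}'
print(ml_extension_version(sample))
```

In practice the input could come from `subprocess.run(["az", "version", "--output", "json"], capture_output=True, text=True).stdout`, run once before and once after the update to see whether the version changed.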

navba-MSFT added the needs-author-feedback label Jul 28, 2023
@tmignot63
Author

tmignot63 commented Jul 28, 2023 via email

microsoft-github-policy-service bot added the needs-team-attention label and removed the needs-author-feedback label Jul 28, 2023
@navba-MSFT
Contributor

@tmignot63 Thanks for getting back to us. We will now close this GitHub issue. If you need any further assistance with it in the future, please feel free to reopen this thread. We would be happy to help.

navba-MSFT removed the needs-team-attention label Jul 28, 2023