[SDK] Fix invalid flow path in run YAML (#1656)

# Description

This pull request changes several files under `src/promptflow`. It tightens validation of the `flow` property in the `RunSchema` class, adds two test configuration files for bulk runs, and adds two test cases covering an invalid local flow path and an invalid remote flow string.

Main changes:

* `src/promptflow/promptflow/_sdk/schemas/_run.py`: Tightened validation of the `flow` property in `RunSchema` by adding a new field class, `RemoteFlowStr`, whose `_validate` method rejects any string that does not start with `azureml:`. `RemoteFlowStr` replaces the plain `fields.Str` in the `flow` union field.
* `src/promptflow/tests/test_configs/runs/bulk_run_invalid_flow_path.yaml` and `src/promptflow/tests/test_configs/runs/bulk_run_invalid_remote_flow_str.yaml`: Added two test configuration files for bulk runs, one with a flow path relative to the current working directory and one with an invalid remote flow string.
* `src/promptflow/tests/sdk_cli_test/unittests/test_run.py`: Added two test cases, `test_run_invalid_flow_path` and `test_run_invalid_remote_flow`, which check that loading each configuration raises a `ValidationError` with the expected message.

Other changes:

* `src/promptflow/CHANGELOG.md`: Moved three `[Executor]` entries (recursive `system_metrics` calculation, flow root level `api_calls`, and the `@trace` decorator) from the 1.3.0 section to a new "Features Added" section under 1.4.0 (Upcoming), and added the bug-fix entry "Fix loose flow path validation for run schema."

# All Promptflow Contribution checklist:

- [ ] **The pull request does not introduce [breaking changes].**
- [ ] **CHANGELOG is updated for new features, bug fixes or other significant changes.**
- [ ] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [ ] **Create an issue and link to the pull request to get dedicated review from promptflow team. Learn more: [suggested workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices

- [ ] Title of the pull request is clear and informative.
- [ ] There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, [see this page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines

- [ ] Pull request includes test coverage for the included changes.
microsoft · Jan 4, 2024 · ba204a2
1 parent b6cdb70
commit ba204a2
Showing 5 changed files with 61 additions and 4 deletions.
diff --git a/src/promptflow/CHANGELOG.md b/src/promptflow/CHANGELOG.md
@@ -2,9 +2,16 @@
 
 ## 1.4.0 (Upcoming)
 
+### Features Added
+
+- [Executor] Calculate system_metrics recursively in api_calls.
+- [Executor] Add flow root level api_calls, so that user can overview the aggregated metrics of a flow.
+- [Executor] Add @trace decorator to make it possible to log traces for functions that are called by tools.
+
 ### Bugs Fixed
 
 - Fix unaligned inputs & outputs or pandas exception during get details against run in Azure.
+- Fix loose flow path validation for run schema.
 
 ## 1.3.0 (2023.12.27)
 
@@ -13,9 +20,6 @@
 - Add support to configure prompt flow home directory via environment variable `PF_HOME_DIRECTORY`.
   - Please set before importing `promptflow`, otherwise it won't take effect.
 - [Executor] Handle KeyboardInterrupt in flow test so that the final state is Canceled.
-- [Executor] Calculate system_metrics recursively in api_calls.
-- [Executor] Add flow root level api_calls, so that user can overview the aggregated metrics of a flow.
-- [Executor] Add @trace decorator to make it possible to log traces for functions that are called by tools.
 
 ### Bugs Fixed
 - [SDK/CLI] Fix single node run doesn't work when consuming sub item of upstream node

diff --git a/src/promptflow/promptflow/_sdk/schemas/_run.py b/src/promptflow/promptflow/_sdk/schemas/_run.py
@@ -48,6 +48,23 @@ def _validate(self, value):
             )
 
 
+class RemoteFlowStr(fields.Str):
+    default_error_messages = {
+        "invalid_path": "Invalid remote flow path. Currently only azureml:<flow-name> is supported",
+    }
+
+    def _validate(self, value):
+        # inherited validations like required, allow_none, etc.
+        super(RemoteFlowStr, self)._validate(value)
+
+        if value is None:
+            return
+        if not isinstance(value, str) or not value.startswith("azureml:"):
+            raise self.make_error(
+                "invalid_path",
+            )
+
+
 class RunSchema(YamlFileSchema):
     """Base schema for all run schemas."""
 
@@ -60,7 +77,7 @@ class RunSchema(YamlFileSchema):
     properties = fields.Dict(keys=fields.Str(), values=fields.Str(allow_none=True))
     # endregion: common fields
 
-    flow = UnionField([LocalPathField(required=True), fields.Str(required=True)])
+    flow = UnionField([LocalPathField(required=True), RemoteFlowStr(required=True)])
     # inputs field
     data = UnionField([LocalPathField(), RemotePathStr()])
     column_mapping = fields.Dict(keys=fields.Str)
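
The prefix rule enforced by `RemoteFlowStr._validate` in the hunk above can be tried out without marshmallow. The following is a minimal stdlib-only sketch, not the promptflow implementation: `InvalidRemoteFlowError`, `check_remote_flow`, and `is_valid_remote_flow` are hypothetical names standing in for the field class and the `ValidationError` it raises. Only the `azureml:` prefix and the error message come from the diff.

```python
from typing import Optional

# Prefix taken from the error message in the diff.
REMOTE_FLOW_PREFIX = "azureml:"


class InvalidRemoteFlowError(ValueError):
    """Hypothetical stand-in for the ValidationError raised by the real field."""


def check_remote_flow(value: Optional[str]) -> None:
    """Accept None or a string starting with 'azureml:'; reject anything else."""
    if value is None:
        return
    if not isinstance(value, str) or not value.startswith(REMOTE_FLOW_PREFIX):
        raise InvalidRemoteFlowError(
            "Invalid remote flow path. Currently only azureml:<flow-name> is supported"
        )


def is_valid_remote_flow(value) -> bool:
    """Convenience wrapper (hypothetical) that turns the exception into a bool."""
    try:
        check_remote_flow(value)
    except InvalidRemoteFlowError:
        return False
    return True


# Try a few candidate values, including the one used in the new test fixture.
results = {
    candidate: is_valid_remote_flow(candidate)
    for candidate in ("azureml:my_flow", "invalid_remote_flow", "azureml")
}
```

In this sketch a prefix-only check also accepts the bare string `azureml:` with no flow name after it, since `str.startswith` is the only condition applied.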

diff --git a/src/promptflow/tests/sdk_cli_test/unittests/test_run.py b/src/promptflow/tests/sdk_cli_test/unittests/test_run.py
@@ -94,6 +94,20 @@ def test_dot_env_resolve(self):
         run = load_run(source=source, params_override=[{"name": run_id}])
         assert run.environment_variables == {"FOO": "BAR"}
 
+    def test_run_invalid_flow_path(self):
+        run_id = str(uuid.uuid4())
+        source = f"{RUNS_DIR}/bulk_run_invalid_flow_path.yaml"
+        with pytest.raises(ValidationError) as e:
+            load_run(source=source, params_override=[{"name": run_id}])
+        assert "Can't find directory or file in resolved absolute path:" in str(e.value)
+
+    def test_run_invalid_remote_flow(self):
+        run_id = str(uuid.uuid4())
+        source = f"{RUNS_DIR}/bulk_run_invalid_remote_flow_str.yaml"
+        with pytest.raises(ValidationError) as e:
+            load_run(source=source, params_override=[{"name": run_id}])
+        assert "Invalid remote flow path. Currently only azureml:<flow-name> is supported" in str(e.value)
+
     def test_data_not_exist_validation_error(self):
         source = f"{RUNS_DIR}/sample_bulk_run.yaml"
         with pytest.raises(ValidationError) as e:

diff --git a/src/promptflow/tests/test_configs/runs/bulk_run_invalid_flow_path.yaml b/src/promptflow/tests/test_configs/runs/bulk_run_invalid_flow_path.yaml
@@ -0,0 +1,11 @@
+name: flow_run_20230629_101205
+description: sample bulk run
+# flow relative to current working directory should not be supported.
+flow: tests/test_configs/flows/web_classification
+data: ../datas/webClassification1.jsonl
+column_mapping:
+   url: "${data.url}"
+variant: ${summarize_text_content.variant_0}
+
+# run config: env related
+environment_variables: env_file
diff --git a/src/promptflow/tests/test_configs/runs/bulk_run_invalid_remote_flow_str.yaml b/src/promptflow/tests/test_configs/runs/bulk_run_invalid_remote_flow_str.yaml
@@ -0,0 +1,11 @@
+name: flow_run_20230629_101205
+description: sample bulk run
+# invalid remote flow format should not be supported.
+flow: invalid_remote_flow
+data: ../datas/webClassification1.jsonl
+column_mapping:
+   url: "${data.url}"
+variant: ${summarize_text_content.variant_0}
+
+# run config: env related
+environment_variables: env_file
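
The comment in `bulk_run_invalid_flow_path.yaml` says a flow path relative to the current working directory should not be supported, and the matching test expects the message "Can't find directory or file in resolved absolute path:". One plausible reading, stated here as an assumption and not as a confirmed description of `LocalPathField`, is that relative paths are resolved against the directory containing the run YAML. The sketch below illustrates that reading with a hypothetical helper, `resolve_flow_path`, in a throwaway directory tree that loosely mirrors the test layout.

```python
import tempfile
from pathlib import Path


def resolve_flow_path(yaml_file: Path, flow: str) -> Path:
    """Resolve `flow` against the directory of the run YAML (assumed behavior)."""
    resolved = (yaml_file.parent / flow).resolve()
    if not resolved.exists():
        raise FileNotFoundError(
            f"Can't find directory or file in resolved absolute path: {resolved}"
        )
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a minimal tree: a flow directory and a sibling runs directory.
    (root / "tests/test_configs/flows/web_classification").mkdir(parents=True)
    runs_dir = root / "tests/test_configs/runs"
    runs_dir.mkdir(parents=True)
    run_yaml = runs_dir / "bulk_run_invalid_flow_path.yaml"

    # A path relative to the YAML file's directory resolves.
    ok = resolve_flow_path(run_yaml, "../flows/web_classification")

    # The fixture's value is relative to the repository root, so it is rejected.
    try:
        resolve_flow_path(run_yaml, "tests/test_configs/flows/web_classification")
        cwd_style_rejected = False
    except FileNotFoundError as exc:
        cwd_style_rejected = True
        message = str(exc)
```

Under this assumption, the fixture value `tests/test_configs/flows/web_classification` is looked up beneath the `runs` directory, where it does not exist, which is consistent with the error text asserted in `test_run_invalid_flow_path`.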