Fixes to modular DFP examples and benchmarks (#1429)
- Fix modular DFP benchmark and examples after `DFPArgParser` updates from PR #1245
- Shorten `load` file paths in example control messages
- Doc fixes

Fixes #1431
Fixes #1432

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1429
efajardo-nv authored Dec 13, 2023
1 parent 572f436 commit 48ffcbd
Showing 23 changed files with 56 additions and 41 deletions.
@@ -527,7 +527,7 @@ From the `examples/digital_fingerprinting/production` dir, run:
```bash
docker compose run morpheus_pipeline bash
```
To run the DFP pipelines with the example datasets within the container, run:
To run the DFP pipelines with the example datasets within the container, run the following from `examples/digital_fingerprinting/production/morpheus`:

* Duo Training Pipeline
```bash
@@ -560,7 +560,7 @@ To run the DFP pipelines with the example datasets within the container, run:
--start_time "2022-08-01" \
--duration "60d" \
--train_users generic \
--input_file "./control_messages/duo_payload_load_train_inference.json"
--input_file "./control_messages/duo_payload_load_train_inference.json"
```

* Azure Training Pipeline
@@ -594,7 +594,7 @@ To run the DFP pipelines with the example datasets within the container, run:
--start_time "2022-08-01" \
--duration "60d" \
--train_users generic \
--input_file "./control_messages/azure_payload_load_train_inference.json"
--input_file "./control_messages/azure_payload_load_train_inference.json"
```

### Output Fields
@@ -615,4 +615,4 @@ In addition to this, for each input feature the following output fields will exi
| `<feature name>_z_loss` | FLOAT | The loss z-score |
| `<feature name>_pred` | FLOAT | The predicted value |

Refer to [DFPInferenceStage](6_digital_fingerprinting_reference.md#inference-stage-dfpinferencestage) for more on these fields.
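
Presumably (this is an assumption, not something stated in this PR), the loss z-score above follows the standard z-score definition, comparing a feature's current loss with the loss statistics recorded during training:

$$z_{\text{loss}} = \frac{\ell - \mu_{\ell}}{\sigma_{\ell}}$$

where $\ell$ is the feature's loss for the current row and $\mu_{\ell}$, $\sigma_{\ell}$ are the mean and standard deviation of that feature's training loss.
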
@@ -103,11 +103,11 @@ To ensure the [file_to_df_loader.py](../../../../../morpheus/loaders/file_to_df_
export MORPHEUS_FILE_DOWNLOAD_TYPE=dask
```

Benchmarks for an individual workflow can be run using the following in your dev container:
Benchmarks for an individual workflow can be run from `examples/digital_fingerprinting/production/morpheus` in your dev container:

```
pytest -s --log-level=WARN --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave test_bench_e2e_dfp_pipeline.py::<test-workflow>
pytest -s --log-level=WARN --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave benchmarks/test_bench_e2e_dfp_pipeline.py::<test-workflow>
```
The `-s` option allows outputs of pipeline execution to be displayed so you can ensure there are no errors while running your benchmarks.
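
For readers unfamiliar with how pytest-benchmark structures these tests, here is a minimal, self-contained sketch using a toy workload (it is not one of the repository's tests, which live in `benchmarks/test_bench_e2e_dfp_pipeline.py`); the flags above then control warmup and result saving:

```python
# Toy pytest-benchmark example; assumes pytest and pytest-benchmark are installed.
# The real DFP benchmarks replace the toy workload with a full pipeline run.
def _toy_workload() -> int:
    return sum(i * i for i in range(10_000))


def test_toy_benchmark(benchmark):
    # The benchmark fixture calls the workload repeatedly and records timing stats.
    result = benchmark(_toy_workload)
    assert result > 0
```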

@@ -137,12 +137,12 @@ The `--benchmark-warmup` and `--benchmark-warmup-iterations` options are used to

For example, to run E2E benchmarks on the DFP training (modules) workflow on the azure logs:
```
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave test_bench_e2e_dfp_pipeline.py::test_dfp_modules_azure_payload_lti_e2e
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave benchmarks/test_bench_e2e_dfp_pipeline.py::test_dfp_modules_azure_payload_lti_e2e
```

To run E2E benchmarks on all workflows:
```
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave test_bench_e2e_dfp_pipeline.py
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave benchmarks/test_bench_e2e_dfp_pipeline.py
```

Here are the benchmark comparisons for individual tests. When the control message type is "payload", the rolling window stage is bypassed, whereas when it is "streaming", the windows are created with historical data.
@@ -161,6 +161,8 @@ def get_module_conf(self):
source=(self.source),
tracking_uri=mlflow.get_tracking_uri(),
silence_monitors=True,
mlflow_experiment_name_formatter=self._get_experiment_name_formatter(),
mlflow_model_name_formatter=self._get_model_name_formatter(),
train_users='generic')
dfp_arg_parser.init()
config_generator = ConfigGenerator(self.pipe_config, dfp_arg_parser, self.get_schema())
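
The bodies of `self._get_experiment_name_formatter()` and `self._get_model_name_formatter()` fall outside this hunk. As a hypothetical sketch only, they are assumed to yield the same templates the example CLI scripts later in this PR build when the corresponding `--mlflow_*_template` options are left unset:

```python
# Hypothetical standalone equivalents of the benchmark's formatter helpers;
# their real implementations are not shown in this diff.
def get_experiment_name_formatter(source: str) -> str:
    # e.g. "dfp/azure/training/{reg_model_name}"; {reg_model_name} is filled in later
    return f"dfp/{source}/training/" + "{reg_model_name}"


def get_model_name_formatter(source: str) -> str:
    # e.g. "DFP-azure-{user_id}"; {user_id} is filled in later
    return f"DFP-{source}-" + "{user_id}"


print(get_experiment_name_formatter("azure"))  # dfp/azure/training/{reg_model_name}
```
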
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
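
The hunk above shows the old glob (`../../../../examples/data/...`) followed by its shortened replacement (`../../../data/...`); both resolve to the same `examples/data/dfp` directory when the pipelines are run from `examples/digital_fingerprinting/production/morpheus`, the new path just avoids the detour through the repository root. A quick, hypothetical sanity check (not part of the PR) that the shortened glob resolves from that working directory:

```python
# Hypothetical sanity check, assumed to be run from
# examples/digital_fingerprinting/production/morpheus: confirm the shortened
# relative glob in the control message still matches the example data files.
import fsspec

files = fsspec.open_files("../../../data/dfp/azure-inference-data/*.json")
print(f"matched {len(files)} file(s)")
```
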
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
@@ -28,7 +28,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
@@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
@@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
@@ -28,7 +28,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
@@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
}
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
@@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
}
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
@@ -28,7 +28,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
@@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
@@ -103,12 +103,14 @@
help=("The MLflow tracking URI to connect to the tracking backend."))
@click.option('--mlflow_experiment_name_template',
type=str,
default="dfp/{source}/training/{reg_model_name}",
help="The MLflow experiment name template to use when logging experiments. ")
default=None,
help=("The MLflow experiment name template to use when logging experiments."
"If None, defaults to dfp/source/training/{reg_model_name}"))
@click.option('--mlflow_model_name_template',
type=str,
default="DFP-{source}-{user_id}",
help="The MLflow model name template to use when logging models. ")
default=None,
help=("The MLflow model name template to use when logging models."
"If None, defaults to DFP-source-{user_id}"))
@click.option("--disable_pre_filtering",
is_flag=True,
help=("Enabling this option will skip pre-filtering of json messages. "
@@ -140,6 +142,10 @@ def run_pipeline(source: str,
if (skip_user and only_user):
logging.error("Option --skip_user and --only_user are mutually exclusive. Exiting")

if mlflow_experiment_name_template is None:
mlflow_experiment_name_template = f'dfp/{source}/training/' + '{reg_model_name}'
if mlflow_model_name_template is None:
mlflow_model_name_template = f'DFP-{source}-' + '{user_id}'
dfp_arg_parser = DFPArgParser(skip_user,
only_user,
start_time,
@@ -103,12 +103,14 @@
help=("The MLflow tracking URI to connect to the tracking backend."))
@click.option('--mlflow_experiment_name_template',
type=str,
default="dfp/{source}/training/{reg_model_name}",
help="The MLflow experiment name template to use when logging experiments. ")
default=None,
help=("The MLflow experiment name template to use when logging experiments."
"If None, defaults to dfp/source/training/{reg_model_name}"))
@click.option('--mlflow_model_name_template',
type=str,
default="DFP-{source}-{user_id}",
help="The MLflow model name template to use when logging models. ")
default=None,
help=("The MLflow model name template to use when logging models."
"If None, defaults to DFP-source-{user_id}"))
@click.option('--bootstrap_servers',
type=str,
default="localhost:9092",
@@ -152,6 +154,11 @@ def run_pipeline(source: str,
if (skip_user and only_user):
logging.error("Option --skip_user and --only_user are mutually exclusive. Exiting")

if mlflow_experiment_name_template is None:
mlflow_experiment_name_template = f'dfp/{source}/training/' + '{reg_model_name}'
if mlflow_model_name_template is None:
mlflow_model_name_template = f'DFP-{source}-' + '{user_id}'

dfp_arg_parser = DFPArgParser(skip_user,
only_user,
start_time,
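
The two pipeline scripts above now default the MLflow template options to `None` and rebuild the effective template at runtime from the selected source. A small, standalone illustration of that pattern (not code from the PR):

```python
# Illustration of the None-default pattern used by the example scripts: click
# passes None when --mlflow_experiment_name_template is omitted, and the
# effective template is then built from the chosen source at runtime.
source = "azure"  # hypothetical value of the --source option

mlflow_experiment_name_template = None  # i.e. the CLI option was not supplied
if mlflow_experiment_name_template is None:
    # The f-string fills in source now; {reg_model_name} stays a placeholder
    # that is formatted later.
    mlflow_experiment_name_template = f"dfp/{source}/training/" + "{reg_model_name}"

assert mlflow_experiment_name_template == "dfp/azure/training/{reg_model_name}"
print(mlflow_experiment_name_template)
```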