Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to modular DFP examples and benchmarks #1429

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -527,7 +527,7 @@ From the `examples/digital_fingerprinting/production` dir, run:
```bash
docker compose run morpheus_pipeline bash
```
To run the DFP pipelines with the example datasets within the container, run:
To run the DFP pipelines with the example datasets within the container, run the following from `examples/digital_fingerprinting/production/morpheus`:

* Duo Training Pipeline
```bash
Expand Down Expand Up @@ -560,7 +560,7 @@ To run the DFP pipelines with the example datasets within the container, run:
--start_time "2022-08-01" \
--duration "60d" \
--train_users generic \
--input_file "./control_messages/duo_payload_load_train_inference.json"
--input_file "./control_messages/duo_payload_load_train_inference.json"
```

* Azure Training Pipeline
Expand Down Expand Up @@ -594,7 +594,7 @@ To run the DFP pipelines with the example datasets within the container, run:
--start_time "2022-08-01" \
--duration "60d" \
--train_users generic \
--input_file "./control_messages/azure_payload_load_train_inference.json"
--input_file "./control_messages/azure_payload_load_train_inference.json"
```

### Output Fields
Expand All @@ -615,4 +615,4 @@ In addition to this, for each input feature the following output fields will exi
| `<feature name>_z_loss` | FLOAT | The loss z-score |
| `<feature name>_pred` | FLOAT | The predicted value |

Refer to [DFPInferenceStage](6_digital_fingerprinting_reference.md#inference-stage-dfpinferencestage) for more on these fields.
Refer to [DFPInferenceStage](6_digital_fingerprinting_reference.md#inference-stage-dfpinferencestage) for more on these fields.
Original file line number Diff line number Diff line change
Expand Up @@ -103,11 +103,11 @@ To ensure the [file_to_df_loader.py](../../../../../morpheus/loaders/file_to_df_
export MORPHEUS_FILE_DOWNLOAD_TYPE=dask
```

Benchmarks for an individual workflow can be run using the following in your dev container:
Benchmarks for an individual workflow can be run from `examples/digital_fingerprinting/production/morpheus` in your dev container:

```

pytest -s --log-level=WARN --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave test_bench_e2e_dfp_pipeline.py::<test-workflow>
pytest -s --log-level=WARN --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave benchmarks/test_bench_e2e_dfp_pipeline.py::<test-workflow>
```
The `-s` option allows outputs of pipeline execution to be displayed so you can ensure there are no errors while running your benchmarks.

Expand Down Expand Up @@ -137,12 +137,12 @@ The `--benchmark-warmup` and `--benchmark-warmup-iterations` options are used to

For example, to run E2E benchmarks on the DFP training (modules) workflow on the azure logs:
```
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave test_bench_e2e_dfp_pipeline.py::test_dfp_modules_azure_payload_lti_e2e
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave benchmarks/test_bench_e2e_dfp_pipeline.py::test_dfp_modules_azure_payload_lti_e2e
```

To run E2E benchmarks on all workflows:
```
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave test_bench_e2e_dfp_pipeline.py
pytest -s --benchmark-enable --benchmark-warmup=on --benchmark-warmup-iterations=1 --benchmark-autosave benchmarks/test_bench_e2e_dfp_pipeline.py
```

Here are the benchmark comparisons for individual tests. When the control message type is "payload", the rolling window stage is bypassed, whereas when it is "streaming", the windows are created with historical data.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,8 @@ def get_module_conf(self):
source=(self.source),
tracking_uri=mlflow.get_tracking_uri(),
silence_monitors=True,
mlflow_experiment_name_formatter=self._get_experiment_name_formatter(),
mlflow_model_name_formatter=self._get_model_name_formatter(),
train_users='generic')
dfp_arg_parser.init()
config_generator = ConfigGenerator(self.pipe_config, dfp_arg_parser, self.get_schema())
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
Expand All @@ -28,7 +28,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
Expand All @@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
Expand All @@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-inference-data/*.json"
"../../../data/dfp/azure-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/azure-training-data/*.json"
"../../../data/dfp/azure-training-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
Expand All @@ -28,7 +28,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
Expand All @@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
Expand All @@ -34,7 +34,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
Expand All @@ -28,7 +28,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-inference-data/*.json"
"../../../data/dfp/duo-inference-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
"properties": {
"loader_id": "fsspec",
"files": [
"../../../../examples/data/dfp/duo-training-data/*.json"
"../../../data/dfp/duo-training-data/*.json"
]
}
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -103,12 +103,14 @@
help=("The MLflow tracking URI to connect to the tracking backend."))
@click.option('--mlflow_experiment_name_template',
type=str,
default="dfp/{source}/training/{reg_model_name}",
help="The MLflow experiment name template to use when logging experiments. ")
default=None,
help=("The MLflow experiment name template to use when logging experiments."
"If None, defaults to dfp/source/training/{reg_model_name}"))
@click.option('--mlflow_model_name_template',
type=str,
default="DFP-{source}-{user_id}",
help="The MLflow model name template to use when logging models. ")
default=None,
help=("The MLflow model name template to use when logging models."
"If None, defaults to DFP-source-{user_id}"))
@click.option("--disable_pre_filtering",
is_flag=True,
help=("Enabling this option will skip pre-filtering of json messages. "
Expand Down Expand Up @@ -140,6 +142,10 @@ def run_pipeline(source: str,
if (skip_user and only_user):
logging.error("Option --skip_user and --only_user are mutually exclusive. Exiting")

if mlflow_experiment_name_template is None:
mlflow_experiment_name_template = f'dfp/{source}/training/' + '{reg_model_name}'
if mlflow_model_name_template is None:
mlflow_model_name_template = f'DFP-{source}-' + '{user_id}'
dfp_arg_parser = DFPArgParser(skip_user,
only_user,
start_time,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -103,12 +103,14 @@
help=("The MLflow tracking URI to connect to the tracking backend."))
@click.option('--mlflow_experiment_name_template',
type=str,
default="dfp/{source}/training/{reg_model_name}",
help="The MLflow experiment name template to use when logging experiments. ")
default=None,
help=("The MLflow experiment name template to use when logging experiments."
"If None, defaults to dfp/source/training/{reg_model_name}"))
@click.option('--mlflow_model_name_template',
type=str,
default="DFP-{source}-{user_id}",
help="The MLflow model name template to use when logging models. ")
default=None,
help=("The MLflow model name template to use when logging models."
"If None, defaults to DFP-source-{user_id}"))
@click.option('--bootstrap_servers',
type=str,
default="localhost:9092",
Expand Down Expand Up @@ -152,6 +154,11 @@ def run_pipeline(source: str,
if (skip_user and only_user):
logging.error("Option --skip_user and --only_user are mutually exclusive. Exiting")

if mlflow_experiment_name_template is None:
mlflow_experiment_name_template = f'dfp/{source}/training/' + '{reg_model_name}'
if mlflow_model_name_template is None:
mlflow_model_name_template = f'DFP-{source}-' + '{user_id}'

dfp_arg_parser = DFPArgParser(skip_user,
only_user,
start_time,
Expand Down