From d1774dea1976d1fdce398dcb640cfa5143c79a2b Mon Sep 17 00:00:00 2001 From: Olga Kogiou Date: Sun, 13 Oct 2024 18:46:51 -0700 Subject: [PATCH] Add documentation for custom columns derived from DFTracer --- docs/dfanalyzer_custom_columns.rst | 41 ++++++++++++++++++++++++++++++ docs/index.rst | 1 + 2 files changed, 42 insertions(+) create mode 100644 docs/dfanalyzer_custom_columns.rst diff --git a/docs/dfanalyzer_custom_columns.rst b/docs/dfanalyzer_custom_columns.rst new file mode 100644 index 0000000..8f94c95 --- /dev/null +++ b/docs/dfanalyzer_custom_columns.rst @@ -0,0 +1,41 @@ +=========================================== +Custom Columns Derived from DFTracer +=========================================== + +This section describes how to derive custom columns for a DFAnalyzer data framework from DFTracer events using a custom function and load specific fields. + +Example: Custom Workflow Columns +================================ + +We can define custom columns in `DFAnalyzer` by specifying a function that extracts the desired fields from the `json_object` and then loading those fields with their corresponding types. + +Below is an example of how to define a custom function to derive columns from a Pegasus Montage workflow trace: + +.. code-block:: python + + def wf_cols_function(json_object, current_dict, time_approximate, condition_fn, load_data): + d = {} + if "args" in json_object: + if "size" in json_object["args"]: + d["size"] = int(json_object["args"]["size"]) + if "ret" in json_object["args"]: + d["ret"] = int(json_object["args"]["ret"]) + return d + + load_cols_wf = {'size': "int64[pyarrow]", 'ret': "int64[pyarrow]"} + +Next, use this function in `DFAnalyzer` to load traces with the custom columns (here is an example of loading Montage traces): + +.. code-block:: python + + from dfanalyzer import DFAnalyzer + + analyzer = DFAnalyzer( + "/path/to/montage-*-preload.pfw.gz", + load_fn=wf_cols_function, + load_cols=load_cols_wf + ) + +Here, the custom columns `size`, `ret`, and `cmd` are loaded into the `DFAnalyzer` using the `wf_cols_function`. + +You can modify the function and column types to match the fields relevant to your workload. diff --git a/docs/index.rst b/docs/index.rst index 0ebd629..2519345 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -28,6 +28,7 @@ DFTracer: is a library for profiling I/O calls and application functions. dfanalyzer_conf dfanalyzer_alcf_polaris dfanalyzer_overlap_analysis + dfanalyzer_custom_columns .. toctree:: :maxdepth: 2