Add Regression Indicators #186

Open

drewrip wants to merge 16 commits into main

Conversation

drewrip (Contributor) commented Jan 24, 2024

This sets up the notebooks performance test to create and use a RegressionIndicator from this matrix-benchmarking PR.

The idea is that we can create a number of different indicators from the base RegressionIndicator, each producing a generic, flexible result that we can save and process upstream. I created a simple ZScoreIndicator to demonstrate this: it triggers when a new payload's KPI value is more than 3 standard deviations away from the mean value for that KPI across the LTS results. Hopefully we can use this to build a simple wrapper for datastax/Hunter.
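For reference, here is a minimal sketch of the z-score check described above, assuming each KPI's historical values are available as plain floats (the function and variable names are illustrative, not the actual MatrixBenchmarking API):

import statistics

def is_regression(new_value, historical_values, threshold=3.0):
    # Flag the new KPI value when it sits more than `threshold` standard
    # deviations away from the mean of the LTS history.
    if len(historical_values) < 2:
        return False  # not enough history to compute a spread
    mean = statistics.mean(historical_values)
    stdev = statistics.stdev(historical_values)
    if stdev == 0:
        return False  # identical historical values, nothing to compare against
    return abs(new_value - mean) / stdev > threshold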

There are still two work-in-progress issues to resolve in this PR:

  1. Missing Settings: some of the settings for an individual MatrixEntry are only present in the Matrix itself; in our testing payloads, version is one example. I'm not sure why these settings aren't present in the MatrixEntry itself, but as currently written we can only filter on the settings that are present in the MatrixEntry.

  2. Gathered Results: as discussed, some of the LTS payloads have "gathered" results. I wasn't sure how to turn this off, and I couldn't flatten the Payloads since I couldn't deepcopy them.

openshift-ci bot added the do-not-merge/work-in-progress label ("Indicates that a PR should not merge because it is a work in progress.") on Jan 24, 2024
openshift-ci bot commented Jan 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kpouget for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-robot added the needs-rebase label ("Indicates a PR cannot be merged because it has merge conflicts with HEAD.") on Jan 24, 2024
zscore = regression.ZScoreIndicator(
    entry,
    lts_entries,
    settings_filter={"image": "tensorflow:2023.1"},
Contributor:

can you try to make this dynamic?
common.Matrix.settings["image"] should contain all the available images
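A hedged sketch of what "dynamic" could mean here, assuming common.Matrix.settings["image"] yields the available image names as the comment suggests (the loop is illustrative, not part of this PR):

# Illustrative only: one ZScoreIndicator per available image instead of a
# hard-coded "tensorflow:2023.1" filter.
for image in common.Matrix.settings["image"]:
    zscore = regression.ZScoreIndicator(
        entry,
        lts_entries,
        settings_filter={"image": image},
    )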

Contributor:

could you try to do the filtering outside of the regression.ZScoreIndicator function? 🤔
I think it would be clearer if you could pass only the relevant lts_entries
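For example, a rough sketch of pre-filtering the history (assuming each LTS entry exposes a dict-like settings attribute; adapt to the real MatrixEntry field names):

# Illustrative only: select the relevant LTS entries up front, so the
# indicator no longer needs a settings_filter argument.
relevant_lts_entries = [
    e for e in lts_entries
    if e.settings.get("image") == "tensorflow:2023.1"
]
zscore = regression.ZScoreIndicator(entry, relevant_lts_entries)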

Contributor Author:

Sure, I'll add the filtering outside first. It should be as simple as removing the settings_filter option. What do you mean by dynamic, though? As in, we want to run a regression analysis for each unique "image"?

    entry,
    lts_entries,
    settings_filter={"image": "tensorflow:2023.1"},
    combine_funcs={"notebook_performance_benchmark_time": np.mean}
Contributor:

can you use notebook_performance_benchmark_min_time, which already provides an "aggregated" value?
I think min() is more relevant than mean(), but that's open to discussion
if we ever choose to use mean, then we should expose it as a KPI instead of _min
(the idea is to expose as a KPI the value we're using for the regression analyses)

Contributor Author:

I should have explained this better. combine_funcs={"notebook_performance_benchmark_time": np.mean} is just an optional way of handling KPIs that provide a list[int] or list[float]. The combine_funcs option can be omitted, in which case any KPIs that use a list of values are skipped. All the other KPIs, such as notebook_performance_benchmark_min_time, are analyzed as normal.
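As a rough sketch of that behaviour (my own illustration of the combine step, not the actual ZScoreIndicator internals):

import numpy as np

# Illustrative only: reduce a list-valued KPI to a single number with the
# user-supplied combine function, or skip the KPI when none is provided.
def resolve_kpi_value(name, value, combine_funcs=None):
    combine_funcs = combine_funcs or {}
    if isinstance(value, list):
        combine = combine_funcs.get(name)
        return combine(value) if combine else None  # None -> KPI skipped
    return value

resolve_kpi_value("notebook_performance_benchmark_time", [12.1, 11.8, 12.4],
                  {"notebook_performance_benchmark_time": np.mean})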

for lts_entry in common.LTS_Matrix.all_records(): pass
lts_entries_raw = list(common.LTS_Matrix.all_records())
# Temporarily skip the gathered results
lts_entries = list(filter(lambda entry: type(entry.results) is not list, lts_entries_raw))
Contributor:

can you try to take the 1st value of the gathered entries?
in theory, all the entries in entry.results should contain the same result +/- noise, as they have been executed in the same conditions
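For instance, a minimal sketch (assuming gathered entries expose their results as a list, as described above):

# Illustrative only: keep gathered entries but take their first result,
# instead of filtering them out of lts_entries.
def first_result(results):
    return results[0] if isinstance(results, list) else results

lts_results = [first_result(e.results) for e in lts_entries_raw]  # hypothetical usage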

Contributor Author:

Sure, I'll just do that. I previously tried flattening them, but I wasn't able to deepcopy MatrixEntry objects.


logging.info(f"Received {common.LTS_Matrix.count_records()} historic LTS entries")
for lts_entry in common.LTS_Matrix.all_records(): pass
lts_entries_raw = list(common.LTS_Matrix.all_records())
Contributor:

you don't really need the list() here. Without it, lts_entries_raw is a generator that is processed (only once) on the line below.
Removing list() saves memory and processing time, especially if the list gets large.
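In other words, something like this keeps the records lazy (a sketch of the suggestion above, reusing the names from the snippet):

# Illustrative only: no list() around all_records(), so the records are
# consumed lazily by the comprehension below instead of being materialized twice.
lts_entries_raw = common.LTS_Matrix.all_records()
lts_entries = [e for e in lts_entries_raw if not isinstance(e.results, list)]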

Contributor Author:

Fair point!

openshift-merge-robot removed the needs-rebase label ("Indicates a PR cannot be merged because it has merge conflicts with HEAD.") on Jan 25, 2024
drewrip changed the title from "WIP Analyze Regression Indicators" to "Add Regression Indicators" on Jan 25, 2024
openshift-ci bot removed the do-not-merge/work-in-progress label ("Indicates that a PR should not merge because it is a work in progress.") on Jan 25, 2024
drewrip (Contributor Author) commented Jan 26, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/184/pull-ci-openshift-psap-topsail-main-rhoai-light/1750426742431944704/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.export=true
/var matbench.lts.opensearch.index=rhoai-notebooks-performance

kpouget (Contributor) commented Jan 26, 2024

@drewrip something went wrong with the build (the matrixbenchmarking commit wasn't available)
try git commit --amend --no-edit, force-push it, and launch the test again

drewrip (Contributor Author) commented Jan 26, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/184/pull-ci-openshift-psap-topsail-main-rhoai-light/1750426742431944704/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.export=true
/var matbench.lts.opensearch.index=rhoai-notebooks-performance

drewrip (Contributor Author) commented Jan 26, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/184/pull-ci-openshift-psap-topsail-main-rhoai-light/1750426742431944704/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.export=true
/var matbench.lts.opensearch.index=rhoai-notebooks-performance

1 similar comment
drewrip (Contributor Author) commented Jan 26, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/184/pull-ci-openshift-psap-topsail-main-rhoai-light/1750426742431944704/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.export=true
/var matbench.lts.opensearch.index=rhoai-notebooks-performance

drewrip (Contributor Author) commented Jan 26, 2024

results

Looks like the analysis took place as expected. No regressions were found, but there were a number of issues regarding payload validation. Obviously we need to make sure the payload schemas are consistent, but I'm wondering why the matrix entry location was /tmp/matrix_benchmarking_results/expe/from_url.

kpouget (Contributor) commented Jan 26, 2024

results

Looks like the analysis took place as expected. No regressions were found, but there were a number of issues regarding payload validation. Obviously we need to make sure the payload schemas are consistent, but I'm wondering why the matrix entry location was /tmp/matrix_benchmarking_results/expe/from_url.

good :)
yes, we'll need to tidy up the entries on OpenSearch!

I'm wondering why the matrix entry location was /tmp/matrix_benchmarking_results/expe/from_url.
this location is where the entry is downloaded, when launching a "plotting" job, as you did. So this part is expected :)

We'll need to find a way to save the regression.json files, even when doing a replot. One way to do it is to include them in the LTS Documentation report, around this place: https://github.com/openshift-psap/topsail/blob/main/projects/notebooks/visualizations/rhods-notebooks-performance/plotting/lts_documentation.py#L73
If you do something like this:

html.Code(yaml.dump(entry.results.lts.regression_results), style={"white-space": "pre-wrap"})

it should appear in the HTML document

drewrip (Contributor Author) commented Jan 26, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/184/pull-ci-openshift-psap-topsail-main-rhoai-light/1750426742431944704/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.export=true
/var matbench.lts.opensearch.index=rhoai-notebooks-performance

1 similar comment
drewrip (Contributor Author) commented Jan 26, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/184/pull-ci-openshift-psap-topsail-main-rhoai-light/1750426742431944704/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.export=true
/var matbench.lts.opensearch.index=rhoai-notebooks-performance

Comment on lines 89 to 90
regression += [html.H2("regression analysis")]
regression += [html.Code(yaml.dump(entry.results.lts.regression_results, default_flow_style=False), style={"white-space": "pre-wrap"})]
Contributor:

Suggested change
- regression += [html.H2("regression analysis")]
- regression += [html.Code(yaml.dump(entry.results.lts.regression_results, default_flow_style=False), style={"white-space": "pre-wrap"})]
+ regression += [html.H1("regression analysis")]
+ regression += [html.Code(yaml.dump(lts.regression_results, default_flow_style=False), style={"white-space": "pre-wrap"})]

this adjusts the title size, to make it top level, and takes the regression_results field from lts, which is the focus of this file

kpouget (Contributor) commented Jan 29, 2024

@drewrip, the regression results weren't included (or were empty) in this document:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/186/pull-ci-openshift-psap-topsail-main-rhoai-plot/1751000197686628352/artifacts/plot/test/artifacts/report_01_report:%20LTS%20Documentation.html

We'll also need to define a model describing the format of the regression_results field.
I guess it should extend the KPI model, maybe the notebooks KPI (in the TOPSAIL part), so that it can be easily located in OpenSearch with Grafana.

Comment on lines 40 to 58
regression: NotebookPerformanceRegression

NotebookPerformanceKPIs = matbench_models.getKPIsModel("NotebookPerformanceKPIs", __name__, kpi.KPIs, kpi.NotebookPerformanceKPI)
NotebookPerformanceRegression = matbench_models.RegressionResult
Contributor:

this isn't necessary; if you don't modify matbench_models.RegressionResult, there's no need to rename it here


class Payload(matbench_models.ExclusiveModel):
    schema_name: matbench_models.create_schema_field("rhods-notebooks-perf") = Field(alias="$schema")
    metadata: Metadata
    results: Results
    kpis: Optional[NotebookPerformanceKPIs]
    regression_results: Optional[Any]
Contributor:

can you keep this here, in the Payload? In MatrixBenchmarking, overall, the results are workload-specific, but the payload/metadata/kpis have a predefined structure (that can be extended, but the core is always the same), so the regression results belong there. (I need to browse these entries to upload them as KPIs.)

Contributor Author:

Gotcha, I'll move them under the Payload

)
zscore_ind = zscore.ZScoreIndicator(entry, controlled_lts_entries)
regression_results[check_setting] = zscore_ind.analyze()
number_of_failures += sum(map(lambda x: x["regression"]["status"], regression_results[check_setting]))
Contributor:

I think in Python, list comprehensions are preferred over functional style:

sum([x["regression"]["status"] for x in regression_results[check_setting]])

and it would be nice to use the model here, instead of x["regression"]["status"]
--> x.regression.status (the model or types.SimpleNamespace)
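A small sketch of the attribute-access style suggested above (using types.SimpleNamespace purely for illustration, with made-up status values):

from types import SimpleNamespace

# Illustrative only: with a model (or SimpleNamespace) the status is reached
# via attributes rather than nested dict lookups.
results = [SimpleNamespace(regression=SimpleNamespace(status=1)),
           SimpleNamespace(regression=SimpleNamespace(status=0))]
number_of_failures = sum(r.regression.status for r in results)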

Contributor:

and I'm not sure you can sum the statuses: if you have [-1, 1], you end up with 0 failures, no?

Contributor Author (drewrip, Jan 30, 2024):

I need to start moving from functional style to Pythonic dict and list comprehensions; I'll convert that.

and, I'm not sure you can sum the statuses, if you have [-1, 1], you end up with 0 failures, no?

The way it's currently implemented, this is fine. The statuses are really just boolean flags that are set to 1 if an anomaly is detected. A simple sum wouldn't be possible if we wanted a more complex status, but more details regarding what the regression looks like, and why it is being reported, are contained in the RegressionResult structure. If for some reason we want to roll that info up into a more complex "return code" status, then the sum likely wouldn't work, but I suppose we'd then want to discuss how we want to report regressions.

logging.info(f"Received {common.Matrix.count_records()} new entries")

number_of_failures = 0
settings_to_check = ["version", "repeat"]
Contributor:

I'm not sure I understand exactly what's being checked here, i.e. how you build the reference data used for the regression checking. We'll need to clarify that; it's important to have that clear.

Contributor:

also, I don't see clearly where you define the KPI being used for regression. Do you test all of them? 🤔
They won't necessarily have the same direction "of failure": e.g. a latency going higher is a failure, but a bandwidth going lower is a failure, so you need to define the regression properties for each of them individually, I guess.

Contributor Author:

I'm not sure I understand exactly what's being checked here, i.e. how you build the reference data used for the regression checking. We'll need to clarify that; it's important to have that clear.

I can try to make that clear, or we can add a note once we finalize it. At the moment it considers all of the LTS datapoints with the same settings (minus the image version) to be the historical data, roughly as in the sketch below. The settings_to_check list is just the settings that we want to check for regression against; we iterate over them to perform the regression testing against each.
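Roughly, the reference-data selection looks like this sketch (my paraphrase, assuming dict-like settings on each entry; not the exact code in this PR):

# Illustrative only: the historical data is every LTS entry whose settings
# match the new entry's settings, ignoring the image version.
def reference_entries(new_entry, lts_entries, ignore=("image",)):
    wanted = {k: v for k, v in new_entry.settings.items() if k not in ignore}
    return [e for e in lts_entries
            if all(e.settings.get(k) == v for k, v in wanted.items())]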

also, I don't see clearly where you define the KPI being used for regression. Do you test all of them? 🤔
They won't necessarily have the same direction "of failure": e.g. a latency going higher is a failure, but a bandwidth going lower is a failure, so you need to define the regression properties for each of them individually, I guess.

Each indicator has an optional parameter called kpi, where you can specify a list of KPIs that you'd like to check. I suppose this difference in direction could be implemented in each RegressionIndicator we create. However, in the basic ZScoreIndicator there is no distinction made, for two reasons in this simple instance:

  1. I figured statistically significant variation in either direction was notable. Having a new image version run significantly faster seems like it would be suspicious? It wouldn't be wrong or bad necessarily, but it would be good to make sure we know about them.
  2. We also track the direction within the RegressionResults, so we can still recover that information if we'd like by filtering out the entries that exhibit regression in the "correct" direction.

Contributor:

Having a new image version run significantly faster seems like it would be suspicious? It wouldn't be wrong or bad necessarily but it would be good to make sure we know about them.

this is linked with the must remain stable vs should remain stable distinction that I mentioned here and there.

  • If must remain stable, any positive or negative variation should be flagged. The binary is the same (the hardware too), so the performance shouldn't change significantly.
  • If should remain stable, the binary will change (e.g. version bumps), so the new version can be faster without needing to raise a red flag for it. But it can be a flashy green flag :D

drewrip force-pushed the analyze branch 4 times, most recently from 2f6c1a9 to b527dfa, on February 1, 2024 15:27
drewrip (Contributor Author) commented Feb 5, 2024

/test rhoai-light kserve quick

drewrip (Contributor Author) commented Feb 9, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance

1 similar comment
drewrip (Contributor Author) commented Feb 9, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance

drewrip (Contributor Author) commented Feb 12, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance
/var matbench.lts.opensearch.upload=false

2 similar comments
drewrip (Contributor Author) commented Feb 12, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance
/var matbench.lts.opensearch.upload=false

drewrip (Contributor Author) commented Feb 12, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance
/var matbench.lts.opensearch.upload=false

drewrip (Contributor Author) commented Feb 12, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance
/var matbench.lts.opensearch.upload=false

drewrip (Contributor Author) commented Feb 12, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance
/var matbench.lts.opensearch.upload=false

drewrip (Contributor Author) commented Feb 12, 2024

The regression workflow is working now -> the results are appearing in the LTS Documentation report.

We needed another parse pass.

drewrip (Contributor Author) commented Feb 12, 2024

/test rhoai-plot notebooks https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/195/pull-ci-openshift-psap-topsail-main-rhoai-light/1753099883524395008/artifacts/light/test/artifacts/002__sutest_notebooks__benchmark_performance/
/var matbench.workload=rhods-notebooks-performance
/var matbench.config_file=single_notebook_perf.yaml
/var matbench.lts.opensearch.index=regression-test.rhoai-notebooks-performance
/var matbench.lts.opensearch.upload=false

drewrip (Contributor Author) commented Feb 12, 2024

An example of the regression results can be seen here.
