Apply sanity caps on outlier values in performance reports #20718

Merged

rr-it
Contributor

@rr-it rr-it commented May 13, 2023

Description:

The values used in performance reports of type 'sum/total' and 'average' are capped to reduce the impact of single failed performance measurements.

This fixes #17847.

Review

@rr-it rr-it force-pushed the issue/17847-performance-report-cap-values branch 4 times, most recently from 7d9da13 to 902745b Compare May 14, 2023 13:00
@rr-it rr-it changed the title [WIP] Apply sanity caps on outlier values in performance reports Apply sanity caps on outlier values in performance reports May 14, 2023
@sgiehl
Member

sgiehl commented May 15, 2023

Hi @rr-it,
Thanks for your contribution. I had a look across the code changes and I'm not totally sure what they are meant for.
To confirm I've understood everything correctly:

This PR introduces some sort of maximal values for the performance metrics that are used in calculations.
So if there were values tracked higher than the one configured, the calculations will use the configured values instead.

Is that correct?

@rr-it
Contributor Author

rr-it commented May 15, 2023

@sgiehl That's perfectly correct.

Please also see my latest comment on #17847:
#17847 (comment)

@rr-it
Contributor Author

rr-it commented May 15, 2023

Devices, especially iPhones, sometimes send insanely high performance values which don't make any sense.
E.g. a DOM processing time of 4 hours instead of 250 milliseconds.

Without this PR

  • 10000 values of 250 ms result in an average of 0.250 seconds
    $\frac{10000×250 ms}{10000} = 250 ms$
  • One value of 4 h and 10000 values of 250 ms result in an average of ~ 1.690 seconds
    • $4h = 4×60×60×1000 ms = 14400000 ms$
    • $\frac{14400000 ms + 10000×250 ms}{10001} \approx 1690 ms$

[Screenshot: performance-old]

With this PR

time_dom_processing_cap_duration_ms = 50000

  • 10000 values of 250 ms result in an average of 0.250 seconds
    $\frac{10000×250 ms}{10000} = 250 ms$
  • One value of 4 h and 10000 values of 250 ms result in an average of ~ 0.255 seconds
    • $4h = 4×60×60×1000 ms = 14400000 ms$
      $14400000 ms$ capped to $50000 ms$
    • $\frac{50000 ms + 10000×250 ms}{10001} \approx 255 ms$

[Screenshot: performance-new]
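
The arithmetic above can be reproduced with a few lines of plain PHP. This is only an illustrative sketch: the function name `cappedAverage` is hypothetical, it works on an in-memory array rather than on Matomo's archived data, and it is not the code from this PR. Treating a cap of 0 as "disabled" is an assumption that follows the `// 0 disables cap` comment quoted later in this thread.

```php
<?php

/**
 * Average of $values (in ms) after limiting each value to $capMs.
 * Assumption: a cap of 0 means "no cap".
 */
function cappedAverage(array $values, int $capMs): float
{
    if ($values === []) {
        return 0.0;
    }
    if ($capMs > 0) {
        // Replace every value above the cap with the cap itself
        $values = array_map(function ($v) use ($capMs) {
            return min($v, $capMs);
        }, $values);
    }
    return array_sum($values) / count($values);
}

// 10000 measurements of 250 ms plus one outlier of 4 hours
$samples = array_fill(0, 10000, 250);
$samples[] = 4 * 60 * 60 * 1000;

printf("%.0f ms\n", cappedAverage($samples, 0));     // prints "1690 ms"
printf("%.0f ms\n", cappedAverage($samples, 50000)); // prints "255 ms"
```

The outlier is not discarded, it still counts as one measurement of 50000 ms, which is why the capped average is 255 ms rather than exactly 250 ms.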

@rr-it rr-it force-pushed the issue/17847-performance-report-cap-values branch 2 times, most recently from 7a28d51 to 87e09b0 Compare May 23, 2023 09:50
@github-actions
Contributor

github-actions bot commented Jun 7, 2023

If you don't want this PR to be closed automatically in 28 days then you need to assign the label 'Do not close'.

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Jun 7, 2023
@rr-it
Contributor Author

rr-it commented Jun 7, 2023

If you don't want this PR to be closed automatically in 28 days then you need to assign the label 'Do not close'.

@sgiehl Please add label 'Do not close'.

@sgiehl sgiehl added Needs Review PRs that need a code review and removed Stale The label used by the Close Stale Issues action labels Jun 7, 2023
@github-actions
Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Jun 15, 2023
@michalkleiner michalkleiner removed the Stale The label used by the Close Stale Issues action label Jun 23, 2023
Member

@sgiehl sgiehl left a comment

Left a comment. Otherwise the code looks fine to me.
But someone from product team should maybe have a look here first. ping @Stan-vw
I'm not 100% sure regarding the implementation/solution.
With cutting away values if they are too high I see some possible issues:

  • Only removing values that are too high, but not also those that are too low, might make the values look better than they are
  • The default values for the config values might be inaccurate as some pages might expect higher loading times than others.
  • The most accurate solution might be to simply throw away the top 1% and bottom 1% of values and only calculate with the rest. This would be a lot more complex to implement and maybe also slower when archiving, though.

If we plan to include that, I guess we would need to update our documentation and FAQs accordingly.
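
For comparison, the "throw away the top 1% and bottom 1%" alternative mentioned in the list above could look roughly like the following in plain PHP. This is a hedged sketch, not part of the PR: `trimmedMean` is a hypothetical helper, and it assumes all raw values are available in memory, which is not how archiving works and hints at why this approach was described as more complex and possibly slower.

```php
<?php

/**
 * Mean of $values after dropping the lowest and highest $fraction of them.
 * $fraction must be in [0, 0.5) so that at least one value remains.
 */
function trimmedMean(array $values, float $fraction = 0.01): float
{
    if ($fraction < 0 || $fraction >= 0.5) {
        throw new InvalidArgumentException('fraction must be in [0, 0.5)');
    }
    $n = count($values);
    if ($n === 0) {
        return 0.0;
    }
    sort($values); // the whole data set has to be sorted first
    $drop = (int) floor($n * $fraction);
    $kept = array_slice($values, $drop, $n - 2 * $drop);
    return array_sum($kept) / count($kept);
}

// 10000 measurements of 250 ms plus one outlier of 4 hours
$samples = array_fill(0, 10000, 250);
$samples[] = 4 * 60 * 60 * 1000;

printf("%.0f ms\n", trimmedMean($samples)); // prints "250 ms"
```

Unlike a fixed cap, this needs no per-metric threshold, but it requires the complete sorted list of values for every report.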

Review thread on config/global.ini.php (outdated, resolved)
@Stan-vw
Contributor

Stan-vw commented Jun 26, 2023

I think we would like this report to not show the outliers. Trimming the top and bottom 1% sounds like a fine test to begin with, but since, as you say, that could be more implementation work, maybe we can pick fixed values from historic reports instead.
However, I think we should still have a report that shows all values (if we don't have that already). I've attached some potential ways to visualise this, albeit that I didn't add the relevant axes (days vs seconds):
[Two attached images: mock-ups of possible visualisations]

@github-actions
Contributor

github-actions bot commented Jul 4, 2023

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Jul 4, 2023
@michalkleiner michalkleiner removed Needs Review PRs that need a code review Stale The label used by the Close Stale Issues action labels Jul 4, 2023
@michalkleiner
Contributor

The business logic still needs to be resolved (whether we do fixed values, top and bottom percentage or something else). There's also a merge conflict now which needs to be resolved, plus a change request from Stefan.

I've removed the Needs review tag for now until all the above is addressed at which point it can be marked as Needs review again.

@rr-it rr-it force-pushed the issue/17847-performance-report-cap-values branch from 87e09b0 to 7198b36 Compare July 4, 2023 09:25
@rr-it
Contributor Author

rr-it commented Jul 4, 2023

On the business logic - most points were already mentioned:

  • Clear preference to cap by top percentage over fixed values:
    • But: Top percentage is more implementation work.
    • Fixed values already achieve the goal of making the chart meaningful. (Right now the chart's informational value is insignificant.)
    • There is already a fixed value cap: the maximum value of an unsigned MEDIUMINT database column, 16777215 ms ~ 4 h 39 min 37 s
  • Top and bottom? No need for bottom value caps as the impact of low values is negligible.
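
The MEDIUMINT figure quoted above can be checked with a short conversion. The helper name `formatMs` is hypothetical and only serves this illustration; the only fact it relies on is that an unsigned MEDIUMINT holds at most 2^24 - 1.

```php
<?php

/**
 * Formats a duration given in milliseconds as "H h M min S s",
 * discarding the fractional second.
 */
function formatMs(int $ms): string
{
    $totalSeconds = intdiv($ms, 1000);
    $hours = intdiv($totalSeconds, 3600);
    $minutes = intdiv($totalSeconds % 3600, 60);
    $seconds = $totalSeconds % 60;
    return sprintf('%d h %d min %d s', $hours, $minutes, $seconds);
}

// Largest value an unsigned MEDIUMINT column can store
$mediumIntMax = 2 ** 24 - 1; // 16777215

echo formatMs($mediumIntMax), "\n"; // prints "4 h 39 min 37 s"
```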

Without this PR

No outliers

  • 10000 values of 250 ms result in an average of 0.250 seconds
    $\frac{10000×250 ms}{10000} = 250 ms$

Top outlier

  • One value of 4 h and 10000 values of 250 ms result in an average of ~ 1.690 seconds
    • $4h = 4×60×60×1000 ms = 14400000 ms$
    • $\frac{14400000 ms + 10000×250 ms}{10001} \approx 1690 ms$

Bottom outlier

  • One value of 1 ms and 10000 values of 250 ms result in an average of ~ 0.250 seconds
    • $\frac{1 ms + 10000×250 ms}{10001} \approx 249.975 ms$

@michalkleiner
Contributor

@Stan-vw I guess it's your call on whether we should accept this as is or you want any adjustments.

@rr-it rr-it force-pushed the issue/17847-performance-report-cap-values branch from 7198b36 to 3ae9905 Compare July 4, 2023 12:09
@Stan-vw
Contributor

Stan-vw commented Jan 2, 2024

@michalkleiner it's been ages since we discussed this so I could be wrong, but I believe we agreed to merge this without a front end configuration option for now. I'm actually surprised to see it's still open rather than merged, what part is not mergeable yet?

@rr-it
Contributor Author

rr-it commented Jan 3, 2024

@sgiehl You merged branch '5.x-dev' into issue/17847-performance-report-cap-values.

If you like I can revert this merge and instead rebase my changes onto branch '5.x-dev'. And then do a force push of this newly created branch for issue/17847-performance-report-cap-values.

Done: I proceeded as described above.

@rr-it
Contributor Author

rr-it commented Jan 3, 2024

The following code throws WARNING/NOTICE in "Matomo Tests / UI (0-3)":

https://github.com/rr-it/matomo/blob/27e6b666b01eea12acefec6271622ed90716ddc9/plugins/PagePerformance/Columns/Base.php#L29-L34

        try {
            $valueCap = Config::getInstance()->PagePerformance[$this->columnName . '_cap_' . $this->type];
        } catch (Exception $ex) {
            // 0 disables cap
            return 0;
        }

The WARNING/NOTICE is like:

2024-01-03T15:31:12.7143577Z WARNING PagePerformance
[2024-01-03 15:24:08 UTC] [5529] /home/runner/work/matomo/matomo/matomo/plugins/PagePerformance/Columns/Base.php(30):
Notice - Undefined index: time_network_cap_duration_ms - Matomo 5.0.0 - Please report this message in the Matomo forums: https://forum.matomo.org (please do a search first as it might have been reported already)
#0/plugins/PagePerformance/Columns/Base.php(30),
#1/plugins/PagePerformance/Columns/Base.php(40),
#2/plugins/PagePerformance/Metrics.php(115),
#3/plugins/PagePerformance/PagePerformance.php(151),
[internal function]: Piwik\Plugins\PagePerformance\PagePerformance->addActionMetrics(),
#5/core/EventDispatcher.php(147),
#6/core/Piwik.php(870),
#7/plugins/Actions/Metrics.php(89),
#8/plugins/Actions/RecordBuilders/ActionReports.php(194),
#9/plugins/Actions/RecordBuilders/ActionReports.php(81)

How do you correctly call Config::getInstance()->PagePerformance['this_index_might_be_not_set']; without getting a WARNING/NOTICE?

@rr-it
Contributor Author

rr-it commented Jan 4, 2024

On runs of "Matomo Tests / UI (0-3)" the php-log does not become cluttered anymore with PHP NOTICE-entries.

The added ?? 0 does the trick to hide NOTICE: undefined index. And it also ensures that on missing configuration the default becomes 0:
$valueCap = Config::getInstance()->PagePerformance['this_index_might_be_not_set'] ?? 0;
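
Why `?? 0` works where the earlier `try`/`catch` did not: in PHP an undefined array index raises a notice (a warning in PHP 8), not an exception, so there is nothing for `catch` to handle. The null coalescing operator checks for the key without raising anything. The sketch below uses a plain array as a hypothetical stand-in for the `PagePerformance` config section; `getValueCap` and the array contents are assumptions for illustration, not Matomo's actual API.

```php
<?php

// Hypothetical stand-in for the [PagePerformance] config section
$pagePerformanceConfig = [
    'time_dom_processing_cap_duration_ms' => 50000,
];

/**
 * Looks up the cap for a column; returns 0 (cap disabled) when not configured.
 */
function getValueCap(array $section, string $columnName, string $type): int
{
    // ?? falls back to 0 if the key is missing or null, without a notice
    return (int) ($section[$columnName . '_cap_' . $type] ?? 0);
}

var_dump(getValueCap($pagePerformanceConfig, 'time_dom_processing', 'duration_ms')); // int(50000)
var_dump(getValueCap($pagePerformanceConfig, 'time_network', 'duration_ms'));        // int(0)
```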

@sgiehl Now I think this PR is ready to merge.

Contributor

If you don't want this PR to be closed automatically in 28 days then you need to assign the label 'Do not close'.

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Jan 19, 2024
@sgiehl sgiehl force-pushed the issue/17847-performance-report-cap-values branch from 3ecf11d to 1a959c8 Compare January 19, 2024 14:16
@sgiehl sgiehl force-pushed the issue/17847-performance-report-cap-values branch from 1a959c8 to 8cf7ca2 Compare January 29, 2024 09:50
Member

@sgiehl sgiehl left a comment

Sorry for not coming back earlier on this one. I would suggest updating the configuration slightly. Otherwise this imho should look good. We could also consider adding some tests around that, to ensure it works correctly.

Review thread on config/global.ini.php (outdated, resolved)
@sgiehl sgiehl added this to the 5.1.0 milestone Jan 29, 2024
@sgiehl sgiehl removed the Stale The label used by the Close Stale Issues action label Jan 29, 2024
@sgiehl sgiehl force-pushed the issue/17847-performance-report-cap-values branch from 4b797e1 to 08764d8 Compare January 29, 2024 14:36
Member

@sgiehl sgiehl left a comment

I've added a simple integration test now to cover the value capping in a test. Imho this should be good to merge now. We might need to update the config ui test after merging though

@sgiehl sgiehl merged commit 431f5ba into matomo-org:5.x-dev Jan 30, 2024
21 of 25 checks passed
);
}

public function testShouldNotCapOutlinerValuesWhenConfigured()
Contributor

@sgiehl should this be called testShouldCapOutlinerValuesWhenConfigured() without the 'not'?

Member

indeed

@sgiehl sgiehl added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Apr 8, 2024
Labels
c: Consistent Reports & Analytics UX For bugs and features that make Analytics reporting UI behave more consistently. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Development

Successfully merging this pull request may close these issues.

Performance metrics reports incorrect times due to single exceptional high value
5 participants