Add mechanism to invalidate dependent segment archives #22296
base: 5.x-dev
d04bb10 to a87fa95
fyi just came across this PR randomly and saw the many code changes. You might have already thought about it, but wanted to mention it in case we haven't. When we're archiving and
diff --git a/core/ArchiveProcessor.php b/core/ArchiveProcessor.php
index 19ae9c992a..7b1d06de8d 100644
--- a/core/ArchiveProcessor.php
+++ b/core/ArchiveProcessor.php
@@ -771,6 +771,10 @@ class ArchiveProcessor
self::$isRootArchivingRequest = false;
try {
+ $invalidator = StaticContainer::get('Piwik\Archive\ArchiveInvalidator');
+ $invalidator->markArchivesInvalidated($params->getIdSites(), [date...], $newSegment,
+ $params->getPeriod()->getLabel() != 'range', $forceInvalidateNonexistentRanges, $plugin);
+
$parameters = new ArchiveProcessor\Parameters($params->getSite(), $params->getPeriod(), $newSegment);
$parameters->onlyArchiveRequestedPlugin();
Might be worth a test if we haven't checked that yet. I believe this would also avoid any possible race conditions should multiple archivers be running concurrently etc.
Thanks @tsteur for pointing that out. Doing the invalidation only when the segment should be archived again makes everything a bit easier, I guess. I will try to adjust the code and see if everything still works as expected.
Ok. So adding the invalidation only right before archiving doesn't work as expected either. While the final result is fine, it actually archives the dependent segment twice: the invalidation record created during invalidation isn't removed after the dependent segment is processed, so the archiver invalidates the data right away and archives it again 🙈
Edit: Actually the data isn't processed again, as the segment hashes in the invalidation are not stored anywhere. Therefore the archiver will produce a lot of warnings, as the segments can't be found. I'll add code to avoid adding invalidations in that case.
65896a6 to 0818e12
This PR should now be ready for a final review. I've switched to Thomas's suggested approach, which I hadn't thought about before.
I've also added a couple of comments to make it easier to understand certain additional required changes.
) {
$plugin = null;
if ($name && strpos($name, '.') !== false) {
list($plugin) = explode('.', $name);
} elseif ($name) {
$plugin = $name;
This is a small fix to the plugin detection. When a name is provided but doesn't contain a dot (.), it is the plugin name itself, so we need to detect that case correctly.
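For illustration, here is the detection logic from the snippet above wrapped in a standalone PHP function. The function name detectPluginFromName() and the example names are hypothetical; in the PR the code sits inside a larger method whose signature is cut off above.

```php
<?php

// Hypothetical standalone helper mirroring the plugin-detection branch quoted above.
// The real code lives inside a larger method; this name is illustrative only.
function detectPluginFromName(?string $name): ?string
{
    $plugin = null;
    if ($name && strpos($name, '.') !== false) {
        // "Plugin.reportName" style: the plugin is the part before the first dot
        list($plugin) = explode('.', $name);
    } elseif ($name) {
        // No dot: the whole name is already the plugin name (the case the fix adds)
        $plugin = $name;
    }
    return $plugin;
}
```

Before the elseif branch was added, a plain plugin name such as 'ExamplePlugin' would have left $plugin as null.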
<revenue_new_visit>160</revenue_new_visit>
<conversion_rate_new_visit>91.18%</conversion_rate_new_visit>
<conversion_rate_new_visit>94.12%</conversion_rate_new_visit>
This change is caused by invalidating the VisitsSummary archive before archiving the goal metrics. Since the Goals archiver uses data such as nb_visits, it was previously working with outdated data.
@@ -191,16 +191,17 @@ public function testPluginOnlyArchivingDoesNotRelaunchChildArchivesWhenReusingAl
'date2' => '2020-01-20',
'period' => '1',
],
// archive 4 is missing as VisitsSummary is archived twice, as it doesn't contain data
This is caused by some odd archiving behavior. Whenever archiving is triggered for anything, the first step is checking whether core metrics (aka VisitsSummary) need to be archived. This is done by checking whether there is an archive that contains the visits metric. If there isn't, VisitsSummary is archived.
If an archive doesn't contain any visits, it only contains a done flag but no other metrics. This causes archiving to be triggered again, which creates a new (empty) archive and removes the previous one.
Since archiving dependent segments first triggers archiving VisitsSummary, an empty archive is created. Afterwards, when the Goals plugin is archived, VisitsSummary is archived again because the previous archive is empty. This is what causes the missing archive id.
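To make the sequence above easier to follow, here is a deliberately simplified PHP model of it. The class ArchiveStoreModel and its method are hypothetical, not Matomo code: an archive without an nb_visits metric stands for one that only carries a done flag, and re-archiving replaces any such archives with a new one under the next id.

```php
<?php

// Hypothetical, simplified model of the VisitsSummary check described above.
// Not Matomo's implementation; names and storage are illustrative only.
final class ArchiveStoreModel
{
    /** @var array<int, array<string, int>> idarchive => metrics (empty = only a done flag) */
    public array $archives = [];
    private int $nextId = 1;

    public function ensureVisitsSummaryArchived(int $visitsInPeriod): void
    {
        foreach ($this->archives as $metrics) {
            if (isset($metrics['nb_visits'])) {
                return; // a usable core-metrics archive exists, nothing to do
            }
        }

        // No archive with nb_visits: archive again and drop the previous (empty) ones
        $newMetrics = $visitsInPeriod > 0 ? ['nb_visits' => $visitsInPeriod] : [];
        $this->archives = [$this->nextId++ => $newMetrics];
    }
}

$store = new ArchiveStoreModel();
$store->ensureVisitsSummaryArchived(0); // triggered while archiving the dependent segment
$store->ensureVisitsSummaryArchived(0); // triggered again before the Goals archiving
// Only id 2 remains: id 1 was created and then replaced, which is the "missing" id
```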
Maybe add some of the explanation from the GH comment as part of the code comment?
@@ -417,7 +422,7 @@ public function deleteOlderArchives(Parameters $params, $name, $tsArchived, $idA
$numericTable = ArchiveTableCreator::getNumericTable($dateStart);
$blobTable = ArchiveTableCreator::getBlobTable($dateStart);

$sql = "SELECT idarchive FROM `$numericTable` WHERE idsite = ? AND date1 = ? AND date2 = ? AND period = ? AND name = ? AND ts_archived < ? AND idarchive < ?";
$sql = "SELECT idarchive FROM `$numericTable` WHERE idsite = ? AND date1 = ? AND date2 = ? AND period = ? AND name = ? AND ts_archived <= ? AND idarchive < ?";
This is a small fix for an issue that caused multiple archives to exist for the same parameters when they were created within the same second (see the updated test).
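The effect of switching from < to <= can be shown with a small in-memory sketch. It assumes, as the comment above implies, that ts_archived only has second precision, so two archives written in the same second share a timestamp. The function findOlderArchiveIds() and the sample rows are made up for illustration and only mimic the ts_archived and idarchive conditions of the WHERE clause, not the real query.

```php
<?php

// Hypothetical helper mimicking "ts_archived < ? AND idarchive < ?" (or "<=" when $inclusive).
// Rows stand for archives that already share idsite, dates, period and name.
function findOlderArchiveIds(array $rows, string $tsArchived, int $idArchive, bool $inclusive): array
{
    $ids = [];
    foreach ($rows as $row) {
        // 'Y-m-d H:i:s' strings sort chronologically, so strcmp orders them by time
        $cmp = strcmp($row['ts_archived'], $tsArchived);
        $tsMatches = $inclusive ? $cmp <= 0 : $cmp < 0;
        // idarchive < ? keeps the freshly written archive itself out of the result
        if ($tsMatches && $row['idarchive'] < $idArchive) {
            $ids[] = $row['idarchive'];
        }
    }
    return $ids;
}

// Two archives for the same parameters, written within the same second
$rows = [
    ['idarchive' => 7, 'ts_archived' => '2020-01-20 10:00:00'],
    ['idarchive' => 8, 'ts_archived' => '2020-01-20 10:00:00'],
];

$oldCondition = findOlderArchiveIds($rows, '2020-01-20 10:00:00', 8, false); // [] : archive 7 survives
$newCondition = findOlderArchiveIds($rows, '2020-01-20 10:00:00', 8, true);  // [7] : archive 7 is found
```

The idarchive < ? condition is what makes the inclusive comparison safe: the new archive matches its own timestamp but is still excluded by its id.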
@@ -743,24 +745,6 @@ public function getTestDataForArchiving()
'name' => 'nb_visits',
'value' => '1',
),
array (
Those archives are now correctly removed even if they are created in the same second.
Description:
- Add a new Archiver method getDependentSegments to return a list of dependent segments, and update the existing getDependentSegmentsToArchive method to use it by default while staying backward compatible (see e.g. the Goals or Media Analytics archivers).
- Add a mechanism to invalidate dependent segments to the ArchiveInvalidator class.
- Add code to invalidate dependent segment archives right before they are archived (again).
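The backward-compatible default described in the first point could look roughly like the following PHP sketch. The class names, the use of static methods and the segment definitions are assumptions made for illustration; Matomo's actual Archiver signatures may differ.

```php
<?php

// Hypothetical base class; Matomo's real Archiver class and signatures may differ.
abstract class ExampleArchiverBase
{
    // New hook: a plugin lists the segment definitions it depends on
    public static function getDependentSegments(): array
    {
        return [];
    }

    // Existing hook: defaults to the new one via late static binding,
    // so subclasses that still override it directly keep working
    public static function getDependentSegmentsToArchive(): array
    {
        return static::getDependentSegments();
    }
}

// A plugin using the new hook (segment definition chosen for illustration)
final class NewStyleArchiver extends ExampleArchiverBase
{
    public static function getDependentSegments(): array
    {
        return ['visitorType==new'];
    }
}

// A plugin written before the change that still overrides the old hook
final class LegacyArchiver extends ExampleArchiverBase
{
    public static function getDependentSegmentsToArchive(): array
    {
        return ['visitorType==returning'];
    }
}
```

With this layout, callers keep using getDependentSegmentsToArchive, and new plugins only have to implement getDependentSegments.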
Fixes #18772
Ref. DEV-14109
Massive thanks to @sgiehl for his help, guidance and a productive pairing session!