Conversation
922abe2 to
ab6c9c0
Compare
ab6c9c0 to
455bde4
Compare
62f09b1 to
988c2e6
Compare
|
We should determine the conditions to keep an index. Right now, it is specifically related to the live iterations and those around it, but we could also add some more predicates, such as how old an iteration is (for example, "whatever happens, keep the indices for the iterations created in the last 2 weeks"). |
988c2e6 to
a8ff882
Compare
|
After discussion, I think this could be improve with some sort of minimal set algorithm. If we can afford to do several passes during the job execution, we can find, for all selected iterations, the minimal set of smallest indices required for all iterations to work properly. If we merge that with a guard that skips initial index creation if the required index is already included in a larger one, we can have an ideal situation, short of using statistics for the heuristic. |
51b3aaf to
0a55b8e
Compare
Pascal-Delange
left a comment
There was a problem hiding this comment.
I left a few last comments, looks about good to merge in dry run mode.
I just realized something that's missing (even in current code): we have a dead angle on any aggregates used in screening rule trigger conditions that we don't take into account anywhere. Possibly makes most sense to include this in a separate PR since it's more of a (possible, rare) bugfix than directly related to this feature, although it also indirectly concerns it (we could add the fetching of screening configs with the iterations in the big SQL query)
bf4311a to
4583080
Compare
4583080 to
c695afb
Compare
When creating scenario iterations that perform aggregate queries, the prepare step creates database indices in the background. Those indices are never deleted if the live scenario iteration does not require them anymore, which contributes to database bloat and slower ingestion.
This PR adds a worker that compares all required indices to those existing in the database, and deletes them if they are deemed not necessary for operations.
Since index creation can be a time-consuming operation that can delay a scenario publication, we are somewhat lenient as to what we consider a required index. As it is, we expand the notion of "live iteration" to "live iteration and neighbors", which means the previous iteration and all next iterations.
For example, if a scenario has iterations 1-13, with iteration 10 being live, we will consider iterations 9-13 as live-adjacent and keep their indices.
We should only consider indices that are used for aggregations here, so we explicitly exclude other indices based on their names (which is why it is important, when adding a new type of index, to name it carefully). As of now, we have four types of indices on the client tables:
*_pkey)nav_*)uniq_idx_*)idx_)An edge case was found if a user re-prepares (after its induces were deleted) an old version of a scenario: the job would, if it runs after the preparation, immediately delete the newly-created index, and the old version could not be activated.
A tentative commit was added on top of this tree to bump an iteration
updated_atwhen preparing it so it can be included in the "live-adjacent" iterations (if an iteration'supdated_atis more recent than that of the live iteration, it means it could become live).