Skip to content
This repository has been archived by the owner on Mar 26, 2024. It is now read-only.

Prune Dashboard Performance #314

Open
ruckc opened this issue Mar 10, 2015 · 2 comments
Open

Prune Dashboard Performance #314

ruckc opened this issue Mar 10, 2015 · 2 comments
Assignees
Labels
Milestone

Comments

@ruckc
Copy link

ruckc commented Mar 10, 2015

I'm investigating runaway queries in my PostgreSQL dashboard database.

It appears my nightly prune cron is going near infinite (taking over 24 hours) in the "delete from resource_events" portion of the prune rake task.

When investigating the query, it appears that puppet-dashboard first counts the records to delete (expensive) then actually tries deleting them (expensive query again).

It appears in PostgreSQL the NOT IN isn't optimal for this query:

dashboard=# explain select count(*) from resource_events where resource_status_id not in (select id from resource_statuses);


                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=28055423930.51..28055423930.52 rows=1 width=0)
   ->  Seq Scan on resource_events  (cost=0.00..28055423886.85 rows=17462 width=0)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..1538250.21 rows=27380614 width=4)
                 ->  Seq Scan on resource_statuses  (cost=0.00..1294391.14 rows=27380614 width=4)
(6 rows)

I believe an optional query would use a join (since resource_events is hopefully smaller). The below join would make this query extremely trivial and should support any ANSI SQL compatible database.

dashboard=# explain select count(*) from resource_events e LEFT JOIN resource_statuses s ON (e.resource_status_id = s.id) WHERE s.id IS NULL;

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=293234.43..293234.44 rows=1 width=0)
   ->  Nested Loop Anti Join  (cost=0.44..293234.42 rows=1 width=0)
         ->  Seq Scan on resource_events e  (cost=0.00..1279.23 rows=34923 width=4)
         ->  Index Only Scan using resource_statuses_pkey on resource_statuses s  (cost=0.44..8.35 rows=1 width=4)
               Index Cond: (id = e.resource_status_id)
(5 rows)
@sodabrew sodabrew self-assigned this Mar 10, 2015
@ruckc
Copy link
Author

ruckc commented Mar 10, 2015

@sodabrew I've been trying to modify my prune job to perform this query. If it wasn't for tracking deletion_count and batching this would be extremely simple.

Also, found this: http://stackoverflow.com/questions/21662726/delete-using-left-outer-join-in-postgres so maybe the delete should work differently.

@ruckc
Copy link
Author

ruckc commented Mar 11, 2015

This is the lowest Cost way of pruning in postgres i've found...

delete from metrics WHERE NOT EXISTS (SELECT 1 FROM reports WHERE metrics.report_id = reports.id)
delete from report_logs WHERE NOT EXISTS (SELECT 1 FROM reports WHERE report_logs.report_id = reports.id)
delete from resource_statuses where not exists ( select 1 from reports where resource_statuses.report_id = reports.id );
delete from resource_events where not exists (SELECT 1 FROM resource_statuses where resource_events.resource_status_id = resource_statuses.id)

@ZeroPointEnergy ZeroPointEnergy added this to the Release 3.2 milestone Sep 11, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants