-
Notifications
You must be signed in to change notification settings - Fork 2
Data partitioning
See https://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
Using database partitions improves scaling, and allows quick deletion (by dropping a whole partition)
To use, the model that the partition is constrained by (not the model in the partition) should subclass RelatedModelsPartitionModel
To get the performance benefits, the SQL query planner / optimiser must know the data is only in a partition (otherwise it looks in the whole table)
The cleanest way to do this is via Django's FilteredRelation which allows you to modify the JOIN ON
with another constraint (ie the same one as the partition). When adding filters, you need to refer to the annotation instead of the normal relation to use this partition, otherwise it will add another join to the whole table.
Unfortunately, you can't use a FilteredRelation to span relations (join across multiple tables) and thus it can't be used eg for some analysis queries which join with anything other than Variant (eg EnsemblGeneAnnotation
which joins via variantannotation__gene
)
In that case, implement RelatedModelsPartitionModel.sql_partition_transformer
which does a find/replace on SQL generated on Django QuerySets to explicitly reference the partition instead of the table. Warning: You can only ever join to 1 partition per query - be very careful with joining querysets, eg MergeNode.