Skip to content

Data partitioning

Dave Lawrence edited this page Oct 14, 2020 · 3 revisions

See https://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

Using database partitions improves scaling, and allows quick deletion (by dropping a whole partition)

To use, the model that the partition is constrained by (not the model in the partition) should subclass RelatedModelsPartitionModel

Partition queries

To get the performance benefits, the SQL query planner / optimiser must know the data is only in a partition (otherwise it looks in the whole table)

The cleanest way to do this is via Django's FilteredRelation which allows you to modify the JOIN ON with another constraint (ie the same one as the partition). When adding filters, you need to refer to the annotation instead of the normal relation to use this partition, otherwise it will add another join to the whole table.

Unfortunately, you can't use a FilteredRelation to span relations (join across multiple tables) and thus it can't be used eg for some analysis queries which join with anything other than Variant (eg EnsemblGeneAnnotation which joins via variantannotation__gene)

In that case, implement RelatedModelsPartitionModel.sql_partition_transformer which does a find/replace on SQL generated on Django QuerySets to explicitly reference the partition instead of the table. Warning: You can only ever join to 1 partition per query - be very careful with joining querysets, eg MergeNode.

Clone this wiki locally